Discussion:
[40tude] Edit -> Find regular expression?
mike
2021-01-19 04:33:52 UTC
When searching for a character to replace in 40Tude Dialog, is there a way
to search for more than one disjoint character at a time?

As an example I might want to replace all starting left side curly
doublequotes and all ending right side curly doublequotes with straight
doublequotes.

Can the 40Tude Dialog "Edit -> Find" be made to see both types of opening
and ending doublequotes at the same time to replace both types with
straight quotes in a single Find command?
VanguardLH
2021-01-19 08:41:35 UTC
Post by mike
When searching for a character to replace in 40Tude Dialog, is there a
way to search for more than one disjoint character at a time?
As an example I might want to replace all starting left side curly
doublequotes and all ending right side curly doublequotes with
straight doublequotes.
Can the 40Tude Dialog "Edit -> Find" be made to see both types of
opening and ending doublequotes at the same time to replace both
types with straight quotes in a single Find command?
Why do you think Dialog's Edit -> Find function has a replace operation?
It just finds. It does not replace. To replace means another field
would have to be included to show with just what to replace the found
substring. Edit -> Find has only one input field: "Text to find".
There is no "[Text to] replace with" field.

Or do you mean you'll use the Find function, and then manually edit the
found string? You could do a regex search using:

“.*”


The dot (.) means any character, and the * means zero or more of the
preceding element, so .* matches any run of characters. The above would
find:

“”
“X”
“abc def. Right you are!”
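
One caveat with .* is that it is greedy: on a line holding two quoted
strings it runs from the first opening quote to the last closing quote.
The sketch below uses Python's re module purely as a stand-in (Dialog's
engine is not Python, and similar behaviour is assumed, not verified);
the sample line is made up. It also shows a character class, which
matches either quote on its own.

```python
import re

# Made-up sample line containing two curly-quoted strings.
line = "He said “yes” and she said “no” today."

# Greedy: .* runs to the LAST closing quote on the line.
greedy = re.findall("“.*”", line)

# Lazy: .*? stops at the FIRST closing quote, one match per quoted string.
lazy = re.findall("“.*?”", line)

# Character class: matches either curly quote individually, which is the
# "more than one disjoint character" search asked about.
either = re.findall("[“”]", line)

print(greedy)
print(lazy)
print(either)
```

For stepping through quotes one at a time with F3, the character class
form is probably the more useful of the three.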

However, regex rules are plain text, so you won't be able to add the
curly quotes to a rule. You'll need to use the encoded or numeric value
for the characters. My recollection is regex in Dialog uses the PCRE
variant.

https://www.regular-expressions.info/unicode.html

You would have to replace the curly quotes with their Unicode
equivalents. From the above article, PCRE does not handle \uFFFF to
specify the hexadecimal number for a Unicode character. Instead you use
\x{FFFF} (include the curly braces).

201C = left curly quote character
201D = right curly quote character

So, my guess is you would use the following regex to find (not replace)
any string of zero, or more, characters where the order is left curly,
followed by zero or more characters, followed by a right curly, and
probably looks like:

\x{201C}.*\x{201D}
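
To double-check the code points outside Dialog, here is a small Python
sketch. Python's re module spells the escape \u201C where PCRE spells it
\x{201C}, so the pattern is an equivalent, not a copy, and the sample
text is made up.

```python
import re

LEFT, RIGHT = "\u201C", "\u201D"   # the two curly double quotes

# Code points, printed in the hex form used in the pattern above.
print(f"{ord(LEFT):04X} {ord(RIGHT):04X}")

text = "plain \u201Cquoted\u201D text"

# Left curly, any run of characters, right curly.
pair = re.search(r"\u201C.*\u201D", text)

# A class holding both code points finds each quote individually.
singles = [m.start() for m in re.finditer(r"[\u201C\u201D]", text)]

print(pair.group(), singles)
```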

Since all the headers are ASCII characters (I think there is some
encoding prefix identifier for the Subject header, but the string itself
is all ASCII), you must be trying to find the curly quoted strings in
the body of message. Make sure when you use Edit -> Find that you
select the "Article body pane" tab in the Find dialog.

I don't know of any posts that use the double curly quote characters,
but as a test I did a Find on the straight double quote characters, like
the "Edit -> Find" string in your post. I opened the Find dialog,
selected the "Article body pane" tab, and searched on:

\x{0022}.*\x{0022}

0022 is the hex Unicode value of the double straight quote ("). It
found your string in your post. I could've used " to make it easier,
but I wanted to test using the Unicode encoded format to specify the
straight double quote character.
mike
2021-01-21 00:07:41 UTC
VanguardLH
2021-01-21 00:53:13 UTC
Post by VanguardLH
Since all the headers are ASCII characters (I think there is some
encoding prefix identifier for the Subject header, but the string itself
is all ASCII), you must be trying to find the curly quoted strings in
the body of message. Make sure when you use Edit -> Find that you
select the "Article body pane" tab in the Find dialog.
Yes. Body. When I cut and paste from various sources (usually web pages) I
want consistency in the quoting so that every cut and paste uses the
simplest consistent type of straight quotes and single dashes (why they
think we need even bigger dashes is lost on me).
Here's an example of something I might cut and paste into the body
https://www.lifewire.com/typing-quotes-apostrophes-and-primes-1074104
https://typographyforlawyers.com/straight-and-curly-quotes.html
https://usefulangle.com/post/217/html-curly-quotes
https://www.enotes.com/homework-help/what-are-two-quotes-from-curley-in-of-mice-and-men-302038
Post by VanguardLH
Why do you think Dialog's Edit -> Find function has a replace operation?
It just finds. It does not replace. To replace means another field
would have to be included to show with just what to replace the found
substring. Edit -> Find has only one input field: "Text to find".
There is no "[Text to] replace with" field.
Or do you mean you'll use the Find function, and then manually edit the
found string?
My mistake for not being clear that I press control F to find and then I
repeatedly press F3 as many times as needed to find more, and then as it is
finding these things, I replace manually by typing the straight doublequote
(or the straight singlequote) to replace the curly characters (and then move
on using F3 until I get to the end).
Post by VanguardLH
“.*”
The dot (.) means any character, and the * means zero, or more, of any
“”
“X”
“abc def. Right you are!”
However, regex rules are plain text, so you won't be able to add the
curly quotes to a rule. You'll need to use the encoded or numeric value
for the characters. My recollection is regex in Dialog uses the PCRE
variant.
https://www.regular-expressions.info/unicode.html
You would have to replace the curly quotes with their Unicode
equivalents. From the above article, PCRE does not handle \uFFFF to
specify the hexadecimal number for a Unicode character. Instead you use
\x{FFFF} (include the curly braces).
201C = left curly quote character
201D = right curly quote character
So, my guess is you would use the following regex to find (not replace)
any string of zero, or more, characters where the order is left curly,
followed by zero or more characters, followed by a right curly, and
\x{201C}.*\x{201D}
When I typed "control + f" and then pasted that "\x{201C}.*\x{201D}"
(without the quotes), it didn't find the curly quotes.
What's the key sequence to enter those special characters?
Post by VanguardLH
I don't know of any posts that use the double curly quote characters,
but as a test I did a Find on the straight double quote characters, like
the "Edit -> Find" string in your post. I opened the Find dialog,
\x{0022}.*\x{0022}
I pasted something from here
https://typographyforlawyers.com/straight-and-curly-quotes.html
and then I tried to type what you suggested but I must be missing a special
character as I cut and pasted exactly what you wrote above and it found only
what you wrote above but not what I pasted.
control
f
(then I pasted)\x{0022}.*\x{0022}
(then I pressed)OK
Dialog said - Search term "\x{0022}.*\x{0022}" not found.
Post by VanguardLH
0022 is the hex Unicode value of the double straight quote ("). It
found your string in your post. I could've used " to make it easier,
but I wanted to test using the Unicode encoded format to specify the
straight double quote character.
Can you just let me know HOW to type the \x{0022} stuff?
it but it found only itself. I must have missed a critical step.
When you hit Ctrl+F to bring up the Find dialog, and made sure to pick the
"Article body pane" tab, did you also make sure to select the "Regular
expressions" option? Without that option, you would be searching for the
literal ASCII string \x{201C} instead of searching with a regex that
specifies an escaped x followed by the braced numeric value for the
Unicode char.

Ctrl+F (to show Find dialog).
Select the "Article body pane" tab.
Select the "Regular expressions" option.
Enter "\x{201C}.*\x{201D}" (sans quotes) in the "Text to find" field.
Pick where to search: selected groups, all groups, selected body.
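
Why the option matters can be sketched in Python (again only a stand-in
for Dialog, with a made-up article body, and with Python's \u201C
spelling in place of \x{201C}). With a literal search the typed
characters only match a post that quotes the pattern itself, which would
explain a search that "found only itself".

```python
import re

# Made-up article body: it quotes the pattern text and also contains a
# real curly-quoted phrase.
body = 'I searched for \\u201C.*\\u201D but wanted “this phrase” instead.'

needle = r"\u201C.*\u201D"   # the characters exactly as typed into the box

# Option off: the box content is treated as literal text.
literal_hit = body.find(needle)

# Option on: the same characters are compiled as a regular expression.
regex_hit = re.search(needle, body)

print(body[literal_hit:literal_hit + len(needle)])
print(regex_hit.group())
```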

The search proceeds forward, not from the top. If you are past the
group or message with the doubled quotes, the search finds the *next*
article, if any, that has those characters. The search does not loop
around back to the top after reaching the bottom of the list.
mike
2021-01-21 13:15:01 UTC
mike
2021-01-21 13:28:47 UTC
(1) In a browser I visit this page & copy the Demo sentence to my clipboard
https://www.lifewire.com/typing-quotes-apostrophes-and-primes-1074104
I made a mistake on the URL that I copied the "Demo" sentence from!

The "Demo" sentence came from this web page.
https://usefulangle.com/post/217/html-curly-quotes

I'm not a programmer but do you think a Dialog script might be able to
substitute curly quotes to straight quotes automatically just before sending
the message to the nntp server?
Bernd Rose
2021-01-21 16:56:59 UTC
Post by mike
I'm not a programmer but do you think a Dialog script might be able to
substitute curly quotes to straight quotes automatically just before sending
the message to the nntp server?
Shouldn't be too hard to adjust this script:

http://web.archive.org/web/20120127150224/http://dialog.datalist.org/scripts/ScriptreplaceUmlaut.html

/If/ you have other conversion scripts installed that deal with charset
manipulation, you may need a more sophisticated approach. Then I suggest
you take the boxquote script as a basis:

http://4d.vollmeier.at/scripte/ereignisscripte/onbeforesendingmessage/boxquote.html

HTH.
Bernd
mike
2021-01-21 22:44:19 UTC
mike
2021-01-22 06:50:16 UTC
Bernd Rose
2021-01-23 08:52:02 UTC
On Fri, 22nd Jan 2021 12:20:16 +0530, mike wrote:

Hm. Where to start? In your message Message-ID: <rud3db$ggn$***@solani.org>
you used a function name that differs from the program name. (Maybe a
copy/paste error?) But this way, the main function will never be called.

Using OnBeforeSavingMessage will /not/ accomplish what you have in
mind, because it only fires when saving /incoming/ messages to the
Dialog database. It does /not/ work when saving drafts. (Drafts are
saved from the editor window as you type and are neither checked nor
altered when saving them without sending.)

You get a compile error for the (seemingly identical) OnBeforeSavingMessage
script (compared to OnBeforeSendingMessage), because the latter program
is written to be able to be canceled, while the former is not. Instead
of using a main function with return value as in OnBeforeSendingMessage,
you need to use a main /procedure/ with OnBeforeSavingMessage:

procedure OnBeforeSavingMessage(var Message: TStringlist; Servername:string; IsEmail: boolean);

And this procedure must /not/ have a line like:
result:=true;

The number 2147483647 (2^31 - 1, the largest 32-bit signed integer) is
the maximum string length for the copy function. It prevents buffer
overflow. Larger strings will be truncated.
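
In practice a count that large just means "everything from this position
to the end". A rough Python rendering of the StringReplace loop from the
script under discussion may make that easier to see. Slicing stands in
for Copy and str.find for AnsiPos; both are my substitutions, and the
empty-pattern guard is an addition that the script does not have.

```python
MAX_COUNT = 2147483647  # 2**31 - 1, the count passed to Copy in the script

def copy(s: str, index: int, count: int) -> str:
    """Pascal-style Copy: 1-based index, count clamped to what is left."""
    start = max(index, 1) - 1
    return s[start:start + max(count, 0)]

def string_replace(s: str, old: str, new: str) -> str:
    """Replace every occurrence of old in s, mirroring the script's loop."""
    if not old:
        return s  # guard added here; an empty pattern would never advance
    result, rest = "", s
    while rest:
        offset = rest.find(old) + 1   # 1-based like AnsiPos; 0 = not found
        if offset == 0:
            result += rest            # no more matches: keep the tail
            break
        # Text before the match, then the replacement.
        result += copy(rest, 1, offset - 1) + new
        # Continue with everything after the match.
        rest = copy(rest, offset + len(old), MAX_COUNT)
    return result

print(string_replace("Grüße", "ü", "ue"))
```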


Back to your original question:

What has to be replaced depends on the encoding of your message. The
replacement occurs right before sending. Therefore it is done on the raw
outgoing message (including all headers!!).

To replace the German umlaut "ä" in several encodings, you'd need to
adjust the search/replace strings with sth. like:

s:=stringreplace(s,'ä','ae');
s:=stringreplace(s,'=E4','ae');
s:=StringReplace(s,'=C3=A4','ae');
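
Those three search strings are the unencoded character, its
quoted-printable form in ISO-8859-1 and its quoted-printable form in
UTF-8. The matching strings for the curly quotes can be worked out the
same way. A small Python sketch with the standard quopri module, run
outside Dialog; Windows-1252 is my assumption for the 8-bit case, since
ISO-8859-1 has no curly quotes:

```python
import quopri

def qp(ch: str, charset: str) -> str:
    """Quoted-printable form of one character in the given charset."""
    return quopri.encodestring(ch.encode(charset)).decode("ascii")

# Print the encoded form of the umlaut and both curly quotes.
for ch in ("ä", "\u201C", "\u201D"):
    for charset in ("cp1252", "utf-8"):
        print(repr(ch), charset, qp(ch, charset))
```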

Be aware, that this kind of alteration on the raw message must only be
used for characters or strings, which may /not/ be found inside the
header texts!! Else, anything can happen: From posting invalid messages
to stray messages reaching the wrong recipient!
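
One way to limit that risk would be to leave the headers alone and
replace only after the first empty line. The following is a Python
sketch of the idea, not Dialog script; the function name and the sample
message are made up for illustration.

```python
def replace_in_body(lines: list[str], table: dict[str, str]) -> list[str]:
    """Apply replacements only after the first empty line (end of headers)."""
    try:
        split = lines.index("")
    except ValueError:
        return list(lines)  # no blank line: treat everything as headers
    body = []
    for line in lines[split + 1:]:
        for old, new in table.items():
            line = line.replace(old, new)
        body.append(line)
    return lines[:split + 1] + body

# Made-up raw message: the From header carries =E4 inside an encoded word,
# which a replacement over the whole message would corrupt.
message = [
    "From: =?ISO-8859-1?Q?J=E4ger?= <j@example.invalid>",
    "Content-Transfer-Encoding: quoted-printable",
    "",
    "Sch=E4fer",
]
cleaned = replace_in_body(message, {"=E4": "ae"})
print(cleaned)
```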

On another thought: Maybe you can use the LastMessageCheck script to see,
how the characters are encoded in your outgoing messages. This script
opens a preview of the raw message right before sending (if it is included
correctly). In case of error, you can cancel the sending process. Be
aware, though, that everything written in this message will be lost on
cancel. So you need to start typing from scratch. (Or keep the whole
message in clipboard right before sending.) The (German) download site
for the LastMessageCheck script is:

http://4d.vollmeier.at/scripte/ereignisscripte/onbeforesendingmessage/lastmessagecheck.html

Hopefully, I have addressed all matters...
Bernd

VanguardLH
2021-01-21 22:00:45 UTC
But since I'm not in an editable window, I can't fix anything using that
"long" Find Box. I can only fix text when using the "short" Find Box.
Didn't know you were in compose mode when using the short Find dialog.
Yeah, that doesn't let you specify using regex to let you use Unicode to
find characters in your compose window.

Seems if you're pasting into your compose window, that you could first
paste into something else, like a word processor, where you could
specify the string to find and what to replace it with, copy the result,
and paste that into your compose window.
mike
2021-01-21 23:40:27 UTC
Post by VanguardLH
Seems if you're pasting into your compose window, that you could first
paste into something else, like a word processor, where you could
specify the string to find and what to replace it with, copy the result,
and paste that into your compose window.
This is a test of OnBeforeSendingMessage with ä, ö, ü

If I can't get the script Bernd suggested to work, then I have to manually
clean out the body as you said, by pasting into my gVIM editor on Windows.
:%s/[<Ctrl-Q>147<Ctrl-Q>148]/"/g
That will replace all instances of opening or closing curly doublequotes
with straight doublequotes.
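
The same clean-up, extended to the single quotes and dashes, could also
be done on the clipboard text before pasting. A minimal Python sketch;
the table and the function name are only illustrative:

```python
# Typographic characters mapped to their plain ASCII counterparts.
STRAIGHTEN = str.maketrans({
    "\u201C": '"',   # left double quote
    "\u201D": '"',   # right double quote
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote / apostrophe
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
})

def straighten(text: str) -> str:
    """Return text with curly quotes and long dashes made plain."""
    return text.translate(STRAIGHTEN)

print(straighten("\u201CDon\u2019t\u201D \u2013 ok"))
```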

But I was hoping this program that Bernd Rose suggested would work
(where I replaced my existing OnBeforeSendingMessage in its entirety)
with this which Bernd had suggested.
http://web.archive.org/web/20120127150224/http://dialog.datalist.org/scripts/ScriptreplaceUmlaut.html

But it doesn't work if you still see the special characters I'm pasting
below from https://www.studying-in-germany.org/german-umlauts/

This is a test of OnBeforeSendingMessage with ä, ö, ü

program OnBeforeSendingMessage;

// Replaces every occurrence of OldPattern in S with NewPattern.
function StringReplace(S, OldPattern, NewPattern: string): string;
var
  SearchStr, Patt, NewStr: string;
  Offset: Integer;
begin
  SearchStr := S;
  Patt := OldPattern;
  NewStr := S;
  Result := '';
  while SearchStr <> '' do
  begin
    Offset := AnsiPos(Patt, SearchStr);
    if Offset = 0 then
    begin
      // No further match: append the remaining text and stop.
      Result := Result + NewStr;
      Break;
    end;
    // Text before the match, then the replacement.
    Result := Result + Copy(NewStr, 1, Offset - 1) + NewPattern;
    // Continue with everything after the match.
    NewStr := Copy(NewStr, Offset + Length(OldPattern), 2147483647);
    SearchStr := Copy(SearchStr, Offset + Length(Patt), 2147483647);
  end;
end;

// The function name must match the program name, or it is never called.
function OnBeforeSendingMessage(var Message: TStringlist; Servername: string;
  IsEmail: boolean): boolean;
var
  s: string;
begin
  result := true;
  s := message.text;
  s := StringReplace(s, 'ü', 'ue');
  s := StringReplace(s, 'ö', 'oe');
  s := StringReplace(s, 'ä', 'ae');
  s := StringReplace(s, 'Ü', 'Ue');
  s := StringReplace(s, 'Ö', 'Oe');
  s := StringReplace(s, 'Ä', 'Ae');
  s := StringReplace(s, 'ß', 'ss');
  message.text := s;
end;

begin
end.