Discussion:
Using logical operators in Dialog Filters
Sqwertz
2021-01-18 20:05:30 UTC
Can somebody remind me how to use logical operators in Dialog
filters? Something such as:

!delete From asshole and message-id assnews.com

-sw
VanguardLH
2021-01-18 23:48:47 UTC
Post by Sqwertz
Can somebody remind me how to use logical operators in Dialog
!delete From asshole and message-id assnews.com
-sw
!delete From {<criteria1>} -@Header:{<header>: <notcriteria2>}
+@Header:{<header>: <addcriteria3>}

The +@ means to include matching criteria from another header, and -@
means to exclude matching criteria from another header. You can use a
header name (or use Header: if not one of the pre-defined ones) to
preface the first conditional. If you use a pre-defined header name,
add a space between its name and the criteria inside the braces. For
additional header criteria, you can use the pre-defined header names.
If you're testing on a non-predefined header name, you have to use
+@Header:{^headername: ...}. That is, it's not a standard overview
header, so you have to use Header and in the criteria specify the header
name.

From {joker}
+@From:{joker}
+@Header:{^From: joker}


As I was told, the +@ and -@ syntax are taken from the Hamster NNTP
server's syntax. Alas, you can't use From, Subject, Message-ID, or
other pre-defined header names when using inclusion or exclusion, and
instead have to use +@Header:{headername: criteria}. You need to not
only use +@ or -@ followed by "Header" (to denote the include/exclude is
looking for a header), but also need to add the colon (and no space)
unlike using the pre-defined headers where you give the name and then a
space character. The header name and criteria must be enclosed in
braces. So, for example, you could use +@Header:{^From: ...} instead of
From criteria (where criteria is optionally braced, but I include them
to better delineate the criteria, especially if more headers are added).

To me, I remember +@ as AND and -@ as AND NOT. To do OR'ing, just
define multiple rules, one for each condition. Note that Dialog does 2
passes when filtering. The first pass only exercises those rules that
use the pre-defined headers, like From, Subject, Message-ID, etc. Any
message in the 1st pass gets flagged. For any non-flagged messages, it
then runs through its 2nd pass to test on all the other headers. This
way, it first tests on the overview headers that are always available
when messages are downloaded. On the 2nd pass, it tests on non-overview
headers on the messages not flagged in the 1st pass. If you have rules
that test on both overview and non-overview headers, they have to wait
until the 2nd pass.
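
The AND / AND NOT / OR behaviour described above can be modelled in a
few lines of Python. This is only a sketch of the logic, not Dialog's
code: the Rule class, its field names, and the sample headers are all
invented for illustration, and every criterion is treated as a regex.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical model of one filter line (not Dialog's real parser)."""
    header: str    # header tested by the first condition
    pattern: str   # regex for the first condition
    # Extra clauses as (sign, header, regex): "+" models +@, "-" models -@.
    extra: list = field(default_factory=list)

    def fires(self, headers: dict) -> bool:
        # The first condition must match.
        if not re.search(self.pattern, headers.get(self.header, "")):
            return False
        for sign, name, pat in self.extra:
            hit = re.search(pat, headers.get(name, "")) is not None
            if sign == "+" and not hit:   # +@ behaves like AND
                return False
            if sign == "-" and hit:       # -@ behaves like AND NOT
                return False
        return True


def any_rule_fires(rules, headers) -> bool:
    # OR is modelled by writing several rules and checking each in turn.
    return any(rule.fires(headers) for rule in rules)


# Invented sample headers.
headers = {
    "From": "joker <j@example.invalid>",
    "Message-ID": "<abc@news.example.invalid>",
}
and_rule = Rule("From", r"^joker\s*<",
                [("+", "Message-ID", r"example\.invalid>$")])
and_not_rule = Rule("From", r"^joker\s*<",
                    [("-", "Message-ID", r"example\.invalid>$")])

print(and_rule.fires(headers))                            # True
print(and_not_rule.fires(headers))                        # False
print(any_rule_fires([and_not_rule, and_rule], headers))  # True
```

The sketch ignores the two-pass scheme; it only shows how the clauses
combine once all the headers of a message are available.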

Overview headers (available when message is downloaded):
From
Subject
Date
Message-ID
Newsgroups
Organization
Lines
References
Xref
(those returned by sending LIST OVERVIEW.FMT command to server)
Non-overview headers (only available when download complete message):
Path
Content-Type
NNTP-Posting-Host
X-headers
User-Agent
and lots more (anything not an overview header)

Which headers count as overview headers can differ from one NNTP server
to another. Since I won't know which headers are considered overview
headers by different NNTP servers, I configure Dialog to download the
complete message. That gives me ALL headers and the body, not just
whatever are the overview headers at a particular server.

For your example:

!delete From {^asshole\s*<} +@Message-ID:{<(\S|_)+@((\S|_)+\.)*assnews\.com>$}
or
!delete From {(\W|_)*asshole(\W|_)*<} +@Header:{^Message-ID:\s+<(\S|_)+@((\S|_)+\.)*assnews\.com>$}

Dialog provides a regex tester where you can enter the regex string and a
sample string to see if the regex matches. Unless you are specifying
the header name in the search criteria, don't include it in the test
regex string since you're just testing the regex, not that the string
came from a particular header.

I had to assume the "asshole" string started at column one, and not as a
substring, like "HassleHoleman". If not, you'll have to figure out how
to delimit the "asshole" substring to prevent false positives. For
example, if they use asshole, _/asshole/_, or something that can be
delimited by non-word characters around the asshole substring, I'd
probably use "(\W|_)*asshole(\W|_)\s*<" to look for non-word characters
around the "asshole" substring. I also assumed "asshole" was part of
the poster's name (in the comment field) and not inside the address
field (between the angle brackets); that is, "asshole <***@ddd.com>"
and not "tipsy <***@invalid.com>".

The ^ anchors the string identifying the header name (Message-ID) to
column 1 where all headers must start. Headers are supposed to have one
space to delimit the header name from its value, but I've seen where
more than one space was used, hence the use of \s+. The (\S|_)+ means
to find one, or more, non-space characters after the < character that
starts the MID's value and before the @ character (the left token
cannot be blank or missing), and assnews\.com requires that
"assnews.com" is a domain in the right token of the MID. The period for
.com is escaped (backslashed) to make sure to match on the period
character instead of being interpreted as a wildcard that means any 1
character in that position (which could be something other than just a
period). The angle brackets are included because they are part of the
MID value. The $ anchors the string to the end of the line, so
"assnews.com>" is at the end, not somewhere before the end. Since you
don't mention if just the domain is specified in the right token of the
MID value or maybe a hostname is included, I added the ((\S|_)+\.)* to
allow [sub]hosts to be included, like xxx.assnews.com,
abc.xxx.assnews.com, or just assnews.com.
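
As a quick check outside Dialog, the Message-ID criterion can be run
through Python's re module. This is a sketch under assumptions: the
sample Message-IDs are invented, Python's regex flavour may differ in
details from the engine Dialog uses, and the sub-host group is written
with a trailing * (zero or more repeats) so that a bare assnews.com also
matches, as the explanation above describes.

```python
import re

# The Message-ID criterion from the example, with the sub-host group made
# repeatable (zero or more times) so a bare domain is accepted too.
MID_RULE = re.compile(r"<(\S|_)+@((\S|_)+\.)*assnews\.com>$")

# Invented Message-ID values, for illustration only.
samples = [
    "<abc123@assnews.com>",                   # bare domain
    "<abc123@xxx.assnews.com>",               # one sub-host
    "<abc123@abc.xxx.assnews.com>",           # two sub-hosts
    "<abc123@assnewsXcom>",                   # escaped dot must be a real dot
    "<@assnews.com>",                         # empty left token
    "<abc123@assnews.com.example.invalid>",   # domain is not at the end
]

results = {mid: MID_RULE.search(mid) is not None for mid in samples}
for mid, hit in results.items():
    print(f"{mid:40} {'match' if hit else 'no match'}")
```

The first three samples match and the last three do not. In Python's
flavour \S already matches an underscore, so (\S|_) behaves the same as
a plain \S there.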

The above is off the top of my head. It's what I remember. Helps to
know just exactly what is the header(s) and string values on which you
want to define the filters.

The problem when testing on headers, especially when you have to use the
generic +@Header:{...} syntax, is the search doesn't stop at the end of
the physical line. That's because headers can span multiple physical
lines. The header line has the header name start in column 1, followed
by a colon and space, and then its value. The value can continue on
another line by adding a minimum of 1 leading space character.
Specifying the header name just means what to look at starting in column
1, not when the header ends. The result is the search can continue
looking forward past the matching header, across subsequent headers, and
even into the body of the message. You won't know what is the next
header to add it to the string to make sure the search stops at the next
header. Maybe there's a way in regex to stop the search on the first
line it encounters after the matching header line that has a space in
column 1. The pre-defined headers (From, Subject, etc) seem to know to
look only within a header value even if it spans multiple lines. The
generic +@Header syntax doesn't know when to stop, only when to start.
As a consequence, I've had +@Header:{header: value} keep searching and
finding the substring of value in the body of the message.
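
On the question of whether a regex can stop at the end of a folded
header: in a general-purpose regex engine it can be written as "the
header line, plus any following lines that begin with a space or tab".
The Python sketch below shows that idea. The function name and the
sample message are invented, and nothing in this thread confirms that
Dialog's +@Header:{...} accepts the same construct.

```python
import re


def header_value(raw_message: str, name: str):
    """Return one header's value including its continuation lines, or None."""
    pattern = re.compile(
        r"^" + re.escape(name) + r":[ \t]*"   # header name at column 1
        r"([^\r\n]*"                          # rest of the first line
        r"(?:\r?\n[ \t]+[^\r\n]*)*)",         # lines starting with whitespace
        re.MULTILINE | re.IGNORECASE,
    )
    match = pattern.search(raw_message)
    return match.group(1) if match else None


# Invented sample message with a folded X-Trace header.
raw = (
    "Path: news.example.invalid!not-for-mail\n"
    "X-Trace: first part\n"
    " second part\n"
    "User-Agent: ExampleReader/1.0\n"
    "\n"
    "Body text that mentions second part and ExampleReader.\n"
)

value = header_value(raw, "X-Trace")
print(repr(value))   # 'first part\n second part'
```

The capture ends before User-Agent because that line does not start with
whitespace, and it never reaches the body because the blank separator
line does not either.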

The only way I know of, so far, to limit a header string search is to
specify the (?-s) prefix switch on the criteria string. That says to
only scan one physical line. However, that means it won't scan a
multi-line header's value to the 2nd continuation line, or later.
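
The effect of that switch can be illustrated with Python's re module,
where the equivalent setting is the re.DOTALL flag. This is an analogy
only: it assumes Dialog's default lets "." match a newline (which is
what turning it off with (?-s) implies), and the sample message is
invented. Python's re does not take a bare (?-s) prefix, so the flag is
passed or omitted instead.

```python
import re

# Invented message: the header is harmless, but the body contains "troll".
message = (
    "From: someone <s@example.invalid>\n"
    "X-Note: harmless\n"
    "\n"
    "The body mentions the word troll.\n"
)

pattern = r"^X-Note: .*troll"

# With DOTALL, ".*" runs across line ends, past the header and into the body.
dot_crosses_lines = re.search(pattern, message, re.MULTILINE | re.DOTALL)

# Without DOTALL, ".*" stops at the end of the X-Note line.
dot_stays_on_line = re.search(pattern, message, re.MULTILINE)

print(dot_crosses_lines is not None)   # True  (false positive from the body)
print(dot_stays_on_line is not None)   # False
```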

For example, Altopia allows its users to insert whatever they want as
the string for the injection node (identifying from which server a post
originated). The /rule/ is that those who inject are /supposed/ to
append alt.net to their own string, but they don't force the requirement
by using a scanner to ensure posts submitted to alt.net actually end
with that string in the PATH header. Because Altopia allows its users
to lie regarding what is the injection node, I don't accept any posts
submitted from there or sent through there (because the user can lie by
making a multi-host inject node string that makes it look like the post
was submitted to a prior server).

Message-ID {<\S+@(\S+\.)?alt\.net>} +@Header:{(?-s)^Path: \S+\balt\.net(!?\.?POSTED)?(!not-for-mail)?$}

That looks for Altopia's server adding its own MID header (only if the
poster's client didn't add one), and if the injection node in the PATH
header shows it came from Altopia, but the (?-s) switch ensures the scan
is only in one physical line. For this header, I think PATH can only
span one physical line, so the switch may not be necessary. However,
Dialog can only scan overview headers by default -- unless you configure
it to download complete messages (all headers + body) where all headers
means both overview and non-overview headers; else, Dialog (and other
NNTP clients) can only test on overview headers.
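
The Path part of that rule can be tried against a few sample lines in
Python. This is a sketch with invented Path values; the (?-s) prefix is
left out because Python's re does not accept it in that form.

```python
import re

# Path criterion from the rule above, without the (?-s) prefix.
PATH_RULE = re.compile(
    r"^Path: \S+\balt\.net(!?\.?POSTED)?(!not-for-mail)?$"
)

# Invented Path headers mapped to whether the rule is expected to match.
samples = {
    "Path: news.example.invalid!reader.alt.net!.POSTED!not-for-mail": True,
    "Path: news.example.invalid!alt.net!not-for-mail": True,
    "Path: news.example.invalid!other.example!not-for-mail": False,
    "Path: alt.net!news.example.invalid!not-for-mail": False,
}

outcome = {line: PATH_RULE.search(line) is not None for line in samples}
for line, hit in outcome.items():
    print(f"{'match   ' if hit else 'no match'}  {line}")
```

The last sample shows that alt.net has to be the final hop before the
optional POSTED / not-for-mail suffix for the rule to match. In Python,
the dot-all setting only changes what "." matches, and this pattern has
no unescaped ".", so omitting the switch does not change the result
here.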

On overview headers, the NNTP clients seem to know they should continue
a search past the matching header line for its continuation lines, and
then stop when the next header starts (non-blank string starting in
column 1 and followed by a colon and space). For non-overview headers,
especially when having to use +@Header:{...} or -@Header:{...}, Dialog
(and maybe others) don't know to only include continuation lines, and
NOT keep scanning into further headers or into the body. For a rule of
"From {criteria}", the scan ends at the last continuation line, if any.
For a rule of +@Header:{^From: criteria}, the scan starts at the From
header line, and proceeds through every header thereafter and into the
body of the message; i.e., it doesn't know when to stop, which is why I
sometimes have to use the (?-s) switch (aka modifier) before the regex criteria.
I don't know of a way to stop the generic header scan when the next (and
unknown) header starts, or the blank line (just a newline) is
encountered that delimits the header and body sections.

Regex is one of those languages that keeps burgeoning every time you
want to learn some more. When you think you can't do something, someone
more expert chuckles and might show you how. So far, I've used (?-s) to
just make sure the scan is from the start of the matching header line
(or start of its value string) to the end of the line, and not go
further because it won't stop at the logical end of the header.

As I recall, there used to be a poster named BernD, or similar, that was
far more expert on how Dialog works, including how to write the Pascal
code for the scripting functions. While I got some scripts from an
online library, and adapted some to my own use, the context or scope of
those scripts is not defined anywhere, so they can use variables that
are defined outside the script, which you would have no means of
discovering without digging into the Delphi code for the program. I
think it was him that told me the syntax for Dialog's rules is similar
to what the
Hamster server uses. Its site was a subpage under http://home.arcor.de,
but it's gone, and I don't know if there is an active site for
supporting the Hamster program. Wikipedia says the home site is at:

http://www.tglsoft.de/freeware_hamster.html

You'll need to run it through Google Translate since the site is written
in the author's native language of German. I didn't see a link to
documentation for Hamster. Maybe it's lurking elsewhere.
Frank Slootweg
2021-01-19 13:56:54 UTC
VanguardLH <***@nguard.lh> wrote:
[...]
Post by VanguardLH
As I recall, there used to be a poster named BernD, or similar, that was
far more expert on how Dialog works, including how to write the Pascal
code for the scripting functions. While I got some scripts from an
online library, and adapted some to my own use, the context or scope of
those scripts is not defined anywhere, so they can use variables that
defined outside the script that you would have no means of discovering
without digging into the Delphi code for the program. I think it was
him that told me the syntax for Dialog's rules are similar to those the
Hamster server uses. Its site was a subpage under http://home.arcor.de,
but it's gone, and I don't know if there is an active site for
http://www.tglsoft.de/freeware_hamster.html
You'll need to run it through Google Translate since the site is written
in the author's native language of German. I didn't see a link to
documentation for Hamster. Maybe it's lurking elsewhere.
The downloads on that page are .zip files, which include the English
Help file Hamster_en.hlp. (Guess which local personal newsserver I'm
running!? :-))

Of course you'll need the Windows Help program to use such a file and
- dependent on your Windows version and (32/64) bitness - that may be
'difficult' or impossible [1].

For my 64-bit Windows 8.1 system, I used this page to get the correct
version of WinHlp32.exe:

'Error opening Help in Windows-based programs: "Feature not included" or
"Help not supported"'

<https://support.microsoft.com/en-us/help/917607/feature-not-included-help-not-supported-error-opening-help-windows>

[1] N.B. The webpage says "The Windows Help program is not supported in
Windows 10". It does not say it does not *work* on Windows 10.
VanguardLH
2021-01-19 17:50:10 UTC
Post by Frank Slootweg
Post by VanguardLH
As I recall, there used to be a poster named BernD, or similar, that
was far more expert on how Dialog works, including how to write the
Pascal code for the scripting functions. ...
I think it was him that told me the syntax for Dialog's rules are
similar to those the Hamster server uses. Its site was a subpage
under http://home.arcor.de, but it's gone, and I don't know if there
is an active site for supporting the Hamster program. Wikipedia
http://www.tglsoft.de/freeware_hamster.html
... I didn't see a link to documentation for Hamster. Maybe it's
lurking elsewhere.
The downloads on that page are .zip files, which include the English
Help file Hamster_en.hlp. (Guess which local personal newsserver I'm
running!? :-))
Of course you'll need the Windows Help program to use such a file and
- dependent on your Windows version and (32/64) bitness - that maybe
'difficult' or impossible [1].
For my 64-bit Windows 8.1 system, I used this page to get the correct
'Error opening Help in Windows-based programs: "Feature not included" or
"Help not supported"'
<https://support.microsoft.com/en-us/help/917607/feature-not-included-help-not-supported-error-opening-help-windows>
[1] N.B. The webpage says "The Windows Help program is not supported in
Windows 10". It does not say it does not *work* on Windows 10.
I'll let the OP install an old version of the .hlp viewer if he wants to
check what Hamster uses for multiple header criteria in a rule. The:

+@headername {stringcriteria}
+@Header:{^headername: stringcriteria}

and their -@ negatives have worked for me which, I think, was
described by BernD who seems very familiar with the innards of Dialog.

In my Sent folder in Dialog, I searched on:
(?-s)^bern.* wrote:$
hoping I had him in an attribution line in a reply of mine. Looks like
it was Bernd Rose (*).

(*) Don't know if that's his real name (Bernhard aka Bernard) or a
pseudonym taken from https://en.wikipedia.org/wiki/Rose_Bernd.
Frank Slootweg
2021-01-19 19:11:36 UTC
VanguardLH <***@nguard.lh> wrote:
[...]
Post by VanguardLH
described by BernD who seems very intimate on the innards of Dialog.
(?-s)^bern.* wrote:$
hoping I had him in an attribution line in a reply of mine. Looks like
it was Bernd Rose (*).
(*) Don't know if that's his real name (Bernhard aka Bernard) or a
psuedonym taken from https://en.wikipedia.org/wiki/Rose_Bernd.
It's probably his real name, as he uses 'b.rose' in his e-mail
address. Don't we all post with our real names!? :-)

FWIW, he posted in this group as recently as November 2.
VanguardLH
2021-01-19 21:02:37 UTC
Post by Frank Slootweg
Post by VanguardLH
(?-s)^bern.* wrote:$
hoping I had him in an attribution line in a reply of mine. Looks like
it was Bernd Rose (*).
(*) Don't know if that's his real name (Bernhard aka Bernard) or a
psuedonym taken from https://en.wikipedia.org/wiki/Rose_Bernd.
It's probably him real name, as he user 'b.rose' in his e-mail
address. Don't we all post with our real names!? :-)
FWIW, he posted in this group as recently as November 2.
I'm hoping that me mentioning him will prod him to respond in this
thread. Maybe the OP will have to start a new thread titled "Ping Bernd
Rose: How ..." to get his attention.

I found a Google Groups copy of a thread where he participated at:

https://groups.google.com/g/news.software.readers/c/VqReWvG4LA8/m/kD-fDAjsAQAJ
(Nov 2, 2020)

Alas, Google is getting an even bigger asshole in destroying their
Usenet archive by eliminating the "Show original message" option, so I
cannot see what is his e-mail address (well, what he shows in the
comment field in his From header). It's not an expiration thing. I
just looked at a Google Groups copy of a post I submitted all of 14
minutes ago, and the "3dot -> Show original message" menu entry is
disabled. Guess I'll have to up the retention (purging of old messages)
in my Dialog client from 60 days to something much longer, like to 365.
Sn!pe
2021-01-19 22:24:53 UTC
Post by VanguardLH
Post by Frank Slootweg
Post by VanguardLH
(?-s)^bern.* wrote:$
hoping I had him in an attribution line in a reply of mine. Looks like
it was Bernd Rose (*).
(*) Don't know if that's his real name (Bernhard aka Bernard) or a
psuedonym taken from https://en.wikipedia.org/wiki/Rose_Bernd.
It's probably him real name, as he user 'b.rose' in his e-mail
address. Don't we all post with our real names!? :-)
FWIW, he posted in this group as recently as November 2.
I'm hoping that me mentioning him will prod him to respond in this
thread. Maybe the OP will have to start a new thread titled "Ping Bernd
Rose: How ..." to get his attention.
https://groups.google.com/g/news.software.readers/c/VqReWvG4LA8/m/kD-fDAjsAQAJ
(Nov 2, 2020)
Alas, Google is getting an even bigger asshole in destroying their
Usenet archive by eliminating the "Show original message" option, so I
cannot see what is his e-mail address (well, what he shows in the
comment field in his From header). It's not an expiration thing. I
just looked at a Google Groups copy of a post I submitted all of 14
minutes ago, and the "3dot -> Show original message" menu entry is
disabled. Guess I'll have to up the retention (purging of old messages)
in my Dialog client from 60 days to something much longer, like to 365.
PMFJI

This page from Howard Knight's Usenet Lookup service shows the last
article that I have here, dated Date: Mon, 2 Nov 2020 18:36:40 +0100
it is Message-ID: <1lw7rf5ay1imh$***@b.rose.tmpbox.news.arcor.de>
and shows BR's From: address. I would post that address myself, except
I feel that might be bad form.

Here is the Link to the article in the lookup service:
<http://al.howardknight.net/?ID=161109466800>

HTH
--
^Ï^


My pet rock Gordon just is.
Bernd Rose
2021-01-20 05:05:49 UTC
Post by VanguardLH
Post by Frank Slootweg
Post by VanguardLH
(?-s)^bern.* wrote:$
hoping I had him in an attribution line in a reply of mine. Looks like
it was Bernd Rose (*).
(*) Don't know if that's his real name (Bernhard aka Bernard) or a
psuedonym taken from https://en.wikipedia.org/wiki/Rose_Bernd.
It's probably him real name, as he user 'b.rose' in his e-mail
address. Don't we all post with our real names!? :-)
Real name. Bernd is a common short form of Bernhard in Germany, which
isn't used as a nickname anymore, but as a perfectly normal first name.
- And quite often with people of surname Rose, as I encountered meeting
quite a few name twins over the years. I don't think Hauptmann's play
Rose Bernd is the sole reason for this. With a short surname, people
usually
look for a short first name, as well. - And Bernd has been in the Top30
(sometimes in the Top10) of male first names in Germany after WW2 till
about the mid-seventies. Afterwards, it became a bit less often used.
Post by VanguardLH
Post by Frank Slootweg
FWIW, he posted in this group as recently as November 2.
I'm hoping that me mentioning him will prod him to respond in this
thread.
Read this thread until now and saw no reason to pop in. As long as
Sqwertz doesn't need further clarification, what you wrote should be
more than enough of a pointer to solve the OP's question.

Bernd
Frank Slootweg
2021-01-20 11:23:49 UTC
Post by Bernd Rose
Post by Frank Slootweg
Post by VanguardLH
(?-s)^bern.* wrote:$
hoping I had him in an attribution line in a reply of mine. Looks like
it was Bernd Rose (*).
(*) Don't know if that's his real name (Bernhard aka Bernard) or a
psuedonym taken from https://en.wikipedia.org/wiki/Rose_Bernd.
It's probably him real name, as he user 'b.rose' in his e-mail
address. Don't we all post with our real names!? :-)
Real name. Bernd is a common shortcut of Bernhard in Germany, wich isn't
used as nickname, anymore, but as a perfectly normal first name. - And
quite often with people of surname Rose, as I encountered meeting quite
a few name twins over the years. I don't think, Hauptmanns play Rose
Bernd is the sole reason for this. With a short surname, people usually
look for a short first name, as well. - And Bernd has been in the Top30
(sometimes in the Top10) of male first names in Germany after WW2 till
about the mid-seventies. Afterwards, it became a bit less often used.
Yeah, for me - a Dutchie - 'Bernd Rose' sounded/sounds quite normal,
especially your first name. Have met/communicated_with my fair share of
Bernds over the years! :-)

[...]
VanguardLH
2021-01-20 18:31:23 UTC
Post by Bernd Rose
Read this thread until now and saw no reason to pop in. As long as
Squertz doesn't need further clarification, what you wrote should be
more than enough pointer to solve the OP's question.
I was pondering if what I wrote about the +@ and -@ was correct. I
know how I use them, but going by effect might not be an accurate
description of how they're supposed to work.

I have run across where a multiple header rule doesn't fire, everything
looked correct, testing on individual header clauses in the rule would
work in the regex tester on the header string, so I had to come up with
different rule(s) to effect what I was trying to accomplish within one
rule. Mostly I try to stick to just one additional header, like:

From {...} -@Message-ID:{....}

because sometimes I have used 3, or more, headers in a rule, like:

From {...} -@Message-ID:{...} +Header:{^Path: ...}

and the rule doesn't fire. Seems if it gets too complicated then the
rule gets missed. Or maybe it's the 2-pass scheme when exercising
rules, and I'm mixing overview and non-overview rules in the same rule.
I've not narrowed down why a 3+ multiple-header rule doesn't fire
despite each header clause within the rule working by itself.
Bernd Rose
2021-01-21 16:51:42 UTC
Yes. The +@ can be considered as "AND" and -@ as "AND NOT". To get "OR",
several separate scoring lines have to be written.
Post by VanguardLH
I have run across where a multiple header rule doesn't fire, everything
looked correct, testing on individual header clauses in the rule would
work in the regex tester on the header string, so I had to come up with
different rule(s) to effect what I was trying to accomplish within one
and the rule doesn't fire. Seems if it gets too complicated then the
rule gets missed. Or maybe it's the 2-pass scheme when exercising
rules, and I'm mixing overview and non-overview rules in the same rule.
I've not narrowed down why a 3+ multiple-header rule doesn't fire
despite in header clause with the rule works by itself.
You missed the "@" char between "+" and "Header". In general, multiple
conditions /can/ be defined. As an example, the following works in this
group:

!setcolor(green;yellow) From "Bernd Rose" -@Message-ID: {b\.rose[^ ]*\.(uk|com)} +@Header: "User-Agent: 40tude_Dialog"

Too complicated rules /are/ prone to error, though. ;-)

HTH.
Bernd
VanguardLH
2021-01-22 03:25:33 UTC
Post by Bernd Rose
several separate scoring lines have to be written.
Post by VanguardLH
I have run across where a multiple header rule doesn't fire, everything
looked correct, testing on individual header clauses in the rule would
work in the regex tester on the header string, so I had to come up with
different rule(s) to effect what I was trying to accomplish within one
and the rule doesn't fire. Seems if it gets too complicated then the
rule gets missed. Or maybe it's the 2-pass scheme when exercising
rules, and I'm mixing overview and non-overview rules in the same rule.
I've not narrowed down why a 3+ multiple-header rule doesn't fire
despite in header clause with the rule works by itself.
That was a typo.
Post by Bernd Rose
In general, multiple
conditions /can/ be defined. As an example, the following works in this
Too complicated rules /are/ prone to error, though. ;-)
I thought the +@ (or -@) syntax required no space between +@<hdrname>:
or +@Header: and the following curly brace. I also see you sometimes
don't delineate the regex criteria, or used double-quote instead of
curly braces. The parsing syntax is indeed, um, abundant.

Yeah, sometimes the regex or clauses get a bit overwhelming in trying to
define a rule that eliminates false positives, and only fires on the posts
or posters that I intend to flag.

I also found out there is a max line length limit, but found it by
accident. A rule might get so long that the parser won't handle it.
I've had to split a rule across multiple rules. Not sure what is the
maximum line length for a rule. Currently the longest rule I have is
411 characters long, but I've tried longer, found they didn't fire,
sliced them up into multiple rules, and those work. That one was for a
troll that nymshifts on nearly every submission, but there was still
enough unique in his headers to identify the troll (but a lot were on
non-overview headers).

Because of false positives when using pattern matching via regex, I
don't delete "bad" messages. I flag them as Ignored, and use a default
view of "Hide Ignored Messages". I also configured Dialog to hide any
replies (subthreads) to the ignore-flagged messages since I don't want
to see replies to "bad" messages. Occasionally I use "Show All Message"
to check if some messages should not have gotten flagged, or someone
makes reference to an otherwise hidden post where I need to see what was
said in the hidden subthread.
Bernd Rose
2021-01-23 08:58:54 UTC
The rule is: No space between @ and the header name (including the
meta-header "header").
I also see you sometimes don't delineate the regex criteria, or used
double-quote instead of curly braces.
Only text inside curly braces is a regular expression. The text between
double-quotes is a pure text search. (No need to use RegEx for a simple
search.) ;-)
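
The difference between the two delimiters can be sketched in Python.
This is a hypothetical model, not Dialog's parser: the function name is
invented, the sample strings are made up, and the plain-text search is
assumed to be case-sensitive, which the thread does not state.

```python
import re


def criterion_matches(criterion: str, text: str) -> bool:
    """Model of a criterion: {...} is a regex, "..." is plain text."""
    criterion = criterion.strip()
    if criterion.startswith("{") and criterion.endswith("}"):
        # Curly braces: treat the content as a regular expression.
        return re.search(criterion[1:-1], text) is not None
    if criterion.startswith('"') and criterion.endswith('"'):
        # Double quotes: plain substring search (case-sensitive here,
        # which is an assumption).
        return criterion[1:-1] in text
    raise ValueError("criterion must be wrapped in {} or double quotes")


# Invented sample header line.
user_agent = "User-Agent: 40tude_Dialog/x.y"

# Plain text: found as a literal substring.
print(criterion_matches('"User-Agent: 40tude_Dialog"', user_agent))  # True

# The same characters mean different things in the two forms:
print(criterion_matches('"a.c"', "abc"))   # False, "." is a literal dot
print(criterion_matches('{a.c}', "abc"))   # True, "." matches any character
```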
I also found out there is a max line length limit
This is most likely the case. Can't tell the limit without testing, though.
Because of false positives when using pattern matching via regex, I
don't delete "bad" messages. I flag them as Ignored, and use a default
view of "Hide Ignored Messages". I also configured Dialog to hide any
replies (subthreads) to the ignore-flagged messages since I don't want
to see replies to "bad" messages. Occasionally I use "Show All Message"
to check if some messages should not have gotten flagged, or someone
makes reference to an otherwise hidden post where I need to see what was
said in the hidden subthread.
Reasonable approach.

Bernd