[xnews] italian accented vowels in subject
Ammammata
2020-04-27 14:29:42 UTC
Sometimes, when there is an accented vowel in the subject, I get an
error message

now I just put one in the subject ( è )

let's try to send...

ERROR
441 Invalid syntax encountered in Subject: header field body (unexpected
byte or empty content line)

Now I remove the letter é and try again to send
--
/-\ /\/\ /\/\ /-\ /\/\ /\/\ /-\ T /-\
-=- -=- -=- -=- -=- -=- -=- -=- - -=-
........... [ al lavoro ] ...........
Ammammata
2020-04-27 14:30:02 UTC
On Mon 27 Apr 2020 04:29:42p, *Ammammata* posted to
Post by Ammammata
Now I remove the letter é and try again to send
sent :/
--
/-\ /\/\ /\/\ /-\ /\/\ /\/\ /-\ T /-\
-=- -=- -=- -=- -=- -=- -=- -=- - -=-
........... [ al lavoro ] ...........
Ammammata
2020-04-27 14:32:24 UTC
Post by Ammammata
On Mon 27 Apr 2020 04:29:42p, *Ammammata* posted to
Post by Ammammata
Now I remove the letter é and try again to send
sent :/
trying again with mesnews, I put again accented letters in subject
--
/-\ /\/\ /\/\ /-\ /\/\ /\/\ /-\ T /-\
Ammammata
2020-04-27 14:33:13 UTC
Post by Ammammata
Post by Ammammata
On Mon 27 Apr 2020 04:29:42p, *Ammammata* posted to
Post by Ammammata
Now I remove the letter é and try again to send
sent :/
trying again with mesnews, I put again accented letters in subject
sent, so it's a xnews problem, ok: any tip?
--
/-\ /\/\ /\/\ /-\ /\/\ /\/\ /-\ T /-\
Grant Taylor
2020-04-27 15:49:21 UTC
Post by Ammammata
sent, so it's a xnews problem, ok: any tip?
Does xnews support encoding non-ASCII characters in Subjects?

That ability is what's required to support éàè and friends.
--
Grant. . . .
unix || die
Michael Bäuerle
2020-04-27 15:57:05 UTC
Post by Ammammata
Post by Ammammata
Post by Ammammata
Post by Ammammata
Now I remove the letter é and try again to send
sent :/
trying again with mesnews, I put again accented letters in subject
sent, so it's a xnews problem, ok: any tip?
Non-ASCII characters in header fields (like "Subject") must be MIME
encoded according to RFC 2047 [1].

Wikipedia says in [2]:
|
| Xnews does not support UTF-8 (or any other character set encoding),
| making it difficult or even impossible to use for reading or posting
| articles in languages other than English. It is, however, possible
| to run Xnews with "Mime-proxy" to at least partially work around this
| issue.

Looks like there is no internal MIME support in Xnews and you need an
external utility for this to work.
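
For illustration only, here is a minimal sketch of what RFC 2047 encoding of
a Subject looks like, using Python's standard email.header module. The
subject text is a hypothetical example built from the vowels in this thread;
it is not what Xnews or Mime-proxy does internally.

```python
from email.header import Header, decode_header, make_header

# Hypothetical subject text containing the accented vowels from this thread.
subject = "italian accented vowels in subject éàè"

# Giving Header() a charset makes encode() emit RFC 2047 encoded-words.
encoded = Header(subject, charset="iso-8859-1", header_name="Subject").encode()
print("Subject: " + encoded)

# The wire form is pure ASCII and decodes back to the original text.
roundtrip = str(make_header(decode_header(encoded)))
print(roundtrip)
```

The encoded form carries the charset name and the bytes as =XX escapes, so
only ASCII goes into the header field itself.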


______________
[1] <https://tools.ietf.org/html/rfc2047>
[2] <https://en.wikipedia.org/wiki/Xnews>
Adam H. Kerman
2020-04-27 16:21:10 UTC
Post by Michael Bäuerle
Post by Ammammata
Post by Ammammata
Post by Ammammata
Post by Ammammata
Now I remove the letter é and try again to send
sent :/
trying again with mesnews, I put again accented letters in subject
sent, so it's a xnews problem, ok: any tip?
Non-ASCII characters in header fields (like "Subject") must be MIME
encoded according to RFC 2047 [1].
Please say "encoded-word" as most people think of MIME as the header
describing the character set used in the article, not other header
lines. Yeah, I know MIME was a set of RFCs.
Post by Michael Bäuerle
|
| Xnews does not support UTF-8 (or any other character set encoding),
| making it difficult or even impossible to use for reading or posting
| articles in languages other than English. It is, however, possible
| to run Xnews with "Mime-proxy" to at least partially work around this
| issue.
Looks like there is no internal MIME support in Xnews and you need an
external utility for this to work.
______________
[1] <https://tools.ietf.org/html/rfc2047>
[2] <https://en.wikipedia.org/wiki/Xnews>
Oh my gawd.

You've got UTF-8 characters in your article, but both your MIME headers
and encoded-word on Subject and From indicate ISO-8859-1 Latin-1.

I'm not using MIME headers in this followup article to avoid
contributing even more confusion.
Adam H. Kerman
2020-04-27 16:02:53 UTC
Post by Ammammata
Post by Ammammata
Post by Ammammata
On Mon 27 Apr 2020 04:29:42p, *Ammammata* posted to
Now I remove the letter [e with accent] and try again to send
sent :/
trying again with mesnews, I put again accented letters in subject
sent, so it's a xnews problem, ok: any tip?
The letter e with accent is a two-byte UTF-8 sequence. You MUST NOT put
such characters on Subject. Header lines like Subject and From require
the use of encoded word to represent such letters.

This is your Subject:

Subject: Re: =?ISO-8859-15?Q?[mesnews]_italian_accented_vowels_in_subject_?=
 =?ISO-8859-15?Q?=E9=E0=E8?=

It's using encoded word DESPITE having nothing to encode. That's just
wrong.

Your MIME headers for the article itself indicate the use of
ISO-8859-15:1999 Latin-9, the 8-bit character set that differs from
Latin-1 by just a few character codes.

But you're using a two-byte sequence for the e with accent and NOT an
8-bit character.

In a Latin-X character set, you MUST NOT use two-byte UTF-8 character
codes.
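
As a hedged aside, the quoted Subject can be decoded to see which bytes it
actually carries. The sketch below uses Python's standard email.header
module on the two encoded-words quoted above (the "Re: " prefix is left out
to keep it short); the variable names are only illustrative.

```python
from email.header import decode_header, make_header

# The two encoded-words from the Subject quoted above.
raw = ("=?ISO-8859-15?Q?[mesnews]_italian_accented_vowels_in_subject_?= "
       "=?ISO-8859-15?Q?=E9=E0=E8?=")

# decode_header() yields (bytes, charset) pairs; adjacent encoded-words
# with the same charset are merged into one pair.
parts = decode_header(raw)
raw_bytes, charset = parts[0]
print(charset, raw_bytes[-3:].hex())

# make_header() applies the declared charset to get readable text.
decoded = str(make_header(parts))
print(decoded)
```

The last three bytes of the payload are E9 E0 E8, one byte per vowel, and
they are interpreted with the charset named inside the encoded-word.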
Michael Bäuerle
2020-04-27 16:52:10 UTC
Post by Adam H. Kerman
Subject: Re: =?ISO-8859-15?Q?[mesnews]_italian_accented_vowels_in_subject_?=
=?ISO-8859-15?Q?=E9=E0=E8?=
It's using encoded word DESPITE having nothing to encode. That's just
wrong.
Obviously it is more complicated than required, but the syntax looks
correct. It would only be "wrong" if RFC 2047 forbids it.
Post by Adam H. Kerman
Your MIME headers for the article itself indicate the use of
ISO-8859-15:1999 Latin-9, the 8-bit character set that differs from
Latin-1 by just a few character codes.
But you're using a two-byte sequence for the e with accent and NOT an
8-bit character.
In a Latin-X character set, you MUST NOT use two-byte UTF-8 character
codes.
Please look at the article <news:***@tiscali.it>
again. I can see no Unicode in this article, neither in the header nor
in the body.

The same for my articles in this thread (that use ISO 8859-1 encoding,
as specified in the MIME labels).
Michael Bäuerle
2020-04-27 16:55:37 UTC
Post by Adam H. Kerman
Subject: Re: =?ISO-8859-15?Q?[mesnews]_italian_accented_vowels_in_subject_?=
=?ISO-8859-15?Q?=E9=E0=E8?=
It's using encoded word DESPITE having nothing to encode. That's just
wrong.
Obviously it is more complicated than required, but the syntax looks
correct. It would only be "wrong" if RFC 2047 forbids it.
Post by Adam H. Kerman
Your MIME headers for the article itself indicate the use of
ISO-8859-15:1999 Latin-9, the 8-bit character set that differs from
Latin-1 by just a few character codes.
But you're using a two-byte sequence for the e with accent and NOT an
8-bit character.
In a Latin-X character set, you MUST NOT use two-byte UTF-8 character
codes.
Please look at the article <news:***@tiscali.it>
again. I can see no Unicode in this article, neither in the header nor
in the body.

The same for my articles in this thread (that use ISO 8859-1 encoding,
as specified in the MIME labels).
Adam H. Kerman
2020-04-27 20:56:11 UTC
Post by Michael Bäuerle
Post by Adam H. Kerman
Subject: Re: =?ISO-8859-15?Q?[mesnews]_italian_accented_vowels_in_subject_?=
=?ISO-8859-15?Q?=E9=E0=E8?=
It's using encoded word DESPITE having nothing to encode. That's just
wrong.
Obviously it is more complicated than required, but the syntax looks
correct. It would only be "wrong" if RFC 2047 forbids it.
RFC 2047 does not control what the best practice is for posting a
conventional News article. I didn't say "nonstandard with respect to RFC
2047". I said it was wrong.

Posting a followup with mismatches between attribution lines and quoting
levels wouldn't be nonstandard with respect to RFC 2047 either.
Nevertheless, it's the wrong thing to do.

Your point is not well taken.
Post by Michael Bäuerle
Post by Adam H. Kerman
Your MIME headers for the article itself indicate the use of
ISO-8859-15:1999 Latin-9, the 8-bit character set that differs from
Latin-1 by just a few character codes.
But you're using a two-byte sequence for the e with accent and NOT an
8-bit character.
In a Latin-X character set, you MUST NOT use two-byte UTF-8 character
codes.
Please look at the article <news:***@tiscali.it>
again. I can see no Unicode in this article, neither in the header nor
in the body.
The same for my articles in this thread (that use ISO 8859-1 encoding,
as specified in the MIME labels).
I'm just going by what the translation in the terminal emulator shows
me. When I'm translating as UTF-8, it doesn't display the desired glyph
for 8-bit character codes.

When I have to change the translation to UTF-8 to display a specific
character, then that tells me that a two-byte Unicode character code has
been used.

I typically have my translation set to ISO-8859-1.

Oddly, the precursor article is no longer on my News server. Did you
cancel it?
Michael Bäuerle
2020-04-28 08:35:11 UTC
Post by Adam H. Kerman
Post by Michael Bäuerle
Post by Adam H. Kerman
[...]
Your MIME headers for the article itself indicate the use of
ISO-8859-15:1999 Latin-9, the 8-bit character set that differs from
Latin-1 by just a few character codes.
But you're using a two-byte sequence for the e with accent and NOT an
8-bit character.
In a Latin-X character set, you MUST NOT use two-byte UTF-8 character
codes.
Please look at the article <news:***@tiscali.it>
again. I can see no Unicode in this article, neither in the header nor
in the body.
The same for my articles in this thread (that use ISO 8859-1 encoding,
as specified in the MIME labels).
I'm just going by what the translation in the terminal emulator shows
me. When I'm translating as UTF-8, it doesn't display the desired glyph
for 8-bit character codes.
No surprise, because there are no UTF-8 sequences.
Post by Adam H. Kerman
When I have to change the translation to UTF-8 to display a specific
character, then that tells me that a two-byte Unicode character code has
been used.
I recommend using a hex viewer/editor on the raw article instead.
Output from hexedit for the relevant body part:
|
| [...]
| 00000624 0A 3E 3E 3E 20 0D 0A 3E 3E 3E 20 4E .>>> ..>>> N
| 00000630 6F 77 20 49 20 72 65 6D 6F 76 65 20 ow I remove
| 0000063C 74 68 65 20 6C 65 74 74 65 72 20 E9 the letter .
|                                            ^^            ^
| 00000648 20 61 6E 64 20 74 72 79 20 61 67 61  and try aga
| [...]

The character at index 647h is the 'é' and is encoded as E9h (not
with a UTF-8 sequence).

The content type was declared in the header as:
|
| Content-Type: text/plain; charset="iso-8859-15"; format=flowed

According to ISO 8859-15 [1] the codepoint E9h means 'é'.
Nothing wrong there with the article.
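
The same check can be reproduced without a hex editor. The following is only
a small sketch in Python (the variable names are made up for the example):
it prints the bytes for 'é' in the three encodings discussed here and tests
whether a lone E9h byte is valid UTF-8.

```python
e_acute = "é"

# One byte in both Latin character sets, two bytes in UTF-8.
latin1 = e_acute.encode("iso-8859-1")
latin9 = e_acute.encode("iso-8859-15")
utf8 = e_acute.encode("utf-8")
print(latin1.hex(), latin9.hex(), utf8.hex())  # e9 e9 c3a9

# A lone E9h byte, as in the hex dump, is not a valid UTF-8 sequence.
try:
    b"the letter \xe9".decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)  # False
```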
Post by Adam H. Kerman
I typically have my translation set to ISO-8859-1.
The codepoint E9h has the same meaning in ISO 8859-1 [2] and you
should nevertheless see 'é'. If not, there is something wrong with
your setup. Maybe your xterm (or whatever you use) runs with a
different locale configuration than the shell inside it.
Post by Adam H. Kerman
Oddly, the precursor article is no longer on my News server. Did you
cancel it?
I have superseded it because of a typo:
|
| Message-ID: <***@WStation5.stz-e.de>
| Supersedes: <AABepw26+***@WStation5.stz-e.de>


_______________
[1] <https://en.wikipedia.org/wiki/ISO/IEC_8859-15#Codepage_layout>
[2] <https://en.wikipedia.org/wiki/ISO/IEC_8859-1#Code_page_layout>
Adam H. Kerman
2020-04-28 15:34:52 UTC
Post by Michael Bäuerle
Post by Adam H. Kerman
I'm just going by what the translation in the terminal emulator shows
me. When I'm translating as UTF-8, it doesn't display the desired glyph
for 8-bit character codes.
No surprise, because there are no UTF-8 sequences.
I was just pointing out in other cases that the terminal emulation does
what's expected.

I will look into this further. If what you're saying is correct,
something didn't translate to the proper glyph on my end.
Michael Bäuerle
2020-04-28 16:25:25 UTC
Post by Adam H. Kerman
Post by Michael Bäuerle
Post by Adam H. Kerman
I'm just going by what the translation in the terminal emulator shows
me. When I'm translating as UTF-8, it doesn't display the desired glyph
for 8-bit character codes.
No surprise, because there are no UTF-8 sequences.
I was just pointing out in other cases that the terminal emulation does
what's expected.
I will look into this further. If what you're saying is correct,
something didn't translate to the proper glyph on my end.
Look at the locale used for the terminal emulation first.

Today I saw the same error while connected from a freshly installed
GNU/Linux to my NetBSD system (that is configured to use ISO 8859-1
locale). The GNU/Linux system uses UTF-8 locale by default and
therefore the terminal expected UTF-8 (but my NetBSD ssh-session
used ISO 8859-1, as configured there).
Adam H. Kerman
2020-04-28 17:36:02 UTC
Post by Michael Bäuerle
Post by Adam H. Kerman
Post by Michael Bäuerle
Post by Adam H. Kerman
I'm just going by what the translation in the terminal emulator shows
me. When I'm translating as UTF-8, it doesn't display the desired glyph
for 8-bit character codes.
No surprise, because there are no UTF-8 sequences.
I was just pointing out in other cases that the terminal emulation does
what's expected.
I will look into this further. If what you're saying is correct,
something didn't translate to the proper glyph on my end.
Look at the locale used for the terminal emulation first.
Today I saw the same error while connected from a freshly installed
GNU/Linux to my NetBSD system (that is configured to use ISO 8859-1
locale). The GNU/Linux system uses UTF-8 locale by default and
therefore the terminal expected UTF-8 (but my NetBSD ssh-session
used ISO 8859-1, as configured there).
Hm. It's UTF-8, but I thought the terminal emulation setting overrode that.

I guess both have to match at all times.
Michael Bäuerle
2020-04-28 18:41:52 UTC
Post by Adam H. Kerman
Post by Michael Bäuerle
Post by Adam H. Kerman
I will look into this further. If what you're saying is correct,
something didn't translate to the proper glyph on my end.
Look at the locale used for the terminal emulation first.
Today I saw the same error while connected from a freshly installed
GNU/Linux to my NetBSD system (that is configured to use ISO 8859-1
locale). The GNU/Linux system uses UTF-8 locale by default and
therefore the terminal expected UTF-8 (but my NetBSD ssh-session
used ISO 8859-1, as configured there).
Hm. It's UTF-8, but I thought the terminal emulation setting overrode that.
I guess both have to match at all times.
Yes. The side that creates the data must use the same encoding as the
receiving side is using for interpretation.

Otherwise the displayed data will be broken or only correct by
coincidence (when some guessing algorithm has made the right choice).
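
A short sketch of that mismatch, assuming the only thing that differs is the
encoding each side believes is in use. The names are hypothetical and the
decode() calls merely stand in for what a terminal does when it interprets
incoming bytes.

```python
# What each kind of sender puts on the wire for 'é'.
data_latin1 = "é".encode("iso-8859-1")  # b'\xe9'
data_utf8 = "é".encode("utf-8")         # b'\xc3\xa9'

# Latin-1 bytes interpreted by a receiver that expects UTF-8:
# the byte is invalid there and becomes the replacement character.
shown_in_utf8_terminal = data_latin1.decode("utf-8", errors="replace")

# UTF-8 bytes interpreted by a receiver that expects Latin-1:
# each of the two bytes is shown as its own character.
shown_in_latin1_terminal = data_utf8.decode("iso-8859-1")

print(repr(shown_in_utf8_terminal))    # '\ufffd'
print(repr(shown_in_latin1_terminal))  # 'Ã©'
```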
Michael Bäuerle
2020-04-27 16:20:56 UTC
Post by Ammammata
Post by Ammammata
Post by Ammammata
Post by Ammammata
Now I remove the letter é and try again to send
sent :/
trying again with mesnews, I put again accented letters in subject
sent, so it's a xnews problem, ok: any tip?
Non-ASCII characters in header fields (like "Subject") must be MIME
encoded according to RFC 2047 [1].

Wikipedia says in [2]:
|
| Xnews does not support UTF-8 (or any other character set encoding),
| making it difficult or even impossible to use for reading or posting
| articles in languages other than English. It is, however, possible
| to run Xnews with "Mime-proxy" to at least partially work around this
| issue.

Looks like there is no internal MIME support in Xnews and you need an
external utility for this to work.

Supersede: Some "Mime-proxy" related stuff in Italian:
<https://digilander.libero.it/xnews/mimeproxy/mime.html>


______________
[1] <https://tools.ietf.org/html/rfc2047>
[2] <https://en.wikipedia.org/wiki/Xnews>
Ammammata
2020-04-28 08:08:26 UTC
On Mon 27 Apr 2020 04:33:13p, *Ammammata* posted to
Post by Ammammata
sent, so it's a xnews problem, ok: any tip?
I installed MIME proxy and put again some accented vowels in the subject
--
/-\ /\/\ /\/\ /-\ /\/\ /\/\ /-\ T /-\
-=- -=- -=- -=- -=- -=- -=- -=- - -=-
........... [ al lavoro ] ...........
Ammammata
2020-04-28 08:09:19 UTC
On Tue 28 Apr 2020 10:08:26a, *Ammammata* posted to
Post by Ammammata
I installed MIME proxy and put again some accented vowels in the subject
well well well, sounds good :)
--
/-\ /\/\ /\/\ /-\ /\/\ /\/\ /-\ T /-\
-=- -=- -=- -=- -=- -=- -=- -=- - -=-
........... [ al lavoro ] ...........
Michael Bäuerle
2020-04-28 08:42:57 UTC
Post by Ammammata
Post by Ammammata
I installed MIME proxy and put again some accented vowels in the subject
well well well, sounds good :)
Yes. Does it work for the body too?

àéàéèàéèàéèàéèàèéàèéàèéàèé
Ammammata
2020-04-28 09:48:38 UTC
On Tue 28 Apr 2020 10:42:57a, *Michael Bäuerle* posted to
Post by Michael Bäuerle
Post by Ammammata
Post by Ammammata
I installed MIME proxy and put again some accented vowels in the subject
well well well, sounds good :)
Yes. Does it work for the body too?
àéàéèàéèàéèàéèàèéàèéàèéàèé
yes, I made a test on az.test
--
/-\ /\/\ /\/\ /-\ /\/\ /\/\ /-\ T /-\
-=- -=- -=- -=- -=- -=- -=- -=- - -=-
........... [ al lavoro ] ...........
Michael Bäuerle
2020-04-28 14:56:58 UTC
Post by Ammammata
On Tue 28 Apr 2020 10:42:57a, *Michael Bäuerle* posted to
Is this 4-line introduction intended?
Post by Ammammata
Post by Michael Bäuerle
Post by Ammammata
Post by Ammammata
I installed MIME proxy and put again some accented vowels in the subject
well well well, sounds good :)
Yes. Does it work for the body too?
àéàéèàéèàéèàéèàèéàèéàèéàèé
yes, I made a test on az.test
Encoding of the body looks good.

Mime-Proxy generated this header field for "Subject":
|
| Subject: Re: [mesnews]_italian_accented_vowels_in_subject
| =?iso-8859-1?Q?=E0=E9=E0=E9=E8=E0=E9=E8=E0=E9=E8=E0=E9=E8=E0=E8=E9=E0=E8=E9=E0=E8=E9=E0=E8=E9?=

The second line is too long, RFC 2047 says in section 2 [1]:
|
| While there is no limit to the length of a multiple-line header
| field, each line of a header field that contains one or more
| 'encoded-word's is limited to 76 characters.

I don't know if Mime-Proxy can be configured to do it as defined.
But it's likely good enough that most newsreaders will understand it.
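
For comparison, this is roughly how a header with the same vowels could be
folded to respect the 76-character limit. It is a sketch using Python's
standard email.header module with an explicit maxlinelen, not a description
of any Mime-Proxy option.

```python
from email.header import Header, decode_header, make_header

# The run of vowels from the test subject above.
vowels = "àéàéèàéèàéèàéèàèéàèéàèéàèé"

# header_name lets the first line account for the "Subject: " prefix;
# maxlinelen caps every line of the folded field.
h = Header(vowels, charset="iso-8859-1", header_name="Subject", maxlinelen=76)
value = h.encode()
folded = "Subject: " + value

for line in folded.splitlines():
    print(len(line), line)

# Folding must not change the decoded text.
roundtrip = str(make_header(decode_header(value)))
```

The text is split into several encoded-words, each on its own line and each
continuation line starting with a space.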


______________
[1] <https://tools.ietf.org/html/rfc2047#section-2>
Ralph Fox
2020-04-27 18:52:13 UTC
Post by Ammammata
sometimes it happens that if there is an accented vowel in the subject I
got an error message
now I just put one in the subject ( è )
lets try to send...
ERROR
441 Invalid syntax encountered in Subject: header field body (unexpected
byte or empty content line)
Now I remove the letter é and try again to send
Ammammata

Your new server news.solani.org is rejecting messages with a 441 error
when the message subject contains non-ASCII characters which are not
encoded.

ASCII is a 7-bit character set and éàè are not 7-bit ASCII characters.

Xnews does not know how to encode non-ASCII characters in the subject.
MesNews is able to do this.

If you use Mime-proxy with Xnews, then Mime-proxy has an option to encode
non-ASCII characters in the subject of outgoing messages.
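
A rough sketch of the kind of check involved, assuming the server simply
looks for bytes outside the 7-bit range in the Subject. The function name
and sample subjects are hypothetical, and this is not the server's actual
code.

```python
def non_ascii_positions(header_value: bytes) -> list:
    """Return the offsets of all bytes above 7Fh in a raw header value."""
    return [i for i, byte in enumerate(header_value) if byte > 0x7F]

# Unencoded: the raw Latin-1 byte for 'é' sits directly in the header.
raw_subject = "vowels in subject é".encode("iso-8859-1")

# Encoded: the same letter as an RFC 2047 encoded-word, pure ASCII.
encoded_subject = b"vowels in subject =?iso-8859-1?Q?=E9?="

print(non_ascii_positions(raw_subject))      # [18]
print(non_ascii_positions(encoded_subject))  # []
```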
--
Kind regards
Ralph
s|b
2020-04-27 19:51:00 UTC
User-Agent: Xnews/??.01.30
Have you tried version 2006.08.24?
--
s|b