Re: [AMaViS-user] ERROR: invalid byte sequence for encoding "UTF8"

03-31-2008
Mark Martinec
 

Valentin,

> We use amavisd-new 2.5.3 with white/blacklists stored in a PostgreSQL
> (UTF-8) database. Mails with a From/To containing German umlauts in Latin-1
> hang in the Postfix queue forever. The mail logs contain many of the
> following errors:
>
> ... relay=127.0.0.1[127.0.0.1]:10024, conn_use=2, delay=8955,
> delays=8955/0/0/0.14, dsn=4.5.0, status=deferred (host 127.0.0.1[127.0.0.1]
> said: 451-4.5.0 Error in processing, id=22244-16-2, spam-wb-list FAILED:
> sql exec: err=7, 22021, DBD::Pg::st execute failed: ERROR: invalid byte
> sequence for encoding "UTF8": 0xdf65


> Is there a way to set the postgres "client_encoding" in amavisd-new?
> Or if it is a bug: Is there a workaround for this problem?


I posted a fix to a Postfix list about a week ago; please see:

http://marc.info/?t=120622960700003&r=1&w=2

It requires changing the data type of users.email and mailaddr.email
to 'bytea', and a corresponding patch to amavisd. Although the
solution is general (regardless of the SQL software), it applies mainly
to PostgreSQL, as MySQL does not check character validity
(nor does SQLite). This will go into 2.6.0.
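
To illustrate why a database that validates characters rejects the bytes
from the error message, here is a minimal Python sketch (not amavisd code).
The helper name is_valid_utf8 is made up for this example, and reading
0xdf65 as Latin-1 text is an assumption based on the report mentioning
German umlauts in Latin-1.

```python
# The two bytes PostgreSQL reported in the error message: 0xdf 0x65.
raw = bytes([0xDF, 0x65])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xDF opens a two-byte UTF-8 sequence, so the next byte would have to be
# in the range 0x80..0xBF; 0x65 (ASCII "e") is not, hence the rejection.
utf8_ok = is_valid_utf8(raw)

# Read as Latin-1 (assumption), the same bytes are "ß" followed by "e".
as_latin1 = raw.decode("latin-1")

print(utf8_ok, as_latin1)
```

A column with no associated character set, such as bytea, stores these two
bytes unchanged, so no such validity check applies.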

> I noticed in the amavisd-new-2.5.3 release notes the following point:
> - sanitize 8-bit characters in In-Reply-To and References header fields
> before using them in Pen Pals SQL lookups to avoid UTF-8 errors like:
> penpals_check FAILED: sql exec: err=7, 22021, DBD::Pg::st execute
> failed: ERROR: invalid byte sequence for encoding "UTF8": 0xd864
> Should this probably also be fixed for spam-wb-list lookups?


It is related, but in the case of SQL logging (and pen pals) I chose
a less intrusive but not quite accurate fix, while I believe it
is worth doing it right for lookups and white/blacklisting.

With 2.6.0 you'll be able to choose, for the r/w SQL tables (maddr.email,
i.e. what is used by pen pals and SQL logging/quarantining), between
a binary and a character string data type, probably as follows:

$sql_allow_8bit_address = 0;  # maddr.email: 0 => VARCHAR,
                              #              1 => VARBINARY/BYTEA

I'm not sure whether it would be better to start assuming that
the envelope addresses are UTF-8 (draft-ietf-eai-smtpext-11),
and still choke in some way on invalid UTF-8 byte sequences,
or just use a string-of-bytes data type (no associated character set).
I have yet to see a case of a draft-ietf-eai-smtpext address in
practice - most 8-bit envelope addresses nowadays just inappropriately
assume Latin-1 or a similar single-byte encoding.
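
The trade-off between the two options can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not how
amavisd handles addresses: classify_address and the sample addresses are
hypothetical.

```python
def classify_address(raw: bytes) -> str:
    """Classify a raw envelope address (hypothetical helper, not amavisd code).

    Returns "ascii" for plain 7-bit addresses, "utf8" for 8-bit addresses
    that are valid UTF-8, and "bytes" for anything else.
    """
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf8"
    except UnicodeDecodeError:
        # Only a string-of-bytes column can hold this value unchanged.
        return "bytes"


# Made-up sample addresses: plain ASCII, UTF-8 encoded, Latin-1 encoded.
samples = [
    b"user@example.com",
    "m\u00fcller@example.com".encode("utf-8"),
    "m\u00fcller@example.com".encode("latin-1"),
]
kinds = [classify_address(s) for s in samples]
print(kinds)
```

A character column declared as UTF-8 can accept only the first two kinds,
whereas a binary column accepts all three at the cost of carrying no
character set information.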

Mark
