Re: [AMaViS-user] ERROR: invalid byte sequence for encoding "UTF8"
Valentin,
> We use amavisd-new 2.5.3 with white/blacklists stored in a PostgreSQL
> (UTF-8) database. Mails with a From/To containing German umlauts in latin1
> will hang in the postfix queue forever. The mail logs contain a lot of the
> following errors:
>
> ... relay=127.0.0.1[127.0.0.1]:10024, conn_use=2, delay=8955,
> delays=8955/0/0/0.14, dsn=4.5.0, status=deferred (host 127.0.0.1[127.0.0.1]
> said: 451-4.5.0 Error in processing, id=22244-16-2, spam-wb-list FAILED:
> sql exec: err=7, 22021, DBD::Pg::st execute failed: ERROR: invalid byte
> sequence for encoding "UTF8": 0xdf65
>
> Is there a way to set the postgres "client_encoding" in amavisd-new?
> Or if it is a bug: Is there a workaround for this problem?

I posted a fix to a postfix list about a week ago, please see:
http://marc.info/?t=120622960700003&r=1&w=2

It requires changing the data type of users.email and mailaddr.email to 'bytea', and a corresponding patch to amavisd. Although the solution is general (regardless of SQL software), it applies mainly to PostgreSQL, as MySQL does not check character validity (nor does SQLite). This will go into 2.6.0.

> I noticed in the amavisd-new-2.5.3 release notes the following point:
> - sanitize 8-bit characters in In-Reply-To and References header fields
>   before using them in Pen Pals SQL lookups to avoid UTF-8 errors like:
>   penpals_check FAILED: sql exec: err=7, 22021, DBD::Pg::st execute
>   failed: ERROR: invalid byte sequence for encoding "UTF8": 0xd864
> Should this probably also be fixed for spam-wb-list lookups?

It is related, but in the case of SQL logging (and pen pals) I chose a less intrusive but not quite accurate fix, while I believe it is worth doing it right for lookups and white/blacklisting.

With 2.6.0 you'll have a choice, for the r/w SQL tables (maddr.email, i.e. what is used by pen pals and SQL logging/quarantining), between a binary and a character string data type, probably as follows:

  $sql_allow_8bit_address = 0;  # maddr.email: 0 => VARCHAR,
                                #              1 => VARBINARY/BYTEA

I'm not sure whether it would be better to start assuming the envelope addresses are UTF-8 (draft-ietf-eai-smtpext-11), and still choke in some way on invalid UTF-8 byte sequences, or just use a string-of-bytes data type (no associated character set). I have yet to see a draft-ietf-eai-smtpext address in practice - most 8-bit envelope addresses nowadays just inappropriately assume latin-1 or a similar single-byte encoding.

Mark

_______________________________________________
AMaViS-user mailing list
AMaViS-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/...fo/amavis-user
AMaViS-FAQ:http://www.amavis.org/amavis-faq.php3
AMaViS-HowTos:http://www.amavis.org/howto/
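To make the failure concrete, here is a small Python sketch. It is an illustration only: the helper names first_invalid_sequence and sanitize_8bit are hypothetical and not part of amavisd-new. It assumes the envelope address arrives as latin-1 bytes. In latin-1, 'ß' is 0xDF, which in UTF-8 is a lead byte that must be followed by a continuation byte (0x80-0xBF); the following 'e' (0x65) breaks that rule, which is consistent with the 0xdf65 in the log above. The sketch also contrasts a lossy VARCHAR-style sanitization (replacing every 8-bit byte with '?', an assumed scheme, not necessarily what 2.5.3 does) with keeping the raw octets, as a bytea column would.

```python
"""Sketch: why a latin-1 envelope address fails in a UTF-8 PostgreSQL column.
All helper names are hypothetical, not part of amavisd-new."""
from typing import Optional


def first_invalid_sequence(raw: bytes) -> Optional[str]:
    """Return the hex of the first invalid UTF-8 sequence (the bad lead byte
    plus the byte after it, roughly like PostgreSQL's error message),
    or None if raw is valid UTF-8."""
    try:
        raw.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        return raw[err.start:err.start + 2].hex()


def sanitize_8bit(raw: bytes) -> str:
    """VARCHAR-style option (assumed scheme): replace every non-ASCII byte
    with '?'. The result is always valid UTF-8, but information is lost."""
    return "".join(chr(b) if b < 0x80 else "?" for b in raw)


if __name__ == "__main__":
    # latin-1 'ß' (0xDF) followed by 'e' (0x65): the bytes from the log.
    latin1_addr = "große@example.de".encode("latin-1")
    utf8_addr = "große@example.de".encode("utf-8")
    print(first_invalid_sequence(latin1_addr))  # df65
    print(first_invalid_sequence(utf8_addr))    # None

    # Lossy: two distinct latin-1 addresses collapse to the same string.
    a = "müller@example.de".encode("latin-1")
    b = "möller@example.de".encode("latin-1")
    print(sanitize_8bit(a) == sanitize_8bit(b))  # True

    # bytea-style: the raw octets are kept as-is and compared exactly.
    print(a == b)  # False
```

One plausible reading of "not quite accurate" is visible in the last two steps: after 8-bit bytes are replaced, distinct addresses can match the same white/blacklist entry, whereas a byte-string column compares the original octets exactly.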