01-08-2008
Martin Gregorie
Re: How to archive mails relayed by postfix?

Ray.SWC@gmail.com wrote:
> The architecture I would like to accomplish may sound silly, but it's
> what I would like to have, as follows. Basically the idea is to have
> Postfix as a mail gateway for anti-virus, anti-spam AND also mail
> archiver for the existing Exchange server on the dumb Windows.
>

You can do all that, though there is one snag that will need a workaround.

> Simple view:
> External -> Postfix -> Exchange
>
> Detail view:
> External
>    |
> Postfix smtpd
>    |
> Amavisd-new
>    \___ ClamAV and SpamAssassin
>    /
>    |
> Postfix qmgr ----- Postfix local
>    |
> Postfix relay
>    |
> Exchange
>
> That means I would like to have two copies of each mail: When a mail
> is received by the Postfix, it is scanned through Amavis. Then, the
> mail would "tee" into two copies and then deliver to both local and
> also relay to the Exchange server.
>

Simple.

Use Postfix's "always_bcc" parameter to send a copy of every message to a
dedicated archive mailbox. You'll also need some mechanism to process the
mail once it arrives in that mailbox.

I'm currently using procmail and a self-developed shell script to store
mail in a set of mbox files in a directory structure, archive/yyyy/mbox,
where yyyy is the year when the mail was sent. Mail will be discarded if
the mbox file hits the size defined by "mailbox_size_limit", so the
script monitors the mailbox size and renames the file when it approaches
the maximum size. As a result, a set of files (mbox, mbox.1, mbox.2, ...)
builds up in each year directory.
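For illustration only, here is a minimal Python sketch of that yearly mbox scheme (it is not the procmail and shell script combination described above). It takes the year from the Date: header, falls back to the current year if the header is missing or unparseable, and renames mbox to the next free mbox.N before the file gets close to the limit. The 51200000-byte figure is Postfix's default mailbox_size_limit; the 1 MB headroom and the function names archive, rotate_if_needed and year_of are assumptions.

```python
import email
import email.utils
import mailbox
import os
import time
from email import policy

# Postfix's default mailbox_size_limit is 51200000 bytes; check your own value
# with `postconf mailbox_size_limit`. HEADROOM is an assumed safety margin.
SIZE_LIMIT = 51_200_000
HEADROOM = 1_000_000


def year_of(msg) -> int:
    """Year the mail was sent, from the Date: header (current year as fallback)."""
    try:
        return email.utils.parsedate_to_datetime(msg["Date"]).year
    except (TypeError, ValueError, IndexError):
        return time.gmtime().tm_year


def rotate_if_needed(path: str, incoming: int, limit: int, headroom: int) -> None:
    """Rename mbox to the next free mbox.N if `incoming` more bytes would get too close to `limit`."""
    if os.path.exists(path) and os.path.getsize(path) + incoming > limit - headroom:
        n = 1
        while os.path.exists(f"{path}.{n}"):
            n += 1
        os.rename(path, f"{path}.{n}")


def archive(raw: bytes, root: str = "archive",
            limit: int = SIZE_LIMIT, headroom: int = HEADROOM) -> str:
    """Append one raw message to root/yyyy/mbox and return the mbox path."""
    msg = email.message_from_bytes(raw, policy=policy.compat32)
    year_dir = os.path.join(root, str(year_of(msg)))
    os.makedirs(year_dir, exist_ok=True)
    path = os.path.join(year_dir, "mbox")
    rotate_if_needed(path, len(raw), limit, headroom)
    box = mailbox.mbox(path)  # created on first use
    box.lock()                # standard mbox locking while we append
    try:
        box.add(msg)          # writes the From_ separator line for us
        box.flush()
    finally:
        box.unlock()
        box.close()
    return path
```

A procmail recipe (or any delivery command) would pipe the raw message into a small wrapper that calls archive(sys.stdin.buffer.read()). Letting the mailbox module write the mbox format avoids hand-rolling the From_ line and "From " escaping in a shell script.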

I've recently written a database-based archiving system which has just
been loaded with the last three years' worth of archived mail and should
be in full-time use by the end of this week after a minor tweak or
two. It indexes the mail and allows searches on any combination of
address, subject, date range and (as a last resort) a text search of the
message's plain-text part. It should be portable, since it's written in
Java and is fairly database-independent. I'm using PostgreSQL as the
database, but anything with a JDBC driver that has a sequence generator
and can handle CLOB fields should work, so I think Derby and MySQL
would be OK too.
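The Java archiver itself isn't shown here, but the indexing and search idea can be sketched in a few lines of Python, with SQLite standing in for PostgreSQL. This is an assumption-laden illustration: the single message table, its columns, and the store and search functions are hypothetical. It indexes sender, recipients, subject, date and the plain-text part, then builds a query from whichever criteria the user supplies.

```python
import email
import sqlite3
from datetime import timezone
from email import policy

# INTEGER PRIMARY KEY stands in for the sequence generator and TEXT for the
# CLOB column that PostgreSQL would use.
SCHEMA = """
CREATE TABLE IF NOT EXISTS message (
    id         INTEGER PRIMARY KEY,
    sender     TEXT,
    recipients TEXT,
    subject    TEXT,
    sent_at    TEXT,   -- ISO-8601 UTC, so string order is date order
    body       TEXT,   -- plain-text part only, for the last-resort search
    raw        BLOB    -- the original message, so it can be returned intact
)
"""


def store(conn: sqlite3.Connection, raw: bytes) -> int:
    """Index one raw message and return its generated id."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    date = msg["Date"]
    sent_at = (date.datetime.astimezone(timezone.utc).isoformat()
               if date is not None and date.datetime is not None else None)
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content() if part is not None else ""
    recipients = ", ".join(msg.get_all("To", []) + msg.get_all("Cc", []))
    cur = conn.execute(
        "INSERT INTO message (sender, recipients, subject, sent_at, body, raw)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (str(msg.get("From", "")), recipients, str(msg.get("Subject", "")),
         sent_at, body, raw))
    conn.commit()
    return cur.lastrowid


def search(conn, address=None, subject=None, since=None, until=None, text=None):
    """Return (id, subject) rows matching any combination of criteria."""
    clauses, params = [], []
    if address:
        clauses.append("(sender LIKE ? OR recipients LIKE ?)")
        params += [f"%{address}%"] * 2
    if subject:
        clauses.append("subject LIKE ?")
        params.append(f"%{subject}%")
    if since:
        clauses.append("sent_at >= ?")
        params.append(since)
    if until:
        clauses.append("sent_at < ?")
        params.append(until)
    if text:  # last resort: scan the stored plain-text part
        clauses.append("body LIKE ?")
        params.append(f"%{text}%")
    where = " AND ".join(clauses) or "1"
    sql = f"SELECT id, subject FROM message WHERE {where} ORDER BY sent_at"
    return conn.execute(sql, params).fetchall()
```

Only the WHERE skeleton is built dynamically; every user-supplied value is passed as a bound parameter, which is what keeps "any combination" of criteria safe to support.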

> My current settings follow the guides, FAQs and forum posts found
> everywhere on Google, and I have the following working:
>

The snag I mentioned is that "always_bcc" copies every message that
hits qmgr, so when I ran Spamassassin as a Postfix-controlled service,
two copies of each message got sent to the archive (one when the message
was received, the second when it was re-injected after being inspected by
Spamassassin).

I solved the problem by adapting my mail flow:

ISP --> fetchmail | spamc | sendmail --> Postfix --> the archive
                                            |
                                            v
                                         dovecot --> users

Another approach would be for the archiving system to discard all
messages that don't contain the X-Spam-Status header. Spamassassin adds
this to every message it processes, so this mechanism would only archive
messages that have been looked at by Spamassassin.
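A minimal Python sketch of that check, assuming the archiver receives each always_bcc copy as raw bytes (was_scanned and copies_to_archive are hypothetical names): of the two copies described above, only the re-injected one carries the header, so only that one survives.

```python
import email
from email import policy


def was_scanned(raw: bytes) -> bool:
    """True if Spamassassin has already added its X-Spam-Status header."""
    msg = email.message_from_bytes(raw, policy=policy.compat32)
    return msg["X-Spam-Status"] is not None


def copies_to_archive(copies):
    """Keep only the always_bcc copies that went through Spamassassin."""
    return [raw for raw in copies if was_scanned(raw)]
```

One caveat: a message that arrives already carrying an X-Spam-Status header (added by another site's Spamassassin, for example) would pass on both copies.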

My database archiver filters its input anyway to avoid archiving spam.
It discards:
- mail marked as spam
- mail that was retrieved from the archive and returned to the
search user
- mail whose sender domain doesn't exist (this traps some spam that
Spamassassin misses, notably 419 scams and the better-constructed
phishing scams).
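Those three rules could be sketched in Python roughly as follows. This is not the archiver's own code, and several details are assumptions: the post doesn't say how returned archive mail is recognised, so the sketch invents an X-Archive-Retrieved header; and the domain test only checks that the name resolves (no MX lookup). The resolver is passed in as a parameter so it can be swapped out or stubbed.

```python
import email
import email.utils
import socket
from email import policy

# Hypothetical marker: assumes the archive's search tool adds this header
# to every message it sends back to a search user.
RETRIEVED_HEADER = "X-Archive-Retrieved"


def domain_resolves(domain: str) -> bool:
    """Crude existence test: does the name resolve at all? (No MX lookup.)"""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except (socket.gaierror, UnicodeError):
        return False


def discard_reason(raw: bytes, domain_exists=domain_resolves):
    """Return why a message should not be archived, or None to keep it."""
    msg = email.message_from_bytes(raw, policy=policy.compat32)
    # Spamassassin's X-Spam-Status header starts with "Yes" for spam.
    if msg.get("X-Spam-Status", "").strip().lower().startswith("yes"):
        return "spam"
    if msg.get(RETRIEVED_HEADER) is not None:
        return "retrieved from archive"
    _, addr = email.utils.parseaddr(msg.get("From", ""))
    domain = addr.rpartition("@")[2]
    if not domain or not domain_exists(domain):
        return "unknown sender domain"
    return None
```

Returning a reason string rather than a bare boolean makes it easy to log why each message was dropped.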

> (Although currently the SpamAssassin lets all mail pass and cannot
> distinguish spam yet.)
>

That's done by design. If you want to filter spam out of the stream
rather than using rules in mail clients to put it in a Spam mailbox
you'll have to write the filter yourself. It's not a totally trivial task
because you'll need to work out how to reliably handle false positives.

My to do list includes two enhancements:
- a program that sits downstream of spamc and filters out all messages
that Spamassassin has marked as spam
- a local rule for Spamassassin that forces mail from people in the
archive to be accepted. This should stop my filter from discarding
(very rare) false positives. So far I've only had mail from one(!)
correspondent that was flagged as spam, so the local rule is low
priority for me.
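The first to-do item, combined with the goal of the second, might look roughly like this Python sketch. It makes an assumption the post does not: instead of a local Spamassassin rule, it exempts known correspondents inside the filter itself, using a hypothetical set of lowercase addresses that would be loaded from the archive. It relies on Spamassassin's default "X-Spam-Flag: YES" header on mail it classifies as spam.

```python
import email
import email.utils
from email import policy
from typing import Optional, Set


def is_marked_spam(msg) -> bool:
    """Spamassassin adds "X-Spam-Flag: YES" to messages it classifies as spam."""
    return msg.get("X-Spam-Flag", "").strip().upper() == "YES"


def filter_message(raw: bytes, known_senders: Set[str]) -> Optional[bytes]:
    """Return the message unchanged to pass it on, or None to discard it.

    known_senders must hold lowercase addresses (e.g. everyone already
    present in the archive).
    """
    msg = email.message_from_bytes(raw, policy=policy.compat32)
    _, sender = email.utils.parseaddr(msg.get("From", ""))
    if is_marked_spam(msg) and sender.lower() not in known_senders:
        return None  # spam from a stranger: drop it
    return raw       # ham, or a false positive from a known correspondent
```

How a discarded message actually leaves the fetchmail | spamc | sendmail pipe (exit status, quarantine file, and so on) is left open here; returning None only marks the decision point.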

> Please advise if there is some workarounds or the scenario is totally
> stupid. Thanks for all of your help.
>

Sounds like a good plot to me.


--
Martin Gregorie, Essex, UK
martin@gregorie.org