
[courier-users] courierimapkeywords = massive disk io



#1, 07-16-2008, Pavel May
[courier-users] courierimapkeywords = massive disk io

Is the idea of limiting the rate at which a given IP can submit IMAP commands insane (both as a general idea and in the context of dealing with a client generating a ton of I/O in a silly/obscene/stupid fashion)?

--
(Mulder talking into tape recorder)
Mulder: Deep Throat said 'Trust No One'. It's hard, Scully.
Suspecting everyone, everything. It wears you down.
You even begin to doubt what you know is the truth.
Before, I could only trust myself. Now, I can only
trust you. And they've taken you away from me.

"The X-Files: Little Green Men"

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.p...r_id=100&url=/
_______________________________________________
courier-users mailing list
courier-users@lists.sourceforge.net
Unsubscribe: https://lists.sourceforge.net/lists/.../courier-users

#2, 07-17-2008, Sam Varshavchik
Re: [courier-users] courierimapkeywords = massive disk io

Pavel May writes:

> Is the idea of limiting the rate at which a given IP can submit IMAP commands insane (both as a general idea and in the context of dealing with a client generating a ton of I/O in a silly/obscene/stupid fashion)?


Heavy I/O is not necessarily generated by a large number of individual IMAP
commands. Rather, it comes from a "perfect storm" of two things. First,
certain poor aspects of IMAP's design make it difficult to implement every
possible permutation of an IMAP command efficiently on top of an existing
mail store (IMAP is heavily biased towards an IMAP-specific mail store,
rather than a generic one like a maildir). Second, badly designed IMAP
clients do not take advantage of IMAP-specific optimizations, and instead
use an IMAP server as either a glorified POP3 server or a remote file
access server.

But there are IMAP clients which correctly implement IMAP and issue a large
number of IMAP commands that complete quickly with a negligible footprint.
They should not be penalized for the sins of others.

#3, 07-17-2008, Gordon Messmer
Re: [courier-users] courierimapkeywords = massive disk io

Sam Varshavchik wrote:
> certain poor aspects of
> IMAP's design make it difficult to implement every possible
> permutation of an IMAP command efficiently on top of an existing mail
> store (IMAP is heavily biased towards an IMAP-specific mail store,
> rather than a generic one like a maildir)


I was thinking about possible designs for keyword implementations that
would reduce I/O. I wondered whether storing keywords in files is a
design that suffers from the same problems that mbox did.

What if, rather than the current implementation, each Maildir had an
"imapkeywords" directory? This directory would contain sub-directories
whose names represent the keywords that have been set on messages in the
Maildir. When a keyword is set on a message, a hard link (or symbolic
link?) is created in the appropriate directory; the name of the link
should be the same as the file containing the message, minus the flags
at the end of its name.

With this design, adding a keyword to a message is (usually) an atomic
operation. The IMAP server should first try to create the link. If the
attempt fails, it should then check for the presence of the keyword
directory, create it if missing, and attempt to link the file again.

Removing a keyword is also an atomic operation. Simply unlink the path
corresponding to the keyword link.
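
A minimal sketch of the set and remove steps described above, in Python. It is only an illustration of the proposal, not anything Courier does: the helper names (`strip_flags`, `set_keyword`, `remove_keyword`) are hypothetical, and it assumes the message lives in `cur/` and that flags follow the first `:` in the filename.

```python
import os

def strip_flags(filename):
    """Return the message filename without the flags after the first ':'."""
    return filename.split(":", 1)[0]

def set_keyword(maildir, msg_filename, keyword):
    """Tag a message by hard-linking it into imapkeywords/<keyword>/."""
    src = os.path.join(maildir, "cur", msg_filename)
    kw_dir = os.path.join(maildir, "imapkeywords", keyword)
    dst = os.path.join(kw_dir, strip_flags(msg_filename))
    try:
        os.link(src, dst)                   # usual case: a single link() call
    except FileExistsError:
        pass                                # keyword was already set
    except FileNotFoundError:
        # Assume the keyword directory was missing: create it and retry once.
        os.makedirs(kw_dir, exist_ok=True)
        os.link(src, dst)

def remove_keyword(maildir, msg_filename, keyword):
    """Untag a message by unlinking its entry in the keyword directory."""
    dst = os.path.join(maildir, "imapkeywords", keyword,
                       strip_flags(msg_filename))
    try:
        os.unlink(dst)
    except FileNotFoundError:
        pass                                # keyword was not set
```

The retry path is the "(usually)" in the paragraph above: the first keyword of a given name costs an extra directory creation, and every later one is a single link.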

I would imagine that building a list of files with keywords should go
something like:
* scan the keywords directories and create a list of keywords in the Maildir
* scan each keyword directory and create a hash containing the names of
message file links
* scan the cur/ directory for message files. For each one, check all of
the keyword hashes for a match against the file name minus the flags at
the end. If there is a match, record that the message was tagged with
that keyword.

It should be possible to scan each directory using only readdir(), to
reduce the IO associated with calling stat() on an indefinite number of
message files.
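
The three scanning steps above can be sketched as follows. This is a hypothetical illustration of the proposed layout (the function name `scan_keywords` and the `imapkeywords/<keyword>/` structure are taken from the proposal, not from Courier). It relies only on `os.listdir`, which reads directory entries without calling stat() on each file.

```python
import os

def strip_flags(filename):
    """Return the message filename without the flags after the first ':'."""
    return filename.split(":", 1)[0]

def scan_keywords(maildir):
    """Map each filename in cur/ to the set of keywords linked for it."""
    kw_root = os.path.join(maildir, "imapkeywords")
    # Step 1: the keyword names are the sub-directory names.
    try:
        keywords = os.listdir(kw_root)
    except FileNotFoundError:
        keywords = []
    # Step 2: one set of link names per keyword directory.
    tagged = {kw: set(os.listdir(os.path.join(kw_root, kw)))
              for kw in keywords}
    # Step 3: match every message in cur/ against each keyword's set.
    result = {}
    for name in os.listdir(os.path.join(maildir, "cur")):
        base = strip_flags(name)
        result[name] = {kw for kw, names in tagged.items() if base in names}
    return result
```

Note that the amount of directory data read grows with the number of keyword directories and the number of links in each one.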

Do you think that such a design is possible, Sam?

#4, 07-18-2008, Sam Varshavchik
Re: [courier-users] courierimapkeywords = massive disk io

Gordon Messmer writes:

> I was thinking about possible designs for keyword implementations that
> would reduce I/O. I wondered whether storing keywords in files is a
> design that suffers from the same problems that mbox did.


Not really. With mbox, any change to the status of any message in the
mailbox necessitates the rewrite of the entire mbox file. Some optimizations
were possible, but, on average, you'll need to rewrite half the file.
Furthermore, if the imapd process was killed, you're left with a corrupted
mbox file. That really is what kills mbox. It's not reliable and is subject
to corruption.

This is not applicable to Courier's keyword file due to the way the
imapkeywords file is rewritten. Furthermore, the keywords metadata only
needs to be updated whenever keywords are updated, which happens much less
often than updates of message status.

> What if, rather than the current implementation, each Maildir had an
> "imapkeywords" directory? This directory would contain sub-directories
> whose names represent the keywords that have been set on messages in the
> Maildir. When a keyword is set on a message, a hard link (or symbolic
> link?) is created in the appropriate directory; the name of the link
> should be the same as the file containing the message, minus the flags
> at the end of its name.


So, a FETCH STATUS on each message now needs to stat() each keyword
subdirectory, for the presence of the file.
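
To make the cost concrete, here is a hypothetical Python sketch of a per-message lookup under the proposed layout (the function name `keywords_for_message` is an assumption for illustration). Each `os.path.exists` call is one stat() on a candidate link, so a single message costs one stat() per keyword directory.

```python
import os

def keywords_for_message(maildir, msg_filename):
    """Return the keywords set on one message under the link-based layout."""
    kw_root = os.path.join(maildir, "imapkeywords")
    base = msg_filename.split(":", 1)[0]    # filename minus the flags
    found = []
    for kw in sorted(os.listdir(kw_root)):
        # One stat() per keyword directory, for every message looked up.
        if os.path.exists(os.path.join(kw_root, kw, base)):
            found.append(kw)
    return found
```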

> I would imagine that building a list of files with keywords should go
> something like:
> * scan the keywords directories and create a list of keywords in the Maildir
> * scan each keyword directory and create a hash containing the names of
> message file links
> * scan the cur/ directory for message files. For each one, check all of
> the keyword hashes for a match against the file name minus the flags at
> the end. If there is a match, record that the message was tagged with
> that keyword.
>
> It should be possible to scan each directory using only readdir(), to
> reduce the IO associated with calling stat() on an indefinite number of
> message files.
>
> Do you think that such a design is possible, Sam?


The problem with this is that, in practice:

1) You almost never need to retrieve the keywords of all messages in a
folder, just the keywords set for a specific message.

2) The overhead of this is somewhat higher than just reading a small number
of files and parsing them.

3) The difference in overhead is magnified by the fact that you'll need to
repeat the process with every NOOP command, which the client sends to
request changes to the status of any message in the folder.

My gut feeling is that this approach actually results in more I/O. Message
filenames tend to be longer than keyword names. Given a message filename F
and keywords K(1)..K(n), in your proposal the baseline datum that
represents the keywords set for the message, excluding all other overhead,
is length(F)*n: each keyword directory stores filename F, so that's how
many bytes there are to read. Right now, the baseline datum that represents
the same keywords is approximately length(F) + n*2, which is significantly
less. Here's why, using a keyword file from one of my folders:

$Label1
$Label3

1214751906.M126193P4806V0000000000000901I0000000000220B54_0.commodore.email-scan.com,S=5657:1
1214752505.M593345P5048V0000000000000901I0000000000237DE5_0.commodore.email-scan.com,S=3460:0
1214770505.M648167P19061V0000000000000901I0000000000237DEE_0.commodore.email-scan.com,S=3550:0
1214958905.M700616P29551V0000000000000901I0000000000237E6E_0.commodore.email-scan.com,S=2455:0

The keyword file lists the names of all the keywords once, and the keywords
are assigned to messages by listing their index number, not name. In the
example above, the first message has $Label3 set (keyword 1), and the rest
have $Label1 set (keyword 0).
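
As a rough illustration of the layout just described, the following Python sketch parses a file of that shape. It is not Courier's parser. It assumes the keyword names and the message lines are separated by one blank line, that the index list follows the last `:` on each line, and that multiple indices would be separated by spaces (the example above only shows one index per message). The short filenames are hypothetical stand-ins.

```python
def parse_keyword_file(text):
    """Map each message filename to the list of keyword names set on it."""
    header, _, body = text.partition("\n\n")
    keywords = header.split()               # keyword names, listed once
    result = {}
    for line in body.splitlines():
        if not line.strip():
            continue
        # The index list follows the last ':' (assumed space-separated).
        name, _, indices = line.rpartition(":")
        result[name] = [keywords[int(i)] for i in indices.split()]
    return result

sample = "$Label1\n$Label3\n\nmsg-a,S=5657:1\nmsg-b,S=3460:0\n"
print(parse_keyword_file(sample))
```

Here `msg-a,S=5657` maps to `$Label3` (index 1) and `msg-b,S=3460` to `$Label1` (index 0), matching the reading of the example above.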

When keywords are in heavy use, this is a very compact mechanism for saving
the keyword metadata.
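
The two baseline estimates from the comparison above, length(F)*n for the link-based proposal and roughly length(F) + n*2 for the keyword file, can be compared with a few lines of Python. The filename length of 80 bytes is an assumed round number for illustration, not a measurement.

```python
def link_design_bytes(filename_len, n_keywords):
    """Baseline for the link proposal: the filename is stored once per keyword."""
    return filename_len * n_keywords

def keyword_file_bytes(filename_len, n_keywords):
    """Baseline for the keyword file: the filename once, plus ~2 bytes per index."""
    return filename_len + 2 * n_keywords

F = 80  # assumed filename length in bytes
for n in (1, 3, 5):
    print(n, link_design_bytes(F, n), keyword_file_bytes(F, n))
```

With F = 80 and three keywords on a message, the estimates are 240 bytes against 86. With a single keyword the two figures are nearly equal, so the gap comes from messages that carry several keywords.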

The I/O issue, I believe, is really not due to how the keyword metadata is
actually stored, but rather because of the overall logic. I made some tweaks
to the internal logic in 4.4 which should result in less keyword-related I/O
as a result of keyword updates.

