This is a discussion on [courier-users] courierimapkeywords = massive disk io within the Courier-Imap forums, part of the Mail Servers and Related category.
Is the idea of limiting the rate at which a given IP can submit IMAP commands insane (both as a general idea and in the context of dealing with a client generating a ton of I/O in a silly/obscene/stupid fashion)?
--
(Mulder talking into tape recorder)
Mulder: Deep Throat said 'Trust No One'. It's hard, Scully. Suspecting everyone, everything. It wears you down. You even begin to doubt what you know is the truth. Before, I could only trust myself. Now, I can only trust you. And they've taken you away from me.
"The X-Files: Little Green Men"

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.p...r_id=100&url=/
_______________________________________________
courier-users mailing list
courier-users@lists.sourceforge.net
Unsubscribe: https://lists.sourceforge.net/lists/.../courier-users
Pavel May writes:
> Is the idea of limiting the rate at which a given IP can submit IMAP commands insane (both as a general idea and in the context of dealing with a client generating a ton of I/O in a silly/obscene/stupid fashion)?

Heavy I/O is not necessarily generated by a large number of individual IMAP commands, but rather by a "perfect storm" of: certain poor aspects of IMAP's design that make it difficult to efficiently implement every possible permutation of an IMAP command with an existing mail store (IMAP is heavily biased towards an IMAP-specific mail store, rather than a generic one like a maildir); and badly-designed IMAP clients that do not take advantage of IMAP-specific optimizations, but rather use an IMAP server as either a glorified POP3 server, or as a remote file access server.

But there are IMAP clients which correctly implement IMAP and issue a large number of IMAP commands that complete quickly with a negligible footprint. They should not be penalized for the sins of others.
Sam Varshavchik wrote:
> certain poor aspects of
> IMAP's design that makes it difficult to efficiently implement every
> possible permutation of an IMAP command with an existing mail store
> (IMAP is heavily biased towards an IMAP-specific mail store, rather than
> a generic one like a maildir)

I was thinking about possible designs for keyword implementations that would reduce IO. I wondered if storing keywords in files were not a design that suffers the same problems that mbox did.

What if, rather than the current implementation, each Maildir had an "imapkeywords" directory. This directory may contain sub-directories whose names represent the keywords that have been set on messages in the Maildir. When a keyword is set on a message, a hard link (or symbolic link?) is created in the appropriate directory; the name of the link should be the same as the file containing the message, minus the flags at the end of its name.

With this design, adding a keyword to a message is (usually) an atomic operation. The IMAP server should first try to create the link. If the attempt fails, it should then check for the presence of the keyword directory, create it if missing, and attempt to link the file again.

Removing a keyword is also an atomic operation. Simply unlink the path corresponding to the keyword link.

I would imagine that building a list of files with keywords should go something like:

* scan the keywords directories and create a list of keywords in the Maildir
* scan each keyword directory and create a hash containing the names of message file links
* scan the cur/ directory for message files. For each one, check all of the keyword hashes for a match against the file name minus the flags at the end. If there is a match, record that the message was tagged with that keyword.

It should be possible to scan each directory using only readdir(), to reduce the IO associated with calling stat() on an indefinite number of message files.

Do you think that such a design is possible, Sam?
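
For readers following along, the proposal above could be sketched roughly as follows. This is only an illustration of the idea as described in the message, not anything Courier implements. The function names (base_name, set_keyword, clear_keyword, load_keywords) are hypothetical, and the sketch assumes that maildir flags follow the first ":" in a message filename.

```python
import os

KEYWORD_DIR = "imapkeywords"  # directory name taken from the proposal


def base_name(filename):
    """Message filename minus the flags (assumed to follow the first ':')."""
    return filename.split(":", 1)[0]


def set_keyword(maildir, msg_file, keyword):
    """Hard-link cur/<msg_file> to imapkeywords/<keyword>/<base name>."""
    src = os.path.join(maildir, "cur", msg_file)
    kw_dir = os.path.join(maildir, KEYWORD_DIR, keyword)
    dst = os.path.join(kw_dir, base_name(msg_file))
    try:
        os.link(src, dst)                    # usual case: a single atomic call
    except FileExistsError:
        pass                                 # keyword was already set
    except FileNotFoundError:
        os.makedirs(kw_dir, exist_ok=True)   # keyword directory was missing
        try:
            os.link(src, dst)                # second attempt, as proposed
        except FileExistsError:
            pass


def clear_keyword(maildir, msg_file, keyword):
    """Remove a keyword by unlinking its entry; a missing entry is ignored."""
    path = os.path.join(maildir, KEYWORD_DIR, keyword, base_name(msg_file))
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass


def load_keywords(maildir):
    """Map each file in cur/ to its keywords using directory listings only."""
    root = os.path.join(maildir, KEYWORD_DIR)
    try:
        keywords = os.listdir(root)
    except FileNotFoundError:
        keywords = []                        # no keyword has ever been set
    # One set of link names per keyword directory.
    tagged = {kw: set(os.listdir(os.path.join(root, kw))) for kw in keywords}
    result = {}
    for name in os.listdir(os.path.join(maildir, "cur")):
        base = base_name(name)
        result[name] = {kw for kw, names in tagged.items() if base in names}
    return result
```

Note that load_keywords lists every keyword directory plus cur/ in full, so its cost grows with the number of keyword directories and tagged messages even when only one message is of interest.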
Gordon Messmer writes:
> I was thinking about possible designs for keyword implementations that
> would reduce IO. I wondered if storing keywords in files were not a
> design that suffers the same problems that mbox did.

Not really. With mbox, any change to the status of any message in the mailbox necessitates the rewrite of the entire mbox file. Some optimizations were possible, but, on average, you'll need to rewrite half the file. Furthermore, if the imapd process was killed, you're left with a corrupted mbox file.

That really is what kills mbox. It's not reliable and is subject to corruption.

This is not applicable to Courier's keyword file due to the way the imapkeywords file is rewritten. Furthermore, the keywords metadata only needs to be updated whenever keywords are updated, which happens much less often than updates of message status.

> What if, rather than the current implementation, each Maildir had an
> "imapkeywords" directory. This directory may contain sub-directories
> whose names represent the keywords that have been set on messages in the
> Maildir. When a keyword is set on a message, a hard link (or symbolic
> link?) is created in the appropriate directory; the name of the link
> should be the same as the file containing the message, minus the flags
> at the end of its name.

So, a FETCH STATUS on each message now needs to stat() each keyword subdirectory, for the presence of the file.

> I would imagine that building a list of files with keywords should go
> something like:
> * scan the keywords directories and create a list of keywords in the Maildir
> * scan each keyword directory and create a hash containing the names of
> message file links
> * scan the cur/ directory for message files. For each one, check all of
> the keyword hashes for a match against the file name minus the flags at
> the end. If there is a match, record that the message was tagged with
> that keyword.
>
> It should be possible to scan each directory using only readdir(), to
> reduce the IO associated with calling stat() on an indefinite number of
> message files.
>
> Do you think that such a design is possible, Sam?

The problem with this is that, in practice:

1) you almost never need to retrieve the keywords of all messages in a folder, just the keywords set for a specific message

2) The overhead of this is somewhat higher than just reading a small number of files, and parsing them

3) The difference in overhead is magnified by the fact that you'll need to repeat the process with every NOOP command, which the client sends to request changes to the status of any message in the folder.

My gut feeling is that this approach actually results in more I/O. Message filenames tend to be longer than keyword names. Given a message filename F, and keywords K(1)..K(n), in your proposal, the baseline datum that represents those keywords set for the message, excluding all other overhead, is length(F)*n. Each keyword directory stores filename F, so that's how many bytes there are to read.

Right now, the baseline datum that represents the same keywords would be, approximately: length(F) + n*2, which is significantly less. Here's why. Here's a keyword file in one of my folders:

    $Label1
    $Label3

    1214751906.M126193P4806V0000000000000901I0000000000220B54_0.commodore.email-scan.com,S=5657:1
    1214752505.M593345P5048V0000000000000901I0000000000237DE5_0.commodore.email-scan.com,S=3460:0
    1214770505.M648167P19061V0000000000000901I0000000000237DEE_0.commodore.email-scan.com,S=3550:0
    1214958905.M700616P29551V0000000000000901I0000000000237E6E_0.commodore.email-scan.com,S=2455:0

The keyword file lists the names of all the keywords once, and the keywords are assigned to messages by listing their index number, not name. In the example above, the first message has $Label3 set (keyword 1), and the rest have $Label1 set (keyword 0).

When keywords are in heavy use, this is a very compact mechanism for saving the keyword metadata. The I/O issue, I believe, is really not due to how the keyword metadata is actually stored, but rather because of the overall logic. I made some tweaks to the internal logic in 4.4 which should result in less keyword-related I/O as a result of keyword updates.
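
To make the comparison above concrete, here is a small sketch that parses a keyword file of the shape shown in the message and evaluates the two baseline estimates, length(F)*n and length(F) + n*2. It is not Courier's parser. The function names and the short filenames in SAMPLE are hypothetical, and the sketch assumes that a blank line separates the keyword names from the message lines and that a message with several keywords lists its index numbers separated by spaces after the final ":".

```python
def parse_keyword_file(text):
    """Return {filename: [keyword, ...]} for a keyword file in the assumed layout."""
    header, _, body = text.partition("\n\n")   # keyword names, then message lines
    names = header.split()                     # index 0 is the first keyword listed
    result = {}
    for line in body.splitlines():
        if not line.strip():
            continue
        # Everything after the last ':' is taken as the list of keyword indexes.
        filename, _, indexes = line.rpartition(":")
        result[filename] = [names[int(i)] for i in indexes.split()]
    return result


def link_scheme_bytes(filename, n):
    """Baseline for the link proposal: the name is stored once per keyword."""
    return len(filename) * n


def keyword_file_bytes(filename, n):
    """Baseline for the keyword file: the name once, plus about 2 bytes per index."""
    return len(filename) + n * 2


# Hypothetical, shortened filenames; real maildir names are much longer.
SAMPLE = (
    "$Label1\n"
    "$Label3\n"
    "\n"
    "100.M1P1.host,S=5657:1\n"
    "101.M2P2.host,S=3460:0\n"
)

if __name__ == "__main__":
    for name, keywords in parse_keyword_file(SAMPLE).items():
        n = len(keywords)
        print(name, keywords, link_scheme_bytes(name, n), keyword_file_bytes(name, n))
```

With a single keyword per message the two estimates are close; the gap widens as n grows, because the filename is repeated once per keyword in the link scheme but stored only once in the keyword file.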