Which is the better option for directory hashing to store a large number of image files? (PHP Language forums, part of the PHP Programming Forums category)
Hi All,

I am not sure if this is the right place to ask this question, but I am very sure you may have faced this problem. I have already found some posts related to this, but not the answer I am looking for.

My problem is that I have to upload images and store them. I am using the filesystem for that.

The setup is something like this: there will be items/groups/users, each of which can have up to 6 images, which need to be scaled to 4 different sizes, i.e. every item can have up to 24 images of varying sizes.

Now, the standard way of storing these files would be to store them in subdirectories based on some hash.

My partial solution is to split the four types of files into four fixed base folders, one for each dimension.

Since the filename is in the format "YmdHis", I decided to use the directory structure Y/m/d/<filename>, but I realize that even this could be inefficient.

So now I am thinking about going further by creating a Y/m/d/H/i/<filename> directory structure.

Now my question is how to go about creating subdirectories below the base folders: will my scheme hold, or should I use an md5 hash of the filename, as suggested by others, take 2-3 characters of it, create one or two levels of directory structure from them, and then store the files?

Regards,
Amit
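For reference, the date-based layout described above could be sketched roughly like this. The helper name `datePath` and the sample filename are made up for illustration, and the sketch assumes the filename really does start with a 14-digit "YmdHis" timestamp.

```php
<?php
// Hypothetical helper: derive a Y/m/d (or Y/m/d/H/i) directory from a
// filename that starts with a "YmdHis" timestamp, e.g. "20070917000914.jpg".
function datePath(string $filename, bool $withMinutes = false): string
{
    // "YmdHis" is always 14 digits: 4 + 2 + 2 + 2 + 2 + 2.
    if (!preg_match('/^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})/', $filename, $m)) {
        throw new InvalidArgumentException("Not a YmdHis filename: $filename");
    }
    // $m[1..3] are year, month, day; $m[4..5] are hour and minute.
    $parts = $withMinutes ? array_slice($m, 1, 5) : array_slice($m, 1, 3);
    return implode('/', $parts) . '/' . $filename;
}

echo datePath('20070917000914.jpg'), "\n";       // 2007/09/17/20070917000914.jpg
echo datePath('20070917000914.jpg', true), "\n"; // 2007/09/17/00/09/20070917000914.jpg
```

Note that with this layout a burst of uploads in the same minute still lands in one directory, whichever depth is chosen.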
|
|||
|
theCancerus wrote:
> [...] will my scheme hold or should i use md5 hash as suggested by
> others, over the filename and then take 2-3 characters and create one
> or two level of directory structure and then store the files?

I use databases for this.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
|
|||
|
I personally use something like /images/front/controller/row_id/ - that way I only need to store the name of the image.

On Sep 17, 2:49 pm, Jerry Stuckle <jstuck...@attglobal.net> wrote:
> I use databases for this.
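A minimal sketch of that kind of layout, assuming "front" and "controller" stand for a site section and a controller name and that row_id is the database row the image belongs to (that reading is a guess; the function names are hypothetical):

```php
<?php
// Hypothetical helpers: the directory is derived entirely from values the
// application already knows, so only the bare image name has to be stored.
function imageDir(string $section, string $controller, int $rowId): string
{
    return sprintf('/images/%s/%s/%d/', $section, $controller, $rowId);
}

function imagePath(string $section, string $controller, int $rowId, string $imageName): string
{
    // basename() drops any directory part that may have crept into the stored name.
    return imageDir($section, $controller, $rowId) . basename($imageName);
}

echo imagePath('front', 'items', 42, 'photo.jpg'), "\n"; // /images/front/items/42/photo.jpg
```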
|
|||
|
I didn't understand what you were asking at first, but I think I do now. What I would do in your case is to use a combination of file structure and database entries.

The first question you need to ask yourself is how you will typically be accessing these files. For example, I store a list of images for a given order. In that case, I create a folder under photos with the name of the order number. I then place the images in that folder -- but then I only plan to access it via order number. That is simple.

What you seem to need is multiple methods of finding the files. They might be between certain dates, certain owners, certain names, etc. In that case you would want to put all those as fields in a database table and have the full file name (including path) in another field. You would search the database however you wish and that would yield [near] immediate access to the file location.

Moral: Programming, as well as life, is not always an either-or. Sometimes a compromise/hybrid is the best solution.

--
Shelly

"NoDude" <nodude@gmail.com> wrote in message news:1190035565.874728.46600@50g2000hsm.googlegroups.com...
> I personally use something like /images/front/controller/row_id/ -
> that way I can only store the name of the image.
|
|||
|
> Moral: Programming, as well as life, is not always an either-or.
> Sometimes a compromise/hybrid is the best solution.
>
> --
> Shelly

ahhh, but shelly, the thing i like most is that in programming, it is always either/or: on/off. to say otherwise is to not know programming. the same holds true for life. you either do or do not. any notions about the nobility or superiority of human action in his contemplation of life are simply false, save the fact that there is none of either. do or do not is all that remains and that directly linked to his own survivability - as is the impetus of all animals.

compromise. chuckle.
|
|||
|
"Steve" <no.one@example.com> wrote in message news:3rwHi.805$3C.788@newsfe05.lga...
>> Moral: Programming, as well as life, is not always an either-or.
>> Sometimes a compromise/hybrid is the best solution.
>
> ahhh, but shelly, the thing i like most is that in programming, it is
> always either/or: on/off. to say otherwise is to not know programming. [...]
>
> compromise. chuckle.

So, I take it that if you are fed a meal which is a wonderfully prepared, 10 pound filet mignon, you either (a) eat all of it or (b) eat none of it?

or,

If you are faced with a court appearance for excessive speeding in your car, you should either be acquitted or should get the death sentence?

On one project about 25 years ago I needed to modify a very large application that was written in Fortran. I needed dynamic allocation. According to you, I should have been faced with two choices. One was to emulate dynamic allocation by setting aside a large part of memory and doing my own allocation from that memory heap. A second would have been to totally rewrite that entire (largggggeeeee) application in C. I chose a "compromise". I wrote a small module in C and used that in conjunction with the rest of the Fortran code.

The point here is that there are two extremes in handling his situation. Either avoid a database and just use the file system, or avoid the file system and put all of the contents of the file into a blob field in the database. Often, the better way is to use the database as a rapid search engine for a file in the file system.

I guess you aren't married? I have been for over four decades. Believe me, "all or nothing" just doesn't work. Even with a switch for the lights you can always add a dimmer.

By the way, I have been programming for over forty years. We are not talking ones and zeros, true or false, here. We are talking design philosophy -- and that is usually a compromise among various alternatives to achieve the most efficient results in the shortest time for the least cost.

Shelly
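The "database as a rapid search engine for a file in the file system" idea might look roughly like the sketch below. Everything specific here is an assumption made for illustration: the `images` table, its columns, the sample rows, and the use of an in-memory SQLite database through PDO (which needs the pdo_sqlite extension) so that the example runs on its own.

```php
<?php
// Illustrative only: the searchable attributes live in a table,
// the image itself stays on disk at the stored path.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE images (
    id          INTEGER PRIMARY KEY,
    owner       TEXT NOT NULL,
    name        TEXT NOT NULL,
    uploaded_at TEXT NOT NULL,
    path        TEXT NOT NULL
)');

$insert = $db->prepare(
    'INSERT INTO images (owner, name, uploaded_at, path) VALUES (?, ?, ?, ?)'
);
$insert->execute(['amit', 'front view', '2007-09-17 00:09:14', '/photos/2007/09/17/20070917000914.jpg']);
$insert->execute(['shelly', 'order scan', '2007-09-18 10:30:00', '/photos/2007/09/18/20070918103000.jpg']);

// Search by owner and date range; the result is a list of file locations.
function findPaths(PDO $db, string $owner, string $from, string $to): array
{
    $stmt = $db->prepare(
        'SELECT path FROM images
         WHERE owner = ? AND uploaded_at BETWEEN ? AND ?
         ORDER BY uploaded_at'
    );
    $stmt->execute([$owner, $from, $to]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

print_r(findPaths($db, 'amit', '2007-09-01 00:00:00', '2007-09-30 23:59:59'));
```

The directory layout can then be changed later without touching the search logic, since only the `path` column has to be updated.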
|
|||
|
"Shelly" <sheldonlg.news@asap-consult.com> wrote in message news:13etcs11ug57rb6@corp.supernews.com...
> So, I take it that if you are fed a meal which is a wonderfully prepared, 10
> pound filet mignon, you either (a) eat all of it or (b) eat none of it?

no, i'd eat enough so that i was sustained - not so much that i could not defend myself if attacked, or so much that i could not drink, or so much that i could not shelter myself. i would eat what was appropriate for my survival. if it were rotted, yet wonderfully prepared, i probably wouldn't eat it because i would become ill. all of which affects my survivability.

> or,
>
> If you are faced with a court appearance for excessive speeding in your
> car you should either be acquitted or should get the death sentence?

i should not speed if i don't like the consequences. however, your example is completely non sequitur, as my appearance in court is not tied to the judgement in the sentence. but in order to indulge, if the court deems acquittal or death, it will do so based on the circumstances and how my actions affected the survivability (well being) of the group under which the judge(s) serve(s).

> On one project about 25 years ago I needed to modify a very large
> application that was written in Fortran. I needed dynamic allocation.
> According to you, I should have been faced with two choices. One was to
> emulate dynamic allocation by setting aside a large part of memory and
> doing my own allocation from that memory heap. A second would have been
> to totally rewrite that entire (largggggeeeee) application in C. I chose
> a "compromise". I wrote a small module in C and used that in conjunction
> with the rest of the Fortran code.

according to me? your options are your options. you made a choice. that choice did not involve programming. it involved architecture. if you chose to emulate dynamic allocation, you would have done so concretely and there would be no compromise, no choice in how that code was interpreted by the server. your instructions would have been "either or", not "when you feel like it". even bugs or the omission of logic are concrete and predictable if the inputs are known. 3/4 of the code i write (or don't write, specifically) are from logical omissions; handling only what i must in order to get inputs where they can either be thrown out or processed.

> The point here is that there are two extremes in handling his situation.
> Either avoid a database and just use the file system, or avoid the file
> system and put all of the contents of the file into a blob field in the
> database. Often, the better way is to use the database as a rapid search
> engine for a file in the file system.

choices, whether deemed extreme or simple, are still just options. when you program, you do so concretely.

> I guess you aren't married? I have been for over four decades. Believe
> me, "all or nothing" just doesn't work. Even with a switch for the lights
> you can always add a dimmer.

my marital status has no bearing on my thought processes. if you've "compromised" on who you are or in what you believe because you decided to take a spouse, you ought to have demanded more from your spouse...and your life. again though, my choices (all of them) should be concrete regardless of how many options there are. whether i account my spouse into the equation of which i shall select, the one chosen will most definitely be from selfishness born of survival - what is in my best interest. hell, "selfless" acts are the most overtly selfish acts of all, endearing the actor to his society and thus making his likelihood to survive all the more certain - and if dead because of such an act, marked in that culture's history...extending his 'life' much further than if he'd have led a 'normal' life.

> By the way, I have been programming for over forty years. We are not
> talking ones and zeros, true or false, here. We are talking design
> philosophy -- and that is usually a compromise among various alternatives
> to achieve the most efficient results in the shortest time for the least
> cost.

oh but we are talking about ones and zeros. that's programming. design is about options, not the act of programming itself. but just like design and life, there are always alternatives. whatever the context, seeing the presence or blending of options as a compromise is a faulty premise/perspective, one from which the best advantages thereof are often overlooked.

my point: programming is vastly different than life. it is completely black and white, sharing only with it an array of perspectives from which it will be engaged...ultimately leaving a single mark in one of two states; a one or a zero, do or do not.
|
|||
|
On Mon, 17 Sep 2007 00:09:14 -0700, theCancerus <thecancerus@gmail.com> wrote:
>since filename is in format "YmdHis" i decided to use directory
>structure as Y/m/d/<filename>.
>but i realize that even this could be inefficient.
>
>so now i am thinking about going one more level by creating Y/m/d/H/i/
><filename> directory structure.
>
>now my question is how to go about creating subdirectories below base
>folders, will my scheme hold or should i use md5 hash as suggested by
>others, over the filename and then take 2-3 characters and create one
>or two level of directory structure and then store the files?

Splitting the files by date (down to whatever resolution) is potentially still susceptible to a large number arriving at the same time, and ending up with a large number of files in a single directory. If the goal is to spread the files across a number of directories, then you probably want the value that determines the directories to be approximately randomly distributed, and to have a bounded and reasonable number of possible directory names.

md5 of some property (name? or even contents?) likely fits this reasonably well. The number of bytes you use for subdirectories depends on however many images you have. If you don't actually expose the hash-used-for-storage-directory in the URL, then you're free to re-hash the images' directories if you end up needing more levels to split the directories (if it was in the URL, then it would change the URLs of all your images, which is something to be avoided).

Substrings of just the name may work as well, although there could be a bias to particular letters or numbers depending on where the names come from and what language they're in.

There's more than one way to do it, as ever, and the way to go depends on what exactly you're doing. Have you checked whether your initial assumption is true, though? Whilst "large number of entries in a directory is slow" is true in many filesystems, it's not a universal truth. What's the threshold for your filesystem, and are you planning on getting anywhere close to it in the foreseeable future? (after overestimating it a bit to be safely pessimistic)

--
Andy Hassall :: andy@andyh.co.uk :: http://www.andyh.co.uk
http://www.andyhsoftware.co.uk/space :: disk and FTP usage analysis tool
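As a rough illustration of the md5 approach, here is a minimal sketch that hashes the filename and uses the leading hex characters as directory names. The function name `hashedPath` and its parameters are made up for this example; hashing the name rather than the contents is just one of the options mentioned.

```php
<?php
// Hypothetical helper: spread files over subdirectories named after the
// leading characters of the md5 of the filename.
function hashedPath(string $filename, int $levels = 2, int $charsPerLevel = 2): string
{
    $hash = md5($filename); // 32 lowercase hex characters
    $dirs = [];
    for ($i = 0; $i < $levels; $i++) {
        // Level 0 uses the first chunk of the hash, level 1 the next one, and so on.
        $dirs[] = substr($hash, $i * $charsPerLevel, $charsPerLevel);
    }
    return implode('/', $dirs) . '/' . $filename;
}

// Two levels of two hex characters: at most 256 * 256 leaf directories.
echo hashedPath('20070917000914.jpg'), "\n";
// Two levels of three hex characters: at most 4096 * 4096 leaf directories.
echo hashedPath('20070917000914.jpg', 2, 3), "\n";
```

Because each directory name is a fixed number of hex characters, the number of entries per level is bounded (16 to the power of the characters per level), which is the "bounded and reasonable number of possible directory names" property. Keeping this mapping in one function also makes it easier to re-hash into more levels later, as long as the path is not exposed in URLs.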
|
|||
|
Shelly wrote:
> So, I take it that if you are fed a meal which is a wonderfully prepared, 10
> pound filet mignon, you either (a) eat all of it or (b) eat none of it?
>

(a). (b) is not even an option!

> or,
>
> If you are faced with a court appearance for excessive speeding in your car
> you should either be acquitted or should get the death sentence?
>

No, but I should either be acquitted or found guilty. And if found guilty, I should receive the appropriate punishment. The death sentence is not appropriate for all infractions.

> On one project about 25 years ago I needed to modify a very large
> application that was written in Fortran. I needed dynamic allocation.
> According to you, I should have been faced with two choices. One was to
> emulate dynamic allocation by setting aside a large part of memory and doing
> my own allocation from that memory heap. A second would have been to
> totally rewrite that entire (largggggeeeee) application in C. I chose a
> "compromise". I wrote a small module in C and used that in conjunction with
> the rest of the Fortran code.
>

What is your point?

> The point here is that there are two extremes in handling his situation.
> Either avoid a database and just use the file system, or avoid the file
> system and put all of the contents of the file into a blob field in the
> database. Often, the better way is to use the database as a rapid search
> engine for a file in the file system.
>

Sure, there are extremes. But have you actually tried storing the data in a blob field and tuning your database for it? I thought not.

Access is quite fast - virtually always faster than a mix of the two, because you don't have to make both a database and a file system call. Less overhead - the database returns the blob just as effectively as it does a file name.

> I guess you aren't married? I have been for over four decades. Believe me,
> "all or nothing" just doesn't work. Even with a switch for the lights you
> can always add a dimmer.
>

Sure it does. If I don't let my wife have her own way ALL the time, I get "nothing". :-)

> By the way, I have been programming for over forty years. We are not
> talking ones and zeros, true or false, here. We are talking design
> philosophy -- and that is usually a compromise among various alternatives to
> achieve the most efficient results in the shortest time for the least cost.
>
> Shelly
>

Sure we are. Everything in programming comes down to ones and zeros. It's just the approach to getting there that differs.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
|
|||
|
On Sep 17, 11:29 pm, Andy Hassall <a...@andyh.co.uk> wrote:
> Have you checked whether your initial assumption is true, though? Whilst
> "large number of entries in a directory is slow" is true in many
> filesystems, it's not a universal truth. What's the threshold for your
> filesystem, and are you planning on getting anywhere close to it in the
> foreseeable future? (after overestimating it a bit to be safely pessimistic)

Hi Andy,

Thanks for the sensible reply.

We need to upload around 2.5 million images as seed data for the website. We are using a Linux system (CentOS), so any ideas what would be a reasonable number of files per directory?

And unless thousands of users want to upload images at the same time, I am sure it will never happen that there are a large number of files in one directory every minute.

Anyway, I have decided to go with MD5, as the 3/3 letter combination gives me a good spread for a long time :)
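A quick back-of-the-envelope check of that choice, reading "3/3" as two directory levels of three hex characters each (an assumption; the post does not spell it out) and assuming the hash spreads files roughly evenly. The function names are made up for the example.

```php
<?php
// Each hex character has 16 possible values, so one level of N characters
// gives 16^N directories, and L levels give (16^N)^L leaf directories.
function leafDirectories(int $levels, int $charsPerLevel): int
{
    return (16 ** $charsPerLevel) ** $levels;
}

function averageFilesPerLeaf(int $fileCount, int $levels, int $charsPerLevel): float
{
    return $fileCount / leafDirectories($levels, $charsPerLevel);
}

printf("%d leaf directories\n", leafDirectories(2, 3));                          // 16777216
printf("%.2f files per leaf on average\n", averageFilesPerLeaf(2500000, 2, 3));  // 0.15
printf("%.1f with one 3-character level\n", averageFilesPerLeaf(2500000, 1, 3)); // 610.4
```

So for 2.5 million seed images, two 3-character levels leave most leaf directories empty, while a single 3-character level would average roughly 610 files per directory. Whether either figure is a problem depends on the filesystem threshold asked about above.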