This is a discussion on blobs or files using NDB cluster within the MySQL Database forums, part of the Database Forums category; Hi, In develloping a *large* high-availability website that will be storing lots of both images/video and text, what ...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
Hi,
In develloping a *large* high-availability website that will be storing lots of both images/video and text, what would be the best way to go about storing the images and video? I can see at least two options: a) store the images as BLOBs in the database and let the server app retrieve them and serve them to the client. b) store the images as individual files, and put in only http links to the files into the database, so the client browser will fetch them, this would leave millions of files in the filesystem though (performance?). Both will be used in a NDB environment, and is meant to be scalable to be storing extreme amounts of images (TBs) but will start out smaller. I am looking mainly for high speed in serving the images and scalability of the whole system. How does yahoo, google, youtube, myspace etc etc do it?? A secondary question: how do you store more data than can fit in a single node of an NDB cluster? i.e. if each node has a 1 TB drive, but you want to store 3 TBs? Thanks! Regards, Tobias |
|
|||
|
nsa.usa@gmail.com wrote:
> Hi, > > In develloping a *large* high-availability website that will be > storing lots of both images/video and text, what would be the best way > to go about storing the images and video? > I can see at least two options: > a) store the images as BLOBs in the database and let the server app > retrieve them and serve them to the client. > b) store the images as individual files, and put in only http links to > the files into the database, so the client browser will fetch them, > this would leave millions of files in the filesystem though > (performance?). > > Both will be used in a NDB environment, and is meant to be scalable to > be storing extreme amounts of images (TBs) but will start out smaller. > I am looking mainly for high speed in serving the images and > scalability of the whole system. > > How does yahoo, google, youtube, myspace etc etc do it?? > > A secondary question: how do you store more data than can fit in a > single node of an NDB cluster? i.e. if each node has a 1 TB drive, but > you want to store 3 TBs? > > Thanks! > Regards, > Tobias > Tobias, Some people will tell you to never store this information in a database. But before you believe them, I'd suggest you find out if they've even tried. I've been doing it for over 20 years (starting with documents using DB2 on IBM mainframes back in the 80's). It works quite well; databases are well suited for quick retrieval of such information. Databases are quite efficient at selecting from hundreds of thousands (or even millions) of rows. Can you imagine having that many files in one directory? Other advantages include ensuring data consistency (if you're using INNODB database) - you can ensure you can't delete the blob while it's still being referenced. You can't do that if they're stored in files. Backup is much easier. You can move the database to a different server without changing anything except the host name and not have to worry about NFS, etc. There are a lot of advantages to doing it this way. And people who claim retrieval is too slow either haven't tried it or don't know how to tune their system. -- ================== Remove the "x" from my email address Jerry Stuckle JDS Computer Training Corp. jstucklex@attglobal.net ================== |
|
|||
|
On Apr 22, 10:44 pm, Jerry Stuckle <jstuck...@attglobal.net> wrote:
> nsa....@gmail.com wrote: > > Hi, > > > In develloping a *large* high-availability website that will be > > storing lots of both images/video and text, what would be the best way > > to go about storing the images and video? > > I can see at least two options: > > a) store the images as BLOBs in the database and let the server app > > retrieve them and serve them to the client. > > b) store the images as individual files, and put in only http links to > > the files into the database, so the client browser will fetch them, > > this would leave millions of files in the filesystem though > > (performance?). > > > Both will be used in a NDB environment, and is meant to be scalable to > > be storing extreme amounts of images (TBs) but will start out smaller. > > I am looking mainly for high speed in serving the images and > > scalability of the whole system. > > > How does yahoo, google, youtube, myspace etc etc do it?? > > > A secondary question: how do you store more data than can fit in a > > single node of an NDB cluster? i.e. if each node has a 1 TB drive, but > > you want to store 3 TBs? > > > Thanks! > > Regards, > > Tobias > > Tobias, > > Some people will tell you to never store this information in a database. > But before you believe them, I'd suggest you find out if they've even > tried. > > I've been doing it for over 20 years (starting with documents using DB2 > on IBM mainframes back in the 80's). It works quite well; databases are > well suited for quick retrieval of such information. Databases are > quite efficient at selecting from hundreds of thousands (or even > millions) of rows. Can you imagine having that many files in one directory? > > Other advantages include ensuring data consistency (if you're using > INNODB database) - you can ensure you can't delete the blob while it's > still being referenced. You can't do that if they're stored in files. > Backup is much easier. You can move the database to a different server > without changing anything except the host name and not have to worry > about NFS, etc. > > There are a lot of advantages to doing it this way. And people who > claim retrieval is too slow either haven't tried it or don't know how to > tune their system. > > -- > ================== > Remove the "x" from my email address > Jerry Stuckle > JDS Computer Training Corp. > jstuck...@attglobal.net > ================== MySQL has (had?) a problem that caused the blob data to be loaded into memory even if the field was not being accessed. My solution is to have a simple table dedicated to the blobs (primary key and blob) and have the meta data in a seperate table with a foreign key to its blob data. You might also want to consider that you might need to resize/ thumbnail your images. Tigger |
|
|||
|
Tigger wrote:
> On Apr 22, 10:44 pm, Jerry Stuckle <jstuck...@attglobal.net> wrote: >> nsa....@gmail.com wrote: >>> Hi, >>> In develloping a *large* high-availability website that will be >>> storing lots of both images/video and text, what would be the best way >>> to go about storing the images and video? >>> I can see at least two options: >>> a) store the images as BLOBs in the database and let the server app >>> retrieve them and serve them to the client. >>> b) store the images as individual files, and put in only http links to >>> the files into the database, so the client browser will fetch them, >>> this would leave millions of files in the filesystem though >>> (performance?). >>> Both will be used in a NDB environment, and is meant to be scalable to >>> be storing extreme amounts of images (TBs) but will start out smaller. >>> I am looking mainly for high speed in serving the images and >>> scalability of the whole system. >>> How does yahoo, google, youtube, myspace etc etc do it?? >>> A secondary question: how do you store more data than can fit in a >>> single node of an NDB cluster? i.e. if each node has a 1 TB drive, but >>> you want to store 3 TBs? >>> Thanks! >>> Regards, >>> Tobias >> Tobias, >> >> Some people will tell you to never store this information in a database. >> But before you believe them, I'd suggest you find out if they've even >> tried. >> >> I've been doing it for over 20 years (starting with documents using DB2 >> on IBM mainframes back in the 80's). It works quite well; databases are >> well suited for quick retrieval of such information. Databases are >> quite efficient at selecting from hundreds of thousands (or even >> millions) of rows. Can you imagine having that many files in one directory? >> >> Other advantages include ensuring data consistency (if you're using >> INNODB database) - you can ensure you can't delete the blob while it's >> still being referenced. You can't do that if they're stored in files. >> Backup is much easier. You can move the database to a different server >> without changing anything except the host name and not have to worry >> about NFS, etc. >> >> There are a lot of advantages to doing it this way. And people who >> claim retrieval is too slow either haven't tried it or don't know how to >> tune their system. >> >> -- >> ================== >> Remove the "x" from my email address >> Jerry Stuckle >> JDS Computer Training Corp. >> jstuck...@attglobal.net >> ================== > > > MySQL has (had?) a problem that caused the blob data to be loaded into > memory even if the field was not being accessed. > > My solution is to have a simple table dedicated to the blobs (primary > key and blob) and have the meta data in a seperate table with a > foreign key to its blob data. > > You might also want to consider that you might need to resize/ > thumbnail your images. > > Tigger > Tigger, Good point, and one I should have mentioned. Yes, you are correct. But it's not just MySQL. Every RDBM I've seen reads the entire row into memory before selecting the columns to be returned to the application. And even if they didn't, they would have to read data from the disk in the block size being used by the file system. Putting the blob in a separate table significantly increases speed when not accessing the BLOB, while slowing the system almost unnoticeably when accessing the BLOB. -- ================== Remove the "x" from my email address Jerry Stuckle JDS Computer Training Corp. jstucklex@attglobal.net ================== |
|
|||
|
On Apr 22, 3:44 pm, Jerry Stuckle <jstuck...@attglobal.net> wrote:
> Tobias, > > Some people will tell you to never store this information in a database. > But before you believe them, I'd suggest you find out if they've even > tried. > > I've been doing it for over 20 years (starting with documents using DB2 > on IBM mainframes back in the 80's). It works quite well; databases are > well suited for quick retrieval of such information. Databases are > quite efficient at selecting from hundreds of thousands (or even > millions) of rows. Can you imagine having that many files in one directory? > > Other advantages include ensuring data consistency (if you're using > INNODB database) - you can ensure you can't delete theblobwhile it's > still being referenced. You can't do that if they're stored in files. > Backup is much easier. You can move the database to a different server > without changing anything except the host name and not have to worry > about NFS, etc. > > There are a lot of advantages to doing it this way. And people who > claim retrieval is too slow either haven't tried it or don't know how to > tune their system. > > -- > ================== > Remove the "x" from my email address > Jerry Stuckle > JDS Computer Training Corp. > jstuck...@attglobal.net > ================== Hi Jerry, Thanks for the info much appreciated. I like the idea of storing in the DB as it just seems neater, and millions of files in a FS needs some special considerations probably :-) I think I will try to do some testing with storing in the db and I'll post the results when I'm done. Regarding putting the images in its own table I would definently do this, I stumbled on 'vertical partitioning' and this seems to be it and it seems to be very necessary. Regarding my secondary question, if anybody's interrested I allready found the answer and its: horizontal partitioning, so a table can be split with some of the rows on a different FS. Regards, Tobias |
|
|||
|
On Apr 23, 10:15 am, Tigger <t...@grunt.tv> wrote:
> > MySQL has (had?) a problem that caused theblobdata to be loaded into > memory even if the field was not being accessed. > > My solution is to have a simple table dedicated to the blobs (primary > key andblob) and have the meta data in a seperate table with a > foreign key to itsblobdata. > > You might also want to consider that you might need to resize/ > thumbnail your images. > > Tigger Hi Tigger, Thanks. I think it still has this 'problem' but I don't think its considered a problem as such as the way to do it properly apparently is how you describe. See also my answer to Jerry. What do you mean with the resizing? This makes me wonder if there is a performance penalty if the images have different filesizes? Should I put f.ex. all 1024x768 images in one table and all 640x480 images in a different table and so on? would this make any difference or no difference? hmm... Cheers, Tobias |
|
|||
|
On Apr 24, 2:42 am, nsa....@gmail.com wrote:
> On Apr 23, 10:15 am, Tigger <t...@grunt.tv> wrote: > > > > > MySQL has (had?) a problem that caused theblobdata to be loaded into > > memory even if the field was not being accessed. > > > My solution is to have a simple table dedicated to the blobs (primary > > key andblob) and have the meta data in a seperate table with a > > foreign key to itsblobdata. > > > You might also want to consider that you might need to resize/ > > thumbnail your images. > > > Tigger > > Hi Tigger, > > Thanks. I think it still has this 'problem' but I don't think its > considered a problem as such as the way to do it properly apparently > is how you describe. See also my answer to Jerry. > > What do you mean with the resizing? This makes me wonder if there is a > performance penalty if the images have different filesizes? Should I > put f.ex. all 1024x768 images in one table and all 640x480 images in a > different table and so on? would this make any difference or no > difference? hmm... > > Cheers, > Tobias I've just implemented an image provider that uses images stored in a database. One issue we've always had which I tried to address with this implementation was that the client had to process and upload multiple versions of an images to cover each use. e.g. thumbnails, icons and other sizes. And then the designer would change the sizes on us!!! In this implementation I let them upload original unprocessed image and the system creates the variations for them. I did this by creating a data structure that allowed me to define "image styles" and to store versions of the image related to each style. A style would define dimensions and sizing rules. Not sure on the use of different tables for different sizes. If you have fixed, pre-defined sizes it might be worth doing that anyhow to reduce the number of rows on each table. How are you planning on feeding the images to the website? I'm using custom 404 redirect handling (IIS & asp) which streams data from a back end HttpHandler (.Net). This cannot be fast but I'm restricted to the user<->asp<->.Net<->database architecture! I could skip the asp part but its realy important that these images have urls from the correct domain and are friendly to search engines and administrators. |