Bluehost.com Web Hosting $6.95

blobs or files using NDB cluster

This is a discussion on blobs or files using NDB cluster within the MySQL Database forums, part of the Database Forums category; Hi, In develloping a *large* high-availability website that will be storing lots of both images/video and text, what ...


Go Back   Usenet Forums > Database Forums > MySQL Database

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-22-2007
nsa.usa@gmail.com
 
Posts: n/a
Default blobs or files using NDB cluster

Hi,

In develloping a *large* high-availability website that will be
storing lots of both images/video and text, what would be the best way
to go about storing the images and video?
I can see at least two options:
a) store the images as BLOBs in the database and let the server app
retrieve them and serve them to the client.
b) store the images as individual files, and put in only http links to
the files into the database, so the client browser will fetch them,
this would leave millions of files in the filesystem though
(performance?).

Both will be used in a NDB environment, and is meant to be scalable to
be storing extreme amounts of images (TBs) but will start out smaller.
I am looking mainly for high speed in serving the images and
scalability of the whole system.

How does yahoo, google, youtube, myspace etc etc do it??

A secondary question: how do you store more data than can fit in a
single node of an NDB cluster? i.e. if each node has a 1 TB drive, but
you want to store 3 TBs?

Thanks!
Regards,
Tobias

Reply With Quote
  #2 (permalink)  
Old 04-22-2007
Jerry Stuckle
 
Posts: n/a
Default Re: blobs or files using NDB cluster

nsa.usa@gmail.com wrote:
> Hi,
>
> In develloping a *large* high-availability website that will be
> storing lots of both images/video and text, what would be the best way
> to go about storing the images and video?
> I can see at least two options:
> a) store the images as BLOBs in the database and let the server app
> retrieve them and serve them to the client.
> b) store the images as individual files, and put in only http links to
> the files into the database, so the client browser will fetch them,
> this would leave millions of files in the filesystem though
> (performance?).
>
> Both will be used in a NDB environment, and is meant to be scalable to
> be storing extreme amounts of images (TBs) but will start out smaller.
> I am looking mainly for high speed in serving the images and
> scalability of the whole system.
>
> How does yahoo, google, youtube, myspace etc etc do it??
>
> A secondary question: how do you store more data than can fit in a
> single node of an NDB cluster? i.e. if each node has a 1 TB drive, but
> you want to store 3 TBs?
>
> Thanks!
> Regards,
> Tobias
>


Tobias,

Some people will tell you to never store this information in a database.
But before you believe them, I'd suggest you find out if they've even
tried.

I've been doing it for over 20 years (starting with documents using DB2
on IBM mainframes back in the 80's). It works quite well; databases are
well suited for quick retrieval of such information. Databases are
quite efficient at selecting from hundreds of thousands (or even
millions) of rows. Can you imagine having that many files in one directory?

Other advantages include ensuring data consistency (if you're using
INNODB database) - you can ensure you can't delete the blob while it's
still being referenced. You can't do that if they're stored in files.
Backup is much easier. You can move the database to a different server
without changing anything except the host name and not have to worry
about NFS, etc.

There are a lot of advantages to doing it this way. And people who
claim retrieval is too slow either haven't tried it or don't know how to
tune their system.


--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
Reply With Quote
  #3 (permalink)  
Old 04-23-2007
Tigger
 
Posts: n/a
Default Re: blobs or files using NDB cluster

On Apr 22, 10:44 pm, Jerry Stuckle <jstuck...@attglobal.net> wrote:
> nsa....@gmail.com wrote:
> > Hi,

>
> > In develloping a *large* high-availability website that will be
> > storing lots of both images/video and text, what would be the best way
> > to go about storing the images and video?
> > I can see at least two options:
> > a) store the images as BLOBs in the database and let the server app
> > retrieve them and serve them to the client.
> > b) store the images as individual files, and put in only http links to
> > the files into the database, so the client browser will fetch them,
> > this would leave millions of files in the filesystem though
> > (performance?).

>
> > Both will be used in a NDB environment, and is meant to be scalable to
> > be storing extreme amounts of images (TBs) but will start out smaller.
> > I am looking mainly for high speed in serving the images and
> > scalability of the whole system.

>
> > How does yahoo, google, youtube, myspace etc etc do it??

>
> > A secondary question: how do you store more data than can fit in a
> > single node of an NDB cluster? i.e. if each node has a 1 TB drive, but
> > you want to store 3 TBs?

>
> > Thanks!
> > Regards,
> > Tobias

>
> Tobias,
>
> Some people will tell you to never store this information in a database.
> But before you believe them, I'd suggest you find out if they've even
> tried.
>
> I've been doing it for over 20 years (starting with documents using DB2
> on IBM mainframes back in the 80's). It works quite well; databases are
> well suited for quick retrieval of such information. Databases are
> quite efficient at selecting from hundreds of thousands (or even
> millions) of rows. Can you imagine having that many files in one directory?
>
> Other advantages include ensuring data consistency (if you're using
> INNODB database) - you can ensure you can't delete the blob while it's
> still being referenced. You can't do that if they're stored in files.
> Backup is much easier. You can move the database to a different server
> without changing anything except the host name and not have to worry
> about NFS, etc.
>
> There are a lot of advantages to doing it this way. And people who
> claim retrieval is too slow either haven't tried it or don't know how to
> tune their system.
>
> --
> ==================
> Remove the "x" from my email address
> Jerry Stuckle
> JDS Computer Training Corp.
> jstuck...@attglobal.net
> ==================



MySQL has (had?) a problem that caused the blob data to be loaded into
memory even if the field was not being accessed.

My solution is to have a simple table dedicated to the blobs (primary
key and blob) and have the meta data in a seperate table with a
foreign key to its blob data.

You might also want to consider that you might need to resize/
thumbnail your images.

Tigger

Reply With Quote
  #4 (permalink)  
Old 04-23-2007
Jerry Stuckle
 
Posts: n/a
Default Re: blobs or files using NDB cluster

Tigger wrote:
> On Apr 22, 10:44 pm, Jerry Stuckle <jstuck...@attglobal.net> wrote:
>> nsa....@gmail.com wrote:
>>> Hi,
>>> In develloping a *large* high-availability website that will be
>>> storing lots of both images/video and text, what would be the best way
>>> to go about storing the images and video?
>>> I can see at least two options:
>>> a) store the images as BLOBs in the database and let the server app
>>> retrieve them and serve them to the client.
>>> b) store the images as individual files, and put in only http links to
>>> the files into the database, so the client browser will fetch them,
>>> this would leave millions of files in the filesystem though
>>> (performance?).
>>> Both will be used in a NDB environment, and is meant to be scalable to
>>> be storing extreme amounts of images (TBs) but will start out smaller.
>>> I am looking mainly for high speed in serving the images and
>>> scalability of the whole system.
>>> How does yahoo, google, youtube, myspace etc etc do it??
>>> A secondary question: how do you store more data than can fit in a
>>> single node of an NDB cluster? i.e. if each node has a 1 TB drive, but
>>> you want to store 3 TBs?
>>> Thanks!
>>> Regards,
>>> Tobias

>> Tobias,
>>
>> Some people will tell you to never store this information in a database.
>> But before you believe them, I'd suggest you find out if they've even
>> tried.
>>
>> I've been doing it for over 20 years (starting with documents using DB2
>> on IBM mainframes back in the 80's). It works quite well; databases are
>> well suited for quick retrieval of such information. Databases are
>> quite efficient at selecting from hundreds of thousands (or even
>> millions) of rows. Can you imagine having that many files in one directory?
>>
>> Other advantages include ensuring data consistency (if you're using
>> INNODB database) - you can ensure you can't delete the blob while it's
>> still being referenced. You can't do that if they're stored in files.
>> Backup is much easier. You can move the database to a different server
>> without changing anything except the host name and not have to worry
>> about NFS, etc.
>>
>> There are a lot of advantages to doing it this way. And people who
>> claim retrieval is too slow either haven't tried it or don't know how to
>> tune their system.
>>
>> --
>> ==================
>> Remove the "x" from my email address
>> Jerry Stuckle
>> JDS Computer Training Corp.
>> jstuck...@attglobal.net
>> ==================

>
>
> MySQL has (had?) a problem that caused the blob data to be loaded into
> memory even if the field was not being accessed.
>
> My solution is to have a simple table dedicated to the blobs (primary
> key and blob) and have the meta data in a seperate table with a
> foreign key to its blob data.
>
> You might also want to consider that you might need to resize/
> thumbnail your images.
>
> Tigger
>


Tigger,

Good point, and one I should have mentioned.

Yes, you are correct. But it's not just MySQL. Every RDBM I've seen
reads the entire row into memory before selecting the columns to be
returned to the application. And even if they didn't, they would have
to read data from the disk in the block size being used by the file system.

Putting the blob in a separate table significantly increases speed when
not accessing the BLOB, while slowing the system almost unnoticeably
when accessing the BLOB.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
Reply With Quote
  #5 (permalink)  
Old 04-23-2007
nsa.usa@gmail.com
 
Posts: n/a
Default Re: blobs or files using NDB cluster

On Apr 22, 3:44 pm, Jerry Stuckle <jstuck...@attglobal.net> wrote:

> Tobias,
>
> Some people will tell you to never store this information in a database.
> But before you believe them, I'd suggest you find out if they've even
> tried.
>
> I've been doing it for over 20 years (starting with documents using DB2
> on IBM mainframes back in the 80's). It works quite well; databases are
> well suited for quick retrieval of such information. Databases are
> quite efficient at selecting from hundreds of thousands (or even
> millions) of rows. Can you imagine having that many files in one directory?
>
> Other advantages include ensuring data consistency (if you're using
> INNODB database) - you can ensure you can't delete theblobwhile it's
> still being referenced. You can't do that if they're stored in files.
> Backup is much easier. You can move the database to a different server
> without changing anything except the host name and not have to worry
> about NFS, etc.
>
> There are a lot of advantages to doing it this way. And people who
> claim retrieval is too slow either haven't tried it or don't know how to
> tune their system.
>
> --
> ==================
> Remove the "x" from my email address
> Jerry Stuckle
> JDS Computer Training Corp.
> jstuck...@attglobal.net
> ==================


Hi Jerry,

Thanks for the info much appreciated. I like the idea of storing in
the DB as it just seems neater, and millions of files in a FS needs
some special considerations probably :-)
I think I will try to do some testing with storing in the db and I'll
post the results when I'm done.
Regarding putting the images in its own table I would definently do
this, I stumbled on 'vertical partitioning' and this seems to be it
and it seems to be very necessary.
Regarding my secondary question, if anybody's interrested I allready
found the answer and its: horizontal partitioning, so a table can be
split with some of the rows on a different FS.

Regards,
Tobias

Reply With Quote
  #6 (permalink)  
Old 04-23-2007
nsa.usa@gmail.com
 
Posts: n/a
Default Re: blobs or files using NDB cluster

On Apr 23, 10:15 am, Tigger <t...@grunt.tv> wrote:

>
> MySQL has (had?) a problem that caused theblobdata to be loaded into
> memory even if the field was not being accessed.
>
> My solution is to have a simple table dedicated to the blobs (primary
> key andblob) and have the meta data in a seperate table with a
> foreign key to itsblobdata.
>
> You might also want to consider that you might need to resize/
> thumbnail your images.
>
> Tigger


Hi Tigger,

Thanks. I think it still has this 'problem' but I don't think its
considered a problem as such as the way to do it properly apparently
is how you describe. See also my answer to Jerry.

What do you mean with the resizing? This makes me wonder if there is a
performance penalty if the images have different filesizes? Should I
put f.ex. all 1024x768 images in one table and all 640x480 images in a
different table and so on? would this make any difference or no
difference? hmm...

Cheers,
Tobias

Reply With Quote
  #7 (permalink)  
Old 04-24-2007
Tigger
 
Posts: n/a
Default Re: blobs or files using NDB cluster

On Apr 24, 2:42 am, nsa....@gmail.com wrote:
> On Apr 23, 10:15 am, Tigger <t...@grunt.tv> wrote:
>
>
>
> > MySQL has (had?) a problem that caused theblobdata to be loaded into
> > memory even if the field was not being accessed.

>
> > My solution is to have a simple table dedicated to the blobs (primary
> > key andblob) and have the meta data in a seperate table with a
> > foreign key to itsblobdata.

>
> > You might also want to consider that you might need to resize/
> > thumbnail your images.

>
> > Tigger

>
> Hi Tigger,
>
> Thanks. I think it still has this 'problem' but I don't think its
> considered a problem as such as the way to do it properly apparently
> is how you describe. See also my answer to Jerry.
>
> What do you mean with the resizing? This makes me wonder if there is a
> performance penalty if the images have different filesizes? Should I
> put f.ex. all 1024x768 images in one table and all 640x480 images in a
> different table and so on? would this make any difference or no
> difference? hmm...
>
> Cheers,
> Tobias


I've just implemented an image provider that uses images stored in a
database. One issue we've always had which I tried to address with
this implementation was that the client had to process and upload
multiple versions of an images to cover each use. e.g. thumbnails,
icons and other sizes. And then the designer would change the sizes on
us!!!

In this implementation I let them upload original unprocessed image
and the system creates the variations for them. I did this by creating
a data structure that allowed me to define "image styles" and to store
versions of the image related to each style. A style would define
dimensions and sizing rules.

Not sure on the use of different tables for different sizes. If you
have fixed, pre-defined sizes it might be worth doing that anyhow to
reduce the number of rows on each table.

How are you planning on feeding the images to the website?

I'm using custom 404 redirect handling (IIS & asp) which streams data
from a back end HttpHandler (.Net). This cannot be fast but I'm
restricted to the user<->asp<->.Net<->database architecture!

I could skip the asp part but its realy important that these images
have urls from the correct domain and are friendly to search engines
and administrators.

Reply With Quote
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On




All times are GMT +1. The time now is 09:59 AM.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO 3.0.0