Application that generates 162 million records per day

This is a discussion on Application that generates 162 million records per day within the MySQL Database forums, part of the Database Forums category; What's your take on an application that generates a minimum of 162 million records per day? I need your ...


Go Back   Usenet Forums > Database Forums > MySQL Database

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 12-09-2006
Erol Fornoles
 
Posts: n/a
Default Application that generates 162 million records per day

What's your take on an application that generates a minimum of 162
million records per day? I need your opinion with regards to this
because the app I'm developing behaves exactly like that.

Reply With Quote
  #2 (permalink)  
Old 12-09-2006
Michael Austin
 
Posts: n/a
Default Re: Application that generates 162 million records per day

Erol Fornoles wrote:

> What's your take on an application that generates a minimum of 162
> million records per day? I need your opinion with regards to this
> because the app I'm developing behaves exactly like that.
>


Is there a particular issue with generating that much data? What is the problem
you need to solve? Home use? Business use? No one can give their "take" as
there is insufficient data to make even the wildest of guess as to what you are
looking for.

I work with databases that generate 5TB+/mth. To do some quick calculations,
162M*30=4.86B records and depending on the record size - I would say you have a
pretty healthy size database. Let's say your nominal record length is 250
bytes, then you are storing ~1.22Tb /mth.

Hope you have LOTS of disc space.

--
Michael Austin.
Database Consultant
Reply With Quote
  #3 (permalink)  
Old 12-09-2006
Bill Karwin
 
Posts: n/a
Default Re: Application that generates 162 million records per day

Erol Fornoles wrote:
> What's your take on an application that generates a minimum of 162
> million records per day? I need your opinion with regards to this
> because the app I'm developing behaves exactly like that.


I'd recommend starting by reading this manual page:
http://dev.mysql.com/doc/refman/5.0/...ert-speed.html

There are several articles on the MySQL.com website that talk about
scalable architectures for MySQL.
http://dev.mysql.com/tech-resources/articles/

There are several case studies that talk about high-volume MySQL sites:
http://www.mysql.com/why-mysql/case-studies/

Regards,
Bill K.
Reply With Quote
  #4 (permalink)  
Old 12-10-2006
Erol Fornoles
 
Posts: n/a
Default Re: Application that generates 162 million records per day

The application is an advertising solution aimed for deployment on
computer-rental shops or cybercafes. The data is an ad transaction log
which not only records whether a particular ad was clicked, but also
when (timestamp) and where (lan hostname, lan ip, net ip) a particular
ad was displayed. The ads are cycled every 4 seconds.

Most of the time we're looking at the aggregate of the data, although
the other developer is saying that we keep the individual ad logs (the
one which records the time and location of each display) instead of
keeping the summaries only (how many times the ad was displayed). He is
contending that the data is absolutely required by the clients
(advertisers) and wrote a webapp that allow the clients to login and
view those individual records. Yep, all 162 million records per day.

Btw, the version of MySQL we're using is 4.0.27 MAX, due to the fact
that the other developer's web-development tools don't work with MySQL
5.0.

On Dec 10, 12:58 am, "Peter H. Coffin" <hell...@ninehells.com> wrote:
> On 9 Dec 2006 08:43:10 -0800, Erol Fornoles wrote:
>
> > What's your take on an application that generates a minimum of 162
> > million records per day? I need your opinion with regards to this
> > because the app I'm developing behaves exactly like that.My take? How many records it is far less important than how you want to

> look at the output. Are you looking at this stuff in the aggregate only?
> Need to look at individual event records often?
>
> --
> 99. Any data file of crucial importance will be padded to 1.45Mb in size.
> --Peter Anspach's list of things to do as an Evil Overlord


Reply With Quote
  #5 (permalink)  
Old 12-10-2006
Erol Fornoles
 
Posts: n/a
Default Re: Application that generates 162 million records per day

Sorry if my question was a bit vague.

What I'm looking for is the feasibility of the above and your opinions
given the following conditions:

1. We do not have a data center of our own and our server is co-located
instead.
2. Due to budgetary concerns, we only have 1 server at the moment and
the disc capacity is 160 GB.
3. The version of MySQL we're using is 4.0.27 MAX, since the other
developer's web-development tools don't work with MySQL 5.0.
4. Most of the time we are looking at the aggregate of the data. The
aggregate function is defined as the count of each record based on
whether a particular field's value is 1 (ad displayed) or 2 (ad
clicked).

Thanks!

On Dec 10, 3:45 am, Michael Austin <maus...@firstdbasource.com> wrote:
> Erol Fornoles wrote:
> > What's your take on an application that generates a minimum of 162
> > million records per day? I need your opinion with regards to this
> > because the app I'm developing behaves exactly like that.Is there a particular issue with generating that much data? What is the problem

> you need to solve? Home use? Business use? No one can give their "take" as
> there is insufficient data to make even the wildest of guess as to what you are
> looking for.
>
> I work with databases that generate 5TB+/mth. To do some quick calculations,
> 162M*30=4.86B records and depending on the record size - I would say you have a
> pretty healthy size database. Let's say your nominal record length is 250
> bytes, then you are storing ~1.22Tb /mth.
>
> Hope you have LOTS of disc space.
>
> --
> Michael Austin.
> Database Consultant


Reply With Quote
  #6 (permalink)  
Old 12-10-2006
Nicholas Sherlock
 
Posts: n/a
Default Re: Application that generates 162 million records per day

Erol Fornoles wrote:
> Sorry if my question was a bit vague.
>
> What I'm looking for is the feasibility of the above and your opinions
> given the following conditions:
>
> 1. We do not have a data center of our own and our server is co-located
> instead.
> 2. Due to budgetary concerns, we only have 1 server at the moment and
> the disc capacity is 160 GB.
> 3. The version of MySQL we're using is 4.0.27 MAX, since the other
> developer's web-development tools don't work with MySQL 5.0.
> 4. Most of the time we are looking at the aggregate of the data. The
> aggregate function is defined as the count of each record based on
> whether a particular field's value is 1 (ad displayed) or 2 (ad
> clicked).


With Michael Austin's estimate of 1.22Tb/mth, your hard drive will be
filled more than 8 times a month. You simply do not have enough space to
store all this information. Store summary data instead.

Cheers,
Nicholas Sherlock

--
http://www.sherlocksoftware.org
Reply With Quote
  #7 (permalink)  
Old 12-10-2006
howa
 
Posts: n/a
Default Re: Application that generates 162 million records per day


Erol Fornoles ¼g¹D¡G

> What's your take on an application that generates a minimum of 162
> million records per day? I need your opinion with regards to this
> because the app I'm developing behaves exactly like that.


My company experience this as well, we have a lot of web log to be
written into DB (more than 162 million records per day), The only way
is to store the summary of data

Reply With Quote
  #8 (permalink)  
Old 12-10-2006
Michael Austin
 
Posts: n/a
Default Re: Application that generates 162 million records per day

Nicholas Sherlock wrote:

> Erol Fornoles wrote:
>
>> Sorry if my question was a bit vague.
>>
>> What I'm looking for is the feasibility of the above and your opinions
>> given the following conditions:
>>
>> 1. We do not have a data center of our own and our server is co-located
>> instead.
>> 2. Due to budgetary concerns, we only have 1 server at the moment and
>> the disc capacity is 160 GB.
>> 3. The version of MySQL we're using is 4.0.27 MAX, since the other
>> developer's web-development tools don't work with MySQL 5.0.
>> 4. Most of the time we are looking at the aggregate of the data. The
>> aggregate function is defined as the count of each record based on
>> whether a particular field's value is 1 (ad displayed) or 2 (ad
>> clicked).

>
>
> With Michael Austin's estimate of 1.22Tb/mth, your hard drive will be
> filled more than 8 times a month. You simply do not have enough space to
> store all this information. Store summary data instead.
>
> Cheers,
> Nicholas Sherlock
>

that can be revised down a bit since the record size appears (I am guessing
based on the info provided by the OP) to be date(8bytes=64bits), host
name(50??), host ip (16bbytes?)2, client ip (16bytes?), clicked (1) displayed(1)
?? Again pure speculation but it looks like: 92bytes*162M = ~14.9Gb/day or
447.2GB/mth which means you are doing a LOT of purging... after the aggregate
functions complete.

--
Michael Austin.
Database Consultant
Reply With Quote
  #9 (permalink)  
Old 12-10-2006
Bill Karwin
 
Posts: n/a
Default Re: Application that generates 162 million records per day

Michael Austin wrote:
> date(8bytes=64bits), host name(50??), host ip (16bbytes?)2, client ip
> (16bytes?), clicked (1) displayed(1) ?? Again pure speculation but it
> looks like: 92bytes*162M = ~14.9Gb/day or 447.2GB/mth which means you
> are doing a LOT of purging... after the aggregate functions complete.



I wouldn't recommend storing both client hostname and client IP if space
and throughput are issues. Resolving the client hostname can be slow,
and storing it is likely to be redundant if the IP is stored.
Especially if aggregate data is the normal view mode, because individual
hostnames probably won't be displayed anyway.

If you do need to store the hostname as a string, read about the
tradeoffs between char and varchar. Char is a fixed-length datatype,
and if all your columns in a given table are of fixed length, access to
that table can be quicker.

Store IP addresses as unsigned integers, not strings. One should
convert the dotted quad representation to its integer equivalent with
the INET_ATON() function.

You could store the date as a UNIX-style timestamp in an unsigned
integer column, instead of a MySQL DATETIME datatype. That will reduce
it to 4 bytes instead of 8.

You can probably combine clicked and displayed into one byte-sized
column by making them bitfields of a single TINYINT.

There -- that reduced 92 bytes/row to 13 bytes/row! Now it requires
only 2008MB/day. You should be able to purge once a month and you'll be
fine.

Regards,
Bill K.
Reply With Quote
  #10 (permalink)  
Old 12-11-2006
Erol Fornoles
 
Posts: n/a
Default Re: Application that generates 162 million records per day

Thanks for the replies!

We've normalized the hostnames to another table, so only an id
(integer) gets saved along with the ips (integers encoded via
inet_aton).

Aside from the allowable date ranges, would there be a trade-off if I
used Unix-style timestamps? Is there a conversion function available
that converts the Unix-style timestamp to a normal timestamp or
datetime?

On Dec 11, 2:56 am, Bill Karwin <b...@karwin.com> wrote:
> Michael Austin wrote:
> > date(8bytes=64bits), host name(50??), host ip (16bbytes?)2, client ip
> > (16bytes?), clicked (1) displayed(1) ?? Again pure speculation but it
> > looks like: 92bytes*162M = ~14.9Gb/day or 447.2GB/mth which means you
> > are doing a LOT of purging... after the aggregate functions complete.I wouldn't recommend storing both client hostname and client IP if space

> and throughput are issues. Resolving the client hostname can be slow,
> and storing it is likely to be redundant if the IP is stored.
> Especially if aggregate data is the normal view mode, because individual
> hostnames probably won't be displayed anyway.
>
> If you do need to store the hostname as a string, read about the
> tradeoffs between char and varchar. Char is a fixed-length datatype,
> and if all your columns in a given table are of fixed length, access to
> that table can be quicker.
>
> Store IP addresses as unsigned integers, not strings. One should
> convert the dotted quad representation to its integer equivalent with
> the INET_ATON() function.
>
> You could store the date as a UNIX-style timestamp in an unsigned
> integer column, instead of a MySQL DATETIME datatype. That will reduce
> it to 4 bytes instead of 8.
>
> You can probably combine clicked and displayed into one byte-sized
> column by making them bitfields of a single TINYINT.
>
> There -- that reduced 92 bytes/row to 13 bytes/row! Now it requires
> only 2008MB/day. You should be able to purge once a month and you'll be
> fine.
>
> Regards,
> Bill K.


Reply With Quote
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On



All times are GMT +1. The time now is 09:15 AM.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO 3.0.0