This is a discussion on Application that generates 162 million records per day within the MySQL Database forums, part of the Database Forums category; What's your take on an application that generates a minimum of 162 million records per day? I need your ...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
Erol Fornoles wrote:
> What's your take on an application that generates a minimum of 162 > million records per day? I need your opinion with regards to this > because the app I'm developing behaves exactly like that. > Is there a particular issue with generating that much data? What is the problem you need to solve? Home use? Business use? No one can give their "take" as there is insufficient data to make even the wildest of guess as to what you are looking for. I work with databases that generate 5TB+/mth. To do some quick calculations, 162M*30=4.86B records and depending on the record size - I would say you have a pretty healthy size database. Let's say your nominal record length is 250 bytes, then you are storing ~1.22Tb /mth. Hope you have LOTS of disc space. -- Michael Austin. Database Consultant |
|
|||
|
Erol Fornoles wrote:
> What's your take on an application that generates a minimum of 162 > million records per day? I need your opinion with regards to this > because the app I'm developing behaves exactly like that. I'd recommend starting by reading this manual page: http://dev.mysql.com/doc/refman/5.0/...ert-speed.html There are several articles on the MySQL.com website that talk about scalable architectures for MySQL. http://dev.mysql.com/tech-resources/articles/ There are several case studies that talk about high-volume MySQL sites: http://www.mysql.com/why-mysql/case-studies/ Regards, Bill K. |
|
|||
|
The application is an advertising solution aimed for deployment on
computer-rental shops or cybercafes. The data is an ad transaction log which not only records whether a particular ad was clicked, but also when (timestamp) and where (lan hostname, lan ip, net ip) a particular ad was displayed. The ads are cycled every 4 seconds. Most of the time we're looking at the aggregate of the data, although the other developer is saying that we keep the individual ad logs (the one which records the time and location of each display) instead of keeping the summaries only (how many times the ad was displayed). He is contending that the data is absolutely required by the clients (advertisers) and wrote a webapp that allow the clients to login and view those individual records. Yep, all 162 million records per day. Btw, the version of MySQL we're using is 4.0.27 MAX, due to the fact that the other developer's web-development tools don't work with MySQL 5.0. On Dec 10, 12:58 am, "Peter H. Coffin" <hell...@ninehells.com> wrote: > On 9 Dec 2006 08:43:10 -0800, Erol Fornoles wrote: > > > What's your take on an application that generates a minimum of 162 > > million records per day? I need your opinion with regards to this > > because the app I'm developing behaves exactly like that.My take? How many records it is far less important than how you want to > look at the output. Are you looking at this stuff in the aggregate only? > Need to look at individual event records often? > > -- > 99. Any data file of crucial importance will be padded to 1.45Mb in size. > --Peter Anspach's list of things to do as an Evil Overlord |
|
|||
|
Sorry if my question was a bit vague.
What I'm looking for is the feasibility of the above and your opinions given the following conditions: 1. We do not have a data center of our own and our server is co-located instead. 2. Due to budgetary concerns, we only have 1 server at the moment and the disc capacity is 160 GB. 3. The version of MySQL we're using is 4.0.27 MAX, since the other developer's web-development tools don't work with MySQL 5.0. 4. Most of the time we are looking at the aggregate of the data. The aggregate function is defined as the count of each record based on whether a particular field's value is 1 (ad displayed) or 2 (ad clicked). Thanks! On Dec 10, 3:45 am, Michael Austin <maus...@firstdbasource.com> wrote: > Erol Fornoles wrote: > > What's your take on an application that generates a minimum of 162 > > million records per day? I need your opinion with regards to this > > because the app I'm developing behaves exactly like that.Is there a particular issue with generating that much data? What is the problem > you need to solve? Home use? Business use? No one can give their "take" as > there is insufficient data to make even the wildest of guess as to what you are > looking for. > > I work with databases that generate 5TB+/mth. To do some quick calculations, > 162M*30=4.86B records and depending on the record size - I would say you have a > pretty healthy size database. Let's say your nominal record length is 250 > bytes, then you are storing ~1.22Tb /mth. > > Hope you have LOTS of disc space. > > -- > Michael Austin. > Database Consultant |
|
|||
|
Erol Fornoles wrote:
> Sorry if my question was a bit vague. > > What I'm looking for is the feasibility of the above and your opinions > given the following conditions: > > 1. We do not have a data center of our own and our server is co-located > instead. > 2. Due to budgetary concerns, we only have 1 server at the moment and > the disc capacity is 160 GB. > 3. The version of MySQL we're using is 4.0.27 MAX, since the other > developer's web-development tools don't work with MySQL 5.0. > 4. Most of the time we are looking at the aggregate of the data. The > aggregate function is defined as the count of each record based on > whether a particular field's value is 1 (ad displayed) or 2 (ad > clicked). With Michael Austin's estimate of 1.22Tb/mth, your hard drive will be filled more than 8 times a month. You simply do not have enough space to store all this information. Store summary data instead. Cheers, Nicholas Sherlock -- http://www.sherlocksoftware.org |
|
|||
|
Erol Fornoles ¼g¹D¡G > What's your take on an application that generates a minimum of 162 > million records per day? I need your opinion with regards to this > because the app I'm developing behaves exactly like that. My company experience this as well, we have a lot of web log to be written into DB (more than 162 million records per day), The only way is to store the summary of data |
|
|||
|
Nicholas Sherlock wrote:
> Erol Fornoles wrote: > >> Sorry if my question was a bit vague. >> >> What I'm looking for is the feasibility of the above and your opinions >> given the following conditions: >> >> 1. We do not have a data center of our own and our server is co-located >> instead. >> 2. Due to budgetary concerns, we only have 1 server at the moment and >> the disc capacity is 160 GB. >> 3. The version of MySQL we're using is 4.0.27 MAX, since the other >> developer's web-development tools don't work with MySQL 5.0. >> 4. Most of the time we are looking at the aggregate of the data. The >> aggregate function is defined as the count of each record based on >> whether a particular field's value is 1 (ad displayed) or 2 (ad >> clicked). > > > With Michael Austin's estimate of 1.22Tb/mth, your hard drive will be > filled more than 8 times a month. You simply do not have enough space to > store all this information. Store summary data instead. > > Cheers, > Nicholas Sherlock > that can be revised down a bit since the record size appears (I am guessing based on the info provided by the OP) to be date(8bytes=64bits), host name(50??), host ip (16bbytes?)2, client ip (16bytes?), clicked (1) displayed(1) ?? Again pure speculation but it looks like: 92bytes*162M = ~14.9Gb/day or 447.2GB/mth which means you are doing a LOT of purging... after the aggregate functions complete. -- Michael Austin. Database Consultant |
|
|||
|
Michael Austin wrote:
> date(8bytes=64bits), host name(50??), host ip (16bbytes?)2, client ip > (16bytes?), clicked (1) displayed(1) ?? Again pure speculation but it > looks like: 92bytes*162M = ~14.9Gb/day or 447.2GB/mth which means you > are doing a LOT of purging... after the aggregate functions complete. I wouldn't recommend storing both client hostname and client IP if space and throughput are issues. Resolving the client hostname can be slow, and storing it is likely to be redundant if the IP is stored. Especially if aggregate data is the normal view mode, because individual hostnames probably won't be displayed anyway. If you do need to store the hostname as a string, read about the tradeoffs between char and varchar. Char is a fixed-length datatype, and if all your columns in a given table are of fixed length, access to that table can be quicker. Store IP addresses as unsigned integers, not strings. One should convert the dotted quad representation to its integer equivalent with the INET_ATON() function. You could store the date as a UNIX-style timestamp in an unsigned integer column, instead of a MySQL DATETIME datatype. That will reduce it to 4 bytes instead of 8. You can probably combine clicked and displayed into one byte-sized column by making them bitfields of a single TINYINT. There -- that reduced 92 bytes/row to 13 bytes/row! Now it requires only 2008MB/day. You should be able to purge once a month and you'll be fine. Regards, Bill K. |
|
|||
|
Thanks for the replies!
We've normalized the hostnames to another table, so only an id (integer) gets saved along with the ips (integers encoded via inet_aton). Aside from the allowable date ranges, would there be a trade-off if I used Unix-style timestamps? Is there a conversion function available that converts the Unix-style timestamp to a normal timestamp or datetime? On Dec 11, 2:56 am, Bill Karwin <b...@karwin.com> wrote: > Michael Austin wrote: > > date(8bytes=64bits), host name(50??), host ip (16bbytes?)2, client ip > > (16bytes?), clicked (1) displayed(1) ?? Again pure speculation but it > > looks like: 92bytes*162M = ~14.9Gb/day or 447.2GB/mth which means you > > are doing a LOT of purging... after the aggregate functions complete.I wouldn't recommend storing both client hostname and client IP if space > and throughput are issues. Resolving the client hostname can be slow, > and storing it is likely to be redundant if the IP is stored. > Especially if aggregate data is the normal view mode, because individual > hostnames probably won't be displayed anyway. > > If you do need to store the hostname as a string, read about the > tradeoffs between char and varchar. Char is a fixed-length datatype, > and if all your columns in a given table are of fixed length, access to > that table can be quicker. > > Store IP addresses as unsigned integers, not strings. One should > convert the dotted quad representation to its integer equivalent with > the INET_ATON() function. > > You could store the date as a UNIX-style timestamp in an unsigned > integer column, instead of a MySQL DATETIME datatype. That will reduce > it to 4 bytes instead of 8. > > You can probably combine clicked and displayed into one byte-sized > column by making them bitfields of a single TINYINT. > > There -- that reduced 92 bytes/row to 13 bytes/row! Now it requires > only 2008MB/day. You should be able to purge once a month and you'll be > fine. > > Regards, > Bill K. |