This is a discussion on Question about SQL table definitions within the PHP Language forums, part of the PHP Programming Forums category; I think I know the answer to this, but suggestions welcomed. I have to store lots of information. An electricity ...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
I think I know the answer to this, but suggestions welcomed.
I have to store lots of information. An electricity reading every half hour for hundreds of meters for several years. The ways I thought of storing this in SQL were: 1. One table row per day with 48 fields for the readings. 144M table, 3M index saves space and having a long skinny table however, getting the min/max reading in a day is a pain. I think I'll have to store it which goes against the grain 2. One table row per reading. 376M table, 231M index big time long skinny table with more control information than content, huge waste of disk space. However it is first normal form so queries are easier. 3. Two tables, one to hold control information and one for the readings. added complications because of the join and creation of an internal link field. A quick trial proved 1 to be cumbersome but space saving. I'm currently using option 2. any suggestions or recommenndations? |
|
|||
|
On Wed, 26 Nov 2003 14:48:58 +0000, Kevin Thorpe <kevin@pricetrak.com>
wrote: >I think I know the answer to this, but suggestions welcomed. > >I have to store lots of information. An electricity reading every half >hour for hundreds of meters for several years. > >The ways I thought of storing this in SQL were: >1. One table row per day with 48 fields for the readings. > 144M table, 3M index > saves space and having a long skinny table > however, getting the min/max reading in a day is a pain. I > think I'll have to store it which goes against the grain Ouch! >2. One table row per reading. > 376M table, 231M index > big time long skinny table with more control information than > content, huge waste of disk space. However it is first normal > form so queries are easier. >3. Two tables, one to hold control information and one for the readings. > added complications because of the join and creation of an > internal link field. It really ought to be option 3 (which is in 3NF), as you're duplicating your control info with option 2. Indexing the date and control id should help performance. Depending on what kinds of reports you want to run, you may be able to archive off old data with option 3 thereby keeping the database to a manageable size. We do that with some of our customer's history records whenever they notice the system becoming slow. If most queries will concern, say, the previous week's readings you could create a separate database for those and have a scheduled job move the records into an archive database each day/week. Are your controls grouped (say, geographically, by purpose, etc)? Will you want to combine the readings of several controls? -- David ( @priz.co.uk ) The Internet Prisoner Database: http://www.priz.co.uk/ipdb/ The Tarbrax Chronicle: http://www.tarbraxchronicle.com/ |
|
|||
|
On Wed, 26 Nov 2003 14:48:58 +0000, Kevin Thorpe <kevin@pricetrak.com> wrote:
>I think I know the answer to this, but suggestions welcomed. > >I have to store lots of information. An electricity reading every half >hour for hundreds of meters for several years. > >The ways I thought of storing this in SQL were: >1. One table row per day with 48 fields for the readings. > 144M table, 3M index > saves space and having a long skinny table > however, getting the min/max reading in a day is a pain. I > think I'll have to store it which goes against the grain >2. One table row per reading. > 376M table, 231M index > big time long skinny table with more control information than > content, huge waste of disk space. However it is first normal > form so queries are easier. >3. Two tables, one to hold control information and one for the readings. > added complications because of the join and creation of an > internal link field. > >A quick trial proved 1 to be cumbersome but space saving. I'm currently >using option 2. > >any suggestions or recommenndations? What database are you implementing this on? If it were Oracle, it would look like a prime candidate for an Index-Organised table, which would cut down your storage requirements. I don't think MySQL has an equivalent. I vaguely recall the SQL Server equivalent being a 'clustered index' or something like that. -- Andy Hassall (andy@andyh.co.uk) icq(5747695) (http://www.andyh.co.uk) Space: disk usage analysis tool (http://www.andyhsoftware.co.uk/space) |
|
|||
|
David Mackenzie wrote:
> On Wed, 26 Nov 2003 14:48:58 +0000, Kevin Thorpe <kevin@pricetrak.com> > wrote: > > >>I think I know the answer to this, but suggestions welcomed. >> >>I have to store lots of information. An electricity reading every half >>hour for hundreds of meters for several years. >> >>The ways I thought of storing this in SQL were: >>1. One table row per day with 48 fields for the readings. >> 144M table, 3M index >> saves space and having a long skinny table >> however, getting the min/max reading in a day is a pain. I >> think I'll have to store it which goes against the grain > > > Ouch! > > >>2. One table row per reading. >> 376M table, 231M index >> big time long skinny table with more control information than >> content, huge waste of disk space. However it is first normal >> form so queries are easier. >>3. Two tables, one to hold control information and one for the readings. >> added complications because of the join and creation of an >> internal link field. > > > It really ought to be option 3 (which is in 3NF), as you're > duplicating your control info with option 2. Indexing the date and > control id should help performance. That's what I thought. I started off using option 2 because my control information was fairly small (meter number, date, reading number) but I've now had to add extra dimensions to it (some readings are estimated or modelled) so a migration to two tables seems required. > Depending on what kinds of reports you want to run, you may be able to > archive off old data with option 3 thereby keeping the database to a > manageable size. We do that with some of our customer's history > records whenever they notice the system becoming slow. I try not to archive information. Disk space is cheap and properly indexed speed shouldn't be an issue. I've found empirically that the databases I've used fill up only at the rate hardware speeds up. > If most queries will concern, say, the previous week's readings you > could create a separate database for those and have a scheduled job > move the records into an archive database each day/week. > > Are your controls grouped (say, geographically, by purpose, etc)? Will > you want to combine the readings of several controls? The control fields are orthogonal so indexing is important. Lots of EXPLAIN queries I think. |