This is a discussion on Database suggestion within the PHP Language forums, part of the PHP Programming Forums category; Mikhail Kovalev <mikhail_kovalev@mail.ru> wrote in news:59d586fd-60f0-4a33- 8c48-247d1e719203@o6g2000hsd.googlegroups.com: > Hi. &...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
Mikhail Kovalev <mikhail_kovalev@mail.ru> wrote in news:59d586fd-60f0-4a33-
8c48-247d1e719203@o6g2000hsd.googlegroups.com: > Hi. > > I work with recursive array trees of 10-20 levels in PHP. So far I > have been using serialize() to store the arrays, generating files 5+ > MB large, which take 10-15 seconds to unserialize and about tenfold of > RAM! I've been adviced to use MySQL (relational db's), but are there > any other options beside that and var_export/include? Something that > works in a similar way as MySQL when adding a new element without > loading the whole database itself... > > Thanks! Databases are truly helpful in the case. You might find this article well worth reading: http://www.sitepoint.com/article/hie...-data-database |
|
|||
|
Mikhail Kovalev wrote:
> On 5 Des, 18:42, Norman Peelman <npeel...@cfl.rr.com> wrote: <snipped> >> INSERT INTO nodes (node_address, node_count) VALUES ($node_address, >> $node_count) ON DUPLICATE KEY UPDATE node_count=node_count+1 >> >> ...will INSERT new entries and UPDATE existing entries in one swoop. >> >> Norm > > Ok, suppose I'm joining to structures which have been created > separately, > from before i have ('112/225/930', 3) > and i want to add ('112/225/930', 2), which also happens to be present > in the second table, only with a different count, > to make ('112/225/930', 5) > > From there I want to make it the general case so that when updating an > entry with count 1 I am actually adding ('112/225/930', 1) to the > existing one, if it exists: > > INSERT INTO nodes (node_address, node_count) VALUES (<new_address>, > <new_count>) ON DUPLICATE KEY UPDATE node_count = node_count + > <new_count> > > Is this correct? (Do I have to use <> in VALUES, I'm following an > example which does it?) No replace everything '<..>' with your own variable name, that will do it. $new_count = 5 ....ON DUPLICATE KEY UPDATE node_count = $new_count Not sure what you mean by 'second table'. How many do you have? Though this was one table... Norm |
|
|||
|
Mikhail Kovalev wrote:
> On 5 Des, 18:42, Norman Peelman <npeel...@cfl.rr.com> wrote: >> Toby A Inkster wrote: >>> Mikhail Kovalev wrote: >>>> $array[112][COUNT] = 10; >>>> $array[112][NEXT][225][COUNT] = 3; >>>> $array[112][NEXT][225][NEXT][930][COUNT] = 1; >>>> $array[112][NEXT][225][NEXT][930][NEXT] = array(); >>>> $array[112][NEXT][11][COUNT] = 5; >>>> $array[112][NEXT][11][NEXT] = array(); >>>> An address is an array, f ex array(112, 225, 930) >>> If $array[112][NEXT][11] exists, does that imply that >>> $array[113][NEXT][11] cannot exist? >>> i.e. Can 11 have only one "parent" or multiple? >>> If it can only have one parent, then I'd suggest using the structure >>> suggested by "C". If '11' can have more than one parent, then a classic >>> tree structure is out of the window, so I'd suggest a simple string key, >>> like this: >>> CREATE TABLE nodes >>> ( >>> node_address varchar(1024) NOT NULL PRIMARY KEY, >>> node_count integer >>> ); >>> To add counts to the database, you'd use: >>> INSERT INTO nodes VALUES ('112', 10); >>> INSERT INTO nodes VALUES ('112/225', 3); >>> INSERT INTO nodes VALUES ('112/225/930', 1); >>> INSERT INTO nodes VALUES ('112/11', 5); >>> To retrieve the count for key array(112, 225, 930), you'd use: >>> SELECT node_count >>> FROM nodes >>> WHERE node_address='112/225/930'; >>> To find a list of descendant nodes of '112', you'd use: >>> SELECT node_address >>> FROM nodes >>> WHERE node_address LIKE '112/%'; >>> Or to find just child nodes (i.e. no grandchildren, etc): >>> SELECT node_address >>> FROM nodes >>> WHERE node_address LIKE '112/%' >>> AND NOT node_address LIKE '112/%/%'; >>> By the way -- do yourself a favour and stop naming your functions things >>> like function_9() and function_10(), and variables like $array_3. If you >>> have to come back to this code after 2 or 3 months away from it, you will >>> have forgotten what they all do. Give them memorable names. >> INSERT INTO nodes (node_address, node_count) VALUES ($node_address, >> $node_count) ON DUPLICATE KEY UPDATE node_count=node_count+1 >> >> ...will INSERT new entries and UPDATE existing entries in one swoop. >> >> Norm > > Ok, suppose I'm joining to structures which have been created > separately, > from before i have ('112/225/930', 3) > and i want to add ('112/225/930', 2), which also happens to be present > in the second table, only with a different count, > to make ('112/225/930', 5) > > From there I want to make it the general case so that when updating an > entry with count 1 I am actually adding ('112/225/930', 1) to the > existing one, if it exists: > > INSERT INTO nodes (node_address, node_count) VALUES (<new_address>, > <new_count>) ON DUPLICATE KEY UPDATE node_count = node_count + > <new_count> > > Is this correct? (Do I have to use <> in VALUES, I'm following an > example which does it?) Ooops, in my other reply that should have read: $your_variable = 2 "...ON DUPLICATE KEY UPDATE new_count = new_count + $your_variable" ....to add an arbitrary amount to new_count. 3 + 2 = 5 for your example. Norm |
|
|||
|
On 5 Des, 20:46, Norman Peelman <npeel...@cfl.rr.com> wrote:
> Mikhail Kovalev wrote: > > On 5 Des, 18:42, Norman Peelman <npeel...@cfl.rr.com> wrote: > > <snipped> > > > > >> INSERT INTO nodes (node_address, node_count) VALUES ($node_address, > >> $node_count) ON DUPLICATE KEY UPDATE node_count=node_count+1 > > >> ...will INSERT new entries and UPDATE existing entries in one swoop. > > >> Norm > > > Ok, suppose I'm joining to structures which have been created > > separately, > > from before i have ('112/225/930', 3) > > and i want to add ('112/225/930', 2), which also happens to be present > > in the second table, only with a different count, > > to make ('112/225/930', 5) > > > From there I want to make it the general case so that when updating an > > entry with count 1 I am actually adding ('112/225/930', 1) to the > > existing one, if it exists: > > > INSERT INTO nodes (node_address, node_count) VALUES (<new_address>, > > <new_count>) ON DUPLICATE KEY UPDATE node_count = node_count + > > <new_count> > > > Is this correct? (Do I have to use <> in VALUES, I'm following an > > example which does it?) > > No replace everything '<..>' with your own variable name, that will do it. > > $new_count = 5 > ...ON DUPLICATE KEY UPDATE node_count = $new_count > > Not sure what you mean by 'second table'. How many do you have? Though > this was one table... > > Norm I have different databases for different types of sequences. Each sequence is some tousands elements in length and is recorded by chunks of small series each from 1 to 20 elements long. Sometimes I join different databases together and analyze each one separately and then in relation to the joint database. I have been thinking, if a sequence is for instance 5.000 elements long, chunks 1-20, and there are no duplicates (absolutely unlikely, but still) I will end up having a database with around 105.000 entries... My current solution is probably better because it saves so much space(?) An entry like this in the current solution: 112 (310) 225 (20) 930 (6) 700 (1) 7 (1) 812 (1) Equals this in the new system I am about to implement: ('112', 310) ('112/225', 20) ('112/225/930', 6) ('112/225/930/700', 1) ('112/225/930/700/7', 1) ('112/225/930/812', 1) I don't know how a flat MySQL database of 100.000-200.000 entries like these will perform. I'm now also considering a filesystem as the database, as C has suggested. |
|
|||
|
Mikhail Kovalev wrote:
> On 5 Des, 20:46, Norman Peelman <npeel...@cfl.rr.com> wrote: >> Mikhail Kovalev wrote: >>> On 5 Des, 18:42, Norman Peelman <npeel...@cfl.rr.com> wrote: >> <snipped> >> >> >> >>>> INSERT INTO nodes (node_address, node_count) VALUES ($node_address, >>>> $node_count) ON DUPLICATE KEY UPDATE node_count=node_count+1 >>>> ...will INSERT new entries and UPDATE existing entries in one swoop. >>>> Norm >>> Ok, suppose I'm joining to structures which have been created >>> separately, >>> from before i have ('112/225/930', 3) >>> and i want to add ('112/225/930', 2), which also happens to be present >>> in the second table, only with a different count, >>> to make ('112/225/930', 5) >>> From there I want to make it the general case so that when updating an >>> entry with count 1 I am actually adding ('112/225/930', 1) to the >>> existing one, if it exists: >>> INSERT INTO nodes (node_address, node_count) VALUES (<new_address>, >>> <new_count>) ON DUPLICATE KEY UPDATE node_count = node_count + >>> <new_count> >>> Is this correct? (Do I have to use <> in VALUES, I'm following an >>> example which does it?) >> No replace everything '<..>' with your own variable name, that will do it. >> >> $new_count = 5 >> ...ON DUPLICATE KEY UPDATE node_count = $new_count >> >> Not sure what you mean by 'second table'. How many do you have? Though >> this was one table... >> >> Norm > > I have different databases for different types of sequences. Each > sequence is some tousands elements in length and is recorded by chunks > of small series each from 1 to 20 elements long. Sometimes I join > different databases together and analyze each one separately and then > in relation to the joint database. > > I have been thinking, if a sequence is for instance 5.000 elements > long, chunks 1-20, and there are no duplicates (absolutely unlikely, > but still) I will end up having a database with around 105.000 > entries... > My current solution is probably better because it saves so much > space(?) > > An entry like this in the current solution: > 112 (310) > 225 (20) > 930 (6) > 700 (1) > 7 (1) > 812 (1) > > Equals this in the new system I am about to implement: > > ('112', 310) > ('112/225', 20) > ('112/225/930', 6) > ('112/225/930/700', 1) > ('112/225/930/700/7', 1) > ('112/225/930/812', 1) > > I don't know how a flat MySQL database of 100.000-200.000 entries like > these will perform. I'm now also considering a filesystem as the > database, as C has suggested. I would imagine it would perform quite well as each one of the addresses is a key (index) into the database. MySQL won't have to examine all the entries that start with 112 to find where 930 or 700 or whatever is. It knows exactly where 112/225/930/812/???/???/??? is. The INSERT UPDATE will take less than a second on average most likely. All this being said, you can do the same thing with your current array in memory style using the same syntax: $your_array['112/225/930'] = 6 $your_array['112/225/930/700'] = 1 the only difference is you still have to load it into memory (an array) to use it. If you go with MySQL and still need an array in memory at some point, a simple: $dataset = array(); $result = mysql_query("SELECT * FROM nodes"); while($row = mysql_fetch_assoc($result)) { $dataset[$row['node_address']] = $row['node_count']; } will get you your entire dataset into an in memory array in which you would access just like sql: $dataset['112'] = 310 $dataset['112/225'] = 20 $dataset['112/225/930'] = 6 $dataset['112/225/930/700'] = 1 $dataset['112/225/930/700/7'] = 1 Norm |
|
|||
|
"Mikhail Kovalev" <mikhail_kovalev@mail.ru> wrote in message news:82298e14-74e0-4d52-bc32-805b5c562ee4@d27g2000prf.googlegroups.com... >I mean, I don't mind using MySQL in itself, it's just that I don't > want use relational db's. > I don't know if it is possible to predefine the structures I'm using > in MySQL, because the number of levels varies etc. > Here is an example recording a single sequence of 6 elements: > > How do I put this in a db? > > array ( > 0 => > array ( > 0 => 2, > 1 => > array ( > 1 => > array ( > 0 => 1, > 1 => > array ( > ), > ), > 2 => > array ( > 0 => 1, > 1 => > array ( [snip] Check this out: http://www.ibase.ru/devinfo/DBMSTrees/sqltrees.html Google this: http://www.google.com.au/search?hl=e...G=Search&meta= Good ol' Joe Celko. Pity I lost my 'SQL for Smarties' book :(( |
|
|||
|
On Wed, 05 Dec 2007 23:39:27 +0100, asdf <asdf@asdf.com> wrote:
> Good ol' Joe Celko. Pity I lost my 'SQL for Smarties' book :(( Joe Celko's Trees and Hierarchies in SQL for Smarties, a must read for anyone serious about the issue indeed. Not the easiest book to plough through, but very rewarding :) -- Rik Wasmus |
|
|||
|
Mikhail Kovalev wrote:
> I have different databases for different types of sequences. Aha. I don't think you mentioned that earlier. CREATE TABLE data_sets ( data_set integer NOT NULL PRIMARY KEY, data_set_name varchar(20) ); CREATE TABLE nodes ( data_set integer NOT NULL REFERENCES data_sets ON DELETE CASCADE, node_address varchar(1024) NOT NULL, node_count integer, PRIMARY KEY (data_set_id, node_address) ); An example of inserting some data: INSERT INTO data_sets VALUES (1, 'Example Data Set 1'); INSERT INTO nodes VALUES (1, '112', 3); INSERT INTO nodes VALUES (1, '118', 4); INSERT INTO nodes VALUES (1, '112/11', 5); INSERT INTO data_sets VALUES (2, 'Example Data Set 2'); INSERT INTO nodes VALUES (2, '112', 2); INSERT INTO nodes VALUES (2, '4/115', 3); An example of selecting data from Data Set 1: SELECT node_count FROM nodes WHERE data_set=1 AND node_address='112'; (returns 3) Some examples of selecting the sum of data from multiple sets: SELECT sum(node_count) AS node_count FROM nodes WHERE data_set IN (1, 2) AND node_address='112' GROUP BY node_address; (returns 5) SELECT node_address, sum(node_count) AS node_count FROM nodes WHERE data_set IN (1, 2) GROUP BY node_address; (returns list of all nodes in set 1 or set 2, plus sum of counts for both data sets) Let's create a new data set 3 and populate it with the data from the previous query. That is, data set 3 is the sum of data sets 1 and 2: INSERT INTO data_sets VALUES (3, 'Combined Set'); INSERT INTO nodes SELECT 3 AS data_set, node_address, sum(node_count) AS node_count FROM nodes WHERE data_set IN (1, 2) GROUP BY node_address; SQL is a really easy way of manipulating data sets. -- Toby A Inkster BSc (Hons) ARCS [Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux] [OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 11 days, 16:07.] Sharing Music with Apple iTunes http://tobyinkster.co.uk/blog/2007/1...tunes-sharing/ |
|
|||
|
On 6 Des, 10:34, Toby A Inkster <usenet200...@tobyinkster.co.uk>
wrote: > Mikhail Kovalev wrote: > > I have different databases for different types of sequences. > > Aha. I don't think you mentioned that earlier. > > CREATE TABLE data_sets > ( > data_set integer NOT NULL PRIMARY KEY, > data_set_name varchar(20) > ); > CREATE TABLE nodes > ( > data_set integer NOT NULL > REFERENCES data_sets ON DELETE CASCADE, > node_address varchar(1024) NOT NULL, > node_count integer, > PRIMARY KEY (data_set_id, node_address) > ); > > An example of inserting some data: > > INSERT INTO data_sets VALUES (1, 'Example Data Set 1'); > INSERT INTO nodes VALUES (1, '112', 3); > INSERT INTO nodes VALUES (1, '118', 4); > INSERT INTO nodes VALUES (1, '112/11', 5); > INSERT INTO data_sets VALUES (2, 'Example Data Set 2'); > INSERT INTO nodes VALUES (2, '112', 2); > INSERT INTO nodes VALUES (2, '4/115', 3); > > An example of selecting data from Data Set 1: > > SELECT node_count > FROM nodes > WHERE data_set=1 > AND node_address='112'; > (returns 3) > > Some examples of selecting the sum of data from multiple sets: > > SELECT sum(node_count) AS node_count > FROM nodes > WHERE data_set IN (1, 2) > AND node_address='112' > GROUP BY node_address; > (returns 5) > > SELECT node_address, sum(node_count) AS node_count > FROM nodes > WHERE data_set IN (1, 2) > GROUP BY node_address; > (returns list of all nodes in set 1 or set 2, plus sum of counts > for both data sets) > > Let's create a new data set 3 and populate it with the data from the > previous query. That is, data set 3 is the sum of data sets 1 and 2: > > INSERT INTO data_sets VALUES (3, 'Combined Set'); > INSERT INTO nodes > SELECT 3 AS data_set, node_address, sum(node_count) AS node_count > FROM nodes > WHERE data_set IN (1, 2) > GROUP BY node_address; > > SQL is a really easy way of manipulating data sets. > > -- > Toby A Inkster BSc (Hons) ARCS > [Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux] > [OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 11 days, 16:07.] > > Sharing Music with Apple iTunes > http://tobyinkster.co.uk/blog/2007/1...tunes-sharing/ |
|
|||
|
On 6 Des, 10:34, Toby A Inkster <usenet200...@tobyinkster.co.uk>
wrote: > Mikhail Kovalev wrote: > > I have different databases for different types of sequences. > > Aha. I don't think you mentioned that earlier. > > CREATE TABLE data_sets > ( > data_set integer NOT NULL PRIMARY KEY, > data_set_name varchar(20) > ); > CREATE TABLE nodes > ( > data_set integer NOT NULL > REFERENCES data_sets ON DELETE CASCADE, > node_address varchar(1024) NOT NULL, > node_count integer, > PRIMARY KEY (data_set_id, node_address) > ); > > An example of inserting some data: > > INSERT INTO data_sets VALUES (1, 'Example Data Set 1'); > INSERT INTO nodes VALUES (1, '112', 3); > INSERT INTO nodes VALUES (1, '118', 4); > INSERT INTO nodes VALUES (1, '112/11', 5); > INSERT INTO data_sets VALUES (2, 'Example Data Set 2'); > INSERT INTO nodes VALUES (2, '112', 2); > INSERT INTO nodes VALUES (2, '4/115', 3); > > An example of selecting data from Data Set 1: > > SELECT node_count > FROM nodes > WHERE data_set=1 > AND node_address='112'; > (returns 3) > > Some examples of selecting the sum of data from multiple sets: > > SELECT sum(node_count) AS node_count > FROM nodes > WHERE data_set IN (1, 2) > AND node_address='112' > GROUP BY node_address; > (returns 5) > > SELECT node_address, sum(node_count) AS node_count > FROM nodes > WHERE data_set IN (1, 2) > GROUP BY node_address; > (returns list of all nodes in set 1 or set 2, plus sum of counts > for both data sets) > > Let's create a new data set 3 and populate it with the data from the > previous query. That is, data set 3 is the sum of data sets 1 and 2: > > INSERT INTO data_sets VALUES (3, 'Combined Set'); > INSERT INTO nodes > SELECT 3 AS data_set, node_address, sum(node_count) AS node_count > FROM nodes > WHERE data_set IN (1, 2) > GROUP BY node_address; > > SQL is a really easy way of manipulating data sets. > > -- > Toby A Inkster BSc (Hons) ARCS > [Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux] > [OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 11 days, 16:07.] > > Sharing Music with Apple iTunes > http://tobyinkster.co.uk/blog/2007/1...tunes-sharing/ This and the wildcard search will save me 50% of programming. Seriously, I don't understand why I didn't looked into it earlier. Thanks for the input! And sorry about last empty message, sent it by mistake. |