Database suggestion

This is a discussion on Database suggestion within the PHP Language forums, part of the PHP Programming Forums category; Mikhail Kovalev <mikhail_kovalev@mail.ru> wrote in news:59d586fd-60f0-4a33- 8c48-247d1e719203@o6g2000hsd.googlegroups.com: > Hi. &...


Go Back   Usenet Forums > PHP Programming Forums > PHP Language

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #21 (permalink)  
Old 12-05-2007
Good Man
 
Posts: n/a
Default Re: Database suggestion

Mikhail Kovalev <mikhail_kovalev@mail.ru> wrote in news:59d586fd-60f0-4a33-
8c48-247d1e719203@o6g2000hsd.googlegroups.com:

> Hi.
>
> I work with recursive array trees of 10-20 levels in PHP. So far I
> have been using serialize() to store the arrays, generating files 5+
> MB large, which take 10-15 seconds to unserialize and about tenfold of
> RAM! I've been adviced to use MySQL (relational db's), but are there
> any other options beside that and var_export/include? Something that
> works in a similar way as MySQL when adding a new element without
> loading the whole database itself...
>
> Thanks!


Databases are truly helpful in the case.

You might find this article well worth reading:

http://www.sitepoint.com/article/hie...-data-database

Reply With Quote
  #22 (permalink)  
Old 12-05-2007
Norman Peelman
 
Posts: n/a
Default Re: Database suggestion

Mikhail Kovalev wrote:
> On 5 Des, 18:42, Norman Peelman <npeel...@cfl.rr.com> wrote:


<snipped>

>> INSERT INTO nodes (node_address, node_count) VALUES ($node_address,
>> $node_count) ON DUPLICATE KEY UPDATE node_count=node_count+1
>>
>> ...will INSERT new entries and UPDATE existing entries in one swoop.
>>
>> Norm

>
> Ok, suppose I'm joining to structures which have been created
> separately,
> from before i have ('112/225/930', 3)
> and i want to add ('112/225/930', 2), which also happens to be present
> in the second table, only with a different count,
> to make ('112/225/930', 5)
>
> From there I want to make it the general case so that when updating an
> entry with count 1 I am actually adding ('112/225/930', 1) to the
> existing one, if it exists:
>
> INSERT INTO nodes (node_address, node_count) VALUES (<new_address>,
> <new_count>) ON DUPLICATE KEY UPDATE node_count = node_count +
> <new_count>
>
> Is this correct? (Do I have to use <> in VALUES, I'm following an
> example which does it?)


No replace everything '<..>' with your own variable name, that will do it.

$new_count = 5
....ON DUPLICATE KEY UPDATE node_count = $new_count

Not sure what you mean by 'second table'. How many do you have? Though
this was one table...

Norm
Reply With Quote
  #23 (permalink)  
Old 12-05-2007
Norman Peelman
 
Posts: n/a
Default Re: Database suggestion

Mikhail Kovalev wrote:
> On 5 Des, 18:42, Norman Peelman <npeel...@cfl.rr.com> wrote:
>> Toby A Inkster wrote:
>>> Mikhail Kovalev wrote:
>>>> $array[112][COUNT] = 10;
>>>> $array[112][NEXT][225][COUNT] = 3;
>>>> $array[112][NEXT][225][NEXT][930][COUNT] = 1;
>>>> $array[112][NEXT][225][NEXT][930][NEXT] = array();
>>>> $array[112][NEXT][11][COUNT] = 5;
>>>> $array[112][NEXT][11][NEXT] = array();
>>>> An address is an array, f ex array(112, 225, 930)
>>> If $array[112][NEXT][11] exists, does that imply that
>>> $array[113][NEXT][11] cannot exist?
>>> i.e. Can 11 have only one "parent" or multiple?
>>> If it can only have one parent, then I'd suggest using the structure
>>> suggested by "C". If '11' can have more than one parent, then a classic
>>> tree structure is out of the window, so I'd suggest a simple string key,
>>> like this:
>>> CREATE TABLE nodes
>>> (
>>> node_address varchar(1024) NOT NULL PRIMARY KEY,
>>> node_count integer
>>> );
>>> To add counts to the database, you'd use:
>>> INSERT INTO nodes VALUES ('112', 10);
>>> INSERT INTO nodes VALUES ('112/225', 3);
>>> INSERT INTO nodes VALUES ('112/225/930', 1);
>>> INSERT INTO nodes VALUES ('112/11', 5);
>>> To retrieve the count for key array(112, 225, 930), you'd use:
>>> SELECT node_count
>>> FROM nodes
>>> WHERE node_address='112/225/930';
>>> To find a list of descendant nodes of '112', you'd use:
>>> SELECT node_address
>>> FROM nodes
>>> WHERE node_address LIKE '112/%';
>>> Or to find just child nodes (i.e. no grandchildren, etc):
>>> SELECT node_address
>>> FROM nodes
>>> WHERE node_address LIKE '112/%'
>>> AND NOT node_address LIKE '112/%/%';
>>> By the way -- do yourself a favour and stop naming your functions things
>>> like function_9() and function_10(), and variables like $array_3. If you
>>> have to come back to this code after 2 or 3 months away from it, you will
>>> have forgotten what they all do. Give them memorable names.

>> INSERT INTO nodes (node_address, node_count) VALUES ($node_address,
>> $node_count) ON DUPLICATE KEY UPDATE node_count=node_count+1
>>
>> ...will INSERT new entries and UPDATE existing entries in one swoop.
>>
>> Norm

>
> Ok, suppose I'm joining to structures which have been created
> separately,
> from before i have ('112/225/930', 3)
> and i want to add ('112/225/930', 2), which also happens to be present
> in the second table, only with a different count,
> to make ('112/225/930', 5)
>
> From there I want to make it the general case so that when updating an
> entry with count 1 I am actually adding ('112/225/930', 1) to the
> existing one, if it exists:
>
> INSERT INTO nodes (node_address, node_count) VALUES (<new_address>,
> <new_count>) ON DUPLICATE KEY UPDATE node_count = node_count +
> <new_count>
>
> Is this correct? (Do I have to use <> in VALUES, I'm following an
> example which does it?)


Ooops, in my other reply that should have read:

$your_variable = 2
"...ON DUPLICATE KEY UPDATE new_count = new_count + $your_variable"

....to add an arbitrary amount to new_count. 3 + 2 = 5 for your example.


Norm
Reply With Quote
  #24 (permalink)  
Old 12-05-2007
Mikhail Kovalev
 
Posts: n/a
Default Re: Database suggestion

On 5 Des, 20:46, Norman Peelman <npeel...@cfl.rr.com> wrote:
> Mikhail Kovalev wrote:
> > On 5 Des, 18:42, Norman Peelman <npeel...@cfl.rr.com> wrote:

>
> <snipped>
>
>
>
> >> INSERT INTO nodes (node_address, node_count) VALUES ($node_address,
> >> $node_count) ON DUPLICATE KEY UPDATE node_count=node_count+1

>
> >> ...will INSERT new entries and UPDATE existing entries in one swoop.

>
> >> Norm

>
> > Ok, suppose I'm joining to structures which have been created
> > separately,
> > from before i have ('112/225/930', 3)
> > and i want to add ('112/225/930', 2), which also happens to be present
> > in the second table, only with a different count,
> > to make ('112/225/930', 5)

>
> > From there I want to make it the general case so that when updating an
> > entry with count 1 I am actually adding ('112/225/930', 1) to the
> > existing one, if it exists:

>
> > INSERT INTO nodes (node_address, node_count) VALUES (<new_address>,
> > <new_count>) ON DUPLICATE KEY UPDATE node_count = node_count +
> > <new_count>

>
> > Is this correct? (Do I have to use <> in VALUES, I'm following an
> > example which does it?)

>
> No replace everything '<..>' with your own variable name, that will do it.
>
> $new_count = 5
> ...ON DUPLICATE KEY UPDATE node_count = $new_count
>
> Not sure what you mean by 'second table'. How many do you have? Though
> this was one table...
>
> Norm


I have different databases for different types of sequences. Each
sequence is some tousands elements in length and is recorded by chunks
of small series each from 1 to 20 elements long. Sometimes I join
different databases together and analyze each one separately and then
in relation to the joint database.

I have been thinking, if a sequence is for instance 5.000 elements
long, chunks 1-20, and there are no duplicates (absolutely unlikely,
but still) I will end up having a database with around 105.000
entries...
My current solution is probably better because it saves so much
space(?)

An entry like this in the current solution:
112 (310)
225 (20)
930 (6)
700 (1)
7 (1)
812 (1)

Equals this in the new system I am about to implement:

('112', 310)
('112/225', 20)
('112/225/930', 6)
('112/225/930/700', 1)
('112/225/930/700/7', 1)
('112/225/930/812', 1)

I don't know how a flat MySQL database of 100.000-200.000 entries like
these will perform. I'm now also considering a filesystem as the
database, as C has suggested.
Reply With Quote
  #25 (permalink)  
Old 12-05-2007
Norman Peelman
 
Posts: n/a
Default Re: Database suggestion

Mikhail Kovalev wrote:
> On 5 Des, 20:46, Norman Peelman <npeel...@cfl.rr.com> wrote:
>> Mikhail Kovalev wrote:
>>> On 5 Des, 18:42, Norman Peelman <npeel...@cfl.rr.com> wrote:

>> <snipped>
>>
>>
>>
>>>> INSERT INTO nodes (node_address, node_count) VALUES ($node_address,
>>>> $node_count) ON DUPLICATE KEY UPDATE node_count=node_count+1
>>>> ...will INSERT new entries and UPDATE existing entries in one swoop.
>>>> Norm
>>> Ok, suppose I'm joining to structures which have been created
>>> separately,
>>> from before i have ('112/225/930', 3)
>>> and i want to add ('112/225/930', 2), which also happens to be present
>>> in the second table, only with a different count,
>>> to make ('112/225/930', 5)
>>> From there I want to make it the general case so that when updating an
>>> entry with count 1 I am actually adding ('112/225/930', 1) to the
>>> existing one, if it exists:
>>> INSERT INTO nodes (node_address, node_count) VALUES (<new_address>,
>>> <new_count>) ON DUPLICATE KEY UPDATE node_count = node_count +
>>> <new_count>
>>> Is this correct? (Do I have to use <> in VALUES, I'm following an
>>> example which does it?)

>> No replace everything '<..>' with your own variable name, that will do it.
>>
>> $new_count = 5
>> ...ON DUPLICATE KEY UPDATE node_count = $new_count
>>
>> Not sure what you mean by 'second table'. How many do you have? Though
>> this was one table...
>>
>> Norm

>
> I have different databases for different types of sequences. Each
> sequence is some tousands elements in length and is recorded by chunks
> of small series each from 1 to 20 elements long. Sometimes I join
> different databases together and analyze each one separately and then
> in relation to the joint database.
>
> I have been thinking, if a sequence is for instance 5.000 elements
> long, chunks 1-20, and there are no duplicates (absolutely unlikely,
> but still) I will end up having a database with around 105.000
> entries...
> My current solution is probably better because it saves so much
> space(?)
>
> An entry like this in the current solution:
> 112 (310)
> 225 (20)
> 930 (6)
> 700 (1)
> 7 (1)
> 812 (1)
>
> Equals this in the new system I am about to implement:
>
> ('112', 310)
> ('112/225', 20)
> ('112/225/930', 6)
> ('112/225/930/700', 1)
> ('112/225/930/700/7', 1)
> ('112/225/930/812', 1)
>
> I don't know how a flat MySQL database of 100.000-200.000 entries like
> these will perform. I'm now also considering a filesystem as the
> database, as C has suggested.


I would imagine it would perform quite well as each one of the
addresses is a key (index) into the database. MySQL won't have to
examine all the entries that start with 112 to find where 930 or 700 or
whatever is. It knows exactly where 112/225/930/812/???/???/??? is. The
INSERT UPDATE will take less than a second on average most likely.

All this being said, you can do the same thing with your current array
in memory style using the same syntax:

$your_array['112/225/930'] = 6
$your_array['112/225/930/700'] = 1

the only difference is you still have to load it into memory (an array)
to use it. If you go with MySQL and still need an array in memory at
some point, a simple:

$dataset = array();
$result = mysql_query("SELECT * FROM nodes");
while($row = mysql_fetch_assoc($result))
{
$dataset[$row['node_address']] = $row['node_count'];
}

will get you your entire dataset into an in memory array in which you
would access just like sql:

$dataset['112'] = 310
$dataset['112/225'] = 20
$dataset['112/225/930'] = 6
$dataset['112/225/930/700'] = 1
$dataset['112/225/930/700/7'] = 1

Norm

Reply With Quote
  #26 (permalink)  
Old 12-05-2007
asdf
 
Posts: n/a
Default Re: Database suggestion


"Mikhail Kovalev" <mikhail_kovalev@mail.ru> wrote in message
news:82298e14-74e0-4d52-bc32-805b5c562ee4@d27g2000prf.googlegroups.com...
>I mean, I don't mind using MySQL in itself, it's just that I don't
> want use relational db's.
> I don't know if it is possible to predefine the structures I'm using
> in MySQL, because the number of levels varies etc.
> Here is an example recording a single sequence of 6 elements:
>
> How do I put this in a db?
>
> array (
> 0 =>
> array (
> 0 => 2,
> 1 =>
> array (
> 1 =>
> array (
> 0 => 1,
> 1 =>
> array (
> ),
> ),
> 2 =>
> array (
> 0 => 1,
> 1 =>
> array (

[snip]

Check this out: http://www.ibase.ru/devinfo/DBMSTrees/sqltrees.html

Google this:
http://www.google.com.au/search?hl=e...G=Search&meta=

Good ol' Joe Celko. Pity I lost my 'SQL for Smarties' book :((


Reply With Quote
  #27 (permalink)  
Old 12-05-2007
Rik Wasmus
 
Posts: n/a
Default Re: Database suggestion

On Wed, 05 Dec 2007 23:39:27 +0100, asdf <asdf@asdf.com> wrote:
> Good ol' Joe Celko. Pity I lost my 'SQL for Smarties' book :((


Joe Celko's Trees and Hierarchies in SQL for Smarties, a must read for
anyone serious about the issue indeed. Not the easiest book to plough
through, but very rewarding :)
--
Rik Wasmus
Reply With Quote
  #28 (permalink)  
Old 12-06-2007
Toby A Inkster
 
Posts: n/a
Default Re: Database suggestion

Mikhail Kovalev wrote:

> I have different databases for different types of sequences.


Aha. I don't think you mentioned that earlier.

CREATE TABLE data_sets
(
data_set integer NOT NULL PRIMARY KEY,
data_set_name varchar(20)
);
CREATE TABLE nodes
(
data_set integer NOT NULL
REFERENCES data_sets ON DELETE CASCADE,
node_address varchar(1024) NOT NULL,
node_count integer,
PRIMARY KEY (data_set_id, node_address)
);

An example of inserting some data:

INSERT INTO data_sets VALUES (1, 'Example Data Set 1');
INSERT INTO nodes VALUES (1, '112', 3);
INSERT INTO nodes VALUES (1, '118', 4);
INSERT INTO nodes VALUES (1, '112/11', 5);
INSERT INTO data_sets VALUES (2, 'Example Data Set 2');
INSERT INTO nodes VALUES (2, '112', 2);
INSERT INTO nodes VALUES (2, '4/115', 3);

An example of selecting data from Data Set 1:

SELECT node_count
FROM nodes
WHERE data_set=1
AND node_address='112';
(returns 3)

Some examples of selecting the sum of data from multiple sets:

SELECT sum(node_count) AS node_count
FROM nodes
WHERE data_set IN (1, 2)
AND node_address='112'
GROUP BY node_address;
(returns 5)

SELECT node_address, sum(node_count) AS node_count
FROM nodes
WHERE data_set IN (1, 2)
GROUP BY node_address;
(returns list of all nodes in set 1 or set 2, plus sum of counts
for both data sets)

Let's create a new data set 3 and populate it with the data from the
previous query. That is, data set 3 is the sum of data sets 1 and 2:

INSERT INTO data_sets VALUES (3, 'Combined Set');
INSERT INTO nodes
SELECT 3 AS data_set, node_address, sum(node_count) AS node_count
FROM nodes
WHERE data_set IN (1, 2)
GROUP BY node_address;

SQL is a really easy way of manipulating data sets.

--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 11 days, 16:07.]

Sharing Music with Apple iTunes
http://tobyinkster.co.uk/blog/2007/1...tunes-sharing/
Reply With Quote
  #29 (permalink)  
Old 12-06-2007
Mikhail Kovalev
 
Posts: n/a
Default Re: Database suggestion

On 6 Des, 10:34, Toby A Inkster <usenet200...@tobyinkster.co.uk>
wrote:
> Mikhail Kovalev wrote:
> > I have different databases for different types of sequences.

>
> Aha. I don't think you mentioned that earlier.
>
> CREATE TABLE data_sets
> (
> data_set integer NOT NULL PRIMARY KEY,
> data_set_name varchar(20)
> );
> CREATE TABLE nodes
> (
> data_set integer NOT NULL
> REFERENCES data_sets ON DELETE CASCADE,
> node_address varchar(1024) NOT NULL,
> node_count integer,
> PRIMARY KEY (data_set_id, node_address)
> );
>
> An example of inserting some data:
>
> INSERT INTO data_sets VALUES (1, 'Example Data Set 1');
> INSERT INTO nodes VALUES (1, '112', 3);
> INSERT INTO nodes VALUES (1, '118', 4);
> INSERT INTO nodes VALUES (1, '112/11', 5);
> INSERT INTO data_sets VALUES (2, 'Example Data Set 2');
> INSERT INTO nodes VALUES (2, '112', 2);
> INSERT INTO nodes VALUES (2, '4/115', 3);
>
> An example of selecting data from Data Set 1:
>
> SELECT node_count
> FROM nodes
> WHERE data_set=1
> AND node_address='112';
> (returns 3)
>
> Some examples of selecting the sum of data from multiple sets:
>
> SELECT sum(node_count) AS node_count
> FROM nodes
> WHERE data_set IN (1, 2)
> AND node_address='112'
> GROUP BY node_address;
> (returns 5)
>
> SELECT node_address, sum(node_count) AS node_count
> FROM nodes
> WHERE data_set IN (1, 2)
> GROUP BY node_address;
> (returns list of all nodes in set 1 or set 2, plus sum of counts
> for both data sets)
>
> Let's create a new data set 3 and populate it with the data from the
> previous query. That is, data set 3 is the sum of data sets 1 and 2:
>
> INSERT INTO data_sets VALUES (3, 'Combined Set');
> INSERT INTO nodes
> SELECT 3 AS data_set, node_address, sum(node_count) AS node_count
> FROM nodes
> WHERE data_set IN (1, 2)
> GROUP BY node_address;
>
> SQL is a really easy way of manipulating data sets.
>
> --
> Toby A Inkster BSc (Hons) ARCS
> [Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
> [OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 11 days, 16:07.]
>
> Sharing Music with Apple iTunes
> http://tobyinkster.co.uk/blog/2007/1...tunes-sharing/


Reply With Quote
  #30 (permalink)  
Old 12-06-2007
Mikhail Kovalev
 
Posts: n/a
Default Re: Database suggestion

On 6 Des, 10:34, Toby A Inkster <usenet200...@tobyinkster.co.uk>
wrote:
> Mikhail Kovalev wrote:
> > I have different databases for different types of sequences.

>
> Aha. I don't think you mentioned that earlier.
>
> CREATE TABLE data_sets
> (
> data_set integer NOT NULL PRIMARY KEY,
> data_set_name varchar(20)
> );
> CREATE TABLE nodes
> (
> data_set integer NOT NULL
> REFERENCES data_sets ON DELETE CASCADE,
> node_address varchar(1024) NOT NULL,
> node_count integer,
> PRIMARY KEY (data_set_id, node_address)
> );
>
> An example of inserting some data:
>
> INSERT INTO data_sets VALUES (1, 'Example Data Set 1');
> INSERT INTO nodes VALUES (1, '112', 3);
> INSERT INTO nodes VALUES (1, '118', 4);
> INSERT INTO nodes VALUES (1, '112/11', 5);
> INSERT INTO data_sets VALUES (2, 'Example Data Set 2');
> INSERT INTO nodes VALUES (2, '112', 2);
> INSERT INTO nodes VALUES (2, '4/115', 3);
>
> An example of selecting data from Data Set 1:
>
> SELECT node_count
> FROM nodes
> WHERE data_set=1
> AND node_address='112';
> (returns 3)
>
> Some examples of selecting the sum of data from multiple sets:
>
> SELECT sum(node_count) AS node_count
> FROM nodes
> WHERE data_set IN (1, 2)
> AND node_address='112'
> GROUP BY node_address;
> (returns 5)
>
> SELECT node_address, sum(node_count) AS node_count
> FROM nodes
> WHERE data_set IN (1, 2)
> GROUP BY node_address;
> (returns list of all nodes in set 1 or set 2, plus sum of counts
> for both data sets)
>
> Let's create a new data set 3 and populate it with the data from the
> previous query. That is, data set 3 is the sum of data sets 1 and 2:
>
> INSERT INTO data_sets VALUES (3, 'Combined Set');
> INSERT INTO nodes
> SELECT 3 AS data_set, node_address, sum(node_count) AS node_count
> FROM nodes
> WHERE data_set IN (1, 2)
> GROUP BY node_address;
>
> SQL is a really easy way of manipulating data sets.
>
> --
> Toby A Inkster BSc (Hons) ARCS
> [Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
> [OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 11 days, 16:07.]
>
> Sharing Music with Apple iTunes
> http://tobyinkster.co.uk/blog/2007/1...tunes-sharing/


This and the wildcard search will save me 50% of programming.
Seriously, I don't understand why I didn't looked into it earlier.

Thanks for the input!

And sorry about last empty message, sent it by mistake.

Reply With Quote
Reply
Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On




All times are GMT +1. The time now is 10:02 PM.


Powered by vBulletin® Version 3.7.3
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO 3.0.0