How to find holes in a sequence?

This is a discussion on How to find holes in a sequence? within the MySQL Database forums, part of the Database Forums category; I've got a leafnode directory with usenet articles named, roughly, 1-79389. After running a sh script on the ...


Go Back   Usenet Forums > Database Forums > MySQL Database

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 1 Week Ago
Steve Ackman
 
Posts: n/a
Default How to find holes in a sequence?

I've got a leafnode directory with usenet articles
named, roughly, 1-79389.

After running a sh script on the directory to populate a
table with various headers and the full article, I have,

mysql> select count(*) from rcm;
+----------+
| count(*) |
+----------+
| 79303 |
+----------+
1 row in set (0.00 sec)

and an actual article count on the directory of 79347.

Obviously, there are 44 articles that for some
reason or other, failed the INSERT (and an additional
42 that don't exist as filenames). Given that
server_id (mediumint) is assigned the filename of the
article, how can I find which numbers between 1 and
79389 *don't* exist as server_id?
Reply With Quote
  #2 (permalink)  
Old 1 Week Ago
cvh@LE
 
Posts: n/a
Default Re: How to find holes in a sequence?

On May 5, 9:00*pm, Steve Ackman <st...@SNIP-THIS.twoloonscoffee.com>
wrote:
> * I've got a leafnode directory with usenet articles
> named, roughly, 1-79389.
>
> After running a sh script on the directory to populate a
> table with various headers and the full article, I have,
>
> mysql> select count(*) from rcm;
> +----------+
> | count(*) |
> +----------+
> | * *79303 |
> +----------+
> 1 row in set (0.00 sec)
>
> and an actual article count on the directory of 79347.
>
> * Obviously, there are 44 articles that for some
> reason or other, failed the INSERT (and an additional
> 42 that don't exist as filenames). *Given that
> server_id (mediumint) is assigned the filename of the
> article, how can I find which numbers between 1 and
> 79389 *don't* exist as server_id?


I'm a bit unsure whether I correctly understand what you try to
accomplish,
but given you have a table with all numbers 1..79389 you could simply
run a denying left join query on both tables
like:

select
f.id as missing
from fulltable f left join leafnodes l on f.id=l.nodeid
where isnull(l.nodeid)

but I am afraid I'm missing the point here

Reply With Quote
  #3 (permalink)  
Old 1 Week Ago
Steve Ackman
 
Posts: n/a
Default Re: How to find holes in a sequence?

In <9b3a64b4-aeab-43fb-bc07-ef960a2c34b0@y38g2000hsy.googlegroups.com>,
on Mon, 5 May 2008 13:57:43 -0700 (PDT), cvh@LE,
christian.hansel@cpi-service.com wrote:
> On May 5, 9:00*pm, Steve Ackman <st...@SNIP-THIS.twoloonscoffee.com>
> wrote:
>> * I've got a leafnode directory with usenet articles
>> named, roughly, 1-79389.
>>
>> After running a sh script on the directory to populate a
>> table with various headers and the full article, I have,
>>
>> mysql> select count(*) from rcm;
>> +----------+
>> | count(*) |
>> +----------+
>> | * *79303 |
>> +----------+
>> 1 row in set (0.00 sec)
>>
>> and an actual article count on the directory of 79347.
>>
>> * Obviously, there are 44 articles that for some
>> reason or other, failed the INSERT (and an additional
>> 42 that don't exist as filenames). *Given that
>> server_id (mediumint) is assigned the filename of the
>> article, how can I find which numbers between 1 and
>> 79389 *don't* exist as server_id?

>
> I'm a bit unsure whether I correctly understand what you try to
> accomplish,
> but given you have a table with all numbers 1..79389


I'm not sure whether you think I have such a table,
or whether you're suggesting I create one.

> you could simply
> run a denying left join query on both tables
> like:
>
> select
> f.id as missing
> from fulltable f left join leafnodes l on f.id=l.nodeid
> where isnull(l.nodeid)


I have one table rcm. That's it. I guess I can
look up denying left joins and see where that leads me.
Probably to understanding what you're doing there.

> but I am afraid I'm missing the point here


The source is a directory containing filenames in the
range of 3 to 79389.

The table "rcm" has 79303 records with server_id field
in the range of those above filenames... so some files
failed to be INSERTED by the script.

Very basically, given a table with record id numbers
of 2,3,4,6,7,8,9, which numbers are missing? How do I
get MySQL to tell me that 1 and 5 are not there?

The ultimate point is to find out which 44 scripted
INSERT statements failed and why, to fix the script.

I could find the missing articles via a couple of
additional steps in the shell script, but it takes
hours each time I run it, so I was hoping MySQL would
have some sort of easy built-in way to do it.
Reply With Quote
  #4 (permalink)  
Old 1 Week Ago
Peter H. Coffin
 
Posts: n/a
Default Re: How to find holes in a sequence?

On Mon, 5 May 2008 18:46:19 -0400, Steve Ackman wrote:
> In <9b3a64b4-aeab-43fb-bc07-ef960a2c34b0@y38g2000hsy.googlegroups.com>,
> on Mon, 5 May 2008 13:57:43 -0700 (PDT), cvh@LE,
> christian.hansel@cpi-service.com wrote:
>> On May 5, 9:00*pm, Steve Ackman <st...@SNIP-THIS.twoloonscoffee.com>
>> wrote:
>>> * I've got a leafnode directory with usenet articles
>>> named, roughly, 1-79389.
>>>
>>> After running a sh script on the directory to populate a
>>> table with various headers and the full article, I have,
>>>
>>> mysql> select count(*) from rcm;
>>> +----------+
>>> | count(*) |
>>> +----------+
>>> | * *79303 |
>>> +----------+
>>> 1 row in set (0.00 sec)
>>>
>>> and an actual article count on the directory of 79347.
>>>
>>> * Obviously, there are 44 articles that for some
>>> reason or other, failed the INSERT (and an additional
>>> 42 that don't exist as filenames). *Given that
>>> server_id (mediumint) is assigned the filename of the
>>> article, how can I find which numbers between 1 and
>>> 79389 *don't* exist as server_id?

>>
>> I'm a bit unsure whether I correctly understand what you try to
>> accomplish,
>> but given you have a table with all numbers 1..79389

>
> I'm not sure whether you think I have such a table,
> or whether you're suggesting I create one.


Create one.

ome web searching will turn up a variety of tricks for creating a table
filled with test data.

--
43. I will maintain a healthy amount of skepticism when I capture the beautiful
rebel and she claims she is attracted to my power and good looks and will
gladly betray her companions if I just let her in on my plans.
--Peter Anspach's list of things to do as an Evil Overlord
Reply With Quote
  #5 (permalink)  
Old 1 Week Ago
cvh@LE
 
Posts: n/a
Default Re: How to find holes in a sequence?

On 6 Mai, 00:46, Steve Ackman <st...@SNIP-THIS.twoloonscoffee.com>
wrote:
> In <9b3a64b4-aeab-43fb-bc07-ef960a2c3...@y38g2000hsy.googlegroups.com>,
> on Mon, 5 May 2008 13:57:43 -0700 (PDT), cvh@LE,
>
>
>
> christian.han...@cpi-service.com wrote:
> > On May 5, 9:00*pm, Steve Ackman <st...@SNIP-THIS.twoloonscoffee.com>
> > wrote:
> >> * I've got a leafnode directory with usenet articles
> >> named, roughly, 1-79389.

>
> >> After running a sh script on the directory to populate a
> >> table with various headers and the full article, I have,

>
> >> mysql> select count(*) from rcm;
> >> +----------+
> >> | count(*) |
> >> +----------+
> >> | * *79303 |
> >> +----------+
> >> 1 row in set (0.00 sec)

>
> >> and an actual article count on the directory of 79347.

>
> >> * Obviously, there are 44 articles that for some
> >> reason or other, failed the INSERT (and an additional
> >> 42 that don't exist as filenames). *Given that
> >> server_id (mediumint) is assigned the filename of the
> >> article, how can I find which numbers between 1 and
> >> 79389 *don't* exist as server_id?

>
> > I'm a bit unsure whether I correctly understand what you try to
> > accomplish,
> > but given you have a table with all numbers 1..79389

>
> * I'm not sure whether you think I have such a table,
> or whether you're suggesting I create one.
>
> > you could simply
> > run a denying left join query on both tables
> > like:

>
> > select
> > f.id as missing
> > from fulltable f left join leafnodes l on f.id=l.nodeid
> > where isnull(l.nodeid)

>
> * I have one table rcm. *That's it. *I guess I can
> look up denying left joins and see where that leads me.
> Probably to understanding what you're doing there.
>
> > *but I am afraid I'm missing the point here

>
> * The source is a directory containing filenames in the
> range of 3 to 79389.
>
> * The table "rcm" has 79303 records with server_id field
> in the range of those above filenames... so some files
> failed to be INSERTED by the script.
>
> * Very basically, given a table with record id numbers
> of 2,3,4,6,7,8,9, which numbers are missing? *How do I
> get MySQL to tell me that 1 and 5 are not there?
>
> * The ultimate point is to find out which 44 scripted
> INSERT statements failed and why, to fix the script.
>
> * I could find the missing articles via a couple of
> additional steps in the shell script, but it takes
> hours each time I run it, so I was hoping MySQL would
> have some sort of easy built-in way to do it.


I am abit surprised that running a shell script over a directory
should take so long.

Using Linux tools like awk with piped output to mysql a table (given
its created) of all filenames/nodes in a directory can easily be
created by something like:

ls -l |awk '$1!~/^d/{print "INSERT INTO mytable set filename=
\""$8"\";"}' |mysql -uxxxx -pxxxx mydatabase

of course this would take longer if your putting fulltext content in
the database, too. However this little snippet should suffice to fill
a table of all files in a directory

If you only want files of a given pattern (e.g. your integer-numbered
filename) change the snippet to something like

ls -l |awk '$1!~/^d/ && $8~/^[0-9]+/{print "INSERT INTO mytable set
filename=\""$8"\";"}' |mysql -uxxxx -pxxxx mydatabase

if your filenames contain extensions like ".png" or ".msg" etc. you
also will need to change the join condition using the substring_index
function like:

from fulltable f left join leafnodes l on substring_index(f.name,",",
1)=l.nodeid

hope this helps a bit

Reply With Quote
  #6 (permalink)  
Old 6 Days Ago
Steve Ackman
 
Posts: n/a
Default Re: How to find holes in a sequence?

In <41e20971-4fa0-41eb-9eff-b2bcddb1d6a5@d45g2000hsc.googlegroups.com>,
on Tue, 6 May 2008 00:08:15 -0700 (PDT), cvh@LE,
christian.hansel@cpi-service.com wrote:

> I am abit surprised that running a shell script over a directory
> should take so long.


It takes ~34 minutes on this machine just to do this:

#! /bin/sh

i=3
while [ $i -le 79389 ]
do
echo $i
mysql news <<MARKER
INSERT into count
(id)
VALUES
($i);
MARKER
i=$((i+1))
done

Now that it's made, I can look at your earlier post
re: left join deny to see if I can actually make any
sense of it. ;-)

> Using Linux tools like awk with piped output to mysql a table (given
> its created) of all filenames/nodes in a directory can easily be
> created by something like:
>
> ls -l |awk '$1!~/^d/{print "INSERT INTO mytable set filename=
> \""$8"\";"}' |mysql -uxxxx -pxxxx mydatabase


I'm breaking out headers (with awk/sed) so I can do:

mysql news <<MARKER
# DELETE FROM rcm WHERE server_id = $i;
INSERT into rcm
(server_id,Message_ID,Sender,Subject,Date,User_Age nt,NNTP_Posting_Host,X_Trace,Content_Type,Xref,Art icle)
VALUES
($i,"$Message_ID","$Sender",'$Subject','$Date','$U ser_Agent','$NNTP_Posting_Host','$X_Trace','$Conte nt_Type','$Xref',"$Article");
MARKER

> of course this would take longer if your putting fulltext content in
> the database, too.


I haven't decided about fulltext yet... or whether
to include Lines and Bytes.

>However this little snippet should suffice to fill
> a table of all files in a directory


I don't see that simply filling a table with files
would be particularly useful (would it?), though I
suppose searching keywords would be quicker than using
grep.

Reply With Quote
  #7 (permalink)  
Old 6 Days Ago
Helmut Wais
 
Posts: n/a
Default Re: How to find holes in a sequence?

Steve Ackman wrote:
> In <41e20971-4fa0-41eb-9eff-b2bcddb1d6a5@d45g2000hsc.googlegroups.com>,
> on Tue, 6 May 2008 00:08:15 -0700 (PDT), cvh@LE,
> christian.hansel@cpi-service.com wrote:
>
>> I am abit surprised that running a shell script over a directory
>> should take so long.

>
> It takes ~34 minutes on this machine just to do this:
>
> #! /bin/sh
>
> i=3
> while [ $i -le 79389 ]
> do
> echo $i
> mysql news <<MARKER
> INSERT into count
> (id)
> VALUES
> ($i);
> MARKER
> i=$((i+1))
> done


which uses one DB connection and one mysql process for each row.

try this:

( i=3
while [ $i -le 79389 ]
do
echo "INSERT into count (id) VALUES ($i);"
i=$((i+1))
done
) | mysql news

regards, helmut
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are Off
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On



All times are GMT +1. The time now is 06:21 PM.


Powered by vBulletin® Version 3.6.8
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO 3.0.0