This is a discussion on How to find holes in a sequence? within the MySQL Database forums, part of the Database Forums category; I've got a leafnode directory with usenet articles named, roughly, 1-79389. After running a sh script on the ...
|
|||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
|
|||
|
I've got a leafnode directory with usenet articles
named, roughly, 1-79389. After running a sh script on the directory to populate a table with various headers and the full article, I have, mysql> select count(*) from rcm; +----------+ | count(*) | +----------+ | 79303 | +----------+ 1 row in set (0.00 sec) and an actual article count on the directory of 79347. Obviously, there are 44 articles that for some reason or other, failed the INSERT (and an additional 42 that don't exist as filenames). Given that server_id (mediumint) is assigned the filename of the article, how can I find which numbers between 1 and 79389 *don't* exist as server_id? |
|
|||
|
On May 5, 9:00*pm, Steve Ackman <st...@SNIP-THIS.twoloonscoffee.com>
wrote: > * I've got a leafnode directory with usenet articles > named, roughly, 1-79389. > > After running a sh script on the directory to populate a > table with various headers and the full article, I have, > > mysql> select count(*) from rcm; > +----------+ > | count(*) | > +----------+ > | * *79303 | > +----------+ > 1 row in set (0.00 sec) > > and an actual article count on the directory of 79347. > > * Obviously, there are 44 articles that for some > reason or other, failed the INSERT (and an additional > 42 that don't exist as filenames). *Given that > server_id (mediumint) is assigned the filename of the > article, how can I find which numbers between 1 and > 79389 *don't* exist as server_id? I'm a bit unsure whether I correctly understand what you try to accomplish, but given you have a table with all numbers 1..79389 you could simply run a denying left join query on both tables like: select f.id as missing from fulltable f left join leafnodes l on f.id=l.nodeid where isnull(l.nodeid) but I am afraid I'm missing the point here |
|
|||
|
In <9b3a64b4-aeab-43fb-bc07-ef960a2c34b0@y38g2000hsy.googlegroups.com>,
on Mon, 5 May 2008 13:57:43 -0700 (PDT), cvh@LE, christian.hansel@cpi-service.com wrote: > On May 5, 9:00*pm, Steve Ackman <st...@SNIP-THIS.twoloonscoffee.com> > wrote: >> * I've got a leafnode directory with usenet articles >> named, roughly, 1-79389. >> >> After running a sh script on the directory to populate a >> table with various headers and the full article, I have, >> >> mysql> select count(*) from rcm; >> +----------+ >> | count(*) | >> +----------+ >> | * *79303 | >> +----------+ >> 1 row in set (0.00 sec) >> >> and an actual article count on the directory of 79347. >> >> * Obviously, there are 44 articles that for some >> reason or other, failed the INSERT (and an additional >> 42 that don't exist as filenames). *Given that >> server_id (mediumint) is assigned the filename of the >> article, how can I find which numbers between 1 and >> 79389 *don't* exist as server_id? > > I'm a bit unsure whether I correctly understand what you try to > accomplish, > but given you have a table with all numbers 1..79389 I'm not sure whether you think I have such a table, or whether you're suggesting I create one. > you could simply > run a denying left join query on both tables > like: > > select > f.id as missing > from fulltable f left join leafnodes l on f.id=l.nodeid > where isnull(l.nodeid) I have one table rcm. That's it. I guess I can look up denying left joins and see where that leads me. Probably to understanding what you're doing there. > but I am afraid I'm missing the point here The source is a directory containing filenames in the range of 3 to 79389. The table "rcm" has 79303 records with server_id field in the range of those above filenames... so some files failed to be INSERTED by the script. Very basically, given a table with record id numbers of 2,3,4,6,7,8,9, which numbers are missing? How do I get MySQL to tell me that 1 and 5 are not there? The ultimate point is to find out which 44 scripted INSERT statements failed and why, to fix the script. I could find the missing articles via a couple of additional steps in the shell script, but it takes hours each time I run it, so I was hoping MySQL would have some sort of easy built-in way to do it. |
|
|||
|
On Mon, 5 May 2008 18:46:19 -0400, Steve Ackman wrote:
> In <9b3a64b4-aeab-43fb-bc07-ef960a2c34b0@y38g2000hsy.googlegroups.com>, > on Mon, 5 May 2008 13:57:43 -0700 (PDT), cvh@LE, > christian.hansel@cpi-service.com wrote: >> On May 5, 9:00*pm, Steve Ackman <st...@SNIP-THIS.twoloonscoffee.com> >> wrote: >>> * I've got a leafnode directory with usenet articles >>> named, roughly, 1-79389. >>> >>> After running a sh script on the directory to populate a >>> table with various headers and the full article, I have, >>> >>> mysql> select count(*) from rcm; >>> +----------+ >>> | count(*) | >>> +----------+ >>> | * *79303 | >>> +----------+ >>> 1 row in set (0.00 sec) >>> >>> and an actual article count on the directory of 79347. >>> >>> * Obviously, there are 44 articles that for some >>> reason or other, failed the INSERT (and an additional >>> 42 that don't exist as filenames). *Given that >>> server_id (mediumint) is assigned the filename of the >>> article, how can I find which numbers between 1 and >>> 79389 *don't* exist as server_id? >> >> I'm a bit unsure whether I correctly understand what you try to >> accomplish, >> but given you have a table with all numbers 1..79389 > > I'm not sure whether you think I have such a table, > or whether you're suggesting I create one. Create one. ome web searching will turn up a variety of tricks for creating a table filled with test data. -- 43. I will maintain a healthy amount of skepticism when I capture the beautiful rebel and she claims she is attracted to my power and good looks and will gladly betray her companions if I just let her in on my plans. --Peter Anspach's list of things to do as an Evil Overlord |
|
|||
|
On 6 Mai, 00:46, Steve Ackman <st...@SNIP-THIS.twoloonscoffee.com>
wrote: > In <9b3a64b4-aeab-43fb-bc07-ef960a2c3...@y38g2000hsy.googlegroups.com>, > on Mon, 5 May 2008 13:57:43 -0700 (PDT), cvh@LE, > > > > christian.han...@cpi-service.com wrote: > > On May 5, 9:00*pm, Steve Ackman <st...@SNIP-THIS.twoloonscoffee.com> > > wrote: > >> * I've got a leafnode directory with usenet articles > >> named, roughly, 1-79389. > > >> After running a sh script on the directory to populate a > >> table with various headers and the full article, I have, > > >> mysql> select count(*) from rcm; > >> +----------+ > >> | count(*) | > >> +----------+ > >> | * *79303 | > >> +----------+ > >> 1 row in set (0.00 sec) > > >> and an actual article count on the directory of 79347. > > >> * Obviously, there are 44 articles that for some > >> reason or other, failed the INSERT (and an additional > >> 42 that don't exist as filenames). *Given that > >> server_id (mediumint) is assigned the filename of the > >> article, how can I find which numbers between 1 and > >> 79389 *don't* exist as server_id? > > > I'm a bit unsure whether I correctly understand what you try to > > accomplish, > > but given you have a table with all numbers 1..79389 > > * I'm not sure whether you think I have such a table, > or whether you're suggesting I create one. > > > you could simply > > run a denying left join query on both tables > > like: > > > select > > f.id as missing > > from fulltable f left join leafnodes l on f.id=l.nodeid > > where isnull(l.nodeid) > > * I have one table rcm. *That's it. *I guess I can > look up denying left joins and see where that leads me. > Probably to understanding what you're doing there. > > > *but I am afraid I'm missing the point here > > * The source is a directory containing filenames in the > range of 3 to 79389. > > * The table "rcm" has 79303 records with server_id field > in the range of those above filenames... so some files > failed to be INSERTED by the script. > > * Very basically, given a table with record id numbers > of 2,3,4,6,7,8,9, which numbers are missing? *How do I > get MySQL to tell me that 1 and 5 are not there? > > * The ultimate point is to find out which 44 scripted > INSERT statements failed and why, to fix the script. > > * I could find the missing articles via a couple of > additional steps in the shell script, but it takes > hours each time I run it, so I was hoping MySQL would > have some sort of easy built-in way to do it. I am abit surprised that running a shell script over a directory should take so long. Using Linux tools like awk with piped output to mysql a table (given its created) of all filenames/nodes in a directory can easily be created by something like: ls -l |awk '$1!~/^d/{print "INSERT INTO mytable set filename= \""$8"\";"}' |mysql -uxxxx -pxxxx mydatabase of course this would take longer if your putting fulltext content in the database, too. However this little snippet should suffice to fill a table of all files in a directory If you only want files of a given pattern (e.g. your integer-numbered filename) change the snippet to something like ls -l |awk '$1!~/^d/ && $8~/^[0-9]+/{print "INSERT INTO mytable set filename=\""$8"\";"}' |mysql -uxxxx -pxxxx mydatabase if your filenames contain extensions like ".png" or ".msg" etc. you also will need to change the join condition using the substring_index function like: from fulltable f left join leafnodes l on substring_index(f.name,",", 1)=l.nodeid hope this helps a bit |
|
|||
|
In <41e20971-4fa0-41eb-9eff-b2bcddb1d6a5@d45g2000hsc.googlegroups.com>,
on Tue, 6 May 2008 00:08:15 -0700 (PDT), cvh@LE, christian.hansel@cpi-service.com wrote: > I am abit surprised that running a shell script over a directory > should take so long. It takes ~34 minutes on this machine just to do this: #! /bin/sh i=3 while [ $i -le 79389 ] do echo $i mysql news <<MARKER INSERT into count (id) VALUES ($i); MARKER i=$((i+1)) done Now that it's made, I can look at your earlier post re: left join deny to see if I can actually make any sense of it. ;-) > Using Linux tools like awk with piped output to mysql a table (given > its created) of all filenames/nodes in a directory can easily be > created by something like: > > ls -l |awk '$1!~/^d/{print "INSERT INTO mytable set filename= > \""$8"\";"}' |mysql -uxxxx -pxxxx mydatabase I'm breaking out headers (with awk/sed) so I can do: mysql news <<MARKER # DELETE FROM rcm WHERE server_id = $i; INSERT into rcm (server_id,Message_ID,Sender,Subject,Date,User_Age nt,NNTP_Posting_Host,X_Trace,Content_Type,Xref,Art icle) VALUES ($i,"$Message_ID","$Sender",'$Subject','$Date','$U ser_Agent','$NNTP_Posting_Host','$X_Trace','$Conte nt_Type','$Xref',"$Article"); MARKER > of course this would take longer if your putting fulltext content in > the database, too. I haven't decided about fulltext yet... or whether to include Lines and Bytes. >However this little snippet should suffice to fill > a table of all files in a directory I don't see that simply filling a table with files would be particularly useful (would it?), though I suppose searching keywords would be quicker than using grep. |
|
|||
|
Steve Ackman wrote:
> In <41e20971-4fa0-41eb-9eff-b2bcddb1d6a5@d45g2000hsc.googlegroups.com>, > on Tue, 6 May 2008 00:08:15 -0700 (PDT), cvh@LE, > christian.hansel@cpi-service.com wrote: > >> I am abit surprised that running a shell script over a directory >> should take so long. > > It takes ~34 minutes on this machine just to do this: > > #! /bin/sh > > i=3 > while [ $i -le 79389 ] > do > echo $i > mysql news <<MARKER > INSERT into count > (id) > VALUES > ($i); > MARKER > i=$((i+1)) > done which uses one DB connection and one mysql process for each row. try this: ( i=3 while [ $i -le 79389 ] do echo "INSERT into count (id) VALUES ($i);" i=$((i+1)) done ) | mysql news regards, helmut |
![]() |
| Thread Tools | |
| Display Modes | |
|
|