Re: How to find holes in a sequence?
On 6 Mai, 00:46, Steve Ackman <st...@SNIP-THIS.twoloonscoffee.com>
wrote:
> In <9b3a64b4-aeab-43fb-bc07-ef960a2c3...@y38g2000hsy.googlegroups.com>,
> on Mon, 5 May 2008 13:57:43 -0700 (PDT), cvh@LE,
>
>
>
> christian.han...@cpi-service.com wrote:
> > On May 5, 9:00*pm, Steve Ackman <st...@SNIP-THIS.twoloonscoffee.com>
> > wrote:
> >> * I've got a leafnode directory with usenet articles
> >> named, roughly, 1-79389.
>
> >> After running a sh script on the directory to populate a
> >> table with various headers and the full article, I have,
>
> >> mysql> select count(*) from rcm;
> >> +----------+
> >> | count(*) |
> >> +----------+
> >> | * *79303 |
> >> +----------+
> >> 1 row in set (0.00 sec)
>
> >> and an actual article count on the directory of 79347.
>
> >> * Obviously, there are 44 articles that for some
> >> reason or other, failed the INSERT (and an additional
> >> 42 that don't exist as filenames). *Given that
> >> server_id (mediumint) is assigned the filename of the
> >> article, how can I find which numbers between 1 and
> >> 79389 *don't* exist as server_id?
>
> > I'm a bit unsure whether I correctly understand what you try to
> > accomplish,
> > but given you have a table with all numbers 1..79389
>
> * I'm not sure whether you think I have such a table,
> or whether you're suggesting I create one.
>
> > you could simply
> > run a denying left join query on both tables
> > like:
>
> > select
> > f.id as missing
> > from fulltable f left join leafnodes l on f.id=l.nodeid
> > where isnull(l.nodeid)
>
> * I have one table rcm. *That's it. *I guess I can
> look up denying left joins and see where that leads me.
> Probably to understanding what you're doing there.
>
> > *but I am afraid I'm missing the point here
>
> * The source is a directory containing filenames in the
> range of 3 to 79389.
>
> * The table "rcm" has 79303 records with server_id field
> in the range of those above filenames... so some files
> failed to be INSERTED by the script.
>
> * Very basically, given a table with record id numbers
> of 2,3,4,6,7,8,9, which numbers are missing? *How do I
> get MySQL to tell me that 1 and 5 are not there?
>
> * The ultimate point is to find out which 44 scripted
> INSERT statements failed and why, to fix the script.
>
> * I could find the missing articles via a couple of
> additional steps in the shell script, but it takes
> hours each time I run it, so I was hoping MySQL would
> have some sort of easy built-in way to do it.
I am abit surprised that running a shell script over a directory
should take so long.
Using Linux tools like awk with piped output to mysql a table (given
its created) of all filenames/nodes in a directory can easily be
created by something like:
ls -l |awk '$1!~/^d/{print "INSERT INTO mytable set filename=
\""$8"\";"}' |mysql -uxxxx -pxxxx mydatabase
of course this would take longer if your putting fulltext content in
the database, too. However this little snippet should suffice to fill
a table of all files in a directory
If you only want files of a given pattern (e.g. your integer-numbered
filename) change the snippet to something like
ls -l |awk '$1!~/^d/ && $8~/^[0-9]+/{print "INSERT INTO mytable set
filename=\""$8"\";"}' |mysql -uxxxx -pxxxx mydatabase
if your filenames contain extensions like ".png" or ".msg" etc. you
also will need to change the join condition using the substring_index
function like:
from fulltable f left join leafnodes l on substring_index(f.name,",",
1)=l.nodeid
hope this helps a bit
|