This is a discussion on "regex question" within the Linux General forums, part of the Linux Forums category.
I need to find patterns like these (e.g. with sed or perl or grep):

G1150G111
00443E104

etc. That is, 9-character words made only of letters or digits, of which at least one character is a digit. The letters can occur in random positions. What pattern would match these? Thanks!

In article <pan.2008.05.25.17.19.45@verizon.net>,
Amadeus W.M. wrote:

> I need to find patterns like these (e.g. with sed or perl or grep):
>
> G1150G111
> 00443E104
>
> etc. That is, 9-character words made only of letters or digits, of which
> at least one character is a digit. The letters can occur in random
> positions.

perl -ne '
while (/(^|[^[:alnum:]])([[:alnum:]]{9})([^[:alnum:]]|$)/g) {
  if ($2 =~ /[[:digit:]]/) {print;last;}
}' <infile >outfile

Regards,
Marcel
--
printf -v email $(echo \ 155 141 162 143 145 154 155 141 162 \
143 145 154 100 157 162 141 156 147 145 56 156 154 | tr \  \\)
# O Lord, let brains fall from heaven! #

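One caveat with the one-liner above: each match also consumes the delimiter that follows the word. When two 9-character words are separated by a single space (as in "Preferred 370442691"), the second word has no unconsumed delimiter left in front of it and is never tested. A minimal sketch of a variant that uses zero-width lookarounds instead, so that neighbouring words are each examined; the helper name has_id and the sample strings are illustrative assumptions, not part of the original post:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if the line contains a 9-character alphanumeric
# word with at least one digit. The lookarounds assert the word boundaries
# without consuming them, so adjacent words are each examined.
sub has_id {
    my ($line) = @_;
    while ($line =~ /(?<![[:alnum:]])([[:alnum:]]{9})(?![[:alnum:]])/g) {
        return 1 if $1 =~ /[[:digit:]]/;
    }
    return 0;
}

# Illustrative input lines (assumed sample data).
for my $line ('GENERAL MTRS CORP Preferred 370442691 4,602 200,000',
              'Shrs Shared-Defined 1 200,000') {
    printf "%-5s %s\n", (has_id($line) ? 'match' : 'skip'), $line;
}
```

With this sketch the first sample line is reported as a match (because of 370442691) and the second is skipped, since it contains no 9-character word.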
On Mon, 26 May 2008 00:10:08 +0200, Marcel Bruinsma wrote:
> In article <pan.2008.05.25.17.19.45@verizon.net>,
> Amadeus W.M. wrote:
>
>> I need to find patterns like these (e.g. with sed or perl or grep):
>>
>> G1150G111
>> 00443E104
>>
>> etc. That is, 9-character words made only of letters or digits, of which at
>> least one character is a digit. The letters can occur in random
>> positions.
>
> perl -ne '
> while (/(^|[^[:alnum:]])([[:alnum:]]{9})([^[:alnum:]]|$)/g) {
>   if ($2 =~ /[[:digit:]]/) {print;last;}
> }' <infile >outfile
>
>
> Regards,
> Marcel

Thanks! I'm not sure this will work for what I need, though. Perhaps my initial question was incomplete. I have a file with many lines of the form

company type companyId $amount #shares etc.

For instance:

GENERAL MTRS CORP Preferred 370442691 4,602 200,000
Shrs Shared-Defined 1 200,000

The file has many lines like this, among others. I'm trying to find the lines of this form and, within each line found, extract the companyId, $amount and #shares.

To this end, I'm searching for the pattern "companyId $amount #shares". I have something like

([[:alnum:]]{9})\s+(\$?number_pattern)\s+(number_pattern)

where number_pattern is something that matches numbers, with or without commas. With [[:alnum:]]{9} for companyId, followed by the other two patterns, in the example above I pick up

companyId = Preferred
$amount = 370442691
#shares = 4,602

which would be wrong (but the program thinks it's ok). I need to change the companyId pattern from the simple-minded [[:alnum:]]{9} to something that includes at least 1 digit, and keep the next two patterns.

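The mis-capture described above can be reproduced with a short script. This is only a sketch: number_pattern was not given in the post, so the $num expression below (digits with optional comma groups) is an assumed stand-in, and naive_fields is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed stand-in for the unspecified number_pattern:
# digits, optionally followed by comma-separated digit groups.
my $num = qr/[[:digit:]]+(?:,[[:digit:]]+)*/;

# The simple-minded pattern: any nine alphanumerics count as a companyId.
my $naive = qr/([[:alnum:]]{9})\s+(\$?$num)\s+($num)/;

# Hypothetical helper returning (companyId, amount, shares) or an empty list.
sub naive_fields {
    my ($line) = @_;
    return $line =~ $naive ? ($1, $2, $3) : ();
}

my @f = naive_fields('GENERAL MTRS CORP Preferred 370442691 4,602 200,000');
print "companyId = $f[0]\n\$amount = $f[1]\n#shares = $f[2]\n" if @f;
```

Because "Preferred" happens to be nine letters long and is followed by a number, the leftmost match starts there, and every field is shifted one column to the left.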
On Mon, 26 May 2008 00:10:08 +0200, Marcel Bruinsma wrote:
> In article <pan.2008.05.25.17.19.45@verizon.net>,
> Amadeus W.M. wrote:
>
>> I need to find patterns like these (e.g. with sed or perl or grep):
>>
>> G1150G111
>> 00443E104
>>
>> etc. That is, 9-character words made only of letters or digits, of which at
>> least one character is a digit. The letters can occur in random
>> positions.
>
> perl -ne '
> while (/(^|[^[:alnum:]])([[:alnum:]]{9})([^[:alnum:]]|$)/g) {
>   if ($2 =~ /[[:digit:]]/) {print;last;}
> }' <infile >outfile
>
>
> Regards,
> Marcel

I guess I need something like

([[:digit:]][[:alnum:]]{8})|([[:alnum:]]{1}[[:digit:]][[:alnum:]]{7})| etc.

That is, keep moving the [[:digit:]] over each of the 9 possible positions. Is there a smarter way to write this?

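In regex engines that support lookahead (Perl itself, or GNU grep with -P where that option is available), one possible shortcut is to assert "a digit occurs somewhere in this word" before consuming the nine characters, instead of enumerating the nine positions. A sketch under stated assumptions: $num stands in for the unspecified number_pattern, and extract_fields is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed stand-in for the unspecified number_pattern:
# digits, optionally followed by comma-separated digit groups.
my $num = qr/[[:digit:]]+(?:,[[:digit:]]+)*/;

# (?=[[:alpha:]]*[[:digit:]]) looks ahead for a digit after zero or more
# letters; [[:alnum:]]{9} then consumes the word. The lookbehind keeps the
# match from starting in the middle of a longer word.
my $re = qr/(?<![[:alnum:]])((?=[[:alpha:]]*[[:digit:]])[[:alnum:]]{9})\s+(\$?$num)\s+($num)/;

# Hypothetical helper returning (companyId, amount, shares) or an empty list.
sub extract_fields {
    my ($line) = @_;
    return $line =~ $re ? ($1, $2, $3) : ();
}

my @f = extract_fields('GENERAL MTRS CORP Preferred 370442691 4,602 200,000');
print join('|', @f), "\n" if @f;   # prints 370442691|4,602|200,000
```

The lookahead cannot see past the word: the nine characters must be followed by whitespace, so a digit found after a run of letters necessarily lies inside the word itself.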
In article <pan.2008.05.26.02.15.07@verizon.net>,
Amadeus W.M. wrote:

>>> I need to find patterns like these (e.g. with sed or perl or grep):
>>>
>>> G1150G111
>>> 00443E104
>>>
>> perl -ne '
>> while (/(^|[^[:alnum:]])([[:alnum:]]{9})([^[:alnum:]]|$)/g) {
>>   if ($2 =~ /[[:digit:]]/) {print;last;}
>> }' <infile >outfile
>
> Thanks! I'm not sure this will work for what I need, though. Perhaps my
> initial question was incomplete. I have a file with many lines of the
> form
>
> company type companyId $amount #shares etc.
>
> For instance:
>
> GENERAL MTRS CORP Preferred 370442691 4,602 200,000
> Shrs Shared-Defined 1 200,000
>
>
> The file has many lines like this, among others. I'm trying to find
> the lines of this form and, within each line found, extract the
> companyId, $amount and #shares.
>
> To this end, I'm searching for the pattern "companyId $amount
> #shares". I have something like
>
> ([[:alnum:]]{9})\s+(\$?number_pattern)\s+(number_pattern)
>
> where number_pattern is something that matches numbers, with or
> without commas. With [[:alnum:]]{9} for companyId, followed by the
> other two patterns, in the example above I pick up
>
> companyId = Preferred
> $amount = 370442691
> #shares = 4,602
>
> which would be wrong (but the program thinks it's ok). I need to
> change the companyId pattern from the simple-minded [[:alnum:]]{9} to
> something that includes at least 1 digit, and keep the next two patterns.

The '$2 =~ /[[:digit:]]/' is the check for 'at least one digit', but a single expression is also possible, just more complicated:

#!/usr/bin/perl
use strict;
use warnings;

# companyId: nine alphanumerics with a digit in at least one of the nine
# positions, followed by two comma-grouped numbers ($amount and #shares).
my $p = '[[:blank:]]([[:digit:]][[:alnum:]]{8}|'
  .'[[:alnum:]][[:digit:]][[:alnum:]]{7}|'
  .'[[:alnum:]]{2}[[:digit:]][[:alnum:]]{6}|'
  .'[[:alnum:]]{3}[[:digit:]][[:alnum:]]{5}|'
  .'[[:alnum:]]{4}[[:digit:]][[:alnum:]]{4}|'
  .'[[:alnum:]]{5}[[:digit:]][[:alnum:]]{3}|'
  .'[[:alnum:]]{6}[[:digit:]][[:alnum:]]{2}|'
  .'[[:alnum:]]{7}[[:digit:]][[:alnum:]]|'
  .'[[:alnum:]]{8}[[:digit:]])'
  .'[[:blank:]]+([[:digit:]]+,[[:digit:]]+)'
  .'[[:blank:]]+([[:digit:]]+,[[:digit:]]+)';

while (<DATA>) {
  if (/$p/o) {
    my $companyID = $1;
    my $amount = $2;
    my $shares = $3;
    print "|$companyID|$amount|$shares|\n";
  }
}
__END__
GENERAL MTRS CORP Preferred 370442691 4,602 200,000
Shrs Shared-Defined 1 200,000

If the companyIDs contain no lowercase letters, you should replace every '[:alnum:]' with '[:upper:][:digit:]'.

Regards,
Marcel
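Since the nine alternatives differ only in where the digit sits, the same expression can also be generated in a loop rather than typed out. A sketch of that idea; the variable names and the helper parse_line are illustrative, and the number pattern is the single-comma form used in the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One alternative per digit position: $_ alphanumerics, one digit,
# then 8 - $_ alphanumerics, for $_ = 0 .. 8.
my $id = join '|',
    map { "[[:alnum:]]{$_}[[:digit:]][[:alnum:]]{" . (8 - $_) . "}" } 0 .. 8;

# Same number form as in the script above: digits, one comma, digits.
my $num = '[[:digit:]]+,[[:digit:]]+';
my $p   = qr/[[:blank:]]($id)[[:blank:]]+($num)[[:blank:]]+($num)/;

# Hypothetical helper returning (companyId, amount, shares) or an empty list.
sub parse_line {
    my ($line) = @_;
    return $line =~ $p ? ($1, $2, $3) : ();
}

my @f = parse_line('GENERAL MTRS CORP Preferred 370442691 4,602 200,000');
print "|$f[0]|$f[1]|$f[2]|\n" if @f;   # prints |370442691|4,602|200,000|
```

The generated alternation matches the same strings as the hand-written one; the {0} quantifier in the first and last alternatives simply matches nothing.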