preg_match_all optional subpattern

10-06-2003
Han

Using preg_match_all, I need to capture a list of first and last names plus
an optional country code preceding them.

For example:

<tr><td>AU</td><td>Jane Smith</td></tr>
<tr><td></td><td>Bill Johnson</td></tr>
<tr><td>GB</td><td>Larry Brown</td></tr>
<tr><td>US</td><td>Mary Jordon</td></tr>
<tr><td></td><td>Peter Jones</td></tr>

The country code may or may not be present.

I would like the array contents to look like this:

AU Jane Smith
Bill Johnson
GB Larry Brown
US Mary Jordon
Peter Jones

I know a subpattern is needed of all the possible country codes:

AU|GB|US

but how do you include this as an optional subpattern?

Thanks in advance.






10-06-2003
John Dunlop

Han wrote:

> Using preg_match_all, I need to capture a list of first and last names plus
> an optional country code preceding them.
>
> For example:
>
> <tr><td>AU</td><td>Jane Smith</td></tr>
> <tr><td></td><td>Bill Johnson</td></tr>
> <tr><td>GB</td><td>Larry Brown</td></tr>
> <tr><td>US</td><td>Mary Jordon</td></tr>
> <tr><td></td><td>Peter Jones</td></tr>
>
> [...] I know a subpattern is needed of all the possible country codes:
>
> AU|GB|US
>
> but how do you include this as an optional subpattern?


The ? quantifier means zero or one of whatever came before, and is
equivalent to {0,1}. Putting a question mark after a subpattern
makes that subpattern optional.

So, to match optional two-letter country codes within a table cell
(doesn't properly cater for attributes, but that's rectifiable):

`<td.*>([a-z]{2})?</td.*>`Usi

If you wish to list the possible values, precluding others:

`<td.*>(au|gb|us)?</td.*>`Usi
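
As a rough sketch of how the optional group behaves with preg_match_all on the sample rows: the combined row pattern, the [^>]* tightening for attributes, and the variable names below are assumptions made for illustration, not part of the patterns above.

```php
<?php
// Sample rows from the original question.
$html = <<<HTML
<tr><td>AU</td><td>Jane Smith</td></tr>
<tr><td></td><td>Bill Johnson</td></tr>
<tr><td>GB</td><td>Larry Brown</td></tr>
<tr><td>US</td><td>Mary Jordon</td></tr>
<tr><td></td><td>Peter Jones</td></tr>
HTML;

// (au|gb|us)? makes the country code optional: the group matches once or not at all.
// [^>]* (an assumed tightening) lets a tag carry attributes without running past its ">".
$pattern = '`<tr>\s*<td[^>]*>(au|gb|us)?</td>\s*<td[^>]*>([^<]+)</td>`i';

// PREG_SET_ORDER yields one array per matched row.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$rows = [];
foreach ($matches as $m) {
    // $m[1] is an empty string when the optional group did not match.
    $rows[] = trim($m[1] . ' ' . $m[2]);
}

print_r($rows);
```

With the five sample rows, $rows should hold "AU Jane Smith", "Bill Johnson", "GB Larry Brown", "US Mary Jordon" and "Peter Jones".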

--
Jock

10-06-2003
s van gemmert

"Han" <nobody@nowhere.com> wrote in message news:<TT4gb.225533$mp.141550@rwcrnsc51.ops.asp.att.net>...
> Using preg_match_all, I need to capture a list of first and last names plus
> an optional country code preceding them.
>
> For example:
>
> <tr><td>AU</td><td>Jane Smith</td></tr>
> <tr><td></td><td>Bill Johnson</td></tr>
> <tr><td>GB</td><td>Larry Brown</td></tr>
> <tr><td>US</td><td>Mary Jordon</td></tr>
> <tr><td></td><td>Peter Jones</td></tr>
>
> The country code might exist, it might not.
>
> I would like the array contents to look like this:
>
> AU Jane Smith
> Bill Johnson
> GB Larry Brown
> US Mary Jordon
> Peter Jones
>
> I know a subpattern is needed of all the possible country codes:
>
> AU|GB|US


this pattern should do the job:

"{<tr><td>\s*([A-Z]{2})?\s*</td><td>\s*(\w+)?\s*(\w+)\s*</td></tr>}im"

If this pattern is used with preg_match_all, it should produce the
desired result.
It will extract the country code if available, the first name if
available, and the last name.
They will be put in a two-dimensional array. If no country code is
given, that array element will be left empty. A one-word name is not
handled as cleanly: backtracking splits it between the first-name and
last-name groups.
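
To make the array layout concrete, here is a minimal sketch. It uses a variant of the pattern rather than the exact one posted: the "~" delimiters are an arbitrary choice, and the (?:(\w+)\s+)? grouping, which ties the optional first name to the whitespace after it so that a one-word name lands in the last-name slot, is an assumption about the intent. The "Cher" row is hypothetical test data.

```php
<?php
// Sample rows from the original question.
$html = <<<HTML
<tr><td>AU</td><td>Jane Smith</td></tr>
<tr><td></td><td>Bill Johnson</td></tr>
<tr><td>GB</td><td>Larry Brown</td></tr>
<tr><td>US</td><td>Mary Jordon</td></tr>
<tr><td></td><td>Peter Jones</td></tr>
HTML;

// Group 1: optional country code. Group 2: optional first name. Group 3: last name.
// (?: ... ) only groups; it does not add an entry to the result array.
$pattern = '~<tr><td>\s*([A-Z]{2})?\s*</td><td>\s*(?:(\w+)\s+)?(\w+)\s*</td></tr>~i';

$count = preg_match_all($pattern, $html, $matches);

// Default PREG_PATTERN_ORDER layout:
// $matches[1] = country codes, $matches[2] = first names, $matches[3] = last names.
$lines = [];
for ($i = 0; $i < $count; $i++) {
    $parts = [$matches[1][$i], $matches[2][$i], $matches[3][$i]];
    // Drop the empty strings left by optional groups that did not match.
    $lines[] = implode(' ', array_filter($parts, 'strlen'));
}
print_r($lines);

// Hypothetical row with a one-word name: the first-name slot stays empty.
preg_match($pattern, '<tr><td></td><td>Cher</td></tr>', $single);
var_dump($single[2], $single[3]);
```

Passing PREG_SET_ORDER as the fourth argument instead would group the results per row rather than per subpattern, which may be easier to loop over.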

hope this helps,

sascha



10-07-2003
Han

John,

Thank you for another detailed reply.

The cryptic syntax is slowly beginning to sink in, but there are still a few
nagging issues.

In my price list, the amount may or may not be preceded with a $ sign.

For instance, the list might look like this:

$2.99
1.99
$3.00
$4.00

I modified my price pattern to accommodate this:

((\\$|\s*)?\d{1,3}\.\d{2})

which works great. The problem is, it also creates another array dimension
that contains only $ or space:

$

$
$

I can simply ignore this dimension, but is there a way to prevent it?

Thanks (again) in advance.





10-07-2003
John Dunlop

Han wrote:

> ((\\$|\s*)?\d{1,3}\.\d{2})
>
> which works great. The problem is, it also creates another array
> dimension that contains only $ or space:
>
> [...] I can simply ignore this dimension, but is there a way to
> prevent it?


Subpatterns that begin with the two-character sequence "?:" group
without capturing, so they add nothing to the match array. You could
then write your pattern as:

`(?:\\$|\s*)?\d{1,3}\.\d{2}`
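
A small sketch of the difference, run against the four sample prices. The "/" delimiters, the single-quoted strings (in which \\$ reaches the regex engine as \$), and the variable names are assumptions made for the example.

```php
<?php
// The sample price list; a nowdoc keeps the "$" signs literal.
$prices = <<<'TXT'
$2.99
1.99
$3.00
$4.00
TXT;

// Two capturing groups: the result has indexes 0 (whole match), 1 and 2.
preg_match_all('/((\\$|\s*)?\d{1,3}\.\d{2})/', $prices, $with);

// (?: ... ) groups without capturing and the outer parentheses are dropped,
// so only index 0 (the whole match) remains.
preg_match_all('/(?:\\$|\s*)?\d{1,3}\.\d{2}/', $prices, $without);

echo count($with), "\n";     // entries in the capturing result
echo count($without), "\n";  // entries in the non-capturing result

// The optional prefix may swallow the preceding newline, so trim each match.
$amounts = array_map('trim', $without[0]);
print_r($amounts);
```

Here $with[2] is the extra dimension holding only "$" or whitespace, while $without has no such entry.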

--
Jock

10-07-2003
Han

Jock,

That's it--thanks.

I've been spending some time re-reading the pattern documentation on php.net
and it's beginning to sink in.

Again, much appreciated!



