This is a discussion on preg_match_all optional subpattern within the PHP Language forums, part of the PHP Programming Forums category.
Using preg_match_all, I need to capture a list of first and last names plus
an optional country code preceding them.

For example:

<tr><td>AU</td><td>Jane Smith</td></tr>
<tr><td></td><td>Bill Johnson</td></tr>
<tr><td>GB</td><td>Larry Brown</td></tr>
<tr><td>US</td><td>Mary Jordon</td></tr>
<tr><td></td><td>Peter Jones</td></tr>

The country code might exist, it might not.

I would like the array contents to look like this:

AU Jane Smith
Bill Johnson
GB Larry Brown
US Mary Jordon
Peter Jones

I know a subpattern is needed of all the possible country codes:

AU|GB|US

but how do you include this as an optional subpattern?

Thanks in advance.
Han wrote:
> Using preg_match_all, I need to capture a list of first and last names plus
> an optional country code preceding them.
>
> For example:
>
> <tr><td>AU</td><td>Jane Smith</td></tr>
> <tr><td></td><td>Bill Johnson</td></tr>
> <tr><td>GB</td><td>Larry Brown</td></tr>
> <tr><td>US</td><td>Mary Jordon</td></tr>
> <tr><td></td><td>Peter Jones</td></tr>
>
> [...] I know a subpattern is needed of all the possible country codes:
>
> AU|GB|US
>
> but how do you include this as an optional subpattern?

The ? quantifier means zero or one of whatever came before, representable by {0,1}. Quantifying a subpattern using the question mark denotes its nonobligatory nature.

So, to match optional two-letter country codes within a table cell (doesn't properly cater for attributes, but that's rectifiable):

`<td.*>([a-z]{2})?</td.*>`Usi

If you wish to list the possible values, precluding others:

`<td.*>(au|gb|us)?</td.*>`Usi

--
Jock
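For anyone who wants to try the optional subpattern against the rows from the question, here is a minimal sketch. It is not Jock's exact pattern: the row-level pattern, the `[^<]+` used for the name cell, and the `$html` and `$rows` variable names are assumptions made for illustration.

```php
<?php
// Sample rows copied from the original question.
$html = <<<HTML
<tr><td>AU</td><td>Jane Smith</td></tr>
<tr><td></td><td>Bill Johnson</td></tr>
<tr><td>GB</td><td>Larry Brown</td></tr>
<tr><td>US</td><td>Mary Jordon</td></tr>
<tr><td></td><td>Peter Jones</td></tr>
HTML;

// (au|gb|us)? makes the country code optional.
// ([^<]+) is an assumed stand-in for "whatever is in the name cell".
$pattern = '`<tr><td>(au|gb|us)?</td><td>([^<]+)</td></tr>`i';

// PREG_SET_ORDER gives one array per matched row.
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$rows = [];
foreach ($matches as $m) {
    // $m[1] is an empty string when the optional group matched nothing.
    $rows[] = trim($m[1] . ' ' . $m[2]);
}

print_r($rows);
```

With PREG_SET_ORDER each element of `$matches` holds one row, so the country code and the name stay together; the default ordering would instead group all codes in `$matches[1]` and all names in `$matches[2]`.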
"Han" <nobody@nowhere.com> wrote in message news:<TT4gb.225533$mp.141550@rwcrnsc51.ops.asp.att .net>...
> Using preg_match_all, I need to capture a list of first and last names plus
> an optional country code preceding them.
>
> For example:
>
> <tr><td>AU</td><td>Jane Smith</td></tr>
> <tr><td></td><td>Bill Johnson</td></tr>
> <tr><td>GB</td><td>Larry Brown</td></tr>
> <tr><td>US</td><td>Mary Jordon</td></tr>
> <tr><td></td><td>Peter Jones</td></tr>
>
> The country code might exist, it might not.
>
> I would like the array contents to look like this:
>
> AU Jane Smith
> Bill Johnson
> GB Larry Brown
> US Mary Jordon
> Peter Jones
>
> I know a subpattern is needed of all the possible country codes:
>
> AU|GB|US

This pattern should do the job:

"{<tr><td>\s*([A-Z]{2})?\s*</td><td>\s*(\w+)?\s*(\w+)\s*</td></tr>}im"

If this pattern is used in preg_match_all, it should produce the desired result. It will extract the country code if available, the first name if available, and the last name. They will be put in a two-dimensional array. If no country code or first name is given, the array element will be left empty.

Hope this helps,
sascha

> but how do you include this as an optional subpattern?
>
> Thanks in advance.
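A rough sketch of how sascha's pattern behaves with preg_match_all on the sample rows, in case it helps to see the two-dimensional array. The variable names and the single-word test row ("Cher") are hypothetical and were added only for illustration.

```php
<?php
// Sample rows copied from the original question.
$html = <<<HTML
<tr><td>AU</td><td>Jane Smith</td></tr>
<tr><td></td><td>Bill Johnson</td></tr>
<tr><td>GB</td><td>Larry Brown</td></tr>
<tr><td>US</td><td>Mary Jordon</td></tr>
<tr><td></td><td>Peter Jones</td></tr>
HTML;

// sascha's pattern, written as a single-quoted PHP string.
$pattern = '{<tr><td>\s*([A-Z]{2})?\s*</td><td>\s*(\w+)?\s*(\w+)\s*</td></tr>}im';

// Default ordering (PREG_PATTERN_ORDER):
// $matches[1] = country codes, $matches[2] = first names, $matches[3] = last names.
$count = preg_match_all($pattern, $html, $matches);

$lines = [];
for ($i = 0; $i < $count; $i++) {
    $lines[] = trim($matches[1][$i] . ' ' . $matches[2][$i] . ' ' . $matches[3][$i]);
}
print_r($lines);

// Hypothetical row with a one-word name: (\w+)? is greedy, so the regex
// backtracks and splits the word instead of leaving the first name empty.
preg_match($pattern, '<tr><td></td><td>Cher</td></tr>', $single);
print_r($single);
```

One thing worth checking before relying on this: for a one-word name the first-name group is not left empty. Because `(\w+)?` is greedy and `(\w+)` needs at least one character, the word is split between the two groups, so a stricter separator such as `\s+` inside an optional group may be a better fit if single names can occur.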
John,
Thank you for another detailed reply. The cryptic syntax is slowly beginning to sink in, but there are still a few nagging issues.

In my price list, the amount may or may not be preceded by a $ sign. For instance, the list might look like this:

$2.99
1.99
$3.00
$4.00

I modified my price pattern to accommodate this:

((\\$|\s*)?\d{1,3}\.\d{2})

which works great. The problem is, it also creates another array dimension that contains only $ or space:

$ $ $

I can simply ignore this dimension, but is there a way to prevent it?

Thanks (again) in advance.

"John Dunlop" <john+usenet@johndunlop.info> wrote in message
news:MPG.19eb262ef3b81717989777@news.freeserve.net ...
> Han wrote:
>
> > Using preg_match_all, I need to capture a list of first and last names plus
> > an optional country code preceding them.
> >
> > For example:
> >
> > <tr><td>AU</td><td>Jane Smith</td></tr>
> > <tr><td></td><td>Bill Johnson</td></tr>
> > <tr><td>GB</td><td>Larry Brown</td></tr>
> > <tr><td>US</td><td>Mary Jordon</td></tr>
> > <tr><td></td><td>Peter Jones</td></tr>
> >
> > [...] I know a subpattern is needed of all the possible country codes:
> >
> > AU|GB|US
> >
> > but how do you include this as an optional subpattern?
>
> The ? quantifier means zero or one of whatever came before,
> representable by {0,1}. Quantifying a subpattern using the
> question mark denotes its nonobligatory nature.
>
> So, to match optional two-letter country codes within a table cell
> (doesn't properly cater for attributes, but that's rectifiable):
>
> `<td.*>([a-z]{2})?</td.*>`Usi
>
> If you wish to list the possible values, precluding others:
>
> `<td.*>(au|gb|us)?</td.*>`Usi
>
> --
> Jock
Han wrote:
> ((\\$|\s*)?\d{1,3}\.\d{2})
>
> which works great. The problem is, it also creates another array
> dimension that contains only $ or space:
>
> [...] I can simply ignore this dimension, but is there a way to
> prevent it?

Subpatterns that begin with the two character sequence "?:" aren't captured. You could then write your pattern as:

`(?:\\$|\s*)?\d{1,3}\.\d{2}`

--
Jock
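To see the difference between the two patterns side by side, here is a small sketch, assuming the price list from Han's post joined with newlines. The variable names are made up for illustration, and both patterns are written as single-quoted PHP strings, where `\\$` reaches the regex engine as `\$`.

```php
<?php
// Price list from Han's post, one price per line (assumed layout).
$list = implode("\n", ['$2.99', '1.99', '$3.00', '$4.00']);

// Han's original pattern: two capturing groups.
$capturing = '`((\\$|\s*)?\d{1,3}\.\d{2})`';

// Jock's version: (?: ... ) groups without capturing.
$nonCapturing = '`(?:\\$|\s*)?\d{1,3}\.\d{2}`';

preg_match_all($capturing, $list, $withGroups);
preg_match_all($nonCapturing, $list, $withoutGroups);

// $withGroups has three entries: [0] full matches, [1] outer group,
// [2] the "$ or whitespace" dimension Han wanted to get rid of.
echo count($withGroups), "\n";

// $withoutGroups has only [0], the full matches.
echo count($withoutGroups), "\n";

// The optional \s* can swallow a leading newline, so trim each match.
$prices = array_map('trim', $withoutGroups[0]);
print_r($prices);
```

Since Jock's version also drops the outer parentheses, the prices end up in element 0 of the result array rather than element 1.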
Jock,
That's it--thanks. I've been spending some time re-reading the pattern documentation on php.net and it's beginning to sink in.

Again, much appreciated!

"John Dunlop" <john+usenet@johndunlop.info> wrote in message
news:MPG.19ec77cdbce17be098977c@news.freeserve.net ...
> Han wrote:
>
> > ((\\$|\s*)?\d{1,3}\.\d{2})
> >
> > which works great. The problem is, it also creates another array
> > dimension that contains only $ or space:
> >
> > [...] I can simply ignore this dimension, but is there a way to
> > prevent it?
>
> Subpatterns that begin with the two character sequence "?:" aren't
> captured. You could then write your pattern as:
>
> `(?:\\$|\s*)?\d{1,3}\.\d{2}`
>
> --
> Jock