This is a discussion on RegExp Help Please within the alt.comp.lang.php forums, part of the PHP Programming Forums category.

I am coding a function which converts text links and email addresses to HTML compatible tags in an email newsletter.

I have it working except that, as the str_replace function loops through the text, any duplicate address or link gets the tags inserted again, like this:

<a href="<a href="someurl">someurl</a>">someurl</a>

Here is the code (with debug echoes):

function convert2links ($text)
{
    // Collect email addresses and build their anchors
    preg_match_all("/[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}/", $text, $arr0);
    foreach($arr0[0] as $v) { $url[] = trim(email2anchor ($v)); }

    // Collect links and build their anchors
    preg_match_all("/((http)+(s)?:(\/\/))?[a-zA-Z0-9]+\.[a-zA-Z0-9]+\.[a-zA-Z]{2,4}(\/)?([~a-zA-Z0-9_\-.\/?=]*)?/", $text, $arr1);
    foreach($arr1[0] as $v) { $url[] = url2anchor ($v); }

    $s = array_merge ($arr0[0], $arr1[0]);
    $r = $url;
    echo "<pre>";
    print_r ($s);
    print_r ($r);
    for ($i=0; $i<count($s); $i++)
    {
        $text = str_replace ($s[$i], $r[$i], $text);
    }
    echo $text;
    echo "</pre>";
    exit;
}

Note: I have been using the for() loop in the debug to track the replacements. The original code used the arrays in the str_replace().

I presume that I need to check whether there is a preceding ">" before the search string, but I cannot figure out the regexp to do this. After a couple of hours of failure, my frustration level has gotten to the point where nothing comes to mind.

Any ideas?

TIA!
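
One way to sidestep the double replacement described above, offered as a sketch and not as the fix the poster eventually used: collect every match into a single search => replacement map and apply it with strtr(). When given an array, strtr() walks the text once, tries the longest key first, and does not rescan text it has already substituted, so a repeated link is wrapped only once per occurrence and no check for a preceding ">" is needed. The email2anchor() and url2anchor() bodies below are hypothetical stand-ins, since the originals are not shown in the thread.

```php
<?php
// Hypothetical stand-ins for the poster's helpers (not shown in the thread).
function email2anchor($email) {
    return '<a href="mailto:' . $email . '">' . $email . '</a>';
}
function url2anchor($url) {
    return '<a href="' . $url . '">' . $url . '</a>';
}

function convert2links($text) {
    // Patterns copied from the original post.
    $emailRe = '/[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}/';
    $urlRe   = '/((http)+(s)?:(\/\/))?[a-zA-Z0-9]+\.[a-zA-Z0-9]+\.[a-zA-Z]{2,4}(\/)?([~a-zA-Z0-9_\-.\/?=]*)?/';

    preg_match_all($emailRe, $text, $emails);
    preg_match_all($urlRe, $text, $urls);

    // Using the matched text as the array key collapses duplicates.
    $map = array();
    foreach ($urls[0] as $v)   { $map[$v] = url2anchor($v); }
    foreach ($emails[0] as $v) { $map[$v] = trim(email2anchor($v)); }

    // One pass over the text; replaced output is never searched again.
    return $map ? strtr($text, $map) : $text;
}
```

Calling convert2links() on a text that mentions the same URL twice should yield two separate anchors with no anchor nested inside an href.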

If you are using preg, then you are missing two very nice features that would most likely solve your problem.

If you have a unique match, like "http://", which you know should only occur once, then you can start your preg like this:

    /^(?P<protocol>(?:http:\/\/))

The (?P<blah>match) construct will match whatever is in match and assign it to the return array from preg_match ASSOCIATIVELY. This means that when you run preg_match( , , $matches); you will be able to extract the proto with $matches['protocol']. This is called named matching, and eliminates the need to loop through matches with for. You only need to make a positive match.

The other construct I demonstrated up there is (?:blahblah), which does a subpattern match but does not assign it to the array for preg_match.

Both (?P< > ) and (?: ) are just like regular parentheses for submatching, but they extend the power.

ALSO: You have a loose dash in your character class. I highly recommend you escape dashes, even though the pcre docs may not say you need to. Also, add a capital U modifier at the end of your pregex for ungreedy matching. This makes each quantifier match as little as possible instead of as much as possible.

I know I didn't specifically answer your question, but maybe with this newfound info you will find a solution automatically. :-)

"Tyrone Slothrop" <ts@paranoids.com> wrote in message news:6f4nu155oc1dgg7rlc2ri21b37pic2l8u9@4ax.com...
> I am coding a function which converts text links and email address to
> HTML compatible tags in an email newsletter.
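
To make the constructs from the reply above concrete, here is a small sketch of a named group, a non-capturing group, an escaped dash inside a character class, and the U modifier. The sample strings and the "host" group name are made up for illustration and do not come from the thread.

```php
<?php
// Made-up sample input.
$url = 'http://www.example.com/page';

// (?P<protocol>...) and (?P<host>...) are named groups;
// (?:www\.)? groups without adding an entry to $matches.
// The dash in the character class is escaped, as the reply recommends.
$pattern = '/^(?P<protocol>https?:\/\/)(?:www\.)?(?P<host>[a-zA-Z0-9.\-]+)/';
preg_match($pattern, $url, $matches);
// $matches['protocol'] is 'http://' and $matches['host'] is 'example.com'.

// The U modifier flips quantifiers from greedy to ungreedy.
$html = '<b>one</b> and <b>two</b>';
preg_match('/<b>.*<\/b>/', $html, $greedy);  // $greedy[0] spans the whole string
preg_match('/<b>.*<\/b>/U', $html, $lazy);   // $lazy[0] is '<b>one</b>'
```

Named groups also remain available under their numeric index, so existing code that reads $matches[1] keeps working.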

On Thu, 9 Feb 2006 18:40:39 -0800, "Connector5" <junkmilenko@charter.net> wrote:

>If you are using preg, then you are missing two very nice features that
>would most likely solve your problem.

I managed to get it all working 100% and under all conditions I could think of throwing at it.

Thanks!