RegExp Help Please

This is a discussion on RegExp Help Please within the alt.comp.lang.php forums, part of the PHP Programming Forums category.


  #1 (permalink)  
02-09-2006
Tyrone Slothrop

I am coding a function which converts text links and email addresses to
HTML-compatible tags in an email newsletter.

I have it working, except that as str_replace loops through the text,
any duplicate address or link gets the tags inserted again, like this:
<a href="<a href="someurl">someurl</a>">someurl</a>

Here is the code (with debug echoes):
function convert2links ($text)
{
    $url = array();

    // Collect every email address in the text.
    preg_match_all("/[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}/", $text, $arr0);
    foreach ($arr0[0] as $v) { $url[] = trim(email2anchor($v)); }

    // Collect every link, with or without a leading http:// or https://.
    preg_match_all("/((http)+(s)?:(\/\/))?[a-zA-Z0-9]+\.[a-zA-Z0-9]+\.[a-zA-Z]{2,4}(\/)?([~a-zA-Z0-9_\-.\/?=]*)?/", $text, $arr1);
    foreach ($arr1[0] as $v) { $url[] = url2anchor($v); }

    $s = array_merge($arr0[0], $arr1[0]);  // search strings
    $r = $url;                              // replacement anchors, in the same order
    echo "<pre>";
    print_r($s);
    print_r($r);
    for ($i = 0; $i < count($s); $i++)
    {
        // A duplicate entry in $s is replaced again here, inside the tags inserted earlier.
        $text = str_replace($s[$i], $r[$i], $text);
    }
    echo $text;
    echo "</pre>";
    exit;
}

Note: I have been using the for() loop in the debug to track the
replacements. The original code used the arrays in the str_replace().
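
The nesting described above can be reproduced in isolation. The following is a minimal sketch, not the poster's actual code: url2anchor() is a hypothetical stand-in (the real helper is not shown in the thread) and the sample text is invented for illustration.

```php
<?php
// Hypothetical stand-in for the poster's url2anchor(), which is not shown in the thread.
function url2anchor($url)
{
    return '<a href="' . $url . '">' . $url . '</a>';
}

$text = "Visit www.example.com today. Again: www.example.com";

// The same link was matched twice, so it appears twice in the search list.
$s = array('www.example.com', 'www.example.com');
$r = array(url2anchor($s[0]), url2anchor($s[1]));

// str_replace() replaces every occurrence on each pass, so the second pass
// also rewrites the href and link text that the first pass inserted.
for ($i = 0; $i < count($s); $i++) {
    $text = str_replace($s[$i], $r[$i], $text);
}

echo $text, "\n";
```

Removing duplicates from the search list with array_unique() would probably stop this particular case, but a link that is a substring of another link could still be rewritten twice.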

I presume that I need to evaluate whether there is a preceding ">"
to the search string, but I cannot figure out the regexp to do this.
After a couple hours of failure, my frustration level has gotten to
the point where nothing comes to mind.
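
One alternative to checking for a preceding ">" is to make every replacement in a single pass, so that inserted tags are never scanned again. The sketch below assumes this approach is acceptable; it is not what the poster ended up using, which the thread does not show. email2anchor() and url2anchor() are hypothetical stand-ins, the function names convert2links_single_pass() and link_match_to_anchor() are invented here, and the scheme part of the link pattern has been simplified to https?:\/\/.

```php
<?php
// Hypothetical stand-ins for the poster's helpers, which are not shown in the thread.
function email2anchor($email)
{
    return '<a href="mailto:' . $email . '">' . $email . '</a>';
}

function url2anchor($url)
{
    // Assumption: links written without a scheme get http:// in the href.
    $href = preg_match('/^https?:\/\//', $url) ? $url : 'http://' . $url;
    return '<a href="' . $href . '">' . $url . '</a>';
}

// Callback: the named group 'email' is non-empty only when the email branch matched.
function link_match_to_anchor($m)
{
    return !empty($m['email']) ? email2anchor($m['email']) : url2anchor($m[0]);
}

function convert2links_single_pass($text)
{
    // The two patterns from the post joined with |, email branch first.
    $pattern = '/(?P<email>[a-zA-Z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4})'
             . '|(?:https?:\/\/)?[a-zA-Z0-9]+\.[a-zA-Z0-9]+\.[a-zA-Z]{2,4}\/?[~a-zA-Z0-9_\-.\/?=]*/';

    // Each match is found once in the original text and replaced once,
    // so text inserted by an earlier replacement is never matched again.
    return preg_replace_callback($pattern, 'link_match_to_anchor', $text);
}

echo convert2links_single_pass(
    'Write to ts@example.org or visit www.example.com/news twice: www.example.com/news'
), "\n";
```

Because the link character class contains a dot, a full stop directly after a link would be taken as part of it; the patterns are left as posted apart from the simplification noted above.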

Any ideas?

TIA!


  #2 (permalink)  
02-09-2006
Drakazz

If anything, try print_r on $arr1 and see the results. As I remember,
key 0 holds the full matches, and you should find the part you want
under one of the other keys.
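
To illustrate the layout of that array, here is a small sketch. The pattern is a simplified version of the one in the original post and the sample text is invented; with the default flags, index 0 holds the full matches and index 1 holds whatever the first parenthesised group captured.

```php
<?php
$text = "See http://www.example.com/a and www.example.org";

// Simplified link pattern with one capturing group: the optional scheme.
preg_match_all(
    '/(https?:\/\/)?[a-zA-Z0-9]+\.[a-zA-Z0-9]+\.[a-zA-Z]{2,4}(?:\/[a-zA-Z0-9]*)?/',
    $text,
    $arr1
);

// $arr1[0] lists the full matches; $arr1[1] lists group 1 for each match.
print_r($arr1);
```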

  #3 (permalink)  
02-10-2006
Connector5

If you are using preg, then you are missing two very nice features that
would most likely solve your problem.

If you have a unique match, like "http://" which you know should only occur
once, then you can start your preg like this:

 /^(?P<protocol>(?:http:\/\/))

The (?P<blah>match) construct will match whatever is in match and assign it
to the return array from preg_match ASSOCIATIVELY. This means that when you
run preg_match($pattern, $subject, $matches); you will be able to extract the
protocol with $matches['protocol']. This is called a named subpattern, and it
eliminates the need to work out which numeric index holds the piece you want.
You only need to make a positive match.
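
A short sketch of a named subpattern in use follows. The subject string and the second group name, host, are made up for illustration.

```php
<?php
$subject = 'http://www.example.com/index.php';
$matches = array();

// (?P<protocol>...) stores its match under the key 'protocol'
// as well as under its usual numeric index.
if (preg_match('/^(?P<protocol>https?:\/\/)(?P<host>[a-zA-Z0-9.-]+)/', $subject, $matches)) {
    echo $matches['protocol'], "\n";
    echo $matches['host'], "\n";
}
```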

The other construct I demonstrated up there is (?:blahblah) which does a
subpattern match but does not assign it to the array for preg_match.

Both (?P< > ) and (?: ) are just like regular parentheses for submatching,
but each adds something: the first names its capture, the second suppresses it.
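
The difference between plain and non-capturing parentheses shows up in the size of the result array. This is a minimal sketch with an invented subject string.

```php
<?php
$subject = 'https://example.com';

// Plain parentheses: the scheme is captured as group 1, the host as group 2.
preg_match('/^(https?):\/\/(.+)$/', $subject, $capturing);

// (?: ): the scheme is still matched, but only the host is captured.
preg_match('/^(?:https?):\/\/(.+)$/', $subject, $nonCapturing);

print_r($capturing);
print_r($nonCapturing);
```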



ALSO: You have a loose dash in your character class. PCRE treats a dash at
the start or end of a class as a literal, so it works, but I highly recommend
escaping dashes anyway so the class stays correct if it is edited later. Also,
consider adding a capital U modifier after the closing delimiter of your
pattern for ungreedy matching. This makes every quantifier match as little as
possible instead of as much as possible.
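
A quick way to see both points is the sketch below; the sample strings are invented and unrelated to the newsletter code.

```php
<?php
$html = '<b>one</b> and <b>two</b>';

// Default (greedy): .* takes as much as it can, up to the last </b>.
preg_match('/<b>.*<\/b>/', $html, $greedy);

// With U: .* takes as little as it can, stopping at the first </b>.
preg_match('/<b>.*<\/b>/U', $html, $ungreedy);

echo $greedy[0], "\n";
echo $ungreedy[0], "\n";

// A dash that is last in a character class, or escaped, is a literal dash.
$escaped   = preg_match('/^[a-z\-]+$/', 'well-known');
$unescaped = preg_match('/^[a-z-]+$/', 'well-known');
var_dump($escaped, $unescaped);
```

Note that U flips the greediness of every quantifier in the pattern; writing .*? changes a single quantifier instead.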


I know I didn't specifically answer your question, but maybe with this
newfound info you will find a solution automatically. :-)

  #4 (permalink)  
02-10-2006
Tyrone Slothrop

On Thu, 9 Feb 2006 18:40:39 -0800, "Connector5"
<junkmilenko@charter.net> wrote:

>I know I didn't specifically answer your question, but maybe with this
>newfound info you will find a solution automatically. :-)

I managed to get it all working 100% and under all conditions I could
think of throwing at it.

Thanks!

