Eregi pattern matching - bit of a challenge I thinks

This is a discussion on Eregi pattern matching - bit of a challenge I thinks within the alt.comp.lang.php forums, part of the PHP Programming Forums category.


#1
12-04-2003
NimP
Eregi pattern matching - bit of a challenge I thinks

Hi, I'm trying to detect any links that are contained within an HTML page
using eregi pattern matching. I was wondering if there are any pattern
matching geniuses out there who could write a single pattern that covers
all the different ways in which a link could be written.

Current patterns I can think of include:

<a href=x.com> no spaces between href, equals and url, no quotation marks
around url
<a href =x.com> space between href and equals, no space between equals and
url, no quotation marks around url
<a href= x.com> no space between href and equals, space between equals and
url, no quotation marks around url
<a href = x.com> space between href and equals, space between equals and
url, no quotation marks around url


<a href='x.com'> no spaces between href, equals and url, single quotation
marks around url
<a href ='x.com'> space between href and equals, no space between equals and
url, single quotation marks around url
<a href= 'x.com'> no space between href and equals, space between equals and
url, single quotation marks around url
<a href = 'x.com'> space between href and equals, space between equals and
url, single quotation marks around url

<a href="x.com"> no spaces between href, equals and url, double quotation
marks around url
<a href ="x.com"> space between href and equals, no space between equals and
url, double quotation marks around url
<a href= "x.com"> no space between href and equals, space between equals and
url, double quotation marks around url
<a href = "x.com"> space between href and equals, space between equals and
url, double quotation marks around url

<a href='x.com"> no spaces between href, equals and url, mismatched quotation
marks around url - single open, double to close
<a href ='x.com"> space between href and equals, no space between equals and
url, mismatched quotation marks around url - single open, double to close
<a href= 'x.com"> no space between href and equals, space between equals and
url, mismatched quotation marks around url - single open, double to close
<a href = 'x.com"> space between href and equals, space between equals and
url, mismatched quotation marks around url - single open, double to close

<a href="x.com'> no spaces between href, equals and url, mismatched quotation
marks around url - double open, single to close
<a href ="x.com'> space between href and equals, no space between equals and
url, mismatched quotation marks around url - double open, single to close
<a href= "x.com'> no space between href and equals, space between equals and
url, mismatched quotation marks around url - double open, single to close
<a href = "x.com'> space between href and equals, space between equals and
url, mismatched quotation marks around url - double open, single to close
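
The twenty variants above differ only in optional whitespace around the `=` and in which quote character, if any, opens and closes the URL, so one pattern with optional parts can cover all of them. The sketch below is one way to do it, under some stated assumptions: it uses PCRE (`preg_match_all()`) rather than `eregi()`, since the POSIX ereg functions were deprecated in PHP 5.3 and removed in PHP 7.0; the helper name `extract_hrefs` is hypothetical; and it treats a URL as ending at whitespace, a quote or `>`, so quoted URLs containing spaces are not handled. It also tolerates other attributes before `href`, which the list does not need but real pages often have.

```php
<?php
// Hypothetical helper: returns every href value found in $html.
// Assumes PCRE (preg_*) in place of the removed POSIX eregi().
function extract_hrefs(string $html): array
{
    // <a, whitespace, optionally other attributes ending in whitespace,
    // then href. Optional whitespace around "=", an optional opening quote
    // of either kind, the URL (no quotes, whitespace or '>'), then an
    // optional closing quote of either kind. The two quote slots are
    // independent, so mismatched pairs such as 'x.com" are accepted.
    $pattern = '/<a\s(?:[^>]*?\s)?href\s*=\s*[\'"]?([^\'"\s>]+)[\'"]?[^>]*>/i';
    preg_match_all($pattern, $html, $matches);
    return $matches[1]; // capture group 1 holds the URLs
}

// Build all 20 variants from the list: 4 spacing styles x 5 quote styles.
$spacings = ['href=', 'href =', 'href= ', 'href = '];
$quotes   = [['', ''], ["'", "'"], ['"', '"'], ["'", '"'], ['"', "'"]];
$found = 0;
foreach ($quotes as [$open, $close]) {
    foreach ($spacings as $s) {
        $tag = "<a {$s}{$open}x.com{$close}>";
        if (extract_hrefs($tag) === ['x.com']) {
            $found++;
        }
    }
}
echo "$found of 20 variants matched\n"; // prints "20 of 20 variants matched"
```

Because the quote slots are independent, the pattern also accepts oddities like `href=x.com'`; if only matching pairs should count, a capturing group for the opening quote plus a backreference (`\1`) for the closing one is the usual tightening.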


I guess what's needed is something more advanced than

eregi("href=\"(.*)\">", $string, $results);
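
Part of the trouble with the `eregi()` call above is that `(.*)` is greedy: on a line holding two links it runs on to the last `">` instead of stopping at the first. POSIX regular expressions, which `ereg`/`eregi` use, have no non-greedy quantifier, but PCRE does (`.*?`). A small sketch with made-up sample HTML, assuming `preg_match()` and `preg_match_all()` as the replacements:

```php
<?php
$html = '<a href="a.html">A</a> <a href="b.html">B</a>';

// Greedy: (.*) grabs as much as possible, so it runs past the first
// closing quote and stops only at the LAST "> on the line.
preg_match('/href="(.*)">/i', $html, $greedy);

// Lazy: (.*?) grabs as little as possible, stopping at the first ">.
preg_match('/href="(.*?)">/i', $html, $lazy);

// preg_match_all() with the lazy form collects every link, not just one.
preg_match_all('/href="(.*?)">/i', $html, $all);

echo $greedy[1], "\n";           // a.html">A</a> <a href="b.html
echo $lazy[1], "\n";             // a.html
echo implode(', ', $all[1]), "\n"; // a.html, b.html
```

This only handles the double-quoted, no-spaces case; the quoting and spacing variations still need optional parts in the pattern.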

I'd appreciate any help you could give,

Thanks
NimP

#2
12-04-2003
Jon Kraft
Re: Eregi pattern matching - bit of a challenge I thinks

"NimP" <stu@sturobbie.co.uk> wrote:

> Hi, I'm trying to detect any links that are contained within an HTML
> page using eregi pattern matching. I was wondering if there are any
> pattern matching geniuses out there who could write a single pattern
> that covers all the different ways in which a link could be written.



I'm sure there is an easier solution out there somewhere, but going
through your examples I came up with this (it wouldn't validate a URL
though):

preg_match("/<a(\s)+href(\s)*=(\s)*(['\"])*([a-z0-9_\-\.])+(['\"])*>/i",
$string, $matches);

echo htmlentities($matches[0]);
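
Jon's character class `[a-z0-9_\-\.]` fits the `x.com` examples, but real URLs also contain `/`, `:`, `?`, `=` and `&`, and `preg_match()` stops after the first link. A possible widening, assuming the goal is to collect every URL on the page: keep the overall shape, replace the class with "anything except a quote, whitespace or `>`", capture the URL as one group, and switch to `preg_match_all()`. The sample `$string` is invented for illustration.

```php
<?php
$string = '<a href="http://example.com/index.php?id=1&amp;p=2">one</a>'
        . " <a href=x.com>two</a> <a href = 'docs/faq.html\">three</a>";

// Jon's original pattern: only letters, digits, _, - and . in the URL.
$jon = "/<a(\s)+href(\s)*=(\s)*(['\"])*([a-z0-9_\-\.])+(['\"])*>/i";
preg_match_all($jon, $string, $old);

// Widened: the URL is any run of characters that are not a quote,
// whitespace or '>', captured as a single group (group 1).
$wider = "/<a\s+href\s*=\s*['\"]?([^'\"\s>]+)['\"]?>/i";
preg_match_all($wider, $string, $new);

print_r($old[0]); // only the <a href=x.com> tag matches
print_r($new[1]); // all three URLs, including the full http:// one
```

Like the original, this still expects `href` to be the first attribute and does not validate the URL. For arbitrary pages, an HTML parser such as PHP's `DOMDocument` (with `getElementsByTagName('a')`) sidesteps the quoting and spacing variations altogether.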

Jon