This is a discussion on regular expression for parsing HTML using preg_match_all within the alt.comp.lang.php forums, part of the PHP Programming Forums category.
Hi all,
I've been trying, without success, to get some text out of an HTML page. The HTML I'm interested in looks like this:

    <a class=link href="http://www.something.com/_something.php?type=cart">Shopping Cart</a> <div><em class=newentry><a href=http://nothing.com>New Age</a></em></div>

From the above tag, I want to extract "Shopping Cart". I'm not very good with regular expressions. I tried this:

    $lines = file_get_contents("http://theabovetag.com/page.html");
    preg_match_all("/(<a\ class\=link\ href\=(.*)>)(<\/a>)/", $lines, $matches1);

The above RE gives me "Shopping Cart" plus "New Age" as well. I just want "Shopping Cart". What am I doing wrong? My RE somehow ignores the </a> tag right after Shopping Cart and instead accepts the </a> after New Age.

Please help!
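A minimal sketch that reproduces the symptom offline, assuming the sample markup is held in a string instead of being fetched with file_get_contents(). The pattern as quoted has no group for the link text between `>` and `</a>`, so as written it needs the literal sequence `></a>` and would not match this sample at all; the sketch therefore uses a hypothetical reconstruction, `(.*)>(.*)<\/a>`, that keeps the same greedy `(.*)`.

```php
<?php
// Sample markup from the post, kept in a string so the sketch runs offline
// (the thread fetches the page with file_get_contents() instead).
$html = '<a class=link href="http://www.something.com/_something.php?type=cart">Shopping Cart</a>'
      . ' <div><em class=newentry><a href=http://nothing.com>New Age</a></em></div>';

// Hypothetical reconstruction of the intended pattern: the first (.*) is
// the href, the second (.*) is the link text. Both are greedy.
$greedy = '/<a class=link href=(.*)>(.*)<\/a>/';

preg_match_all($greedy, $html, $matches);

// A greedy (.*) first grabs the rest of the string and then gives back
// characters only until the rest of the pattern can match, so the single
// match ends at the LAST "</a>" (after "New Age"), not the first one.
echo $matches[0][0], "\n";
echo 'link text group: ', $matches[2][0], "\n"; // prints "link text group: New Age"
```

The one match runs from the `<a class=link ...>` tag all the way to `New Age</a>`, so it contains both "Shopping Cart" and "New Age": the `</a>` right after Shopping Cart is skipped, which is the behaviour described in the post.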
crescent_au@yahoo.com wrote:
> preg_match_all("/(<a\ class\=link\ href\=(.*)>)(<\/a>)/", $lines,
> $matches1);
> The above RE gives me "Shopping Cart" plus "New Age" as well. I just
> want "Shopping Cart". What am I doing wrong? My RE is somehow ignoring
> </a> tag right after Shopping Cart and instead accepting </a> after New
> Age. Please help!

By default the multipliers (quantifiers such as * and +) are "greedy" and match as much as possible, so (.*) runs on to the last </a> in the string. You can stop this by placing a question mark after the multiplier, as in (.*?). Then it will match as little as possible and stop at the first </a>.

Jos

PS. This little program may be useful if you have trouble with regexes: http://www.regexbuddy.com/ (not mine)
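A minimal sketch of the lazy-quantifier fix, assuming the sample markup from the original post is held in a string and using a hypothetical pattern with a separate group (group 2) for the link text. Next to the `(.*?)` form suggested above, it also shows PCRE's `U` (ungreedy) pattern modifier, which flips the default so a plain `.*` is lazy, and a negated character class, which is an alternative not mentioned in the thread.

```php
<?php
// Sample markup from the post (kept in a string as an assumption, so the
// sketch runs without fetching a page).
$html = '<a class=link href="http://www.something.com/_something.php?type=cart">Shopping Cart</a>'
      . ' <div><em class=newentry><a href=http://nothing.com>New Age</a></em></div>';

// Hypothetical patterns that should all capture only the first link's text.
$patterns = [
    // "?" after each "*" makes the quantifier lazy: match as little as possible.
    'lazy'      => '/<a class=link href=(.*?)>(.*?)<\/a>/',
    // The "U" modifier inverts greediness, so ".*" is lazy in this pattern.
    'U flag'    => '/<a class=link href=(.*)>(.*)<\/a>/U',
    // Alternative: classes that cannot cross a tag boundary, so greediness
    // no longer matters ([^>]* stops at ">", [^<]* stops at "<").
    'neg class' => '/<a class=link href=([^>]*)>([^<]*)<\/a>/',
];

$results = [];
foreach ($patterns as $name => $pattern) {
    preg_match_all($pattern, $html, $matches);
    $results[$name] = $matches[2]; // group 2 holds the link text
    echo $name, ': ', implode(', ', $matches[2]), "\n"; // e.g. "lazy: Shopping Cart"
}
```

Each pattern yields only "Shopping Cart": the second anchor has no `class=link`, so it is not matched once the first match stops at its own `</a>`. The negated-class version avoids depending on greediness at all, but like any regex over HTML it assumes the markup keeps this exact shape.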