regular expression for parsing html using preg_match_all

#1
07-06-2006
crescent_au@yahoo.com

Hi all,

I've been trying unsuccessfully to extract text from an HTML page. The
HTML I'm interested in looks like this:

<a class=link
href="http://www.something.com/_something.php?type=cart">Shopping
Cart</a>
<div><em class=newentry><a href=http://nothing.com>New
Age</a></em></div>

From the above tag, I want to extract "Shopping Cart". I'm not very
good with RE. I tried this:
$lines = file_get_contents("http://theabovetag.com/page.html");
preg_match_all("/(<a\ class\=link\ href\=(.*)>)(<\/a>)/", $lines, $matches1);

The above RE gives me "Shopping Cart" plus "New Age" as well. I just
want "Shopping Cart". What am I doing wrong? My RE is somehow ignoring
</a> tag right after Shopping Cart and instead accepting </a> after New
Age. Please help!

#2
07-08-2006
Jos van Uden

crescent_au@yahoo.com wrote:

> preg_match_all("/(<a\ class\=link\ href\=(.*)>)(<\/a>)/", $lines, $matches1);
>
> The above RE gives me "Shopping Cart" plus "New Age" as well. I just
> want "Shopping Cart". What am I doing wrong? My RE is somehow ignoring
> </a> tag right after Shopping Cart and instead accepting </a> after New
> Age. Please help!


By default, quantifiers (the "multipliers" such as * and +) are
"greedy" and match as much as possible. You can stop this by placing
a question mark after the quantifier, like (.*?)

Then it will match as little as possible.
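
To make the greedy/lazy difference concrete, here is a minimal sketch run against the markup from the question. It is not the original poster's exact pattern: the /s modifier (so that . also matches the line break inside "Shopping\nCart"), the \s and [^>]* pieces, and the inline $html string standing in for file_get_contents() are all assumptions added for illustration.

```php
<?php
// Sample markup copied from the question, inlined so the sketch runs offline.
$html = <<<'HTML'
<a class=link
href="http://www.something.com/_something.php?type=cart">Shopping
Cart</a>
<div><em class=newentry><a href=http://nothing.com>New
Age</a></em></div>
HTML;

// Greedy: (.*) grabs as much as it can, so the match runs to the LAST </a>.
preg_match('/<a\s+class=link\s[^>]*>(.*)<\/a>/s', $html, $greedy);

// Lazy: (.*?) grabs as little as it can, so the match stops at the FIRST </a>.
preg_match_all('/<a\s+class=link\s[^>]*>(.*?)<\/a>/s', $html, $lazy);

// The link text contains a line break; collapse whitespace runs to one space.
$texts = array_map(
    function ($t) { return trim(preg_replace('/\s+/', ' ', $t)); },
    $lazy[1]
);

print_r($texts);
```

With the sample above, $greedy[1] still contains the "New Age" link, while $texts holds the single entry "Shopping Cart". Real pages with quoted attributes or different attribute order would need a looser pattern.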

Jos

PS. This little prog may be useful if you have trouble
with Regexes: http://www.regexbuddy.com/

(not mine)
