This is a discussion on using PHP to parse through HTML within the PHP Language forums, part of the PHP Programming Forums category.
Hi,

I'm using PHP 4 and trying to parse through HTML to look for HREF attributes of anchor tags and SRC attributes of IMG tags. Does anyone know of any libraries/freeware to help parse through HTML to find these things? Right now, I'm doing a lot of "strstr" calls, but there is probably a better way to do what I need.

Thanks for any help,
- Dave
On 19 Feb 2005 11:49:24 -0800, laredotornado@gmail.com wrote:

>Hi, I'm using PHP 4 and trying to parse through HTML to look for HREF
>attributes of anchor tags and SRC attributes of IMG tags. Does anyone
>know of any libraries/freeware to help parse through HTML to find these
>things. Right now, I'm doing a lot of "strstr" calls, but there is
>probably a better way to do what I need.

Haven't used it myself, but seen mentions of:

http://pear.php.net/package/XML_HTMLSax

... which looks possibly suitable from the description on the page.

--
Andy Hassall / <andy@andyh.co.uk> / <http://www.andyh.co.uk>
<http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool
laredotornado@gmail.com wrote in
news:1108842564.846225.81750@c13g2000cwb.googlegroups.com:

> Hi, I'm using PHP 4 and trying to parse through HTML to look for HREF
> attributes of anchor tags and SRC attributes of IMG tags. Does anyone
> know of any libraries/freeware to help parse through HTML to find these
> things. Right now, I'm doing a lot of "strstr" calls, but there is
> probably a better way to do what I need.

Take a look at preg_split()
http://www.php.net/manual/en/function.preg-split.php

--
Dave Patton
Canadian Coordinator, Degree Confluence Project
http://www.confluence.org/
My website: http://members.shaw.ca/davepatton/
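As a rough illustration of the preg_split() suggestion, here is a minimal sketch that splits a page into tags and text and keeps only the tags. The sample markup and the variable names are made up for the example, and the pattern is a simplification that assumes no ">" appears inside an attribute value.

```php
<?php
// Hypothetical sample input, not taken from the thread.
$html = '<p>See <a href="a.html">A</a> and <img src="b.png"></p>';

// Split on anything that looks like a tag. PREG_SPLIT_DELIM_CAPTURE keeps the
// tags themselves in the result; PREG_SPLIT_NO_EMPTY drops empty pieces.
$pieces = preg_split(
    '/(<[^>]+>)/',
    $html,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

// Keep only the pieces that start with "<", i.e. the tags.
$tags = array();
foreach ($pieces as $piece) {
    if ($piece[0] === '<') {
        $tags[] = $piece;
    }
}

print_r($tags);
```

Each entry in $tags would still need a second pass to pull out the HREF or SRC value, so this is only the first half of the job.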
Too bad none of the examples work. I untarred/uncompressed the file, copied the folder to a public html directory and then every time I try and launch an example, I get errors like

Warning: main(XML/HTMLSax/XML_HTMLSax_States.php): failed to open stream: No such file or directory in /usr/local/apache/htdocs/temp/XML/XML_HTMLSax.php on line 36

Fatal error: main(): Failed opening required 'XML/HTMLSax/XML_HTMLSax_States.php' (include_path='.:/usr/local/lib/php') in /usr/local/apache/htdocs/temp/XML/XML_HTMLSax.php on line 36

Andy Hassall wrote:
> On 19 Feb 2005 11:49:24 -0800, laredotornado@gmail.com wrote:
>
> >Hi, I'm using PHP 4 and trying to parse through HTML to look for HREF
> >attributes of anchor tags and SRC attributes of IMG tags. Does anyone
> >know of any libraries/freeware to help parse through HTML to find these
> >things. Right now, I'm doing a lot of "strstr" calls, but there is
> >probably a better way to do what I need.
>
> Haven't used it myself, but seen mentions of:
>
> http://pear.php.net/package/XML_HTMLSax
>
> ... which looks possibly suitable from the description on the page.
>
> --
> Andy Hassall / <andy@andyh.co.uk> / <http://www.andyh.co.uk>
> <http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool
"laredotornado" wrote:
> Hi, I'm using PHP 4 and trying to parse through HTML to look for HREF
> attributes of anchor tags and SRC attributes of IMG tags. Does anyone
> know of any libraries/freeware to help parse through HTML to find these
> things. Right now, I'm doing a lot of "strstr" calls, but there is
> probably a better way to do what I need.
>
> Thanks for any help, - Dave

strstr is the LAST thing you want to do in this case! I don't know of libraries, but you can use preg_match to grab the tags that you need.

If you are into php, learning preg_match and regular expressions in general is almost a must.. it will substantially increase the power of your code.

steve

--
Posted using the http://www.dbforumz.com interface, at author's request
Articles individually checked for conformance to usenet standards
Topic URL: http://www.dbforumz.com/PHP-parse-HT...ict199658.html
Visit Topic URL to contact author (reg. req'd). Report abuse: http://www.dbforumz.com/eform.php?p=677948
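To make the regex suggestion concrete, here is a minimal sketch using preg_match_all() to collect HREF values from anchor tags and SRC values from IMG tags. The helper name extract_attr() and the sample markup are invented for the example. The pattern assumes quoted attribute values and reasonably well-formed tags; it is not a full HTML parser, and it would also pick up attributes whose names merely end in "-href" or "-src".

```php
<?php
// Hypothetical sample input, not taken from the thread.
$html = '<a href="page.html">x</a> <A HREF = \'other.html\'>y</A> '
      . '<img src="pic.png" alt="">';

// Illustrative helper (not a library function): returns the values of $attr
// found on <$tag ...> elements. Matching is case-insensitive, allows spaces
// around "=", and accepts single or double quotes.
function extract_attr($html, $tag, $attr)
{
    $pattern = '/<' . $tag . '\b[^>]*?\b' . $attr . '\s*=\s*(["\'])(.*?)\1/is';
    preg_match_all($pattern, $html, $matches);
    return $matches[2];
}

$links  = extract_attr($html, 'a', 'href');
$images = extract_attr($html, 'img', 'src');

print_r($links);
print_r($images);
```

One preg_match_all() call per tag type replaces the chain of strstr() calls, at the cost of a pattern that has to be tested against the kind of markup actually being processed.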
> strstr is the LAST thing you want to do in this case! I don't know
> of libraries, but you can use preg_match to grab the tags that you
> need.
>
> If you are into php, learning preg_match and regular expressions in
> general is almost a must.. it will substantially increase the power
> of your code.
>
> steve

Sorry, can you elaborate on your first statement?
Are you saying that "strstr" is slower than "preg_match"? What about "strpos"?

The reason I ask is, if it was faster to look for a character in a string using "preg_match", then why wouldn't strpos/strstr use it themselves?

I need to look for 2 characters in some data (case sensitive). What would be the fastest way of finding the first occurrence?

$first = strpos( $data, $charA );
$sec = strpos( $data, $charB );
// check for ===false;
return ($first<$sec)?$first:$sec;

// would there be a faster way to achieve the above using "preg_match"?

Simon
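For the two-character search, here is a minimal sketch of three ways to get the offset of whichever character comes first, with the "=== false" checks written out. The function names are invented for the example, and both needles are assumed to be single, non-empty characters. No claim is made about which variant is fastest; that would have to be measured on real data.

```php
<?php
// Variant 1: two strpos() calls with explicit false handling.
function first_of_two($data, $charA, $charB)
{
    $first = strpos($data, $charA);
    $sec   = strpos($data, $charB);

    if ($first === false) {
        return $sec;            // may itself be false if neither is present
    }
    if ($sec === false) {
        return $first;
    }
    return min($first, $sec);
}

// Variant 2: strcspn() returns the length of the leading run of characters
// that are NOT in the given set, which is the offset of the first hit.
function first_of_two_strcspn($data, $charA, $charB)
{
    $pos = strcspn($data, $charA . $charB);
    return ($pos < strlen($data)) ? $pos : false;
}

// Variant 3: preg_match() with a character class and PREG_OFFSET_CAPTURE.
function first_of_two_preg($data, $charA, $charB)
{
    $class = '/[' . preg_quote($charA . $charB, '/') . ']/';
    if (preg_match($class, $data, $m, PREG_OFFSET_CAPTURE)) {
        return $m[0][1];
    }
    return false;
}
```

For example, all three return 4 for the data "hello world" with the characters "w" and "o", and false when neither character occurs.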
On 19 Feb 2005 20:22:22 -0800, laredotornado@zipmail.com wrote:

>Andy Hassall wrote:
>> On 19 Feb 2005 11:49:24 -0800, laredotornado@gmail.com wrote:
>>
>> >Hi, I'm using PHP 4 and trying to parse through HTML to look for HREF
>> >attributes of anchor tags and SRC attributes of IMG tags. Does anyone
>> >know of any libraries/freeware to help parse through HTML to find these
>> >things. Right now, I'm doing a lot of "strstr" calls, but there is
>> >probably a better way to do what I need.
>>
>> Haven't used it myself, but seen mentions of:
>>
>> http://pear.php.net/package/XML_HTMLSax
>>
>> ... which looks possibly suitable from the description on the page.
>
>Too bad none of the examples work. I untarred/uncompressed the file,
>copied the folder to a public html directory

That's not how you're supposed to install PEAR modules; here's an example of how:

root@server:~# pear install http://pear.php.net/get/XML_HTMLSax-2.1.2.tgz
downloading XML_HTMLSax-2.1.2.tgz ...
Starting to download XML_HTMLSax-2.1.2.tgz (16,099 bytes)
.......done: 16,099 bytes
install ok: XML_HTMLSax 2.1.2

You could probably get away with unpacking to a public_html directory, but you'd need to fiddle with your include_path, or else you get errors like:

>Warning: main(XML/HTMLSax/XML_HTMLSax_States.php): failed to open
>stream: No such file or directory in
>/usr/local/apache/htdocs/temp/XML/XML_HTMLSax.php on line 36

The examples work OK for me after installing through pear as above.

--
Andy Hassall / <andy@andyh.co.uk> / <http://www.andyh.co.uk>
<http://www.andyhsoftware.co.uk/space> Space: disk usage analysis tool
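If installing through pear is not an option, the include_path adjustment mentioned above might look like the following minimal sketch. The directory is taken from the error message and is an assumption: it only helps if the files under it are laid out the way the require expects (XML/HTMLSax/...).

```php
<?php
// Assumed location of the unpacked package, copied from the error message.
// Adjust it to the directory that actually contains the XML/ folder.
$pearDir = '/usr/local/apache/htdocs/temp';

// Append that directory to the current include_path for this request.
set_include_path(get_include_path() . PATH_SEPARATOR . $pearDir);

// From here on, a require such as
//   require_once 'XML/HTMLSax/XML_HTMLSax_States.php';
// is also looked up relative to $pearDir.
echo get_include_path(), "\n";
```

This changes the path only for the running script; a permanent change would go in php.ini or the web server configuration instead.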
"Simon" wrote:
>> strstr is the LAST thing you want to do in this case! I don't know
>> of libraries, but you can use preg_match to grab the tags that you
>> need.
>>
>> If you are into php, learning preg_match and regular expressions in
>> general is almost a must.. it will substantially increase the power
>> of your code.
>>
>> steve
>
>Sorry can you elaborate on your first statement.
>Are you saying that "strstr" is slower than "preg_match"? what about
>"strpos"?
>
>The reason I ask is, if it was faster to look for a character in string
>using "preg_match" then why wouldn't strpos/strstr use it themselves?
>
>I need to look for 2 characters in some data, (case sensitive), what
>would be the fastest way of finding the first occurrence?
>
>$first = strpos( $data, $charA );
>$sec = strpos( $data, $charB );
>// check for ===false;
>return ($first<$sec)?$first:$sec;
>
>// would there be a faster way to achieve the above using "preg_match"?
>
>Simon

Simon, in 99% of cases speed does not matter, i.e. you can achieve good speed regardless; it is not something I have ever had to worry about in my code.

The point is that with preg_match and regex, you can achieve with one statement what would take 10 statements without regex. If you ever parse free text in any shape or form, regex is the way to go.

Your example above is simple, and if that is all you need, fine. But as soon as the text has spurious (sp?) spaces, other characters that may or may not be present, and a whole bunch of other conditions outside your control, you need a much more powerful engine, and that is regex.