html source

This is a discussion on html source within the alt.comp.lang.php forums, part of the PHP Programming Forums category.


#1  02-15-2007
yoko
html source

Is there any way to capture the HTML source of a page and grab only the
content inside the body tags, without using fsockopen?

For example, let's say the URL is $url="http://ca3.php.net/manual/en/faq.obtaining.php";

Thanks to everyone who helps.


#2  02-15-2007
Dennis Kehrig
Re: html source

yoko wrote:
> Is there anyway to capture the html source code of a page and only grab
> the content in the body tags without using fsockopen?
> for example lets say the URL is
> $url="http://ca3.php.net/manual/en/faq.obtaining.php";
>
> Thanks to everyone that helps.


Try this (allow_url_fopen must be enabled, which is probably a bad idea
from a security standpoint):

// Get the HTML file
$html = file_get_contents($url);
// Reduce it to the contents of the <body> tag
// (if no <body>...</body> pair is found, $html is returned unchanged)
$body = preg_replace("#^.*<body[^>]*>(.*)</body>.*$#si", "\\1", $html);
// Strip off whitespace at the beginning and the end
$body = trim($body);

Best regards,

Dennis Kehrig
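
As a hedged variation on the snippet above: preg_replace() hands back the whole document when the pattern does not match, so a caller cannot tell "no body found" from "body extracted". The sketch below uses preg_match() instead and returns null in that case. The function name extract_body() is hypothetical (it does not appear in the thread), and a regex remains only an approximation of real HTML parsing.

```php
<?php
// Hypothetical helper, not part of the original post.
// Returns the trimmed contents of the <body> element, or null when no
// <body>...</body> pair is found (or the regex engine reports an error).
function extract_body(string $html): ?string
{
    // Capture everything between the first <body ...> tag and the last </body>.
    // s = dot matches newlines, i = case-insensitive.
    if (preg_match('#<body\b[^>]*>(.*)</body\s*>#si', $html, $matches) !== 1) {
        return null;
    }
    return trim($matches[1]);
}
```

With the $html from the post above, usage would be `$body = extract_body($html);` followed by a check for null before the result is used.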
#3  02-16-2007
yoko
Re: html source


That worked with no problems. What about cURL? Is that a good method as well?

Hello Dennis,

> // Get the HTML file
> $html = file_get_contents($url);
> // Reduce it to the contents of the <body> tag
> $body = preg_replace("#^.*<body[^>]*>(.*)</body>.*$#si", "\\1",
> $html);
> // Strip of whitespace at the beginning and the end
> $body = trim($body)



#4  02-16-2007
Rik
Re: html source

On Fri, 16 Feb 2007 05:05:41 +0100, yoko <nana@na.ca> wrote:

>> // Get the HTML file
>> $html = file_get_contents($url);
>> // Reduce it to the contents of the <body> tag
>> $body = preg_replace("#^.*<body[^>]*>(.*)</body>.*$#si", "\\1",
>> $html);
>> // Strip of whitespace at the beginning and the end
>> $body = trim($body);

>
> That worked no problems. What about cURL is that a good method as
> well?


The 'body' of a response in cURL is the entire HTML document, just
without the HTTP headers (so _not_ without the HTML head). There is no
extra functionality there to get only the contents of the <body> tag.

Using cURL is useful when:
- You might be redirected; cURL will follow the redirect if you tell it
to.
- You want to send cookies or POST values to get the content.
--
Rik Wasmus
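
To make the two points above concrete, here is a minimal sketch of a cURL fetch that follows redirects, keeps cookies for the duration of the request, and optionally sends POST values. The function names build_curl_options() and fetch_html(), the redirect cap, and the timeout are assumptions chosen for illustration, not something from the thread. The returned string is still the full document, so the <body> extraction step is needed afterwards.

```php
<?php
// Hypothetical helper: builds the option array for one cURL request.
function build_curl_options(string $url, array $postFields = []): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // return the response instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow "Location:" redirects
        CURLOPT_MAXREDIRS      => 5,    // arbitrary cap (assumption)
        CURLOPT_TIMEOUT        => 10,   // seconds (assumption)
        CURLOPT_COOKIEFILE     => '',   // empty string enables in-memory cookie handling
    ];
    if ($postFields !== []) {
        $options[CURLOPT_POST]       = true;
        $options[CURLOPT_POSTFIELDS] = http_build_query($postFields);
    }
    return $options;
}

// Hypothetical helper: returns the response body (the whole HTML document,
// without HTTP headers), or null if the transfer failed.
function fetch_html(string $url, array $postFields = []): ?string
{
    $ch = curl_init();
    curl_setopt_array($ch, build_curl_options($url, $postFields));
    $html = curl_exec($ch);
    return $html === false ? null : $html;
}
```

A call such as `$html = fetch_html($url);` would stand in for file_get_contents($url) and does not depend on allow_url_fopen, though it does require the cURL extension.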
#5  02-23-2007
Lorenzo Bettini
Re: html source

yoko wrote:
> Is there anyway to capture the html source code of a page and only grab
> the content in the body tags without using fsockopen?
> for example lets say the URL is
> $url="http://ca3.php.net/manual/en/faq.obtaining.php";
>


Here's my version:
http://tronprog.blogspot.com/2007/02...dy-in-php.html

Hope this helps.

--
Lorenzo Bettini, PhD in Computer Science, DSI, Univ. di Firenze
ICQ# lbetto, 16080134 (GNU/Linux User # 158233)
HOME: http://www.lorenzobettini.it MUSIC: http://www.purplesucker.com
BLOGS: http://tronprog.blogspot.com http://longlivemusic.blogspot.com
http://www.gnu.org/software/src-highlite
http://www.gnu.org/software/gengetopt
http://www.gnu.org/software/gengen http://doublecpp.sourceforge.net