This is a discussion on "html source" within the alt.comp.lang.php forums, part of the PHP Programming Forums category.
yoko wrote:
> Is there any way to capture the HTML source code of a page and grab only
> the content inside the body tags, without using fsockopen?
> For example, let's say the URL is
> $url="http://ca3.php.net/manual/en/faq.obtaining.php";
>
> Thanks to everyone that helps.

Try this (allow_url_fopen needs to be enabled, which is probably a bad idea):

// Get the HTML file
$html = file_get_contents($url);
// Reduce it to the contents of the <body> tag
$body = preg_replace("#^.*<body[^>]*>(.*)</body>.*$#si", "\\1", $html);
// Strip off whitespace at the beginning and the end
$body = trim($body);

Best regards,
Dennis Kehrig
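One caveat with the preg_replace() version: if the page has no <body> tag, the pattern does not match and preg_replace() returns the whole document unchanged, so the caller cannot tell that anything went wrong. Below is a minimal sketch that uses preg_match() instead and returns null when no <body>...</body> pair is found. It assumes PHP 7.1 or later (for the nullable return type), and the function name extract_body() is hypothetical, not something from the post or from PHP itself:

```php
<?php
// extract_body() is a hypothetical helper, not a PHP built-in.
// Requires PHP 7.1+ for the nullable return type.
function extract_body(string $html): ?string
{
    // preg_match() searches anywhere in the string, so the leading "^.*" and
    // trailing ".*$" from the preg_replace() version are not needed.
    // The greedy (.*) runs up to the LAST </body>; the "s" modifier lets "."
    // match newlines and "i" makes the tag names case-insensitive.
    if (preg_match('#<body[^>]*>(.*)</body>#si', $html, $m) !== 1) {
        // No <body>...</body> pair was found (or PCRE reported an error).
        return null;
    }
    // Strip whitespace at the beginning and the end, as in the original.
    return trim($m[1]);
}

// Usage, assuming allow_url_fopen is enabled as in the post above:
// $html = file_get_contents($url);
// $body = ($html === false) ? null : extract_body($html);
```

Like the original, this is still a regex over HTML, so it can be fooled by unusual markup (for example a literal "</body>" inside a script); it only makes the "no match" case explicit.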
Hello Dennis,

> // Get the HTML file
> $html = file_get_contents($url);
> // Reduce it to the contents of the <body> tag
> $body = preg_replace("#^.*<body[^>]*>(.*)</body>.*$#si", "\\1", $html);
> // Strip off whitespace at the beginning and the end
> $body = trim($body);

That worked, no problems. What about cURL? Is that a good method as well?
On Fri, 16 Feb 2007 05:05:41 +0100, yoko <nana@na.ca> wrote:
> That worked, no problems. What about cURL? Is that a good method as
> well?

The 'body' of a cURL response is the entire HTML document, just without the HTTP headers (so _not_ without the HTML <head>). There is no extra functionality there to get only the <body>.

Using cURL is useful when:
- you might be redirected: cURL will follow the redirect if you tell it to;
- you want to send cookies or POST values to get the content.

--
Rik Wasmus
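To make those two cURL use cases concrete, here is a rough sketch using PHP's cURL extension (which must be installed; PHP 7.1+ is assumed for the nullable types). CURLOPT_FOLLOWLOCATION makes cURL follow redirects, CURLOPT_COOKIE sends a Cookie header, and setting CURLOPT_POSTFIELDS turns the request into a POST. The helper name fetch_with_curl() and its parameters are made up for illustration. As noted above, the result is still the whole document, so the <body> part would still have to be cut out separately, for example with the regex from Dennis's post:

```php
<?php
// fetch_with_curl() is a hypothetical helper name; it needs the PHP cURL
// extension and PHP 7.1+ (nullable parameter types).
function fetch_with_curl(string $url, ?string $cookie = null, ?array $post = null): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the response instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow "Location:" redirects
        CURLOPT_MAXREDIRS      => 5,    // ...but give up after five of them
    ]);
    if ($cookie !== null) {
        // Raw Cookie header value, e.g. "name=value; other=value2".
        curl_setopt($ch, CURLOPT_COOKIE, $cookie);
    }
    if ($post !== null) {
        // Setting POST fields makes cURL send a POST request; http_build_query()
        // encodes them as application/x-www-form-urlencoded.
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($post));
    }
    // Response body only: HTTP headers are not included (CURLOPT_HEADER is
    // off by default), but the HTML <head> is. false means the request failed.
    $response = curl_exec($ch);
    return $response === false ? null : $response;
}
```

A call such as fetch_with_curl($url, 'session=abc', ['q' => 'php']) would send a cookie and POST data; the cookie and field values there are placeholders, not anything a particular site expects.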
yoko wrote:
> Is there any way to capture the HTML source code of a page and grab only
> the content inside the body tags, without using fsockopen?
> For example, let's say the URL is
> $url="http://ca3.php.net/manual/en/faq.obtaining.php";

Here's my version:
http://tronprog.blogspot.com/2007/02...dy-in-php.html

Hope this helps.

--
Lorenzo Bettini, PhD in Computer Science, DSI, Univ. di Firenze
ICQ# lbetto, 16080134 (GNU/Linux User # 158233)
HOME: http://www.lorenzobettini.it
MUSIC: http://www.purplesucker.com
BLOGS: http://tronprog.blogspot.com
http://longlivemusic.blogspot.com
http://www.gnu.org/software/src-highlite
http://www.gnu.org/software/gengetopt
http://www.gnu.org/software/gengen
http://doublecpp.sourceforge.net