Extract specific div element from page

#1 | 06-15-2007 | Anthony Hiscox

Hey folks,

I need to pull the contents of a specific div out of a page and write them
to a separate file. In this instance I am taking everything inside the
<div id="content"></div> tags of a WordPress blog, which gives me only the
content and not the menus or other stuff. I need to do this because the
final document will be converted for viewing on a Palm Pilot.

Is anyone aware of a simple solution to this problem, short of parsing the
entire page and starting when I hit that div opening tag, and stopping when
I hit the closing tag? One problem I can see with this method is that I
would have to count divs inside of that div, otherwise I would end too early
on.
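
The div-counting approach described above can be sketched roughly as follows. This is only a minimal illustration in Python using the standard library's html.parser module; the names DivExtractor and extract_div are made up for this example, and it assumes the divs on the page are properly balanced, which malformed pages may not honor.

```python
from html.parser import HTMLParser


class DivExtractor(HTMLParser):
    """Collect the inner HTML of the first <div> whose id matches target_id."""

    def __init__(self, target_id):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.target_id = target_id
        self.depth = 0      # 0 means we are outside the target div
        self.done = False   # set once the matching closing tag is seen
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            # Start capturing at the opening tag of the target div.
            if tag == "div" and dict(attrs).get("id") == self.target_id:
                self.depth = 1
            return
        if tag == "div":
            self.depth += 1  # nested div: one more closing tag to skip
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if self.depth > 0:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth == 0:
            return
        if tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # this closes the target div itself
                return
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth > 0:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth > 0:
            self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        if self.depth > 0:
            self.parts.append(f"<!--{data}-->")


def extract_div(html, target_id):
    """Return the inner HTML of the target div, or None if it is missing or never closed."""
    parser = DivExtractor(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.done else None
```

Calling extract_div(page_source, "content") would return everything between the opening and closing tags of the content div, nested divs included. Because the counter trusts every div to be closed, a page with unbalanced tags can end the capture too early or never end it at all.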

Any advice would be greatly appreciated.

Peace and Love,
distatica.

--
---------------------------------
Anthony Hiscox

Video Watch Group
Public Site Currently Under Development
Group Members Site Fully Operational
---------------------------------

#2 | 06-16-2007 | Dan

Or you could use JavaScript combined with PHP. In JavaScript,
document.getElementById('tagId').innerHTML gives you the HTML contents of
the <div> tag you specify. Then do something like
document.forms[0].formYouSetTo.value =
document.getElementById('tagId').innerHTML. Basically, you set a hidden
form element to the contents of the div; when the page is submitted, the
content is available as $_POST['formYouSetTo']. You could have the JS
run in the submit button's onclick handler.

It should be relatively easy once you look up the exact JavaScript
syntax.

- Daniel

#3 | 06-16-2007 | Myron Turner

What is your relationship to the WordPress blog? If you have control
over it, the easiest way to do this is in the browser with JavaScript.
You can then send the contents to the server, using Ajax if need be,
where they can be written to the file.

var content = document.getElementById("content").innerHTML;

Or, if the page is being sent back to the server through a form, put
the content into a form variable, read that when the page gets back
to the server, and then write it to the file.


--

_____________________
Myron Turner
http://www.room535.org
http://www.bstatzero.org
http://www.mturner.org/XML_PullParser/
#4 | 06-16-2007 | Anthony Hiscox

Oops, I accidentally sent this directly to CK, my apologies.

Thank you for your replies. I didn't explore the JS route because this will
be running in the background, and I didn't want to have to visit the page in
any way. I went looking for an easy way to accomplish this in PHP, but
malformed HTML on some sites (not WordPress, as far as I am aware) meant it
wasn't going to be so easy. Someone in ##php on irc.freenode.net pointed me
to BeautifulSoup, a Python module for scraping pages even if they have bad
HTML. Within a minute I had a script that grabbed the parts I wanted and
even removed the parts I didn't (such as comments). Now I have a Python
script that runs whenever I update the docs on my Palm: it grabs the
page(s), strips out the unimportant stuff, and saves the result to a local
directory, and then I have Sunrise parse that into Plucker document format.

Once again, thank you for the responses.
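
The actual script was not posted, so the following is only a rough sketch of what a BeautifulSoup-based extraction along those lines might look like. It assumes the current bs4 package (pip install beautifulsoup4) rather than the 2007 release of the module, and the function name extract_content and the "comments" id are hypothetical choices for illustration.

```python
from bs4 import BeautifulSoup  # third-party package: beautifulsoup4


def extract_content(html, keep_id="content", drop_ids=("comments",)):
    """Return the inner HTML of <div id=keep_id>, minus any elements listed in drop_ids.

    keep_id matches the div named in the thread; drop_ids is a hypothetical
    list of element ids to strip (for example a blog's comment section).
    Returns None when the div is not found.
    """
    # "html.parser" is the parser bundled with Python; it tolerates sloppy markup.
    soup = BeautifulSoup(html, "html.parser")
    content = soup.find("div", id=keep_id)
    if content is None:
        return None
    for drop_id in drop_ids:
        for unwanted in content.find_all(id=drop_id):
            unwanted.decompose()  # remove the element and everything inside it
    return content.decode_contents()
```

The returned string could then be written to a file in a local directory and handed to whatever converter produces the handheld format.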


