Extract specific div element from page: a discussion in the PHP General forums, part of the PHP Programming Forums category.
Hey folks,
I need to pull the contents of a specific div out of a page and write them to a separate file. In this instance I am taking everything inside the <div id="content"></div> tags of a WordPress blog; this gives me only the content and not the menus or other stuff. I need to do this because the final document will be converted for viewing on a Palm Pilot.

Is anyone aware of a simple solution to this problem, short of parsing the entire page, starting when I hit that div's opening tag, and stopping when I hit its closing tag? One problem I can see with this method is that I would have to count the divs nested inside that div; otherwise I would stop too early.

Any advice would be greatly appreciated.

Peace and Love,
distatica.

--
---------------------------------
Anthony Hiscox
Video Watch Group
Public Site Currently Under Development
Group Members Site Fully Operational
---------------------------------
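The depth-counting approach described above (start at the opening tag, count nested divs, stop at the matching close) can be sketched in a few lines. This is a minimal sketch, assuming reasonably well-formed markup; `extractDivById` is a hypothetical helper name. It finds tags with a regular expression rather than a real HTML parser, so it will be confused by malformed markup or by comments and scripts that contain div tags. It runs under Node without a browser, which suits a background job.

```javascript
// Hypothetical helper: return the inner HTML of the first <div> whose id
// attribute equals `id`, counting nested <div>s so that an inner </div>
// does not end the match early. Returns null if the div is not found or
// never closed. Assumes well-formed markup (not a real HTML parser).
function extractDivById(html, id) {
  // Matches an opening <div ...> or a closing </div>, case-insensitively.
  const tagRe = /<(\/?)div\b[^>]*>/gi;
  // Matches id="..." or id='...' (preceded by whitespace) inside a tag.
  const idRe = /\sid\s*=\s*(["'])(.*?)\1/i;

  let depth = 0;  // nesting level inside the target div
  let start = -1; // index just after the target's opening tag
  let match;

  while ((match = tagRe.exec(html)) !== null) {
    const isClose = match[1] === "/";
    if (start === -1) {
      // Still looking for the target's opening tag.
      if (!isClose) {
        const idMatch = idRe.exec(match[0]);
        if (idMatch && idMatch[2] === id) {
          start = tagRe.lastIndex;
          depth = 1;
        }
      }
      continue;
    }
    // Inside the target: every <div> goes one level deeper,
    // every </div> comes one level back up.
    depth += isClose ? -1 : 1;
    if (depth === 0) {
      // match.index is where the matching </div> begins.
      return html.slice(start, match.index);
    }
  }
  return null;
}

// Usage: the nested "meta" div does not end the match early.
const page = '<div id="menu">Menu</div>' +
  '<div id="content"><p>Post</p><div class="meta">by me</div></div>' +
  '<div id="footer">Footer</div>';
console.log(extractDivById(page, "content"));
// → <p>Post</p><div class="meta">by me</div>
```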
Or you could just use JavaScript combined with PHP. In JavaScript, something like document.getElementById('tagId').innerHTML will give you the HTML contents of the <div> you specify. Then do something like document.forms[0].elements['formYouSetTo'].value = document.getElementById('tagId').innerHTML. Basically you're setting a hidden form element to the value of the div; then, when you submit the page, you have the content as $_POST['formYouSetTo']. You could have the JS execute in the submit button's onclick handler.

It should be relatively easy if you look up the exact syntax of the JavaScript.

- Daniel
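Daniel's hidden-field idea might look like this in the browser. This is a sketch under assumptions: the form id `exportForm` and the helper name `copyDivIntoField` are hypothetical, and the page is assumed to contain a form with a hidden input such as `<input type="hidden" name="formYouSetTo">`. Note that the property is `innerHTML`; JavaScript property names are case-sensitive.

```javascript
// Copy the inner HTML of the element with id `divId` into the form field
// named `fieldName`, so it is submitted with the form and can be read in
// PHP as $_POST[fieldName]. Returns false if either element is missing.
function copyDivIntoField(doc, form, divId, fieldName) {
  const div = doc.getElementById(divId);
  const field = form.elements[fieldName];
  if (!div || !field) return false;
  field.value = div.innerHTML;
  return true;
}

// Browser wiring (skipped where no DOM exists, e.g. under Node):
// fill the hidden field just before the form is submitted.
if (typeof document !== "undefined") {
  const form = document.getElementById("exportForm"); // hypothetical id
  if (form) {
    form.addEventListener("submit", () => {
      copyDivIntoField(document, form, "content", "formYouSetTo");
    });
  }
}
```

Listening for the form's submit event rather than the button's onclick also covers submission with the Enter key.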
What is your relationship to the WordPress blog? If you have control over it, the easiest way to do this is in the browser with JavaScript. You can then send the contents to the server, using Ajax if need be, where they can be written to the file:

var content = document.getElementById("content").innerHTML;

Or, if the page is being sent back to the server through a form, put the content into a form variable, read that when the page gets back to the server, and then write it to the file.

--
_____________________
Myron Turner
http://www.room535.org
http://www.bstatzero.org
http://www.mturner.org/XML_PullParser/
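Myron's Ajax variant could be sketched with the modern `fetch` API. This is a swapped-in choice: in 2007 this would have been done with XMLHttpRequest. The endpoint `save-content.php` and the helper name `sendDivToServer` are hypothetical. Because the body is a `URLSearchParams` object, the browser sends it as `application/x-www-form-urlencoded`, so the PHP script can read it from `$_POST['content']` and write it to the file.

```javascript
// Send the inner HTML of the element with id `divId` to `url` as a
// form-encoded POST field named "content". `fetchImpl` defaults to the
// global fetch and can be replaced (e.g. for testing).
function sendDivToServer(doc, divId, url, fetchImpl = globalThis.fetch) {
  const div = doc.getElementById(divId);
  if (!div) {
    return Promise.reject(new Error(`no element with id "${divId}"`));
  }
  const body = new URLSearchParams({ content: div.innerHTML });
  return fetchImpl(url, { method: "POST", body });
}
```

In the page itself this would be called as `sendDivToServer(document, "content", "save-content.php")`, which returns the fetch promise so the caller can check the response status.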
Oops, I accidentally sent this directly to CK; my apologies.

Thank you for your replies. The reason I didn't explore the JS route is that this will be running in the background, and I didn't want to have to visit the page in any way. I went looking for an easy way to accomplish this in PHP, but due to malformed HTML on some sites (not WordPress, as far as I am aware) it wasn't going to be so easy. Someone in ##php on irc.freenode.net pointed me to BeautifulSoup, a Python module for scraping pages even if they have bad HTML. Within a minute I had a script that grabbed the parts I wanted and even removed the parts I didn't (such as comments).

Now I have a Python script that runs when I am going to update the docs on my Palm: it grabs the page(s), strips out the unimportant stuff and saves the result to a local directory, and then I have Sunrise parse that into Plucker document format.

Once again, thank you for the responses.

--
Anthony Hiscox