This is a discussion on finding a specific area from page using regular expression within the PHP Language forums, part of the PHP Programming Forums category.
Hi friends,

I have a bunch of HTML pages and I want to fetch records from them. After working with regular expressions and other things for the last few days, I am really confused about how to do it. Can anyone help me with this?

The pages contain HTML with tables scattered all over. I want just one specific table from each page and all the records in it. I was somewhat successful, but I still have problems. Here they are.

My example page (the real page has all the usual tags like <html>; I have written only the part I want here):

<table>
<tr>
<img src=..">
</tr>
<tr>
<table>
<tr>
<tr>
<td>
<b>name1</B>
<br>
<font size=2 color=darkgray ><i>address1</i></font><br>
<br>phone no
| <a href=mailto:mail@gmail.com>E-mail1 </a>
| <a href='www.website.com' target=_blank>website1</a>
</font>
</td>
</tr>
</tr>
<tr>
<b> name2</b>
.........
</tr>
</table>
</tr>
</table>

From that table I want the name, address, phone number, e-mail, and website. Using the preg_replace function I was able to find all of those, but it removes the <A> tags, so the e-mail and website are removed too. Can anyone tell me how I can find the e-mail and website in that code first, so that I can then get the other records using preg_replace? Or can anyone suggest a better solution? Currently I use a while loop with an if condition to break at the main table and then fetch each record.
Hardik Dangar wrote:
> [snip]

Check out the DOM functions: http://uk.php.net/manual/en/ref.dom.php
On Aug 4, 10:26 pm, "Paul Lautman" <paul.laut...@btinternet.com> wrote:
> Hardik Dangar wrote:
> > [snip]
>
> Check out the DOM functions: http://uk.php.net/manual/en/ref.dom.php

@paul
Thanks for the help, but can you explain how I can use it? I didn't get it. I guess it's for working with XML, so how can I use it for my problem?
Hardik Dangar wrote:
> [snip]
> @paul
> Thanks for the help, but can you explain how I can use it?
> I didn't get it. I guess it's for working with XML, so how can I use it
> for my problem?

As long as your HTML is "well formed", you can use the DOM functions to process it.
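
As a rough illustration of that advice, here is a minimal sketch of loading an HTML string with DOMDocument::loadHTML. The sample markup is a made-up fragment in the style of the original post, not the real page. In practice loadHTML is fairly forgiving of older markup like this, and libxml_use_internal_errors() keeps any parser warnings from being printed.

```php
<?php
// Hypothetical sample input; real pages would be read with file_get_contents() or similar.
$html = "<table><tr><td><b>name1</B><br>"
      . "<font size=2 color=darkgray><i>address1</i></font></td></tr></table>";

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($html);   // returns true on success
libxml_clear_errors();

// Once loaded, elements can be looked up by tag name.
$cells = $doc->getElementsByTagName('td');
$bold  = $doc->getElementsByTagName('b');
echo $cells->length, " cell(s); first name: ", trim($bold->item(0)->textContent), "\n";
```

The same calls work with loadHTMLFile() when the markup lives in a file instead of a string.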
On Aug 4, 11:16 pm, "Paul Lautman" <paul.laut...@btinternet.com> wrote:
> [snip]
> As long as your HTML is "well formed", you can use the DOM functions to
> process it.

@paul
I have looked at the documentation again and found loadHTMLFile, but I am still very confused about how to get my table data from a web page using those functions. If you know how, please give me a simple example. I keep reading about the DOM and am getting interested in it.
Thank you very much for helping.
On Aug 4, 11:34 am, Hardik Dangar <hardikdan...@gmail.com> wrote:
> [snip]
> I have looked at the documentation again and found loadHTMLFile, but I am
> still very confused about how to get my table data from a web page using
> those functions.

Use the XPath functionality of the DOM functions to extract the tags you need.
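
Since a simple example was asked for, here is one possible sketch of the XPath approach. It is not code from anyone in this thread. The sample markup, the example.com addresses, and the assumption that each record sits in its own <td> with the name in <b> and the address in <i> are all made up for illustration; the real pages may need different XPath expressions.

```php
<?php
// Sample markup modelled on the table in the original post (two made-up records).
$html = <<<HTML
<table>
<tr><td>
<b>name1</b><br>
<font size=2 color=darkgray><i>address1</i></font><br>
<br>phone1 | <a href="mailto:mail1@example.com">E-mail1</a> | <a href="http://www.example.com" target=_blank>website1</a>
</td></tr>
<tr><td>
<b>name2</b><br>
<font size=2 color=darkgray><i>address2</i></font><br>
<br>phone2 | <a href="mailto:mail2@example.org">E-mail2</a> | <a href="http://www.example.org" target=_blank>website2</a>
</td></tr>
</table>
HTML;

libxml_use_internal_errors(true);   // keep parser warnings quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$records = array();

// Assumption: one record per <td> that has a <b> child holding the name.
foreach ($xpath->query('//td[b]') as $td) {
    $email = $xpath->evaluate('string(.//a[starts-with(@href, "mailto:")]/@href)', $td);
    // Assumption: the phone number is the text just before the first "|" separator.
    $phone = $xpath->evaluate('string(text()[contains(., "|")])', $td);
    $records[] = array(
        'name'    => trim($xpath->evaluate('string(b)', $td)),
        'address' => trim($xpath->evaluate('string(.//i)', $td)),
        'phone'   => trim($phone, " |\r\n\t"),
        'email'   => substr($email, strlen('mailto:')),
        'website' => $xpath->evaluate('string(.//a[not(starts-with(@href, "mailto:"))]/@href)', $td),
    );
}

print_r($records);
```

Because the links are read from the href attributes, nothing has to be stripped with preg_replace first, so the e-mail and website are not lost.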
On Aug 4, 12:23 pm, Hardik Dangar <hardikdan...@gmail.com> wrote:
[snip]
> From that table I want the name, address, phone number, e-mail, and
> website. Using the preg_replace function I was able to find all of those,
> but it removes the <A> tags, so the e-mail and website are removed too.
> [snip]

This should do what you want (retrieving the email and URL):

$matches = array();

// .*? is lazy, so the match stops at the first link after the mailto: link
preg_match('/<a href=[\'"]?mailto:([^>\'"\s]+).*?<a href=[\'"]?([^>\'"\s]+)/s', $yourPageContents, $matches);

print "email: $matches[1], url: $matches[2]";

HTH
-Kurt
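
preg_match stops after the first match, so on a page with several records it would only return the first e-mail and URL. A possible variation, sketched here with made-up page contents, uses preg_match_all with a lazy .*? between the two links so that each e-mail is paired with the link that follows it. The example.com addresses are placeholders.

```php
<?php
// Hypothetical page contents with two records, using the link layout from the post.
$yourPageContents =
    "<b>name1</b> phone1 | <a href=mailto:mail1@example.com>E-mail1 </a> | " .
    "<a href='http://www.example.com' target=_blank>website1</a>\n" .
    "<b>name2</b> phone2 | <a href=mailto:mail2@example.org>E-mail2 </a> | " .
    "<a href='http://www.example.org' target=_blank>website2</a>\n";

// Same idea as the pattern above, with a lazy .*? between the two links.
$pattern = '/<a href=[\'"]?mailto:([^>\'"\s]+).*?<a href=[\'"]?([^>\'"\s]+)/s';

$matches = array();
// PREG_SET_ORDER groups the captures per record: $matches[0] is the first record, and so on.
$count = preg_match_all($pattern, $yourPageContents, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    print "email: $m[1], url: $m[2]\n";
}
```

This still depends on every record having both links in the same order; a record with a missing website link would be paired with the next link on the page.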
On Aug 6, 7:21 am, Kurt Milligan <kurt.milli...@gmail.com> wrote:
> [snip]
> This should do what you want (retrieving the email and URL):
> [snip]

Thank you, everyone. I am almost at the end of the work, but there is a new problem :(

I tried the DOM functions and they worked fine at home, but on the server I get this error:

domdocument() expects at least 1 parameter, 0 given in /home/sphere/public_html/hardik/curl/temp2.php on line 2

Here is my code:

$doc = new DOMDocument();
$doc->loadHTML($str);

Can anyone suggest what the problem is?
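
The thread does not include an answer to this error. One possible explanation, which is only an assumption and is not confirmed here, is that the server runs an older PHP setup with the PHP 4 domxml extension loaded, whose DomDocument class behaves differently from the PHP 5 DOM extension used at home. A small diagnostic sketch like the following, run on both machines, would show whether the environments differ:

```php
<?php
// Report the PHP version and which DOM-related extensions are loaded.
$info = array(
    'php_version' => phpversion(),
    'dom'         => extension_loaded('dom'),     // PHP 5+ DOM extension
    'domxml'      => extension_loaded('domxml'),  // older PHP 4 DOM XML extension
);
print_r($info);
```

If the output differs between home and server, the hosting provider would be the one to ask about which PHP version and extensions are available.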