Finding a specific area of a page using regular expressions

  #1 (permalink)  
Old 08-04-2007
Hardik Dangar
 
Posts: n/a
Default finding a specific area from page using regular expression

Hi friends,
I have a bunch of HTML pages and I want to fetch records from them.
After working with regular expressions and other approaches for the
last few days, I am really confused about how to do it. Can anyone
help me with this?

The pages contain HTML with tables scattered throughout. I want just
one specific table from each page and all the records in it. I have
been partly successful but still have problems, described below.

My example page (just the table; the real page has all the usual tags
such as <html>, but I have only written the part I want here):

<table>
<tr>
<img src=..">
</tr>
<tr>
<table>
<tr>
<tr>
<td>
<b>name1</B>
<br>
<font size=2 color=darkgray ><i>address1</i></font><br>
<br>phone no
| <a href=mailto:mail@gmail.com>E-mail1 </a>
| <a href='www.website.com' target=_blank>website1</a>
</font>
</td>
</tr>
</tr>
<tr>
<b> name2</b>
.........
</tr>
</table>
</tr>
</table>

From that table I want the name, address, phone number, email and
website. Using the preg_replace function I was able to find all of
those, but it removes the <a> tags, so the email and website are
removed as well. Can anyone tell me how to find the email and website
in that code first, so that I can then use preg_replace to get the
other records? Or can anyone suggest a better solution? Currently I
use a while loop with an if condition to break at the main table and
then fetch each record.

  #2 (permalink)  
Old 08-04-2007
Paul Lautman
 
Posts: n/a
Default Re: finding a specific area from page using regular expression

Check out the DOM functions:
http://uk.php.net/manual/en/ref.dom.php


  #3 (permalink)  
Old 08-04-2007
Hardik Dangar
 
Posts: n/a
Default Re: finding a specific area from page using regular expression

@paul
Thanks for the help, but can you explain how I can use it?
I didn't understand it; I guess it is for working with XML.
How can I use it for my problem?

  #4 (permalink)  
Old 08-04-2007
Paul Lautman
 
Posts: n/a
Default Re: finding a specific area from page using regular expression

As long as your HTML is "well formed", you can use the DOM functions to
process it.
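
A minimal sketch of what that could look like, assuming PHP 5 or later with the DOM extension enabled. The $html string is a tidied-up, hypothetical version of the example table from the first post. Note that DOMDocument::loadHTML() relies on libxml's HTML parser, which generally tolerates markup that is not strictly well formed; libxml_use_internal_errors() only keeps its parser warnings off the page.

```php
<?php
// Hypothetical sample modelled on the table in the first post.
$html = <<<HTML
<table>
<tr><td>
<b>name1</b><br>
<font size="2" color="darkgray"><i>address1</i></font><br>
<br>phone no
| <a href="mailto:mail@gmail.com">E-mail1 </a>
| <a href="www.website.com" target="_blank">website1</a>
</td></tr>
</table>
HTML;

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Walk every <a> element and read its href attribute.
$links = array();
foreach ($doc->getElementsByTagName('a') as $a) {
    $links[] = $a->getAttribute('href');
}

print_r($links);
```

For a file on disk, DOMDocument::loadHTMLFile() could be used in place of loadHTML().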


  #5 (permalink)  
Old 08-04-2007
Hardik Dangar
 
Posts: n/a
Default Re: finding a specific area from page using regular expression

@paul
I have looked at the documentation again and found loadHTMLFile, but I
am still very confused about how to get my table data from a web page
using those functions. If you know anything, please give me a simple
example. I am reading about this DOM thing again and again and getting
interested in it...
Thank you very much for helping.

  #6 (permalink)  
Old 08-05-2007
Neil
 
Posts: n/a
Default Re: finding a specific area from page using regular expression

Use the XPath functionality of the DOM Functions to extract the tags
you need.
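
As a rough sketch of that approach, assuming PHP 5+ with the DOM extension: the sample markup and the XPath expressions below are illustrative guesses based on the table in the first post, and would need adjusting to the real pages.

```php
<?php
// Hypothetical, cleaned-up sample: one <td> per record inside a nested table.
$html = <<<HTML
<table><tr><td>
<table>
<tr><td><b>name1</b><br><i>address1</i><br>
<a href="mailto:mail@gmail.com">E-mail1</a> | <a href="http://www.website.com">website1</a></td></tr>
<tr><td><b>name2</b><br><i>address2</i><br>
<a href="mailto:two@example.com">E-mail2</a> | <a href="http://www.example.com">website2</a></td></tr>
</table>
</td></tr></table>
HTML;

libxml_use_internal_errors(true);   // keep parser warnings off the page
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$records = array();

// Select only the cells of the inner (nested) table.
foreach ($xpath->query('//table//table//td') as $td) {
    // Relative queries (leading ".") are evaluated against the current cell.
    $name = $xpath->query('.//b', $td)->item(0);
    $addr = $xpath->query('.//i', $td)->item(0);
    $mail = $xpath->query('.//a[starts-with(@href, "mailto:")]/@href', $td)->item(0);
    $site = $xpath->query('.//a[not(starts-with(@href, "mailto:"))]/@href', $td)->item(0);

    $records[] = array(
        'name'    => $name ? trim($name->textContent) : null,
        'address' => $addr ? trim($addr->textContent) : null,
        'email'   => $mail ? substr($mail->nodeValue, strlen('mailto:')) : null,
        'website' => $site ? $site->nodeValue : null,
    );
}

print_r($records);
```

The phone number is bare text inside the cell rather than its own element, so it is left out here and would need separate handling.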

  #7 (permalink)  
Old 08-06-2007
Kurt Milligan
 
Posts: n/a
Default Re: finding a specific area from page using regular expression

On Aug 4, 12:23 pm, Hardik Dangar <hardikdan...@gmail.com> wrote:
[snip]
>
> From that table I want the name, address, phone number, email and
> website. Using the preg_replace function I was able to find all of
> those, but it removes the <a> tags, so the email and website are
> removed as well. Can anyone tell me how to find the email and website
> in that code first, so that I can then use preg_replace to get the
> other records? Or can anyone suggest a better solution? Currently I
> use a while loop with an if condition to break at the main table and
> then fetch each record.


This should do what you want (retrieving the email and URL):

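$matches = array();

// The lazy .*? stops at the first link after the mailto: link instead of the last one on the page.
if (preg_match('/<a href=[\'"]?mailto:([^>\'"\s]+).*?<a href=[\'"]?([^>\'"\s]+)/s', $yourPageContents, $matches)) {
    print "email: $matches[1], url: $matches[2]";
}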
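
Since the page holds several records, a variation using preg_match_all() may be more useful. This is only a sketch: $yourPageContents here is a made-up two-record sample, and the pattern is the one above with a lazy .*? so that each match stops at the first link following the mailto: link. The character classes also accept the unquoted attributes shown in the original markup.

```php
<?php
// Hypothetical page fragment with two records.
$yourPageContents = <<<HTML
<td><b>name1</b> | <a href=mailto:mail@gmail.com>E-mail1 </a>
| <a href='www.website.com' target=_blank>website1</a></td>
<td><b>name2</b> | <a href="mailto:two@example.com">E-mail2</a>
| <a href="www.example.com">website2</a></td>
HTML;

$pattern = '/<a href=[\'"]?mailto:([^>\'"\s]+).*?<a href=[\'"]?([^>\'"\s]+)/is';

// PREG_SET_ORDER groups the captures per record: $m[1] = email, $m[2] = URL.
$matches = array();
preg_match_all($pattern, $yourPageContents, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    print "email: $m[1], url: $m[2]\n";
}
```

Regular expressions like this depend heavily on the exact markup, so any change in attribute order or spacing on the real pages may require adjusting the pattern.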

HTH
-Kurt

  #8 (permalink)  
Old 08-07-2007
Hardik Dangar
 
Posts: n/a
Default Re: finding a specific area from page using regular expression

Thank you everyone. I am almost at the end of the work, but there is a
new problem... :(
I tried using the DOM functions and they worked fine at home, but when
I tried them on the server I got this error:

domdocument() expects at least 1 parameter, 0 given in /home/sphere/public_html/hardik/curl/temp2.php on line 2


Here is my code:


$doc = new DOMDocument();
$doc->loadHTML($str);

Can anyone suggest what the problem is?
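
The thread does not record a resolution. One possible cause, offered only as a guess: the new DOMDocument() call quoted above belongs to the PHP 5 DOM extension, whereas a server still running PHP 4 with the older DOM XML (domxml) extension exposes a differently behaved domdocument that expects arguments. A quick check of what the server actually provides could look like this:

```php
<?php
// Report the PHP version and which DOM-related extensions are loaded.
$info = array(
    'php_version' => phpversion(),
    'dom'         => extension_loaded('dom'),     // PHP 5+ DOM extension
    'domxml'      => extension_loaded('domxml'),  // legacy PHP 4 DOM XML extension
);

print_r($info);
```

If 'dom' is reported as not loaded on the server while it is loaded at home, the difference in environments would be the first thing to look into.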
