Re: [PHP] spider

#1  03-21-2008  Wolf

---- tedd <tedd@sperling.com> wrote:
> Hi gang:
>
> How do you spider a remote web site in php?
>
> I get the general idea, which is to take the root page, strip out the
> links and repeat the process on those links. But, what's the code?
> Does anyone have an example they can share or a direction for me to
> take?
>
> Also, is there a way to spider through a remote web site gathering
> directory permissions?
>
> I know there are applications, such as Site-sucker, that will travel
> a remote web site looking for anything that it can download and if
> found, do so. But is there a way to determine what the permissions
> are for those directories?
>
> If not, can one attempt to write a file and record the
> failures/successes (0777 directories)?
>
> What I am trying to do is to develop a way to test if a web site is
> secure or not. I'm not trying to develop evil code, but if it can be
> done then I want to know how.
>
> Thanks and Cheers,
>
> tedd
>
> --
> -------
> http://sperling.com http://ancientstones.com http://earthstones.com


In one word: CURL

In another word: WGET

Both are pretty effective and give pretty much the same results; however, with CURL you can pass other things along (user:pass), which you cannot do with wget.
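
For the "take the root page, strip out the links and repeat" idea, here is a minimal shell sketch of one step, built on cURL. It is an illustration only, not code from this thread: fetch_page, extract_links, links_on_page and the SPIDER_AUTH variable are made-up names, and the href matching is a crude regular expression that only sees double-quoted attributes, not a real HTML parser.

```shell
#!/bin/sh
# Sketch only: fetch one page and list the links it contains.
# fetch_page, extract_links, links_on_page and SPIDER_AUTH are hypothetical
# names for this example; they are not part of cURL or wget.

# Fetch a URL to stdout. If SPIDER_AUTH is set (as "user:pass"),
# pass it to curl as HTTP credentials.
fetch_page() {
    if [ -n "${SPIDER_AUTH:-}" ]; then
        curl -fsSL -u "$SPIDER_AUTH" "$1"
    else
        curl -fsSL "$1"
    fi
}

# Read HTML on stdin and print each double-quoted href value once.
extract_links() {
    grep -oiE 'href="[^"]+"' | sed -E 's/^[^"]*"//; s/"$//' | LC_ALL=C sort -u
}

# One step of the spider: list the links found on a single page.
links_on_page() {
    fetch_page "$1" | extract_links
}
```

Calling links_on_page http://example.com/ would print the links of that one page. A full spider would repeat this for every new link while keeping a list of visited URLs and staying on the same host.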

HTH,
Wolf

#2  03-21-2008  Robert Cummings

On Fri, 2008-03-21 at 13:58 -0400, Wolf wrote:
> ---- tedd <tedd@sperling.com> wrote:
>
> In one word: CURL
>
> In another word: WGET
>
> Both are pretty effective and give pretty much the same results; however,
> with CURL you can pass other things along (user:pass), which you cannot
> do with wget.


You can pass user and password via wget also:

wget http://user:password@interjinn.com/privateCrud

Or:

--user=USER --password=PASSWORD

Or more specifically so it doesn't also count for FTP:

--http-user=USER --http-password=PASSWORD

Cheers,
Rob.
--
http://www.interjinn.com
Application and Templating Framework for PHP


#3  03-21-2008  Daniel Brown

On Fri, Mar 21, 2008 at 1:58 PM, Wolf <lonewolf@nc.rr.com> wrote:
>
> In one word: CURL
>
> In another word: WGET
>
> Both are pretty effective and give pretty much the same results; however, with CURL you can pass other things along (user:pass), which you cannot do with wget.


pilotpig@pilotpig.net [~/www/img]# wget --help|grep -i password
--password=PASS set both ftp and http password to PASS.
--http-password=PASS set http password to PASS.
--proxy-password=PASS set PASS as proxy password.
--ftp-password=PASS set ftp password to PASS.

--
</Daniel P. Brown>
Forensic Services, Senior Unix Engineer
1+ (570-) 362-0283

#4  03-21-2008  Børge Holen

On Friday 21 March 2008 18:58:59 Wolf wrote:
> ---- tedd <tedd@sperling.com> wrote:
>
> In one word: CURL
>
> In another word: WGET
>
> Both are pretty effective and give pretty much the same results; however,
> with CURL you can pass other things along (user:pass), which you cannot
> do with wget.


wget is fast and easy though... umm I'm on a direct 100 Mbit connection...
wget does it brute

>
> HTH,
> Wolf




--
---
Børge Holen
http://www.arivene.net

#5  03-23-2008  Michelle Konzack

On 2008-03-21 13:58:59, Wolf wrote:
> Both are pretty effective and give pretty much the same results;
> however, with CURL you can pass other things along (user:pass),
> which you cannot do with wget.


???

wget http://${USER}:${PASS}@some.url.tld/

is working and

wget --http-user="${USER}" --http-passwd="${PASS}" http://some.url.tld/

too. But since the command line is visible to any user on the local machine
who can execute "ps", you can put the user/passwd into your ~/.wgetrc instead.
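
A rough sketch of the ~/.wgetrc approach follows. It is only an illustration under stated assumptions: write_wgetrc is a made-up helper name, the http_user and http_password settings are assumed to be supported by the installed wget (older releases spelled the second one http_passwd), and the helper overwrites whatever file it is pointed at.

```shell
#!/bin/sh
# Sketch only: keep wget credentials in a private startup file
# instead of on the command line.
# write_wgetrc is a hypothetical helper name for this example.

# Usage: write_wgetrc FILE USER PASS
# Overwrites FILE with the two credential settings.
write_wgetrc() {
    (
        umask 077                  # new file is readable by the owner only
        {
            printf 'http_user = %s\n' "$2"
            printf 'http_password = %s\n' "$3"
        } > "$1"
    )
    chmod 600 "$1"                 # tighten permissions if the file already existed
}
```

After something like write_wgetrc "$HOME/.wgetrc" "$USER" "$PASS", a plain wget http://some.url.tld/ should pick the credentials up. To keep them out of the default file, write them elsewhere and point the WGETRC environment variable at that path.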

Thanks, Greetings and nice Day
Michelle Konzack
Systemadministrator
24V Electronic Engineer
Tamay Dogan Network
Debian GNU/Linux Consultant


--
Linux-User #280138 with the Linux Counter, http://counter.li.org/
##################### Debian GNU/Linux Consultant #####################
Michelle Konzack Apt. 917 ICQ #328449886
+49/177/9351947 50, rue de Soultz MSN LinuxMichi
+33/6/61925193 67100 Strasbourg/France IRC #Debian (irc.icq.com)


#6  03-23-2008  Michelle Konzack

On 2008-03-21 19:15:13, Børge Holen wrote:
> wget is fast and easy though... umm I'm on a direct 100 Mbit connection...
> wget does it brute


Sometimes it is too fast for me... :-)
Especially if I work in Paris on my Dual-STM-4 network...

Then, --limit-rate=<rate> is my friend.
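
As a small sketch of how --limit-rate might be combined with a recursive fetch: polite_fetch and the WGET override variable are made-up conveniences for this example, and the depth, wait time and default rate are arbitrary assumptions, not recommendations from this thread.

```shell
#!/bin/sh
# Sketch only: a rate-limited recursive fetch.
# polite_fetch and the WGET override are hypothetical conveniences;
# the option values below are arbitrary example settings.

# Usage: polite_fetch URL [RATE]
# RATE is handed to --limit-rate (for example 50k); defaults to 200k here.
polite_fetch() {
    ${WGET:-wget} --recursive --level=2 --no-parent \
        --wait=1 --limit-rate="${2:-200k}" "$1"
}
```

Setting WGET=echo before calling polite_fetch prints the options instead of running wget, which is handy for checking the command before pointing it at a real site.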

Thanks, Greetings and nice Day
Michelle Konzack

#7  03-23-2008  Børge Holen

On Sunday 23 March 2008 12:12:04 Michelle Konzack wrote:
> On 2008-03-21 19:15:13, Børge Holen wrote:
> > wget is fast and easy though... umm I'm on a direct 100 Mbit
> > connection... wget does it brute

>
> Sometimes it is too fast for me... :-)
> Especially if I work in Paris on my Dual-STM-4 network...


wanna share?

>
> Then, --limit-rate=<rate> is my friend.
>
> Thanks, Greetings and nice Day
> Michelle Konzack
> Systemadministrator
> 24V Electronic Engineer
> Tamay Dogan Network
> Debian GNU/Linux Consultant




--
---
Børge Holen
http://www.arivene.net