detection of a robot in php

#1 (08-09-2006) giminik@gmail.com

Hello everybody :)

A friend recently showed me something odd while playing with the wget
command under Linux. I don't know why it happens, but the result
surprised me:
$ wget http://www.prizee.com/parole.php
--02:35:29-- http://www.prizee.com/parole.php
=> `parole.php'
Resolving www.prizee.com... 213.186.63.5
Connecting to www.prizee.com|213.186.63.5|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: /index.php?joueur=1 [following]
--02:35:30-- http://www.prizee.com/index.php?joueur=1
=> `index.php?joueur=1.1'
Connecting to www.prizee.com|213.186.63.5|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

[ <=> ] 12,521 --.--K/s

02:35:30 (103.57 KB/s) - `index.php?joueur=1.1' saved [12521]


So he gets an HTTP 302 status code, which redirects him to the index
page of the site.
With a browser such as Firefox, IE, or Safari, we get the right page
without any redirection.
After that, I ran some tests. I tried changing wget's user agent
string to identify it as Mozilla, but I got the same result
(redirection). I also tried links (a command-line browser) and curl,
with the same problem.
Here is the output of the curl command:

$ curl -v http://www.prizee.com/parole.php
* About to connect() to www.prizee.com port 80
* Trying 213.186.63.5... connected
* Connected to www.prizee.com (213.186.63.5) port 80
> GET /parole.php HTTP/1.1
> User-Agent: curl/7.15.1 (i686-pc-linux-gnu) libcurl/7.15.1 GnuTLS/1.2.10 zlib/1.2.3 libidn/0.5.15
> Host: www.prizee.com
> Accept: */*
>

< HTTP/1.1 302 Found
< Date: Wed, 09 Aug 2006 00:02:57 GMT
< Server: Apache/1.3.33 (Unix) PHP/4.3.10
< X-Powered-By: PHP/4.3.10
< X-Accelerated-By: PHPA/1.3.3r2
< Expires: Mon, 26 Jul 1997 05:00:00 GMT
< Last-Modified: Wed, 09 Aug 2006 00:02:59 GMT
< Cache-Control: no-cache, must-revalidate
< Pragma: no-cache
< Set-Cookie: COOKIEis_accepted=1; path=/; domain=.prizee.com
< Location: /index.php?joueur=1
< Connection: close
< Transfer-Encoding: chunked
< Content-Type: text/html
* Closing connection #0


So, my question is: how can a web site detect the use of a command-line
tool, as the site above seems to do? Thank you for your answers.

Sorry for my bad English, I'm French ;)

#2 (08-09-2006) Chris Hope

giminik@gmail.com wrote:

> So, my question is: how can a web site detect the use of a command-line
> tool, as the site above seems to do? Thank you for your answers.


I tried both Firefox and Konqueror and they both redirected me to the
second page, so there doesn't appear to be anything different between
using wget and using a graphical browser, at least to me.

You can't detect the use of a command-line tool if it sets the user
agent to match a browser's. For example:

wget --user-agent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"

followed by the URL will tell the website you're using IE on Windows XP.
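
To illustrate why a user-agent check is easy to defeat, here is a minimal sketch of the kind of prefix test a site might apply to the User-Agent header. The function name looks_like_cli_tool and the list of prefixes are assumptions made for this example; nothing in this thread shows what the site actually checks.

```shell
# Hypothetical check, written only to illustrate the point above:
# succeed (exit status 0) when the User-Agent string starts like a
# well-known command-line client, fail otherwise.
looks_like_cli_tool() {
  case "$1" in
    Wget/*|curl/*|Links*|Lynx/*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Calling looks_like_cli_tool "curl/7.15.1 (i686-pc-linux-gnu) libcurl/7.15.1" succeeds, while the spoofed string from the wget command above fails the test, so a check like this only catches clients that announce themselves honestly.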

--
Chris Hope | www.electrictoolbox.com | www.linuxcdmall.com
#3 (08-09-2006) flamer die.spam@hotmail.com


giminik@gmail.com wrote:

> So, my question is: how can a web site detect the use of a command-line
> tool, as the site above seems to do? Thank you for your answers.
>
> Sorry for my bad English, I'm French ;)


It probably just redirects Linux users, and not MS ones, by checking the
user agent.

Flamer.

#4 (08-09-2006) giminik@gmail.com

Thanks for your answers.
I found the problem: it was a session cookie problem.
I just used the wget option --keep-session-cookies together with
--load-cookies to solve it.
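
For anyone finding this later, the fix described above could look roughly like the sketch below. The exact commands were not posted, so the two-step sequence, the helper name fetch_with_session, and the default cookies.txt file name are assumptions. The first request stores the cookies the site sets (including session cookies, which wget would otherwise drop), and the second request sends them back.

```shell
# Hypothetical helper: fetch a URL twice, reusing the cookies from the
# first response on the second request.
#   $1 = URL to fetch
#   $2 = cookie file (assumed default: cookies.txt)
fetch_with_session() {
  url="$1"
  jar="${2:-cookies.txt}"
  # Step 1: discard the body, but save every cookie, session ones included.
  wget --quiet --save-cookies "$jar" --keep-session-cookies -O /dev/null "$url" &&
  # Step 2: request the page again, this time sending the saved cookies,
  # and write the page to standard output.
  wget --quiet --load-cookies "$jar" -O - "$url"
}
```

It would be called as: fetch_with_session http://www.prizee.com/parole.php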
