Re: [PHP] Parse domain from URL

#1  06-06-2007  Daniel Brown
Re: [PHP] Parse domain from URL

On 6/6/07, Brad Fuller <bfuller@cpacampaigns.com> wrote:
> Hey guys,
>
> I'm faced with an interesting problem, and wondering if there's an easy
> solution.
>
> I need to strip out a domain name from a URL, and ignore subdomains (like
> www)
>
> I can use parse_url to get the hostname. And my first thought was to take
> the last 2 segments of the hostname to get the domain. So if the URL is
> http://www.example.com/
> Then the domain is "example.com." If the URL is http://example.org/ then
> the domain is "example.org."
>
> This seemed to work perfectly until I came across a URL like
> http://www.example.co.uk/
> My script thinks the domain is "co.uk."
>
> So I added a bit of code to account for this, basically if the 2nd to last
> segment of the hostname is "co" then take the last 3 segments.
>
> Then I stumbled across a URL like http://www.example.com.au/
>
> So it occurred to me that this is not the best solution, unless I have a
> definitive list of all exceptions to go off of.
>
> Does anyone have any suggestions?
>
> Any advice is much appreciated.
>
> Thanks,
> Brad
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>
>


Well, it's not very clean, but if you just need to remove the
subdomain/CNAME from the domain....

<?php
// Drop the first label of the host name, assuming it is the subdomain.
$hostname = $_SERVER['SERVER_NAME'];
$domsplit = explode('.', $hostname);
// Rejoin everything after the first label; $domain is always initialised.
$domain = implode('.', array_slice($domsplit, 1));
echo $domain;
?>

There's probably a much better way to do it, but in the interest
of a quick response, that's one way.

--
Daniel P. Brown
[office] (570-) 587-7080 Ext. 272
[mobile] (570-) 766-8107

#2  06-06-2007  Brad Fuller
RE: [PHP] Parse domain from URL

Dan,

Yes, that's basically what my code already does.

The problem is: what if the URL is "http://yahoo.co.uk/" (note the lack
of a subdomain)?

Your script thinks that the domain is "co.uk". Just like my existing code
does.

So we can't count on taking the last 2 segments. And we can't count on
ignoring the first segment. (The subdomain could be anything, not just www)
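
To make the two failure modes concrete, here is a minimal sketch of both
heuristics side by side. The function names last_two_labels() and
drop_first_label() are made up for illustration; only parse_url() and the
example URLs come from the discussion above.

```php
<?php
// Hypothetical helpers illustrating the two heuristics discussed above.

// Heuristic A: keep only the last two labels of the host name.
function last_two_labels($url) {
    $host = parse_url($url, PHP_URL_HOST);
    $labels = explode('.', $host);
    return implode('.', array_slice($labels, -2));
}

// Heuristic B: drop the first label, assuming it is a subdomain.
function drop_first_label($url) {
    $host = parse_url($url, PHP_URL_HOST);
    $labels = explode('.', $host);
    return implode('.', array_slice($labels, 1));
}

$urls = array(
    'http://www.example.com/',
    'http://www.example.co.uk/',
    'http://yahoo.co.uk/',
);
foreach ($urls as $url) {
    echo $url, ' => ', last_two_labels($url), ' | ', drop_first_label($url), "\n";
}
```

Heuristic A gets "co.uk" for both .co.uk URLs, and heuristic B gets
"co.uk" for the one without a subdomain, so neither rule is safe alone.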

Thx,

Brad

#3  06-06-2007  Richard Davey
Re[2]: [PHP] Parse domain from URL

Hi Brad,

Wednesday, June 6, 2007, 5:04:41 PM, you wrote:

> Yes, that's basically what my code already does.


> The problem is that what if the url is "http://yahoo.co.uk/" (note the lack
> of a subdomain)


> Your script thinks that the domain is "co.uk". Just like my existing code
> does.


> So we can't count on taking the last 2 segments. And we can't count on
> ignoring the first segment. (The subdomain could be anything, not just www)


The complete list of top-level domains is not very long, nor does it
change very often. A country code or suffix is added or renamed now and
again, but it doesn't happen that much.

In short, I think it'd be much better for you to obtain a full list
and then just remove that element from your hostname. Then you'll know
for a fact that whatever is left is the pure domain (+ sub-domain).

Of course it won't help if someone gives you the IP of a site (or any
other similar variation) instead of a domain name :)
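
A rough sketch of that list-based idea might look like the following. The
$suffixes array is a tiny, deliberately incomplete stand-in for a real
list, and registrable_domain() is a hypothetical name, not an existing
PHP function.

```php
<?php
// Illustrative and incomplete suffix list (an assumption for this sketch).
// A real list would have to be obtained and kept up to date separately.
$suffixes = array('com', 'org', 'co.uk', 'com.au');

// Return one label plus the longest matching suffix, or null when the
// host is an IP address or ends in a suffix that is not on the list.
function registrable_domain($host, array $suffixes) {
    $host = strtolower(rtrim($host, '.'));
    if (filter_var($host, FILTER_VALIDATE_IP) !== false) {
        return null;
    }
    $labels = explode('.', $host);
    $count = count($labels);
    // Start at index 1 so at least one label is left in front of the
    // suffix; earlier indexes give longer candidates, so the longest
    // matching suffix wins.
    for ($i = 1; $i < $count; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (in_array($candidate, $suffixes, true)) {
            return $labels[$i - 1] . '.' . $candidate;
        }
    }
    return null;
}

echo registrable_domain('www.example.co.uk', $suffixes), "\n";
echo registrable_domain('yahoo.co.uk', $suffixes), "\n";
```

The result is only as good as the list: a suffix missing from it makes
the function return null rather than guess.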

Cheers,

Rich
--
Zend Certified Engineer
http://www.corephp.co.uk

"Never trust a computer you can't throw out of a window"

#4  06-07-2007  Robin Vickery
Re: [PHP] Parse domain from URL

In that case you can't do it by parsing alone; you need to use DNS.

<?php
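function get_domain ($hostname) {
    // $authns receives the authority section of the DNS response. It can
    // come back empty, so check before indexing into it.
    dns_get_record($hostname, DNS_A, $authns, $addt);
    return isset($authns[0]['host']) ? $authns[0]['host'] : null;
}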

print get_domain("www.google.com") . "\n";
print get_domain("google.com") . "\n";
print get_domain("www.google.co.uk") . "\n";
print get_domain("google.co.uk") . "\n";
print get_domain("google.co.uk") . "\n";
print get_domain("google.com.au") . "\n";
print get_domain("www.google.com.au") . "\n";

/* result
google.com
google.com
google.co.uk
google.co.uk
google.co.uk
google.com.au
google.com.au
*/
?>

#5  06-07-2007  Daniel Brown
Re: [PHP] Parse domain from URL

Wow.... great job, Robin.... I didn't even know about the
dns_get_record() function myself until just now. I can actually think
of a few places to use that now.... email validation, for one.

--
Daniel P. Brown
[office] (570-) 587-7080 Ext. 272
[mobile] (570-) 766-8107

#6  06-07-2007  Brad Fuller
RE: [PHP] Parse domain from URL

Robin,

This is a very good solution, and I thank you for your response. However I
had been experimenting with dns_get_record() before my original post and it
produces strange results on my machine. And your example, on my machine,
produces no output.

<?php
$dns_result = dns_get_record("www.google.com", DNS_A, $authns, $addt);

print_r($dns_result);
print_r($authns);
print_r($addt);

/* result
Array
(
[0] => Array
(
[host] => www.l.google.com
[type] => A
[ip] => 64.233.161.99
[class] => IN
[ttl] => 136
)

[1] => Array
(
[host] => www.l.google.com
[type] => A
[ip] => 64.233.161.147
[class] => IN
[ttl] => 136
)

[2] => Array
(
[host] => www.l.google.com
[type] => A
[ip] => 64.233.161.103
[class] => IN
[ttl] => 136
)

[3] => Array
(
[host] => www.l.google.com
[type] => A
[ip] => 64.233.161.104
[class] => IN
[ttl] => 136
)

)
Array
(
)
Array
(
)
*/

?>

Any suggestions??


Thanks,
Brad

#7  06-07-2007  Tijnema
Re: [PHP] Parse domain from URL

I get the same results as you, Brad.
I have Apache 2.2.3 + PHP 5.2.3RC1, so even if you finally get it working,
it's definitely not portable code :P
Maybe talking to a whois server is an option?

Tijnema

#8  06-08-2007  Brad Fuller
RE: [PHP] Parse domain from URL

Actually I need to get the root domain from the URL as a previous step to a
WHOIS query. You can't do a WHOIS on "www.example.com". It has to be
"example.com" only.

dig www.example.com doesn't always give the information I need either.

I think DNS is the way to go, but need to figure out why dns_get_record()
returns an empty "authns" array for some of us but works properly for Robin
and for the example in the manual. I haven't found anything yet, but I'll
keep searching.
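
One possible workaround, which is not from this thread and is only a
sketch, avoids the authority section altogether: strip labels from the
left and ask for an SOA record at each step, on the assumption that only
the top of a zone answers with one. The names has_soa() and find_zone()
are hypothetical. The lookup is passed in as a callback so the walking
logic can be tried without network access.

```php
<?php
// Default lookup: true when $name itself answers with an SOA record.
// Assumption: that only happens at the top of a DNS zone. Needs network
// access and a working resolver; errors are treated as "no record".
function has_soa($name) {
    $records = @dns_get_record($name, DNS_SOA);
    return is_array($records) && count($records) > 0;
}

// Walk from the full host name towards the TLD and return the first
// name for which the lookup succeeds, or null if none does.
function find_zone($hostname, $isZoneApex = 'has_soa') {
    $labels = explode('.', rtrim($hostname, '.'));
    // Stop before the bare TLD: a single label is never a useful answer.
    while (count($labels) >= 2) {
        $name = implode('.', $labels);
        if (call_user_func($isZoneApex, $name)) {
            return $name;
        }
        array_shift($labels);
    }
    return null;
}

// Offline stand-in for DNS, used only to demonstrate the walk.
function fake_apex($name) {
    return in_array($name, array('google.co.uk', 'co.uk'), true);
}

echo find_zone('www.google.co.uk', 'fake_apex'), "\n";
```

This gives the wrong answer when a subdomain is delegated as its own
zone, so it is a heuristic rather than a guaranteed way to find the
registered domain.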

Thanks,

Brad