Re: [PHP] Parse domain from URL
On 6/6/07, Brad Fuller <bfuller@cpacampaigns.com> wrote:
> Hey guys,
>
> I'm faced with an interesting problem, and wondering if there's an easy
> solution.
>
> I need to strip out a domain name from a URL, and ignore subdomains (like
> www)
>
> I can use parse_url to get the hostname. And my first thought was to take
> the last 2 segments of the hostname to get the domain. So if the URL is
> http://www.example.com/
> Then the domain is "example.com." If the URL is http://example.org/ then
> the domain is "example.org."
>
> This seemed to work perfectly until I come across a URL like
> http://www.example.co.uk/
> My script thinks the domain is "co.uk."
>
> So I added a bit of code to account for this, basically if the 2nd to last
> segment of the hostname is "co" then take the last 3 segments.
>
> Then I stumbled across a URL like http://www.example.com.au/
>
> So it occurred to me that this is not the best solution, unless I have a
> definitive list of all exceptions to go off of.
>
> Does anyone have any suggestions?
>
> Any advice is much appreciated.
>
> Thanks,
> Brad
>
> --
> PHP General Mailing List (http://www.php.net/)
> To unsubscribe, visit: http://www.php.net/unsub.php
>

Well, it's not very clean, but if you just need to remove the subdomain/CNAME from the domain....

<?
$hostname = parse_url($_SERVER['SERVER_NAME']);
$domsplit = explode('.',$hostname['path']);
for($i=1;$i<count($domsplit);$i++) {
    $i == (count($domsplit) - 1) ? $domain .= $domsplit[$i] : $domain .= $domsplit[$i].".";
}
echo $domain;
?>

There's probably a much better way to do it, but in the interest of a quick response, that's one way.

--
Daniel P. Brown
[office] (570-) 587-7080 Ext. 272
[mobile] (570-) 766-8107

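For readers following along, here is a minimal sketch of the two heuristics discussed so far: keeping the last two segments, and dropping the first segment. The function names `last_two_labels` and `drop_first_label` are hypothetical, and the sketch assumes PHP 7.1 or later (the thread itself predates that). It is only meant to show where each heuristic breaks, not to recommend either one.

```php
<?php
// Hypothetical helper: keep only the last two labels of the hostname.
function last_two_labels(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no hostname found, e.g. the scheme is missing
    }
    $labels = explode('.', strtolower($host));
    return implode('.', array_slice($labels, -2));
}

// Hypothetical helper: drop the first label, as the loop in the reply does.
function drop_first_label(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    $labels = explode('.', strtolower($host));
    return implode('.', array_slice($labels, 1));
}

echo last_two_labels('http://www.example.com/'), "\n";    // example.com
echo last_two_labels('http://www.example.co.uk/'), "\n";  // co.uk (not a domain)
echo drop_first_label('http://www.example.co.uk/'), "\n"; // example.co.uk
echo drop_first_label('http://example.co.uk/'), "\n";     // co.uk (not a domain)
```

Both helpers guess from the position of a label alone, so each fails as soon as the number of labels in the suffix or in the subdomain changes.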
Daniel Brown wrote:
> On 6/6/07, Brad Fuller <bfuller@cpacampaigns.com> wrote:
>> [snip]
>
> Well, it's not very clean, but if you just need to remove
> the subdomain/CNAME from the domain....
>
> <?
> $hostname = parse_url($_SERVER['SERVER_NAME']);
> $domsplit = explode('.',$hostname['path']);
> for($i=1;$i<count($domsplit);$i++) {
>     $i == (count($domsplit) - 1) ? $domain .= $domsplit[$i] :
>     $domain .= $domsplit[$i]."."; }
> echo $domain;
> ?>
>
> There's probably a much better way to do it, but in the
> interest of a quick response, that's one way.

Dan,

Yes, that's basically what my code already does.

The problem is that what if the url is "http://yahoo.co.uk/" (note the lack of a subdomain)

Your script thinks that the domain is "co.uk". Just like my existing code does.

So we can't count on taking the last 2 segments. And we can't count on ignoring the first segment. (The subdomain could be anything, not just www)

Thx,
Brad

Hi Brad,
Wednesday, June 6, 2007, 5:04:41 PM, you wrote:

> Yes, that's basically what my code already does.
> The problem is that what if the url is "http://yahoo.co.uk/" (note the lack
> of a subdomain)
> Your script thinks that the domain is "co.uk". Just like my existing code
> does.
> So we can't count on taking the last 2 segments. And we can't count on
> ignoring the first segment. (The subdomain could be anything, not just www)

The complete list of top-level domains is not very long, nor does it change very often. Perhaps a new country/suffix is added or renamed now and again, but it doesn't happen that much.

In short, I think it'd be much better for you to obtain a full list and then just remove that element from your hostname. Then you'll know for a fact that whatever is left is the pure domain (+ sub-domain).

Of course it won't help if someone gives you the IP of a site (or any other similar variation) instead of a domain name :)

Cheers,
Rich
--
Zend Certified Engineer
http://www.corephp.co.uk

"Never trust a computer you can't throw out of a window"

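The list-based idea can be sketched roughly as follows. Everything here is an assumption for illustration: `KNOWN_SUFFIXES` is a tiny hand-written subset, `registrable_domain` is a hypothetical name, and the sketch goes one step beyond the suggestion above by keeping the label immediately left of the suffix. A real list would have to include second-level suffixes such as co.uk and com.au, not only top-level domains; a maintained source such as the Public Suffix List is one possible origin for that data, though nobody in the thread mentions it.

```php
<?php
// Illustrative subset only; a real list is far longer and must be maintained.
const KNOWN_SUFFIXES = ['com', 'org', 'uk', 'co.uk', 'au', 'com.au'];

// Hypothetical helper: find the longest known suffix and keep one more label.
function registrable_domain(string $host, array $suffixes = KNOWN_SUFFIXES): ?string
{
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $count = count($labels);
    // Start with the whole host and shorten from the left, so that
    // "co.uk" is matched before the shorter suffix "uk".
    for ($i = 0; $i < $count; $i++) {
        $suffix = implode('.', array_slice($labels, $i));
        if (in_array($suffix, $suffixes, true)) {
            // $i === 0 means the host itself is a bare suffix such as "co.uk".
            return $i === 0 ? null : implode('.', array_slice($labels, $i - 1));
        }
    }
    return null; // unknown suffix, or an IP address
}

echo registrable_domain('www.example.co.uk'), "\n"; // example.co.uk
echo registrable_domain('yahoo.co.uk'), "\n";       // yahoo.co.uk
echo registrable_domain('www.google.com.au'), "\n"; // google.com.au
```

Returning null for an IP address or an unknown suffix keeps the caller from treating a guess as a domain.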
On 06/06/07, Brad Fuller <bfuller@cpacampaigns.com> wrote:
> Daniel Brown wrote:
> > [snip]
>
> Yes, that's basically what my code already does.
>
> The problem is that what if the url is "http://yahoo.co.uk/" (note the lack
> of a subdomain)
>
> Your script thinks that the domain is "co.uk". Just like my existing code
> does.
>
> So we can't count on taking the last 2 segments. And we can't count on
> ignoring the first segment. (The subdomain could be anything, not just www)

In that case you can't do it just by parsing alone, you need to use DNS.

<?php
function get_domain ($hostname) {
  dns_get_record($hostname, DNS_A, $authns, $addt);
  return $authns[0]['host'];
}

print get_domain("www.google.com") . "\n";
print get_domain("google.com") . "\n";
print get_domain("www.google.co.uk") . "\n";
print get_domain("google.co.uk") . "\n";
print get_domain("google.co.uk") . "\n";
print get_domain("google.com.au") . "\n";
print get_domain("www.google.com.au") . "\n";

/* result
google.com
google.com
google.co.uk
google.co.uk
google.co.uk
google.com.au
google.com.au
*/
?>

On 6/7/07, Robin Vickery <robinv@gmail.com> wrote:
> On 06/06/07, Brad Fuller <bfuller@cpacampaigns.com> wrote:
> > [snip]
>
> In that case you can't do it just by parsing alone, you need to use DNS.
>
> <?php
> function get_domain ($hostname) {
>   dns_get_record($hostname, DNS_A, $authns, $addt);
>   return $authns[0]['host'];
> }
>
> [snip]

Wow.... great job, Robin.... I didn't even know about the dns_get_record() function myself until just now. I can actually think of a few places to use that now.... email validation, for one.

--
Daniel P. Brown

Robin Vickery wrote:
> In that case you can't do it just by parsing alone, you need to use
> DNS.
>
> <?php
> function get_domain ($hostname) {
>   dns_get_record($hostname, DNS_A, $authns, $addt);
>   return $authns[0]['host'];
> }
>
> [snip]

Robin,

This is a very good solution, and I thank you for your response. However I had been experimenting with dns_get_record() before my original post and it produces strange results on my machine. And your example, on my machine, produces no output.

<?
$dns_result = dns_get_record("www.google.com", DNS_A, $authns, $addt);

print_r($dns_result);
print_r($authns);
print_r($addt);

/* result
Array
(
    [0] => Array
        (
            [host] => www.l.google.com
            [type] => A
            [ip] => 64.233.161.99
            [class] => IN
            [ttl] => 136
        )

    [1] => Array
        (
            [host] => www.l.google.com
            [type] => A
            [ip] => 64.233.161.147
            [class] => IN
            [ttl] => 136
        )

    [2] => Array
        (
            [host] => www.l.google.com
            [type] => A
            [ip] => 64.233.161.103
            [class] => IN
            [ttl] => 136
        )

    [3] => Array
        (
            [host] => www.l.google.com
            [type] => A
            [ip] => 64.233.161.104
            [class] => IN
            [ttl] => 136
        )

)
Array
(
)
Array
(
)
*/
?>

Any suggestions??

Thanks,
Brad

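In the output above the answer section is populated while the authority and additional arrays are both empty, so `$authns[0]['host']` reads an index that does not exist and nothing is printed. A minimal sketch of a more defensive variant follows. The name `get_domain_or_null` and the injectable `$lookup` parameter are hypothetical; the parameter is there only so the logic can be tried without network access. The sketch assumes PHP 7.1 or later and does not explain why the authority section is empty on some machines.

```php
<?php
// Hypothetical wrapper around the same lookup: it reports failure
// instead of reading a missing array index.
function get_domain_or_null(string $hostname, ?callable $lookup = null): ?string
{
    // The default lookup calls dns_get_record() and returns only the
    // authority section; pass a different callable to fake the lookup.
    $lookup = $lookup ?? function (string $host): array {
        $authns = [];
        $addtl = [];
        @dns_get_record($host, DNS_A, $authns, $addtl);
        return is_array($authns) ? $authns : [];
    };

    $authns = $lookup($hostname);
    // The authority section may be empty even when the query succeeded.
    return $authns[0]['host'] ?? null;
}

$domain = get_domain_or_null('www.google.com');
echo $domain === null ? "no authority records returned\n" : $domain . "\n";
```

A caller can then fall back to another method when null comes back, rather than silently printing an empty line.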
On 6/7/07, Brad Fuller <bfuller@cpacampaigns.com> wrote:
> Robin Vickery wrote:
> > [snip]
>
> This is a very good solution, and I thank you for your response. However I
> had been experimenting with dns_get_record() before my original post and it
> produces strange results on my machine. And your example, on my machine,
> produces no output.
>
> [snip]
>
> Any suggestions??
>
> Thanks,
> Brad

I get the same results as you, Brad. I have Apache 2.2.3 + PHP 5.2.3RC1, so if you finally get it working, it's definitely not portable code :P

Maybe it's an option to talk to a whois server?

Tijnema

Tijnema wrote:
> On 6/7/07, Brad Fuller <bfuller@cpacampaigns.com> wrote:
>> [snip]
>
> I have same results as you brad,
> I have Apache 2.2.3 + PHP 5.2.3RC1, so if you finally get it
> working, it's definitely not portable code :P Maybe it's an
> option to talk to a whois server?
>
> Tijnema

Actually I need to get the root domain from the URL as a previous step to a WHOIS query. You can't do a WHOIS on "www.example.com". It has to be "example.com" only.

dig www.example.com doesn't always give the information I need either.

I think DNS is the way to go, but I need to figure out why dns_get_record() returns an empty "authns" array for some of us but works properly for Robin and for the example in the manual.

I haven't found anything yet, but I'll keep searching.

Thanks,
Brad
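One DNS-based alternative that does not rely on the authority section, and that nobody in the thread proposes, is to query for SOA records directly and walk from the full hostname toward the root until a name has one. The sketch below assumes PHP 7.1 or later; `find_zone_apex` and its injectable `$hasSoa` parameter are hypothetical names. What it finds is a zone apex, which often matches the registered domain but is not guaranteed to: a delegated sub-zone would be returned first, and a bare suffix such as co.uk has its own SOA record too.

```php
<?php
// Hypothetical helper: walk from the full hostname toward the root and
// return the first name that has its own SOA record (a zone apex).
function find_zone_apex(string $hostname, ?callable $hasSoa = null): ?string
{
    // The default check queries DNS; pass a callable to fake it offline.
    $hasSoa = $hasSoa ?? function (string $name): bool {
        $records = @dns_get_record($name, DNS_SOA);
        return is_array($records) && count($records) > 0;
    };

    $labels = explode('.', strtolower(rtrim($hostname, '.')));
    // Stop before a single remaining label: "uk" or "com" is never useful here.
    while (count($labels) >= 2) {
        $candidate = implode('.', $labels);
        if ($hasSoa($candidate)) {
            return $candidate;
        }
        array_shift($labels); // drop the leftmost label and retry
    }
    return null;
}

$apex = find_zone_apex('www.example.co.uk');
echo $apex === null ? "no zone found\n" : $apex . "\n";
```

Each hostname can cost several DNS queries this way, so results would probably need caching before feeding them to a WHOIS lookup.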