This is a discussion on how to stop SE's listing 2 url's within the alt.comp.lang.php forums, part of the PHP Programming Forums category.
Hello,

I'm really worried, so I wondered if anyone could help.

My site outgrew itself recently, so we've had to make changes to the URL structure. I have some important URLs like this: www.mysite.com/bluewidgets/. Yet now, with the expansion of the site and the URL structure change (it had to be done), we also have URLs like www.mysite.com/country1/bluewidgets/, which serve up identical content to the first URL above.

Is this bad? There is no way around it, because if I dump my old URLs (I have kept 50 important ones) I will have to get around 6,000 webmasters to change my link URL on their pages, which I don't want to have to do. My programmer says it won't be a problem with Google etc., but I'm worried. I rely on this site for my income.

Is it possible to stop Google from crawling and, most importantly, listing the 50 new URLs in the new format, so it sticks with the old ones? Everything is done with PHP/mod_rewrite rules, so it's not simple for me to know. Or is it possible to have 50 redirects from the new URLs to the old ones? Will that stop Google listing both?

How do I get round this? The problem is that, because the site is very much database driven, I have no way of making it use the old-format URLs for the 50 in question. I hope this all makes sense.

Thanks for any help,

Chris

Chris wrote:
> Hello
> Am really worried, so wondered if anyone could help.
>
> My site outgrew itself recently so we've had to make changes to the url
> structure.
> I have some important url's like this: www.mysite.com/bluewidgets/, Yet
> now with the expansion of the site and url structure change (had to be
> done) we also have urls like: www.mysite.com/country1/bluewidgets/ which
> serves up identical content to the above first url.
>
> Is this bad? There is no way around it, cause if i dump my old url (i have
> 50 important ones kept) I will have to get around 6,000 webmasters to
> change my link url on their pages, which i dont want to have to do.
> My programmer says it wont be a problem with google etc, but i'm worried.
> I rely on this site for my income.
> Is it possible to stop google from crawling and most importantly listing
> the 50 new url's in the new format? So it sticks with the old ones?
> Everything is done with php/mod rewrite rules and so its not simple for me
> to know. Or is it possible to have 50 redirects from the new url's to the
> old ones? Will that stop google listing both?
>
> How do i get round this? The problem is, because the site is very much
> database driven, i have no way of making it use the old format url's for
> the 50 in question. I hope this all makes sense. Thanks for any help,
>
> Chris

Put a robots.txt file in your web server's root dir and tell all user agents NOT to crawl /country1, /country2, etc. The original robots.txt convention has no wildcards, so DON'T rely on something like "Disallow: /country*" (some crawlers, Google included, accept "*" as an extension, but not every robot does). Disallow rules match by path prefix, so one line per directory covers everything underneath it.

I currently tell robots/spiders to leave a bunch of virtual directories alone and it works well. By "virtual" I mean they don't really exist in the file system; they are URLs that get rewritten with Apache rewrites etc. E.g., http://www.mysite.eg/gallery/foo doesn't exist, but the rewrite rules get the correct files from the right place in the file system (<webroot>/content/users/foo/gallery).
I've got "Disallow: /gallery/foo" in my robots.txt file, and Google/MSN/Yahoo etc. all honour that. There's heaps of info and tools online to verify robots.txt files - just google it ;)

Cheers,
James

--
"In short, _N is Richardian if, and only if, _N is not Richardian."
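
For anyone who wants to check rules like the ones James describes before uploading them, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the helper name `is_crawlable` are assumptions made up for illustration (the directory names just follow the example URLs in the thread), and this parser only models the basic prefix-matching rules, not any particular search engine's behaviour.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: one Disallow line per new-format directory.
# The directory names are assumptions taken from the thread's example URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /country1/
Disallow: /country2/
"""


def is_crawlable(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.mysite.com/bluewidgets/",
        "http://www.mysite.com/country1/bluewidgets/",
    ):
        status = "allowed" if is_crawlable(ROBOTS_TXT, url) else "blocked"
        print(url, "->", status)
```

With these rules the old-format URL stays crawlable and the new-format one under /country1/ is blocked, because each Disallow line is matched as a prefix of the URL path.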