NIWA Community Forums

NIWA Community => Wiki References => Topic started by: FlyingRagnar on May 13, 2011, 05:28:53 PM

Title: Robots.txt
Post by: FlyingRagnar on May 13, 2011, 05:28:53 PM
Here's something I don't understand.  I have done some searching but have yet to find a good answer.

If I google "Pikachu site:bulbapedia.bulbagarden.net" I get lots of articles.  If I switch to image search, I get lots of images.
If I google "Hero site:dragon-quest.org" I get lots of articles.  If I switch to image search, I get nothing.  What gives?

Does this have anything to do with the robots.txt for a site?  I'm not sure because the 2 sites really are similar.

http://bulbapedia.bulbagarden.net/robots.txt
http://dragon-quest.org/robots.txt
Title: Re: Robots.txt
Post by: Tappy on May 14, 2011, 01:31:31 AM
Probably because all images on your wiki are under dragon-quest.org/w/, which you have disallowed crawlers from going through.

For example, pages like the first link below are just placeholders. Crawlers will see them, yes... but when they try to go to the image directly (the second link), it's under /w/, which your robots.txt tells crawlers not to visit.
 
http://www.dragon-quest.org/wiki/File:Main_wiki_logo.png

http://www.dragon-quest.org/w/images/c/cc/Main_wiki_logo.png
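
If you want to check this yourself, here is a rough sketch using Python's standard urllib.robotparser. The rules in ROBOTS_TXT are an assumption standing in for the real file (a single "Disallow: /w/" for all user agents), and is_allowed is just a helper name made up for this example. Real crawlers may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules for illustration only; not copied from the live robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/
"""


def is_allowed(url, agent="*"):
    """Return True if the assumed rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


file_page = "http://www.dragon-quest.org/wiki/File:Main_wiki_logo.png"
raw_image = "http://www.dragon-quest.org/w/images/c/cc/Main_wiki_logo.png"

print(is_allowed(file_page))  # True: /wiki/ does not start with /w/
print(is_allowed(raw_image))  # False: the image file lives under /w/
```

Rules are matched as path prefixes, so /wiki/ pages stay crawlable while anything under /w/ is blocked.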
Title: Re: Robots.txt
Post by: FlyingRagnar on May 14, 2011, 04:33:02 AM
Yes that's exactly right.  Your post led me to realize that Bulbapedia is storing media in a location outside of /w/, which is why their rules are similar, but images still get crawled.  I guess the solution is to move the media somewhere else.

It looks like almost all the other NIWA wikis are doing the same, all except Lylat and Nookipedia.  Nookipedia lets crawlers go everywhere, and I would assume Lylat does as well.

Title: Re: Robots.txt
Post by: Jake on May 20, 2011, 07:54:04 PM
I didn't realize our robots.txt file was set up like that. There were supposed to be some rules in there about special pages and skins. Thanks for letting me know. :)
Title: Re: Robots.txt
Post by: FlyingRagnar on May 20, 2011, 10:16:08 PM
No problem.  The tough part of changing robots.txt is that you then have to wait around for Google or whatever other bots to visit again.  I think I have mine sorted out now, but I'm still waiting for Google to pick up all of our images.