Best robots.txt Settings?

Postby TheJets » Mon Dec 14, 2009 2:36 pm

I already posted a thread regarding .htaccess files: http://www.copahost.com/forum/shared-hosting-f3/best-htaccess-settings-t149.html

But I was also wondering if someone could help me and provide a robots.txt file I could use for all my sites?
TheJets
 
Posts: 10
Joined: Sat Dec 12, 2009 5:05 pm

Re: Best robots.txt Settings?

Postby Andrew » Mon Dec 14, 2009 2:45 pm

As I replied in your other thread, there is no single robots.txt file that can be used for every site on the internet. It depends on what you want to block (and what you want to give access to).

Google and the robots.txt website are great places to start.
http://www.robotstxt.org/
Andrew
 
Posts: 20
Joined: Fri Dec 11, 2009 5:56 am

Re: Best robots.txt Settings?

Postby Beaten Rice » Mon Dec 21, 2009 5:52 am

A robots.txt file, when present in the root directory, indicates those areas of your site which should not be accessed or indexed by automated site crawlers (also called spiders) such as those used by search engines.
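
To make that concrete, here is a minimal sketch of how a well-behaved spider might read the rules before fetching a page. It uses Python's standard urllib.robotparser module. The sample rules, the www.example.com host and the "ExampleBot" name are all made up for illustration; they are not a recommended configuration for any particular site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; the paths are assumptions,
# not settings suggested anywhere in this thread.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network
    # request is needed for this demonstration.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/cgi-bin/form.pl", "/tmp/cache.dat"):
        url = "http://www.example.com" + path
        print(path, is_allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", url))
```

A polite crawler would run a check like this before every request. Nothing forces it to, which is the catch described next.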

While spiders are supposed to follow the instructions contained within the robots.txt file, none are compelled to do so. Major search engines usually follow them. Scores of other spiders, such as those used by spammers to collect email addresses, do not.

Robots.txt File Discussions

Searching for robots.txt on Google returns scores of results. It is a regularly discussed topic on many discussion forums, including our own.

Opinions range from something short and to the point to endless lists of disallows. There are three points to keep in mind:

* An improperly written robots.txt file can do more harm than good, and disallow the indexing of content you'd like to see in a search engine.
* The robots.txt file is itself publicly accessible, so it provides a roadmap to all of the content you might want to keep private. Never try to hide sensitive material by listing it in the robots.txt file. Any human visitor will have ready access to it.
* Scores of spiders ignore robots.txt files altogether. These include the spiders used by spammers, but not only those. Spiders used by desktop applications may ignore it as well, in order to give their users faster browsing or search-within-bookmarks functionality.
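
The first two points can be illustrated with a short sketch, again using Python's standard urllib.robotparser. The three rule sets, the host name and the helper names crawler_may_fetch and listed_paths are hypothetical, chosen only for this example. It shows how a single character separates "block everything" from "block nothing", and how easily anyone can read the paths a site lists as off limits.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule sets for illustration only.
BLOCK_EVERYTHING = "User-agent: *\nDisallow: /\n"   # one slash blocks the whole site
BLOCK_NOTHING = "User-agent: *\nDisallow:\n"        # empty value allows everything
LEAKY = (
    "User-agent: *\n"
    "Disallow: /private-reports/\n"
    "Disallow: /admin-backup/\n"
)


def crawler_may_fetch(robots_txt, url, user_agent="ExampleBot"):
    """Return True if a compliant crawler may fetch url under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def listed_paths(robots_txt):
    """Collect every non-empty Disallow path, as any visitor could by hand."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()        # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    page = "http://www.example.com/products.html"
    print("Disallow: /  ->", crawler_may_fetch(BLOCK_EVERYTHING, page))
    print("Disallow:    ->", crawler_may_fetch(BLOCK_NOTHING, page))
    print("Paths advertised to every reader:", listed_paths(LEAKY))
```

The third point needs no code: a spider that ignores robots.txt simply never performs the check.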
Beaten Rice
 
Posts: 51
Joined: Mon Dec 21, 2009 5:15 am

