HostGator Web Hosting Help

How to use robots.txt

What is the purpose of the robots file?

When a search engine crawls (visits) your website, the first thing it looks for is your robots.txt file. This file tells search engines which parts of your site they may and may not crawl, and therefore what they can index (save and make available to the public as search results). It may also indicate the location of your XML sitemap. The search engine then sends its "bot" (also called a "robot" or "spider") to crawl your site as directed in the robots.txt file, or does not send it at all if the file asks it to stay away.
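
As a rough illustration of the sitemap hint, the sketch below parses a small robots.txt with Python's standard urllib.robotparser module and reads back the sitemap location. The file content, including the sitemap URL, is an assumed example and not something your site necessarily has; site_maps() requires Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the sitemap URL is an assumed example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp/
Sitemap: http://www.yoursitesdomain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the list of sitemap URLs found in the file,
# or None when the file lists no sitemap at all.
sitemaps = parser.site_maps()
print(sitemaps)
```

A real crawler would download the file from the site instead of using an inline string; the string is used here only so the sketch runs without network access.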

Google's bot is called Googlebot, and Microsoft Bing's bot is called Bingbot. Many other search engines, such as Excite, Lycos, Alexa and Ask Jeeves, also have their own bots. Most bots are from search engines, although other sites sometimes send out bots for various reasons. For example, some sites may ask you to put code on your website to verify that you own it, and then send a bot to check that the code is there.

Keep in mind that robots.txt works like a "No Trespassing" sign. It tells robots whether you want them to crawl your site or not. It does not actually block access. Legitimate bots will honor your directive on whether they can visit or not. Rogue bots may simply ignore robots.txt.

Read Google's official stance on the robots.txt file.

Where does robots.txt go?

The robots.txt file belongs in your document root folder, so that it can be reached at an address such as http://www.yoursitesdomain.com/robots.txt.

You can simply create a blank file and name it robots.txt. This avoids "file not found" errors when bots request the file, and it allows all search engines to crawl anything they want.
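
To see why a blank file permits everything, here is a minimal sketch using Python's standard urllib.robotparser module. The crawler name "ExampleBot" is hypothetical, and this parser is only one implementation; individual search engines use their own.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([])  # a blank robots.txt contains no lines, so no rules

# "ExampleBot" is a hypothetical crawler name used for illustration.
allowed = parser.can_fetch(
    "ExampleBot", "http://www.yoursitesdomain.com/index.html"
)
print(allowed)  # True: with no rules, nothing is disallowed
```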

Blocking Robots and Search Engines from Crawling

If you want to stop bots from visiting your site and stop search engines from ranking you, use this code:

# Code to not allow any search engines

            User-agent: *
            Disallow: /
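
One way to check this rule before uploading it is to run it through Python's standard urllib.robotparser module, as in the sketch below. "ExampleBot" is a hypothetical crawler name, and the test URL is only an illustration.

```python
from urllib.robotparser import RobotFileParser

# The block-everything rules, with no blank line between the two directives.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(BLOCK_ALL.splitlines())

# "ExampleBot" is a hypothetical name; "*" applies to every user agent.
url = "http://www.yoursitesdomain.com/index.html"
results = {
    bot: parser.can_fetch(bot, url)
    for bot in ("Googlebot", "Bingbot", "ExampleBot")
}
print(results)  # every value is False: no bot is permitted to crawl
```

Note that the check runs on the crawler's side. A bot that never performs it is not stopped by the file.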

You can also prevent robots from crawling parts of your site while allowing them to crawl other sections. The following example asks search engines and robots not to crawl the cgi-bin, tmp, and junk folders, or anything inside those folders.

# Blocks robots from specific folders / directories

            User-agent: *
            Disallow: /cgi-bin/
            Disallow: /tmp/
            Disallow: /junk/

In the above example, http://www.yoursitesdomain.com/junk/index.html would be one of the URLs blocked, but http://www.yoursitesdomain.com/index.html and http://www.yoursitesdomain.com/someotherfolder/ would be crawlable.
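
The same three URLs can be checked programmatically. The sketch below feeds the folder rules to Python's standard urllib.robotparser module; "ExampleBot" is a hypothetical crawler name, and real search engines may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The folder-blocking rules from the example above.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

urls = [
    "http://www.yoursitesdomain.com/junk/index.html",
    "http://www.yoursitesdomain.com/index.html",
    "http://www.yoursitesdomain.com/someotherfolder/",
]

# "ExampleBot" is a hypothetical crawler name used for illustration.
verdicts = {url: parser.can_fetch("ExampleBot", url) for url in urls}
for url, ok in verdicts.items():
    print("crawlable" if ok else "blocked  ", url)
```

Only the URL under /junk/ is reported as blocked; the other two are crawlable because no Disallow line matches the start of their paths.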
