If you own a self-hosted website or blog, you have probably heard of the robots.txt file. In this guide, I am going to show not only why this file is important for every website, but also how to implement it properly.

A proper robots.txt file is an important part of your website's search engine optimization.

About Robots.txt

It is a standard on the internet that asks web crawlers and other web bots not to access all or part of a website that is otherwise publicly viewable. Well-behaved crawlers follow it, but it is a request and not access control, so do not rely on it to hide sensitive files.
This is basically all you need to know. With it, you can ask all search engine crawlers not to access, crawl, or read any folder in your root.

How This Protocol Works

As far as I can see, most websites today use this standard to keep web crawlers out of some parts of a website.
The most common examples are these:
User-agent: *
Disallow:
The example above allows all robots to view all files: the wildcard * matches every crawler, and the empty Disallow line blocks nothing.
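If you would like to confirm how a crawler might read these two lines, here is a minimal sketch using Python's standard urllib.robotparser module. The crawler name "ExampleBot" and the page URLs are made up for illustration, and real crawlers may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The allow-all rules from the example above.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

def build_parser(robots_txt):
    """Parse robots.txt text locally, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ALLOW_ALL)

# "ExampleBot" and both URLs are hypothetical names used only for illustration.
home_allowed = parser.can_fetch("ExampleBot", "http://www.yoursite.com/")
temp_allowed = parser.can_fetch("ExampleBot", "http://www.yoursite.com/temp/page.html")

print(home_allowed, temp_allowed)
```

With an empty Disallow line, both checks should come back as allowed, including the page under /temp/.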
If you want to tell all crawlers not to read a protected folder (for example, temp), use the syntax below:
User-agent: *
Disallow: /temp/
In the example above, no web crawler may access the temp folder in your website root. Of course, you can add more than one folder to your robots.txt file by giving each folder its own Disallow line.
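As a rough illustration of blocking more than one folder, the sketch below parses a rule set with two Disallow lines and checks a few paths. The /temp/ folder comes from the example above; the /drafts/ folder, the crawler name "ExampleBot", and the file names are assumptions added only for this sketch.

```python
from urllib.robotparser import RobotFileParser

# /temp/ is from the example above; /drafts/ is a hypothetical second folder.
RULES = """\
User-agent: *
Disallow: /temp/
Disallow: /drafts/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

def is_allowed(path, agent="ExampleBot"):
    """Return True if the (hypothetical) crawler may fetch the given path."""
    return parser.can_fetch(agent, "http://www.yoursite.com" + path)

# Report the status of one allowed page and one page in each blocked folder.
for path in ("/index.html", "/temp/cache.html", "/drafts/post.html"):
    print(path, "allowed" if is_allowed(path) else "blocked")
```

Each Disallow line is matched as a path prefix by this parser, so everything under a listed folder is treated as blocked while the rest of the site stays open.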
You can also keep all crawlers from reading a specific file on your website. All you need to do is enter the full path to that file in a Disallow line.
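Blocking a single file could look like the sketch below. The path /downloads/private.pdf is a hypothetical example, as are the crawler name and the second file used for comparison.

```python
from urllib.robotparser import RobotFileParser

# /downloads/private.pdf is a hypothetical file path, used only for illustration.
FILE_RULES = """\
User-agent: *
Disallow: /downloads/private.pdf
"""

file_parser = RobotFileParser()
file_parser.parse(FILE_RULES.splitlines())

SITE = "http://www.yoursite.com"

# The listed file is blocked; another file in the same folder is not.
private_ok = file_parser.can_fetch("ExampleBot", SITE + "/downloads/private.pdf")
public_ok = file_parser.can_fetch("ExampleBot", SITE + "/downloads/public.pdf")

print("private.pdf allowed:", private_ok)
print("public.pdf allowed:", public_ok)
```

Only the file named in the Disallow line is affected here; the rest of the folder remains crawlable.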
Some crawlers support the Sitemap directive, which lets you list multiple sitemaps in the same robots.txt file. This can be very useful for websites that use multiple sitemaps. All you need to do is add the following lines to your robots.txt file:
Sitemap: http://www.yoursite.com/sitemap-1.xml
Sitemap: http://www.yoursite.com/sitemap-2.xml
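To see which sitemaps a parser would pick up from such a file, here is a small sketch. It assumes Python 3.8 or newer, where RobotFileParser provides the site_maps() method, and it combines the two Sitemap lines above with the allow-all rules shown earlier.

```python
from urllib.robotparser import RobotFileParser

# Allow-all rules plus the two Sitemap lines from the example above.
ROBOTS_WITH_SITEMAPS = """\
User-agent: *
Disallow:

Sitemap: http://www.yoursite.com/sitemap-1.xml
Sitemap: http://www.yoursite.com/sitemap-2.xml
"""

sitemap_parser = RobotFileParser()
sitemap_parser.parse(ROBOTS_WITH_SITEMAPS.splitlines())

# site_maps() returns a list of sitemap URLs, or None if the file lists none.
sitemaps = sitemap_parser.site_maps() or []

for url in sitemaps:
    print(url)
```

The Sitemap lines are independent of the User-agent groups, so they can sit anywhere in the file.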
I use the first and most common method: I allow all crawlers to access any part or folder of my website. I have also added the URL of my sitemap, just to make sure it is properly crawled.

Robots.txt on Popular Sites

Many popular sites on the internet publish a robots.txt file. You can check this yourself by adding /robots.txt right after the main address of any website you want.
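If you prefer to do this check from a script, the sketch below builds the robots.txt address for any page of a site and, optionally, downloads it. The function names robots_url and fetch_robots_txt are made up for this example, and www.yoursite.com is a placeholder address.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

def robots_url(site_url):
    """Return the robots.txt address for the site that hosts site_url."""
    # A leading slash makes urljoin replace the whole path with /robots.txt.
    return urljoin(site_url, "/robots.txt")

def fetch_robots_txt(site_url, timeout=10):
    """Download a site's robots.txt as text (needs network access)."""
    with urlopen(robots_url(site_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")

# www.yoursite.com is a placeholder; substitute a real address before fetching.
print(robots_url("http://www.yoursite.com/blog/some-post.html"))
```

Calling fetch_robots_txt with a real site address should return the same text you would see in the browser, provided the site publishes the file.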
Some of the most popular blogs and websites on the internet use this protocol. Take some time to consider which folders and parts of your website you should keep from being crawled.
This may help crawlers work more effectively on your website, reducing their load and the time it takes to crawl it.
