10/07/17 // Written by admin

What is robots.txt? A Basic Guide For Site Owners

What is robots.txt?

If you have a website, you need to understand what a robots.txt file is and how it affects your site, or you could be losing traffic. And everyone knows that traffic = revenue.

All websites are indexed by search engines (such as Google) using bots, also known as crawlers or spiders. Google's bot is called Googlebot, Baidu has Baiduspider and Bing has Bingbot.

Robots.txt is a plain text file that tells these bots how to crawl your site. Its rules are directives, not enforcement: well-behaved bots follow them, but unscrupulous bots may ignore them. You are essentially telling the bots what you do not want them to crawl.

It is a very powerful tool – if you include something in the robots.txt file, compliant bots such as Googlebot will skip those pages, effectively stopping them from being indexed. This means that even if someone searched Google using the very keywords a page was optimised for, and even if your page was the best result for that search, it would not be shown to users.

No one builds a website with the goal of hiding pages from the public, so you must be very careful to ensure nothing is hidden that you don’t intend to hide.

If your site has been built by a web developer without input from an SEO professional, you run the risk that robots.txt is not being used correctly. The developer may have written the file using out-of-date SEO advice, or it may be blocking resources that are important to your customers.

So how can you be sure that your site is using the robots.txt file the best way?

This is the best way to do a robots.txt file – with mean jokes

You can take a look at the robots.txt file on your own site – I can't promise it will be as interesting as the above image, but it can show you whether traffic is being blocked by robots.txt.

To access it, simply type /robots.txt after your homepage URL – the robots.txt file is always stored at the root of the domain.

e.g.: www.yourwebsitedomain.co.uk/robots.txt
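Because the file always lives at the root, you can derive its address from any page on the site. A minimal sketch in Python's standard library (the page URL is a made-up example on the placeholder domain above):

```python
from urllib.parse import urljoin

# robots.txt always lives at the root of the domain,
# so we can build its address from any page URL on the site
page = "https://www.yourwebsitedomain.co.uk/shop/products/widget.html"
robots_url = urljoin(page, "/robots.txt")

print(robots_url)  # https://www.yourwebsitedomain.co.uk/robots.txt
```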

Doing this should bring you to a page that looks like the above. Now, to decipher some basics…

User-agent:  specifies which bots the rules apply to, e.g. Googlebot or Baiduspider.

Disallow:  recommends that bots do not crawl this area.

Allow:  permits bots to crawl the specified area.

Crawl-delay:  tells bots to wait a set number of seconds before continuing to crawl. This is not honoured by Googlebot.

Sitemap:  shows the sitemap location.

Noindex:  an unofficial directive intended to tell Google to remove pages from its index. It has never been formally supported, so do not rely on it.

#:  marks a comment – the line will not be read by bots, so you can leave notes here.
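Python's standard library ships a parser for this format, so you can check how these directives combine. A small sketch using `urllib.robotparser` on a hypothetical file (the domain, paths and values below are illustrative):

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt using the directives described above
rules = """\
# Keep crawlers out of the admin area (this line is a comment)
User-agent: *
Disallow: /admin/
Crawl-delay: 10

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Ask the parser: may this bot crawl this URL?
print(parser.can_fetch("Googlebot", "https://www.example.com/products/"))    # True
print(parser.can_fetch("Googlebot", "https://www.example.com/admin/login"))  # False
print(parser.crawl_delay("*"))                                               # 10
```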

Each box below is titled to show what the code beneath tells the bots. These directives are applied when bots crawl your site.

This Allows Full Access
User-agent: *
Disallow:

We can see that User-agent is set to *, which is a wildcard meaning any bot.

Nothing is stipulated under Disallow, so nothing is blocked. This grants full access to any bot.

This Blocks All Access
User-agent: *
Disallow: /

User-agent is set to * again, meaning all bots.

This time, Disallow is set to /, meaning everything. This tells all bots not to crawl the entire site – if they follow the directive, it will not be indexed (appear in search results).
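You can confirm the difference between these two files with Python's built-in `urllib.robotparser` (the bot name and URLs are illustrative):

```python
from urllib.robotparser import RobotFileParser

# The "full access" file: an empty Disallow blocks nothing
open_parser = RobotFileParser()
open_parser.parse(["User-agent: *", "Disallow:"])
print(open_parser.can_fetch("Bingbot", "https://www.example.com/page.html"))    # True

# The "block all" file: Disallow: / matches every path
closed_parser = RobotFileParser()
closed_parser.parse(["User-agent: *", "Disallow: /"])
print(closed_parser.can_fetch("Bingbot", "https://www.example.com/page.html"))  # False
```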

This Blocks 1 Folder
User-agent: *
Disallow: /folder_name/

Here we see Disallow is set to block a folder named folder_name.

This Blocks 1 File
User-agent: *
Disallow: /filename.html

Here we see Disallow is set to block a file named filename.html.
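Both of these patterns work by simple prefix matching on the URL path. A quick sketch checking them with `urllib.robotparser`, reusing the folder and file names from the boxes above (the domain is illustrative):

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /folder_name/",   # blocks everything inside the folder
    "Disallow: /filename.html",  # blocks the single file
])

print(parser.can_fetch("Googlebot", "https://www.example.com/folder_name/page.html"))  # False
print(parser.can_fetch("Googlebot", "https://www.example.com/filename.html"))          # False
print(parser.can_fetch("Googlebot", "https://www.example.com/other.html"))             # True
```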

This is a very basic guide to understanding your robots.txt file, and it is by no means all the information you will need to make changes. We recommend that you contact a professional before making any changes to the robots.txt file. If you want to find out whether your robots.txt file is affecting your traffic, contact Ingenuity Digital.