What is a robots.txt file and how to create a perfect robots.txt file
A robots.txt file is a small text file that resides in your site’s root folder. It tells search engine bots which parts of the site to crawl and index and which parts to skip.
Even a slight mistake while editing or customizing it can stop search engine bots from crawling and indexing your site, and your site will then not be visible in the search results.
In this article, I will tell you what a Robots.txt file is and how to create a Perfect Robots.txt file for SEO.
Why a website needs a robots.txt file
When search engine bots come to websites and blogs, they follow the robots file and crawl the content. But if your site does not have a robots.txt file, the bots will crawl and index all the content of your website, including content you do not want indexed.
Search engine bots look for the robots file before indexing any website. When they do not get any instructions from a robots.txt file, they start indexing all the contents of the website. If they do get instructions, they follow them while indexing the website.
This is why a robots.txt file is required. If we do not give instructions to search engine bots through this file, they index our entire site, including data that we did not want indexed.
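The "no instructions means everything is crawlable" behaviour can be sketched with Python's standard-library parser. This is only an illustration: real search engines use their own parsers, and the bot name and URLs below are assumptions chosen for the example.

```python
from urllib.robotparser import RobotFileParser

# An empty rule set stands in for a site that gives bots no instructions.
parser = RobotFileParser()
parser.parse([])

# Hypothetical URLs on example.com, used only for illustration.
urls = [
    "https://example.com/",
    "https://example.com/private/report.pdf",
]

# With no rules, every URL is reported as crawlable.
allowed = [parser.can_fetch("Googlebot", url) for url in urls]
print(allowed)
```

Every entry in the list comes back as allowed, which is why a site with content to hide from crawlers needs explicit rules.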
Advantages of robots.txt file
It tells search engine bots which parts of the site to crawl and index and which to skip.
A particular file, folder, image, pdf, etc. can be prevented from being indexed in the search engine.
Sometimes search engine crawlers crawl your site like a hungry lion, which hurts your site’s performance. You can ease this problem by adding a crawl-delay to your robots file, which protects your server from being overloaded. Googlebot does not obey this directive, however; for Google, you can set the crawl rate in Google Search Console instead.
You can keep crawlers out of an entire section of a website.
You can prevent internal search results pages from appearing in SERPs.
You can improve your website SEO by blocking low-quality pages.
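As a rough illustration of the crawl-delay point in the list above, the sketch below parses a rule set and reads the delay back with Python's standard-library parser. The value of 10 seconds and the bot name "ExampleBot" are assumptions for the example, not recommendations.

```python
from urllib.robotparser import RobotFileParser

# Assumed example rules: ask every bot to wait 10 seconds between requests.
rules = """\
User-agent: *
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# "ExampleBot" is a hypothetical crawler that honours Crawl-delay.
delay = parser.crawl_delay("ExampleBot")
print(delay)
```

A polite crawler would sleep for `delay` seconds between page requests; bots that ignore the directive are unaffected by it.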
Where does the Robots.txt file reside on the website
If you are a WordPress user, it resides in your site’s root folder. If the file is not found in this location, the search engine bot starts indexing your entire website, because bots do not search the whole website for the robots.txt file.
Not sure whether your site has a robots.txt file? All you have to do is type this into your browser’s address bar: example.com/robots.txt
A text page will open in front of you as you can see in the screenshot.
This is the robots.txt file of InHindiHelp. If you do not see any such txt page, then you have to create a robots.txt file for your site.
Apart from this, you can check it by going to Google Search Console tools.
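The same check can be scripted. The sketch below is a minimal example, assuming the file lives at the root of the host as described above; the helper names `robots_url` and `has_robots_txt` are made up for this illustration.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(page_url):
    """Return the root-level robots.txt URL for the site serving page_url."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def has_robots_txt(page_url, timeout=10):
    """Return True if the site answers the robots.txt URL with HTTP 200."""
    try:
        with urlopen(robots_url(page_url), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers HTTP errors (such as 404), DNS failures, and timeouts.
        return False


location = robots_url("https://example.com/blog/post?id=1")
print(location)
```

Calling `has_robots_txt("https://example.com/")` performs a real network request, so its result depends on the site being checked.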
Robots.txt File Basic Format
The basic format of the robots.txt file is very simple and looks like this:
User-agent: [user-agent name]
Disallow: [URL or page you don’t want to crawl]
These two lines are considered a complete robots.txt file. However, a robots file can contain multiple user-agent lines and directives (disallows, allows, crawl-delays, etc.).
User-agent: The search engine crawler/bot that the rules apply to. If you want to give the same instruction to all search engine bots, use the * sign after user-agent, like this: User-agent: *
Disallow: This prevents bots from crawling the listed files and directories.
Allow: This allows search engine bots to crawl and index your content.
Crawl-delay: How many seconds bots have to wait before loading and crawling the page content.
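To see how these directives combine, here is a small sketch that feeds a sample file to Python's standard-library parser. The paths, the delay value, and the bot name are assumptions for illustration. The Allow line is listed before the Disallow line because this particular parser applies the first matching rule; other crawlers may resolve conflicts differently.

```python
from urllib.robotparser import RobotFileParser

# Assumed example rules: block /private/ except for one page inside it.
rules = """\
User-agent: *
Allow: /private/public-page.html
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

base = "https://example.com"  # hypothetical site
results = {
    path: parser.can_fetch("ExampleBot", base + path)
    for path in (
        "/private/public-page.html",  # matched by Allow
        "/private/secret.html",       # matched by Disallow
        "/about.html",                # matched by no rule, so allowed
    )
}
print(results)
print(parser.crawl_delay("ExampleBot"))
```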
Preventing all Web Crawlers from indexing websites
User-agent: *
Disallow: /
Using this command in the robots.txt file prevents all web crawlers/bots from crawling the website.
All Web Crawlers Allowed to Index All Content
User-agent: *
Disallow:
This command in the robots.txt file allows all search engine bots to crawl all the pages of your site.
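The difference between blocking everything and allowing everything comes down to a single slash. A minimal sketch, assuming the usual `Disallow: /` and empty `Disallow:` forms and a hypothetical URL, checks both with Python's standard-library parser:

```python
from urllib.robotparser import RobotFileParser


def is_allowed(rules, agent, url):
    """Parse robots.txt text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


block_all = "User-agent: *\nDisallow: /"  # "/" matches every path
allow_all = "User-agent: *\nDisallow:"    # empty value blocks nothing

url = "https://example.com/blog/post.html"  # hypothetical page
blocked_result = is_allowed(block_all, "Googlebot", url)
open_result = is_allowed(allow_all, "Googlebot", url)
print(blocked_result, open_result)
```

Because the two files differ by one character, it is worth running a check like this before uploading an edited robots.txt.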
Blocking a Specific Folder for Specific Web Crawlers
User-agent: Googlebot
Disallow: /example-subfolder/
This command only prevents Google’s crawler from crawling example-subfolder. But if you want to block all crawlers, then your robots file will look like this.
User-agent: *
Disallow: /example-subfolder/
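A rule group that names one crawler leaves the others untouched. The sketch below, which assumes a `User-agent: Googlebot` group and a hypothetical page inside example-subfolder, shows this with Python's standard-library parser:

```python
from urllib.robotparser import RobotFileParser

# Assumed rules: only Googlebot is kept out of /example-subfolder/.
rules = """\
User-agent: Googlebot
Disallow: /example-subfolder/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Hypothetical page inside the blocked folder.
url = "https://example.com/example-subfolder/page.html"
googlebot_allowed = parser.can_fetch("Googlebot", url)
bingbot_allowed = parser.can_fetch("Bingbot", url)
print(googlebot_allowed, bingbot_allowed)
```

Replacing `Googlebot` with `*` in the rules would make both checks come back as blocked.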
Preventing a Specific Page (Thank You Page) from being indexed
User-agent: *
Disallow: /[page URL (Thank You Page)]
This will prevent all crawlers from crawling your page URL. But if you want to block specific crawlers, then you write it like this.
User-agent: Bingbot
Disallow: /[page URL]
This command will only prevent Bingbot from crawling your page URL.
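If you prefer to generate the file rather than type it by hand, a small helper can render the groups and check the result before you upload it. This is only a sketch: `build_robots_txt` is a hypothetical helper, and `/thank-you.html` is an assumed path standing in for your own page URL.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(groups):
    """Render {user_agent: [disallowed paths]} as robots.txt text."""
    blocks = []
    for agent, paths in groups.items():
        lines = [f"User-agent: {agent}"]
        lines += [f"Disallow: {path}" for path in paths]
        blocks.append("\n".join(lines))
    # A blank line separates one user-agent group from the next.
    return "\n\n".join(blocks) + "\n"


# Assumed example: keep only Bingbot away from a thank-you page.
text = build_robots_txt({"Bingbot": ["/thank-you.html"]})
print(text)

# Round-trip the generated text through the standard-library parser.
parser = RobotFileParser()
parser.parse(text.splitlines())
page = "https://example.com/thank-you.html"
bing_ok = parser.can_fetch("Bingbot", page)
google_ok = parser.can_fetch("Googlebot", page)
print(bing_ok, google_ok)
```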
Add a Sitemap to a robots.txt file