Robots.txt

What is a robots.txt file?

Robots.txt is a text file created by webmasters to guide web robots (usually search engine robots) on how to crawl pages on their domain. A robots.txt file notifies search engine crawlers which URLs on your site they can access. This is mostly intended to prevent requests from overwhelming your site; it is not a strategy for keeping a web page out of Google. 
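
To make this concrete, here is a minimal illustrative robots.txt (the domain, path, and sitemap URL are hypothetical):

    User-agent: *
    Disallow: /admin/

    Sitemap: https://www.example.com/sitemap.xml

The User-agent line names the crawler the rules apply to (* matches any crawler), each Disallow line blocks a URL path prefix, and the optional Sitemap line tells crawlers where to find your sitemap.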

What is the purpose of a robots.txt file?

A robots.txt file is used primarily to manage crawler traffic to your site and, depending on the file type, to keep certain files off Google. It manages web crawler activity so that crawlers do not overburden your website or crawl pages that are not intended for public viewing.

Why is Robots.txt important in SEO?

Here are a few reasons that outline the importance of robots.txt in SEO:

1. Optimize Your Crawl Budget

The crawl budget is the number of pages that Google will crawl on your site within a certain time frame. The amount can vary depending on your site's size, health, and number of backlinks.

By using robots.txt to block insignificant pages, you let Googlebot (Google's web crawler) spend more of its crawl budget on the pages that matter.
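
As a sketch, rules like the following would keep crawlers out of low-value areas such as internal search results and filtered URLs (the paths and parameter names are hypothetical examples):

    User-agent: *
    # Internal site-search result pages
    Disallow: /search/
    # URLs generated by sort/filter parameters
    Disallow: /*?sort=
    Disallow: /*?filter=

Google's crawler supports the * wildcard in paths, which is what makes the parameter rules above possible.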

2. Disallow Non-Public Pages

You may have pages on your website that you do not want crawled. For instance, you may have a staging version of a page or a login page. These pages need to exist, but you don't want random visitors landing on them. In this situation, you'd use robots.txt to block search engine crawlers and bots from accessing them.
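
For example, rules along these lines would ask crawlers to stay away from a hypothetical staging area and login page:

    User-agent: *
    # Non-public areas (illustrative paths)
    Disallow: /staging/
    Disallow: /login

Keep in mind that robots.txt is a crawling directive, not access control: a blocked URL can still end up indexed if other sites link to it, so pages that must stay out of search results entirely should also use a noindex tag or authentication.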

3. Prevents Unnecessary File Indexing

Proper indexing directives, combined with the robots.txt file, are an effective way to prevent unwanted files on your website, such as images, videos, and PDFs, from being indexed. These measures help search engines determine which files should not appear in search results.
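
For instance, Google honors the $ anchor for matching the end of a URL, so a sketch like this would block crawling of all PDF files (the pattern is illustrative):

    User-agent: Googlebot
    # Block every URL ending in .pdf
    Disallow: /*.pdf$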

How to Create a Robots.txt File

A robots.txt file lives at the root of your site. So, for the site www.abc.com, the robots.txt file lives at www.abc.com/robots.txt. Here are the steps to follow while creating a robots.txt file:

Step 1: Make a robots.txt file

A robots.txt file can be created in nearly any text editor; Notepad, TextEdit, vi, and emacs can all produce valid robots.txt files. Avoid word processors, which frequently store files in a proprietary format and can add unexpected characters, such as curly quotes, that cause crawlers difficulties. If prompted during the save dialog, save the file with UTF-8 encoding.
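
If you prefer to generate the file programmatically, here is a minimal Python sketch (the rules shown are placeholders) that writes robots.txt with explicit UTF-8 encoding:

    # Write a simple robots.txt, saving it with UTF-8 encoding
    rules = (
        "User-agent: *\n"
        "Disallow: /admin/\n"
        "\n"
        "Sitemap: https://www.example.com/sitemap.xml\n"
    )
    with open("robots.txt", "w", encoding="utf-8") as f:
        f.write(rules)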

Step 2: Upload the robots.txt file to your site

Now that you’ve saved the robots.txt file to your computer, you’re ready to make it available to search engine crawlers. There is no single tool that can help you with this, because how you upload the file varies depending on your site’s and server’s architecture.

After you’ve uploaded the robots.txt file, check if it’s public and if Google can parse it.

Step 3: Validate the robots.txt file

To see whether your freshly uploaded robots.txt file is publicly available, open a private browsing window (or equivalent) in your browser and navigate to the file’s location, for instance https://example.com/robots.txt. If you see the contents of your robots.txt file, you’re ready to test the markup.
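
One way to check both that the file is reachable and that its rules parse as expected is Python's standard-library urllib.robotparser (example.com below stands in for your own domain):

    import urllib.robotparser

    # Fetch and parse the live robots.txt file
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    # Ask whether a given crawler may fetch a given URL under those rules
    print(rp.can_fetch("Googlebot", "https://example.com/admin/"))
    print(rp.can_fetch("*", "https://example.com/"))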

Step 4: Submit the robots.txt file to Google

Once you’ve uploaded and tested your robots.txt file, Google’s crawlers will automatically find and use it; you don’t need to do anything further. If you’ve updated your robots.txt file and need Google’s cached copy refreshed as soon as possible, learn how to submit an updated robots.txt file.

You can learn more about the rules to follow when creating your robots.txt file in this guide from Google.

Related SEO glossary terms
301 Redirects, 302 Redirect, 404 Page, Alt Tag, Anchor Text, Backlinks, BERT, Black Hat SEO, Bounce Rate, Breadcrumb Navigation, Canonical Tag, Content Hub, Core Algorithm Updates, Core Web Vitals, Crawl Budget, CTR, Do Follow Link, Domain Rating, Duplicate Page, EEAT, External Links, Google Knowledge Graph, Google Knowledge Panel, Google Search Console, Google Webmaster Guidelines, Guest Blogging, H1 Tags, Impressions, Indexing, Keyword Clustering, Keyword Difficulty, Local SEO, Meta Description, Meta Tags, No Follow Link, Off-Page SEO, On-Page SEO, Orphan Pages, Page Title, PageRank, Ranking Positions, Robots.txt, Schema Markup, Search Engine, Search Intent, Search Volume, SEO, SERP, Sitemap, Technical SEO, Topic Authority, URL Canonicalization, Web Crawler, Website Traffic