Interactive: robots.txt

Distilled

Description

There are a variety of ways to control the behavior of search engine crawlers. You can learn more about the alternatives in our technical SEO module. Robots.txt is a plain-text file found in the root of a domain (e.g. www.example.com/robots.txt). It is a widely acknowledged standard and allows webmasters to control all kinds of automated consumption of their site - not only by search engines.

In addition to reading about the protocol, robots.txt is one of the more accessible areas of SEO, since you can view any site's robots.txt file. Once you have completed this module, it is worth making sure you understand the robots.txt files of some large sites (for example, Google and Amazon).

What you will learn in this module:

  • How to block all robots from certain areas of your site
  • How to restrict your robots.txt instructions to apply only to certain robots
  • How to override exclusion directives to allow access to certain areas of your site
  • How to use wildcards to apply your rules to whole swathes of your site
  • Other robots.txt syntax, such as sitemap file directives (a short illustrative example follows this list)
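
To give a flavour of that syntax, here is a rough sketch of a file combining these features. The crawler name, paths, and sitemap URL are purely illustrative, and Allow, the * wildcard, and the $ anchor are extensions honoured by major crawlers rather than part of the original standard:

  User-agent: Googlebot
  Disallow: /private/
  Allow: /private/help.html
  Disallow: /*.pdf$

  Sitemap: https://www.example.com/sitemap.xml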

The most common use-case for robots.txt is to block robots from accessing specific pages. The simplest version applies the rule to all robots with a line saying User-agent: *. Subsequent lines contain specific exclusions that work cumulatively, so the code below blocks robots from accessing /secret.html.
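
A minimal version of that file might look like this:

  User-agent: *
  Disallow: /secret.html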

Add another rule to block access to /secret2.html in addition to /secret.html.
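
One way to do this, given that Disallow lines under the same User-agent group apply cumulatively, is simply to add a second rule:

  User-agent: *
  Disallow: /secret.html
  Disallow: /secret2.html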
