
Exclude files and directories from indexing using the “robots.txt” file

Published by TheJoe on


Caution


This article was published more than a year ago; there may have been developments since.
Please take this into account.

On the web there are standards of behavior for crawlers (also called bots, or spiders) that index content. I am not referring to the “.htaccess” file, which is used to configure the web server; I'm talking about the “robots.txt” file.

The “robots.txt” file is one of the simplest configuration files there is, and unlike “.htaccess” it must be placed only in the site's root directory. It tells the search engines that index our site whether or not to index specific files or directories, and its syntax is very simple:

field: value
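As an illustration of the “field: value” syntax, here is a minimal Python sketch that splits a robots.txt file into its field/value pairs. The function name parse_robots_lines is hypothetical (it is not part of any library), and the sketch assumes that everything after a “#” is a comment and that field names are case-insensitive; it does not group the rules by user agent the way a real crawler would.

```python
def parse_robots_lines(text):
    """Split robots.txt text into (field, value) pairs.

    Hypothetical helper for illustration: comments after '#' are dropped,
    blank lines and lines without a colon are skipped, and field names are
    lowercased because robots.txt field names are case-insensitive.
    """
    records = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if ":" not in line:
            continue  # blank or malformed line
        field, value = line.split(":", 1)  # split on the first colon only
        records.append((field.strip().lower(), value.strip()))
    return records


example = """User-Agent : *
Disallow: /wp-
"""

print(parse_robots_lines(example))
```

Splitting only on the first colon keeps values that contain colons of their own intact.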

You can only enter two types of fields: “User-agent” and “Allow”/“Disallow”.

User-Agent

The “User-Agent” field specifies a particular search engine. A quick search on the Internet, or monitoring your access logs over time, is enough to identify the major search engines that visit the site. Requests for the “robots.txt” file are usually made only by search engines, and in any case their user agents are immediately recognizable.
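A minimal Python sketch of the access-monitoring approach: it counts which user agents requested /robots.txt in a web server access log. It assumes the Apache/Nginx “combined” log format; the function name robots_txt_agents and the sample log lines (which use documentation IP addresses) are hypothetical.

```python
import re
from collections import Counter

# Assumes the "combined" log format: the request line, referrer and
# user agent are the quoted fields of each log entry.
COMBINED = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def robots_txt_agents(log_lines):
    """Count the user agents that requested /robots.txt (hypothetical helper)."""
    agents = Counter()
    for line in log_lines:
        match = COMBINED.search(line)
        if match and match.group("path") == "/robots.txt":
            agents[match.group("agent")] += 1
    return agents


# Hypothetical sample entries.
sample = [
    '198.51.100.7 - - [05/Jul/2010:16:36:00 +0200] "GET /robots.txt HTTP/1.1" 200 68 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '198.51.100.7 - - [05/Jul/2010:16:36:05 +0200] "GET /robots.txt HTTP/1.1" 200 68 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.10 - - [05/Jul/2010:16:40:12 +0200] "GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

for agent, hits in robots_txt_agents(sample).most_common():
    print(hits, agent)
```

Only the crawler shows up, because the ordinary browser never asked for robots.txt.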

Allow / Disallow

With “Allow” or “Disallow” you declare whether the search engine using the user agent specified in “User-Agent” may access a given part of the site. For example, we may want to exclude the “images” directory from indexing by “Googlebot-Image”, especially if we want to sell the images we keep on the server under licenses other than Creative Commons.
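To check the Googlebot-Image example, the following sketch feeds a hypothetical robots.txt to Python's standard urllib.robotparser module and asks whether each crawler may fetch an image. The rules, the example.com URL and the photo.jpg file name are assumptions for illustration; urllib.robotparser applies simple prefix matching, and real crawlers may treat edge cases (wildcards, overlapping Allow/Disallow rules) differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep Google's image crawler out of /images/
# while leaving every other crawler unrestricted.
rules = """\
User-agent: Googlebot-Image
Disallow: /images/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

for agent in ("Googlebot-Image", "Googlebot"):
    allowed = parser.can_fetch(agent, "https://example.com/images/photo.jpg")
    print(agent, "may fetch /images/photo.jpg:", allowed)
```

An empty “Disallow:” in the catch-all group means that nothing is forbidden to the other crawlers.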

Let me clarify the idea with a simple example:


User-agent: *
Disallow: /wp-

In this case, we state that crawlers presenting any user agent (“*”) must not access paths beginning with “/wp-”, such as “/wp-admin/”, which WordPress uses for administration. Simple, isn't it?
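The same standard-library parser can confirm what the two-line example does. In this sketch the paths and the “AnyBot” user agent are hypothetical. Note that the “/wp-” prefix also matches “/wp-content/”, where WordPress stores uploaded media, so those files are excluded as well.

```python
from urllib.robotparser import RobotFileParser

# The two rules from the example: every crawler, no paths starting with /wp-.
rules = ["User-agent: *", "Disallow: /wp-"]

parser = RobotFileParser()
parser.parse(rules)

# Hypothetical paths on a WordPress site.
paths = ("/wp-admin/", "/wp-login.php", "/wp-content/uploads/logo.png", "/2010/07/robots-txt/")
for path in paths:
    print(path, parser.can_fetch("AnyBot", "https://example.com" + path))
```

The three “wp-” paths are reported as not fetchable, while the article URL remains allowed.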


TheJoe

I have been keeping this blog as a hobby since 2009. I am passionate about graphics, technology and open source software. Among my articles you will easily find music and some personal thoughts, but I prefer to keep the blog focused mainly on technology. For more information, contact me.

2 Comments

TheJoe · 5 July 2010 at 4:36 PM

Then you'll also be interested in the articles about “.htaccess”! 😀

https://thejoe.it/wordpress/?s=htaccess

computer courses · 5 July 2010 at 3:35 PM

great tip 🙂
