TheJoe.it Into the (open) source

24 Feb 2010

Using the ".htaccess" file to block access

Anyone who has monitored the traffic to a site will have come across visits that look very much alike. Often these come from a crawler (Google itself uses a sophisticated crawler called GoogleBot). Paraphrasing Wikipedia:

A crawler (also called a spider or robot) is software that analyzes the contents of a network (or a database) in a methodical, automated way, typically on behalf of a search engine.

Crawlers are usually harmless: if well made, they create no noticeable traffic on the site and provide the indexing service that we all know and appreciate.

However, there are crawlers that use the same mechanisms as indexing crawlers to scan the web looking for flaws in page code. As we know, webmasters are not always careful when programming, and sometimes we become aware of security holes in a site (or portal). These harmful crawlers explore the length and breadth of the web, indexing pages for themselves in order to "break into" the site and gain access to the server or, worse, to sensitive data.

Besides not giving us a useful indexing service, they also increase bandwidth usage, so that browsers load the site more slowly. Adding insult to injury, in other words.

However, a simple text file lets us block access from certain IPs or "user agents" once they have been identified. I'm talking about the ".htaccess" file.

The ".htaccess" file is a useful configuration file for the server. It is a very simple tool, but a powerful one, and it should not be used lightly. An error in this file can lock webmasters out of their own pages, so let's proceed with caution.

The safest way to find out whether a "user agent" that recently visited the site is a harmful crawler is to search on Google. Search separately for the "user agent" and for the IP address from which the request came.

Blocking bots through ".htaccess"

This example, and all the following ones, can be added at the bottom of the ".htaccess" file, provided it already exists. If it does not exist you can create it: a simple text file named ".htaccess", placed in the "root directory" of the server.

# Let's get rid of the bot
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^BadBot
RewriteRule ^(.*)$ http://go.away/

What does this piece of code do? Simple. The lines above tell the server to check every request for a "user agent" beginning with "BadBot". When it finds a match, it redirects the request to a nonexistent address, "http://go.away/".
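A possible variant, offered only as a sketch: instead of redirecting to a nonexistent address, the server can refuse the request outright with a "403 Forbidden", using the same [F] flag that appears later in this article. The <IfModule> wrapper and the [NC] flag (case-insensitive match) are my additions and assume that mod_rewrite is available on your host.

```apache
<IfModule mod_rewrite.c>
    RewriteEngine on
    # Match user agents that start with "BadBot", ignoring case
    RewriteCond %{HTTP_USER_AGENT} ^BadBot [NC]
    # "-" means no substitution; [F] answers with 403 Forbidden
    RewriteRule .* - [F]
</IfModule>
```

With [F] the bot receives an error response directly instead of a redirect.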

Now let's see how to block more than one:

# Let's get rid of the bots
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} ^BadBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EvilScraper [OR]
RewriteCond %{HTTP_USER_AGENT} ^FakeUser
RewriteRule ^(.*)$ http://go.away/

The code above does exactly the same thing as the first, but in this case it blocks every "user agent" that begins with "BadBot", "EvilScraper" or "FakeUser". Note that when there is more than one bot to block, you need to put "[OR]" at the end of every condition line except the last.
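As the list of bots grows, the same check can probably be written more compactly. The sketch below folds the three names into a single condition using regular-expression alternation, so no "[OR]" is needed. The bot names are the same placeholders used above, and the grouping is my own suggestion rather than part of the original rules.

```apache
RewriteEngine on
# One condition with regex alternation instead of several [OR] lines
RewriteCond %{HTTP_USER_AGENT} ^(BadBot|EvilScraper|FakeUser)
RewriteRule ^(.*)$ http://go.away/
```

Adding a new bot then means adding one more name between the parentheses, separated by "|".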

Blocking bandwidth "thieves"

Casual surfers usually do not know it, but it often happens that, to lighten the load on their own server (or out of simple ignorance), some webmasters embed images hosted elsewhere in their pages. This relieves their server of the burden of hosting the image, but adds traffic to the server where the image resides, not to mention that the second server gets no credit for the work done.

Since we cannot afford to keep changing the images on our website, ".htaccess" comes to our aid in this case too.

RewriteEngine on
RewriteCond %{HTTP_REFERER} ^http://.*somebadforum\.com [NC]
RewriteRule .* - [F]

This way, every image requested from a page on "somebadforum.com" gets a "403 Forbidden" response. The end result is the classic missing-image symbol (broken image), and our server's bandwidth is saved.

To block more than one site, this is the code:

RewriteEngine on
RewriteCond %{HTTP_REFERER} ^http://.*somebadforum\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://.*example1\.com [NC,OR]
RewriteCond %{HTTP_REFERER} ^http://.*example2\.com [NC]
RewriteRule .* - [F]

As above, note the "OR" at the end of each condition line except the last.
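Note that "RewriteRule .* - [F]" refuses every request arriving with that referer, not only images. If the goal is to stop just the hotlinked images, a narrower rule might look like the sketch below. The list of extensions and the "https?" prefix (to also catch referers served over HTTPS) are assumptions of mine, to be adapted to your site.

```apache
RewriteEngine on
# Match the hotlinking site in the Referer header (case-insensitive)
RewriteCond %{HTTP_REFERER} ^https?://.*somebadforum\.com [NC]
# Refuse only requests for common image extensions
RewriteRule \.(jpe?g|png|gif)$ - [F,NC]
```

Ordinary links from that site to our pages keep working, while the embedded images come back as "403 Forbidden".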

Banning IPs

It can also happen that a clever bot rotates its "user agent" in order to keep accessing the pages of the site. When this happens, one way to stop the "imaginative" bot is to block its IP (do this only if the accesses keep coming from the same IP). Again in our trusty ".htaccess", add the following lines:

order allow,deny
deny from 192.168.44.201
deny from 1.2.3.4
deny from 5.6.7.8
allow from all

In this example we block three IP addresses, and the last line grants access to everyone else. It is also possible to block an address prefix (for example 192.168.*):

order allow,deny
deny from 192.168.
deny from 100.29.
deny from 10.99.
allow from all

With these rules, all IP addresses that begin with "192.168." (and with the other prefixes listed) will be blocked.
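The "order", "allow" and "deny" directives belong to the Apache 2.2 syntax. On Apache 2.4 they are deprecated and only work through the mod_access_compat module. As a rough sketch, assuming your host runs Apache 2.4 and lets ".htaccess" override access control, the equivalent rules could be written with "Require". The addresses are the same placeholders used above.

```apache
<RequireAll>
    # Let everyone in...
    Require all granted
    # ...except one single address
    Require not ip 192.168.44.201
    # ...and every address starting with 100.29
    Require not ip 100.29
</RequireAll>
```

If you are unsure which version your host runs, ask before switching syntax, and avoid mixing the old and new directives in the same file.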

I suggest always making a backup of ".htaccess": things do not always go the way we want, and being locked out of your own server helps no one. Most common hosting providers support ".htaccess" files; if yours does not, I believe it is time to change provider.

About

I have kept this blog as a hobby since 2009. I am passionate about graphics, technology and Open Source software. Among my articles it is easy to find music and some personal thoughts, but I prefer to keep the blog focused mainly on technology. For more information contact me.

Filed under: SEO, Web
Comments (4) Trackbacks (0)
  1. Sure, and we can also deny access by DNS name. htaccess is a great resource.

    And of course thanks for the comment!

  2. We can also use DNS names instead of IPs for the allow rules in htaccess.

  3. Hi LU,
    thanks for the feedback. In fact I did not specify the type of server; I assumed Apache was being used (the most common). Usually a hosting provider running Apache also supports ".htaccess" (I am speaking of the major Italian hosts: Aruba, Register and company).

    I'm glad you find my articles interesting, but don't you have anything better to do at night than read my articles on htaccess?? haha!!

  4. Interesting series of articles. Remember to specify that all this holds only if the web server is Apache (although most hosting services do provide Apache), and obviously only if ".htaccess" file handling is enabled (and authorized).
    Another interesting detail: it is possible to give these configuration files a name other than ".htaccess", although hosting services usually do not allow such fine-grained configuration.

    😀

