Thursday, June 2, 2011

Why We Optimize Robots.txt

What is Robots.txt

The simplest way to check whether a site has a robots.txt file is to type /robots.txt after the site's domain in the address bar. This displays the robots exclusion protocol file for that particular site.
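The same check can be scripted. Below is a minimal sketch that builds the robots.txt address for any page on a site; the function name robots_url and the example.com address are illustrative assumptions, not part of any particular tool.

```python
from urllib.parse import urljoin


def robots_url(site_url: str) -> str:
    """Return the robots.txt location for the site that hosts site_url."""
    # The leading slash makes urljoin resolve against the site root,
    # so any path or query string in site_url is discarded.
    return urljoin(site_url, "/robots.txt")


# Hypothetical page address used only for illustration.
print(robots_url("http://www.example.com/blog/post.html?page=2"))
```

Opening the printed address in a browser shows the file if the site has one.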

Whenever a crawler visits a website, it first looks for the robots.txt file on that site. The file tells the robot which pages it should not visit. Keep in mind that robots.txt is only an instruction to web robots: well-behaved crawlers follow it, but a robot can ignore it.

How to Add robots.txt

Create a plain text file named robots.txt in the root directory of your site.

For example
# You can also name a particular robot, such as Googlebot, instead of *.
User-agent: *
Disallow: /

In the above example, "User-agent: *" means that this section applies to all robots, and "Disallow: /" tells the robot that it should not visit any page on the site. Everything that is not disallowed is allowed by default, so there is usually no need for an explicit "Allow" instruction.
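To see how a crawler would read these two lines, here is a small sketch using Python's standard urllib.robotparser module. The example.com address is only a placeholder, and the rules are parsed from a string rather than downloaded from a real site.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the example above.
rules = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# "Disallow: /" covers every path, so no robot may fetch any page.
allowed = parser.can_fetch("Googlebot", "http://www.example.com/index.html")
print(allowed)
```

Because the section applies to all robots, the answer is the same whichever robot name is passed in.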

How to Exclude Subpages and Directories

In the same way, we can exclude directories and particular files that we don't want robots to crawl.

For example
User-agent: *
Disallow: /tmp/

The easiest way to disallow several files is to put them all into a separate directory and then disallow that directory. Alternatively, you can explicitly disallow each page:

For example
Disallow: /spam.html
Disallow: /code.html
and so on.
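The directory and file rules above can be tried out together with a short sketch, again using Python's standard urllib.robotparser. The helper name is_allowed and the example.com host are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# One disallowed directory plus two individually disallowed pages.
rules = """\
User-agent: *
Disallow: /tmp/
Disallow: /spam.html
Disallow: /code.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def is_allowed(path: str) -> bool:
    """Check a path on a placeholder host against the rules above."""
    return parser.can_fetch("*", "http://www.example.com" + path)


for path in ["/tmp/cache.txt", "/spam.html", "/about.html"]:
    print(path, is_allowed(path))
```

Each rule is a prefix match on the path, which is why everything under /tmp/ is blocked while /about.html stays crawlable.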

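Another way to exclude a particular URL is to put a "robots" meta tag in the head section of that page's HTML.

For example
<meta name="robots" content="noindex"> (if you don't want the page to appear in search results)
or
<meta name="robots" content="noindex, nofollow"> (if you also don't want robots to follow the links on the page).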
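To check which instructions a page actually carries, the meta tag can be read back with Python's standard html.parser module. This is only a sketch: it assumes the usual form of the tag, a meta element with name="robots" and a comma-separated content attribute, and the class name RobotsMetaParser and the sample page are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma separated, e.g. "noindex, nofollow".
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


# Hypothetical page used only for illustration.
page = '<html><head><meta name="robots" content="noindex, nofollow"></head><body></body></html>'

meta_parser = RobotsMetaParser()
meta_parser.feed(page)
print(sorted(meta_parser.directives))
```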

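We can also use wildcards in robots.txt. '*' matches any sequence of characters, so it is used to cover every URL that contains a given string. '$' marks the end of the URL. Wildcards are an extension to the original robots.txt convention: major crawlers such as Googlebot support them, but not every robot does.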

Let's make this clear with an example. On a dynamic website it's very important to stop robots from crawling URLs that contain duplicate content, such as the URLs created for search results.

How to Avoid Search Queries

For instance, http://www.vmoptions.com/?search=hertz.co.uk and http://www.vmoptions.com/?search=travel-guard.co.uk are URLs generated dynamically for search results. The cause of the problem is "?search". We can add a wildcard rule to our robots.txt file that blocks all URLs containing this term.

For example:
Disallow: /*?search=
This stops robots from crawling any URL that contains this term anywhere in the URL. In the same way, to block URLs that end with .gif, you could use the following entry:
Disallow: /*.gif$

While working with wildcards, keep in mind that the paths in robots.txt are case sensitive: a rule written for .gif does not block a URL that ends in .GIF.
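Python's built-in robots.txt parser does not interpret these wildcards, so the sketch below translates a pattern into a regular expression by hand to show how '*' and '$' behave. It is a simplified illustration of the matching idea, not the exact algorithm any search engine uses, and the function names are made up for this example.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern with '*' and '$' into a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then let each '*' match any run of characters.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern: str, url_path: str) -> bool:
    # Rules match from the start of the path, so match() is used, not search().
    return robots_pattern_to_regex(pattern).match(url_path) is not None


print(is_blocked("/*?search=", "/?search=hertz.co.uk"))
print(is_blocked("/*.gif$", "/images/logo.gif"))
print(is_blocked("/*.gif$", "/images/logo.GIF"))
```

The last call uses an upper-case extension to show the case sensitivity: the pattern is compared character for character, so .GIF is not treated as .gif.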

Google and +1 web button

Google introduced the +1 button for websites today.

Similar to the Facebook 'like' button, the +1 button lets people recommend websites to those in their social circle.
With a single click you can recommend or share a site, a link, or your favourite video with your friends and the rest of the world. The next time your connections search, they could see your +1s directly in their search results, helping them find your recommendations when they're most useful.

The +1 button makes it easy for visitors to recommend your pages to friends and contacts exactly when their advice is most useful -- on Google search. As a result, you could get more and better qualified site traffic.

You'll need to add a small snippet of code on the pages where you want a +1 button to appear.

To stay current on updates to the +1 button large and small, please sign up for the Google Publisher Button Announce Group.

Just copy and paste the following code into your site:






Add the +1 Button to Your WordPress Website

For users who have a standard XHTML website (or are using Tumblr), copying and pasting the snippet straight into the site's markup works just fine. WordPress users, however, will need to add that snippet to their theme files.

Go to the Appearance tab in the WordPress dashboard and select “Editor.” Then find the footer.php file in your template listing. Scan through the file until you reach the spot where the script should load, then paste in the JavaScript line. Click Update.

Add a Button to Your Sidebar

Drag a new “Text” widget to the sidebar location of your choice. You can add a header if you want, or you can leave it blank. In the text portion, paste the button configuration you want using the +1 button page. The standard code is

You can choose how you want the button to align itself using HTML or referring to CSS classes from your stylesheet.




Thanks for your interest! Try it out on your own website.