Tuesday, August 21, 2012

HTTP and HTTPS in SEO - What to do?

A site that uses an SSL (Secure Sockets Layer) certificate may face a duplicate page problem, because the same content is available on both the HTTP and HTTPS versions of the website.


HTTP stands for Hypertext Transfer Protocol, which is responsible for data communication on the web. The World Wide Web uses HTTP as a request-response protocol (a set of rules). HTTPS stands for Hypertext Transfer Protocol Secure. It is a combination of SSL and HTTP, and it secures the communication handled by HTTP.

The difference between HTTP and HTTPS is that HTTPS encrypts the data exchanged between the browser and the server, while HTTP sends it in plain text.

Why is HTTPS necessary?

HTTPS is required for dynamic sites where users register or for ecommerce websites where online transactions take place. Here, user data passes through HTTPS and remains secure.

SEO Problem

Google sees duplicate pages when it indexes both versions of the website, i.e. with HTTP as well as with HTTPS. Example:

http://www.example.com
https://www.example.com

In the example above, both URLs point to the same website but use different protocols, which makes it harder for search engines to tell which of the two is the original.

What to do?

You can solve the duplicate content problem with the help of the robots.txt and .htaccess files.
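To illustrate the duplication described above, here is a minimal Python sketch that checks whether two URLs point to the same location and differ only in protocol. The function name same_page_different_protocol is made up for this example and is not part of any SEO tool.

```python
from urllib.parse import urlsplit


def same_page_different_protocol(url_a, url_b):
    """Return True when two URLs differ only in scheme (http vs https)."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    # Treat an empty path as "/" so that example.com and example.com/ match.
    location_a = (a.hostname, a.path or "/", a.query)
    location_b = (b.hostname, b.path or "/", b.query)
    return location_a == location_b and {a.scheme, b.scheme} == {"http", "https"}


print(same_page_different_protocol("http://www.example.com",
                                   "https://www.example.com"))  # True
```

A pair of URLs for which this returns True is a candidate for the duplicate content issue.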

Step 1 

Create a separate robots.txt file for the secured version of your website. Name it robots_ssl.txt and store it in the root directory.

Step 2

Add the following code to the robots_ssl.txt file:

 User-agent: *
 Disallow: /
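To see what these two lines mean to a crawler, the rules can be fed to Python's standard urllib.robotparser module. This is only a rough check: real search engine crawlers use their own parsers, and the helper name is_crawl_allowed and the sample URL are assumptions made for this example.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(rules_text, url, user_agent="*"):
    """Parse robots.txt rules given as text and report whether url may be crawled."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


# The same rules as in robots_ssl.txt
ROBOTS_SSL = "User-agent: *\nDisallow: /"

print(is_crawl_allowed(ROBOTS_SSL, "https://www.example.com/checkout.html"))  # False
```

Because "Disallow: /" matches every path, any URL passed in is reported as blocked.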

Step 3 

Create a .htaccess file in your root directory and add the following code:

# Serve robots_ssl.txt in place of robots.txt for requests on the HTTPS port (443)
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt [L]

It's done! The .htaccess file internally rewrites the request so that search engine crawlers receive robots_ssl.txt whenever they ask for robots.txt on port 443, which is used by HTTPS.
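The effect of the rewrite rule can be mimicked in a few lines of Python. This is only an illustration of the decision Apache makes, not how mod_rewrite is implemented, and the function name resolve_robots_file is hypothetical.

```python
import re

# The same patterns as the RewriteCond and RewriteRule, with the dot escaped
PORT_PATTERN = re.compile(r"^443$")
ROBOTS_PATTERN = re.compile(r"^robots\.txt$")


def resolve_robots_file(server_port, requested_path):
    """Return the file that would be served for a request on the given port."""
    on_https_port = PORT_PATTERN.match(str(server_port)) is not None
    if on_https_port and ROBOTS_PATTERN.match(requested_path):
        return "robots_ssl.txt"
    # Any other request is left untouched.
    return requested_path


print(resolve_robots_file(80, "robots.txt"))   # robots.txt
print(resolve_robots_file(443, "robots.txt"))  # robots_ssl.txt
```

Only the combination of port 443 and a request for robots.txt triggers the swap. All other files are served as usual on both ports.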

Logic behind the coding

When search engine robots visit the HTTP version of your site, they connect on port 80, which is used by the HTTP protocol. The rewrite condition does not match on this port, so the crawlers receive the regular robots.txt and proceed according to the instructions it contains.

When the robots visit the HTTPS version of the website, they connect on port 443, which is used by the HTTPS protocol. Here the rule in the .htaccess file serves robots_ssl.txt in place of robots.txt. The crawlers therefore read robots_ssl.txt and proceed according to its instructions. Since robots_ssl.txt contains the disallow command, the crawlers are told not to crawl any URL starting with HTTPS, which is meant to keep those pages from being indexed.