LemonStand Forum: Robots.txt info - LemonStand Forum

Robots.txt info

#1 Jim

  • Member
  • Group: Members
  • Posts: 56
  • Joined: 12-April 10

Posted 07 July 2010 - 01:49 AM

Just as an aside from my usual 'how do you do this' topics, I thought I'd post something useful.

I don't think LemonStand generates a robots.txt file automatically, and I recently noticed that google.co.uk was indexing https: copies of various pages on my site. This counts as duplicate content, which is frowned upon. I didn't notice it with google.com for some reason.

Anyway, the reason is that I haven't got my secure pages running on a separate server/sub-domain.

The solution is to add a .htaccess rule which serves a different robots.txt file to anything requesting it over port 443 [SSL].

So create robots_ssl.txt containing:

User-agent: *
Disallow: /


and add the following to your .htaccess:

RewriteEngine on
Options +FollowSymlinks
# Requests arriving on port 443 (SSL) get robots_ssl.txt instead of robots.txt
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt [L]



Aleksey, is there anything that we should be specifically disallowing within a robots.txt?

This post has been edited by Jim: 07 July 2010 - 01:50 AM


#2 Aleksey

  • Co-Founder
  • Group: +Administrators
  • Posts: 3,633
  • Joined: 31-October 09

Posted 07 July 2010 - 02:37 PM

Hi, Jim!

Thank you, that is very valuable information. We will add default robots.txt and robots_ssl.txt files to the installation, and will add rules to the .htaccess file for hiding HTTPS pages from search engines.

Quote

Aleksey, is there anything that we should be specifically disallowing within a robots.txt?


It depends on which pages you have implemented on your website. You are free to delete all pages from the default template and add any other pages, so the pages which should be hidden from search engines differ from store to store.

Thanks again for sharing.

#3 Dangermouse

  • Member
  • Group: Members
  • Posts: 6
  • Joined: 27-August 10

Posted 27 August 2010 - 03:59 PM

Hello,

Sorry to resurrect this thread but I thought it useful to keep all the information on the topic together.

Whilst blocking pages from the engines with robots.txt is helpful, I'd suggest that a better method would be to 301 redirect the HTTPS page to the HTTP version wherever SSL is not required. Is this possible? Ideally you shouldn't be able to access the same content at multiple URLs.
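
To illustrate the kind of redirect meant here, a rough .htaccess sketch follows. It is not anything LemonStand ships: it assumes mod_rewrite is enabled, that these rules sit above any existing rewrite rules, and the excluded paths (/checkout and /login) are purely hypothetical stand-ins for whichever pages really need SSL.

```apache
RewriteEngine On

# Only consider requests that arrived over SSL (port 443).
RewriteCond %{SERVER_PORT} ^443$
# Hypothetical exclusions: pages that genuinely need SSL stay on HTTPS.
RewriteCond %{REQUEST_URI} !^/(checkout|login)(/|$) [NC]
# Permanently redirect everything else to the same path over plain HTTP.
RewriteRule ^ http://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
```

Since the substitution contains no query string of its own, mod_rewrite carries the original query string over to the redirect target.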

As an alternative, is it possible to manipulate the HTTP headers sent, or the <head> HTML on the basis of protocol?

Cheers,

Steve

#4 Aleksey

  • Co-Founder
  • Group: +Administrators
  • Posts: 3,633
  • Joined: 31-October 09

Posted 27 August 2010 - 04:30 PM

Hi, Steve!

LemonStand page security settings allow you to select the "HTTP only" or "HTTPS only" option, so it will not be possible to access a single page by different protocols.

I will implement the 301 redirect method for the HTTPS->HTTP and HTTP->HTTPS redirections, and will let you know when we finish.

If you need to send specific headers, you can use the Pre Action Code page field for calling the PHP header() function.

Thank you

#5 Dangermouse

  • Member
  • Group: Members
  • Posts: 6
  • Joined: 27-August 10

Posted 27 August 2010 - 04:33 PM

Great stuff - thanks for the quick response Aleksey, and apologies for posting before I've properly investigated LemonStand. I'm really just scoping it out at this stage.

Just to clarify, the idea behind being able to manipulate headers is to make use of the X-Robots-Tag header.
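
As a rough idea of what that could look like at the server level rather than in page code, here is a minimal .htaccess sketch. It assumes mod_rewrite and mod_headers are both available, and IS_SSL is just a hypothetical variable name chosen for the example.

```apache
RewriteEngine On

# Flag requests that arrived over SSL (port 443) with an environment
# variable. IS_SSL is a hypothetical name; the "-" leaves the URL unchanged.
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^ - [E=IS_SSL:1]

<IfModule mod_headers.c>
    # Send the header only on responses where the flag above is set.
    Header set X-Robots-Tag "noindex, nofollow" env=IS_SSL
</IfModule>
```

One caveat: when other rewrite rules trigger an internal redirect, Apache may expose the variable as REDIRECT_IS_SSL instead, so this would need testing against the store's existing rules.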

Steve

#6 Aleksey

  • Co-Founder
  • Group: +Administrators
  • Posts: 3,633
  • Joined: 31-October 09

Posted 07 September 2010 - 04:43 PM

Hi, Steve!

HTTP -> HTTPS and HTTPS -> HTTP redirections now send the HTTP 301 header.

Thank you
