Robots.txt support for your w3shop

Tell search robots which of your w3shop pages to crawl

Overview

This article explains how the robots.txt file is used within your w3shop.

On your main w3shop page you will find a 'robots.txt' panel in the SEO section. This panel only displays if you have your own domain set.

What is a robots.txt file?

This file tells search engine crawlers (Google, Bing, Yahoo etc.) how to visit your site. Most importantly, it sets out which pages or content you do NOT want indexed, and therefore which parts of your site crawlers should NOT visit.

This is a very simple .txt file. As standard, your w3shop robots.txt file will be set up as follows:

User-Agent: *
Disallow: /product-price-grid
Disallow: /template-details
Disallow: /basket
Disallow: /account/register
Disallow: /account/login

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit these pages on the site.

/product-price-grid (URL may vary site by site) is the 'Product price grid' page type.

/template-details (URL may vary site by site) is the 'Template: details' page type.

Why would I not want certain pages indexed?

Search engine crawlers have a finite amount of time to crawl your site. If a website has identical content in multiple places, less of that time is left for your core pages. Identical content also spreads link value across two pages and forces the search engine to pick one to display over the other; that shared link value leaves both pages weaker than a single core page would be.

Identical content can be caused by numerous issues, for example:

  • Duplicate pages due to filters
  • www and non www versions
  • Dynamically generated pages

Using the robots.txt file therefore tells search engines where to focus their crawl time. Steering search engines away from any potentially duplicate content makes it more likely that your core 'Landing Pages' rank on Search Engine Results Pages (SERPs).

Landing pages should be the focus entry point(s) of your website and therefore what you want to be ranked (along with your main domain).

Editing your robots.txt file
Difficulty = Easy, but use with caution

You can add to, edit or remove elements from the standard robots.txt file. Simply edit the robots.txt text to add, change or remove the URLs listed. You need a separate "Disallow" line for every URL prefix you want to exclude.
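As an illustrative sketch, suppose you also wanted to keep crawlers out of a search results page (the /search path here is purely hypothetical; use the URL your own shop actually shows). The edited file would simply gain one extra line:

User-Agent: *
Disallow: /product-price-grid
Disallow: /template-details
Disallow: /basket
Disallow: /account/register
Disallow: /account/login
# hypothetical extra exclusion
Disallow: /search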

Meta 'noindex'
Difficulty = Intermediate, use with caution.

CAUTION:

Only add this to specific, relevant pages. DO NOT add this to the main category 'header snippet' box.

In addition you may want to use the Meta 'noindex' - a tag added to the <head> of a page. This tells search engines not to index that page. Ideally, though, you should be using the robots.txt file, because it keeps everything in one central location.

However, you can choose to add the Meta 'noindex' tag:

<meta name="robots" content="noindex, nofollow">

into the header snippet box of the page you want blocked.

  • noindex = Do not show this page in search results and do not show a "Cached" link in search results.
  • nofollow = Do not follow the links on this page.
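The two values can also be used on their own. As an illustrative sketch (not a w3shop-specific requirement), a tag that blocks indexing but still lets crawlers follow the page's links would look like this:

<meta name="robots" content="noindex">  <!-- noindex only: the page is not indexed, but its links are still followed -->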

But my site has already been indexed!

If your site has already been indexed and you'd like parts of it not to be, you first need to remove the relevant 'Disallow' lines from your robots.txt file, then add the Meta 'noindex' to the pages you want de-indexed.

This allows search engines to crawl those pages and see the "noindex, nofollow" tags. Once Google has crawled your site again and de-indexed the relevant pages, you can add the Disallow lines back to the robots.txt file.
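As a sketch of that temporary state, suppose (purely as a hypothetical example) it is the 'Template: details' pages you want de-indexed. While you wait for the recrawl, your robots.txt would briefly omit that line and the pages themselves would carry the noindex tag:

User-Agent: *
Disallow: /product-price-grid
Disallow: /basket
Disallow: /account/register
Disallow: /account/login
# /template-details line temporarily removed so crawlers can reach those pages and see their noindex tag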

 

See also

Meta Data for your w3shop

Landing Pages

Jump to contents page of Setup Tracking and Meta Data for your w3shop


All content is (c) Nettl Systems Limited, 2024 and may not be used, copied or distributed without permission.