Search engine bots are quite thorough when indexing sites. If you're using Vine Type for a blog, as I do, you'll discover that the search engines may be indexing more than you expect and perhaps more than you want. Google was indexing all the RSS and ATOM feeds on my site: not just the main "all articles" feed, but also the individual article feeds and the archive pages (all articles from November 2007, for instance). I took a good look at all the links on my site, decided that there were several I did not want indexed, and updated my robots.txt file accordingly. Here's an excerpt from my blog's robots.txt file:
Disallow: /default.aspx?feed=atom
Disallow: /default.aspx?feed=rss
Disallow: /default.aspx?archive=
Disallow: /default.aspx?img=
The first two are the ATOM and RSS feeds. The third covers the archive pages. The fourth covers the images generated via ViPR. And if you're wondering why I didn't simply specify
Disallow: /default.aspx?feed=
...it's because that rule also covers my sitemap, and Google refused to accept it. The sitemap's URL format is
default.aspx?feed=googlesitemap
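To make the difference concrete, here is a minimal Python sketch of the basic prefix matching that robots.txt Disallow rules use. It is a simplified model, not how any particular crawler is implemented: it ignores wildcards, Allow lines and per-crawler groups. The helper name `is_blocked` and the sample archive URL are made up for illustration.

```python
# Simplified model of robots.txt matching: a Disallow rule blocks any URL
# whose path-plus-query starts with the rule's value. Wildcards, Allow
# lines and per-crawler groups are deliberately ignored here.

# The four rules from the excerpt above.
NARROW_RULES = [
    "/default.aspx?feed=atom",
    "/default.aspx?feed=rss",
    "/default.aspx?archive=",
    "/default.aspx?img=",
]

# The same set, but with the single broader feed rule that was rejected.
BROAD_RULES = [
    "/default.aspx?feed=",
    "/default.aspx?archive=",
    "/default.aspx?img=",
]


def is_blocked(url_path, rules):
    """Return True if any Disallow prefix matches the start of url_path."""
    return any(url_path.startswith(rule) for rule in rules)


SITEMAP = "/default.aspx?feed=googlesitemap"
ATOM_FEED = "/default.aspx?feed=atom"
ARCHIVE = "/default.aspx?archive=2007-11"  # hypothetical archive URL format

print(is_blocked(SITEMAP, NARROW_RULES))    # False: the sitemap stays reachable
print(is_blocked(SITEMAP, BROAD_RULES))     # True: the broad rule hides the sitemap
print(is_blocked(ATOM_FEED, NARROW_RULES))  # True: the feed is still excluded
```

Listing the atom and rss values separately costs one extra line, but it leaves every other `feed=` value, including the sitemap, open to the crawler.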
I thought I might share this lesson learned.
Sincerely,
Carl
-----
vine type - content management with standards in mind -
vinetype.com-----