
Don't neglect your robots.txt file

  •  11-27-2007, 7:42 PM

    Search engine bots are quite thorough when indexing sites. If you're using Vine Type for a blog, as I do, you may discover that the search engines are indexing more than you expect, and perhaps more than you want. Google was indexing all the RSS and ATOM feeds on my site: not just the main "all articles" feed, but also the individual article feeds and the archive pages (all articles from November 2007, for instance). I took a good look at all the links on my site, decided there were several I did not want indexed, and updated my robots.txt file accordingly. Here's an excerpt from my blog's robots.txt file:


    Disallow: /default.aspx?feed=atom
    Disallow: /default.aspx?feed=rss
    Disallow: /default.aspx?archive=
    Disallow: /default.aspx?img=

    The first two are the RSS and ATOM feeds. The third is the archive pages. The fourth covers the images generated via ViPR. And if you're wondering why I didn't simply specify


    Disallow: /default.aspx?feed=

     ...it's because that rule also matches my sitemap, which Google then refused to accept. The sitemap's URL has the form


    default.aspx?feed=googlesitemap
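
To see why the broader rule catches the sitemap, here is a minimal Python sketch. It assumes Disallow rules are plain prefix matches against the path and query string, which is the basic robots.txt behavior; real crawlers also handle Allow rules, wildcards, and per-agent groups. The function name `is_blocked` and the two rule lists are hypothetical, made up for illustration.

```python
# Simplified sketch: treat each Disallow rule as a plain prefix match
# on the path + query string. Not a full robots.txt parser.

SPECIFIC_RULES = [
    "/default.aspx?feed=atom",
    "/default.aspx?feed=rss",
    "/default.aspx?archive=",
    "/default.aspx?img=",
]

# The shorter rule that was avoided in the post.
BROAD_RULES = ["/default.aspx?feed="]

SITEMAP = "/default.aspx?feed=googlesitemap"


def is_blocked(url_path, disallow_rules):
    """Return True if any non-empty Disallow prefix matches the start of url_path."""
    return any(url_path.startswith(rule) for rule in disallow_rules if rule)


if __name__ == "__main__":
    print(is_blocked(SITEMAP, SPECIFIC_RULES))                    # False
    print(is_blocked(SITEMAP, BROAD_RULES))                       # True
    print(is_blocked("/default.aspx?feed=atom", SPECIFIC_RULES))  # True
```

Under that assumption, the four specific rules leave the sitemap URL reachable, while the single `?feed=` rule would block it along with the feeds.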

    I thought I might share this lesson learned.


    Sincerely,

    Carl
    -----
    vine type - content management with standards in mind - vinetype.com
    -----