
.Net Discoveries

An attempt to pass along some answers I have discovered in my .Net coding.
RSS Enable Your Website Part 1 - Creating the Feed

Prologue

Not long ago, I discovered how cool RSS can be. I was having trouble with my Peerflix account: I had movies that were in low demand, I was never the first to get the notification to send a movie on, and so I missed out on being the one to send it. Then I found that Peerflix has an RSS feed especially for movies that I need to send out. So I did some searching, found a good RSS reader, and started pulling the information every 15 minutes. It hasn't completely solved the problem, but I've discovered that RSS is very cool. Now I subscribe to a number of blogs and newsletters that I can pull and read at my leisure without clogging my email inbox. Anyway, decidedly cool stuff.

So I had an idea. I was looking for ways to go above and beyond my assigned duties at work (ok, it gets me a better performance review at the end of the year, so not exactly altruistic), and we have a few areas on our company website where we could harness the power of RSS. So I undertook to add RSS feeds to our website.

If you don't know much about RSS and would like to know more, you can get more details from Wikipedia. I won't go into much of an explanation of what it is here.

Editor's note: Just as I was proofreading before posting this, I found the following link, which may have other valuable information about creating an RSS feed using a downloadable RSS object. It even has a few extra features we don't create here: RSS in .NET Made Easy with XML Serialization.

Problem

Our website has a number of pages whose content is updated somewhat regularly, and it is the kind of content that people may want to be alerted about when new updates happen (e.g. press releases, RFPs, job postings, etc.). For each of these pages, when the page is viewed, the server looks in a folder, generates the list of items (press releases, for example), and then displays the page. The departments only have to upload a new item into the folder, and it will appear automatically on the website. To create an RSS feed, I had a few design considerations for how to implement it:

1. Do I create an RSS page for each of the feeds that I want to create (we have 7), or do I try to do it with one page?
2. Do I try to create the RSS when the feed is requested (dynamic), or do I just update an XML document each time an item is uploaded?
3. How many of our existing webpages would have to be changed and how much change would be involved?

For me, the answers were pretty easy. I wanted to create 1 page and have it serve all the RSS feeds if possible, using a querystring parameter, RSSFeedName, to specify which feed has been requested. This means that any code only needs to be changed once and it will affect all the different feeds. I also decided that if the user entered an incorrect value for RSSFeedName, or left it out entirely, they would go to a page listing all the RSS feeds that we have available. I didn't know if this was possible, because it would require returning two different types of content from basically the same page: without the querystring HTML would be returned, and with a valid querystring an XML document would be returned (you can do it, by the way).

The best scenario in my mind for updating the feed was to have it generated automatically when the user requests the feed. This means that the page would need to figure out which feed to send back, read the folder, parse the results, and generate the properly formatted RSS feed. Users can upload to the folder by any means they like and the feed will always be up to date.

Realistically, not many of our pages needed to be changed. I would add one page to the website, RSS.aspx (ok, actually two; the other is an .xsl stylesheet that I'll talk about in my next post), and then add the little RSS icon to a few pages to indicate that they have an RSS feed. Really, the only change outside the RSS.aspx page was the icon with the link to the correct RSS feed.

Solution

Ok, so let's get set up. Open a new blank website and let's get started. Create two new folders in your website, and name one PR and the other Jobs. Within the PR folder, add 3 HTML pages, and in the Jobs folder, add 2 HTML pages. Name the files in the PR folder PR1.htm, PR2.htm, etc., and do the same for the jobs: Job1.htm, Job2.htm. Also put a little bit into each page so that you can tell which page is being displayed (for later); I just added the filename to the body section of each page. Now, add an XML file to the project and name it RssSample.xml. We'll load it up with the XML of an RSS feed so we can check out what it looks like. OK, one last page to create: at the top level, create a new Web Form and name it RSS.aspx. This will be the file that we concentrate on for our coding, but before we code, let's do just a little bit more setup.

In the design view of the RSS.aspx page, create two hyperlinks. These will link to our RSS feeds if the user is directed to the RSS page without a valid querystring. One link should be to the PR feed, the other to the Jobs feed. These links will include our querystrings that point to our feeds. Your HTML code should resemble:

<a href="rss.aspx?RSSFeedName=PR">Press releases</a><br /><br />
<a href="rss.aspx?RSSFeedName=Jobs">Jobs</a>

Ok, so now you should have two links on your page and nothing more. You could go ahead and create a nice page, format it like all your others, etc.; we just won't. Now let's get started with the code. Go ahead and open your code-behind page and here we go.

Now the first thing we're going to do when the page loads is determine whether the page has a valid querystring and decide our course of action. If it is not valid or is blank, then we want to do nothing and just let the HTML that we created in design view show. If it has a valid querystring, we want to process the correct folder and generate our RSS feed. So go to your page load event, and let's put in some code to examine the querystring. I'm going to abstract things into subroutines a little so that the code is easier to read and work with as we go; you don't need to if you don't want to. Add the following code to your Page_Load event handler.

Dim sFeedName As String = Request.QueryString("RSSFeedName")
If sFeedName <> "" Then

End If

There's our logic for checking the querystring. We create a variable and populate it with the RSSFeedName querystring value. If it doesn't exist, the page just renders the HTML and skips all our other code. If it has a value (even an invalid one), processing begins; later we'll check the string for validity and, if it isn't valid, redirect back to the RSS.aspx page without a querystring.
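If you would rather do the validity check up front instead of deep inside the feed-building code, one option is to compare the value against a fixed list of feed names. The following is only a sketch under assumptions: the FeedNames module and its IsKnownFeed function are hypothetical names that are not part of the page we're building, and the match is case-sensitive.

```vb
Imports System

Module FeedNames
    ' Hypothetical list of the feeds this site serves.
    Private ReadOnly KnownFeeds As String() = {"PR", "Jobs"}

    ' Returns True only when the value exactly matches a known feed name.
    Public Function IsKnownFeed(ByVal feedName As String) As Boolean
        If String.IsNullOrEmpty(feedName) Then Return False
        Return Array.IndexOf(KnownFeeds, feedName) >= 0
    End Function

    Sub Main()
        Console.WriteLine(IsKnownFeed("PR"))      ' True
        Console.WriteLine(IsKnownFeed("Sports"))  ' False
        Console.WriteLine(IsKnownFeed(Nothing))   ' False
    End Sub
End Module
```

In Page_Load you could call a check like this before clearing the response, so an unknown name never starts an XML document at all.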

Ok, now assuming that we've got a querystring, we want to clear all the HTML out of the existing page that is being returned and create an XML document that will be returned in its place. We'll be working with XML, so we need to add an Imports statement to the top of our document so that we can use some of the built-in XML objects. Add the following line at the very top of your code-behind document:

Imports System.Xml

Now we're ready to start creating the feed. If you aren't familiar with the RSS standard, there is a good article about the specification here. You can extend past what I am going to do here; there are RSS elements that I'm going to leave out. If you are not familiar with XML, you'll want to get a bit of a handle on it first. Basically, RSS is nothing more than XML with a defined set of elements, each with a defined set of children. Once you know which ones are required, it isn't that hard to do. Go find an RSS feed. See, XML! What, it's not? OK, do a view source. You'll see that it is, in fact, just an XML document; some may have a stylesheet applied. I'll talk more about stylesheets in a subsequent post. With that XML up, select all and copy it, clear all the text in your RssSample.xml file, and paste this in. Now we can see the elements that we'll need to create in our XML document, so let's get crackin'. First off, we need to dump all the HTML that was going to be sent to the user. We do that by clearing the Response object. We'll be adding code within the If statement in your Page_Load event; start by adding:

Response.Clear()
Response.Charset = "utf-8"
Response.ContentType = "text/xml"

This discards any HTML already buffered for the user and then sets the content type to XML, so the browser knows how to interpret the content it receives. Now we'll need an XmlTextWriter object so that we can create an XML document. Add the following:

Dim objXML As New XmlTextWriter(Response.OutputStream, System.Text.Encoding.UTF8)

This creates an XmlTextWriter object that we'll use to compose our XML document. We initialize it with the Response's output stream, so everything it writes goes to the browser, and give it an encoding of UTF-8. I'm no expert on encodings, but UTF-8 is the character encoding the writer uses to turn our text into bytes, and it matches the charset we set on the Response. Go back to the XML for the RSS feed that you looked at a little bit ago. Notice that the first line is the XML declaration, and that it specifies UTF-8 as the encoding. Interestingly enough, we don't have to write that line by hand; the WriteStartDocument() call we're about to add generates it from the encoding we gave the writer. Next, we'll start loading up objXML with our XML. Scroll on through your RssSample.xml file and you'll see an <rss> element; this starts the actual feed. You may notice an <?xml-stylesheet?> line before it, but we're going to ignore that for the moment. Now, we haven't actually started our document yet, so let's start it:

objXML.WriteStartDocument()

objXML.WriteEndDocument()

You'll notice that we have a start AND an end for a lot of the elements we will create. With the XmlTextWriter, elements that hold only text don't need a separate start and end; we can write them all at once. But since we want to start the document and add sub-elements, we don't want to close it yet. We'll use a start and an end for anything that has children. So let's create an rss element and add the 2.0 version attribute. Between your WriteStartDocument() and WriteEndDocument() lines add:

objXML.WriteStartElement("rss")
objXML.WriteAttributeString("version", "2.0")

objXML.WriteEndElement()

Now, you'll notice that there's a start for the rss element, and I added an end for it too. We need to make sure that all our elements are closed, or the feed won't validate and won't work for the user. I tend to pair them up so they're easy to remember, but it's up to you. Unfortunately, Visual Studio's automatic formatting won't let you indent the calls to show the nesting, so you'll sometimes stack up a number of WriteEndElement() calls at the end of your code and it's hard to tell which is which, but never mind. Now, back to our RssSample: notice that the next element is <channel>, so between our WriteAttributeString() and the WriteEndElement(), let's add our channel element:

objXML.WriteStartElement("channel")

objXML.WriteEndElement()

Ok, now inside this channel element is where we put the meat of the RSS feed, but each feed will have different content here, so this is where we'll abstract just a little bit. Add a call to a subroutine that we haven't defined yet, called CreateRssBody, and put it between the start and end calls for the channel. We'll pass it the name of the feed to return and the XmlTextWriter object so we can continue composing our document:

CreateRssBody(sFeedName, objXML)

Now before we leave the Page_Load event, let's wrap it up, then we'll concentrate on the CreateRssBody subroutine. We have the basic construct for an Rss document, but we haven't sent it to the browser yet. To do that, add the following AFTER you close the document (WriteEndDocument()):

objXML.Flush()
objXML.Close()
Response.End()

This flushes the writer's output to the response, closes the XmlTextWriter so it can be cleaned up, and then ends the response so nothing more can be added to it. Our completed Page_Load should look like this:

Protected Sub Page_Load(ByVal sender As Object, _
   ByVal e As System.EventArgs) Handles Me.Load
   Dim sFeedName As String = Request.QueryString("RSSFeedName")
   If sFeedName <> "" Then
      Response.Clear()
      Response.Charset = "utf-8"
      Response.ContentType = "text/xml"

      Dim objXML As New XmlTextWriter(Response.OutputStream, _
         System.Text.Encoding.UTF8)

      objXML.WriteStartDocument()
      objXML.WriteStartElement("rss")
      objXML.WriteAttributeString("version", "2.0")
      objXML.WriteStartElement("channel")

      CreateRssBody(sFeedName, objXML)

      objXML.WriteEndElement()
      objXML.WriteEndElement()
      objXML.WriteEndDocument()

      objXML.Flush()
      objXML.Close()
      Response.End()
   End If
End Sub

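Test it if you want. Because CreateRssBody doesn't exist yet, comment out that call for now (or add the empty stub from the next step first). Then run your RSS.aspx page and click on one of your links; you'll get a very short, but correct, XML document: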

<?xml version="1.0" encoding="utf-8" ?>
   <rss version="2.0">
      <channel />
   </rss>
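If you'd like to see that skeleton without running the website, the same writer calls can be pointed at memory instead of the Response. The following is a rough console sketch, not part of the RSS.aspx page: the RssSkeletonDemo module and BuildSkeleton function are names I made up for the illustration, and a MemoryStream stands in for Response.OutputStream.

```vb
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module RssSkeletonDemo
    ' Builds the same rss/channel skeleton as Page_Load, but into memory.
    Public Function BuildSkeleton() As String
        Using ms As New MemoryStream()
            ' UTF8Encoding(False) leaves out the byte order mark so the string starts with "<?xml".
            Dim writer As New XmlTextWriter(ms, New UTF8Encoding(False))
            writer.WriteStartDocument()          ' writes the <?xml ...?> declaration
            writer.WriteStartElement("rss")
            writer.WriteAttributeString("version", "2.0")
            writer.WriteStartElement("channel")
            writer.WriteEndElement()             ' closes channel
            writer.WriteEndElement()             ' closes rss
            writer.WriteEndDocument()
            writer.Flush()
            Return Encoding.UTF8.GetString(ms.ToArray())
        End Using
    End Function

    Sub Main()
        Console.WriteLine(BuildSkeleton())
    End Sub
End Module
```

The writer does not indent unless you ask it to, so the output comes out on a single line; the indentation in the sample above is only there for readability.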

Now let's concentrate on the CreateRssBody subroutine. Go ahead and define the subroutine:

Private Sub CreateRssBody(ByVal vsFeedToCreate As String, _
    ByRef objXML As XmlTextWriter)

End Sub

We are receiving 2 parameters: one to determine which RSS feed to generate, and the other so that we can keep adding to our XML document. Let's go back to our RssSample.xml page and see what needs to be created next. The channel has its own title, link, description, copyright, and ttl (time to live) elements that we'll want to add. Some of these elements will be the same no matter which feed is requested, but some of them, such as the title, will be different for each feed. What we'll do is implement a Select Case, load up the values that are different for each feed, and then put it all together at the end. Inside CreateRssBody(), create a Select Case and add a Case for each of our feeds. Also add a Case Else for an invalid querystring value:

Select Case vsFeedToCreate
   Case "PR"
   Case "Jobs"
   Case Else
End Select

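Ok, now let's take care of the Case Else first and get it out of the way. This will be our catch-all for invalid querystrings. Handling it is pretty simple: we just want to redirect back to the rss.aspx page without a querystring. Response.Redirect ends the response by default, so none of the code after the Select Case runs for an invalid feed name: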

Response.Redirect("rss.aspx")

Done. Now, let's get down to business on creating the feeds. We want to create some strings to hold the title and other information that's needed, but will be different for each feed. So add some variables at the top of your subroutine (just after the sub declaration, outside your select case):

Dim sTitle, sLink, sDescription, sPathPrefix As String

This will let us create a title, link, description and a base prefix for the item elements that we'll create and populate a little later. In each section, you'll want to add stuff for each of those items as follows:

Case "PR"
   sTitle = "Press Releases Headlines"
   sLink = "http://www.mywebsite.com/rss.aspx?RssFeedName=PR"
   sDescription = "The latest press releases from Me."
   sPathPrefix = "http://www.mywebsite.com/PR/"
Case "Jobs"
   sTitle = "Newly posted jobs"
   sLink = "http://www.mywebsite.com/rss.aspx?RssFeedName=Jobs"
   sDescription = "The latest jobs here."
   sPathPrefix = "http://www.mywebsite.com/Jobs/"

As you can see, the title and description are pretty straightforward. The link is the URL you would follow to get to this RSS feed, and the prefix is there so that we can point the items in the feed to the correct place. When we process the folder holding the items for the feed, we'll just get filenames. The prefix is the URL path we'll put in front of each filename so that people can link directly to the item. Ok, now we can populate our RSS document with this information. We'll have a section after the Select Case that does all the feed creation. Below the Select Case, add the following:

objXML.WriteElementString("title", sTitle)
objXML.WriteElementString("link", sLink)
objXML.WriteElementString("description", sDescription)
objXML.WriteElementString("copyright", "(c) 2006, My Company")
objXML.WriteElementString("ttl", "120")

This will add 5 elements to our RSS feed. The first 3 are specific to the feed, but the other two aren't, so we don't create variables for them. Guess what copyright adds... and ttl sets the (T)ime (T)o (L)ive: how long, in minutes, this feed can be cached before it should be refreshed. Now let's concentrate on the items that will be the real meat of the feed; so far we've mostly been setting metadata about the feed itself.

Now if you recall, we've setup folders that we want to iterate through and list each of the files as a new item on our feed. Without going into too much detail on the file IO stuff, we'll be using a little bit of that in our code. Start by importing the file IO namespace, add to the top of your page:

Imports System.IO

This will give us easy access to the objects we need to read the directory and get a list of the files. Now we need to create a DirectoryInfo object. It will hold the details of the folder containing all our items; we won't use the details themselves, but from this object we'll get the file list that becomes our item list. The directory we want is different for each feed, so we need to instantiate our DirectoryInfo object with a different directory for each feed. To start, let's declare a DirectoryInfo variable, without instantiating it, before our Select Case:

Dim dirItems As DirectoryInfo

Now we'll instantiate dirItems within the Select Case for each particular feed. After each sPathPrefix statement, add an instantiation line. For the Press Releases feed:

dirItems = New DirectoryInfo(Server.MapPath("PR"))

and for the Jobs feed:

dirItems = New DirectoryInfo(Server.MapPath("Jobs"))

Ok, now we have directory info, all we need to do is retrieve a list of the files in the directory, then iterate through them, create our RSS elements and add the elements. Skip down to the end of your sub, after where we add the channel elements and add the code to iterate through the files in the directory:

For Each File As FileInfo In dirItems.GetFiles()

Next
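One thing to be aware of: GetFiles() does not promise any particular order, and feed readers generally look best with the newest item first. If that matters for your feed, the array can be sorted before looping. This is an optional sketch rather than part of the page we're building; the FeedOrdering module, the NewestFirst comparison, and GetFilesNewestFirst are all hypothetical names, and sorting by creation time is my assumption about what "newest" should mean.

```vb
Imports System
Imports System.IO

Module FeedOrdering
    ' Comparison used by Array.Sort: later creation times sort first.
    Public Function NewestFirst(ByVal a As FileInfo, ByVal b As FileInfo) As Integer
        Return b.CreationTimeUtc.CompareTo(a.CreationTimeUtc)
    End Function

    ' Returns the files in the folder, newest first.
    Public Function GetFilesNewestFirst(ByVal folderPath As String) As FileInfo()
        Dim files As FileInfo() = New DirectoryInfo(folderPath).GetFiles()
        Array.Sort(files, New Comparison(Of FileInfo)(AddressOf NewestFirst))
        Return files
    End Function

    Sub Main()
        ' Lists the current directory as a stand-in for the PR or Jobs folder.
        For Each item As FileInfo In GetFilesNewestFirst(Directory.GetCurrentDirectory())
            Console.WriteLine(item.Name & "  " & item.CreationTimeUtc.ToString("r"))
        Next
    End Sub
End Module
```

In the page itself, the loop would then iterate over the sorted array instead of calling dirItems.GetFiles() directly.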

Now it's just a matter of creating the item elements. First we start an item element, then add the child elements that hold all the item's details, then we end the item and we're done. Put the following code inside your For Each loop:

objXML.WriteStartElement("item")
objXML.WriteElementString("title", File.Name.Replace(File.Extension, ""))
objXML.WriteElementString("description", _
   File.Name.Replace(File.Extension, ""))

objXML.WriteEndElement()

So we Start our item element, then add our title and description. I had the string utilities of the name property replace the extension of our file with nothing so it removes the .htm off our title and description. Next, let's add our link element, this gives your rss feed a link to the item. Add the following just below the description element:

objXML.WriteElementString("link", Path.Combine(sPathPrefix, File.Name))

To create our link, we add the path prefix to the filename. Now if you try to validate this feed (unfortunately, it has to be available via the internet to be validated), you'll get some interesting results. It will probably complain about not having a guid. From my research, it seems the guid is just a means of making sure that all the items can be told apart from each other. Somewhere I read that you can just make it the same as the link and call it good, so that's what I did (since each link is unique to its item). The following takes care of that and is pretty much just a copy of the link statement, but with guid instead. Add this below the link statement:

objXML.WriteElementString("guid", Path.Combine(sPathPrefix, File.Name))
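Two details are worth a quick illustration here. First, a file name containing spaces or other reserved characters should be escaped before it goes into a URL. Second, the RSS 2.0 guid element has an optional isPermaLink attribute, which defaults to true and tells readers the guid is a URL that can be opened, which is exactly our case. The sketch below shows both; the ItemLinkDemo module and its BuildItemUrl and WriteItem functions are hypothetical helpers for this illustration, and it assumes the prefix already ends in a forward slash.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module ItemLinkDemo
    ' Hypothetical helper: joins a URL prefix that ends in "/" with an escaped file name.
    Public Function BuildItemUrl(ByVal pathPrefix As String, ByVal fileName As String) As String
        Return pathPrefix & Uri.EscapeDataString(fileName)
    End Function

    ' Writes link and guid for one item and returns the XML fragment as text.
    Public Function WriteItem(ByVal pathPrefix As String, ByVal fileName As String) As String
        Dim output As New StringWriter()
        Dim writer As New XmlTextWriter(output)
        Dim url As String = BuildItemUrl(pathPrefix, fileName)
        writer.WriteStartElement("item")
        writer.WriteElementString("link", url)
        writer.WriteStartElement("guid")
        writer.WriteAttributeString("isPermaLink", "true")
        writer.WriteString(url)
        writer.WriteEndElement()   ' closes guid
        writer.WriteEndElement()   ' closes item
        writer.Flush()
        Return output.ToString()
    End Function

    Sub Main()
        Console.WriteLine(WriteItem("http://www.mywebsite.com/PR/", "PR 1.htm"))
    End Sub
End Module
```

Because isPermaLink defaults to true, the plain WriteElementString("guid", ...) call in the tutorial means the same thing; writing the attribute out just makes the intent visible.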

Ok, one last piece. We don't have a publication date. So let's add one. Add the following below the GUID statement:

objXML.WriteElementString("pubDate", _
   File.CreationTime.ToUniversalTime.ToString("r"))

We just took the creation date of the file and added that as the publication date; you may choose the last updated date or something else instead. You'll also notice that it goes through a conversion to Universal Time. If you don't do this, your feed won't validate properly. In order for it to validate, the date needs to be in a specific format (RFC 822), which looks something like this:

Thu, 11 Mar 2004 16:42:02 GMT

The problem is that the default date string isn't in that format. To remedy this, we convert the date to Universal Time and output it as a string with the "r" format, and it will spit out what we want. Now we should have a fully validating, working RSS feed. Run your Rss.aspx page and click on a link. Voila, an RSS feed. A user can now subscribe to your feed, and you'll be providing them with a valid RSS feed.
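The date conversion is easy to check on its own in a small console program. This is just a sketch: the PubDateDemo module and ToPubDate function are names invented for the example. Note that the "r" format always labels the result GMT without converting anything itself, which is why the ToUniversalTime() call has to come first.

```vb
Imports System

Module PubDateDemo
    ' Converts a timestamp to UTC and formats it with the "r" (RFC 1123) pattern.
    Public Function ToPubDate(ByVal value As DateTime) As String
        Return value.ToUniversalTime().ToString("r")
    End Function

    Sub Main()
        ' A fixed UTC timestamp, so the result does not depend on the machine's time zone.
        Dim sample As New DateTime(2004, 3, 11, 16, 42, 2, DateTimeKind.Utc)
        Console.WriteLine(ToPubDate(sample))   ' Thu, 11 Mar 2004 16:42:02 GMT
    End Sub
End Module
```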

Your final code for the CreateRssBody should look like this:

Private Sub CreateRssBody(ByVal vsFeedToCreate As String, _
   ByRef objXML As XmlTextWriter)

   Dim sTitle, sLink, sDescription, sPathPrefix As String
   Dim dirItems As DirectoryInfo

   Select Case vsFeedToCreate
      Case "PR"
         sTitle = "Press Releases Headlines"
         sLink = "http://www.mywebsite.com/rss.aspx?RssFeedName=PR"
         sDescription = "The latest press releases from Me."
         sPathPrefix = "http://www.mywebsite.com/PR/"
         dirItems = New DirectoryInfo(Server.MapPath("PR"))
      Case "Jobs"
         sTitle = "Newly posted jobs"
         sLink = "http://www.mywebsite.com/rss.aspx?RssFeedName=Jobs"
         sDescription = "The latest jobs here."
         sPathPrefix = "http://www.mywebsite.com/Jobs/"
         dirItems = New DirectoryInfo(Server.MapPath("Jobs"))
      Case Else
         Response.Redirect("rss.aspx")
   End Select

   objXML.WriteElementString("title", sTitle)
   objXML.WriteElementString("link", sLink)
   objXML.WriteElementString("description", sDescription)
   objXML.WriteElementString("copyright", "(c) 2006, My Company")
   objXML.WriteElementString("ttl", "120")

   For Each File As FileInfo In dirItems.GetFiles()
      objXML.WriteStartElement("item")
      objXML.WriteElementString("title", File.Name.Replace(File.Extension, ""))
      objXML.WriteElementString("description", _
         File.Name.Replace(File.Extension, ""))
      objXML.WriteElementString("link", Path.Combine(sPathPrefix, File.Name))
      objXML.WriteElementString("guid", _
         Path.Combine(sPathPrefix, File.Name))
      objXML.WriteElementString("pubDate", _
         File.CreationTime.ToUniversalTime.ToString("r"))
      objXML.WriteEndElement()
   Next

End Sub

Epilogue

I took my code and posted it on our website, expecting that people would ask what RSS was and then get excited about it. Actually, it didn't work quite like that. I had a user come in and say, "I clicked on the little orange picture expecting audio and I got some code, there must be an error." The reaction was that it wasn't very 'customer-centric' to have XML code just hanging out like that.

Since I wanted this to be a success for me, I had to do some reformatting to make the XML more customer-centric. Probably in the course of this exercise you've found an RSS feed that has a nifty-looking 'customer-centric' RSS page, so how do I do the same? It's actually pretty easy, and it will be the subject of my next post, 'how to make it purdy'. (Hint: remember the xml-stylesheet element we skipped over...?)

Posted: Thursday, August 31, 2006 10:32 AM by Yougotiger