RSS as an Error Reporting Mechanism
Posted by harrisj Mon, 18 Jul 2005 01:37:00 GMT
An interesting idea occurred to me the other day...
First, a disclaimer. This is not a specification. Specifications are fixed and obsessively spelled out, and I would rather explore the idea fully first. Besides, specifications are notoriously boring and opaque, and I think that's part of the problem I want to address. So, let's get started.
The Wonders of RSS
RSS is a pretty neat thing when you think about it. In the past 3 years we have seen a veritable explosion of sites that create RSS feeds and interesting readers that consume them. It's led to some pretty sophisticated ways of reading the web, and your standard aggregator allows grouping, regular checking, Unicode support and searching. In addition, newer apps even allow users to do things like create "virtual feeds" by specifying search queries that can be run across multiple feeds, while services like Bloglines allow you to take your subscriptions anywhere around the globe and will probably soon offer ways of transposing RSS items into emails or SMS content or such. I imagine the next big thing will be helping us to better cope with information overload from RSS feeds. Which is fascinating stuff. Especially fascinating is that this is in spite of (or rather because of) the fact that RSS is painfully stupid. It's not the first or the "best" mechanism for distributing content across the Internet (I think many would argue that NNTP is far more elegant and scalable), but its basic simplicity and ease of manipulation have made it a winner (although IMHO it took the further simplification of RSS2.0 and Atom vs. RSS1.0 to ensure this).
So, what was my point in a nutshell? There are several awesome things about RSS that I think should be noted in a bullet list:
- Very simple. The RSS2.0 spec is so basic that it deserves the name Really Simple Syndication.
- Doesn't care about transport. If HTTP is good enough for everybody else, it's good enough for us. Of course, you could put RSS on top of other things, but the important thing is that we don't need to worry about implementing the transport layer here.
- Extensible. One good thing about RSS is that you can embed additional content in the basic RSS feed. As long as you use a namespace for your elements, any good RSS reader should ignore them, but savvy ones can use them.
- Regular checks. One of the common complaints about RSS is that it's a dumb pull-based mechanism and thus can cause many redundant server accesses, with every RSS client checking a feed once an hour. This, however, is a virtue in my mind. It keeps things simple (no subscription tracking).
- Time management. With RSS you're not so much browsing the web; instead you are monitoring it.
Read that last point again. Because this is where I went from Modern Web 101 to a Reese's peanut butter cup moment (I'll explain the product plug in a bit). When you get down to it, there is nothing in RSS2.0 (or RSS1.0 or Atom; I don't care which RSS spec you prefer) that says the items in an RSS channel have to be documents somewhere (ie, new stories, new photos, etc.). And in fact, I would argue that the meaning of an RSS item is really as an event which just happens to be tied to a particular story in many cases. But it doesn't have to be.
RSS and Error Reporting
Okay, the title here gives away where I'm going with this, but I still want to provide a little more background on things. In my office, I'm a backend guy. This means I do a lot of work tying together various systems and making sure data flows around smoothly and efficiently. It works really well actually and I'm quite proud of our simple system (more on that later). But errors are inevitable, and they occur in many different areas. Imagine you have a web server farm with applications. You can think of where errors occur in terms of domains (listed below with some standard error reporting mechanisms):
- Hardware - if the underlying machine has problems, the OS will hopefully catch it. If you're using Unix-based systems, you can configure syslog*. Otherwise, on a Windows machine it might get logged into the System Events and perhaps picked up by a third-party tool like Patrol or such.
- Network - Routers, firewalls, etc. can go down. SNMP seems to be the standard mechanism for reporting problems in the network.
- Availability - A machine may be up but unreachable due to high load or other things. NAGIOS is a way of catching this.
- Web Server - the web server may have problems with some requests. These are generally logged to a local HTTP log. Logs from multiple HTTP servers can be sent to another host via syslog, but this is rarely done.
- Web Application errors - in the best cases these might be logged to a local file on the server, but usually they are just displayed on the screen to the user.
There are a few interesting things to note here:
- Some of these error reporting solutions also include a networking protocol in addition to an error format, meaning you could assemble a centralized console for error reporting; no such luck in the web server.
- For applications, it's pretty much left to each developer. As a result, errors, if they are logged, are often local, since not many people think to use syslog for error reporting.
- I think the ideal thing for site maintenance is to have some sort of centralized view of errors in the system as they occur.
Anyhow, to cut an already long story somewhat short, I was working on an application that would automatically generate and process RSS feeds. I wanted it to be robust, and I wanted to know whenever errors occurred without having to log in to the machine (à la syslog). So, I looked into some ways of reporting/logging errors when it struck me: why not use RSS as an error reporting mechanism?
You Got Your RSS in My Syslog, You Got Your Syslog in My RSS!
And so we get to my Reese's Peanut Butter Cup reference. Think about it: RSS has some real advantages for error reporting:
- Very simple. RSS has a lot of support across many platforms and with a wide range of viewers/aggregators. Does syslog or any other networked error reporting mechanism? And it has support in programming language libraries too.
- Doesn't care about transport. HTTP is established and allowed through most corporate firewalls. No dumb network debugging or NAT fiddling and such.
- Extensible. One size doesn't fit all. I might want to record basic messages in some cases (i.e., "printer on fire") or capture detailed stack traces in others (Rails logging). XML + Namespaces is a great way of layering stuff in there (although you need an RSS reader that will allow you to look at the underlying XML to get the extra data).
- Regular Checks. I like the pull communication of RSS because it acts like a heartbeat to your error reporting services; when something is wrong with one of my sources, I know it because my RSS reader tells me so. (To make an analogy, suppose you had a phone to be called for emergencies; does not getting any calls mean that there are no emergencies, or might the line be dead?) Granted, I might not get an error immediately, but to be honest I don't think I've encountered many errors that a 5 minute delay would've made impossible to handle; don't use it for your nuclear reactor though, I guess.
- Time Management. So, if I wanted to, I could have errors, warnings, statistics and other things coming into my central console. I don't have an insight into the best way to manage all this information effectively, but I have a pretty good idea that problems of information overload will be tackled much sooner in the RSS world than they have been in the Syslog user base.
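The heartbeat point in the list above can be sketched in a few lines of Python. This is only an illustration under some assumptions: the `check_feed` function and its injectable `fetch` argument are hypothetical names, not part of any existing tool, and an item is identified by its `guid` element with a fallback to its `title`. The point is that a poll has three distinct outcomes: the source cannot be reached, the source answers and is quiet, or the source has new errors.

```python
import urllib.request
import xml.etree.ElementTree as ET


def http_fetch(url, timeout=10):
    """Default fetcher: plain HTTP GET, returns the feed body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def check_feed(url, seen, fetch=http_fetch):
    """Poll one error feed and return (status, new_titles).

    status is "unreachable" when the poll itself fails, "quiet" when the
    feed answers but has nothing new, and "errors" when new items appear.
    `seen` is a set of item ids from earlier polls; it is updated in place.
    """
    try:
        root = ET.fromstring(fetch(url))
    except (OSError, ET.ParseError):
        # A failed poll is itself a signal: the reporter may be down.
        return "unreachable", []

    new_titles = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        item_id = item.findtext("guid") or title
        if item_id not in seen:
            seen.add(item_id)
            new_titles.append(title)
    return ("errors" if new_titles else "quiet"), new_titles
```

Run from cron or a simple loop every five minutes or so, a checker like this answers the dead-phone-line question: silence from a feed that still responds means "no emergencies", while a failed poll means the line itself may be down.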
So, here's the idea. Define a basic set of RSS error fields in a particular XML namespace (like http://www.nimblecode.com/rsserror). This will have a set of optional tags that could be added to items. If users want to define their own extra tags, they could do so just by declaring a new NS (like http://www.nimblecode.com/rsserror/syslog) and adding them. Really, you can add anything to XML, but the NS tags help consumers of the information to know what to expect (so, for instance, if I want to look for syslog-type errors, I know where to find them). An example of this might be produced soon.
The Minimum Error Fields
So, if we were to make a minimal RSS error specification, what optional and required tags would it include? I have an idea:
- <error:msg> - The text of an error message. Note that this could also be put in the title or description of the regular RSS item, but I think it's good to define an explicit place in a namespace where it will always be.
And that's it. Obviously, there are other things we might want to record. For instance in my case, I think of optional tags like timestamp, extended, code, hostIp, application, etc. Not to mention some of the error information available with syslog traces, but I want to weigh what tags should go in a given namespace before I start adding options willy-nilly (indeed, there's no reason why you couldn't define your own extensions.)
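To make this concrete, here is one possible sketch of such a feed, built with Python's standard `xml.etree.ElementTree`. The namespace URI is the one suggested above; the `error` prefix binding, the channel values, and the `build_error_feed` helper are assumptions for illustration, not part of any published format.

```python
import xml.etree.ElementTree as ET

ERROR_NS = "http://www.nimblecode.com/rsserror"  # namespace URI suggested in the post
ET.register_namespace("error", ERROR_NS)  # the "error" prefix is an assumption


def build_error_feed(channel_title, link, messages):
    """Return an RSS 2.0 document (as a string) with one item per error message."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Error reports"
    for message in messages:
        item = ET.SubElement(channel, "item")
        # Plain readers show the title; error-aware ones read error:msg.
        ET.SubElement(item, "title").text = message
        ET.SubElement(item, f"{{{ERROR_NS}}}msg").text = message
    return ET.tostring(rss, encoding="unicode")


print(build_error_feed("app01 errors", "http://example.com/errors", ["printer on fire"]))
```

Duplicating the message into `title` keeps the feed readable in any ordinary aggregator, while the namespaced tag gives error-aware consumers one fixed place to look.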
So for instance, if we were defining an additional syslog RSS error namespace, we could also add the following syslog fields as tags:
- timestamp - a timestamp for the error
- priority - the priority of the message (EMERG, ERR, WARNING, INFO, etc.)
- facility - the facility option for syslog (AUTH, MAIL, USER, etc.)
On the other hand, if I were working on extensions for Ruby on Rails' logging format, I could include the following tags:
- timestamp - the time the error occurred
- severity - the severity of the error
- sql - the SQL executed against the backend server
- backtrace - the call stack where the error occurred
Both of these would still use error:msg for the main message with the error. It looks like I should also make timestamp part of the standard set.
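To show how the layering might work for a consumer, the sketch below parses items that carry both the base error:msg tag and tags from the syslog extension namespace, then filters by priority. The `syslog` prefix, the sample feed contents, and the `read_errors` helper are hypothetical; only the two namespace URIs and the tag names come from the text above.

```python
import xml.etree.ElementTree as ET

NS = {
    "error": "http://www.nimblecode.com/rsserror",
    "syslog": "http://www.nimblecode.com/rsserror/syslog",
}

# Made-up sample feed, for illustration only.
SAMPLE = """<rss version="2.0"
     xmlns:error="http://www.nimblecode.com/rsserror"
     xmlns:syslog="http://www.nimblecode.com/rsserror/syslog">
  <channel>
    <title>mailhost errors</title>
    <link>http://example.com/errors</link>
    <description>Sample error feed</description>
    <item>
      <title>queue full</title>
      <error:msg>queue full</error:msg>
      <syslog:timestamp>Jul 18 01:37:00</syslog:timestamp>
      <syslog:priority>ERR</syslog:priority>
      <syslog:facility>MAIL</syslog:facility>
    </item>
    <item>
      <title>user logged in</title>
      <error:msg>user logged in</error:msg>
      <syslog:priority>INFO</syslog:priority>
      <syslog:facility>AUTH</syslog:facility>
    </item>
  </channel>
</rss>"""


def read_errors(feed_xml, priorities=None):
    """Return one dict per item; optionally keep only the given priorities."""
    results = []
    for item in ET.fromstring(feed_xml).iter("item"):
        record = {
            "msg": item.findtext("error:msg", namespaces=NS),
            "timestamp": item.findtext("syslog:timestamp", namespaces=NS),
            "priority": item.findtext("syslog:priority", namespaces=NS),
            "facility": item.findtext("syslog:facility", namespaces=NS),
        }
        if priorities is None or record["priority"] in priorities:
            results.append(record)
    return results
```

Calling `read_errors(SAMPLE, priorities={"EMERG", "ERR"})` keeps only the first item. Tags that an item does not carry simply come back as `None`, which is what makes every field beyond error:msg safely optional.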

I’ve been working on a monitoring system (kind of like a cross between nagios and big brother, but in Java instead of shell / C / php), and it’s clear that an RSS event feed is crucial. However, with the wide variety of data that comes out of any significant datacenter, I was leaning towards a more RESTful / del.icio.us / tag driven approach, so each monitoring point would have its own virtual RSS feed; otherwise, you can easily get swamped with information.
The real key to a monitoring system is event correlation- discrete events scattered over your aggregator may not be terribly useful, so maybe a recursive / list system (like the new MS proposal) might be a way of bundling up your dependencies all at once.
I bet you could write a simple Log4J appender that fed an rss server this information as well (if it wasn’t an RSS server itself).
I’d be surprised if the next version of Nagios didn’t have an rss feed embedded, though- if they ever get to 2.0 :)
I too am curious about Information Overload. I think though the key here might be to establish some sort of virtual feed capability in your RSS aggregator (i.e., show me all things tagged with severity=urgent), but I wonder how well that might work. I think the other issue might be security, but salting the URL for the feed and also limiting to subnets via a configuration file might help there.
I think you are spot on with the Log4j suggestion and I’m about to turn my attention to that to see how nice it might be to implement. The only real problem I have with Log4j and syslog is that I would want to break up the message into parts (like backtrace, etc.) before it gets placed into the logger because then it’s just one long string. But I suppose as a proof of concept, it’s a start.
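As a rough proof of concept in Python rather than Java (so the standard `logging` module stands in for Log4j here), a handler could split each record into parts before anything gets flattened into one long string. The `ErrorItemHandler` class and its field names are assumptions that mirror the tags discussed above; turning the collected items into an actual feed is left out.

```python
import logging
import traceback


class ErrorItemHandler(logging.Handler):
    """Collects log records as dicts whose keys mirror the proposed tags."""

    def __init__(self, max_items=50):
        super().__init__()
        self.items = []
        self.max_items = max_items

    def emit(self, record):
        item = {
            "msg": record.getMessage(),
            "severity": record.levelname,
            "timestamp": record.created,  # seconds since the epoch
            "backtrace": None,
        }
        if record.exc_info:
            # Keep the stack trace separate instead of gluing it to the message.
            item["backtrace"] = "".join(traceback.format_exception(*record.exc_info))
        self.items.append(item)
        del self.items[:-self.max_items]  # keep only the newest items
```

Attached with `logger.addHandler(ErrorItemHandler())`, a call such as `logger.exception("division failed")` inside an `except` block would then yield an item whose message and backtrace are stored as separate fields.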