Posted by Jacob Harris
Wed, 25 Jan 2006 01:12:00 GMT
Earlier this year, I made a New Year’s Resolution to write more postings to the blog, but sadly I have failed abjectly at that so far. January is a busy month for me, and I usually don’t feel like coming home after a long day to stare at a computer screen some more. But occasionally inspiration strikes, and I feel like sharing some thoughts I have about the future of Syndication via RSS and Atom feeds in the next few years.
I have to agree with my company’s CEO Steve Goldstein when he says that increasing RSS/Atom adoption will be one of the big trends in the information industry in 2006. Speculating wildly, I think that 2006 will be The Year of RSS and Atom. And so, I’m going to make a few bold predictions of my own about the state of syndication and the software industry. To avoid spending even longer without posting to the blog, I have decided to not bother with research, so if something I have predicted has actually happened already, be sure to make a snarky comment about it to me below. Thanks. And now here are big things I see happening in the next year or so:
Mainstream Feed Adoption – Currently, most of the sites producing feeds are still blogs and news sites – those for whom the article paradigm within feeds is a natural fit. However, non-newsy sites like del.icio.us and Flickr have earned a lot of buzz not just for their front-end interfaces but for their innovative and pervasive use of feeds and their use of syndication for non-article content. But this is still under the radar of the typical Internet user for now. I think this will change this year. I expect a major online site to follow suit this year and provide syndication of their data in a similar fashion. For instance, I would not be surprised if Amazon began to offer feeds for user wishlists or upcoming recommendations or music alerts. That would rock.
More People Will Embrace RSS, But Still Not Know It – A survey from last year by Yahoo! revealed that only 4% of Internet users knew what RSS was, but this will most certainly change. The name is obviously somewhat to blame (there’s a joke that if it were called SpeedFeed more people would use it), but I think it’s not the real problem. In truth, most people just don’t care. For your typical web user, the Web is a toy more than a place to siphon information from; they are therefore not exactly vexed enough by the slowness of visiting sites to want to set up an aggregator. Instead, I think some different killer app beyond mere aggregation will be the only thing to make most people want to use RSS. I have no idea what that app might be, but I’m sure one of you could think of something awesome.
RSS Is A Terrible Business Though – For companies that produce or sell content, RSS will be a good fit, providing additional ways for users to access the site and contributing a small boost in sales or page visits. But companies in the RSS software business (particularly online aggregators) will have a terrible time. There are just very few barriers to entry for competitors, and the whole “we’ll make money through advertising” model seems so very pre-boom to me. That said, there are still some promising markets for companies doing RSS software. Software that allows filters or other mechanisms for mitigating information overload from feeds will find a market. Bridges from the pull-based mechanism of RSS/Atom to push-based messaging services like SMS, Email, or IM will probably also do well. And of course, it would be smart for makers of web analytics software to also look at measuring conversions and clickthroughs from feed links.
Syndication Will Drive REST Web Services – One of the common complaints leveled against REST-based Web Services is that the simple model of REST does not specify the XML representation for data (unlike the case of XML-RPC and SOAP). This will never officially change, but I think it will become increasingly common to see REST Web Services that return RSS-like data for searches. Indeed, throw in Amazon’s OpenSearch extensions for RSS and you have support for presenting pagination within RSS documents. The practical upshot of this is that programmers can use the exact same code for RSS, Search Engine Plugins (if any other engines besides A9.com support OpenSearch), and REST-based Web Services. More significantly, this makes REST more compelling to these programmers, since supporting SOAP’s data encoding would require supporting an additional data format for searches and more code to implement. Similarly, the arrival of the Atom Publishing Protocol will also bring awareness of REST Web Services to a wider market of people who will be using it without even knowing it.
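To make the search-results-as-RSS idea a bit more concrete, here is a minimal sketch in Python (chosen purely for illustration). It renders one page of search results as an RSS 2.0 document and adds the OpenSearch pagination elements in their own namespace. The function name, the example.com URL, and the parameter names are all assumptions made up for this sketch; only the OpenSearch element names (totalResults, startIndex, itemsPerPage) and the namespace URI come from the OpenSearch 1.1 spec.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# OpenSearch 1.1 namespace; the three element names used below come from that spec.
OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"
ET.register_namespace("opensearch", OPENSEARCH_NS)


def search_results_as_rss(query, results, total, start_index=1, per_page=10):
    """Render one page of search results as an RSS 2.0 document.

    `results` is a list of (title, link) pairs for this page only.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Search results for " + query
    # Hypothetical endpoint, used only for illustration.
    ET.SubElement(channel, "link").text = "http://example.com/search?q=" + quote(query)
    ET.SubElement(channel, "description").text = "Matches for " + query

    # Pagination metadata rides along in the OpenSearch namespace,
    # so a plain RSS reader can simply ignore it.
    paging = (("totalResults", total), ("startIndex", start_index), ("itemsPerPage", per_page))
    for name, value in paging:
        ET.SubElement(channel, "{%s}%s" % (OPENSEARCH_NS, name)).text = str(value)

    for title, link in results:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")
```

The same output can be handed to an aggregator or to a program calling the service, which is the code-reuse point made above.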
Feed Extensions Will Get Wider Support – As RSS/Atom grows in its usage, it should be possible to find more aggregators and processors that also support the more popular extensions to feeds (these are normally specified within the document in a separate namespace). I think a few extensions will become essential in the years ahead (and I don’t even know if they exist currently!): support for threading of RSS items (for feeds based on email lists); partial encryption of RSS documents to be decrypted in the reader (would allow banks to offer feeds for instance); geographic tagging for marking stories (see the breaking news/photos on a map). In any event, I think good aggregators will show in some fashion all tags embedded in the feed so that users could search or filter based upon them.
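A rough sketch of that last idea, in Python purely for illustration: collect every namespaced (extension) element found on each item so a reader could display or filter on them. The function name, the sample feed, and the geo namespace URI are invented for this example and do not refer to any real extension.

```python
import xml.etree.ElementTree as ET


def extension_tags(feed_xml):
    """Return, for each RSS item, a dict of its namespaced child elements.

    ElementTree spells a namespaced tag as "{namespace-uri}local-name",
    so any child tag starting with "{" lies outside core RSS 2.0.
    """
    root = ET.fromstring(feed_xml)
    found = []
    for item in root.iter("item"):
        tags = {}
        for child in item:
            if child.tag.startswith("{"):
                uri, local = child.tag[1:].split("}", 1)
                tags[(uri, local)] = (child.text or "").strip()
        found.append(tags)
    return found


# Made-up sample feed; the geo namespace URI is hypothetical.
SAMPLE = """<rss version="2.0" xmlns:geo="http://example.com/ns/geo">
  <channel>
    <title>Breaking news</title>
    <item>
      <title>Bridge closed</title>
      <geo:lat>40.71</geo:lat>
      <geo:long>-74.00</geo:long>
    </item>
  </channel>
</rss>"""

for tags in extension_tags(SAMPLE):
    for (uri, local), value in sorted(tags.items()):
        print(uri, local, value)
```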
Syndication in the Enterprise – Finally, it is obvious that RSS/Atom will become more common within corporate intranets as well. And the credit for this belongs to Microsoft of all companies. Mainly, by placing RSS within all aspects of Windows Vista, Microsoft will drag all sorts of big IT Departments into accepting RSS as a solution for messaging and event notification. This will in turn make them more likely to also accept other solutions based on syndication. In fact, I’m optimistic enough to think that B2B Syndication-based products will do better than B2C (remember that vexing 4% recognition rate from above; CTOs can mandate use in their companies). Smaller companies will be quick to embrace the immediacy of RSS and larger companies will also enjoy it for Windows Vista integration.
Atom vs. RSS2.0 Holy Wars Won’t Matter – I know it vexes some readers of this posting that I’m using RSS sometimes as a shorthand phrase for RSS2.0 or Atom-based syndication. And I agree with them that Atom is technically superior while RSS2.0 has better market share. But my ultimate feeling is that the differences between the two for the consumer will be largely academic (like caring about whether your audio player is playing MP3 or AAC), since any good aggregator or processor should just be able to handle both formats effortlessly.
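As a small illustration of why the format difference can stay invisible to the consumer, here is a hedged Python sketch of a reader that accepts either format and returns the same (title, link) pairs. It is deliberately simplified: it only looks at the root element to tell the formats apart, and for Atom it takes the first link element rather than checking rel attributes. The function name is made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of Atom 1.0, in ElementTree's "{uri}" tag notation.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def read_entries(feed_xml):
    """Return a list of (title, link) pairs from an RSS 2.0 or Atom 1.0 feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    if root.tag == "rss":
        for item in root.iter("item"):
            entries.append((item.findtext("title", ""), item.findtext("link", "")))
    elif root.tag == ATOM_NS + "feed":
        for entry in root.iter(ATOM_NS + "entry"):
            # Simplification: use the first <link>, whatever its rel is.
            link = entry.find(ATOM_NS + "link")
            href = link.get("href", "") if link is not None else ""
            entries.append((entry.findtext(ATOM_NS + "title", ""), href))
    else:
        raise ValueError("unrecognized feed format: " + root.tag)
    return entries
```

Everything downstream of read_entries sees one shape of data, which is the sense in which the holy war stops mattering.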
I will probably follow up on some of these points in more depth anon, but feel free to comment and quibble in the comments section below. I am no industry expert, but I do so like to make guesses about the business. And I’m curious what you think as well. Check in next year and we’ll see how well they turned out.
Posted in Web Services | Tags atom, prediction, rss, syndication | 1 comment
Posted by Jacob Harris
Fri, 30 Sep 2005 07:09:00 GMT
Courtesy of Tim O’Reilly’s Foo Camp (which I am definitely not cool enough to be invited to), there is now a picture to match the exciting flow of ideas and themes coalescing into Web 2.0. I think this assemblage of bubbles and trends is a great thing to see, especially since it serves as a better executive summary of high-level ideas than gleaning bits and pieces of the big picture from blogs and demo sites across the web.
That said, I think one thing is missing from the picture they provide. Maybe I am a bit preoccupied with the subject, but I think RSS (or Atom here, I’m just using RSS as shorthand for syndication) is really one of the biggest things driving Web2.0 services and adoption these days, but it hasn’t even gotten a mention at the top as an influencing technology (unlike blogs or Gmail). I think blogs were great at establishing RSS as a way of keeping track of changes, but the really influential aspect of Del.icio.us and Flickr is not just tagging, but establishing RSS as a mechanism for tracking any possible view of the system you might want in as lightweight and user-friendly a way as possible (as opposed to the awkwardness of SOAP or even REST to the end user).
I think the source of my unease here is that I’m mostly a backend guy. A lot of my work at Alacra has been making sure that all sorts of information flows agilely between processes and servers. Backend stuff. It makes it happen, but if it’s working, you never notice how critical it is to success. Similarly, AJAX and other front-end browser mechanisms are very nice in my mind. But the biggest joys and successes of Web2.0 are all driven by the fluidity and ease of RSS and REST. “Hackability”, “Data as Intel Inside”, “Right to Remix” ... RSS made these a possibility and these are what drive me to take Web2.0 seriously and not just as another wave of web hype. All I’m asking for is a little recognition. Thanks.
Update: The good news is that it seems like I’m not alone in this view. The bad news is my company is Dave Winer.
Posted in Web, Web Services | Tags rss, web2.0 | 1 comment
Posted by Jacob Harris
Fri, 23 Sep 2005 10:03:00 GMT
As you may have guessed from this site, I am hardly a person you would expect to praise Microsoft for their technical insight and forward thinking. Like the story about generals, they always seem to be fighting the last war, with their late arrival to the Internet and Web being a current and recurring example (now, rather than Netscape, it’s Google that is seen as their greatest threat). However, occasionally they get a thing right along the way, and I feel honor-bound to note it in this case.
Over at The Register, there is an article RSS Goes To Work In Windows detailing the ways in which Microsoft has jumped onto the RSS bandwagon with Windows Vista (shipping next year, maybe?). For those of you following along at home, Microsoft is planning to ship a built-in RSS-store capability in Windows which will allow any application compiled against the appropriate API to be an RSS aggregator and manipulate feeds programmatically. This is a nice touch, but hardly a stretch yet.
What is interesting to me is that Microsoft discusses their plans to have the next release of their Dynamics CRM software create RSS feeds that can be subscribed to. Other server applications with RSS extensions in the works are Sharepoint Portal, Exchange, and possibly Office. As the article puts it:
“CRM is one of the first examples of how we see RSS unlocking data in the back end data systems,” Amar Gandhi, Microsoft Internet Explorer group program manager, told The Register during a recent interview. Microsoft revealed plans to RSS-enable its CRM last week at the Professional Developers’ Conference (PDC)
Chris Caposella, vice president for Microsoft’s information worker product management group, told software developers attending PDC Microsoft believes RSS would be transformed into a platform that embraces business applications.
That last paragraph was something I thought I’d never see. Because honestly, I never thought people would “accept” RSS in the Enterprise. While we can joke about how meaningless the term Enterprise Software is, RSS is definitely not it by any usual sense of the term for several reasons:
- RSS in general has no typing or generalized schema. It’s really a schema for news stories that can be generalized for events, but it’s not for serialization of complex typed objects like SOAP.
- The RSS model is extremely simple. Pull requests over HTTP. No pushing, no notification mechanism for new content.
- In itself, RSS has no security for content (I know, Atom has support for encrypted content), and access security is generally only done through obscurity (giving an individual user a feed with a long unique token in it, maybe unguessable, but not unsniffable)
- Finally, RSS is not really controlled. There are no authorization lists, no ACLs or DRM or other such mechanisms to control who can subscribe and where they can do it.
That said, I think all of these aspects are why RSS has succeeded so widely in the world of the Web2.0, while SOAP has largely failed to gain traction. In fact, considering that one of SOAP’s biggest backers is Microsoft, this RSS announcement reveals what a disappointment SOAP has been as a web service platform. So, why is a company like Microsoft so willing to reject one of its software initiatives and embrace RSS, in effect giving RSS a mantle of legitimacy in the Enterprise it might’ve taken years to achieve?
I think that if we want to consider the biggest Web2.0 trend for the coming year, it won’t be the proliferation of AJAX or increasing maturity of Web Frameworks. It’ll be the proliferation of RSS into every aspect of server-side content. One of the greatest things about Flickr and Del.icio.us is that you can subscribe to an RSS feed for every possible page. And since pages are dynamic content (combinations of photos, groups, tags, users), what you are really subscribing to is not a list of documents, but the latest matches for a search (ie, “find me all photos tagged Italy by this user”). And it’s easy to do, regardless of the site.
One of the cool features of Mac OSX Tiger’s Spotlight (and probably Windows Vista) is the ability to save searches as virtual folders which can be opened and browsed like regular folders in the Finder. This is a really cool way to manage ever-growing content. Now, imagine being able to do something similar with server-side content, and suddenly the appeal of RSS makes a lot of sense (admittedly, since RSS usually shows only the last 20 items or so, there is a “you snooze, you lose” problem, but the general metaphor holds). Most of the Web2.0 is about taking the applications of the desktop into the browser; this brings the web back into the machine and applications.
No wonder Microsoft suddenly has the RSS Religion. Done well, this might be their best chance to keep people hooked into Windows and all their software products for business. Of course, they are trying to do the old embrace, extend, extinguish trick, but this time it may not fly because Microsoft’s not in the driver’s seat for the technology. That said, I must admit that Microsoft has made a smart and proactive move for the future here, and I think it’ll benefit them well. But more importantly, it lends a large amount of authority to RSS in the Enterprise, which makes RSS and the Open Source Standards-Based community the biggest winner of all. At Alacra, we’ve felt that RSS will be much bigger in the Enterprise for a while. It’s neat to see we’re not alone.
Posted in Web Services | Tags delicious, enterprise, flickr, microsoft, rss | no comments
Posted by harrisj
Mon, 18 Jul 2005 01:37:00 GMT
An interesting idea occurred to me the other day...
First, a disclaimer. This is not a specification. Specifications are fixed and obsessively spelled out, and I would rather explore the idea fully first. Besides, specifications are notoriously boring and opaque, and I think that's part of the problem I want to address. So, let's get started.
The Wonders of RSS
RSS is a pretty neat thing when you think about it. In the past 3 years we have seen a veritable explosion of sites that create RSS feeds and interesting readers that consume them. It's led to some pretty sophisticated ways of reading the web, and your standard aggregator allows grouping, regular checking, unicode support and searching. In addition, newer apps even allow users to do things like create "virtual feeds" by specifying search queries that can be run across multiple feeds, while services like Bloglines allow you to take your subscriptions anywhere around the globe and will probably soon offer ways of transposing RSS items into emails or SMS content or such. I imagine the next big thing will be helping us to better cope with information overload from RSS feeds. Which is fascinating stuff. Especially fascinating is that this is in spite of (or rather because of) the fact that RSS is painfully stupid. It's not the first or the "best" mechanism for distributing content across the Internet (I think many would argue that NNTP is far more elegant and scalable), but its basic simplicity and ease of manipulation have made it a winner (although IMHO it took the further simplification of RSS2.0 and Atom vs. RSS1.0 to ensure this).
So, what was my point in a nutshell? There are several awesome things about RSS that I think should be noted in a bullet list:
- Very simple. The RSS2.0 spec is so basic that it deserves the name Really Simple Syndication.
- Doesn't care about transport. If HTTP is good enough for everybody else, it's good enough for us. Of course, you could put RSS on top of other things, but the important thing is that we don't need to worry about implementing the transport layer here.
- Extensible. One good thing about RSS is that you can embed additional content in the basic RSS feed. As long as you use a namespace for your elements, any good RSS reader should ignore them, but savvy ones can use them.
- Regular checks. One of the common complaints about RSS is that it's a dumb pull-based mechanism and thus can cause many redundant server accesses from every RSS client checking a feed once an hour. This however is a virtue in my mind. It keeps things simple (no subscription tracking).
- Time management. With RSS you're not so much browsing the web; instead you are monitoring it.
Read that last point again. Because this is where I went from Modern Web 101 to a Reese's peanut butter cup moment (I'll explain the product plug in a bit). When you get down to it, there is nothing in RSS2.0 (or RSS1.0 or Atom; I don't care which RSS spec you prefer) that says the items in an RSS channel have to be documents somewhere (ie, new stories, new photos, etc.). And in fact, I would argue that the meaning of an RSS item is really as an event which just happens to be tied to a particular story in many cases. But it doesn't have to be.
RSS and Error Reporting
Okay, the title here gives away where I'm going with this, but I still want to provide a little more background on things. In my office, I'm a backend guy. This means I do a lot of work tying together various systems and making sure data flows around smoothly and efficiently. It works really well actually and I'm quite proud of our simple system (more on that later). But errors are inevitable, and they occur in many different areas. Imagine you have a web server farm with applications. You can think of where errors occur in terms of domains (listed below with some standard error reporting mechanisms)
- Hardware - if the underlying machine has problems, the OS will hopefully catch it. If you're using Unix-based systems, you can configure syslog. Otherwise, on a Windows machine it might get logged into the System Events and perhaps picked up by a third-party tool like Patrol or such.
- Network - Routers, firewalls, etc. can go down. SNMP seems to be the standard mechanism for reporting problems in the network.
- Availability - A machine may be up but unreachable due to high load or other things. Nagios is a way of catching this.
- Web Server - the web server may have problems with some requests. These are generally logged to a local HTTP log. Logs from multiple HTTP servers can be sent to another host via syslog, but this is rarely done.
- Web Application errors - in the best cases these might be logged to a local file on the server, but usually they are just displayed on the screen to the user.
There are a few interesting things to note here:
- Some of these error reporting solutions also include a networking protocol in addition to an error format, meaning you could assemble a centralized console for error reporting; no such luck in the web server.
- For applications, it's pretty much left to each developer. As a result, errors, if they are logged, are often local, since not many people think to use syslog for error reporting.
- I think the ideal thing for site maintenance is to have some sort of centralized view of errors in the system as they occur.
Anyhow, to cut an already long story somewhat short, I was working on an application that would automatically generate and process RSS feeds. I wanted it to be robust, and I wanted to know whenever errors occurred without having to log in to the machine (a la syslog). So, I looked into some ways of reporting/logging errors when it struck me: why not use RSS as an error reporting mechanism?
You Got Your RSS in My Syslog, You Got Your Syslog in My RSS!
And so we get to my Reese's Peanut Butter Cup reference. Think about it, RSS has some real advantages for error reporting:
- Very simple. RSS has a lot of support across many platforms and with a wide range of viewers/aggregators. Does syslog or any other networked error reporting mechanism? And it has support in programming language libraries too.
- Doesn't care about transport. HTTP is established and allowed through most corporate firewalls. No dumb network debugging or NAT fiddling and such.
- Extensible. One size doesn't fit all. I might want to record basic messages in some cases (ie, "printer on fire") or capture detailed stack traces in another (Rails logging). XML + Namespaces is a great way of layering stuff in there (although you need an RSS reader that will allow you to look at the underlying XML to get the extra data)
- Regular Checks. I like the pull communication of RSS because it acts like a heartbeat to your error reporting services; when something is wrong with one of my sources, I know it because my RSS reader tells me so. (To make an analogy, suppose you had a phone to be called for emergencies; does not getting any calls mean that there are no emergencies or might the line be dead?) Granted, I might not get an error immediately, but to be honest I don't think I've encountered many errors that a 5 minute delay would've made impossible to handle; don't use it for your nuclear reactor though I guess.
- Time Management. So, if I wanted to I could have errors, warnings, statistics and other things coming into my central console. I don't have an insight into the best way to manage all this information effectively, but I have a pretty good idea that problems of information overload will be tackled much sooner in the RSS world than they have been in the Syslog user base.
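The heartbeat point in the list above can be sketched in a few lines of Python (used here only for illustration). The idea is that a failed poll is itself a signal, distinct from "no new errors". The function name and the injected fetch callable are assumptions for this sketch; a real console would plug in an HTTP fetch and run this on a timer.

```python
import xml.etree.ElementTree as ET


def poll_error_feed(fetch, seen_links):
    """Poll one error feed and report what changed since the last poll.

    `fetch` is any callable returning the feed XML as a string; it is
    injected so this sketch needs no network access. `seen_links` is a
    set of item links already reported, updated in place.

    Returns ("unreachable", []) when the source fails to answer, which is
    itself worth alerting on, or ("ok", [titles of new items]) otherwise.
    """
    try:
        root = ET.fromstring(fetch())
    except (OSError, ET.ParseError):
        # The line is dead: not the same thing as "no emergencies".
        return "unreachable", []
    new_titles = []
    for item in root.iter("item"):
        link = item.findtext("link", "")
        if link not in seen_links:
            seen_links.add(link)
            new_titles.append(item.findtext("title", ""))
    return "ok", new_titles
```

Calling this every five minutes per source gives both the error list and the dial tone check from the phone analogy.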
So, here's the idea. Define a basic set of RSS error fields in a particular XML namespace (like http://www.nimblecode.com/rsserror). This will have a set of optional tags that could be added to items. If users want to define their own extra tags, they could do so just by declaring a new NS (like http://www.nimblecode.com/rsserror/syslog) and adding them. Really, you can add anything to XML, but the NS tags help consumers of the information to know what to expect (so, if I want to look for syslog-type errors, I know which namespace to check). An example of this might be produced soon.
The Minimum Error Fields
So, if we were to make a minimal RSS error specification, what optional and required tags would it include? I have an idea:
<error:msg> The text of an error message. Note that this could also be put in the title or description of the regular RSS item, but I think it's good to define an explicit place in a namespace where it will always be.
And that's it. Obviously, there are other things we might want to record. For instance in my case, I think of optional tags like timestamp, extended, code, hostIp, application, etc. Not to mention some of the error information available with syslog traces, but I want to weigh what tags should go in a given namespace before I start adding options willy-nilly (indeed, there's no reason why you couldn't define your own extensions.)
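For what it's worth, a minimal sketch of what producing such a feed could look like, in Python purely for illustration. The namespace URI and the error:msg tag are the ones named above; the function names, the example.com links, and the choice to repeat the message in the item title are assumptions made for this sketch, not part of any spec.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the post; the "error" prefix matches <error:msg>.
ERROR_NS = "http://www.nimblecode.com/rsserror"
ET.register_namespace("error", ERROR_NS)


def error_item(message, link):
    """Build one RSS <item> carrying the single required <error:msg> tag."""
    item = ET.Element("item")
    # Repeating the message in <title> keeps ordinary readers useful.
    ET.SubElement(item, "title").text = message
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, "{%s}msg" % ERROR_NS).text = message
    return item


def error_feed(title, link, items):
    """Wrap error items in a minimal RSS 2.0 channel and serialize it."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    channel.extend(items)
    return ET.tostring(rss, encoding="unicode")


# Hypothetical usage; the URLs are placeholders.
print(error_feed(
    "Application errors",
    "http://example.com/errors",
    [error_item("printer on fire", "http://example.com/errors/1")],
))
```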
So for instance, if we were defining an additional syslog RSS error namespace, we could also add the following syslog fields as tags:
timestamp - a timestamp for the error
priority - the priority of the message (EMERG, ERR, WARNING, INFO, etc.)
facility - the facility option for syslog (AUTH, MAIL, USER, etc.)
On the other hand, if I were working on extensions for Ruby on Rails' logging format, I could include the following tags:
timestamp - the time the error occurred.
severity - the severity of the error
sql - the SQL executed against the backend server
backtrace - the call stack where the error occurred
Both of these would still use error:msg for the main message with the error. It looks like I should also make timestamp part of the standard set.
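Here is a hedged sketch of how the layering might work for the syslog case, again in Python for illustration only. The two namespace URIs and the tag names (msg, timestamp, priority, facility) come from the text above; the function names and the timestamp format are assumptions, since nothing here fixes one.

```python
import xml.etree.ElementTree as ET

ERROR_NS = "http://www.nimblecode.com/rsserror"
SYSLOG_NS = "http://www.nimblecode.com/rsserror/syslog"
ET.register_namespace("error", ERROR_NS)
ET.register_namespace("syslog", SYSLOG_NS)


def syslog_error_item(message, timestamp, priority, facility):
    """Build an <item> with the base error:msg plus the syslog extension tags."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = "[%s] %s" % (priority, message)
    ET.SubElement(item, "{%s}msg" % ERROR_NS).text = message
    extras = (("timestamp", timestamp), ("priority", priority), ("facility", facility))
    for name, value in extras:
        ET.SubElement(item, "{%s}%s" % (SYSLOG_NS, name)).text = value
    return item


def messages_with_priority(items, wanted):
    """Consumer side: return error:msg texts whose syslog priority is in `wanted`."""
    matches = []
    for item in items:
        priority = item.findtext("{%s}priority" % SYSLOG_NS)
        if priority in wanted:
            matches.append(item.findtext("{%s}msg" % ERROR_NS))
    return matches
```

A consumer that knows nothing about the syslog namespace can still read error:msg, while one that does can filter on priority, which is the point of keeping the extensions in separate namespaces.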
So there you have it. I will probably be working on this a bit more, but it's an interesting idea and I'm curious what other people might think.
Posted in Web Services | Tags rss, syslog | 2 comments
Posted by harrisj
Sat, 16 Apr 2005 17:43:00 GMT
David Heinemeier Hansson has posted an interesting find over at his blog Loud Thinking about IBM’s stated new goal to pursue Radical Simplification in their enterprise work. Essentially, Big Blue is starting to acknowledge that most enterprise web development is just too cumbersome and daunting for agile and powerful web development. Things like SOAP and WSDL are examples of this. What should be a simple task like retrieving a weather forecast from a remote site becomes a complicated mess of debugging SOAP calls, tweaking WSDL specifications and using wizards on the server to generate code that is impossible for the end user to debug, let alone understand.
Or as Sam Ruby puts it in his excellent presentation to IBM on the topic Hello From The Open Source World:
For normal people, the perceived usefulness of a computer language is inversely proportional to the amount of theory the language forces you to learn.
Sam illustrates this in his slides by showing the classic “Hello World” example in a variety of programming languages on different slides. But when he reaches WSDL to specify it, it’s a blank. Because, quite frankly, WSDL is so unwieldy to use, it’s hard to just build it on the fly, forcing many people to plunk down bucks to Microsoft or IBM to get their compilers to build it for them. It’s good business for those companies, but bad news for the Internet as a whole I think.
Ruby says that essentially for a framework to succeed on the web, it has to enable a situation he terms Zero Training, a state where it is easy to get going in the language and to adapt examples to your own needs. Good programming languages have it, some web technologies have it, SOAP and WSDL don’t. Indeed, I feel like the growth of SOAP/WSDL in the enterprise has been in spite of the difficulties of developing for it (mainly because of Microsoft and IBM pushing it). Because, quite frankly, it’s a beast for anything but the most basic RPC-style calls. The reasons:
- SOAP allows you to abstract away the underlying transport protocol. But for 99% of SOAP communication, this protocol is HTTP, and the abstraction limits the control you have over HTTP.
- SOAP needs to validate the entire message before applications can operate on it. This usually means it has to load it up into a DOM as well, so calls that return lots of data or might take a while (the fun ones) are frowned upon
- SOAP also assumes the COM/CORBA marshalling mechanism, so no streaming data or parsing data until the entire document has been received, parsed into an XML DOM tree, and then mapped into an in-memory object tree. For large amounts of data, forget it.
- WSDL has a lot of syntax to map procedure names/arguments onto HTTP. Contrast this to REST, which embraces the inherent naming conventions of HTTP and applies them to a simple procedural call model
- The assumption of these tools is that you will use their SOAP wizards instead of rolling your own code. The wizard is meant to be the only way, not an additional way.
- In fact, the idea that SOAP = Simple Object Access Protocol has become a profound irony.
In his presentation, Mr. Ruby points to PHP as an alternate model. PHP is not really a pretty programming model and it tends to shy away from enforcing higher-level abstractions in favor of lower-level models. But, it is precisely for this reason that PHP has succeeded where many other dynamic web application programming models have failed. Indeed, as it says in the page Do You PHP?
Database abstraction is mostly a myth. There is nothing wrong with direct database calls’ making use of all the tricks and cheats your chosen database has to offer, to tweak as much performance as possible out of it.
And this is the kicker here. Complicated database models tend to abstract away the lower level details of individual databases, even when those details can be the difference between fast and sluggish performance. Remote invocation models like SOAP abstract away the details of the underlying calling mechanism, even when tweaking HTTP requests can be the difference between a fast web app and a slow one. The lower-level mechanism might require more work for the developer, it might be ugly as sin, and it might be wrong and inefficient on some levels, but it works, and more importantly, it works quickly. Because ultimately, the customer doesn’t care what technology is used on the backend, they just want their data fast.
Bringing it back to my own experience, I once had to work on an application where we searched a remote document repository via SOAP and got metadata back on matching documents in XML. The service would send back all matching results for a query and in some cases, this would result in returning 150000 documents over 5 minutes of streaming XML to us. Knowing that the user probably would want to see data as quickly as possible, I decided it would be good to render whatever data I got after I received 10000 documents and tell the user to perhaps narrow their search down. And I tried to do this in SOAP in Visual Studio, and I failed. I could error out completely, but the mechanism had abstracted away the underlying XML-over-HTTP communication so thoroughly that I had no hooks to get at the lower level.
And so, I chucked it out and went back to basics. The outgoing call to the remote service was faked by filling in an XML string I grabbed from the original SOAP call via a packet sniffer. On the return, my program screamed through the data with stream-based XML parsing. Once I hit 10000 records, I could stop reading the input stream, display what I had, and tell the user that they needed to narrow their search criteria. All within 30 seconds.
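The stop-early trick is easy to show with any incremental XML parser. The sketch below uses Python's ElementTree iterparse purely as an illustration; it is not the code from that project, and the element name "document", the function name, and the record layout are assumptions made up for the example.

```python
import io
import xml.etree.ElementTree as ET


def first_records(stream, limit, tag="document"):
    """Read at most `limit` <document> elements from an XML byte stream.

    Returns (records, hit_limit). Parsing is incremental, so once the
    limit is reached the function returns without reading any further
    from the stream.
    """
    records = []
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != tag:
            continue
        # Flatten the record's child elements into a plain dict.
        records.append({child.tag: child.text for child in elem})
        elem.clear()  # drop the children we just copied to keep memory flat
        if len(records) >= limit:
            return records, True
    return records, False


# Hypothetical usage with an in-memory stream standing in for the HTTP response.
body = b"<results>" + b"".join(
    b"<document><id>%d</id></document>" % i for i in range(5)
) + b"</results>"
records, hit_limit = first_records(io.BytesIO(body), limit=3)
if hit_limit:
    print("Showing the first %d matches; please narrow your search." % len(records))
```

Because the parser never needs the closing tag of the whole response, the remaining minutes of streaming XML are simply never waited for.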
Those people who use SOAP are probably horrified by this and would rightly point out that I had to do a lot of lower-level work to get some of the same functionality promised by the one-click higher-level SOAP abstraction. But I think that’s precisely the problem. I needed something to work, and I didn’t need as much for it to be pretty. The SOAP framework locked me into a particular model, and once I needed more than that, it failed to deliver. Abstraction is nice, but abstraction always reduces speed, abstraction always reduces flexibility. In most cases, it is possible to strike a balance where the abstraction helps more than it hurts, but until SOAP is able to handle the heavy lifting needed for serious web work, it isn’t worth my time.
Posted in Web Services | Tags rss, soap | no comments