XSS?


Sam Ruby has written a new essaylet on Noun vs. Verb:

Perhaps it is time for another essay.  A short one, this time.

Here’s a quote from his essay, emphasis mine:

RSS could benefit from an explicit schema. RSS could benefit by more explicit rules defining whether HTML is allowed in titles or relative links are allowed in descriptions.  RSS could benefit from a more clear separation of metadata from data.  RSS could benefit from an ability to explicitly mark what items must be understood, and which are optional.

Sam has said as much in his essaylet, so exactly why don’t we put together a schema for site syndication?  I know something like this has to be approached carefully, since there are various RSS camps.  The whole RSS 1.0/RDF vs. RSS 0.9x/2.0 thing seems worse than Mac vs. PC.

So why not?  Would it be possible to set up a schema that at least the vast majority of people could agree upon?  Could it be possible to have a syndication format that you could actually parse because it is well-formed XML?  Would it even be RSS then, or would it be something completely different?

XSS (XML Site Summary|Syndication)?

Maybe I’m missing the point.  It sounds a bit like reinventing the wheel.  Everyone complained when RSS 2.0 came out.  They would probably freak out about something like this.  At the same time, it makes sense.  RSS has evolved a lot.  A whole lot.  Perhaps we need to take a step back, evaluate what we have, and clean it up a bit.  Take what is defined as valid RSS and formalize it.  Schema it up.
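
To make “schema it up” a bit more concrete, here is a minimal sketch in Python of what an explicit check against a schema could look like.  It uses lxml, and the tiny XSD and the element rules in it are purely hypothetical, not a proposal for what an actual XSS schema should contain.

    from lxml import etree

    # Purely hypothetical schema for a bare-bones item: a title, a link and a
    # description, in that order, and nothing else. A real XSS schema would be
    # something the community hashes out.
    XSS_ITEM_XSD = b"""<?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="item">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="title" type="xs:string"/>
            <xs:element name="link" type="xs:anyURI"/>
            <xs:element name="description" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>"""

    schema = etree.XMLSchema(etree.XML(XSS_ITEM_XSD))

    item = etree.XML(
        b"<item><title>Hello</title>"
        b"<link>http://example.org/archives/001.html</link>"
        b"<description>First post.</description></item>"
    )

    # validate() answers yes/no; assertValid() would raise and say why.
    print(schema.validate(item))   # True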

Something like this would probably solve our MetaWeblog/Blogger API/etc. problems.  We could send a raw XSS item over HTTP with some authentication headers.  We could send a raw XSS item or items over SOAP with some authentication.  Throw some XSS over Jabber.  Do whatever you want with it, because at this point, it’s just data.
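
As a minimal sketch of the “raw XSS item over HTTP with some authentication headers” idea, here it is in Python; the endpoint URL, the credentials, and the item itself are made up for illustration and don’t correspond to any existing API.

    import base64
    import urllib.request

    # A hypothetical, well-formed XSS item -- at this point it really is just data.
    ITEM = b"""<?xml version="1.0" encoding="utf-8"?>
    <item>
      <title>Hello, world</title>
      <link>http://example.org/archives/001.html</link>
      <description>First post.</description>
    </item>"""

    # Hypothetical endpoint and credentials, purely for illustration.
    auth = "Basic " + base64.b64encode(b"john:secret").decode("ascii")
    req = urllib.request.Request(
        "http://example.org/xss/inbox",
        data=ITEM,
        headers={"Content-Type": "application/xml", "Authorization": auth},
        method="POST",
    )

    with urllib.request.urlopen(req) as resp:
        print(resp.status)   # whatever the (hypothetical) server answers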

It would be important not to tolerate poorly formed XSS.  Part of the problem of parsing RSS now is that we stay true to ‘be conservative in what you send and liberal in what you receive.’  The problem is, apparently not everyone has been very careful about what gets produced.  Now we’re regexing instead of parsing XML in order to get the job done.  I’m not quite sure how this would be done, besides through a community effort.  If John Smith’s blog produces invalid XSS, we would have to rag on him.  We couldn’t do what we did in the past: make an exception and add some code to our RSS parsers so that we can read it.  At that point, it would begin to dilute just as RSS has.
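
As a sketch of what “not tolerating poorly formed XSS” could mean in practice, here is a strict parse in Python that simply refuses malformed input instead of falling back to regexes.  The two sample documents are made up.

    import xml.etree.ElementTree as ET

    GOOD = "<item><title>Fine</title></item>"
    BAD = "<item><title>Oops</item>"   # mismatched tags: not well-formed

    def parse_strict(document: str) -> ET.Element:
        """Parse the document as XML or fail loudly; no regex fallback."""
        return ET.fromstring(document)   # raises ET.ParseError on malformed input

    parse_strict(GOOD)   # fine
    try:
        parse_strict(BAD)
    except ET.ParseError as err:
        # The feed gets rejected; the producer fixes it, not every consumer's parser.
        print("rejected:", err)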

Just thinking out loud, taking Sam’s thoughts to their logical conclusion.  Why not?

Danny Ayers comments that RSS 1.0 has a schema.  That’s great.  Unfortunately, RSS is not RSS 1.0.  RSS is all versions of RSS that are in the wild, including 0.9x, 1.0, 2.0, etc.