Morbus Iff wrote:
> Should HTML in RSS *always* be encoded to its entity? (&lt;, etc.)?

No, not always.

I looked up HTML support in the different specs [1-3], since I was curious about this FAQ. It’s, of course, different for the different RSS versions:

RSS 0.9 : no
RSS 0.91: no (by spec)
RSS 0.92: yes, entity-escaped
RSS 1.0 : no; maybe, with content module

All the “no”s are assumed: none of those specs mention HTML. Since 0.92 claims entity-escaped HTML as a new feature, 0.9 and 0.91 must allow no HTML (however, 0.92 claims to be a description of then-current use of 0.91, so there are/were feeds with entity-escaped HTML claiming to be 0.91). 1.0 is presumably derived from 0.9 closely enough to allow no HTML, and its examples contain no HTML (though in the version I looked at, the examples had some unescaped ‘es).

If an RSS 1.0 document makes use of the content module [4], it will have a <content:format> that may specify XHTML, and may have a <content:encoding>. If the format is XHTML and the encoding is not given, character encoding (entity-escaped, like 0.92) is assumed. The other encoding option the spec names explicitly is well-formed XML, which is the only case in all of the RSS specs in which HTML isn’t encoded as character data.
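To make the difference between entity-escaped markup and well-formed XML concrete, here is a small Python sketch. The <description> element and the sample text are just stand-ins I picked for illustration; they are not taken from the content module spec.

```python
import xml.etree.ElementTree as ET

# Entity-escaped: the HTML travels as character data (RSS 0.92 style).
escaped = "<description>A &lt;b&gt;bold&lt;/b&gt; claim</description>"

# Well-formed XML: the markup is part of the XML tree itself.
embedded = "<description>A <b>bold</b> claim</description>"

escaped_el = ET.fromstring(escaped)
embedded_el = ET.fromstring(embedded)

# After XML parsing, the escaped version yields one string that still
# contains HTML tags; an HTML-aware user-agent has to parse it again.
escaped_text = escaped_el.text
escaped_children = [child.tag for child in escaped_el]

# The embedded version yields real child elements, with no second parse.
embedded_text = embedded_el.text
embedded_children = [child.tag for child in embedded_el]

print(repr(escaped_text), escaped_children)
print(repr(embedded_text), embedded_children)
```

In the escaped case the XML parser sees no child elements at all, which is why a reader that treats the result as plain text ends up showing literal tags to the user.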

So for 0.9, vanilla 1.0, and 0.91, all character data is for display to the user. In 0.92, the character data is for interpretation by an HTML-aware user-agent. Some 0.92 files may claim to be 0.91. In 1.0 with the content module, <content:format> and <content:encoding> tell what to do.
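As a rough sketch of that summary in Python (the function name, its parameters, and the returned labels are all my own invention, not anything the specs define):

```python
def character_data_handling(version, uses_content_module=False):
    """Return a label for how to treat character data in a feed.

    This only mirrors my reading of the specs summarized above;
    the labels are hypothetical.
    """
    if version == "1.0" and uses_content_module:
        # The content module's format and encoding decide.
        return "as declared by content module"
    if version == "0.92":
        return "entity-escaped HTML"
    if version in ("0.9", "0.91", "1.0"):
        # Assumed: these specs never mention HTML.
        return "plain text"
    raise ValueError("unknown RSS version: %r" % (version,))


for v in ("0.9", "0.91", "0.92", "1.0"):
    print(v, "->", character_data_handling(v))
print("1.0 + content ->", character_data_handling("1.0", True))
```

Since some feeds that are really 0.92 claim to be 0.91, a real reader probably can't trust the version label alone for 0.91.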

Anyone who knows better (such as anyone involved in RSS 1.0 development, on RSS 1.0) should feel free to correct me. Anyone compiling a FAQ should feel free to swipe from this post.

[1] http://backend.userland.com/rss091
[2] http://backend.userland.com/rss092
[3] http://purl.org/rss/1.0/spec
[4] http://purl.org/rss/1.0/modules/content/

Mark Paschal