Re: Adding a datatype for HTML literals to RDF (ISSUE-63)

A technical question, though.

For XML Literals, we have a nice (proposed) definition: the value space consists of XML Infosets, which means we can say whether two literals should be considered identical.

What is the equivalent notion in HTML5? I had a very short chat with Mike Smith (staff contact at the HTML5 WG), and he has not seen any formal definition of when two HTML5 fragments would be considered identical. Any bright ideas here?

The HTML5 spec goes into great detail on how an HTML5 document/fragment should be parsed into a DOM. It even handles cases where the HTML5 source is invalid. So... is there a formal definition of when two DOM trees are identical? Maybe it is obvious (at first glance it looks like it...) and we could say that the value space consists of (HTML5) DOM trees.
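
To make the "value space of DOM trees" idea a bit more concrete, here is a rough sketch of what tree-based identity could look like. It is only an illustration under loud assumptions: it uses Python's standard html.parser, which is NOT the HTML5 parsing algorithm (no error recovery, no special handling of void elements such as <br>), and the names TreeBuilder, parse_fragment and same_value are made up for this example. Attributes are compared as an unordered set, which is one possible choice and not something any spec mandates here.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Builds a simplified tree of (tag, attributes, children) tuples.

    Hypothetical helper: a stand-in for a real HTML5 tree builder.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ("#fragment", frozenset(), [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names and strips quotes;
        # a frozenset makes attribute order irrelevant.
        node = (tag, frozenset(attrs), [])
        self.stack[-1][2].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Only close the innermost open element; no error recovery.
        if len(self.stack) > 1 and self.stack[-1][0] == tag:
            self.stack.pop()

    def handle_data(self, data):
        # Merge adjacent text so chunking does not affect the comparison.
        children = self.stack[-1][2]
        if children and isinstance(children[-1], str):
            children[-1] += data
        else:
            children.append(data)


def parse_fragment(source):
    """Lexical-to-value mapping of this sketch: markup string -> tree."""
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


def same_value(a, b):
    """Two literals are 'identical' if they map to equal trees."""
    return parse_fragment(a) == parse_fragment(b)
```

With this, '<p class="a" id="x">Hi</p>' and "<P id=x class='a'>Hi</P>" would count as the same value even though the strings differ, while '<p>a</p>' and '<p>b</p>' would not. Whether whitespace, comments or attribute order should matter is exactly the kind of thing a formal definition would have to pin down.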

Which leads to another issue: *if* we define HTML5 that way, i.e., relying on the identity of DOM trees, maybe it is worth rethinking the XML Literal case and using the same mechanism. Just for the sake of consistency...

Just some food for thought...


On May 1, 2012, at 18:41 , Gavin Carothers wrote:

> On Tue, May 1, 2012 at 6:46 AM, Richard Cyganiak wrote:
>> All,
>> The 2004 WG worked under the assumption that the future of HTML was XHTML, and that the use case of shipping HTML markup fragments as RDF payloads would be addressed by rdf:XMLLiteral. But in 2012, shipping HTML fragments really means HTML5. Is rdf:XMLLiteral still adequate for this task? Is a new datatype with a lexical space consisting of HTML5 fragments needed? This question is ISSUE-63.
>> I think it would be useful to have a straw poll sometime soon on this question:
>> PROPOSAL: RDF-WG will work on an HTML datatype that would be defined in RDF Concepts.
> +1, and for internationalization it should be a required datatype. It
> might also have a simple syntax in Turtle (though that would likely
> require a new last call, but a Web format that doesn't understand HTML
> doesn't seem like much of a web format)
>> If there is general support for this, then we could start work on the details of the datatype definition (lexical space, value space, L2V mapping and so on).
>> All the best,
>> Richard

Ivan Herman, W3C Semantic Web Activity Lead
mobile: +31-641044153

Received on Wednesday, 2 May 2012 13:19:52 UTC