
Re: Adding a datatype for HTML literals to RDF (ISSUE-63)

From: Ivan Herman <ivan@w3.org>
Date: Wed, 2 May 2012 16:35:51 +0200
Cc: Gavin Carothers <gavin@carothers.name>, RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <72009FF1-A4FB-47C2-94F9-E87DB3538BC9@w3.org>
To: Richard Cyganiak <richard@cyganiak.de>

On May 2, 2012, at 16:26 , Richard Cyganiak wrote:

> Hi Ivan,
> 
> On 2 May 2012, at 14:22, Ivan Herman wrote:
>> For XML Literals, we have a nice definition (proposal) that the value space consists of XML Infosets, and that means we can say whether two literals should be considered identical.
>> 
>> What is the equivalent notion in HTML5? I have had a very short chat with Mike Smith (staff contact at the HTML5 WG) and he has not seen any formal definition of when two HTML5 fragments would be considered identical. Any bright ideas here?
>> 
>> The HTML5 spec goes into great detail on how an HTML5 document/fragment should be parsed into a DOM. That even handles cases where the HTML5 source is invalid. So... is there a formal definition of when two DOM trees are identical? Maybe it is obvious (at first glance it looks like it...) and we could say that the value space consists of (HTML5) DOM trees.
>> 
>> Which leads to another issue: *if* we define HTML5 that way, i.e., relying on the identity of DOM trees, maybe it is worth re-thinking the XML Literal case and using the same mechanism. Just for the sake of consistency....
> 
> HTML5 also defines the XHTML5 syntax,

yes

> and the spec for this includes an algorithm for serializing HTML DOMs to XHTML fragments or XHTML documents.

Is this formally defined in the HTML5 document? I was looking for it, but I may have missed it.

> 
> And I guess in theory, DOMs and XML Infosets should be isomorphic, no?

In theory:-) To be checked. There may be corner cases.

> 
> Between all these transformations, there should be something that works for us. The devil is in the details of course.

Exactly...

> 
> Or we could just avoid all of that trouble and simply define the value space of the HTML datatype as identical to the lexical space.

And then we are back to the same issue as we had with XML Literals. Except that... there is no such thing as a formal canonical HTML5.
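
To make the difference between the two options concrete, here is a minimal sketch. It compares two fragments first as strings (value space = lexical space) and then as parsed trees (value space = something DOM-like). It rests on several assumptions: it uses Python's standard html.parser, which is NOT the HTML5 parsing algorithm and does no error recovery, the tree shape and the names TreeBuilder, to_tree, same_lexical and same_value are made up for illustration, and the test fragments avoid void elements such as <br>.

```python
from html.parser import HTMLParser


class TreeBuilder(HTMLParser):
    """Builds a crude nested tree of [tag, attributes, children].

    Illustration only: this is not the HTML5 tree construction
    algorithm and it performs no error recovery.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = ["#fragment", {}, []]
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        # Storing attributes in a dict makes their order irrelevant.
        node = [tag, dict(attrs), []]
        self.stack[-1][2].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the current element only if the end tag matches it.
        if len(self.stack) > 1 and self.stack[-1][0] == tag:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1][2].append(data)


def to_tree(fragment):
    """Parse a fragment into the hypothetical tree structure above."""
    builder = TreeBuilder()
    builder.feed(fragment)
    builder.close()
    return builder.root


def same_lexical(a, b):
    """Equality if the value space is simply the lexical space."""
    return a == b


def same_value(a, b):
    """Equality if the value space consists of parsed trees."""
    return to_tree(a) == to_tree(b)


a = '<p class="a" id="b">hi</p>'
b = "<P id='b' class='a'>hi</P>"
print(same_lexical(a, b), same_value(a, b))
```

With the lexical-space definition these two literals are different; with a tree-based definition they are the same. Whether the real HTML5 DOM gives a well-defined equality in all corner cases is exactly the open question.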

Ivan

> 
> Best,
> Richard
> 
> 
>> 
>> Just some food for thought...
>> 
>> Ivan
>> 
>> 
>> On May 1, 2012, at 18:41 , Gavin Carothers wrote:
>> 
>>> On Tue, May 1, 2012 at 6:46 AM, Richard Cyganiak <richard@cyganiak.de> wrote:
>>>> All,
>>>> 
>>>> The 2004 WG worked under the assumption that the future of HTML was XHTML, and that the use case of shipping HTML markup fragments as RDF payloads would be addressed by rdf:XMLLiteral. But in 2012, shipping HTML fragments really means HTML5. Is rdf:XMLLiteral still adequate for this task? Is a new datatype with a lexical space consisting of HTML5 fragments needed? This question is ISSUE-63.
>>>> 
>>>> I think it would be useful to have a straw poll sometime soon on this question:
>>>> 
>>>> PROPOSAL: RDF-WG will work on an HTML datatype that would be defined in RDF Concepts.
>>> 
>>> +1, and for internationalization it should be a required datatype. It might
>>> also have a simple syntax in Turtle (though that would likely require a new
>>> last call, but a Web format that doesn't understand HTML doesn't
>>> seem like much of a Web format).
>>> 
>>>> 
>>>> If there is general support for this, then we could start work on the details of the datatype definition (lexical space, value space, L2V mapping and so on).
>>>> 
>>>> All the best,
>>>> Richard
>>> 
>> 
>> 
>> ----
>> Ivan Herman, W3C Semantic Web Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> FOAF: http://www.ivan-herman.net/foaf.rdf
>> 
>> 
>> 
>> 
>> 
>> 
> 


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Wednesday, 2 May 2012 14:33:18 GMT
