Re: Add-on to the XML Literal discussion from Richard Cyganiak on 2011-11-23 (public-rdf-wg@w3.org from November 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Wed, 23 Nov 2011 15:05:49 +0000
To: Ivan Herman <ivan@w3.org>
Cc: W3C RDF WG <public-rdf-wg@w3.org>
Message-Id: <A1D923FB-68D2-49A5-86BF-C31B9E94B34B@cyganiak.de>
On 23 Nov 2011, at 13:30, Ivan Herman wrote:
> I remembered the HTML Datatype issue, thanks for digging out the number. But I guess whatever we do with XML Literals affects that one, too; after all, they are related in practice. Actually, there is no canonicalization algorithm that I know of for HTML, so that may be something to remember...

The closest is perhaps infoset coercion:
http://www.whatwg.org/specs/web-apps/current-work/#coercing-an-html-dom-into-an-infoset

I think that XML infosets are 1:1 isomorphic to XML-C14N strings?
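
To illustrate what I mean by a canonical string standing in for an infoset, here is a minimal sketch. It assumes Python 3.8+ and its standard-library xml.etree.ElementTree.canonicalize, which implements C14N 2.0 rather than the exclusive canonicalization that rdf:XMLLiteral refers to, so treat it as an approximation. The sample documents and the helper name same_canonical_form are made up for this example.

```python
import xml.etree.ElementTree as ET

# Two serializations that differ in attribute order, quoting style and
# empty-element syntax, but describe the same element information item.
DOC_A = '<a b="1" c="2"/>'
DOC_B = "<a c='2' b='1'></a>"


def same_canonical_form(left: str, right: str) -> bool:
    """Return True if both XML strings canonicalize to the same string.

    Hypothetical helper: uses C14N 2.0 as shipped with Python 3.8+.
    """
    return ET.canonicalize(xml_data=left) == ET.canonicalize(xml_data=right)


if __name__ == "__main__":
    print(ET.canonicalize(xml_data=DOC_A))
    print(ET.canonicalize(xml_data=DOC_B))
    print(same_canonical_form(DOC_A, DOC_B))
```

The point is only that textually different inputs collapse to one canonical string; whether the mapping is strictly 1:1 in every corner case is exactly the open question.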

> But what I was also saying is that, in practice, XML Literals may cover the use cases for HTML literals because HTML Parsers build a DOM tree that can then be used for an XML Literal.

But the syntax rules of HTML are very different from the syntax rules of XML. Many strings that are not well-formed XML are perfectly fine HTML. I don't see how building a DOM tree, which I assume is in value space, helps here.
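
A small sketch of the mismatch, using only the Python standard library. The sample string and the helper names are made up for this example, and html.parser is a lenient tokenizer rather than a full HTML5 tree builder, so it only stands in for a real HTML parser here.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Unclosed <p> and a void <br>: ordinary HTML, but not well-formed XML.
SNIPPET = "<p>Hello<br>world"


def is_well_formed_xml(text: str) -> bool:
    """Return True if an XML parser accepts the string as a document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Record the start tags seen by the lenient HTML tokenizer."""

    def __init__(self) -> None:
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def html_start_tags(text: str) -> list:
    """Feed the string to the HTML tokenizer and return its start tags."""
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.start_tags


if __name__ == "__main__":
    print(is_well_formed_xml(SNIPPET))
    print(html_start_tags(SNIPPET))
```

The same lexical form is rejected by the XML parser and accepted by the HTML one, so the lexical space of an HTML datatype cannot simply be that of rdf:XMLLiteral.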

I guess you could require the parser to already build the DOM tree from the HTML and store it as an rdf:XMLLiteral in the graph, but then you'd require RDF parsers to ship with HTML parsers, which I think is even less desirable than the current XML parser requirement.

Best,
Richard



> But the only way to use these is not to involve canonicalization...
> 
> Ivan
> 
> ----
> Ivan Herman
> Tel:+31 641044153
> http://www.ivan-herman.net
> 
> 
> 
> On 23 Nov 2011, at 13:52, Richard Cyganiak <richard@cyganiak.de> wrote:
> 
>> Ivan,
>> 
>> We have a separate issue for considering an HTML datatype:
>> http://www.w3.org/2011/rdf-wg/track/issues/63
>> 
>> I would think that rdf:XMLLiteral isn't appropriate for that.
>> 
>> Best,
>> Richard
>> 
>> 
>> On 23 Nov 2011, at 09:36, Ivan Herman wrote:
>> 
>>> This is mostly an FYI. There is currently a discussion on the HTML Data Task Force (looking at schema.org, microdata, RDFa, that sort of thing) where the need for some feature to store structured (HTML) data came up. See, for example, Jeni's mail:
>>> 
>>> http://lists.w3.org/Archives/Public/public-html-data-tf/2011Nov/0162.html
>>> 
>>> referring to 
>>> 
>>> http://www.w3.org/wiki/HTML_Data_Improvements#Structured_Values
>>> 
>>> The bottom line is that there seems to be a need to store structured content in an (RDF) output, too.
>>> 
>>> In some sense, however, this may just muddy the waters here, because we are talking about HTML(5) structured data, which is SGML but not XML. In other words, the current XML Literal would not cover that use case properly.
>>> 
>>> (Well... there is a caveat to that. Current HTML5 parsers accept non-XML data but, afaik, they create a DOM tree. Serializing a subtree of that DOM tree would produce an XML Literal after all, one that is not textually identical to the original text but is identical in, say, the infoset sense. Such a mechanism is highly relevant to HTML5+RDFa or to the result of a microdata->RDF conversion. So that may mean that XML Literals are o.k. after all.)
>>> 
>>> We certainly have a use case here which is definitely not related to RDF/XML. (I.e., I would propose to forget about the RDF/XML motivation in this discussion. It is not the relevant factor in my view.)
>>> 
>>> Ivan
>>> 
>>> ----
>>> Ivan Herman, W3C Semantic Web Activity Lead
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +31-641044153
>>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>> 
Received on Wednesday, 23 November 2011 15:06:43 UTC