Re: Add-on to the XML Literal discussion from Ivan Herman on 2011-11-23 (public-rdf-wg@w3.org from November 2011)

From: Ivan Herman <ivan@w3.org>
Date: Wed, 23 Nov 2011 17:03:15 +0100
To: Richard Cyganiak <richard@cyganiak.de>
Cc: W3C RDF WG <public-rdf-wg@w3.org>
Message-Id: <EC794663-969C-4D88-B66B-A756A3DA6C35@w3.org>

On Nov 23, 2011, at 16:05, Richard Cyganiak wrote:

> On 23 Nov 2011, at 13:30, Ivan Herman wrote:
>> I remembered the HTML Datatype issue, thanks for digging out the number. But I guess whatever we do with XML Literals affects that one, too; after all, they are related in practice. Actually, there is no canonicalization algorithm that I know of for HTML, so that may be something to remember...
> 
> The closest is perhaps infoset coercion:
> http://www.whatwg.org/specs/web-apps/current-work/#coercing-an-html-dom-into-an-infoset
> 
> I think that XML infosets are 1:1 isomorphic to XML-C14N strings?
> 
>> But what I was also saying is that, in practice, XML Literals may cover the use cases for HTML literals, because HTML parsers build a DOM tree that can then be used for an XML Literal.
> 
> But the syntax rules of HTML are very different from the syntax rules of XML. Many strings that are not well-formed XML are perfectly fine HTML. I don't see how building a DOM tree, which I assume is in value space, helps here.
> 
> I guess you could require the parser to already build the DOM tree from the HTML and store it as an rdf:XMLLiteral in the graph, but then you'd require RDF parsers to ship with HTML parsers, which I think is even less desirable than the current XML parser requirement.
> 

Yes, that is correct, you are right.
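
To make the syntax point concrete, here is a minimal sketch in Python. The fragment is made up for the example, and the standard library's html.parser is only a lenient tokenizer, not a full HTML5 tree builder, so it merely stands in for a real HTML parser here.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical fragment: acceptable as HTML, but not well-formed XML
# (unquoted attribute value, unclosed <br>).
FRAGMENT = "<p class=note>Hello<br>world</p>"


def is_well_formed_xml(text):
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Records the start tags that the lenient HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def html_start_tags(text):
    """Feed the text to the HTML tokenizer and return the start tags seen."""
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags
```

With these helpers, is_well_formed_xml(FRAGMENT) is false, while html_start_tags(FRAGMENT) still reports the p and br elements. So the same string has no place in the lexical space of an XML-based datatype, even though an HTML parser handles it without complaint.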

So, _if_ we want to go down that route, we may have to have a separate HTML Literal datatype (I can already hear people shouting NOOOO at me :-). But if we play with this idea, some sort of symmetry between XML Literals and HTML Literals would surely be good. That would effectively kill the approach of using C14N in the specification (which I would be happy with), leaving us with either no check or comparison on values at all, or with the infoset approach, which would still be a valid way of comparing things (if an RDF environment implements a comparison, that is).
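
As a rough sketch of what an optional value-based comparison could look like (as opposed to comparing lexical forms), assuming Python 3.8 or later: xml.etree.ElementTree.canonicalize implements C14N 2.0 and is used here only as a convenient stand-in for "same infoset". The helper name same_xml_value is made up for this example, and nothing here is meant as what a specification would actually require.

```python
from xml.etree.ElementTree import canonicalize


def same_xml_value(a, b):
    """Compare two well-formed XML strings by canonical form, not by text.

    Canonicalization normalizes attribute order, quoting, and whitespace
    inside tags, so lexically different strings can compare as equal.
    """
    return canonicalize(a) == canonicalize(b)


# Two lexically different serializations of the same element.
FIRST = '<em class="x" id="y">hi</em>'
SECOND = "<em id='y'   class='x'>hi</em>"
```

Here FIRST == SECOND is false as a plain string comparison, while same_xml_value(FIRST, SECOND) is true. An environment that skips the check altogether would simply treat the two as different literals.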

Cheers

Ivan


> Best,
> Richard
> 
> 
> 
>> But the only way to use these is not to involve canonicalization...
>> 
>> Ivan
>> 
>> ----
>> Ivan Herman
>> Tel:+31 641044153
>> http://www.ivan-herman.net
>> 
>> 
>> 
>> On 23 Nov 2011, at 13:52, Richard Cyganiak <richard@cyganiak.de> wrote:
>> 
>>> Ivan,
>>> 
>>> We have a separate issue for considering an HTML datatype:
>>> http://www.w3.org/2011/rdf-wg/track/issues/63
>>> 
>>> I would think that rdf:XMLLiteral isn't appropriate for that.
>>> 
>>> Best,
>>> Richard
>>> 
>>> 
>>> On 23 Nov 2011, at 09:36, Ivan Herman wrote:
>>> 
>>>> This is mostly an FYI. There is currently a discussion in the HTML Data Task Force (looking at schema.org, microdata, RDFa, that sort of thing) where the need for some feature to store structured (HTML) data came up. See, for example, Jeni's mail:
>>>> 
>>>> http://lists.w3.org/Archives/Public/public-html-data-tf/2011Nov/0162.html
>>>> 
>>>> referring to 
>>>> 
>>>> http://www.w3.org/wiki/HTML_Data_Improvements#Structured_Values
>>>> 
>>>> The bottom line is that there seems to be a need to store structured content in an (RDF) output, too.
>>>> 
>>>> In some sense, however, this may just muddy the waters here, because we are talking about HTML(5) structured data, which has SGML-style syntax but is not XML. In other words, the current XML Literal would not cover that use case properly.
>>>> 
>>>> (Well... there is a caveat to that. Current HTML5 parsers accept non-XML data but, afaik, they create a DOM tree. Taking the serialized output of a subtree in that DOM tree would produce an XML Literal after all, one that is not textually identical to the original text but is identical in the, say, infoset sense. Such a mechanism is highly relevant to HTML5+RDFa or to the result of a microdata->RDF conversion. But that may mean that XML Literals may be o.k. after all.)
>>>> 
>>>> We certainly have a use case here which is definitely not related to RDF/XML. (I.e., I would propose to forget about the RDF/XML motivation in this discussion. It is not the relevant factor in my view.)
>>>> 
>>>> Ivan
>>>> 
>>>> ----
>>>> Ivan Herman, W3C Semantic Web Activity Lead
>>>> Home: http://www.w3.org/People/Ivan/
>>>> mobile: +31-641044153
>>>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>>> 
>>>> 
>>> 
>> 
> 


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Wednesday, 23 November 2011 16:00:21 UTC