Re: XMLLiteral and HTML from Richard Cyganiak on 2013-12-10 (public-rdf-comments@w3.org from December 2013)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Tue, 10 Dec 2013 07:59:19 +0000
To: Richard Light <richard@light.demon.co.uk>
Cc: "public-rdf-comments@w3.org" <public-rdf-comments@w3.org>
Message-Id: <82F58BBF-05A6-4993-8EB6-4855E16A2B7E@cyganiak.de>
Dear Richard,

Thank you for your comment on the RDF 1.1 Concepts document.

You question the purpose of the XMLLiteral and HTML datatypes, and raise concerns about their implementation cost. Let me try to address both questions.

The purpose of both datatypes is to enable text with markup in RDF graphs. The XMLLiteral datatype was added to the original 2004 spec due to i18n requirements (e.g., bidirectional text, mixed-language text, and Ruby markup). This datatype is now widely deployed for a number of use cases, and removing it is realistically no longer possible.

Since XHTML has not seen the adoption that was expected back in the days of the previous WG, the HTML datatype has now been added as a more author-friendly alternative that addresses the same requirements.

The only RDF-WG specification that requires an XML parser for a conforming implementation is RDF/XML. There are no conformance criteria on any of the other documents that require an XML parser or HTML parser.

Implementing, for example, graph equivalence over these datatypes would require such a parser, but no entailment regime requires that these datatypes be recognised. Put more simply, the datatypes are optional. Implementations may elect not to support them, in which case they simply treat these datatypes like any other unrecognised datatype: as strings that carry a marker for a certain syntax.
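
As a rough illustration of what "unrecognised" means in practice, here is a minimal Python sketch. It is not taken from any specification or library: the Literal class, the RECOGNISED registry and value_of() are hypothetical names invented for this example, and only the two namespace IRIs are real. An implementation that does not recognise rdf:XMLLiteral keeps the lexical form and the datatype IRI and never invokes an XML parser.

```python
from dataclasses import dataclass

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal term: a lexical form plus a datatype IRI."""
    lexical_form: str
    datatype: str


# Hypothetical registry of recognised datatypes (IRI -> lexical-to-value
# function). rdf:XMLLiteral and rdf:HTML are deliberately left out.
RECOGNISED = {XSD + "integer": int}


def value_of(lit):
    """Return the value for a recognised datatype, or None otherwise."""
    parse = RECOGNISED.get(lit.datatype)
    if parse is None:
        return None  # unrecognised: an opaque string with a syntax marker
    return parse(lit.lexical_form)


a = Literal("<b>hi</b>", RDF + "XMLLiteral")
b = Literal("<b >hi</b>", RDF + "XMLLiteral")

print(value_of(a))  # no value is computed, no XML parser is involved
print(a == b)       # compared as terms: the lexical forms differ
print(value_of(Literal("042", XSD + "integer")))
```

In this sketch the two XMLLiteral terms are simply different terms because their lexical forms differ; deciding whether they denote the same XML content is exactly the part that would need a parser.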

Implementing XMLLiteral in RDF 1.1 is considerably easier than before because the requirement for XML canonicalisation has been removed.

The most natural way to associate HTML or XML resources with an RDF graph is perhaps not what you propose, but something more like this:

 <example.com/mydocument.xml> dc:format "text/xml".

This has been possible since RDF 2004.
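
For completeness, a small Python sketch of writing that statement out as a single N-Triples line. The helper name ntriple() is hypothetical, the absolute IRI http://example.com/mydocument.xml is an assumption (the example above omits the scheme), and dc: is assumed to be the Dublin Core elements namespace.

```python
# Dublin Core "format" property (assumed meaning of dc:format above).
DC_FORMAT = "http://purl.org/dc/elements/1.1/format"


def ntriple(subject_iri, predicate_iri, text):
    """Serialise one triple with a plain string object as an N-Triples line.

    Sketch only: escapes backslashes and double quotes, nothing else.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject_iri}> <{predicate_iri}> "{escaped}" .'


line = ntriple("http://example.com/mydocument.xml", DC_FORMAT, "text/xml")
print(line)
```

The document itself stays outside the graph; the graph only carries its IRI and a plain string describing the format.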

Please respond to this message and let us know whether this addresses your concerns.

All the best,
Richard



> On 4 Dec 2013, at 15:39, Richard Light <richard@light.demon.co.uk> wrote:
> 
> Hi,
> 
> Following an interesting exchange about the fate of the RDF API [1] over on public-lod, I have just had a look through the RDF 1.1 Concepts CR document [2] to bring myself up to date on the core RDF standard.
> 
> There it is noted that the rdf:HTML and rdf:XMLLiteral datatypes may be made non-normative.
> 
> Although (being an XML-head) I was pleased when I discovered some time ago that you can validly dump chunks of XML into an RDF resource, on reading the spec afresh I do wonder what business chunks of XML and HTML have, to be floating around in the innards of an RDF graph.  Wouldn't the RDF model be significantly easier to implement if they were removed?  To support them, you presumably have to bring XML and HTML parsing capabilities into your core RDF engine, together with suitable DOMs to hold the result of parsing.  I'm assuming that no-one is suggesting that the semantic payload of these embedded resources is in any way relevant to the RDF graph.  
> 
> I can see that it can be argued that these are just "special string types", but surely there is an order of magnitude of difference between interpreting a date datatype from its lexical space to its value space, and parsing an XML document fragment?
> 
> Surely the natural way to associate HTML and XML resources with an RDF graph is to point to them with a URI?  What you could do, is to invent an RDF mechanism for "linked document type" which is analogous to datatypes for literal values.  Then you could express a node as e.g.:
> 
> <example.com/mydocument.xml>^^<http://www.w3.org/TR/REC-xml/>
> or:
> <example.com/mydocument.xml>^^"text/xml"
> 
> and then have a loose coupling between the RDF engine and the processing (or not) of the linked resource.  By including a linked document type, you also enable the use of content negotiation, so that variant forms can be retrieved from the same URL.  
> 
> This generalized mechanism would mean that other content types (e.g. JSON) could be supported without a further extension to the RDF Concepts recommendation being required.
> 
> Richard
> 
> [1] http://www.w3.org/TR/rdf-api/
> [2] http://www.w3.org/TR/rdf11-concepts/
> 
> -- 
> Richard Light
Received on Tuesday, 10 December 2013 07:59:44 UTC