Re: Adding a datatype for HTML literals to RDF (ISSUE-63) from Ivan Herman on 2012-05-09 (public-rdf-wg@w3.org from May 2012)

From: Ivan Herman <ivan@w3.org>
Date: Wed, 9 May 2012 13:39:44 +0200
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: Steve Harris <steve.harris@garlik.com>, Richard Cyganiak <richard@cyganiak.de>, public-rdf-wg@w3.org
Message-Id: <79F9CBE3-1222-41BB-859D-AFD5E2929261@w3.org>

On May 9, 2012, at 13:22 , Andy Seaborne wrote:

> 
> 
> On 09/05/12 11:54, Ivan Herman wrote:
>> As Richard emphasized in his mail, XML Literal and, if approved,
>> HTML5 Literals are optional. If implementations do not want to
>> implement equality checking on these literals, that is fine. However,
>> if they _do_ want to do that, then we should define what equality
>> means. That is where the value space issue comes into the picture.
>> 
>> I think that the real issue we have to solve, however, is to keep the
>> lexical space as unconstrained as possible. The current XML Literal
>> definition seemed to be very ambiguous in this respect and it was
>> never 100% clear whether an RDF file in, say, Turtle, should include
>> a canonical XML for the literal or not.
> 
> My reading of the current state:
> 
> The lexical space of rdf:XMLLiterals is exclusive Canonical XML (RDF concepts).
> 
> The requirement to perform canonicalization is in the RDF/XML syntax doc and not elsewhere.  That does apply to Turtle or N-Triples.
> 
> A non-canonical literal for ^^rdf:XMLLiteral is an illegal literal.
> Like writing "foo"^^xsd:integer.

I actually agree. And this is practically impossible to do in many cases. If I write an RDF file by hand, I have to know the canonicalization rules off the top of my head, which I may not (and in fact do not). (E.g., I *think* that <...attr='val'...> is incorrect and it should be <...attr="val"...>.) If I generate the literal in some process, then unless that process has access to the XML canonicalization algorithm, I either have to write the canonical form by hand or ignore the issue.
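To make the quoting example concrete, here is a minimal sketch of a canonical-form check. It assumes Python's standard xml.etree.ElementTree.canonicalize, which implements C14N 2.0 and not the exclusive Canonical XML that RDF Concepts refers to, so it is only a stand-in; it also assumes the literal has a single root element. The helper name is_canonical is made up for the illustration.

```python
from xml.etree.ElementTree import canonicalize


def is_canonical(lexical_form: str) -> bool:
    """Return True if the string is unchanged by canonicalization.

    Assumption: C14N 2.0 (what the standard library offers) is used as
    a stand-in for the exclusive Canonical XML named in RDF Concepts.
    """
    return canonicalize(lexical_form) == lexical_form


# The two spellings from the mail, wrapped in a hypothetical <span>.
single_quoted = "<span attr='val'>text</span>"
double_quoted = '<span attr="val">text</span>'

print(is_canonical(single_quoted))
print(is_canonical(double_quoted))
```

Under the current definition, only a string that passes such a check would be a legal lexical form, which is exactly what a hand-written Turtle file is unlikely to guarantee.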

Bottom line: the current definition of XML Literal is just way too demanding and I would expect that a large percentage of RDF files using XML Literals are, technically, invalid.

The main advantage of the proposal we now have is that it removes this issue. As long as it is clear that <...attr='val'...> and <...attr="val"...> are identical in terms of infosets (or DOM trees), they are both valid in RDF, and smarter implementations can establish their equality. I think that is a real plus for the community.
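A rough sketch of how a "smarter" implementation might establish that equality: accept any well-formed lexical form and compare values after normalizing. This again assumes Python's xml.etree.ElementTree.canonicalize (C14N 2.0) as an approximation of infoset or DOM equality, and single-rooted literals; same_xml_value is a hypothetical helper name, not anything defined by the proposal.

```python
from xml.etree.ElementTree import canonicalize


def same_xml_value(a: str, b: str) -> bool:
    """Compare two XML lexical forms by their canonicalized value.

    Assumption: canonical-form equality approximates the infoset/DOM
    equality discussed in the thread; the exact definition is still open.
    """
    return canonicalize(a) == canonicalize(b)


# Different lexical forms (quote style, empty-element syntax), same value.
print(same_xml_value("<p attr='val'/>", '<p attr="val"></p>'))

# Different attribute value, so a different value.
print(same_xml_value("<p attr='val'/>", '<p attr="other"/>'))
```

The lexical space stays lax (both spellings are accepted as written), and the cost of normalization is paid only by implementations that choose to do equality checking.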

> 
>> This led to long discussions
>> among, eg, RDFa implementers at the time on what _exactly_ should be
>> generated by an RDFa processor. If we say that the lexical space is
>> very lax, and we have a clear definition of equality (whether we
>> define equality via Infosets or DOM tree equality is a detail in this
>> respect) then the situation becomes clear and, because these
>> datatypes are not required, there is no issue with conformance
>> either.
> 
> What did RDFa decide?

Well... the RDFa document itself is silent (as it should be) because it refers back to the RDF documents. The issue was more about the testing of RDF processors, which proved to be a mess for the reasons above, and we decided to be fairly lax on that particular test...

Ivan

> 
> 	Andy
> 
>> 
>> Ivan
> 

----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Wednesday, 9 May 2012 11:36:59 UTC