Re: Exact format for XML Literals?

From: Ivan Herman <ivan@w3.org>
Date: Mon, 14 Sep 2009 06:58:52 +0200
Message-ID: <4AADCD8C.4020100@w3.org>
To: "Seaborne, Andy" <andy.seaborne@hp.com>
CC: Axel Polleres <axel.polleres@deri.org>, W3C SPARQL Working Group <public-rdf-dawg@w3.org>
Andy,

Here is a concrete example. Say our data is:

<rdf:RDF xmlns:rdf="..." xmlns:ex="...">
<rdf:Description rdf:about="">
    <ex:p rdf:parseType="Literal">
       <ex:bla1   a="something" q="and" b="something else"    />
    </ex:p>
</rdf:Description>
</rdf:RDF>

My question is: what is the result of

PREFIX ex: <...>
ASK WHERE {
    ?a ex:p
      '''<ex:bla1 q="and"
          b="something else"     a="something"/>'''^^rdf:XMLLiteral .
}

My feeling is that the answer should be 'true', regardless of the fact 
that the two literals are different in the order of the attributes and 
the usage of white spaces. The RDF/XML spec explicitly says that, in the 
case above, the XML part is transformed into the 'correct' lexical form 
when creating the abstract RDF triple (which is defined in terms of 
canonicalized XML). Does the SPARQL spec say the same?
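
To make the expectation concrete, here is a minimal Python sketch of the comparison I have in mind. It is only an illustration under stated assumptions: the ex: namespace URI is made up (the example above elides it), and xml.etree.ElementTree.canonicalize (Python 3.8+) implements C14N 2.0, not the exclusive XC14N that RDF Concepts refers to. For a fragment this small, both should normalize attribute order and whitespace inside the tag, which is all the sketch relies on.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical namespace declaration; the original message elides the real URI.
NS = 'xmlns:ex="http://example.org/ns#"'

# The XML content as it appears in the RDF/XML data ...
data_form = f'<ex:bla1 {NS}   a="something" q="and" b="something else"    />'

# ... and as it appears in the query: other attribute order, other whitespace.
query_form = f'<ex:bla1 {NS} q="and"\n    b="something else"     a="something"/>'


def same_xml_literal(lex_a: str, lex_b: str) -> bool:
    """Compare two XML fragments after canonicalization (C14N 2.0 stand-in)."""
    return canonicalize(lex_a) == canonicalize(lex_b)


# The raw strings differ, yet the canonical forms coincide.
assert data_form != query_form
assert same_xml_literal(data_form, query_form)
```

If the query processor canonicalized the literal in the graph pattern the same way, the ASK above would match.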

Note that this is _not_ the same situation as if we replaced the two literals 
with, say, 1.0 and 1.00, declaring both to be floats. The way XML Literal 
is currently defined, the lexical form (not the value 
space!) is the canonical XML version. I.e., referring to the fact that 
literals should be compared in the value space does not 
cover the XML Literal case.
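
For contrast, a small Python sketch of the float case. It assumes Python's float() as a rough stand-in for the lexical-to-value mapping of xsd:float; the function names are made up for illustration.

```python
def equal_in_lexical_space(lex_a: str, lex_b: str) -> bool:
    """Compare the lexical forms as plain strings."""
    return lex_a == lex_b


def equal_in_value_space(lex_a: str, lex_b: str) -> bool:
    """Map each lexical form to its value first, then compare the values."""
    return float(lex_a) == float(lex_b)


# "1.0" and "1.00" are both legal lexical forms denoting the same value.
assert not equal_in_lexical_space("1.0", "1.00")
assert equal_in_value_space("1.0", "1.00")
```

Here both strings are legal lexical forms, so value-space comparison settles the matter. For XML Literal the non-canonical string is not a legal lexical form in the first place, so there is no value to compare.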

Cheers

Ivan

Seaborne, Andy wrote:
> Ivan,
> 
> What is the use case from RDFa?  Can we have a concrete example to discuss?
> In particular, why is the literal given not already canonicalized when forming the query?
> 
> SPARQL already allows bad lexical forms ("hello"^^xsd:decimal) - the definition of the datatype says something and the data is wrong with respect to that in the same way as with XMLLiteral.
> 
> 
> There are other ways to consider, such as providing an explicit operation to produce a canonical form:
> 
> { ?s ?p ?o .
>   FILTER (?o = XC14N("<bla   b='something' a='else'>and else</bla>"^^rdf:XMLLiteral))
> }
> 
> At the moment, a SPARQL engine is not required to have special understanding of XML-Literals in FILTERs.  We could document what XMLLiteral casting means and that it includes canonicalization (or be a warning/error - more consistent - in which case have a "canonical" function).
>  
> { ?s ?p ?o .
>   FILTER (?o = rdf:XMLLiteral("<bla   b='something' a='else'>and else</bla>"))
> }
> 
> (definition of XMLLiteral)
>>>> [[[
>>>> The lexical space is the set of all strings:
>>>> - which are well-balanced, self-contained XML content [XML];
>>>> - for which encoding as UTF-8 [RFC 2279] yields exclusive Canonical XML
>>>> [...][XML-XC14N]
>>>> - for which embedding between an arbitrary XML start tag and an end tag
>>>> yields a document conforming to XML Namespaces [XML-NS]
>>>> ]]]
> 
> The definition defines the lexical space as a set of strings which are UTF-8 encoded canonical forms and says nothing outside that.  It does not say canonicalization must be applied to produce a legal lexical form from otherwise illegal forms.
> 
> This seems the same to me as the way XSD primitive datatypes are defined [3] e.g.
> 
> [[[
> 3.2.3.1 Lexical representation
> 
> decimal has a lexical representation consisting of a finite-length sequence of decimal digits (#x30-#x39) separated by a period as a decimal indicator. An optional leading sign is allowed. If the sign is omitted, "+" is assumed. Leading and trailing zeroes are optional. If the fractional part is zero, the period and following zero(es) can be omitted. For example: -1.23, 12678967.543233, +100000.00, 210.
> ]]]
> 
>>>> Note that the RDF/XML specification goes a little bit further: in point
>>>> 7.2.17 of the RDF/XML spec[2] it explicitly says
>>>>
>>>> [[[
>>>> l is transformed into the lexical form of an XML literal in the RDF graph
>>>> ]]]
>>>>
>>>> and refers to the XC14N algorithm explicitly. Ie, the XML extract above
>>>> is perfectly valid for RDF/XML. However, the current SPARQL spec is
>>>> silent about this.
> 
> This text is in the RDF/XML Syntax Specification and applies to RDF/XML syntax and to parsing RDF/XML.
> It makes sense to me in the context of XML processing because in XML there are external (in the character string being processed) factors like namespace and language which nest in the whole document.  SPARQL isn't in the same situation.
> 
> 	Andy
> 
> [3] http://www.w3.org/TR/xmlschema-2/#decimal
> 
>> -----Original Message-----
>> From: public-rdf-dawg-request@w3.org [mailto:public-rdf-dawg-request@w3.org]
>> On Behalf Of Ivan Herman
>> Sent: 09 September 2009 11:52
>> To: Axel Polleres
>> Cc: W3C SPARQL Working Group
>> Subject: Re: Exact format for XML Literals?
>>
>> Axel, that quote is in the RDF Concept standard[1], the SPARQL group
>> will not change that...
>>
>> What I think we ought to do is to put something like the RDF/XML spec
>> says, ie, that the literal in the graph pattern is 'transformed' into an
>> RDF XML Literal.
>>
>> Ivan
>>
>>
>>
>> [1] http://www.w3.org/TR/rdf-concepts
>>
>> Axel Polleres wrote:
>>> I guess just dropping
>>> "
>>>> - for which encoding as UTF-8 [RFC 2279] yields exclusive Canonical XML
>>>> [...][XML-XC14N]
>>> "
>>> is not sufficient?
>>>
>>> I.e. aren't the first and third item enough?
>>> What do I miss here?
>>>
>>> Thanks,
>>> Axel
>>>
>>> On 8 Sep 2009, at 08:24, Ivan Herman wrote:
>>>
>>>> Guys,
>>>>
>>>> an issue came up in the RDFa task force that has relevance on the SPARQL
>>>> syntax. It may be that this will lead to a need to tighten up the wording
>>>> of the SPARQL language specification (no new feature here). It is related
>>>> to the way XML Literals are represented in the query language (well,
>>>> essentially, in Turtle...). The question is whether the following
>>>> extract is valid or not:
>>>>
>>>> a:bla b:blabla
>>>>  "<bla   b='something' a='else'>and else</bla>"^^rdf:XMLLiteral.
>>>>
>>>> The lexical space of XML Literal is defined by the RDF Concepts document[1]
>>>> and it says:
>>>>
>>>> [[[
>>>> The lexical space is the set of all strings:
>>>> - which are well-balanced, self-contained XML content [XML];
>>>> - for which encoding as UTF-8 [RFC 2279] yields exclusive Canonical XML
>>>> [...][XML-XC14N]
>>>> - for which embedding between an arbitrary XML start tag and an end tag
>>>> yields a document conforming to XML Namespaces [XML-NS]
>>>> ]]]
>>>>
>>>> The important point is the use of XC14N. A cursory read of this text
>>>> would mean that, in SPARQL, one would have to write canonical XML for
>>>> an XML Literal (which the example above is not).
>>>>
>>>> Note that the RDF/XML specification goes a little bit further: in point
>>>> 7.2.17 of the RDF/XML spec[2] it explicitly says
>>>>
>>>> [[[
>>>> l is transformed into the lexical form of an XML literal in the RDF graph
>>>> ]]]
>>>>
>>>> and refers to the XC14N algorithm explicitly. Ie, the XML extract above
>>>> is perfectly valid for RDF/XML. However, the current SPARQL spec is
>>>> silent about this.
>>>>
>>>> It is fairly obvious that the same should happen in SPARQL (and in
>>>> Turtle): the parser should, conceptually, apply a canonicalization
>>>> algorithm on the XML content in the literal. But it may be better to say
>>>> that explicitly in the document, similarly to RDF/XML...
>>>>
>>>> Do I miss something?
>>>>
>>>> Ivan
>>>>
>>>> [1] http://www.w3.org/TR/rdf-concepts/#section-XMLLiteral
>>>> [2] http://www.w3.org/TR/rdf-syntax-grammar/#section-grammar-productions
>>>>
>>>> --
>>>>
>>>> Ivan Herman, W3C Semantic Web Activity Lead
>>>> Home: http://www.w3.org/People/Ivan/
>>>> mobile: +31-641044153
>>>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>>>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>> --
>>> Dr. Axel Polleres
>>> Digital Enterprise Research Institute, National University of Ireland,
>>> Galway
>>> email: axel.polleres@deri.org <mailto:axel.polleres@deri.org>  url:
>>> http://www.polleres.net/
>>>
>>>
>>>

Received on Monday, 14 September 2009 04:56:15 GMT
