RE: Exact format for XML Literals?

Ivan,

What is the use case from RDFa?  Can we have a concrete example to discuss?
In particular, why is the literal given not already canonicalized when forming the query?

SPARQL already allows bad lexical forms ("hello"^^xsd:decimal) - the definition of the datatype says something and the data is wrong with respect to that in the same way as with XMLLiteral.


There are other approaches to consider, such as providing an explicit operation to produce a canonical form:

{ ?s ?p ?o .
  FILTER (?o = XC14N("<bla   b='something' a='else'>and else</bla>"^^rdf:XMLLiteral))
}
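
To make the idea concrete, here is a rough sketch of what such a canonicalizing function might compute. It is only an approximation: the name xc14n is hypothetical (it stands in for the XC14N() function in the query above), and Python's xml.etree.ElementTree.canonicalize (Python 3.8+) implements Canonical XML 2.0 rather than the Exclusive XML Canonicalization that the XMLLiteral definition cites, so results may differ where namespaces are involved.

```python
import xml.etree.ElementTree as ET


def xc14n(lexical: str) -> str:
    """Hypothetical stand-in for the XC14N() function in the query above.

    Assumption: C14N 2.0 (what ElementTree provides) is close enough to
    exclusive canonicalization for a namespace-free, single-element example.
    """
    return ET.canonicalize(lexical)


# The non-canonical literal from the example query.
query_literal = "<bla   b='something' a='else'>and else</bla>"

# Canonicalization normalizes whitespace inside the tag, attribute order
# and attribute quoting.
canonical = xc14n(query_literal)
print(canonical)
```

With a function like this, the comparison in the FILTER is made against the canonical string, which is the form a conforming RDF graph would contain.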

At the moment, a SPARQL engine is not required to have any special understanding of XML Literals in FILTERs.  We could document what casting to XMLLiteral means and say that it includes canonicalization (or that a non-canonical form is a warning/error, which is more consistent, in which case we would have a separate "canonical" function).
 
{ ?s ?p ?o .
  FILTER (?o = rdf:XMLLiteral("<bla   b='something' a='else'>and else</bla>"))
}

(definition of XMLLiteral)
> >> [[[
> >> The lexical space is the set of all strings:
> >> - which are well-balanced, self-contained XML content [XML];
> >> - for which encoding as UTF-8 [RFC 2279] yields exclusive Canonical XML
> >> [...][XML-XC14N]
> >> - for which embedding between an arbitrary XML start tag and an end tag
> >> yields a document conforming to XML Namespaces [XML-NS]
> >> ]]]

The definition defines the lexical space as a set of strings which are UTF-8 encoded canonical forms and says nothing outside that.  It does not say canonicalization must be applied to produce a legal lexical form from otherwise illegal forms.
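
Read that way, membership in the lexical space is just a test on the string, with no transformation implied. A minimal sketch of such a test follows, under stated assumptions: the function name is_canonical_form and the wrapper element r are mine, and ElementTree's canonicalize (C14N 2.0) is used as an approximation of exclusive canonicalization, so namespace handling is not faithful to the definition.

```python
import xml.etree.ElementTree as ET


def is_canonical_form(lexical: str) -> bool:
    """Approximate test: is the string already its own canonical form?

    The content is wrapped in a dummy <r> element (an assumption of this
    sketch) because well-balanced XML content need not have a single root.
    """
    try:
        wrapped = ET.canonicalize("<r>" + lexical + "</r>")
    except ET.ParseError:
        # Not well-balanced XML content at all.
        return False
    # Strip the dummy wrapper and compare with the original string.
    return wrapped[len("<r>"):-len("</r>")] == lexical


in_space = is_canonical_form('<bla a="else" b="something">and else</bla>')
not_in_space = is_canonical_form("<bla   b='something' a='else'>and else</bla>")
```

The second string is well-formed XML but fails the test, which is the same situation as a bad lexical form for any other datatype.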

This seems the same to me as the way XSD primitive datatypes are defined [3] e.g.

[[[
3.2.3.1 Lexical representation

decimal has a lexical representation consisting of a finite-length sequence of decimal digits (#x30-#x39) separated by a period as a decimal indicator. An optional leading sign is allowed. If the sign is omitted, "+" is assumed. Leading and trailing zeroes are optional. If the fractional part is zero, the period and following zero(es) can be omitted. For example: -1.23, 12678967.543233, +100000.00, 210.
]]]
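
As an illustration of the parallel, a lexical-space check for decimal can be sketched as a pattern match. The regular expression is my reading of the quoted text (optional sign, digits, optional period and fraction) and is an assumption, not text taken from the XSD specification.

```python
import re

# Assumed pattern for the xsd:decimal lexical form, based on the prose above:
# an optional sign, then digits with an optional fractional part, or a
# fraction with the leading zero omitted. ASCII digits only.
DECIMAL_LEXICAL = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")


def is_decimal_lexical_form(lexical: str) -> bool:
    """True if the whole string matches the assumed decimal pattern."""
    return DECIMAL_LEXICAL.fullmatch(lexical) is not None


# The examples from the quoted text are accepted; "hello" is not.
examples = ["-1.23", "12678967.543233", "+100000.00", "210", "hello"]
results = {s: is_decimal_lexical_form(s) for s in examples}
```

Nothing here repairs "hello" into a legal form; the string is either in the lexical space or it is not.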

> >> Note that the RDF/XML specification goes a little bit further: in point
> >> 7.2.17 of the RDF/XML spec[2] it explicitly
> >>
> >> [[[
> >> l is transformed into the lexical form of an XML literal in the RDF graph
> >> ]]]
> >>
> >> and refers to the XC14N algorithm explicitly. Ie, the XML extract above
> >> is perfectly valid for RDF/XML. However, the current SPARQL spec is
> >> silent about this.

This text is in the RDF/XML Syntax Specification and applies to RDF/XML syntax and to parsing RDF/XML.
It makes sense to me in the context of XML processing because in XML there are factors external to the character string being processed, like namespace and language, which nest across the whole document.  SPARQL isn't in the same situation.

 Andy

[3] http://www.w3.org/TR/xmlschema-2/#decimal


> -----Original Message-----
> From: public-rdf-dawg-request@w3.org [mailto:public-rdf-dawg-request@w3.org]
> On Behalf Of Ivan Herman
> Sent: 09 September 2009 11:52
> To: Axel Polleres
> Cc: W3C SPARQL Working Group
> Subject: Re: Exact format for XML Literals?
> 
> Axel, that quote is in the RDF Concept standard[1], the SPARQL group
> will not change that...
> 
> What I think we ought to do is to put something like the RDF/XML spec
> says, ie, that the literal in the graph pattern is 'transformed' into an
> RDF XML Literal.
> 
> Ivan
> 
> 
> 
> [1] http://www.w3.org/TR/rdf-concepts
> 
> Axel Polleres wrote:
> > I guess just dropping
> > "
> >> - for which encoding as UTF-8 [RFC 2279] yields exclusive Canonical XML
> >> [...][XML-XC14N]
> > "
> > is not sufficient?
> >
> > I.e. aren't the first and third item enough?
> > What do I miss here?
> >
> > Thanks,
> > Axel
> >
> > On 8 Sep 2009, at 08:24, Ivan Herman wrote:
> >
> >> Guys,
> >>
> >> an issue came up in the RDFa task force that has relevance on the SPARQL
> >> syntax. It may be that this will lead to a need to tighten up the SPARQL
> >> language specification's language (no new feature here). It is related
> >> to the way XML Literals are represented in the query language (well,
> >> essentially, in Turtle...). The question is whether the following
> >> extract is valid or not:
> >>
> >> a:bla b:blabla
> >>  "<bla   b='something' a='else'>and else</bla>"^^rdf:XMLLiteral.
> >>
> >> The lexical space of XML Literal is defined by the RDF concept document
> >> and it says:
> >>
> >> [[[
> >> The lexical space is the set of all strings:
> >> - which are well-balanced, self-contained XML content [XML];
> >> - for which encoding as UTF-8 [RFC 2279] yields exclusive Canonical XML
> >> [...][XML-XC14N]
> >> - for which embedding between an arbitrary XML start tag and an end tag
> >> yields a document conforming to XML Namespaces [XML-NS]
> >> ]]]
> >>
> >> the important point is the usage of XC14N. A cursory read of this text
> >> would mean that, in SPARQL, one would have to write a canonical XML for
> >> an XML Literal (which is not the case in the case above).
> >>
> >> Note that the RDF/XML specification goes a little bit further: in point
> >> 7.2.17 of the RDF/XML spec[2] it explicitly
> >>
> >> [[[
> >> l is transformed into the lexical form of an XML literal in the RDF graph
> >> ]]]
> >>
> >> and refers to the XC14N algorithm explicitly. Ie, the XML extract above
> >> is perfectly valid for RDF/XML. However, the current SPARQL spec is
> >> silent about this.
> >>
> >> It is fairly obvious that the same should happen in SPARQL (and in
> >> Turtle): the parser should, conceptually, apply a canonicalization
> >> algorithm on the XML content in the literal. But it may be better to say
> >> that explicitly in the document, similarly to RDF/XML...
> >>
> >> Do I miss something?
> >>
> >> Ivan
> >>
> >> [1] http://www.w3.org/TR/rdf-concepts/#section-XMLLiteral
> >> [2] http://www.w3.org/TR/rdf-syntax-grammar/#section-grammar-productions
> >>
> >> --
> >>
> >> Ivan Herman, W3C Semantic Web Activity Lead
> >> Home: http://www.w3.org/People/Ivan/
> >> mobile: +31-641044153
> >> PGP Key: http://www.ivan-herman.net/pgpkey.html
> >> FOAF: http://www.ivan-herman.net/foaf.rdf
> >
> > --
> > Dr. Axel Polleres
> > Digital Enterprise Research Institute, National University of Ireland,
> > Galway
> > email: axel.polleres@deri.org <mailto:axel.polleres@deri.org>  url:
> > http://www.polleres.net/
> >
> >
> >
> 
> --
> 
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> PGP Key: http://www.ivan-herman.net/pgpkey.html
> FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Sunday, 13 September 2009 13:30:47 UTC