Re: XML literals poll

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Mon, 21 Nov 2011 22:37:56 +0000
Message-ID: <4ECAD2C4.9020500@epimorphics.com>
To: public-rdf-wg@w3.org

On 21/11/11 19:32, Richard Cyganiak wrote:
> Below are six questions on XML literals. Please help the WG get a
> feeling for the general opinion within the group by answering the
> questions. Answers in the usual +1/±0/-1 style are appropriate.
> Thanks, Richard

Q0: should XML Literals be optional?


(I'm pulling this out as I see it not as a consequence of Q2 but as a 
design requirement)

> Q1. Should the specs define a way to compare XML literals based on
> value?
> In other words, in the same way that integers 7 and 007 have the
> same value, should <foo/> and <foo></foo> be defined as having the same value?


A user has worked with GML literals, which sometimes have two or more
attributes.  The attribute-sorting requirement of exclusive
canonicalization was a surprise to them and meant that putting output
from a geospatial database into RDF using ^^rdf:XMLLiteral didn't work.

You could argue that it should not be an XML literal, but it seems more
reasonable to make it a derived type of XML literal (it is XML, after
all), in which case the canonicalization rules would still apply.

It's a tradeoff.  I favour weaker equality for more usability.  Is XML
actually processed with value-equality in XML tool chains?
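
To make the Q1 tradeoff concrete, here is a minimal Python (3.8+) sketch
contrasting lexical equality with value equality defined via
canonicalization.  It assumes the standard library's
xml.etree.ElementTree.canonicalize, which implements Canonical XML 2.0,
as a stand-in for the Exclusive XML Canonicalization that RDF 2004
specifies; the helper names and the <w> wrapper element are hypothetical.
The last line shows attribute reordering, the behaviour that surprised
the GML user (attributes simplified, no GML namespace).

```python
import xml.etree.ElementTree as ET

def canonical_form(fragment: str) -> str:
    """Canonicalize an XML fragment with C14N 2.0 (stand-in for exc-c14n)."""
    # Hypothetical <w> wrapper so fragments with several top-level nodes parse.
    wrapped = ET.canonicalize("<w>" + fragment + "</w>")
    # Strip the wrapper again; it has no attributes, so its tags are fixed.
    return wrapped[len("<w>"):-len("</w>")]

def lexically_equal(a: str, b: str) -> bool:
    # Weaker equality: compare the strings as written.
    return a == b

def value_equal(a: str, b: str) -> bool:
    # Stronger equality: compare canonical forms.
    return canonical_form(a) == canonical_form(b)

print(lexically_equal("<foo/>", "<foo></foo>"))  # False
print(value_equal("<foo/>", "<foo></foo>"))      # True
# Attribute order and quote style are normalized too:
print(canonical_form("<pos srsName='EPSG:4326' srsDimension=\"2\"/>"))
# <pos srsDimension="2" srsName="EPSG:4326"></pos>
```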

> Q2. Should the specs say that RDF implementations MUST support
> value-based comparison?
> In other words, assuming the specs define a value space that answers
> Q1 in the affirmative, is it required that all RDF toolkits implement
> some sort of canonicalization somewhere in the process?


It should be possible to have an RDF toolkit that is XML-free.  It can
content-negotiate for Turtle or N-Triples.

> Q3. Should the *lexical* space be in canonical form?
> In other words, should
>    <>  ex:value "<foo/>"^^rdf:XMLLiteral.
>    <>  ex:value "<foo></foo>"^^rdf:XMLLiteral.
> result in a graph with one triple (canonicalize) or two (don't
> canonicalize)? Note that if you answer “two”, then it is unavoidable
> that round-tripping an XML literal, or storing the same XML literal
> in two different formats (say, RDF/XML and Turtle) and reading it
> again, will sometimes result in a different triple (with the same
> value though).


But we might leave it as a requirement for RDF/XML, because it pulls in
the outer xml:lang, base and namespaces.
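
The one-triple-versus-two question in Q3 can be illustrated with a toy
in-memory graph: a Python set of (subject, predicate, (lexical form,
datatype)) tuples.  This is a sketch under stated assumptions, not a real
RDF library; "<>" and "ex:value" are placeholder strings, and
canonicalization again uses the stdlib's C14N 2.0 as an approximation of
exclusive canonicalization.

```python
import xml.etree.ElementTree as ET

RDF_XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"

def canonical_form(fragment: str) -> str:
    # C14N 2.0 via the stdlib, standing in for exc-c14n (hypothetical <w> wrapper).
    wrapped = ET.canonicalize("<w>" + fragment + "</w>")
    return wrapped[len("<w>"):-len("</w>")]

def load(triples, canonicalize_lexical: bool) -> set:
    """Build a set-based graph; duplicates collapse only if terms are identical."""
    graph = set()
    for s, p, lexical in triples:
        if canonicalize_lexical:
            lexical = canonical_form(lexical)
        graph.add((s, p, (lexical, RDF_XML_LITERAL)))
    return graph

data = [
    ("<>", "ex:value", "<foo/>"),
    ("<>", "ex:value", "<foo></foo>"),
]
print(len(load(data, canonicalize_lexical=False)))  # 2
print(len(load(data, canonicalize_lexical=True)))   # 1
```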

> Q4. Should *invalid XML* be allowed in the lexical space?
> In other words, should "</bar !!!>"^^rdf:XMLLiteral be ill-typed
> (just like "AAA"^^xsd:integer) or well-typed (just like "</bar
> !!!>"^^xsd:string)?

Certainly not encouraged, but it's going to happen.

"2011"^^xsd:date is not exactly rare.

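An ill-typed literal is one whose lexical form is not in the datatype's
lexical space.  A minimal Python sketch of the two checks in Q4, assuming
an XML literal is well-typed when it parses as XML content inside a
hypothetical <w> wrapper, and using datetime.date.fromisoformat as a
rough approximation of the xsd:date lexical space (it rejects timezone
suffixes that xsd:date allows).  Given that such data will turn up, a
toolkit might flag these rather than reject them.

```python
import datetime
import xml.etree.ElementTree as ET

def is_well_typed_xml_literal(lexical: str) -> bool:
    """True if the lexical form is well-formed XML content (wrapper is an assumption)."""
    try:
        ET.fromstring("<w>" + lexical + "</w>")
        return True
    except ET.ParseError:
        return False

def is_well_typed_xsd_date(lexical: str) -> bool:
    """Rough check with Python's ISO date parser, not the full xsd:date grammar."""
    try:
        datetime.date.fromisoformat(lexical)
        return True
    except ValueError:
        return False

print(is_well_typed_xml_literal("<foo/>"))      # True
print(is_well_typed_xml_literal("</bar !!!>"))  # False
print(is_well_typed_xsd_date("2011-11-21"))     # True
print(is_well_typed_xsd_date("2011"))           # False
```
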
> Q5. Should the specs say that RDF/XML parsers MUST canonicalize when
> handling parseType="Literal"?
> RDF/XML parsers are often implemented on top of an XML parser, and
> hence they don't have access to a low-level representation of the XML
> literal, e.g., did it use single or double quotes in the attributes,
> what order were the attributes in, or how many spaces were between
> them? If they don't canonicalize, then two different RDF/XML parsers
> would be pretty much guaranteed to parse the same RDF/XML file into
> different triples (or even different runs of the same parser over the
> same file could yield different triples).


Maintain current behavior for RDF/XML.
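
The Q5 point, that a parser layered on an XML parser loses quote style,
attribute order and spacing, can be seen with Python's ElementTree
(3.8+, which preserves attribute order on output).  Re-serializing the
parsed content still depends on how the input was spelled, while
canonicalizing yields one form.  This is a sketch of the general effect,
not how any particular RDF/XML parser works; the fragments and function
names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two spellings of the same element, as they might appear inside a
# parseType="Literal" property element.
spelling_a = "<a  x='1' y=\"2\"/>"
spelling_b = "<a y=\"2\" x=\"1\"></a>"

def reserialize(fragment: str) -> str:
    # Parse, then write back out: the output follows the input's attribute order.
    return ET.tostring(ET.fromstring(fragment), encoding="unicode")

def canonical_lexical_form(fragment: str) -> str:
    # C14N 2.0 sorts attributes and fixes quoting and empty-element syntax.
    return ET.canonicalize(fragment)

print(reserialize(spelling_a) == reserialize(spelling_b))                        # False
print(canonical_lexical_form(spelling_a) == canonical_lexical_form(spelling_b))  # True
```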

> Q6. Should it be required that producers of XML literals in concrete
> syntaxes (Turtle, N-Triples, other parseTypes in RDF/XML)
> canonicalize the literals themselves?
> If the lexical space is canonicalized (see Q3), then it means that
> canonicalization either has to be done by parsers (see Q5), or by
> content producers.


> (FWIW, the RDF 2004 design is: Q1: Yes. Q2: Yes. Q3: Yes. Q4: No. Q5:
> Yes. Q6: Yes.)

Advice and suggestions about good use of XML literals are OK.
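
If producer-side canonicalization (Q6) is offered as advice rather than
a requirement, following it could look like this minimal Python sketch
for writing one N-Triples line.  The function names and example IRIs are
hypothetical, C14N 2.0 again stands in for exclusive canonicalization,
and the string escaping is deliberately minimal.

```python
import xml.etree.ElementTree as ET

RDF_XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"

def canonical_form(fragment: str) -> str:
    # Stdlib C14N 2.0 as a stand-in for exc-c14n (hypothetical <w> wrapper).
    wrapped = ET.canonicalize("<w>" + fragment + "</w>")
    return wrapped[len("<w>"):-len("</w>")]

def escape_ntriples(text: str) -> str:
    # Minimal escaping: backslash first, then double quote, LF and CR.
    # A full writer must also handle non-ASCII per the targeted N-Triples version.
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))

def xml_literal_triple(subject_iri: str, predicate_iri: str, fragment: str) -> str:
    lexical = canonical_form(fragment)  # the producer canonicalizes before writing
    return (f'<{subject_iri}> <{predicate_iri}> '
            f'"{escape_ntriples(lexical)}"^^<{RDF_XML_LITERAL}> .')

print(xml_literal_triple("http://example.org/s", "http://example.org/value", "<foo/>"))
# <http://example.org/s> <http://example.org/value> "<foo></foo>"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral> .
```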

Received on Monday, 21 November 2011 22:38:37 UTC
