Re: Proposal for ISSUE-12, string literals from Ivan Herman on 2011-05-12 (public-rdf-wg@w3.org from May 2011)

From: Ivan Herman <ivan@w3.org>
Date: Thu, 12 May 2011 10:22:05 +0200
To: Richard Cyganiak <richard@cyganiak.de>
Cc: RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <CEF28F81-16B7-4CE8-AB7D-95657F337D22@w3.org>
Richard,

I personally do not see any problems with the proposal as is; everything I say below is *in addition* to your changes. I know that these points go a little beyond the core of ISSUE-12 (xsd:string and friends), but I think they are necessary for a proper closure.

- On the wiki page you make a remark about 'extending this to numeric literals'; I would rather phrase it as 'extending this to any datatype' (e.g., xsd:dateTime, too). My impression is that this already follows from what you write. You emphasize 'lexical equality', and you also say "Implementations MAY replace any literal with a canonical form if both are syntactically different, but have the same value.", which does not seem to be restricted to string literals. Do you think anything is missing from this document to make that picture complete (except, editorially, possibly adding non-string examples)?
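To make the point about other datatypes concrete, here is a minimal Python sketch (not part of either proposal) of syntactic versus value equality, and of replacing a literal by its canonical form, for xsd:integer and xsd:boolean only. The `TypedLiteral` class and the helper names are hypothetical. Values are tagged with their datatype, so cross-datatype numeric equalities (e.g., xsd:integer vs. xsd:decimal) are deliberately left out, and xsd:dateTime is not covered.

```python
import re
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"
XSD_INTEGER = XSD + "integer"
XSD_BOOLEAN = XSD + "boolean"

# Lexical space of xsd:integer: optional sign followed by ASCII digits.
_INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")
# xsd:boolean has four lexical forms but only two values.
_BOOLEAN_VALUES = {"true": True, "1": True, "false": False, "0": False}


@dataclass(frozen=True)
class TypedLiteral:
    """Hypothetical literal representation: lexical form plus datatype IRI."""
    lexical: str
    datatype: str


def value(lit: TypedLiteral):
    """Return (datatype, value); the tag keeps integer 1 apart from boolean true."""
    if lit.datatype == XSD_INTEGER:
        if not _INTEGER_LEXICAL.fullmatch(lit.lexical):
            raise ValueError(f"ill-typed xsd:integer: {lit.lexical!r}")
        return (XSD_INTEGER, int(lit.lexical))
    if lit.datatype == XSD_BOOLEAN:
        if lit.lexical not in _BOOLEAN_VALUES:
            raise ValueError(f"ill-typed xsd:boolean: {lit.lexical!r}")
        return (XSD_BOOLEAN, _BOOLEAN_VALUES[lit.lexical])
    raise NotImplementedError(f"datatype not covered by this sketch: {lit.datatype}")


def syntactically_equal(a: TypedLiteral, b: TypedLiteral) -> bool:
    # Same lexical form and same datatype IRI.
    return a == b


def value_equal(a: TypedLiteral, b: TypedLiteral) -> bool:
    return value(a) == value(b)


def canonical(lit: TypedLiteral) -> TypedLiteral:
    """Replace a literal by the canonical lexical form of its value."""
    datatype, v = value(lit)
    if datatype == XSD_BOOLEAN:
        return TypedLiteral("true" if v else "false", datatype)
    return TypedLiteral(str(v), datatype)  # str() drops leading zeros and "+"


a = TypedLiteral("007", XSD_INTEGER)
b = TypedLiteral("7", XSD_INTEGER)
print(syntactically_equal(a, b), value_equal(a, b), canonical(a) == b)  # prints: False True True
```

Nothing in the "MAY replace" sentence is string-specific: the same `canonical` step works for any datatype once its value mapping and canonical lexical forms are known.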

- I would also propose a few tiny changes to the Semantics document. At the moment, the document defines what D-interpretation (and, hence, D-entailment) means, which is the semantic counterpart of your 'MAY' sentence. However, in 5.1, the document says "If D is a datatype map, a D-interpretation of a vocabulary V is any rdfs-interpretation...". In other words, if a reasoner implements the various interpretations and entailments defined in the standard, it has to build the D-interpretation on top of an RDFS interpretation, and a D-entailment should also include full RDFS-entailment. This is probably unnecessary, and an extra load on implementations (if they want to follow the standard to the letter).

My proposal would simply be to say instead "If D is a datatype map, a D-interpretation of a vocabulary V is any rdf-interpretation..." (i.e., rdfs->rdf). I believe building the D-interpretation on top of an rdf-interpretation is right, because that is the layer that defines the interpretation of the XML Literals we all love :-). It is not ideal, because RDF interpretations also include other things that have nothing to do with literals (what a property is...), and the corresponding axiomatic triples also include the potentially infinite set of container membership properties; still, it would be the smallest possible change.

Alternatively, we could even base the datatype entailments on simple interpretations only, i.e., say rdfs->simple in the original sentence; that would mean that a D-entailment, by default, carries no additional information about XML Literals, but that might be all right...

- Again in the Semantics document: in view of what is said below, isn't it necessary to add entailment rules covering rdf:PlainLiteral? In effect, that means translating what you describe for Concepts into entailment rules (in the table right before Appendix A):

uuu aaa "sss" .      => uuu aaa "sss@"^^rdf:PlainLiteral .
uuu aaa "sss"@ln .   => uuu aaa "sss@ln"^^rdf:PlainLiteral .

(as we know, the entailment rules are really the tool for implementers, so they should be complete...)
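As an illustration for implementers, a minimal Python sketch of applying the two rules above to a set of triples. It assumes a hypothetical in-memory representation (triples as Python tuples, literals as small frozen dataclasses) and adds only the entailed rdf:PlainLiteral triples; it does nothing else of RDFS- or D-entailment.

```python
from dataclasses import dataclass
from typing import Optional

RDF_PLAIN_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral"


@dataclass(frozen=True)
class PlainLiteral:
    lexical: str
    lang: Optional[str] = None  # None means no language tag


@dataclass(frozen=True)
class TypedLiteral:
    lexical: str
    datatype: str


def apply_plain_literal_rules(graph: set) -> set:
    """Return the graph plus the triples entailed by the two rules."""
    entailed = set(graph)
    for s, p, o in graph:
        if isinstance(o, PlainLiteral):
            # uuu aaa "sss" .     => uuu aaa "sss@"^^rdf:PlainLiteral .
            # uuu aaa "sss"@ln .  => uuu aaa "sss@ln"^^rdf:PlainLiteral .
            lexical = f"{o.lexical}@{o.lang or ''}"
            entailed.add((s, p, TypedLiteral(lexical, RDF_PLAIN_LITERAL)))
    return entailed


g = {
    ("ex:s", "ex:p", PlainLiteral("sss")),
    ("ex:s", "ex:q", PlainLiteral("sss", "ln")),
}
for triple in sorted(apply_plain_literal_rules(g) - g, key=str):
    print(triple)
```

A single pass suffices here, because the rules only produce typed literals, to which they do not apply again.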

Ivan


On May 11, 2011, at 23:23 , Richard Cyganiak wrote:

> I took an action today to draft text for RDF Concepts that resolves ISSUE-12. I put it on the wiki here:
> http://www.w3.org/2011/rdf-wg/wiki/StringLiterals/EntailmentProposal
> A plain text copy is attached below.
> 
> Best,
> Richard
> 
> 
> 
> SHORT SUMMARY
> 
> 1. RDF Concepts puts more emphasis on the distinction between (syntactic) “literal equality” and (semantic, important for applications) “value equality”
> 2. RDF Concepts explicitly points out the specific string value equalities that already arise from RDF Semantics
> 3. RDF Concepts declares one of the string literal forms as canonical
> 4. Implementations MAY canonicalize, but don't have to
> 5. The canonical form is plain literals.
> 
> 
> WHY?
> 
> 1. No changes to the abstract syntax required
> 2. No changes to any concrete syntax or parser required
> 3. No changes to any implementations of any of the existing entailment regimes required
> 4. Those who are ok with canonicalization can do that, and don't need to deal with entailment
> 5. Those who don't want to canonicalize have the option of supporting only string value equality at query time, without RDFS- and D-Entailment
> 6. “MAY canonicalize” softly discourages the use of xsd:string typed literals, without abolishing them outright or declaring them archaic
> 7. Standardizing on xsd:string was never an option because of language tags
> 8. Standardizing on rdf:PlainLiteral was never an option because it MUST NOT be used in serializations that support plain literals
> 
> 
> CHANGES TO 6.5.2 The Value Corresponding to a Typed Literal
> http://www.w3.org/TR/rdf-concepts/#section-Literal-Value
> 
> 
> §1 Rename it to “6.5.1 The Value Corresponding to a Literal” and move it ahead of 6.5.1
> 
> §2 Add to the beginning:
> “The value of a plain literal without language tag is the same Unicode string as its lexical form.
> 
> The value of a plain literal with language tag is a pair consisting of 1. the same Unicode string as its lexical form, and 2. its language tag.
> 
> For typed literals, …” (continue with rest of section as is)
> 
> §3 Remove the Note at the end of the section
> 
> 
> CHANGES TO 6.5.1 Literal Equality
> http://www.w3.org/TR/rdf-concepts/#section-Literal-Equality
> 
> 
> §4 Rename section to “6.5.2 Literal Equality and Canonical Forms”
> 
> §5 Add to the beginning:
> “Equality of literals can be evaluated based on their syntax, or based on their value.”
> 
> §6 Change “Two literals are equal …” to: “Two literals are syntactically equal …” in the current first paragraph.
> 
> §7 Add to the end:
> “In application contexts, comparing the values of literals (see section 6.5.1) is usually more helpful than comparing their syntactic forms. Literals with different lexical forms and with different datatypes can have the same value. In particular:
> 
> - A plain literal with lexical form aaa and no language tag has the same value as a typed literal with lexical form aaa and datatype IRI xsd:string
> - A plain literal with lexical form aaa and no language tag has the same value as a typed literal with lexical form aaa@ and datatype IRI rdf:PlainLiteral
> - A plain literal with lexical form aaa and language tag xx has the same value as a typed literal with lexical form aaa@xx and datatype IRI rdf:PlainLiteral”
> 
> §8 “Some literals are canonical forms. Implementations MAY replace any literal with a canonical form if both are syntactically different, but have the same value. All plain literals, with or without language tag, are canonical forms.”
> 
> 
> CHANGES TO 6.3 Graph Equivalence
> http://www.w3.org/TR/rdf-concepts/#section-graph-equality
> 
> 
> §9 Append this leftover sentence, which was removed from 6.5.1:
> “Note: For comparing RDF Graphs, semantic notions of entailment (see [RDF-SEMANTICS]) are usually more helpful than the syntactic equivalence defined here.”
> 
> 
> EXTENDING THIS TO NUMERIC LITERALS???
> 
> (While we're at it, we might also cover equalities between the built-in numeric XSD types, and between different lexical forms of the same built-in XSD datatype.)
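For illustration only, a minimal Python sketch of the value mapping in §2, the string equalities listed in §7, and the optional canonicalization to plain literals in §8 of the quoted proposal. The class and function names are hypothetical; only the three string literal forms are covered, an rdf:PlainLiteral lexical form without "@" raises an error, and language-tag case normalization is not handled.

```python
from dataclasses import dataclass
from typing import Optional, Union

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_PLAIN_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral"


@dataclass(frozen=True)
class PlainLiteral:
    lexical: str
    lang: Optional[str] = None  # None means no language tag


@dataclass(frozen=True)
class TypedLiteral:
    lexical: str
    datatype: str


Literal = Union[PlainLiteral, TypedLiteral]
STRING_FORMS = (XSD_STRING, RDF_PLAIN_LITERAL)


def value(lit: Literal):
    """Value of a string literal: a str, or a (str, language tag) pair."""
    if isinstance(lit, PlainLiteral):
        return lit.lexical if lit.lang is None else (lit.lexical, lit.lang)
    if lit.datatype == XSD_STRING:
        return lit.lexical
    if lit.datatype == RDF_PLAIN_LITERAL:
        # Lexical form is "text@" or "text@lang"; split at the last "@".
        text, sep, lang = lit.lexical.rpartition("@")
        if not sep:
            raise ValueError(f"ill-formed rdf:PlainLiteral: {lit.lexical!r}")
        return text if lang == "" else (text, lang)
    raise NotImplementedError(f"not a string datatype: {lit.datatype}")


def canonicalize(lit: Literal) -> Literal:
    """Optional (MAY) rewrite of a string literal to its plain-literal form."""
    if isinstance(lit, TypedLiteral) and lit.datatype not in STRING_FORMS:
        return lit  # other datatypes are left untouched in this sketch
    v = value(lit)
    return PlainLiteral(v) if isinstance(v, str) else PlainLiteral(*v)


forms = [
    PlainLiteral("aaa"),
    TypedLiteral("aaa", XSD_STRING),
    TypedLiteral("aaa@", RDF_PLAIN_LITERAL),
]
print(len(set(forms)), len({value(f) for f in forms}))  # prints: 3 1
```

The three forms are syntactically distinct but share one value, which is exactly the gap between "syntactically equal" and value equality that §5 to §8 describe.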


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Thursday, 12 May 2011 08:22:54 UTC