Re: Proposal for ISSUE-12, string literals

From: Pat Hayes <phayes@ihmc.us>
Date: Thu, 12 May 2011 20:40:50 -0500
Cc: RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <04E96898-1521-4F33-8DBC-5058A77F6DA1@ihmc.us>
To: Richard Cyganiak <richard@cyganiak.de>

On May 12, 2011, at 12:06 PM, Richard Cyganiak wrote:

> On 12 May 2011, at 16:52, Pat Hayes wrote:
>> I agree with all of this (though I think we could maybe be harsher on xsd:string) but suggest we should additionally explicitly endorse the idea that plain literals are understood as typed with the datatype rdf:PlainLiteral, so that all RDF literals are considered to have a type. And that this should be stated explicitly in Concepts and Semantics, and built into the RDF entailment regime (along with rdf:XMLLiteral).
> 
> Can you explain the mechanism that you have in mind when you say "plain literals are understood as typed with the datatype rdf:PlainLiteral"?
> 
> "foo"@en is a plain literal.
> 
> What datatype does it have? None, or rdf:PlainLiteral?

rdf:PlainLiteral. The idea behind rdf:PlainLiteral, as I understand it, is that *all* RDF literals have a datatype, even plain ones. Otherwise, there really is no point to having it around.

> 
> What is its lexical form? "foo" or "foo@en"?

"foo@en"

is (unfortunately) the only possible answer. The awkward case, which you didn't ask about, is that the lexical form of the plain literal "foo" is "foo@". The final '@' signals the lack of a language tag (or, if we prefer, the empty language tag).

Put another way, the RDF plain literal surface forms "foo" and "foo"@en are treated as sugared syntax for the real underlying forms "foo@"^^rdf:PlainLiteral and "foo@en"^^rdf:PlainLiteral. The semantics treats the former as though they were written like the latter, with the datatype mapping "sss@" --> "sss" and "sss@ttt" --> <"sss", "ttt">.
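
As a rough illustration of that desugaring and datatype mapping, here is a minimal Python sketch. The function names are invented for this example and come from no spec or library. It assumes a language tag never contains '@', so the last '@' in the lexical form is taken as the separator.

```python
from typing import Tuple, Union

# A plain literal value is either a bare string or a (string, language tag) pair.
PlainValue = Union[str, Tuple[str, str]]


def plain_literal_lexical_form(text: str, lang: str = "") -> str:
    """Desugar a surface plain literal into its rdf:PlainLiteral lexical form.

    "foo"    becomes "foo@"   (empty language tag)
    "foo"@en becomes "foo@en"
    """
    return f"{text}@{lang}"


def plain_literal_value(lexical_form: str) -> PlainValue:
    """Apply the datatype mapping "sss@" --> "sss", "sss@ttt" --> <"sss", "ttt">."""
    # Split on the LAST '@' so that strings which themselves contain '@' survive.
    text, sep, lang = lexical_form.rpartition("@")
    if not sep:
        raise ValueError("an rdf:PlainLiteral lexical form must contain '@'")
    return (text, lang) if lang else text
```

For instance, plain_literal_value(plain_literal_lexical_form("foo", "en")) yields the pair ("foo", "en"), while the tagless "foo" round-trips to the bare string "foo".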

>> I would suggest one more extension, an additional datatype rdf:PlainLiteralString, which is also built into basic RDF. This is similar to rdf:PlainLiteral but ignores the language tag, so it treats "foo"@en as equal to "foo". This would help the users that Andy mentioned who want to ignore language tags in queries. We can build this into the basic RDF entailment regime along with rdf:PlainLiteral.
> 
> I don't think that this helps the users that Andy mentioned.

Andy seems to agree. Well in that case, forget the idea. 

Pat


> The problem is that "foo" != "foo"@en in SPARQL, and this confuses people who have not wrapped their head around the idea that strings in SPARQL can have this extra bit called a language tag attached. Introducing a new string data type doesn't change anything about this situation.
> 
> Best,
> Richard
> 
> 
> 
>> 
>> These two datatypes are unique in that they apply to plain literal syntax, which is a good 'theoretical' reason to include them in the RDF layer of the specs in any case. 
>> 
>> Pat
>> 
>> On May 11, 2011, at 4:23 PM, Richard Cyganiak wrote:
>> 
>>> I took an action today to draft text for RDF Concepts that resolves ISSUE-12. I put it on the wiki here:
>>> http://www.w3.org/2011/rdf-wg/wiki/StringLiterals/EntailmentProposal
>>> A plain text copy is attached below.
>>> 
>>> Best,
>>> Richard
>>> 
>>> 
>>> 
>>> SHORT SUMMARY
>>> 
>>> 1. RDF Concepts puts more emphasis on the distinction between (syntactic) “literal equality” and (semantic, important for applications) “value equality”
>>> 2. RDF Concepts explicitly points out the specific string value equalities that already arise from RDF Semantics
>>> 3. RDF Concepts declares one of the string literal forms as canonical
>>> 4. Implementations MAY canonicalize, but don't have to
>>> 5. The canonical form is plain literals.
>>> 
>>> 
>>> WHY?
>>> 
>>> 1. No changes to the abstract syntax required
>>> 2. No changes to any concrete syntax or parser required
>>> 3. No changes to any implementations of any of the existing entailment regimes required
>>> 4. Those who are ok with canonicalization can do that, and don't need to deal with entailment
>>> 5. Those who don't want to canonicalize have the option of supporting only string value equality at query time, without RDFS- and D-Entailment
>>> 6. “MAY canonicalize” softly discourages the use of xsd:string typed literals, without abolishing them outright or declaring them archaic
>>> 7. Standardizing on xsd:string was never an option because of language tags
>>> 8. Standardizing on rdf:PlainLiteral was never an option because it MUST NOT be used in serializations that support plain literals
>>> 
>>> 
>>> CHANGES TO 6.5.2 The Value Corresponding to a Typed Literal
>>> http://www.w3.org/TR/rdf-concepts/#section-Literal-Value
>>> 
>>> 
>>> §1 Rename it to “6.5.1 The Value Corresponding to a Literal” and move it ahead of 6.5.1
>>> 
>>> §2 Add to the beginning:
>>> “The value of a plain literal without language tag is the same Unicode string as its lexical form.
>>> 
>>> The value of a plain literal with language tag is a pair consisting of 1. the same Unicode string as its lexical form, and 2. its language tag.
>>> 
>>> For typed literals, …” (continue with rest of section as is)
>>> 
>>> §3 Remove the Note at the end of the section
>>> 
>>> 
>>> CHANGES TO 6.5.1 Literal Equality
>>> http://www.w3.org/TR/rdf-concepts/#section-Literal-Equality
>>> 
>>> 
>>> §4 Rename section to “6.5.2 Literal Equality and Canonical Forms”
>>> 
>>> §5 Add to the beginning:
>>> “Equality of literals can be evaluated based on their syntax, or based on their value.”
>>> 
>>> §6 Change “Two literals are equal …” to: “Two literals are syntactically equal …” in the current first paragraph.
>>> 
>>> §7 Add to the end:
>>> “In application contexts, comparing the values of literals (see section 6.5.1) is usually more helpful than comparing their syntactic forms. Literals with different lexical forms and with different datatypes can have the same value. In particular:
>>> 
>>> - A plain literal with lexical form aaa and no language tag has the same value as a typed literal with lexical form aaa and datatype IRI xsd:string
>>> - A plain literal with lexical form aaa and no language tag has the same value as a typed literal with lexical form aaa@ and datatype IRI rdf:PlainLiteral
>>> - A plain literal with lexical form aaa and language tag xx has the same value as a typed literal with lexical form aaa@xx and datatype IRI rdf:PlainLiteral”
>>> 
>>> §8 “Some literals are canonical forms. Implementations MAY replace any literal with a canonical form if both are syntactically different, but have the same value. All plain literals, with or without language tag, are canonical forms.”
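
The distinction between syntactic and value equality in §5 to §8, and the optional canonicalization to plain literals, might be sketched as follows. This is only an illustration under stated assumptions: the Literal tuple, value() and canonicalize() are hypothetical names, only the string-like datatypes from the proposal are handled, and language tags are compared as written.

```python
from typing import NamedTuple, Optional, Tuple, Union

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_PLAIN = "http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral"


class Literal(NamedTuple):
    """Hypothetical abstract-syntax literal; tuple equality is syntactic equality."""
    lexical: str
    datatype: Optional[str] = None  # None means a plain literal
    lang: Optional[str] = None      # only meaningful for plain literals


def value(lit: Literal) -> Union[str, Tuple[str, str]]:
    """Return the value of a string-like literal: a string or a (string, tag) pair."""
    if lit.datatype is None:
        return (lit.lexical, lit.lang) if lit.lang else lit.lexical
    if lit.datatype == XSD_STRING:
        return lit.lexical
    if lit.datatype == RDF_PLAIN:
        text, sep, lang = lit.lexical.rpartition("@")
        if not sep:
            raise ValueError("rdf:PlainLiteral lexical form must contain '@'")
        return (text, lang) if lang else text
    raise NotImplementedError("only string-like literals are covered in this sketch")


def canonicalize(lit: Literal) -> Literal:
    """Replace xsd:string and rdf:PlainLiteral typed literals by the plain form."""
    if lit.datatype in (XSD_STRING, RDF_PLAIN):
        v = value(lit)
        if isinstance(v, tuple):
            return Literal(v[0], lang=v[1])
        return Literal(v)
    return lit  # plain literals are already canonical; other types are untouched
```

Under these assumptions, Literal("aaa") and Literal("aaa", XSD_STRING) are syntactically different but value() agrees on them, and canonicalize() maps both to the plain literal.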
>>> 
>>> 
>>> CHANGES TO 6.3 Graph Equivalence
>>> http://www.w3.org/TR/rdf-concepts/#section-graph-equality
>>> 
>>> 
>>> §9 Append this leftover sentence, which was removed from 6.5.1:
>>> “Note: For comparing RDF Graphs, semantic notions of entailment (see [RDF-SEMANTICS]) are usually more helpful than the syntactic equivalence defined here.”
>>> 
>>> 
>>> EXTENDING THIS TO NUMERIC LITERALS???
>>> 
>>> (While we're at it, we might also cover equalities between the built-in numeric XSD types, and between different lexical forms of the same built-in XSD datatype.)
>>> 
>> 
>> ------------------------------------------------------------
>> IHMC                                     (850)434 8903 or (650)494 3973   
>> 40 South Alcaniz St.           (850)202 4416   office
>> Pensacola                            (850)202 4440   fax
>> FL 32502                              (850)291 0667   mobile
>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes

Received on Friday, 13 May 2011 01:41:24 GMT