Re: Proposal for ISSUE-12, string literals from Lee Feigenbaum on 2011-05-13 (public-rdf-wg@w3.org from May 2011)

From: Lee Feigenbaum <lee@thefigtrees.net>
Date: Fri, 13 May 2011 10:48:12 -0400
To: Alex Hall <alexhall@revelytix.com>
CC: Pat Hayes <phayes@ihmc.us>, Richard Cyganiak <richard@cyganiak.de>, RDF Working Group WG <public-rdf-wg@w3.org>
Message-ID: <4DCD44AC.8070004@thefigtrees.net>

On 5/13/2011 10:33 AM, Alex Hall wrote:
> On Thu, May 12, 2011 at 9:40 PM, Pat Hayes <phayes@ihmc.us> wrote:
>
>
>     On May 12, 2011, at 12:06 PM, Richard Cyganiak wrote:
>
>      > On 12 May 2011, at 16:52, Pat Hayes wrote:
>      >> I agree with all of this (though I think we could maybe be
>     harsher on xsd:string) but suggest we should additionally explicitly
>     endorse the idea that plain literals are understood as typed with
>     the datatype rdf:PlainLiteral, so that all RDF literals are
>     considered to have a type. And that this should be stated explicitly
>     in Concepts and Semantics, and built into the RDF entailment regime
>     (along with rdf:XMLLiteral).
>      >
>      > Can you explain the mechanism that you have in mind when you say
>     "plain literals are understood as typed with the datatype
>     rdf:PlainLiteral"?
>      >
>      > "foo"@en is a plain literal.
>      >
>      > What datatype does it have? None, or rdf:PlainLiteral?
>
>     rdf:PlainLiteral.  The idea behind rdf:PlainLiteral, as I understand
>     it, is that *all* RDF literals have a datatype, even plain ones.
>     Otherwise, there really is no point to having it around.
>
>      >
>      > What is its lexical form? "foo" or "foo@en"?
>
>     "foo@en"
>
>     is (unfortunately) the only possible answer. The awkward case, which
>     you didn't ask, is that the lexical form of the plain literal "foo" is
>     "foo@". The final '@' signals the lack of a language tag (or, if we
>     prefer, the empty language tag.)
>
>     Put this all another way, the RDF plain literal surface forms "foo"
>     and "foo"@en are treated as sugared syntax for the real underlying
>     forms "foo@"^^rdf:PlainLiteral and "foo@en"^^rdf:PlainLiteral. The
>     semantics treats the former as though they were written like the
>     latter, with the datatype mapping "sss@" --> "sss" and "sss@ttt" -->
>     <"sss", 'ttt'>.
>
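
The datatype mapping Pat describes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `plain_literal_value` is hypothetical, and the sketch assumes the split happens at the last '@', since the string part may itself contain '@' characters.

```python
from typing import Tuple, Union

# A value is either a bare string or a (string, language tag) pair.
PlainValue = Union[str, Tuple[str, str]]


def plain_literal_value(lexical_form: str) -> PlainValue:
    """Map an rdf:PlainLiteral lexical form to its value (hypothetical helper).

    "sss@"    --> "sss"
    "sss@ttt" --> ("sss", "ttt")
    """
    # Split at the LAST '@' so that strings containing '@' survive intact.
    text, sep, lang = lexical_form.rpartition("@")
    if not sep:
        # Without any '@' the form is not in the lexical space at all.
        raise ValueError("lexical form must contain '@'")
    return (text, lang) if lang else text
```

The mandatory trailing '@' on untagged strings is exactly the awkward case under discussion: a caller has to append it before building the typed form and strip it again afterwards.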
>
> It's for this reason that I'd prefer to keep rdf:PlainLiteral out of the
> core RDF specs and reserve it for exchanging language-tagged literals
> with systems that don't support that notion.  Having to deal with the
> extraneous '@' for literals without language tags seems like needless
> complexity for what should be a simple string manipulation.
>
> If we're going to say that everything has a datatype, I'd prefer to see
> "foo" get normalized to "foo"^^xsd:string.  But my reasons there are
> more aesthetic; it just seems wrong to single out that one particular
> primitive datatype and say that it should not be used.
>
> FWIW, my preferred approach would be to:
> 1. Say that every literal has *either* a datatype *or* a language tag.
> 2. Say that the datatype of the surface form "foo" is xsd:string.

I also prefer this approach. I don't really understand the preference 
for normalizing to a plain literal with no datatype or language tag. I 
know Andy talked about users wanting similarity between language tagged 
literals and simple string literals, but I don't really even know what 
wanting that similarity means.

Also, note that (as has been mentioned already), the SPARQL 
datatype(...) function already specifically says datatype("foo") is 
xsd:string.
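
To make the approach concrete, here is a rough Python sketch of "every literal has either a datatype or a language tag". The `Literal` class and the `datatype` helper are hypothetical names for illustration, not any library's API. Returning `None` for language-tagged literals is an assumption of this sketch rather than something the thread settles.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal: lexical form plus optional datatype IRI or language tag."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


def datatype(lit: Literal) -> Optional[str]:
    """Return the datatype IRI, treating the surface form "foo" as xsd:string."""
    if lit.lang is not None:
        # Assumption: a language-tagged literal carries no datatype.
        return None
    # An untyped, untagged literal is reported as xsd:string.
    return lit.datatype or XSD_STRING
```

With this, `datatype(Literal("foo"))` and `datatype(Literal("foo", XSD_STRING))` give the same answer, which is the behaviour the SPARQL function is described as having for "foo".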

> I also recognize that I seem to be in the minority on this one.  As long
> as the surface forms "foo" and "foo"^^xsd:string get normalized to the
> same thing (or systems have permission to do such normalization) then
> I'm happy.

Yes, I can live with this outcome as well.
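
For what it's worth, the normalization itself is tiny whichever direction is chosen. The Python sketch below picks xsd:string as the target purely for illustration; the name `normalize` and the tuple layout are assumptions, and swapping the target to the plain form would be symmetric.

```python
from typing import Optional, Tuple

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Hypothetical term layout: (lexical form, datatype IRI, language tag).
Term = Tuple[str, Optional[str], Optional[str]]


def normalize(term: Term) -> Term:
    """Collapse "foo" and "foo"^^xsd:string onto one representation."""
    lexical, dt, lang = term
    if dt is None and lang is None:
        # Plain literal without language tag: give it the xsd:string datatype.
        return (lexical, XSD_STRING, None)
    # Language-tagged and other typed literals are left untouched.
    return term
```

After normalization, comparing terms with plain tuple equality treats "foo" and "foo"^^xsd:string as the same thing, while "foo"@en stays distinct.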

Lee


> -Alex
>
>
>      >> I would suggest one more extension, an additional datatype
>     rdf:PlainLiteralString, which is also built into basic RDF. This is
>     similar to PlainLiteral but ignores the language tag, so it treats
>     "foo"@EN as equal to "foo". This would help the users that Andy
>     mentioned who want to ignore language tags in queries. We can build
>     this into the basic RDF entailment regime along with PlainLiteral.
>      >
>      > I don't think that this helps the users that Andy mentioned.
>
>     Andy seems to agree. Well in that case, forget the idea.
>
>     Pat
>
>
>      > The problem is that "foo" != "foo"@en in SPARQL, and this
>     confuses people who have not wrapped their head around the idea that
>     strings in SPARQL can have this extra bit called a language tag
>     attached. Introducing a new string data type doesn't change anything
>     about this situation.
>      >
>      > Best,
>      > Richard
>      >
>      >
>      >
>      >>
>      >> These two datatypes are unique in that they apply to plain
>     literal syntax, which is a good 'theoretical' reason to include them
>     in the RDF layer of the specs in any case.
>      >>
>      >> Pat
>      >>
>      >> On May 11, 2011, at 4:23 PM, Richard Cyganiak wrote:
>      >>
>      >>> I took an action today to draft text for RDF Concepts that
>     resolves ISSUE-12. I put it on the wiki here:
>      >>>
>     http://www.w3.org/2011/rdf-wg/wiki/StringLiterals/EntailmentProposal
>      >>> A plain text copy is attached below.
>      >>>
>      >>> Best,
>      >>> Richard
>      >>>
>      >>>
>      >>>
>      >>> SHORT SUMMARY
>      >>>
>      >>> 1. RDF Concepts puts more emphasis on the distinction between
>     (syntactic) “literal equality” and (semantic, important for
>     applications) “value equality”
>      >>> 2. RDF Concepts explicitly points out the specific string value
>     equalities that already arise from RDF Semantics
>      >>> 3. RDF Concepts declares one of the string literal forms as
>     canonical
>      >>> 4. Implementations MAY canonicalize, but don't have to
>      >>> 5. The canonical form is plain literals.
>      >>>
>      >>>
>      >>> WHY?
>      >>>
>      >>> 1. No changes to the abstract syntax required
>      >>> 2. No changes to any concrete syntax or parser required
>      >>> 3. No changes to any implementations of any of the existing
>     entailment regimes required
>      >>> 4. Those who are ok with canonicalization can do that, and
>     don't need to deal with entailment
>      >>> 5. Those who don't want to canonicalize, have the option of
>     supporting only string value equality at query time, without RDFS-
>     and D-Entailment
>      >>> 6. “MAY canonicalize” softly discourages the use of xsd:string
>     typed literals, without abolishing them outright or declaring them
>     archaic
>      >>> 7. Standardizing on xsd:string was never an option because of
>     language tags
>      >>> 8. Standardizing on rdf:PlainLiteral was never an option
>     because it MUST NOT be used in serializations that support plain
>     literals
>      >>>
>      >>>
>      >>> CHANGES TO 6.5.2 The Value Corresponding to a Typed Literal
>      >>> http://www.w3.org/TR/rdf-concepts/#section-Literal-Value
>      >>>
>      >>>
>      >>> §1 Rename it to “6.5.1 The Value Corresponding to a Literal”
>     and move it ahead of 6.5.1
>      >>>
>      >>> §2 Add to the beginning:
>      >>> “The value of a plain literal without language tag is the same
>     Unicode string as its lexical form.
>      >>>
>      >>> The value of a plain literal with language tag is a pair
>     consisting of 1. the same Unicode string as its lexical form, and 2.
>     its language tag.
>      >>>
>      >>> For typed literals, …” (continue with rest of section as is)
>      >>>
>      >>> §3 Remove the Note at the end of the section
>      >>>
>      >>>
>      >>> CHANGES TO 6.5.1 Literal Equality
>      >>> http://www.w3.org/TR/rdf-concepts/#section-Literal-Equality
>      >>>
>      >>>
>      >>> §4 Rename section to “6.5.2 Literal Equality and Canonical Forms”
>      >>>
>      >>> §5 Add to the beginning:
>      >>> “Equality of literals can be evaluated based on their syntax,
>     or based on their value.”
>      >>>
>      >>> §6 Change “Two literals are equal …” to: “Two literals are
>     syntactically equal …” in the current first paragraph.
>      >>>
>      >>> §7 Add to the end:
>      >>> “In application contexts, comparing the values of literals (see
>     section 6.5.1) is usually more helpful than comparing their
>     syntactic forms. Literals with different lexical forms and with
>     different datatypes can have the same value. In particular:
>      >>>
>      >>> - A plain literal with lexical form aaa and no language tag has
>     the same value as a typed literal with lexical form aaa and datatype
>     IRI xsd:string
>      >>> - A plain literal with lexical form aaa and no language tag has
>     the same value as a typed literal with lexical form aaa@ and
>     datatype IRI rdf:PlainLiteral
>      >>> - A plain literal with lexical form aaa and language tag xx has
>     the same value as a typed literal with lexical form aaa@xx and
>     datatype IRI rdf:PlainLiteral”
>      >>>
>      >>> §8 “Some literals are canonical forms. Implementations MAY
>     replace any literal with a canonical form if both are syntactically
>     different, but have the same value. All plain literals, with or
>     without language tag, are canonical forms.”
>      >>>
>      >>>
>      >>> CHANGES TO 6.3 Graph Equivalence
>      >>> http://www.w3.org/TR/rdf-concepts/#section-graph-equality
>      >>>
>      >>>
>      >>> §9 Append this leftover sentence, which was removed from 6.5.1:
>      >>> “Note: For comparing RDF Graphs, semantic notions of entailment
>     (see [RDF-SEMANTICS]) are usually more helpful than the syntactic
>     equivalence defined here.”
>      >>>
>      >>>
>      >>> EXTENDING THIS TO NUMERIC LITERALS???
>      >>>
>      >>> (While we're at it, we might also cover equalities between the
>     built-in numeric XSD types, and between different lexical forms of
>     the same built-in XSD datatype.)
>      >>>
>      >>
>      >> ------------------------------------------------------------
>      >> IHMC                                     (850)434 8903 or
>     (650)494 3973
>      >> 40 South Alcaniz St.           (850)202 4416   office
>      >> Pensacola                            (850)202 4440   fax
>      >> FL 32502                              (850)291 0667   mobile
>      >> phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes
>      >>
>      >>
>      >>
>      >>
>      >>
>      >>
>      >
>      >
>      >
>
Received on Friday, 13 May 2011 14:48:35 UTC