Re: Proposal for ISSUE-12, string literals

On Thu, May 12, 2011 at 9:40 PM, Pat Hayes <phayes@ihmc.us> wrote:

>
> On May 12, 2011, at 12:06 PM, Richard Cyganiak wrote:
>
> > On 12 May 2011, at 16:52, Pat Hayes wrote:
> >> I agree with all of this (though I think we could maybe be harsher on
> xsd:string) but suggest we should additionally explicitly endorse the idea
> that plain literals are understood as typed with the datatype
> rdf:PlainLiteral, so that all RDF literals are considered to have a type.
> And that this should be stated explicitly in Concepts and Semantics, and
> built into the RDF entailment regime (along with rdf:XMLLiteral).
> >
> > Can you explain the mechanism that you have in mind when you say "plain
> literals are understood as typed with the datatype rdf:PlainLiteral"?
> >
> > "foo"@en is a plain literal.
> >
> > What datatype does it have? None, or rdf:PlainLiteral?
>
> rdf:PlainLiteral.  The idea behind rdf:PlainLiteral, as I understand it, is
> that *all* RDF literals have a datatype, even plain ones. Otherwise, there
> really is no point to having it around.
>
> >
> > What is its lexical form? "foo" or "foo@en"?
>
> "foo@en"
>
> is (unfortunately) the only possible answer. The awkward case, which you
> didn't ask about, is that the lexical form of the plain literal "foo" is "foo@". The
> final '@' signals the lack of a language tag (or, if we prefer, the empty
> language tag.)
>
> Put another way, the RDF plain literal surface forms "foo" and
> "foo"@en are treated as sugared syntax for the real underlying forms "foo@"^^rdf:PlainLiteral
> and "foo@en"^^rdf:PlainLiteral. The semantics treats the former as though
> they were written like the latter, with the datatype mapping "sss@" -->
> "sss" and "sss@ttt" --> <"sss", 'ttt'>.
>

It's for this reason that I'd prefer to keep rdf:PlainLiteral out of the
core RDF specs and reserve it for exchanging language-tagged literals with
systems that don't support that notion.  Having to deal with the extraneous
'@' for literals without language tags seems like needless complexity for
what should be a simple string manipulation.
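
To make the extra '@' concrete, here is a rough Python sketch of the
mapping Pat describes above ("sss@" --> "sss" and "sss@ttt" --> <"sss", 'ttt'>).
The function names are mine, purely for illustration, and splitting on the
last '@' is my assumption about how strings that themselves contain '@'
would have to be handled.

```python
from typing import Tuple, Union

# A plain literal value is either a bare string or a (string, language tag) pair.
PlainValue = Union[str, Tuple[str, str]]


def plain_literal_value(lexical: str) -> PlainValue:
    """Map an rdf:PlainLiteral lexical form to its value (hypothetical helper)."""
    # Assumption: the LAST '@' separates the text from the language tag,
    # so that text containing '@' (e.g. an email address) still round-trips.
    text, sep, tag = lexical.rpartition("@")
    if not sep:
        raise ValueError("an rdf:PlainLiteral lexical form must contain '@'")
    # An empty tag means "no language tag": the value is just the string.
    return (text, tag) if tag else text


def plain_literal_lexical(text: str, lang: str = "") -> str:
    """Build the lexical form for a surface-form plain literal (hypothetical helper)."""
    # Even without a language tag, the trailing '@' has to be appended.
    return f"{text}@{lang}"
```

Every consumer of a tagless literal has to add or strip that trailing '@',
which is the bookkeeping I would rather keep out of the core specs.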

If we're going to say that everything has a datatype, I'd prefer to see
"foo" get normalized to "foo"^^xsd:string.  But my reasons there are more
aesthetic; it just seems wrong to single out that one particular primitive
datatype and say that it should not be used.

FWIW, my preferred approach would be to:
1. Say that every literal has *either* a datatype *or* a language tag.
2. Say that the datatype of the surface form "foo" is xsd:string.

I also recognize that I seem to be in the minority on this one.  As long as
the surface forms "foo" and "foo"^^xsd:string get normalized to the same
thing (or systems have permission to do such normalization) then I'm happy.
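
A minimal sketch of the normalization I have in mind, assuming a toy
Literal record of my own invention (not any existing library's API) and
following the two rules listed above:

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    """Toy literal model (hypothetical): lexical form plus optional datatype/tag."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


def normalize(lit: Literal) -> Literal:
    """Normalize so every literal has either a datatype or a language tag."""
    if lit.lang:
        # Rule 1: a language-tagged literal carries no datatype.
        if lit.datatype is not None:
            raise ValueError("a literal cannot have both a datatype and a language tag")
        return lit
    if lit.datatype is None:
        # Rule 2: the surface form "foo" is given the datatype xsd:string.
        return Literal(lit.lexical, XSD_STRING)
    return lit
```

With this, "foo" and "foo"^^xsd:string normalize to the same thing, while
"foo"@en stays distinct from both.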

-Alex



>
> >> I would suggest one more extension, an additional datatype
> rdf:PlainLiteralString, which is also built into basic RDF. This is similar
> to PlainLiteral but ignores the language tag, so it treats "foo"+EN as equal
> to "foo". This would help the users that Andy mentioned who want to ignore
> language tags in queries. We can build this into the basic RDF entailment
> regime along with PlainLiteral.
> >
> > I don't think that this helps the users that Andy mentioned.
>
> Andy seems to agree. Well in that case, forget the idea.
>
> Pat
>
>
> > The problem is that "foo" != "foo"@en in SPARQL, and this confuses people
> who have not wrapped their head around the idea that strings in SPARQL can
> have this extra bit called a language tag attached. Introducing a new string
> data type doesn't change anything about this situation.
> >
> > Best,
> > Richard
> >
> >
> >
> >>
> >> These two datatypes are unique in that they apply to plain literal
> syntax, which is a good 'theoretical' reason to include them in the RDF
> layer of the specs in any case.
> >>
> >> Pat
> >>
> >> On May 11, 2011, at 4:23 PM, Richard Cyganiak wrote:
> >>
> >>> I took an action today to draft text for RDF Concepts that resolves
> ISSUE-12. I put it on the wiki here:
> >>> http://www.w3.org/2011/rdf-wg/wiki/StringLiterals/EntailmentProposal
> >>> A plain text copy is attached below.
> >>>
> >>> Best,
> >>> Richard
> >>>
> >>>
> >>>
> >>> SHORT SUMMARY
> >>>
> >>> 1. RDF Concepts puts more emphasis on the distinction between
> (syntactic) “literal equality” and (semantic, important for applications)
> “value equality”
> >>> 2. RDF Concepts explicitly points out the specific string value
> equalities that already arise from RDF Semantics
> >>> 3. RDF Concepts declares one of the string literal forms as canonical
> >>> 4. Implementations MAY canonicalize, but don't have to
> >>> 5. The canonical form is plain literals.
> >>>
> >>>
> >>> WHY?
> >>>
> >>> 1. No changes to the abstract syntax required
> >>> 2. No changes to any concrete syntax or parser required
> >>> 3. No changes to any implementations of any of the existing entailment
> regimes required
> >>> 4. Those who are ok with canonicalization can do that, and don't need
> to deal with entailment
> >>> 5. Those who don't want to canonicalize, have the option of supporting
> only string value equality at query time, without RDFS- and D-Entailment
> >>> 6. “MAY canonicalize” softly discourages the use of xsd:string typed
> literals, without abolishing them outright or declaring them archaic
> >>> 7. Standardizing on xsd:string was never an option because of language
> tags
> >>> 8. Standardizing on rdf:PlainLiteral was never an option because it
> MUST NOT be used in serializations that support plain literals
> >>>
> >>>
> >>> CHANGES TO 6.5.2 The Value Corresponding to a Typed Literal
> >>> http://www.w3.org/TR/rdf-concepts/#section-Literal-Value
> >>>
> >>>
> >>> §1 Rename it to “6.5.1 The Value Corresponding to a Literal” and move
> it ahead of 6.5.1
> >>>
> >>> §2 Add to the beginning:
> >>> “The value of a plain literal without language tag is the same Unicode
> string as its lexical form.
> >>>
> >>> The value of a plain literal with language tag is a pair consisting of
> 1. the same Unicode string as its lexical form, and 2. its language tag.
> >>>
> >>> For typed literals, …” (continue with rest of section as is)
> >>>
> >>> §3 Remove the Note at the end of the section
> >>>
> >>>
> >>> CHANGES TO 6.5.1 Literal Equality
> >>> http://www.w3.org/TR/rdf-concepts/#section-Literal-Equality
> >>>
> >>>
> >>> §4 Rename section to “6.5.2 Literal Equality and Canonical Forms”
> >>>
> >>> §5 Add to the beginning:
> >>> “Equality of literals can be evaluated based on their syntax, or based
> on their value.”
> >>>
> >>> §6 Change “Two literals are equal …” to: “Two literals are
> syntactically equal …” in the current first paragraph.
> >>>
> >>> §7 Add to the end:
> >>> “In application contexts, comparing the values of literals (see section
> 6.5.1) is usually more helpful than comparing their syntactic forms.
> Literals with different lexical forms and with different datatypes can have
> the same value. In particular:
> >>>
> >>> - A plain literal with lexical form aaa and no language tag has the
> same value as a typed literal with lexical form aaa and datatype IRI
> xsd:string
> >>> - A plain literal with lexical form aaa and no language tag has the
> same value as a typed literal with lexical form aaa@ and datatype IRI
> rdf:PlainLiteral
> >>> - A plain literal with lexical form aaa and language tag xx has the
> same value as a typed literal with lexical form aaa@xx and datatype IRI
> rdf:PlainLiteral”
> >>>
> >>> §8 “Some literals are canonical forms. Implementations MAY replace any
> literal with a canonical form if both are syntactically different, but have
> the same value. All plain literals, with or without language tag, are
> canonical forms.”
> >>>
> >>>
> >>> CHANGES TO 6.3 Graph Equivalence
> >>> http://www.w3.org/TR/rdf-concepts/#section-graph-equality
> >>>
> >>>
> >>> §9 Append this leftover sentence, which was removed from 6.5.1:
> >>> “Note: For comparing RDF Graphs, semantic notions of entailment (see
> [RDF-SEMANTICS]) are usually more helpful than the syntactic equivalence
> defined here.”
> >>>
> >>>
> >>> EXTENDING THIS TO NUMERIC LITERALS???
> >>>
> >>> (While we're at it, we might also cover equalities between the built-in
> numeric XSD types, and between different lexical forms of the same built-in
> XSD datatype.)
> >>>
> >>
> >> ------------------------------------------------------------
> >> IHMC                                     (850)434 8903 or (650)494 3973
> >> 40 South Alcaniz St.           (850)202 4416   office
> >> Pensacola                            (850)202 4440   fax
> >> FL 32502                              (850)291 0667   mobile
> >> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
> >>
> >>
> >>
> >>
> >>
> >>
> >
> >
> >
>

Received on Friday, 13 May 2011 14:34:09 UTC