Re: Proposal for ISSUE-12, string literals

On May 12, 2011, at 10:49 AM, Ivan Herman wrote:

> 
> On May 12, 2011, at 15:27 , Richard Cyganiak wrote:
> 
>> On 12 May 2011, at 13:06, Ivan Herman wrote:
>>>> I'd be tempted to go further and make only the primitive types such as xsd:decimal into RDF canonical forms. This would mean that systems MAY canonicalize all numbers to a single numeric datatype.
>>> 
>>> Do you mean like the 'canonical' forms in Turtle? I may be missing something here.
>> 
>> No. Turtle has syntactic sugar for certain numeric literals; this has nothing to do with canonicalization.
>> 
>> (This all goes way beyond ISSUE-12 anyways...)
>> 
>> I was suggesting that perhaps, instead of this:
>> "+0013"^^xsd:byte => "13"^^xsd:byte
>> 
>> I'd like to say that implementations MAY do this:
>> "+0013"^^xsd:byte => "13.0"^^xsd:decimal
>> 
> 
> I have not made up my mind on this, just thinking out loud: in many programming environments I would like to have access to the fact that something is a byte and not a decimal, because the implementation of the latter might be far more complex and slower than the former. In other words, I am not sure RDF should be too 'smart' about it.

+1

Pat

> If the user decided to define something as a byte, we should keep it as a byte...
> 
> Ivan
> 
>> They'd end up with all numbers represented in a single data type, with a single canonical representation. This makes comparisons quite a bit easier.
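
The two options discussed above can be sketched in a few lines. This is only an illustration, not text from the proposal: the function names are hypothetical, the code is not a validating XSD parser, and the "13.0" shape of the canonical xsd:decimal form is taken from the example in this thread.

```python
from decimal import Decimal
from typing import Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"


def canonicalize_keep_type(lexical: str) -> Tuple[str, str]:
    """Option 1: canonical lexical form, datatype stays xsd:byte."""
    value = int(lexical)  # accepts a leading sign and leading zeros
    if not -128 <= value <= 127:
        raise ValueError(f"{lexical!r} is outside the xsd:byte range")
    return str(value), XSD + "byte"


def canonicalize_to_decimal(lexical: str) -> Tuple[str, str]:
    """Option 2: collapse the number into a single datatype, xsd:decimal."""
    value = Decimal(lexical)
    if value == 0:
        return "0.0", XSD + "decimal"  # avoids a "-0" lexical form
    text = format(value, "f")  # fixed-point notation, never an exponent
    if "." in text:
        text = text.rstrip("0")  # drop trailing fractional zeros
        if text.endswith("."):
            text += "0"  # keep at least one fractional digit
    else:
        text += ".0"
    return text, XSD + "decimal"


# With option 2, two literals have the same value exactly when their
# canonical (lexical form, datatype) pairs are identical.
same = canonicalize_to_decimal("+0013") == canonicalize_to_decimal("13.0")
```

Option 1 keeps the information that the value was declared as a byte, which is the concern raised in the reply; option 2 gives it up in exchange for a plain string comparison across all numeric types.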
>> 
>> Best,
>> Richard
>> 
>> 
>> 
>> 
>>> 
>>> Ivan
>>> 
>>> 
>>> 
>>>> Best,
>>>> Richard
>>>> 
>>>> 
>>>> 
>>>>> 
>>>>> On 12/05/2011 12:19, Richard Cyganiak wrote:
>>>>>> On 12 May 2011, at 09:22, Ivan Herman wrote:
>>>>>>> - On the wiki page you talk about 'extending this to numeric
>>>>>>> literals', which I would rather phrase as 'extending this to any
>>>>>>> datatype' (e.g., xsd:dateTime, too).
>>>>>> 
>>>>>> Right -- I changed the section heading on the wiki.
>>>>>> 
>>>>>>> I have the impression that this is also a consequence of what you
>>>>>>> write already. You emphasize the 'lexical equality', and you also
>>>>>>> say "Implementations MAY replace any literal with a canonical form
>>>>>>> if both are syntactically different, but have the same value."
>>>>>>> which does not appear to be restricted to string literals.
>>>>>> 
>>>>>> The way I wrote it, the only literals marked as canonical forms are
>>>>>> plain string literals. So the sentence doesn't license replacement
>>>>>> of, say, +00013 with 13, because no numeric literals have been marked
>>>>>> as canonical forms. That could be easily changed, of course.
>>>>>> 
>>>>>>> Do you think there is anything missing in this document to make
>>>>>>> that picture complete (except, editorially, to possibly add
>>>>>>> non-string examples)?
>>>>>> 
>>>>>> If we only want to address string literals, then I think the proposal
>>>>>> is complete.
>>>>>> 
>>>>>> If we want to address other XSD literals as well, then some bullet
>>>>>> points should be added to the list of equalities, and the canonical
>>>>>> lexical form of some XSD datatypes (e.g., "13.0"^^xsd:decimal) should
>>>>>> be defined to be canonical forms so that other same-valued literals
>>>>>> can be replaced with the canonical form. This requires a detailed
>>>>>> reading of the XSD spec (which I have not done so far).
>>>>>> 
>>>>>> (RDF Concepts should probably contain a paragraph or two introducing
>>>>>> the rdf:PlainLiteral datatype and referencing the relevant spec, but
>>>>>> let's treat that as a separate issue.)
>>>>>> 
>>>>>>> - I would also propose to make some tiny changes in the Semantics
>>>>>>> document.
>>>>>> 
>>>>>> I'll let the editors of that document comment.
>>>>>> 
>>>>>> Best, Richard
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> Ivan
>>>>>>> 
>>>>>>> 
>>>>>>> On May 11, 2011, at 23:23 , Richard Cyganiak wrote:
>>>>>>> 
>>>>>>>> I took an action today to draft text for RDF Concepts that
>>>>>>>> resolves ISSUE-12. I put it on the wiki here:
>>>>>>>> http://www.w3.org/2011/rdf-wg/wiki/StringLiterals/EntailmentProposal
>>>>>>>> 
>>>>>>>> 
>>>>>>>> A plain text copy is attached below.
>>>>>>>> 
>>>>>>>> Best, Richard
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> SHORT SUMMARY
>>>>>>>> 
>>>>>>>> 1. RDF Concepts puts more emphasis on the distinction between
>>>>>>>>    (syntactic) “literal equality” and (semantic, important for
>>>>>>>>    applications) “value equality”
>>>>>>>> 2. RDF Concepts explicitly points out the specific string value
>>>>>>>>    equalities that already arise from RDF Semantics
>>>>>>>> 3. RDF Concepts declares one of the string literal forms as
>>>>>>>>    canonical
>>>>>>>> 4. Implementations MAY canonicalize, but don't have to
>>>>>>>> 5. The canonical form is plain literals.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> WHY?
>>>>>>>> 
>>>>>>>> 1. No changes to the abstract syntax required
>>>>>>>> 2. No changes to any concrete syntax or parser required
>>>>>>>> 3. No changes to any implementations of any of the existing
>>>>>>>>    entailment regimes required
>>>>>>>> 4. Those who are ok with canonicalization can do that, and don't
>>>>>>>>    need to deal with entailment
>>>>>>>> 5. Those who don't want to canonicalize, have the option of
>>>>>>>>    supporting only string value equality at query time, without
>>>>>>>>    RDFS- and D-Entailment
>>>>>>>> 6. “MAY canonicalize” softly discourages the use of xsd:string
>>>>>>>>    typed literals, without abolishing them outright or declaring
>>>>>>>>    them archaic
>>>>>>>> 7. Standardizing on xsd:string was never an option because of
>>>>>>>>    language tags
>>>>>>>> 8. Standardizing on rdf:PlainLiteral was never an option because
>>>>>>>>    it MUST NOT be used in serializations that support plain
>>>>>>>>    literals
>>>>>>>> 
>>>>>>>> 
>>>>>>>> CHANGES TO 6.5.2 The Value Corresponding to a Typed Literal
>>>>>>>> http://www.w3.org/TR/rdf-concepts/#section-Literal-Value
>>>>>>>> 
>>>>>>>> 
>>>>>>>> §1 Rename it to “6.5.1 The Value Corresponding to a Literal” and
>>>>>>>> move it ahead of 6.5.1
>>>>>>>> 
>>>>>>>> §2 Add to the beginning: “The value of a plain literal without
>>>>>>>> language tag is the same Unicode string as its lexical form.
>>>>>>>> 
>>>>>>>> The value of a plain literal with language tag is a pair
>>>>>>>> consisting of 1. the same Unicode string as its lexical form, and
>>>>>>>> 2. its language tag.
>>>>>>>> 
>>>>>>>> For typed literals, …” (continue with rest of section as is)
>>>>>>>> 
>>>>>>>> §3 Remove the Note at the end of the section
>>>>>>>> 
>>>>>>>> 
>>>>>>>> CHANGES TO 6.5.1 Literal Equality
>>>>>>>> http://www.w3.org/TR/rdf-concepts/#section-Literal-Equality
>>>>>>>> 
>>>>>>>> 
>>>>>>>> §4 Rename section to “6.5.2 Literal Equality and Canonical
>>>>>>>> Forms”
>>>>>>>> 
>>>>>>>> §5 Add to the beginning: “Equality of literals can be evaluated
>>>>>>>> based on their syntax, or based on their value.”
>>>>>>>> 
>>>>>>>> §6 Change “Two literals are equal …” to: “Two literals are
>>>>>>>> syntactically equal …” in the current first paragraph.
>>>>>>>> 
>>>>>>>> §7 Add to the end: “In application contexts, comparing the values
>>>>>>>> of literals (see section 6.5.1) is usually more helpful than
>>>>>>>> comparing their syntactic forms. Literals with different lexical
>>>>>>>> forms and with different datatypes can have the same value. In
>>>>>>>> particular:
>>>>>>>> 
>>>>>>>> - A plain literal with lexical form aaa and no language tag has
>>>>>>>>   the same value as a typed literal with lexical form aaa and
>>>>>>>>   datatype IRI xsd:string
>>>>>>>> - A plain literal with lexical form aaa and no language tag has
>>>>>>>>   the same value as a typed literal with lexical form aaa@ and
>>>>>>>>   datatype IRI rdf:PlainLiteral
>>>>>>>> - A plain literal with lexical form aaa and language tag xx has
>>>>>>>>   the same value as a typed literal with lexical form aaa@xx and
>>>>>>>>   datatype IRI rdf:PlainLiteral”
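
As a rough illustration of the value equalities listed in §7 (together with the values defined in §2), the sketch below maps each kind of string literal to its value. The Literal class and the value_of function are hypothetical names, not part of the proposal, and lowercasing the language tag is an assumption made here so that tags compare case-insensitively.

```python
from typing import NamedTuple, Optional, Tuple, Union

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_PLAIN_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral"


class Literal(NamedTuple):
    """Syntactic literal: lexical form plus optional language tag or datatype."""
    lexical: str
    language: Optional[str] = None
    datatype: Optional[str] = None


Value = Union[str, Tuple[str, str]]


def value_of(lit: Literal) -> Value:
    """Return the value of a string literal as described in §2 and §7."""
    if lit.datatype is None:
        if lit.language is None:
            return lit.lexical  # plain literal without language tag
        return (lit.lexical, lit.language.lower())  # (string, tag) pair
    if lit.datatype == XSD_STRING:
        return lit.lexical
    if lit.datatype == RDF_PLAIN_LITERAL:
        # Lexical forms look like "aaa@" or "aaa@xx"; split at the last "@".
        text, sep, lang = lit.lexical.rpartition("@")
        if not sep:
            raise ValueError("rdf:PlainLiteral lexical form needs an '@'")
        return (text, lang.lower()) if lang else text
    raise NotImplementedError("only string literals are covered here")


plain = Literal("aaa")
typed = Literal("aaa", datatype=XSD_STRING)
syntactically_equal = plain == typed             # False: the syntax differs
value_equal = value_of(plain) == value_of(typed)  # True: same Unicode string
```

The last three lines show the distinction the proposal draws: the two literals differ syntactically but have the same value.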
>>>>>>>> 
>>>>>>>> §8 “Some literals are canonical forms. Implementations MAY
>>>>>>>> replace any literal with a canonical form if both are
>>>>>>>> syntactically different, but have the same value. All plain
>>>>>>>> literals, with or without language tag, are canonical forms.”
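
What the optional canonicalization in §8 could look like for a store is sketched below, under the assumption that a graph is simply a set of (subject, predicate, object) tuples. All names are hypothetical. Literals outside the string cases are left untouched, because the proposal as written marks only plain literals as canonical forms.

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_PLAIN_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral"


class Literal(NamedTuple):
    lexical: str
    language: Optional[str] = None
    datatype: Optional[str] = None


def canonicalize(term):
    """Replace a string literal by its plain-literal canonical form."""
    if not isinstance(term, Literal) or term.datatype is None:
        return term  # IRIs, blank nodes and plain literals stay as they are
    if term.datatype == XSD_STRING:
        return Literal(term.lexical)
    if term.datatype == RDF_PLAIN_LITERAL:
        text, sep, lang = term.lexical.rpartition("@")
        if sep:
            return Literal(text, language=lang or None)
    return term  # no canonical form defined for other datatypes


def canonicalize_graph(triples):
    """Canonicalize the object position of every triple in a graph."""
    return {(s, p, canonicalize(o)) for s, p, o in triples}


g1 = {("ex:a", "ex:name", Literal("Alice", datatype=XSD_STRING))}
g2 = {("ex:a", "ex:name", Literal("Alice"))}
differ_before = g1 != g2
equal_after = canonicalize_graph(g1) == canonicalize_graph(g2)
```

An implementation that canonicalizes on load in this way can compare literals syntactically afterwards and still get the string value equalities, without implementing any entailment regime.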
>>>>>>>> 
>>>>>>>> 
>>>>>>>> CHANGES TO 6.3 Graph Equivalence
>>>>>>>> http://www.w3.org/TR/rdf-concepts/#section-graph-equality
>>>>>>>> 
>>>>>>>> 
>>>>>>>> §9 Append this leftover sentence, which was removed from 6.5.1:
>>>>>>>> “Note: For comparing RDF Graphs, semantic notions of entailment
>>>>>>>> (see [RDF-SEMANTICS]) are usually more helpful than the syntactic
>>>>>>>> equivalence defined here.”
>>>>>>>> 
>>>>>>>> 
>>>>>>>> EXTENDING THIS TO NUMERIC LITERALS???
>>>>>>>> 
>>>>>>>> (While we're at it, we might also cover equalities between the
>>>>>>>> built-in numeric XSD types, and between different lexical forms
>>>>>>>> of the same built-in XSD datatype.)
>>>>>>> 
>>>>>>> 
>>>>>>> ----
>>>>>>> Ivan Herman, W3C Semantic Web Activity Lead
>>>>>>> Home: http://www.w3.org/People/Ivan/
>>>>>>> mobile: +31-641044153
>>>>>>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>>>>>>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> Antoine Zimmermann
>>>>> Researcher at:
>>>>> Laboratoire d'InfoRmatique en Image et Systèmes d'information
>>>>> Database Group
>>>>> 7 Avenue Jean Capelle
>>>>> 69621 Villeurbanne Cedex
>>>>> France
>>>>> Tel: +33(0)4 72 43 61 74 - Fax: +33(0)4 72 43 87 13
>>>>> Lecturer at:
>>>>> Institut National des Sciences Appliquées de Lyon
>>>>> 20 Avenue Albert Einstein
>>>>> 69621 Villeurbanne Cedex
>>>>> France
>>>>> antoine.zimmermann@insa-lyon.fr
>>>>> http://zimmer.aprilfoolsreview.com/
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes

Received on Thursday, 12 May 2011 20:07:50 UTC