Re: Proposal for ISSUE-12, string literals

On May 12, 2011, at 15:27, Richard Cyganiak wrote:

> On 12 May 2011, at 13:06, Ivan Herman wrote:
>>> I'd be tempted to go further and make only the primitive types such as xsd:decimal into RDF canonical forms. This would mean that systems MAY canonicalize all numbers to a single numeric datatype.
>> 
>> Do you mean something like the 'canonical' forms in Turtle? I may be missing something here.
> 
> No. Turtle has syntactic sugar for certain numeric literals; this has nothing to do with canonicalization.
> 
> (This all goes way beyond ISSUE-12 anyway...)
> 
> I was suggesting that perhaps, instead of this:
> "+0013"^^xsd:byte => "13"^^xsd:byte
> 
> I'd like to say that implementations MAY do this:
> "+0013"^^xsd:byte => "13.0"^^xsd:decimal
> 
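The two options contrasted above can be sketched in Python. This is only a minimal illustration under stated assumptions, not part of anyone's proposal: it follows the XSD 1.0 canonical lexical forms (no leading "+", no leading zeros, and for xsd:decimal a mandatory decimal point, hence "13.0"), it handles only a hypothetical subset of integer-derived datatypes, and all function names are invented for this example.

```python
from decimal import Decimal
from typing import Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical subset of built-in datatypes derived from xsd:integer.
INTEGER_TYPES = {XSD + t for t in ("integer", "long", "int", "short", "byte")}


def canonical_integer(lexical: str) -> str:
    """XSD 1.0 canonical integer form: no '+' sign, no leading zeros."""
    return str(int(lexical))


def canonical_decimal(lexical: str) -> str:
    """XSD 1.0 canonical decimal form: no '+', no redundant zeros,
    and at least one digit on each side of a mandatory decimal point."""
    value = Decimal(lexical)
    if value == 0:
        return "0.0"
    text = format(value, "f")  # fixed-point notation, no exponent
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return text if "." in text else text + ".0"


def canonicalize_within_type(lexical: str, datatype: str) -> Tuple[str, str]:
    """Option 1: "+0013"^^xsd:byte => "13"^^xsd:byte (datatype kept)."""
    if datatype in INTEGER_TYPES:
        return canonical_integer(lexical), datatype
    return lexical, datatype


def canonicalize_to_decimal(lexical: str, datatype: str) -> Tuple[str, str]:
    """Option 2: "+0013"^^xsd:byte => "13.0"^^xsd:decimal (datatype lost)."""
    if datatype in INTEGER_TYPES or datatype == XSD + "decimal":
        return canonical_decimal(lexical), XSD + "decimal"
    return lexical, datatype


print(canonicalize_within_type("+0013", XSD + "byte"))
print(canonicalize_to_decimal("+0013", XSD + "byte"))
```

The second function is what makes comparisons easier, since same-valued integers and decimals end up with identical (lexical form, datatype) pairs; it is also exactly where the information that a value was declared an xsd:byte disappears.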

I have not made up my mind on this, I am just thinking out 'loud': in many programming environments I would like to have access to the fact that something is a byte and not a decimal, because the implementation of the latter might be way more complex and slower than that of the former. In other words, I am not sure RDF should be too 'smart' about it. If the user decided to define something as a byte, we should keep it as a byte...

Ivan

> They'd end up with all numbers represented in a single data type, with a single canonical representation. This makes comparisons quite a bit easier.
> 
> Best,
> Richard
>> 
>> Ivan
>> 
>> 
>> 
>>> Best,
>>> Richard
>>>> 
>>>> On 12/05/2011 12:19, Richard Cyganiak wrote:
>>>>> On 12 May 2011, at 09:22, Ivan Herman wrote:
>>>>>> - You make the remark on the wiki page on 'extending this to
>>>>>> numeric literals', which I would rather phrase as 'extending this
>>>>>> to any datatype' (e.g., xsd:dateTime, too).
>>>>> 
>>>>> Right -- I changed the section heading on the wiki.
>>>>> 
>>>>>> I have the impression that this is also a consequence of what you
>>>>>> write already. You emphasize the 'lexical equality', and you also
>>>>>> say "Implementations MAY replace any literal with a canonical form
>>>>>> if both are syntactically different, but have the same value."
>>>>>> which does not seem to be restricted to string literals.
>>>>> 
>>>>> The way I wrote it, the only literals marked as canonical forms are
>>>>> plain string literals. So the sentence doesn't license replacement
>>>>> of, say, +00013 with 13, because no numeric literals have been marked
>>>>> as canonical forms. That could be easily changed, of course.
>>>>> 
>>>>>> Do you think there is anything missing in this document to make
>>>>>> that picture complete (except, editorially, to possibly add
>>>>>> non-string examples)?
>>>>> 
>>>>> If we only want to address string literals, then I think the proposal
>>>>> is complete.
>>>>> 
>>>>> If we want to address other XSD literals as well, then some bullet
>>>>> points should be added to the list of equalities, and literals in
>>>>> the canonical lexical forms of some XSD datatypes (e.g.,
>>>>> "13.0"^^xsd:decimal) should be declared canonical forms, so that
>>>>> other same-valued literals can be replaced with them. This requires
>>>>> a detailed reading of the XSD spec (which I have not done so far).
>>>>> 
>>>>> (RDF Concepts should probably contain a paragraph or two introducing
>>>>> the rdf:PlainLiteral datatype and referencing the relevant spec, but
>>>>> let's treat that as a separate issue.)
>>>>> 
>>>>>> - I would also propose making some small changes to the Semantics
>>>>>> document.
>>>>> 
>>>>> I'll let the editors of that document comment.
>>>>> 
>>>>> Best, Richard
>>>>> 
>>>>>> Ivan
>>>>>> 
>>>>>> 
>>>>>> On May 11, 2011, at 23:23, Richard Cyganiak wrote:
>>>>>> 
>>>>>>> I took an action today to draft text for RDF Concepts that
>>>>>>> resolves ISSUE-12. I put it on the wiki here:
>>>>>>> http://www.w3.org/2011/rdf-wg/wiki/StringLiterals/EntailmentProposal
>>>>>>> 
>>>>>>> 
>>>>>>> A plain text copy is attached below.
>>>>>>> 
>>>>>>> Best, Richard
>>>>>>> 
>>>>>>> SHORT SUMMARY
>>>>>>> 
>>>>>>> 1. RDF Concepts puts more emphasis on the distinction between
>>>>>>>    (syntactic) “literal equality” and (semantic, important for
>>>>>>>    applications) “value equality”
>>>>>>> 2. RDF Concepts explicitly points out the specific string value
>>>>>>>    equalities that already arise from RDF Semantics
>>>>>>> 3. RDF Concepts declares one of the string literal forms as
>>>>>>>    canonical
>>>>>>> 4. Implementations MAY canonicalize, but don't have to
>>>>>>> 5. The canonical forms are plain literals.
>>>>>>> 
>>>>>>> 
>>>>>>> WHY?
>>>>>>> 
>>>>>>> 1. No changes to the abstract syntax required
>>>>>>> 2. No changes to any concrete syntax or parser required
>>>>>>> 3. No changes to any implementations of any of the existing
>>>>>>>    entailment regimes required
>>>>>>> 4. Those who are ok with canonicalization can do that, and don't
>>>>>>>    need to deal with entailment
>>>>>>> 5. Those who don't want to canonicalize have the option of
>>>>>>>    supporting only string value equality at query time, without
>>>>>>>    RDFS- and D-Entailment
>>>>>>> 6. “MAY canonicalize” softly discourages the use of xsd:string
>>>>>>>    typed literals, without abolishing them outright or declaring
>>>>>>>    them archaic
>>>>>>> 7. Standardizing on xsd:string was never an option because of
>>>>>>>    language tags
>>>>>>> 8. Standardizing on rdf:PlainLiteral was never an option because
>>>>>>>    it MUST NOT be used in serializations that support plain
>>>>>>>    literals
>>>>>>> 
>>>>>>> 
>>>>>>> CHANGES TO 6.5.2 The Value Corresponding to a Typed Literal
>>>>>>> http://www.w3.org/TR/rdf-concepts/#section-Literal-Value
>>>>>>> 
>>>>>>> 
>>>>>>> §1 Rename it to “6.5.1 The Value Corresponding to a Literal” and
>>>>>>> move it ahead of the current 6.5.1
>>>>>>> 
>>>>>>> §2 Add to the beginning: “The value of a plain literal without
>>>>>>> language tag is the same Unicode string as its lexical form.
>>>>>>> 
>>>>>>> The value of a plain literal with language tag is a pair
>>>>>>> consisting of 1. the same Unicode string as its lexical form, and
>>>>>>> 2. its language tag.
>>>>>>> 
>>>>>>> For typed literals, …” (continue with rest of section as is)
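The value mapping in §2 can be sketched in Python. This is a hypothetical illustration, not code from any RDF library: the PlainLiteral class is invented here, a missing language tag is represented as None, and a language-tagged value is modelled as a (string, tag) tuple.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


# Hypothetical representation of a plain literal.
@dataclass(frozen=True)
class PlainLiteral:
    lexical: str                # the lexical form, a Unicode string
    lang: Optional[str] = None  # the language tag, or None if absent


# A value is either a Unicode string or a (string, language tag) pair.
Value = Union[str, Tuple[str, str]]


def plain_literal_value(lit: PlainLiteral) -> Value:
    """Return the value of a plain literal as described in §2."""
    if lit.lang is None:
        return lit.lexical          # same Unicode string as the lexical form
    return (lit.lexical, lit.lang)  # pair of lexical form and language tag


print(plain_literal_value(PlainLiteral("chat")))
print(plain_literal_value(PlainLiteral("chat", "fr")))
```

Keeping the tagged case as a pair means "chat" and "chat"@fr get different values, so neither can be substituted for the other under value equality.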
>>>>>>> 
>>>>>>> §3 Remove the Note at the end of the section
>>>>>>> 
>>>>>>> 
>>>>>>> CHANGES TO 6.5.1 Literal Equality
>>>>>>> http://www.w3.org/TR/rdf-concepts/#section-Literal-Equality
>>>>>>> 
>>>>>>> 
>>>>>>> §4 Rename section to “6.5.2 Literal Equality and Canonical
>>>>>>> Forms”
>>>>>>> 
>>>>>>> §5 Add to the beginning: “Equality of literals can be evaluated
>>>>>>> based on their syntax, or based on their value.”
>>>>>>> 
>>>>>>> §6 Change “Two literals are equal …” to: “Two literals are
>>>>>>> syntactically equal …” in the current first paragraph.
>>>>>>> 
>>>>>>> §7 Add to the end: “In application contexts, comparing the values
>>>>>>> of literals (see section 6.5.1) is usually more helpful than
>>>>>>> comparing their syntactic forms. Literals with different lexical
>>>>>>> forms and with different datatypes can have the same value. In
>>>>>>> particular:
>>>>>>> 
>>>>>>> - A plain literal with lexical form aaa and no language tag has
>>>>>>>   the same value as a typed literal with lexical form aaa and
>>>>>>>   datatype IRI xsd:string
>>>>>>> - A plain literal with lexical form aaa and no language tag has
>>>>>>>   the same value as a typed literal with lexical form aaa@ and
>>>>>>>   datatype IRI rdf:PlainLiteral
>>>>>>> - A plain literal with lexical form aaa and language tag xx has
>>>>>>>   the same value as a typed literal with lexical form aaa@xx and
>>>>>>>   datatype IRI rdf:PlainLiteral”
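The difference between syntactic equality and the three value equalities listed in §7 can be checked with a small Python sketch. It is a hypothetical illustration only: the Literal class and function names are invented here, only the string-like cases from the list are handled, and language-tag case normalization is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_PLAIN_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral"


@dataclass(frozen=True)
class Literal:
    lexical: str
    lang: Optional[str] = None      # set only on plain literals
    datatype: Optional[str] = None  # set only on typed literals


def string_value(lit: Literal) -> Optional[Union[str, Tuple[str, str]]]:
    """Value of a string-like literal, or None for anything else."""
    if lit.datatype is None:  # plain literal
        return lit.lexical if lit.lang is None else (lit.lexical, lit.lang)
    if lit.datatype == XSD_STRING:
        return lit.lexical
    if lit.datatype == RDF_PLAIN_LITERAL:
        # The lexical form is "text@tag"; split on the last '@'.
        text, sep, tag = lit.lexical.rpartition("@")
        if not sep:
            return None  # no '@': not a valid rdf:PlainLiteral lexical form
        return text if tag == "" else (text, tag)
    return None


def syntactically_equal(a: Literal, b: Literal) -> bool:
    # Same lexical form, same language tag, same datatype IRI.
    return a == b


def value_equal(a: Literal, b: Literal) -> bool:
    va = string_value(a)
    return va is not None and va == string_value(b)


plain = Literal("aaa")
typed = Literal("aaa", datatype=XSD_STRING)
print(syntactically_equal(plain, typed), value_equal(plain, typed))
```

A query engine that does not canonicalize could use a comparison like value_equal at query time, which is the option described in point 5 of the WHY? list.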
>>>>>>> 
>>>>>>> §8 “Some literals are canonical forms. Implementations MAY
>>>>>>> replace any literal with a canonical form if both are
>>>>>>> syntactically different, but have the same value. All plain
>>>>>>> literals, with or without language tag, are canonical forms.”
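The replacement rule in §8 can be sketched in Python as well, again with a hypothetical Literal class and invented function names. Under the text as drafted, only plain literals are canonical forms, so a typed numeric literal such as "+00013"^^xsd:integer is never replaced, which matches the point made earlier in the thread.

```python
from dataclasses import dataclass
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_PLAIN_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral"


@dataclass(frozen=True)
class Literal:
    lexical: str
    lang: Optional[str] = None      # plain literals only
    datatype: Optional[str] = None  # typed literals only


def to_plain(lit: Literal) -> Optional[Literal]:
    """Plain literal with the same value as lit, or None if there is none."""
    if lit.datatype is None:
        return lit
    if lit.datatype == XSD + "string":
        return Literal(lit.lexical)
    if lit.datatype == RDF_PLAIN_LITERAL:
        text, sep, tag = lit.lexical.rpartition("@")
        if sep:
            return Literal(text, lang=tag or None)
    return None


def is_canonical_form(lit: Literal) -> bool:
    # §8 as drafted: all plain literals, and nothing else, are canonical forms.
    return lit.datatype is None


def may_replace(old: Literal, new: Literal) -> bool:
    """True if new is a syntactically different canonical form with old's value."""
    return old != new and is_canonical_form(new) and to_plain(old) == new


def canonicalize(lit: Literal) -> Literal:
    """The optional (MAY) step: swap a literal for its canonical form, if any."""
    plain = to_plain(lit)
    return plain if plain is not None else lit


print(canonicalize(Literal("aaa@xx", datatype=RDF_PLAIN_LITERAL)))
print(canonicalize(Literal("+00013", datatype=XSD + "integer")))
```

Because may_replace checks is_canonical_form, the numeric case only becomes replaceable if some typed literals are also declared canonical forms, which is the extension discussed in the thread.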
>>>>>>> 
>>>>>>> 
>>>>>>> CHANGES TO 6.3 Graph Equivalence
>>>>>>> http://www.w3.org/TR/rdf-concepts/#section-graph-equality
>>>>>>> 
>>>>>>> 
>>>>>>> §9 Append this leftover sentence, which was removed from 6.5.1:
>>>>>>> “Note: For comparing RDF Graphs, semantic notions of entailment
>>>>>>> (see [RDF-SEMANTICS]) are usually more helpful than the syntactic
>>>>>>> equivalence defined here.”
>>>>>>> 
>>>>>>> 
>>>>>>> EXTENDING THIS TO NUMERIC LITERALS???
>>>>>>> 
>>>>>>> (While we're at it, we might also cover equalities between the
>>>>>>> built-in numeric XSD types, and between different lexical forms
>>>>>>> of the same built-in XSD datatype.)
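For the numeric case, one way to picture value equality across datatypes and lexical forms is the Python sketch below. It is an assumption-laden illustration: it covers only xsd:decimal and a hypothetical subset of its built-in derived types, uses Python's Decimal as the shared value space, performs no lexical-form or range validation, and leaves out xsd:float and xsd:double, which XSD 1.0 defines as separate primitive types.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional, Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical subset: xsd:decimal plus some built-in types derived from it.
DECIMAL_DERIVED = {
    XSD + t
    for t in ("decimal", "integer", "long", "int", "short", "byte",
              "nonNegativeInteger", "positiveInteger", "unsignedByte")
}


def numeric_value(lexical: str, datatype: str) -> Optional[Decimal]:
    """Map a literal in the decimal family to its value, else None.
    No XSD lexical or range checks are done here."""
    if datatype not in DECIMAL_DERIVED:
        return None
    try:
        return Decimal(lexical)
    except InvalidOperation:
        return None  # not parseable as a number at all


def same_numeric_value(a: Tuple[str, str], b: Tuple[str, str]) -> bool:
    """Compare two (lexical form, datatype IRI) pairs by value."""
    va, vb = numeric_value(*a), numeric_value(*b)
    return va is not None and va == vb


print(same_numeric_value(("+0013", XSD + "byte"), ("13.0", XSD + "decimal")))
print(same_numeric_value(("13", XSD + "integer"), ("13", XSD + "double")))
```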
>>>>>> 
>>>>>> 
>>>>>> ----
>>>>>> Ivan Herman, W3C Semantic Web Activity Lead
>>>>>> Home: http://www.w3.org/People/Ivan/
>>>>>> mobile: +31-641044153
>>>>>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>>>>>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>>> 
>>>> -- 
>>>> Antoine Zimmermann
>>>> Researcher at:
>>>> Laboratoire d'InfoRmatique en Image et Systèmes d'information
>>>> Database Group
>>>> 7 Avenue Jean Capelle
>>>> 69621 Villeurbanne Cedex
>>>> France
>>>> Tel: +33(0)4 72 43 61 74 - Fax: +33(0)4 72 43 87 13
>>>> Lecturer at:
>>>> Institut National des Sciences Appliquées de Lyon
>>>> 20 Avenue Albert Einstein
>>>> 69621 Villeurbanne Cedex
>>>> France
>>>> antoine.zimmermann@insa-lyon.fr
>>>> http://zimmer.aprilfoolsreview.com/



Received on Thursday, 12 May 2011 15:50:29 UTC