Re: Plain literals in Canonical N-triples from David Booth on 2014-12-29 (public-rdf-comments@w3.org from December 2014)

From: David Booth <david@dbooth.org>
Date: Mon, 29 Dec 2014 15:08:59 -0500
To: Pat Hayes <phayes@ihmc.us>
CC: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>, "public-rdf-comments@w3.org" <public-rdf-comments@w3.org>
Message-ID: <54A1B4DB.6090109@dbooth.org>
On 12/29/2014 02:38 PM, Pat Hayes wrote:
>
> On Dec 29, 2014, at 12:52 PM, David Booth <david@dbooth.org> wrote:
>
>> P.S. Or to put it differently, it would be harmful if anyone
>> interpreted the existing ambiguity to be intentional.
>
> Well, there is no actual ambiguity. In RDF 1.1, the datatype of plain
> literals (without a language tag) is xsd:string, unambiguously. That
> type URI appears explicitly in the RDF 1.1 abstract (graph) syntax,
> unambiguously. But the RDF specs do not define all possible surface
> syntaxes for RDF, and they explicitly allow a surface syntax to omit
> the xsd:string typing URI as a form of syntactic sugar, since it is
> implied in all cases, so its omission does not introduce any
> ambiguity.

Sure, but the question was about the Canonical N-Triples *serialization* 
-- not the RDF 1.1 abstract graph.  As Stian Soiland-Reyes pointed out, 
the definition of Canonical N-Triples is currently ambiguous about 
whether xsd:string literals should be serialized like "foo" or like
"foo"^^<http://www.w3.org/2001/XMLSchema#string> .   As both Andy and 
Gregg recalled, it appears that this detail was discussed (in favor of 
"foo") but was omitted from the spec.

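For illustration only, here is a minimal sketch of a literal serializer that settles on the "foo" form. It assumes the convention recalled in this thread (omit the datatype when it is xsd:string) and the ECHAR constraint quoted from the Canonical N-Triples section. The names `escape_lexical` and `serialize_literal` are hypothetical and are not taken from Jena, commons-rdf, or any other library mentioned here. The sketch also assumes the datatype IRI and language tag are already well-formed and need no escaping.

```python
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Canonical N-Triples permits ECHAR only for these four characters:
# U+0022 ("), U+005C (\), U+000A (LF), U+000D (CR).
_ECHAR = {'"': '\\"', "\\": "\\\\", "\n": "\\n", "\r": "\\r"}


def escape_lexical(value: str) -> str:
    """Escape the four ECHAR characters; pass every other character through."""
    return "".join(_ECHAR.get(ch, ch) for ch in value)


def serialize_literal(
    lexical: str,
    datatype: str = XSD_STRING,
    lang: Optional[str] = None,
) -> str:
    """Return the N-Triples form of one literal (hypothetical helper).

    Assumption: xsd:string literals are written "naked", with no ^^ suffix.
    """
    quoted = '"' + escape_lexical(lexical) + '"'
    if lang is not None:
        # Language-tagged literal: the tag stands in for the datatype.
        return quoted + "@" + lang
    if datatype == XSD_STRING:
        # The case under discussion: drop the implied datatype IRI.
        return quoted
    return quoted + "^^<" + datatype + ">"
```

With this sketch, `serialize_literal("foo")` and `serialize_literal("foo", XSD_STRING)` both give `"foo"`, so a serializer built this way would emit only one of the two equivalent forms.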
David Booth

>
> Pat
>
>>
>> On 12/29/2014 01:36 PM, David Booth wrote:
>>> FWIW, it certainly seems to me like this detail was omitted
>>> unintentionally and would be helpful to include in the errata.
>>>
>>> David Booth
>>>
>>> On 12/29/2014 12:50 PM, Stian Soiland-Reyes wrote:
>>>> OK, thank you all for recollecting! So I'll settle for the
>>>> "naked" literal in output of an xsd:string.
>>>>
>>>> Should this go into an errata or is it too much of a change?
>>>>
>>>> On 29 Dec 2014 07:41, "Andy Seaborne" <andy@apache.org> wrote:
>>>>
>>>> On 29/12/14 06:31, Pat Hayes wrote:
>>>>
>>>>
>>>> On Dec 28, 2014, at 6:10 PM, Gregg Kellogg
>>>> <gregg@greggkellogg.com> wrote:
>>>>
>>>> On Dec 28, 2014, at 3:32 PM, Pat Hayes <phayes@ihmc.us> wrote:
>>>>
>>>>
>>>>
>>>> On Dec 28, 2014, at 5:40 AM, Andy Seaborne <andy@apache.org> wrote:
>>>>
>>>> On 28/12/14 05:04, Pat Hayes wrote:
>>>>
>>>> On Dec 27, 2014, at 9:24 PM, Stian Soiland-Reyes
>>>> <soiland-reyes@cs.manchester.ac.uk> wrote:
>>>>
>>>> No, for once I am not coming from OWL :)
>>>>
>>>> I'm just writing a simple n-triples serializer, and I am not
>>>> sure if I should simply always include the type if there is no
>>>> @lang (e.g. ^^xsd:string)
>>>>
>>>>
>>>> It was certainly the intention of the RDF 1.1 WG that every
>>>> literal should have a type. We even provided a special 'type'
>>>> for the @lang case, to preserve this intention. It seems to me
>>>> that one should not ever go wrong by including the
>>>> ^^xsd:string, which was semantically correct even in original
>>>> RDF, whereas really plain plain literals now have the shadow of
>>>> deprecation hanging over them, at the very least.
>>>>
>>>> Hope this helps.
>>>>
>>>> Pat Hayes
>>>>
>>>>
>>>> And for serialization, the WG intention IIRC was that all
>>>> ^^xsd:strings should be written without the ^^xsd:string in all
>>>> formats where possible.
>>>>
>>>>
>>>> Really? I have no recollection of that, but I may have missed
>>>> some discussions. Can you find this in the minutes or emails
>>>> anywhere?
>>>>
>>>>
>>>> I share Andy's recollection
>>>>
>>>>
>>>> OK, two is enough :-) I bow to your superior recollection, and
>>>> withdraw my implicit advice to use explicit xsd:string typing.
>>>> Apologies to all concerned.
>>>>
>>>>
>>>> I went looking (OK, a bit of looking) the first time but
>>>> couldn't find spec text except the MAY.  This discussion was
>>>> over an extended period.
>>>>
>>>> The examples for Turtle are without xsd:string (except to show
>>>> they are the same).
>>>>
>>>> From memory, the line of argument was that simple literals were
>>>> more common than explicit ^^xsd:string though the community of
>>>> use is going to be a major factor.
>>>>
>>>> Like Gregg, Jena outputs without explicit datatype as the best
>>>> choice overall.
>>>>
>>>> Andy
>>>>
>>>>
>>>> Pat
>>>>
>>>> , and that is how my serializer behaves. Shame that the
>>>> spec-text doesn't capture that.
>>>>
>>>> Gregg
>>>>
>>>> It looks nicer.
>>>>
>>>>
>>>> Maybe, but it also can produce uncertainty, as for example:
>>>>
>>>> "Before rdf 1.1 the norm tended to be to NOT express xsd:string
>>>> unless it really was a character-by-character string (e.g. a
>>>> genome identifier), and not when it was human text (but in
>>>> unknown or mixed language)."
>>>>
>>>> Even in RDF 1.0, plain literals were specified to be
>>>> semantically identical to xsd:string-typed literals, but this
>>>> was buried in the semantics document which nobody read, and
>>>> because the syntactic distinction was available, people assumed
>>>> it meant something. As long as a syntax offers both choices,
>>>> this misreading process will continue to operate, even now RDF
>>>> 1.1 has said explicitly that plain literals are only syntactic
>>>> sugar for the typed version.
>>>>
>>>>
>>>> http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal
>>>> only says "MAY" -- that is mainly so as not to suggest much RDF
>>>> 1.0 data output by pre-existing software is suddenly
>>>> invalidated, which it isn't.
>>>>
>>>>
>>>> Certainly, plain literal surface syntax is not *invalidated* by
>>>> RDF 1.1. Sorry if I gave that impression.
>>>>
>>>> Pat
>>>>
>>>>
>>>>
>>>> Andy
>>>>
>>>>
>>>>
>>>> ..Or if I should have a special case to output anything with
>>>> type xsd:string as a classic "plain literal", e.g. no @ or ^^.
>>>>
>>>> Surely just one of these should be in the canonical version ?
>>>> My gut says to always include the type for non-lang, but the
>>>> spec is ambiguous on this - if xsd:string is implied, should I
>>>> then prefer to generate this implied version?
>>>>
>>>> Before rdf 1.1 the norm tended to be to NOT express xsd:string
>>>> unless it really was a character-by-character string (e.g. a
>>>> genome identifier), and not when it was human text (but in
>>>> unknown or mixed language).
>>>>
>>>> As we SHOULD be generating the Canonical N-Triples, then it
>>>> would be good to know if there already is a silent de facto
>>>> agreement that is just not expressed in the spec.
>>>>
>>>> You might know the code base -
>>>>
>>>> https://github.com/stain/commons-rdf/blob/tests/src/test/java/com/github/commonsrdf/dummyimpl/LiteralImpl.java#L99
>>>>
>>>>
>>>> On 27 Dec 2014 17:14, "Peter Ansell" <ansell.peter@gmail.com> wrote:
>>>> Hi Stian,
>>>>
>>>> RDF-1.1 does not have the concept of plain literals [1]. Hence,
>>>> it is difficult to map the OWL-WG-derived rdf:PlainLiteral set
>>>> to RDF-1.1, if that is where you are coming at the issue from
>>>> [2].
>>>>
>>>> Cheers,
>>>>
>>>> Peter
>>>>
>>>> [1] http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#section-Graph-Literal
>>>>
>>>> [2] https://github.com/owlcs/owlapi/issues/172
>>>>
>>>> On 27 December 2014 at 16:37, Stian Soiland-Reyes
>>>> <soiland-reyes@cs.manchester.ac.uk> wrote:
>>>>
>>>> In http://www.w3.org/TR/n-triples/#canonical-ntriples I read:
>>>>
>>>> Canonical N-Triples has the following additional constraints
>>>> on layout:
>>>>
>>>> - The whitespace following subject, predicate, and object MUST be
>>>>   a single space, (U+0020). All other locations that allow
>>>>   whitespace MUST be empty.
>>>> - There MUST be no comments.
>>>> - HEX MUST use only uppercase letters ([A-F]).
>>>> - Characters MUST NOT be represented by UCHAR.
>>>> - Within STRING_LITERAL_QUOTE, only the characters U+0022, U+005C,
>>>>   U+000A, U+000D are encoded using ECHAR. ECHAR MUST NOT be used
>>>>   for characters that are allowed directly in STRING_LITERAL_QUOTE.
>>>>
>>>>
>>>>
>>>> and in
>>>>
>>>> http://www.w3.org/TR/n-triples/#sec-parsing-terms
>>>>
>>>> If neither a language tag nor a datatype IRI is provided, the
>>>> literal has a datatype of xsd:string.
>>>>
>>>>
>>>>
>>>> and in
>>>>
>>>> http://www.w3.org/TR/n-triples/#sec-literals
>>>>
>>>> If there is no datatype IRI and no language tag it is a simple
>>>> literal and the datatype is
>>>> http://www.w3.org/2001/XMLSchema#string.
>>>>
>>>>
>>>> Example 3
>>>>
>>>> <http://example.org/show/218> <http://www.w3.org/2000/01/rdf-schema#label> "That Seventies Show"^^<http://www.w3.org/2001/XMLSchema#string> . # literal with XML Schema string datatype
>>>> <http://example.org/show/218> <http://www.w3.org/2000/01/rdf-schema#label> "That Seventies Show" . # same as above
>>>>
>>>>
>>>>
>>>> So I am not any wiser with regards to how to serialize plain
>>>> literals in RDF 1.1 Canonical N-Triples..
>>>>
>>>>
>>>> Are both of the two examples allowed in Canonical N-Triples?
>>>> (it seems so by the spec.. :-( ).
>>>>
>>>> Which variant should I generate?
>>>>
>>>>
>>>> -- Stian Soiland-Reyes, myGrid team School of Computer Science
>>>> The University of Manchester
>>>> http://soiland-reyes.com/stian/work/
>>>> http://orcid.org/0000-0001-9842-9718
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------
>>>> IHMC (850)434 8903 home 40 South Alcaniz St.
>>>> (850)202 4416 office Pensacola
>>>> (850)202 4440   fax FL 32502
>>>> (850)291 0667   mobile (preferred) phayes@ihmc.us
>>>> http://www.ihmc.us/users/phayes
>
Received on Monday, 29 December 2014 20:09:28 UTC