Re: Plain literals in Canonical N-triples from Stian Soiland-Reyes on 2014-12-29 (public-rdf-comments@w3.org from December 2014)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Mon, 29 Dec 2014 11:50:23 -0600
To: Andy Seaborne <andy@apache.org>
Cc: Gregg Kellogg <gregg@greggkellogg.com>, Pat Hayes <phayes@ihmc.us>, "public-rdf-comments@w3.org" <public-rdf-comments@w3.org>
Message-ID: <CAPRnXtnZqdRszoiik2p42gMwyVRSY2wLYH0_TqyVQRELUmeaFA@mail.gmail.com>
OK, thank you all for recollecting! So I'll settle for writing an xsd:string
as a "naked" literal (no ^^ datatype suffix) in the output.

Should this go into an erratum, or is it too much of a change?
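
A minimal sketch of that rule, in Python purely for illustration. The function
names are hypothetical and not taken from any existing serializer. It assumes
the literal arrives as a lexical form plus an optional datatype IRI and an
optional language tag, and it applies only the four ECHAR escapes that the
Canonical N-Triples text quoted below lists for STRING_LITERAL_QUOTE. It does
not validate or escape the datatype IRI.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# The only characters Canonical N-Triples encodes with ECHAR inside a
# quoted literal: U+0022, U+005C, U+000A and U+000D.
_ECHAR = {'"': '\\"', "\\": "\\\\", "\n": "\\n", "\r": "\\r"}


def escape_lexical(lexical):
    """Escape a lexical form for use inside double quotes (hypothetical helper)."""
    return "".join(_ECHAR.get(ch, ch) for ch in lexical)


def literal_to_ntriples(lexical, datatype=XSD_STRING, lang=None):
    """Serialize one literal, writing xsd:string in its "naked" form."""
    quoted = '"' + escape_lexical(lexical) + '"'
    if lang is not None:
        # Language-tagged literal: the tag replaces any datatype suffix.
        return quoted + "@" + lang
    if datatype == XSD_STRING:
        # xsd:string is implied, so no ^^ suffix is written.
        return quoted
    return quoted + "^^<" + datatype + ">"
```

With this sketch, literal_to_ntriples("That Seventies Show") gives the second
form from Example 3 quoted below, while any other datatype keeps its explicit
^^<...> suffix.
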
On 29 Dec 2014 07:41, "Andy Seaborne" <andy@apache.org> wrote:

> On 29/12/14 06:31, Pat Hayes wrote:
>
>>
>> On Dec 28, 2014, at 6:10 PM, Gregg Kellogg <gregg@greggkellogg.com>
>> wrote:
>>
>>  On Dec 28, 2014, at 3:32 PM, Pat Hayes <phayes@ihmc.us> wrote:
>>>
>>>>
>>>>
>>>>  On Dec 28, 2014, at 5:40 AM, Andy Seaborne <andy@apache.org> wrote:
>>>>>
>>>>>  On 28/12/14 05:04, Pat Hayes wrote:
>>>>>>
>>>>>>  On Dec 27, 2014, at 9:24 PM, Stian Soiland-Reyes <
>>>>>>> soiland-reyes@cs.manchester.ac.uk> wrote:
>>>>>>>
>>>>>>> No, for once I am not coming from OWL :)
>>>>>>>
>>>>>>> I'm just writing a simple n-triples serializer, and I am not sure if
>>>>>>> I should simply always include the type if there is no @lang (e.g.
>>>>>>> ^^xsd:string)
>>>>>>>
>>>>>>
>>>>>> It was certainly the intention of the RDF 1.1 WG that every literal
>>>>>> should have a type. We even provided a special 'type' for the @lang case,
>>>>>> to preserve this intention. It seems to me that one should not ever go
>>>>>> wrong by including the ^^xsd:string, which was semantically correct even in
>>>>>> original RDF, whereas really plain plain literals now have the shadow of
>>>>>> deprecation hanging over them, at the very least.
>>>>>>
>>>>>> Hope this helps.
>>>>>>
>>>>>> Pat Hayes
>>>>>>
>>>>>
>>>>> And for serialization, the WG intention IIRC was that all
>>>>> ^^xsd:strings should be written without the ^^xsd:string in all formats
>>>>> where possible.
>>>>>
>>>>
>>>> Really? I have no recollection of that, but I may have missed some
>>>> discussions. Can you find this in the minutes or emails anywhere?
>>>>
>>>
>>> I share Andy's recollection
>>>
>>
>> OK, two is enough :-) I bow to your superior recollection, and withdraw
>> my implicit advice to use explicit xsd:string typing. Apologies to all
>> concerned.
>>
>
> I went looking (OK, a bit of looking) the first time but couldn't find
> spec text except the MAY.  This discussion was over an extended period.
>
> The examples for Turtle are without xsd:string (except to show they are
> the same).
>
> From memory, the line of argument was that simple literals were more
> common than explicit ^^xsd:string though the community of use is going to
> be a major factor.
>
> Like Gregg, Jena outputs without explicit datatype as the best choice
> overall.
>
>         Andy
>
>
>> Pat
>>
>>  , and that is how my serializer behaves.
>>> Shame that the spec-text doesn't capture that.
>>>
>>> Gregg
>>>
>>>  It looks nicer.
>>>>>
>>>>
>>>> Maybe, but it also can produce uncertainty, as for example:
>>>>
>>>> "Before rdf 1.1 the norm tended to be to NOT express xsd:string unless
>>>> it really was a character-by-character string (e.g. a genome identifier),
>>>> and not when it was human text (but in unknown or mixed language)."
>>>>
>>>> Even in RDF 1.0, plain literals were specified to be semantically
>>>> identical to xsd:string-typed literals, but this was buried in the
>>>> semantics document which nobody read, and because the syntactic
>>>> distinction was available, people assumed it meant something. As long as a
>>>> syntax offers both choices, this misreading process will continue to
>>>> operate, even now RDF 1.1 has said explicitly that plain literals are only
>>>> syntactic sugar for the typed version.
>>>>
>>>>  http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal only says
>>>>> "MAY" -- that is mainly so as not to suggest much RDF 1.0 data output by
>>>>> pre-existing software is suddenly invalidated, which it isn't.
>>>>>
>>>>
>>>> Certainly, plain literal surface syntax is not *invalidated* by RDF
>>>> 1.1. Sorry if I gave that impression.
>>>>
>>>> Pat
>>>>
>>>>
>>>>
>>>>>    Andy
>>>>>
>>>>>
>>>>>
>>>>>>  ..Or if I should have a special case to output anything with type
>>>>>>> xsd:string as a classic "plain literal", i.e. no @ or ^^.
>>>>>>>
>>>>>>> Surely just one of these should be in the canonical version? My
>>>>>>> gut says to always include the type for non-lang, but the spec is ambiguous
>>>>>>> on this - if xsd:string is implied, should I then prefer to generate this
>>>>>>> implied version?
>>>>>>>
>>>>>>> Before rdf 1.1 the norm tended to be to NOT express xsd:string
>>>>>>> unless it really was a character-by-character string (e.g. a genome
>>>>>>> identifier), and not when it was human text (but in unknown or mixed
>>>>>>> language).
>>>>>>>
>>>>>>> As we SHOULD be generating the Canonical N-Triples, then it would be
>>>>>>> good to know if there already is a silent de facto agreement that is just
>>>>>>> not expressed in the spec.
>>>>>>>
>>>>>>> You might know the code base -
>>>>>>> https://github.com/stain/commons-rdf/blob/tests/src/test/java/com/github/commonsrdf/dummyimpl/LiteralImpl.java#L99
>>>>>>>
>>>>>>> On 27 Dec 2014 17:14, "Peter Ansell" <ansell.peter@gmail.com> wrote:
>>>>>>> Hi Stian,
>>>>>>>
>>>>>>> RDF-1.1 does not have the concept of plain literals [1]. Hence, it is
>>>>>>> difficult to map the OWL-WG-derived rdf:PlainLiteral set to RDF-1.1,
>>>>>>> if that is where you are coming at the issue from [2].
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>> [1] http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#section-Graph-Literal
>>>>>>> [2] https://github.com/owlcs/owlapi/issues/172
>>>>>>>
>>>>>>> On 27 December 2014 at 16:37, Stian Soiland-Reyes
>>>>>>> <soiland-reyes@cs.manchester.ac.uk> wrote:
>>>>>>>
>>>>>>>> In http://www.w3.org/TR/n-triples/#canonical-ntriples I read:
>>>>>>>>
>>>>>>>>  Canonical N-Triples has the following additional constraints on
>>>>>>>>> layout:
>>>>>>>>>
>>>>>>>>>    The whitespace following subject, predicate, and object MUST be
>>>>>>>>> a single space, (U+0020). All other locations that allow whitespace MUST be
>>>>>>>>> empty.
>>>>>>>>>    There MUST be no comments.
>>>>>>>>>    HEX MUST use only uppercase letters ([A-F]).
>>>>>>>>>    Characters MUST NOT be represented by UCHAR.
>>>>>>>>>    Within STRING_LITERAL_QUOTE, only the characters U+0022,
>>>>>>>>> U+005C, U+000A, U+000D are encoded using ECHAR. ECHAR MUST NOT be used for
>>>>>>>>> characters that are allowed directly in STRING_LITERAL_QUOTE.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> and in http://www.w3.org/TR/n-triples/#sec-parsing-terms
>>>>>>>>
>>>>>>>>  If neither a language tag nor a datatype IRI is provided, the
>>>>>>>>> literal has a datatype of xsd:string.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> and in http://www.w3.org/TR/n-triples/#sec-literals
>>>>>>>>
>>>>>>>>  If there is no datatype IRI and no language tag it is a simple
>>>>>>>>> literal and the datatype is http://www.w3.org/2001/XMLSchema#string.
>>>>>>>>>
>>>>>>>>
>>>>>>>>  Example 3
>>>>>>>>>    <http://example.org/show/218> <http://www.w3.org/2000/01/rdf-schema#label> "That Seventies Show"^^<http://www.w3.org/2001/XMLSchema#string> . # literal with XML Schema string datatype
>>>>>>>>>    <http://example.org/show/218> <http://www.w3.org/2000/01/rdf-schema#label> "That Seventies Show" . # same as above
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> So I am not any wiser with regard to how to serialize plain literals
>>>>>>>> in RDF 1.1 Canonical N-Triples.
>>>>>>>>
>>>>>>>>
>>>>>>>> Are both of the two examples allowed in Canonical N-Triples? (It seems
>>>>>>>> so by the spec.. :-( ).
>>>>>>>>
>>>>>>>> Which variant should I generate?
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Stian Soiland-Reyes, myGrid team
>>>>>>>> School of Computer Science
>>>>>>>> The University of Manchester
>>>>>>>> http://soiland-reyes.com/stian/work/
>>>>>>>> http://orcid.org/0000-0001-9842-9718
>>>>>>>>
>>>>>>>
>>>>>> ------------------------------------------------------------
>>>>>> IHMC                                     (850)434 8903 home
>>>>>> 40 South Alcaniz St.            (850)202 4416   office
>>>>>> Pensacola                            (850)202 4440   fax
>>>>>> FL 32502                              (850)291 0667   mobile
>>>>>> (preferred)
>>>>>> phayes@ihmc.us       http://www.ihmc.us/users/phayes
>>>>>>
>>>>>
>>>
>
>
Received on Monday, 29 December 2014 17:50:52 UTC