Re: Plain literals in Canonical N-triples from Andy Seaborne on 2014-12-29 (public-rdf-comments@w3.org from December 2014)

From: Andy Seaborne <andy@apache.org>
Date: Mon, 29 Dec 2014 13:39:16 +0000
To: Pat Hayes <phayes@ihmc.us>, Gregg Kellogg <gregg@greggkellogg.com>
CC: "public-rdf-comments@w3.org" <public-rdf-comments@w3.org>
Message-ID: <54A15984.4040606@apache.org>

On 29/12/14 06:31, Pat Hayes wrote:
>
> On Dec 28, 2014, at 6:10 PM, Gregg Kellogg <gregg@greggkellogg.com> wrote:
>
>> On Dec 28, 2014, at 3:32 PM, Pat Hayes <phayes@ihmc.us> wrote:
>>>
>>>
>>>> On Dec 28, 2014, at 5:40 AM, Andy Seaborne <andy@apache.org> wrote:
>>>>
>>>>> On 28/12/14 05:04, Pat Hayes wrote:
>>>>>
>>>>>> On Dec 27, 2014, at 9:24 PM, Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk> wrote:
>>>>>>
>>>>>> No, for once I am not coming from OWL :)
>>>>>>
>>>>>> I'm just writing a simple n-triples serializer, and I am not sure if I should simply always include the type if there is no @lang (e.g. ^^xsd:string)
>>>>>
>>>>> It was certainly the intention of the RDF 1.1 WG that every literal should have a type. We even provided a special 'type' for the @lang case, to preserve this intention. It seems to me that one should not ever go wrong by including the ^^xsd:string, which was semantically correct even in original RDF, whereas really plain plain literals now have the shadow of deprecation hanging over them, at the very least.
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>> Pat Hayes
>>>>
>>>> And for serialization, the WG intention IIRC was that all ^^xsd:strings should be written without the ^^xsd:string in all formats where possible.
>>>
>>> Really? I have no recollection of that, but I may have missed some discussions. Can you find this in the minutes or emails anywhere?
>>
>> I share Andy's recollection
>
> OK, two is enough :-) I bow to your superior recollection, and withdraw my implicit advice to use explicit xsd:string typing. Apologies to all concerned.

I went looking (OK, only briefly) the first time but couldn't find any
spec text other than the MAY.  The discussion ran over an extended period.

The Turtle examples are written without xsd:string (except where they
show that the two forms are the same).

From memory, the line of argument was that simple literals were more
common than explicit ^^xsd:string, though the community of use is going
to be a major factor.

Like Gregg's serializer, Jena writes xsd:string literals without the
explicit datatype, as the best choice overall.
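
As a rough illustration of that choice (a sketch only, not Jena's or
Gregg's actual code; the function and constant names are made up for
this example), a serializer could write the literal part of a triple
like this. It assumes the lexical form is already a Unicode string and
escapes only the four characters that Canonical N-Triples lists for
ECHAR. It does not validate or normalise language tags.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# The four characters Canonical N-Triples encodes with ECHAR inside
# STRING_LITERAL_QUOTE: U+0022, U+005C, U+000A, U+000D.
_ECHAR = {
    '"': '\\"',
    "\\": "\\\\",
    "\n": "\\n",
    "\r": "\\r",
}


def escape_lexical(lexical):
    """Escape only the characters that must use ECHAR; leave the rest as-is."""
    return "".join(_ECHAR.get(ch, ch) for ch in lexical)


def literal_to_ntriples(lexical, datatype=XSD_STRING, lang=None):
    """Write an RDF literal as an N-Triples term (hypothetical helper).

    - language-tagged literal  -> "..."@lang
    - xsd:string literal       -> "..."   (datatype omitted)
    - any other datatype       -> "..."^^<datatype IRI>
    """
    quoted = '"' + escape_lexical(lexical) + '"'
    if lang is not None:
        return quoted + "@" + lang
    if datatype == XSD_STRING:
        # Short form: a parser assigns xsd:string when no datatype is given.
        return quoted
    return quoted + "^^<" + datatype + ">"


print(literal_to_ntriples("That Seventies Show"))
print(literal_to_ntriples("1", "http://www.w3.org/2001/XMLSchema#integer"))
```

Both forms parse to the same literal, so the only thing being decided
here is which spelling a writer emits consistently.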

	Andy

>
> Pat
>
>> , and that is how my serializer behaves.
>> Shame that the spec text doesn't capture that.
>>
>> Gregg
>>
>>>> It looks nicer.
>>>
>>> Maybe, but it also can produce uncertainty, as for example:
>>>
>>> "Before rdf 1.1 the norm tended to be to NOT express xsd:string unless it really was a character-by-character string (e.g. a genome identifier), and not when it was human text (but in unknown or mixed language)."
>>>
>>> Even in RDF 1.0, plain literals were specified to be semantically identical to xsd:string-typed literals, but this was buried in the semantics document which nobody read, and because the syntactic distinction was available, people assumed it meant something. As long as a syntax offers both choices, this misreading process will continue to operate, even now that RDF 1.1 has said explicitly that plain literals are only syntactic sugar for the typed version.
>>>
>>>> http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal only says "MAY" -- that is mainly so as not to suggest much RDF 1.0 data output by pre-existing software is suddenly invalidated, which it isn't.
>>>
>>> Certainly, plain literal surface syntax is not *invalidated* by RDF 1.1. Sorry if I gave that impression.
>>>
>>> Pat
>>>
>>>
>>>>
>>>>    Andy
>>>>
>>>>
>>>>>
>>>>>> ..Or if I should have a special case to output anything with type xsd:string as a classic "plain literal", e.g. no @ or ^^.
>>>>>>
>>>>>> Surely just one of these should be in the canonical version? My gut says to always include the type for non-lang, but the spec is ambiguous on this - if xsd:string is implied, should I then prefer to generate this implied version?
>>>>>>
>>>>>> Before rdf 1.1 the norm tended to be to NOT express xsd:string unless it really was a character-by-character string (e.g. a genome identifier), and not when it was human text (but in unknown or mixed language).
>>>>>>
>>>>>> As we SHOULD be generating the Canonical N-Triples, then it would be good to know if there already is a silent de facto agreement that is just not expressed in the spec.
>>>>>>
>>>>>> You might know the code base -
>>>>>> https://github.com/stain/commons-rdf/blob/tests/src/test/java/com/github/commonsrdf/dummyimpl/LiteralImpl.java#L99
>>>>>>
>>>>>> On 27 Dec 2014 17:14, "Peter Ansell" <ansell.peter@gmail.com> wrote:
>>>>>> Hi Stian,
>>>>>>
>>>>>> RDF-1.1 does not have the concept of plain literals [1]. Hence, it is
>>>>>> difficult to map the OWL-WG-derived rdf:PlainLiteral set to RDF-1.1,
>>>>>> if that is where you are coming at the issue from [2].
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>> [1] http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#section-Graph-Literal
>>>>>> [2] https://github.com/owlcs/owlapi/issues/172
>>>>>>
>>>>>> On 27 December 2014 at 16:37, Stian Soiland-Reyes
>>>>>> <soiland-reyes@cs.manchester.ac.uk> wrote:
>>>>>>> In http://www.w3.org/TR/n-triples/#canonical-ntriples I read:
>>>>>>>
>>>>>>>> Canonical N-Triples has the following additional constraints on layout:
>>>>>>>>
>>>>>>>>    The whitespace following subject, predicate, and object MUST be a single space, (U+0020). All other locations that allow whitespace MUST be empty.
>>>>>>>>    There MUST be no comments.
>>>>>>>>    HEX MUST use only uppercase letters ([A-F]).
>>>>>>>>    Characters MUST NOT be represented by UCHAR.
>>>>>>>>    Within STRING_LITERAL_QUOTE, only the characters U+0022, U+005C, U+000A, U+000D are encoded using ECHAR. ECHAR MUST NOT be used for characters that are allowed directly in STRING_LITERAL_QUOTE.
>>>>>>>
>>>>>>>
>>>>>>> and in http://www.w3.org/TR/n-triples/#sec-parsing-terms
>>>>>>>
>>>>>>>> If neither a language tag nor a datatype IRI is provided, the literal has a datatype of xsd:string.
>>>>>>>
>>>>>>>
>>>>>>> and in http://www.w3.org/TR/n-triples/#sec-literals
>>>>>>>
>>>>>>>> If there is no datatype IRI and no language tag it is a simple literal and the datatype is http://www.w3.org/2001/XMLSchema#string.
>>>>>>>
>>>>>>>> Example 3
>>>>>>>>    <http://example.org/show/218> <http://www.w3.org/2000/01/rdf-schema#label> "That Seventies Show"^^<http://www.w3.org/2001/XMLSchema#string> . # literal with XML Schema string datatype
>>>>>>>>    <http://example.org/show/218> <http://www.w3.org/2000/01/rdf-schema#label> "That Seventies Show" . # same as above
>>>>>>>
>>>>>>>
>>>>>>> So I am none the wiser with regard to how to serialize plain literals
>>>>>>> in RDF 1.1 Canonical N-Triples.
>>>>>>>
>>>>>>>
>>>>>>> Are both of the two examples allowed in Canonical N-Triples? (it seems
>>>>>>> so by the spec.. :-( ).
>>>>>>>
>>>>>>> Which variant should I generate?
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Stian Soiland-Reyes, myGrid team
>>>>>>> School of Computer Science
>>>>>>> The University of Manchester
>>>>>>> http://soiland-reyes.com/stian/work/ http://orcid.org/0000-0001-9842-9718
>>>>>
>>>>> ------------------------------------------------------------
>>>>> IHMC                                     (850)434 8903 home
>>>>> 40 South Alcaniz St.            (850)202 4416   office
>>>>> Pensacola                            (850)202 4440   fax
>>>>> FL 32502                              (850)291 0667   mobile (preferred)
>>>>> phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 29 December 2014 13:39:50 UTC