Re: Plain literals in Canonical N-triples

From: David Booth <david@dbooth.org>
Date: Mon, 29 Dec 2014 14:18:53 -0500
Message-ID: <54A1A91D.1080507@dbooth.org>
To: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
CC: "public-rdf-comments@w3.org" <public-rdf-comments@w3.org>
FYI, the minutes record discussion of this, consistent with Andy's
recollection:
http://www.w3.org/2011/rdf-wg/meeting/2012-10-29#line0832
http://www.w3.org/2011/rdf-wg/meeting/2011-06-01#line0364

David Booth
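
For readers writing a serializer, here is a minimal sketch of the behaviour
the thread converges on: an xsd:string literal is written "naked", a
language-tagged literal gets its @lang, and any other datatype is written
with ^^<IRI>. The function names (escape_lexical, literal_to_ntriples) are
hypothetical and not taken from any library mentioned here, and the sketch
assumes the datatype IRI needs no escaping of its own.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Canonical N-Triples escapes only these four characters with ECHAR:
# U+0022 ("), U+005C (\), U+000A (LF) and U+000D (CR).
_ECHAR = {'"': '\\"', "\\": "\\\\", "\n": "\\n", "\r": "\\r"}


def escape_lexical(lexical):
    """Escape a lexical form for use inside STRING_LITERAL_QUOTE."""
    return "".join(_ECHAR.get(ch, ch) for ch in lexical)


def literal_to_ntriples(lexical, datatype=XSD_STRING, lang=None):
    """Serialize one literal (hypothetical helper, not a library API)."""
    quoted = '"' + escape_lexical(lexical) + '"'
    if lang is not None:
        # Language-tagged string: the tag replaces any datatype suffix.
        return quoted + "@" + lang
    if datatype == XSD_STRING:
        # xsd:string is implied, so the suffix is omitted.
        return quoted
    # Assumption: the datatype IRI contains no characters needing escapes.
    return quoted + "^^<" + datatype + ">"
```

Always omitting the xsd:string suffix gives a single output form per
literal, which is what a canonical serialization would want.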

On 12/29/2014 01:36 PM, David Booth wrote:
> FWIW, it certainly seems to me like this detail was omitted
> unintentionally and would be helpful to include in the errata.
>
> David Booth
>
> On 12/29/2014 12:50 PM, Stian Soiland-Reyes wrote:
>> OK, thank you all for recollecting! So I'll settle for outputting an
>> xsd:string as a "naked" literal.
>>
>> Should this go into the errata, or is it too much of a change?
>>
>> On 29 Dec 2014 07:41, "Andy Seaborne" <andy@apache.org> wrote:
>>
>>     On 29/12/14 06:31, Pat Hayes wrote:
>>
>>
>>         On Dec 28, 2014, at 6:10 PM, Gregg Kellogg
>>         <gregg@greggkellogg.com> wrote:
>>
>>             On Dec 28, 2014, at 3:32 PM, Pat Hayes <phayes@ihmc.us>
>>             wrote:
>>
>>
>>
>>                     On Dec 28, 2014, at 5:40 AM, Andy Seaborne
>>                     <andy@apache.org> wrote:
>>
>>                         On 28/12/14 05:04, Pat Hayes wrote:
>>
>>                             On Dec 27, 2014, at 9:24 PM, Stian
>>                             Soiland-Reyes
>>                             <soiland-reyes@cs.manchester.ac.uk> wrote:
>>
>>                             No, for once I am not coming from OWL :)
>>
>>                             I'm just writing a simple n-triples
>>                             serializer, and I am not sure if I should
>>                             simply always include the type if there is
>>                             no @lang (e.g. ^^xsd:string)
>>
>>
>>                         It was certainly the intention of the RDF 1.1 WG
>>                         that every literal should have a type. We even
>>                         provided a special 'type' for the @lang case, to
>>                         preserve this intention. It seems to me that one
>>                         should not ever go wrong by including the
>>                         ^^xsd:string, which was semantically correct
>>                         even in original RDF, whereas really plain plain
>>                         literals now have the shadow of deprecation
>>                         hanging over them, at the very least.
>>
>>                         Hope this helps.
>>
>>                         Pat Hayes
>>
>>
>>                     And for serialization, the WG intention IIRC was
>>                     that all ^^xsd:strings should be written without the
>>                     ^^xsd:string in all formats where possible.
>>
>>
>>                 Really? I have no recollection of that, but I may have
>>                 missed some discussions. Can you find this in the
>>                 minutes or emails anywhere?
>>
>>
>>             I share Andy's recollection
>>
>>
>>         OK, two is enough :-) I bow to your superior recollection, and
>>         withdraw my implicit advice to use explicit xsd:string typing.
>>         Apologies to all concerned.
>>
>>
>>     I went looking (OK, a bit of looking) the first time but couldn't
>>     find spec text except the MAY.  This discussion was over an extended
>>     period.
>>
>>     The examples for Turtle are without xsd:string (except to show they
>>     are the same).
>>
>>     From memory, the line of argument was that simple literals were
>>     more common than explicit ^^xsd:string though the community of use
>>     is going to be a major factor.
>>
>>     Like Gregg's serializer, Jena outputs without an explicit datatype,
>>     as the best choice overall.
>>
>>              Andy
>>
>>
>>         Pat
>>
>>             , and that is how my serializer behaves.
>>             Shame that the spec-text doesn't capture that.
>>
>>             Gregg
>>
>>                     It looks nicer.
>>
>>
>>                 Maybe, but it also can produce uncertainty, as for
>>                 example:
>>
>>                 "Before rdf 1.1 the norm tended to be to NOT express
>>                 xsd:string unless it really was a character-by-character
>>                 string (e.g. a genome identifier), and not when it was
>>                 human text (but in unknown or mixed language)."
>>
>>                 Even in RDF 1.0, plain literals were specified to be
>>                 semantically identical to xsd:string-typed literals, but
>>                 this was buried in the semantics document which nobody
>>                 read, and because the syntactic distinction was
>>                 available, people assumed it meant something. As long as
>>                 a syntax offers both choices, this misreading process
>>                 will continue to operate, even now that RDF 1.1 has said
>>                 explicitly that plain literals are only syntactic sugar
>>                 for the typed version.
>>
>>
>>                     http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal
>>                     only says "MAY" -- that is mainly so as not to
>>                     suggest much RDF 1.0 data output by pre-existing
>>                     software is suddenly invalidated, which it isn't.
>>
>>
>>                 Certainly, plain literal surface syntax is not
>>                 *invalidated* by RDF 1.1. Sorry if I gave that
>>                 impression.
>>
>>                 Pat
>>
>>
>>
>>                         Andy
>>
>>
>>
>>                             ..Or if I should have a special case to
>>                             output anything with type xsd:string as a
>>                             classic "plain literal", i.e. no @ or ^^.
>>
>>                             Surely just one of these should be in the
>>                             canonical version? My gut says to always
>>                             include the type for non-lang, but the spec
>>                             is ambiguous on this - if xsd:string is
>>                             implied, should I then prefer to generate
>>                             this implied version?
>>
>>                             Before rdf 1.1 the norm tended to be to NOT
>>                             express xsd:string unless it really was a
>>                             character-by-character string (e.g. a genome
>>                             identifier), and not when it was human text
>>                             (but in unknown or mixed language).
>>
>>                             As we SHOULD be generating the Canonical
>>                             N-Triples, then it would be good to know if
>>                             there already is a silent de facto agreement
>>                             that is just not expressed in the spec.
>>
>>                             You might know the code base -
>>
>>                             https://github.com/stain/commons-rdf/blob/tests/src/test/java/com/github/commonsrdf/dummyimpl/LiteralImpl.java#L99
>>
>>
>>                             On 27 Dec 2014 17:14, "Peter Ansell"
>>                             <ansell.peter@gmail.com> wrote:
>>                             Hi Stian,
>>
>>                             RDF-1.1 does not have the concept of plain
>>                             literals [1]. Hence, it is
>>                             difficult to map the OWL-WG-derived
>>                             rdf:PlainLiteral set to RDF-1.1,
>>                             if that is where you are coming at the issue
>>                             from [2].
>>
>>                             Cheers,
>>
>>                             Peter
>>
>>                             [1]
>>                             http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#section-Graph-Literal
>>
>>                             [2]
>>                             https://github.com/owlcs/owlapi/issues/172
>>
>>                             On 27 December 2014 at 16:37, Stian
>>                             Soiland-Reyes
>>                             <soiland-reyes@cs.manchester.ac.uk> wrote:
>>
>>                                 In
>>
>>                                 http://www.w3.org/TR/n-triples/#canonical-ntriples
>>                                 I read:
>>
>>                                     Canonical N-Triples has the
>>                                     following additional constraints on
>>                                     layout:
>>
>>                                       * The whitespace following
>>                                         subject, predicate, and object MUST
>>                                         be a single space, (U+0020). All
>>                                         other locations that allow
>>                                         whitespace MUST be empty.
>>                                       * There MUST be no comments.
>>                                       * HEX MUST use only uppercase
>>                                         letters ([A-F]).
>>                                       * Characters MUST NOT be
>>                                         represented by UCHAR.
>>                                       * Within STRING_LITERAL_QUOTE,
>>                                         only the characters U+0022, U+005C,
>>                                         U+000A, U+000D are encoded using
>>                                         ECHAR. ECHAR MUST NOT be used for
>>                                         characters that are allowed directly
>>                                         in STRING_LITERAL_QUOTE.
>>
>>
>>
>>                                 and in
>>
>>                                 http://www.w3.org/TR/n-triples/#sec-parsing-terms
>>
>>                                     If neither a language tag nor a
>>                                     datatype IRI is provided, the
>>                                     literal has a datatype of xsd:string.
>>
>>
>>
>>                                 and in
>>
>>                                 http://www.w3.org/TR/n-triples/#sec-literals
>>
>>                                     If there is no datatype IRI and no
>>                                     language tag it is a simple literal
>>                                     and the datatype is
>>
>>                                     http://www.w3.org/2001/XMLSchema#string.
>>
>>
>>                                     Example 3
>>                                     <http://example.org/show/218> <http://www.w3.org/2000/01/rdf-schema#label> "That Seventies Show"^^<http://www.w3.org/2001/XMLSchema#string> . # literal with XML Schema string datatype
>>                                     <http://example.org/show/218> <http://www.w3.org/2000/01/rdf-schema#label> "That Seventies Show" . # same as above
>>
>>
>>
>>                                 So I am not any wiser with regard to
>>                                 how to serialize plain literals
>>                                 in RDF 1.1 Canonical N-Triples.
>>
>>
>>                                 Are both of the two examples allowed in
>>                                 Canonical N-Triples? (it seems
>>                                 so by the spec.. :-( ).
>>
>>                                 Which variant should I generate?
>>
>>
>>                                 --
>>                                 Stian Soiland-Reyes, myGrid team
>>                                 School of Computer Science
>>                                 The University of Manchester
>>                                 http://soiland-reyes.com/stian/work/
>>                                 http://orcid.org/0000-0001-9842-9718
>>
>>
>>
>>                         ------------------------------------------------------------
>>                         IHMC                    (850)434 8903 home
>>                         40 South Alcaniz St.    (850)202 4416 office
>>                         Pensacola               (850)202 4440 fax
>>                         FL 32502                (850)291 0667 mobile (preferred)
>>                         phayes@ihmc.us
>>                         http://www.ihmc.us/users/phayes
>>
>
>
>
>
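
As a footnote to the parsing rule quoted in the thread ("If neither a
language tag nor a datatype IRI is provided, the literal has a datatype of
xsd:string"), the following rough sketch shows why the two lines of
Example 3 denote the same literal. The name parse_literal and the regular
expression are illustrative assumptions, not taken from any parser
discussed here; escape sequences in the lexical form are left undecoded.

```python
import re

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"

# Simplified pattern for an N-Triples literal term (an approximation of
# the grammar, for illustration only).
_LITERAL = re.compile(
    r'"(?P<lex>(?:[^"\\\n\r]|\\.)*)"'
    r'(?:@(?P<lang>[a-zA-Z]+(?:-[a-zA-Z0-9]+)*)|\^\^<(?P<dt>[^>]+)>)?'
)


def parse_literal(term):
    """Return (lexical form, datatype IRI, language tag or None)."""
    m = _LITERAL.fullmatch(term)
    if m is None:
        raise ValueError("not a literal: " + term)
    lang = m.group("lang")
    if lang is not None:
        # Language-tagged strings have the datatype rdf:langString.
        datatype = RDF_LANGSTRING
    elif m.group("dt") is not None:
        datatype = m.group("dt")
    else:
        # Neither language tag nor datatype IRI: xsd:string is implied.
        datatype = XSD_STRING
    return (m.group("lex"), datatype, lang)


typed = parse_literal(
    '"That Seventies Show"^^<http://www.w3.org/2001/XMLSchema#string>'
)
naked = parse_literal('"That Seventies Show"')
print(typed == naked)
```

Because both spellings parse to the same term, a serializer is free to pick
either one; the open question in the thread is only which one the canonical
form should use.
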
Received on Monday, 29 December 2014 19:19:21 UTC
