
Re: Plain literals in Canonical N-triples

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 29 Dec 2014 13:38:43 -0600
Cc: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>, "public-rdf-comments@w3.org" <public-rdf-comments@w3.org>
Message-Id: <AC6C7869-79E8-47DB-8E99-3028CADF277E@ihmc.us>
To: David Booth <david@dbooth.org>

On Dec 29, 2014, at 12:52 PM, David Booth <david@dbooth.org> wrote:

> P.S. Or to put it differently, it would be harmful if anyone interpreted the existing ambiguity to be intentional.

Well, there is no actual ambiguity. In RDF 1.1, the datatype of plain literals (without a language tag) is xsd:string, unambiguously. That type URI appears explicitly in the RDF 1.1 abstract (graph) syntax, unambiguously. But the RDF specs do not define all possible surface syntaxes for RDF, and they explicitly allow a surface syntax to omit the xsd:string typing URI as a form of syntactic sugar, since it is implied in all cases, so its omission does not introduce any ambiguity.
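
To make the serializer side of this concrete, here is a minimal Python sketch. It is not taken from the specs or from any of the serializers mentioned in this thread; the names literal_to_ntriples, escape_lexical and the explicit_string flag are hypothetical, chosen only for illustration. It assumes a literal is given as a lexical form plus either a language tag or a datatype IRI, writes xsd:string literals in the "naked" form by default, and escapes only the four characters that the quoted Canonical N-Triples constraints list for ECHAR.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# The four characters (U+0022, U+005C, U+000A, U+000D) that the quoted
# Canonical N-Triples constraints say are encoded using ECHAR.
_ECHAR = {'"': '\\"', "\\": "\\\\", "\n": "\\n", "\r": "\\r"}


def escape_lexical(value):
    """Escape a lexical form for use inside STRING_LITERAL_QUOTE."""
    return "".join(_ECHAR.get(ch, ch) for ch in value)


def literal_to_ntriples(lexical, datatype=XSD_STRING, lang=None,
                        explicit_string=False):
    """Serialize one literal as an N-Triples term.

    explicit_string is a hypothetical switch: when False (the default),
    xsd:string literals are written without the ^^ suffix, since the
    datatype is implied in that case.
    """
    quoted = '"' + escape_lexical(lexical) + '"'
    if lang is not None:
        # Language-tagged literal: the tag replaces any datatype suffix.
        return quoted + "@" + lang
    if datatype == XSD_STRING and not explicit_string:
        # "Naked" form: same literal in the abstract syntax.
        return quoted
    return quoted + "^^<" + datatype + ">"


print(literal_to_ntriples("That Seventies Show"))
print(literal_to_ntriples("That Seventies Show", explicit_string=True))
```

Both printed terms denote the same literal in the abstract syntax; the sketch only chooses which surface form to emit.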

Pat

> 
> On 12/29/2014 01:36 PM, David Booth wrote:
>> FWIW, it certainly seems to me like this detail was omitted
>> unintentionally and would be helpful to include in the errata.
>> 
>> David Booth
>> 
>> On 12/29/2014 12:50 PM, Stian Soiland-Reyes wrote:
>>> OK, thank you all for recollecting! So I'll settle for the "naked"
>>> literal in output of an xsd:string.
>>> 
>>> Should this go into an errata or is it too much of a change?
>>> 
>>> On 29 Dec 2014 07:41, "Andy Seaborne" <andy@apache.org> wrote:
>>> 
>>>    On 29/12/14 06:31, Pat Hayes wrote:
>>> 
>>> 
>>>        On Dec 28, 2014, at 6:10 PM, Gregg Kellogg
>>>        <gregg@greggkellogg.com> wrote:
>>> 
>>>            On Dec 28, 2014, at 3:32 PM, Pat Hayes <phayes@ihmc.us> wrote:
>>> 
>>> 
>>> 
>>>                    On Dec 28, 2014, at 5:40 AM, Andy Seaborne
>>>                    <andy@apache.org> wrote:
>>> 
>>>                        On 28/12/14 05:04, Pat Hayes wrote:
>>> 
>>>                            On Dec 27, 2014, at 9:24 PM, Stian
>>>                            Soiland-Reyes
>>>                            <soiland-reyes@cs.manchester.ac.uk>
>>>                            wrote:
>>> 
>>>                            No, for once I am not coming from OWL :)
>>> 
>>>                            I'm just writing a simple n-triples
>>>                            serializer, and I am not sure if I should
>>>                            simply always include the type if there is
>>>                            no @lang (e.g. ^^xsd:string)
>>> 
>>> 
>>>                        It was certainly the intention of the RDF 1.1 WG
>>>                        that every literal should have a type. We even
>>>                        provided a special 'type' for the @lang case, to
>>>                        preserve this intention. It seems to me that one
>>>                        should not ever go wrong by including the
>>>                        ^^xsd:string, which was semantically correct
>>>                        even in original RDF, whereas really plain plain
>>>                        literals now have the shadow of deprecation
>>>                        hanging over them, at the very least.
>>> 
>>>                        Hope this helps.
>>> 
>>>                        Pat Hayes
>>> 
>>> 
>>>                    And for serialization, the WG intention IIRC was
>>>                    that all ^^xsd:strings should be written without the
>>>                    ^^xsd:string in all formats where possible.
>>> 
>>> 
>>>                Really? I have no recollection of that, but I may have
>>>                missed some discussions. Can you find this in the
>>>                minutes or emails anywhere?
>>> 
>>> 
>>>            I share Andy's recollection
>>> 
>>> 
>>>        OK, two is enough :-) I bow to your superior recollection, and
>>>        withdraw my implicit advice to use explicit xsd:string typing.
>>>        Apologies to all concerned.
>>> 
>>> 
>>>    I went looking (OK, a bit of looking) the first time but couldn't
>>>    find spec text except the MAY.  This discussion was over an extended
>>>    period.
>>> 
>>>    The examples for Turtle are without xsd:string (except to show they
>>>    are the same).
>>> 
>>>    From memory, the line of argument was that simple literals were
>>>    more common than explicit ^^xsd:string though the community of use
>>>    is going to be a major factor.
>>> 
>>>    Like Gregg, Jena outputs without explicit datatype as the best
>>>    choice overall.
>>> 
>>>             Andy
>>> 
>>> 
>>>        Pat
>>> 
>>>            , and that is how my serializer behaves.
>>>            Shame that the spec-text doesn't capture that.
>>> 
>>>            Gregg
>>> 
>>>                    It looks nicer.
>>> 
>>> 
>>>                Maybe, but it also can produce uncertainty, as for
>>>                example:
>>> 
>>>                "Before rdf 1.1 the norm tended to be to NOT express
>>>                xsd:string unless it really was a character-by-character
>>>                string (e.g. a genome identifier), and not when it was
>>>                human text (but in unknown or mixed language)."
>>> 
>>>                Even in RDF 1.0, plain literals were specified to be
>>>                semantically identical to xsd:string-typed literals, but
>>>                this was buried in the semantics document which nobody
>>>                read, and because the syntactic distinction was
>>>                available, people assumed it meant something. As long as
>>>                a syntax offers both choices, this misreading process
>>>                will continue to operate, even now RDF 1.1 has said
>>>                explicitly that plain literals are only syntactic sugar
>>>                for the typed version.
>>> 
>>> 
>>>                    http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal
>>>                    only says "MAY" -- that is mainly so as not to
>>>                    suggest much RDF 1.0 data output by pre-existing
>>>                    software is suddenly invalidated, which it isn't.
>>> 
>>> 
>>>                Certainly, plain literal surface syntax is not
>>>                *invalidated* by RDF 1.1. Sorry if I gave that
>>>                impression.
>>> 
>>>                Pat
>>> 
>>> 
>>> 
>>>                        Andy
>>> 
>>> 
>>> 
>>>                            ..Or if I should have a special case to
>>>                            output anything with type xsd:string as a
>>>                            classic "plain literal", e.g. no @ or ^^.
>>> 
>>>                            Surely just one of these should be in the
>>>                            canonical version? My gut says to always
>>>                            include the type for non-lang, but the spec
>>>                            is ambiguous on this - if xsd:string is
>>>                            implied, should I then prefer to generate
>>>                            this implied version?
>>> 
>>>                            Before rdf 1.1 the norm tended to be to NOT
>>>                            express xsd:string unless it really was a
>>>                            character-by-character string (e.g. a genome
>>>                            identifier), and not when it was human text
>>>                            (but in unknown or mixed language).
>>> 
>>>                            As we SHOULD be generating the Canonical
>>>                            N-Triples, then it would be good to know if
>>>                            there already is a silent de facto agreement
>>>                            that is just not expressed in the spec.
>>> 
>>>                            You might know the code base -
>>> 
>>>                            https://github.com/stain/commons-rdf/blob/tests/src/test/java/com/github/commonsrdf/dummyimpl/LiteralImpl.java#L99
>>> 
>>> 
>>>                            On 27 Dec 2014 17:14, "Peter Ansell"
>>>                            <ansell.peter@gmail.com> wrote:
>>>                            Hi Stian,
>>> 
>>>                            RDF-1.1 does not have the concept of plain
>>>                            literals [1]. Hence, it is
>>>                            difficult to map the OWL-WG-derived
>>>                            rdf:PlainLiteral set to RDF-1.1,
>>>                            if that is where you are coming at the issue
>>>                            from [2].
>>> 
>>>                            Cheers,
>>> 
>>>                            Peter
>>> 
>>>                            [1] http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/#section-Graph-Literal
>>> 
>>>                            [2] https://github.com/owlcs/owlapi/issues/172
>>> 
>>>                            On 27 December 2014 at 16:37, Stian
>>>                            Soiland-Reyes
>>>                            <soiland-reyes@cs.manchester.ac.uk>
>>>                            wrote:
>>> 
>>>                                In
>>>                                http://www.w3.org/TR/n-triples/#canonical-ntriples
>>>                                I read:
>>> 
>>>                                    Canonical N-Triples has the
>>>                                    following additional constraints on
>>>                                    layout:
>>> 
>>>                                    - The whitespace following subject,
>>>                                      predicate, and object MUST be a
>>>                                      single space, (U+0020). All other
>>>                                      locations that allow whitespace
>>>                                      MUST be empty.
>>>                                    - There MUST be no comments.
>>>                                    - HEX MUST use only uppercase
>>>                                      letters ([A-F]).
>>>                                    - Characters MUST NOT be
>>>                                      represented by UCHAR.
>>>                                    - Within STRING_LITERAL_QUOTE, only
>>>                                      the characters U+0022, U+005C,
>>>                                      U+000A, U+000D are encoded using
>>>                                      ECHAR. ECHAR MUST NOT be used for
>>>                                      characters that are allowed
>>>                                      directly in STRING_LITERAL_QUOTE.
>>> 
>>> 
>>> 
>>>                                and in
>>>                                http://www.w3.org/TR/n-triples/#sec-parsing-terms
>>> 
>>>                                    If neither a language tag nor a
>>>                                    datatype IRI is provided, the
>>>                                    literal has a datatype of xsd:string.
>>> 
>>> 
>>> 
>>>                                and in
>>>                                http://www.w3.org/TR/n-triples/#sec-literals
>>> 
>>>                                    If there is no datatype IRI and no
>>>                                    language tag it is a simple literal
>>>                                    and the datatype is
>>>                                    http://www.w3.org/2001/XMLSchema#string.
>>> 
>>> 
>>>                                    Example 3
>>>                                        <http://example.org/show/218> <http://www.w3.org/2000/01/rdf-schema#label> "That Seventies Show"^^<http://www.w3.org/2001/XMLSchema#string> . # literal with XML Schema string datatype
>>>                                        <http://example.org/show/218> <http://www.w3.org/2000/01/rdf-schema#label> "That Seventies Show" . # same as above
>>> 
>>> 
>>> 
>>>                                So I am not any wiser with regard to
>>>                                how to serialize plain literals
>>>                                in RDF 1.1 Canonical N-Triples.
>>> 
>>> 
>>>                                Are both of the two examples allowed in
>>>                                Canonical N-Triples? (it seems
>>>                                so by the spec.. :-( ).
>>> 
>>>                                Which variant should I generate?
>>> 
>>> 
>>>                                --
>>>                                Stian Soiland-Reyes, myGrid team
>>>                                School of Computer Science
>>>                                The University of Manchester
>>>                                http://soiland-reyes.com/stian/work/
>>>                                http://orcid.org/0000-0001-9842-9718
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> 
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 home
40 South Alcaniz St.            (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile (preferred)
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 29 December 2014 19:39:17 UTC
