Re: Proposed NTriples changes for literal notation from Graham Klyne on 2002-03-08 (w3c-rdfcore-wg@w3.org from March 2002)

From: Graham Klyne <Graham.Klyne@MIMEsweeper.com>
Date: Fri, 08 Mar 2002 18:06:13 +0000
To: Dave Beckett <dave.beckett@bristol.ac.uk>
Cc: RDF Core <w3c-rdfcore-wg@w3.org>
Message-Id: <5.1.0.14.2.20020308180042.00a3b140@joy.songbird.com>

Dave,

Generally this looks good.  I have a couple of nits and comments, but don't 
feel strongly about the resolution:

At 05:38 PM 3/8/02 +0000, Dave Beckett wrote:

>This closes action:
>   2002-02-26#12  DaveB  Propose n-triples changes to represent the
>                         new form of rdf literals.
>   in F2F minutes http://www.w3.org/2001/sw/RDFCore/20020225-f2f/
>
>During the F2F I was actioned to come up with a syntax to support the
>structured literal form we agreed.  I discussed with
>Dan Connolly what we could do, and we agreed something like the
>following was sufficient:
>
>   xml("<b>foo</b>")              XML content, no language
>   xml("<b>foo</b>", "en")        XML content, language given "en"
>
>   "chat"                         Unicode string, no language
>   "chat"-en                      Unicode string, language given as "en"

It would feel more consistent to me to have:

   xml("<b>foo</b>"-en)        XML content, language given "en"
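
As a rough sketch of why the suffix form reads as more consistent: the
language tag attaches to the quoted string the same way in both cases, and
xml(...) simply wraps the result.  The function name format_literal and its
parameters are hypothetical, and the escaping shown is a simplification
(backslash and double quote only), not the full N-Triples escaping rules.

```python
from typing import Optional


def format_literal(value: str,
                   language: Optional[str] = None,
                   is_xml: bool = False) -> str:
    """Serialize a literal using the suffix form for the language tag.

    Hypothetical helper for illustration; not part of any N-Triples spec.
    """
    # Simplified escaping: backslash first, then double quote.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    text = f'"{escaped}"'
    if language:
        # The language tag is appended the same way for plain and XML content.
        text += f"-{language}"
    # XML content just wraps the (possibly tagged) string.
    return f"xml({text})" if is_xml else text


print(format_literal("chat"))
print(format_literal("chat", "en"))
print(format_literal("<b>foo</b>", "en", is_xml=True))
```

With this shape there is a single rule for attaching a language, rather than
one rule for strings and a second, argument-style rule for XML content.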

>Features:
>   * Makes all existing literals legal

Good!!!

>   * Provides only one way to encode the literal-structures
>     and so in that sense is canonical.

Also good - simple-minded applications may still do string comparison, right?

>In order to try this out, I've implemented the above in my N-Triples
>parser, and it works just fine.
>
>Issues:
>   1. "chat"-en might not be good enough if languages can contain
>      whitespace or other things (I need to check the RFCs)
>    Solution if this is needed:
>      "chat"-"en"

From RFC 3066:

[[[
2.1 Language tag syntax

    The language tag is composed of one or more parts: A primary language
    subtag and a (possibly empty) series of subsequent subtags.

    The syntax of this tag in ABNF [RFC 2234] is:

     Language-Tag = Primary-subtag *( "-" Subtag )

     Primary-subtag = 1*8ALPHA

     Subtag = 1*8(ALPHA / DIGIT)
]]]

so on that basis, quotes are not needed.
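
For illustration, the ABNF quoted above translates directly into a regular
expression; since it admits only ASCII letters, digits and "-", a tag can
never contain whitespace or a quote character.  This is a minimal sketch:
the names LANGUAGE_TAG and is_language_tag are made up here, and it checks
syntax only, not whether a tag is registered.

```python
import re

# RFC 3066 section 2.1:
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_language_tag(tag: str) -> bool:
    """Return True if tag matches the Language-Tag production as a whole."""
    return LANGUAGE_TAG.fullmatch(tag) is not None


for candidate in ["en", "en-GB", "x-klingon", "", "en us", "abcdefghi"]:
    print(repr(candidate), is_language_tag(candidate))
```

Note that the empty string fails the check, because the primary subtag needs
at least one letter.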

>   2. Want one way to describe all literal structures:
>     Solution:
>       literal(unicode string value, unicode string language, boolean isXML)
>     and define the abbreviated forms in terms of that

I don't see any value in this.

>   3. I assume "chat" != "chat"-"" (need to check language RFCs)
>     Solution if this is needed:
>       Restrict the language string to always 1+ chars

According to RFC 3066, a language tag may not be empty, so this case 
shouldn't arise.  Even so, I think it would be consistent with suggestions 
for xml:lang to have a blank tag value mean no language tag.  Then:

     "chat" == "chat"-""
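
One way to picture that equivalence is to normalize an empty language string
to "no language" before comparing.  This is only a sketch under that
assumption; the Literal type and normalize function are hypothetical names,
not anything defined by the proposal.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Hypothetical in-memory form of a plain literal."""
    value: str
    language: Optional[str] = None


def normalize(lit: Literal) -> Literal:
    # Assumption: an empty language string means "no language tag".
    return Literal(lit.value, lit.language or None)


plain = Literal("chat")
blank = Literal("chat", "")
english = Literal("chat", "en")

print(normalize(plain) == normalize(blank))
print(normalize(plain) == normalize(english))
```

Without the normalization step the first comparison would be false, since an
empty string and a missing tag are distinct values.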

#g


------------------------------------------------------------
Graham Klyne                    MIMEsweeper Group
Strategic Research              <http://www.mimesweeper.com>
<Graham.Klyne@MIMEsweeper.com>

Received on Friday, 8 March 2002 13:07:13 UTC