Re: NTriple review from Dave Beckett on 2002-11-11 (w3c-rdfcore-wg@w3.org from November 2002)

From: Dave Beckett <dave.beckett@bristol.ac.uk>
Date: Mon, 11 Nov 2002 12:31:06 +0000
To: Jeremy Carroll <jjc@hplb.hpl.hp.com>
cc: w3c-rdfcore-wg@w3.org
Message-ID: <22454.1037017866@hoth.ilrt.bris.ac.uk>

>>>Jeremy Carroll said:
> 
> >>language ::= [a-z0-9][a-z0-9-]+
> >>
> >>(delete ref to REC-xml#sec-lang-tag)
> > 
> > 
> > Why?  I guess this is incomplete since it is referring obliquely to
> > multiple changing RFCs.  Is checking this unimportant?  Is there a
> > better definition elsewhere that we could point at?
> 
> This is editorial at this point; it sounds as though we should stick
> with what you've got.
> 
> > 
> > If I used the above defn, it would be good to explain where it came
> > from.
> 
> If you ant that, then something like Graham's text

ant=want ? :)

> [[
>     The language tag is composed of one or more parts: A primary language
>     subtag and a (possibly empty) series of subsequent subtags.
> 
>     The syntax of this tag in ABNF [RFC 2234] is:
> 
>      Language-Tag = Primary-subtag *( "-" Subtag )
> 
>      Primary-subtag = 1*8ALPHA
> 
>      Subtag = 1*8(ALPHA / DIGIT)
> 
>     The productions ALPHA and DIGIT are imported from RFC 2234; they
>     denote respectively the characters A to Z in upper or lower case and
>     the digits from 0 to 9.  The character "-" is HYPHEN-MINUS (ABNF:
>     %x2D).
> ]]
> 
> is the relevant stuff from RFC 3066. XML got burnt because this was a
> change from RFC 1766, which XML initially copied.

Yes, that's what I was thinking of.
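For reference, the quoted RFC 3066 production, length limits included, fits in one regular expression. A minimal Python sketch; `is_rfc3066_tag` is a hypothetical helper, not part of any N-Triples tooling:

```python
import re

# RFC 3066 ABNF (as quoted above), transcribed to a regular expression:
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
# ALPHA and DIGIT are the ASCII ranges from RFC 2234, so they are spelled out
# here rather than using \w or str.isalpha(), which also accept non-ASCII.
_LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_rfc3066_tag(tag: str) -> bool:
    """True if the whole of tag matches the RFC 3066 Language-Tag production."""
    return _LANGUAGE_TAG.fullmatch(tag) is not None


for tag in ["en", "en-US", "sgn-BE-fr", "abcdefghi", "1en", "en-"]:
    print(tag, is_rfc3066_tag(tag))
```

The last three fail for different reasons: a 9-letter primary subtag, a digit in the primary subtag, and an empty trailing subtag.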


If you are happy with this, I'll make the change, trying to put
it in terms of the N-Triples EBNF; see below.

> In terms of N-triple syntax, a minimal change to your text would be
> 
> language ::= ( character - ( '.' | '^' | ws ) )+
> 
> to avoid the ambiguity on datatyping, keeping the comment.
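Under this minimal-change rule, a parser takes the longest run of characters after `@` that are not `.`, `^` or whitespace. A rough Python sketch, assuming (as in the N-Triples draft grammar) that `character` is printable US-ASCII #x20-#x7E and `ws` is space or tab; `scan_language` and `is_language_char` are hypothetical names:

```python
# Hypothetical helper sketching the proposed rule
#   language ::= ( character - ( '.' | '^' | ws ) )+
# Assumption: character = [#x20-#x7E] (printable US-ASCII), ws = space | tab.
EXCLUDED = {".", "^", " ", "\t"}


def is_language_char(c: str) -> bool:
    """True if c is printable US-ASCII and not '.', '^' or whitespace."""
    return "\x20" <= c <= "\x7e" and c not in EXCLUDED


def scan_language(text: str, start: int) -> tuple[str, int]:
    """Scan the longest run of language characters beginning at text[start].

    Returns (tag, end), where end is the index just past the tag.
    Raises ValueError if the run is empty, since the rule uses '+'.
    """
    end = start
    while end < len(text) and is_language_char(text[end]):
        end += 1
    if end == start:
        raise ValueError(f"empty language tag at offset {start}")
    return text[start:end], end


line = '"10"@en^^<http://example.org/dt> .'
tag, end = scan_language(line, line.index("@") + 1)
print(tag, repr(line[end:]))  # prints: en '^^<http://example.org/dt> .'
```

Stopping at `^` keeps a following `^^` datatype marker out of the tag, and stopping at `.` or whitespace keeps the triple terminator out. The rule is permissive, though: `en_GB` scans as a tag even though RFC 3066 rejects it.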

Hmm, the EBNF notation we are using (from
http://www.w3.org/TR/REC-xml#sec-notation) can't express the 1*8 length
restrictions that RFC 3066 places on the Primary-subtag and Subtag.

So at best we can have:

  language ::= [A-Za-z]+ ('-' [A-Za-z0-9]+ )*

or, if we go for lowercase only:

  language ::= [a-z]+ ('-' [a-z0-9]+ )*

I think I prefer the latter, with pointers to the RFC 3066 section
above.  The current N-Triples language definition is too far
from the definitions in RFC 3066 and its predecessors.
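To see what dropping the length limits gives up, here is a Python sketch comparing a length-bounded RFC 3066 check with the unbounded lowercase pattern `[a-z]+ ('-' [a-z0-9]+ )*` (alphabetic primary subtag and any number of subtags, as in RFC 3066, but with no 1*8 bounds). `compare` and the pattern names are hypothetical:

```python
import re

# RFC 3066 Language-Tag with its 1*8 length limits (ASCII letters and digits).
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

# Unbounded lowercase approximation: [a-z]+ ('-' [a-z0-9]+ )*
LOWERCASE_APPROX = re.compile(r"[a-z]+(?:-[a-z0-9]+)*")


def compare(tag: str) -> tuple[bool, bool]:
    """Return (accepted by RFC 3066, accepted by the lowercase approximation)."""
    return (RFC3066_TAG.fullmatch(tag) is not None,
            LOWERCASE_APPROX.fullmatch(tag) is not None)


for tag in ["en", "en-US", "en-us", "sgn-be-fr", "abcdefghi"]:
    print(tag, compare(tag))
```

The two disagree on `abcdefghi`, which only the approximation accepts, and on `en-US`, which the approximation rejects until it is lowercased. RFC 3066 treats tags as case-insensitive, so normalizing with `tag.lower()` before matching should not change a tag's meaning.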

Dave

Received on Monday, 11 November 2002 07:32:12 UTC