Re: adding {}s to grammar to address I18N-ISSUE-189

On Sun, Oct 7, 2012 at 5:32 AM, Eric Prud'hommeaux <eric@w3.org> wrote:
> The LC grammar includes a LANGTAG production
>
>   [144s] LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
>
> which doesn't match the one in BCP 47
>
>          obs-language-tag = primary-subtag *( "-" subtag )
>          primary-subtag   = 1*8ALPHA
>          subtag           = 1*8(ALPHA / DIGIT)
>
> (Basically, Turtle is too liberal in what it permits in a LANGTAG.)
> The proposal from I18N was to reference
>   http://tools.ietf.org/html/bcp47#section-2.1
> which could mean one of:
>
> 1 remove the production rule and include instead (coarsely) href the bcp47 defn.
> 2 preserve our production and href the bcp47 rule informatively
> 3 preserve our production and href the bcp47 rule normatively
> 4 align our production and href the bcp47 rule normatively
>
> I've mocked up #4 in the editor's draft (my pref). See the last
> sentence of
> http://www.w3.org/2011/rdf-wg/wiki/I18n-Comments#189:_.5BS.5D_reference_obs-language-tag_instead_of_defining_your_own
> for all the links.

None of 1-4 improves the state of language tag parsing in Turtle.
Testing for a valid language tag takes more than a grammar: the tag
must be compared to the complete registration list and must be a
legal composition. Even the lower bar of testing for a well-formed
language tag requires a much more complex grammar. All of these
options would simply add complexity without any real gain to anyone.
RDF Concepts already requires, with a MUST no less, that "The language
tag must be well-formed according to section 2.2.9"; these additions
to Turtle aren't enough to do that. Either we need to go all the way
and specify the exhaustive grammar for well-formedness, or leave this
alone and let something upstream of the parser confirm
well-formedness.
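
To make the gap concrete, here is a rough Python sketch comparing the
two productions quoted above. The regexes are my own transcription of
the LC LANGTAG rule and of obs-language-tag (with the '@' prefix added
so the two can be compared directly), and the helper name `matches` is
just for illustration. The "@en-a" case assumes my reading of the
BCP 47 langtag production, where a singleton must be followed by at
least one subtag of 2 to 8 characters.

```python
import re

# Turtle LC grammar: LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
TURTLE_LANGTAG = re.compile(r"@[a-zA-Z]+(-[a-zA-Z0-9]+)*")

# BCP 47 obs-language-tag = 1*8ALPHA *( "-" 1*8(ALPHA / DIGIT) ),
# with a leading '@' added here (an assumption, for comparison only).
OBS_LANGTAG = re.compile(r"@[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def matches(pattern, text):
    """Return True if the whole string matches the given pattern."""
    return pattern.fullmatch(text) is not None


# Sample tags: two ordinary ones, one with an 11-letter primary subtag,
# and one ending in a bare singleton.
for tag in ["@en", "@en-US", "@abcdefghijk", "@en-a"]:
    print(tag, matches(TURTLE_LANGTAG, tag), matches(OBS_LANGTAG, tag))
```

Aligning the production (option 4) rejects "@abcdefghijk", which the
current rule accepts, but both patterns still accept "@en-a", so
neither one establishes well-formedness by itself.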

--Gavin


> --
> -ericP
>

Received on Sunday, 7 October 2012 14:35:37 UTC