
Re: rfc4646. Some analysis code

From: John Cowan <cowan@ccil.org>
Date: Tue, 7 Nov 2006 14:59:00 -0500
To: Dave Pawson <dave.pawson@gmail.com>
Cc: I18N <www-international@w3.org>
Message-ID: <20061107195900.GE19712@ccil.org>

Dave Pawson scripsit:

> No problem.... but how does it fit in with the
>   langtag       = (language
>                    ["-" script]
>                    ["-" region]
>                    *("-" variant)
>                    *("-" extension)
> 
> grouping? You make it sound like an alternate? is that right?

Exactly.  You can see the new draft ABNF at
http://inter-locale.com/ID/draft-ietf-ltru-4646bis-01.html
(when replaced, it'll become -02, -03, etc. etc.)

> >"Grandfathered" is a semantic concept (the meaning of the tag cannot be
> >deduced from its parts); "irregular" a syntactic one (the tag cannot be
> >parsed into parts using the regular parsing algorithms).  All irregular
> >tags are grandfathered, but not all grandfathered tags are irregular.
> >Unfortunately this distinction was not clarified until after 4646 was
> >published.
> 
> Mmmm. I won't pretend to understand that! I guess the discussion
> was more than the resulting text :-)

Okay, let me unpack that a bit.

Most RFC 4646 language tags follow the general pattern of
language-script-region-variant, with all but the first part optional.
The ABNF makes it possible to (a) recognize a well-formed tag and
(b) take it apart into those four components.  Then you can look in
the Language Subtag Registry
at http://www.iana.org/assignments/language-subtag-registry to find out
what the various subtags mean.
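
As a rough illustration of steps (a) and (b), here is a minimal Python
sketch.  It is a simplification, not the full RFC 4646 ABNF: it ignores
extended language subtags, extensions, and private-use subtags, and the
names LANGTAG and split_tag are made up for this example.

```python
import re

# Simplified pattern: language[-script][-region]*(-variant).
# Assumption: extlang, extension, and private-use subtags are left out.
LANGTAG = re.compile(
    r"""(?P<language>[A-Za-z]{2,8})                  # primary language
        (?:-(?P<script>[A-Za-z]{4}))?                # optional 4-letter script
        (?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?       # optional region
        (?P<variants>(?:-(?:[A-Za-z0-9]{5,8}         # zero or more variants
                          |[0-9][A-Za-z0-9]{3}))*)
    """,
    re.VERBOSE,
)


def split_tag(tag):
    """Return the parts of a regular tag, or None if it does not match."""
    m = LANGTAG.fullmatch(tag)
    if m is None:
        return None
    return {
        "language": m.group("language"),
        "script": m.group("script"),
        "region": m.group("region"),
        "variants": [v for v in m.group("variants").split("-") if v],
    }


print(split_tag("zh-Hant-TW"))
print(split_tag("de-1996"))
```

Each subtag you get back would then be looked up separately in the
registry; a tag that does not fit the pattern makes split_tag return None.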

There are some exceptions, however, based on tags that were registered
before we adopted these rules.  For example, "sgn" means "sign languages"
and "US" means "in the United States", but "sgn-us" does not mean "any
sign language used in the United States"; it means the specific sign
language called "American Sign Language".  A tag like this has the
regular form, but its meaning is grandfathered.  You can recognize a
tag like this using the ABNF, but if you try to understand its meaning
piece by piece, you get the wrong answer.  All such tags are listed in
the Language Subtag Registry.

Furthermore, some of the grandfathered tags are also irregular: they don't
match the language-script-region-variant pattern at all, and you cannot
take them apart.  "i-hak" is an example of this: it means "Hakka Chinese".
These tags are also listed in the Language Subtag Registry.
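
In practice this suggests checking the whole tag against the list of
grandfathered tags before trying to interpret it piece by piece.  A
minimal Python sketch of that lookup order follows; the table holds only
the two examples from this message (the real list is in the registry),
and the names GRANDFATHERED and interpret are hypothetical.

```python
# Illustrative subset only: the authoritative list is the
# Language Subtag Registry, not this table.
GRANDFATHERED = {
    "sgn-us": "American Sign Language",
    "i-hak": "Hakka Chinese",
}


def interpret(tag):
    """Look the whole tag up first; only then fall back to its parts."""
    key = tag.lower()  # language tags are case-insensitive
    if key in GRANDFATHERED:
        return ("grandfathered", GRANDFATHERED[key])
    # Not grandfathered: the subtags can be interpreted one by one.
    return ("regular", key.split("-"))


print(interpret("sgn-US"))
print(interpret("i-hak"))
print(interpret("en-US"))
```

The order matters: "sgn-us" would parse happily as language plus region,
so the whole-tag lookup has to come first to get the right meaning.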

> One more if I may.
> 
> Same topic.
> 
> grandfathered = 1*3ALPHA 1*2("-" (2*8alphanum))
>                   ; grandfathered registration
>                   ; Note: i is the only singleton
>                   ; that starts a grandfathered tag
> 
> Why wasn't it
>        i 1*2ALPHA ......
> Any particular reason?

That's not what the comment is telling you.  It says that various
2ALPHA and 3ALPHA subtags can begin a grandfathered tag, but the only
1ALPHA subtag ("singleton") that can begin one is "i".  Anyhow,
that doesn't matter if you just use the list of 17 irregular tags
directly and don't worry about this definition.
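
To make the reading of that production concrete, here is a rough Python
translation of the quoted rule into a regular expression.  The name
GRANDFATHERED_RE is made up, and this covers only the syntax of that one
production, not the rest of the grammar.

```python
import re

# grandfathered = 1*3ALPHA 1*2("-" (2*8alphanum))
# One to three letters, then one or two subtags of 2 to 8 alphanumerics.
GRANDFATHERED_RE = re.compile(r"[A-Za-z]{1,3}(?:-[A-Za-z0-9]{2,8}){1,2}")

for tag in ("i-hak", "en-GB-oed", "sgn-BE-fr", "x-foo", "i", "abcd-xx"):
    ok = GRANDFATHERED_RE.fullmatch(tag) is not None
    print(tag, "matches" if ok else "does not match")
```

Note that the pattern on its own accepts any single leading letter (for
example "x-foo"); the restriction to "i" lives only in the comment, which
is one more reason to match against the explicit list instead.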

-- 
Take two turkeys, one goose, four               John Cowan
cabbages, but no duck, and mix them             http://www.ccil.org/~cowan
together. After one taste, you'll duck          cowan@ccil.org
soup the rest of your life.
        --Groucho
Received on Tuesday, 7 November 2006 19:59:17 GMT
