
Fwd: Language Tag Case Conflict (between RDF1.1 and BCP47)

From: Ivan Herman <ivan@w3.org>
Date: Fri, 29 Mar 2013 11:09:03 +0100
Cc: Hong Sun <hong.sun@agfa.com>
Message-Id: <2AABBD49-44C7-4BF1-90AC-847B22AF1CE7@w3.org>
To: public-rdf-comments Comments <public-rdf-comments@w3.org>
Hong Sun does not have the credentials to post to the WG mailing list, so I am forwarding this message to the comments list for processing and archiving.

Hong Sun, thank you.

Ivan

Begin forwarded message:

> From: Hong Sun <hong.sun@agfa.com>
> Subject: [Moderator Action] Language Tag Case Conflict (between RDF1.1 and BCP47)
> Date: March 29, 2013 10:43:18 GMT+01:00
> To: public-rdf-wg@w3.org
> 
> Dear All, 
> 
> I am working on processing text with language tags. While reading the RDF 1.1 specification, I found a conflict over which case to use for a language tag. 
> 
> The RDF1.1 Concepts draft at 
> http://www.w3.org/TR/2013/WD-rdf11-concepts-20130115/#dfn-language-tag 
> defines a language tag as 
> """ 
> a non-empty language tag as defined by [BCP47]. The language tag must be well-formed according to section 2.2.9 of [BCP47], and must be normalized to lowercase. 
> """ 
> and it is accompanied by the following example: 
> show:218 show:localName "Cette Série des Années Septante"@fr-be .  # literal with a region subtag 
> 
> 
> However, BCP47, at 
> http://tools.ietf.org/html/bcp47#section-2.2.9 , states 
> """ 
> For example, one might use a tag such as "no-QQ", where 'QQ' 
>    is one of a range of private use ISO 3166-1 codes to indicate an 
>    otherwise undefined region. 
> """ 
> 
> An even clearer recommendation is given in the same document, at 
> http://tools.ietf.org/html/bcp47#section-2.1.1 
> """ 
> All subtags, including extension and private 
>    use subtags, use lowercase letters with two exceptions: two-letter 
>    and four-letter subtags that neither appear at the start of the tag 
>    nor occur after singletons.  Such two-letter subtags are all 
>    uppercase (as in the tags "en-CA-x-ca" or "sgn-BE-FR") and four- 
>    letter subtags are titlecase (as in the tag "az-Latn-x-latn"). 
> """ 
> 
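The casing convention quoted above from BCP47 section 2.1.1 can be sketched in a few lines of Python. This is only an illustration, not part of either specification: the function name `recommended_case` is hypothetical, and the sketch assumes that "after singletons" means every subtag following the first one-character subtag stays lowercase. It does not check that the tag is well-formed.

```python
def recommended_case(tag: str) -> str:
    """Apply the casing convention quoted from BCP47 section 2.1.1.

    Assumption: once a singleton (one-character subtag) has been seen,
    all later subtags are left in lowercase.
    """
    parts = []
    seen_singleton = False
    for index, subtag in enumerate(tag.split("-")):
        lowered = subtag.lower()
        if index == 0 or seen_singleton:
            # First subtag and anything after a singleton: lowercase.
            parts.append(lowered)
        elif len(lowered) == 2:
            # Two-letter subtags (typically regions): uppercase.
            parts.append(lowered.upper())
        elif len(lowered) == 4:
            # Four-letter subtags (typically scripts): titlecase.
            parts.append(lowered.capitalize())
        else:
            parts.append(lowered)
        if len(lowered) == 1:
            seen_singleton = True
    return "-".join(parts)


def rdf11_draft_case(tag: str) -> str:
    """Lowercase normalization as required by the quoted RDF1.1 draft text."""
    return tag.lower()


print(recommended_case("de-ch"))   # BCP47-style casing
print(rdf11_draft_case("de-CH"))   # RDF1.1 draft-style casing
```

For the same input the two functions give different spellings (de-CH versus de-ch), which is exactly the mismatch described in this message.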
> In short, it seems that: 
> according to RDF1.1, we should use de-ch, 
> whereas BCP47 recommends de-CH, 
> and meanwhile RDF1.1 also states that a language tag must be well-formed according to [BCP47]. 
> 
> So there is a conflict. Which form exactly should we use? 
> 
> 
> In addition, among the other specifications, Turtle does not restrict the case, while N3 now also uses lowercase for subtags, e.g. de-ch. 
> 
> http://www.w3.org/2000/10/swap/grammar/n3.n3 
> """ 
> # was: "[a-zA-Z][a-zA-Z0-9]*(-[a-zA-Z0-9]+)?"; 
> langcode        cfg:matches          "[a-z]+(-[a-z0-9]+)*"; # http://www.w3.org/TR/rdf-testcases/#language 
>                 cfg:canStartWith         "a". 
> """ 
> 
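To see what the change in the N3 grammar means in practice, the two `langcode` patterns quoted above can be tried against a few tags. This is a rough sketch: the constant and function names are hypothetical, and it assumes the pattern must match the whole tag, which may not be exactly how an N3 tokenizer applies it.

```python
import re

# Patterns copied from the quoted n3.n3 grammar.
OLD_LANGCODE = re.compile(r"[a-zA-Z][a-zA-Z0-9]*(-[a-zA-Z0-9]+)?")
NEW_LANGCODE = re.compile(r"[a-z]+(-[a-z0-9]+)*")


def accepted_by(pattern: re.Pattern, tag: str) -> bool:
    """Return True if the whole tag matches the given langcode pattern."""
    return pattern.fullmatch(tag) is not None


for tag in ("de-ch", "de-CH", "az-latn-x-latn"):
    print(tag, accepted_by(OLD_LANGCODE, tag), accepted_by(NEW_LANGCODE, tag))
```

Under that assumption, the current pattern rejects the BCP47-style spelling de-CH but accepts any number of lowercase subtags, while the older pattern accepted mixed case but at most one subtag after the first.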
> Is it possible to treat the language tag as case-insensitive, as Andy Seaborne suggested in http://lists.w3.org/Archives/Public/public-rdf-wg/2013Feb/0275.html ? 
> 
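A case-insensitive treatment, as asked about above, might look roughly like the following. It is a minimal sketch under stated assumptions, not how any RDF library is known to implement it: the names `same_language_tag` and `literal_key` are hypothetical, and it relies on language tags being restricted to ASCII letters, digits and hyphens, so that `str.lower()` is a sufficient comparison key.

```python
def same_language_tag(a: str, b: str) -> bool:
    """Compare two language tags while ignoring case."""
    return a.lower() == b.lower()


def literal_key(lexical_form: str, language_tag: str) -> tuple:
    """Build a comparison key for a language-tagged literal.

    The lexical form is kept as written; only the tag is folded to
    lowercase, so the original spelling of the tag can still be stored
    and serialized separately.
    """
    return (lexical_form, language_tag.lower())


a = literal_key("Cette Série des Années Septante", "fr-be")
b = literal_key("Cette Série des Années Septante", "fr-BE")
print(same_language_tag("de-ch", "de-CH"), a == b)
```

With this approach "fr-be" and "fr-BE" identify the same literal, whichever spelling the author used.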
> Thanks! 
> 
> Kind Regards,
> 
> Hong Sun | Agfa HealthCare
> Researcher | HE/Advanced Clinical Applications Research
> T  +32 3444 8108
> 
> http://www.agfahealthcare.com
> http://blog.agfahealthcare.com
> Click on link to read important disclaimer: http://www.agfahealthcare.com/maildisclaimer


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Friday, 29 March 2013 10:09:28 UTC
