[PROPOSAL: Turtle: EDITORIAL. Concepts: Non-ED] Re: Language Tag Case Conflict (between RDF1.1 and BCP47)

* Hong Sun <hong.sun@agfa.com> [2013-03-29 16:50+0100]
> Thanks Gavin,
> 
> You are right, it is mainly for the RDF1.1 concept.

If we're expecting that Turtle will be a lot of people's RDF HowTo,
perhaps it's worth a bit of informative text at the bottom of section
2.5.1, Quoted Literals <http://www.w3.org/TR/turtle/#turtle-literals>:

[[
Note that RDF treats language tags as case-insensitive. "中国"@zh-Hans
and "中国"@zh-hans are treated as the same node. Per BCP47
section-2.1.1, the latter does not follow the recommended case
conventions.
]]
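
For anyone who wants to see what the section-2.1.1 convention amounts to,
here is a minimal Python sketch. The function name is made up for
illustration, and it rests on one assumption: I read "occur after
singletons" as "anywhere after the first single-character subtag", which
matches the examples quoted from BCP47 further down this thread but is only
one possible reading. It does no registry lookup or well-formedness check.

```python
def bcp47_recommended_case(tag: str) -> str:
    """Re-case a language tag following the BCP47 section-2.1.1 convention.

    Hypothetical helper, not part of any RDF or Turtle specification.
    Assumption: once a singleton (one-character subtag) has been seen,
    every later subtag stays lowercase.
    """
    result = []
    seen_singleton = False
    for index, subtag in enumerate(tag.split("-")):
        if index == 0 or seen_singleton:
            # First subtag and anything after a singleton: lowercase.
            result.append(subtag.lower())
        elif len(subtag) == 2:
            # Two-letter subtags (regions) are uppercase.
            result.append(subtag.upper())
        elif len(subtag) == 4:
            # Four-letter subtags (scripts) are titlecase.
            result.append(subtag.capitalize())
        else:
            result.append(subtag.lower())
        if len(subtag) == 1:
            seen_singleton = True
    return "-".join(result)


print(bcp47_recommended_case("zh-hans"))
print(bcp47_recommended_case("EN-ca-X-CA"))
```

A tool could apply something like this when serializing, while still
comparing tags without regard to case.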

Of course, this also depends on RDF Concepts changing "and must be
normalized to lowercase." to something like ". While BCP47
section-2.1.1 specifies the appropriate case for various sub-language
forms, RDF treats as equal all variations in case. For example, a
literal "中国" with a language tag of "zh-Hans" is the same term as
that literal with a language tag of "zh-hans" or "zh-HANS"".
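
To make the proposed equality rule concrete, here is a rough Python sketch
of a language-tagged literal whose comparison ignores the case of the tag.
The class and field names are invented for illustration and are not taken
from RDF Concepts or from any particular RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True, eq=False)
class LangLiteral:
    """Hypothetical language-tagged literal: lexical form plus tag as written."""

    lexical: str
    lang: str

    def _key(self):
        # Language tags are ASCII, so lower() is enough for comparison.
        return (self.lexical, self.lang.lower())

    def __eq__(self, other):
        if not isinstance(other, LangLiteral):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Must agree with __eq__ so that sets and dicts see one term.
        return hash(self._key())


a = LangLiteral("中国", "zh-Hans")
b = LangLiteral("中国", "zh-hans")
c = LangLiteral("中国", "zh-HANS")
print(a == b == c, len({a, b, c}))
```

Each object keeps the tag exactly as it was written; only equality and
hashing fold the case.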

This nudges people in the right direction, but doesn't tell them what
case to expect if they SPARQL for lang("中国"@zh-Hans). I believe
most systems will give you the case used in the first variant that
they encountered, so this is the strongest statement we can make.
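
As an illustration of the "first variant encountered" behaviour, here is a
small Python sketch of a term table that looks literals up without regard
to tag case but hands back whichever spelling it stored first. This is a
guess at the general shape, with invented names; it is not modelled on any
specific store.

```python
class TermTable:
    """Hypothetical term table that keeps the first-seen tag spelling."""

    def __init__(self):
        self._terms = {}

    def intern(self, lexical, lang):
        # The key folds case; the stored value keeps the original spelling.
        key = (lexical, lang.lower())
        return self._terms.setdefault(key, (lexical, lang))


table = TermTable()
first = table.intern("中国", "zh-hans")
second = table.intern("中国", "zh-Hans")
print(first, second)
```

With a table like this, the tag a query gets back depends on load order,
which is why the text cannot promise a particular case.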

The I18N folks might look askance at having ill-formed language tag
examples in Concepts and Turtle, but may, if it's too late to demand
BCP47 conformance, see that as preferable to ambiguity.


> Kind regards,
> Hong
> 
> 
> 
> From:   Gavin Carothers <gavin@carothers.name>
> To:     David Wood <david@3roundstones.com>
> Cc:     Ivan Herman <ivan@w3.org>, public-rdf-comments Comments 
> <public-rdf-comments@w3.org>, Hong Sun/AXIFX/AGFA@AGFA
> Date:   03/29/2013 04:30 PM
> Subject:        Re: Language Tag Case Conflict (between RDF1.1 and BCP47)
> 
> On Fri, Mar 29, 2013 at 8:13 AM, David Wood <david@3roundstones.com> 
> wrote:
> Hi Hong Sun and Ivan,
> 
> Do I understand correctly that these comments apply to Turtle?
> 
> Editor's opinion: No, concerns about the case sensitivity, or lack of case 
> sensitivity, do not affect Turtle. Turtle is very careful to avoid doing 
> anything other than refer to RDF Concepts 1.1. The grammar allows for any 
> case of language tag segments and specifically passes the tag along as a 
> "unicode string"; no attempt to normalize the language is made as part of 
> the Turtle specification. That's up to the rest of the RDF stack. We 
> chickened out.
> 
> The only change would be to examples in the Turtle spec if the WG does 
> decide one way or the other.
> 
> Cheers,
> Gavin
>  
> 
> Regards,
> Dave
> --
> http://about.me/david_wood
> 
> 
> 
> On Mar 29, 2013, at 06:09, Ivan Herman <ivan@w3.org> wrote:
> 
> Hong Sun does not have the right credentials to send the mail to the WG 
> mailing list, so I forward this to the comment list for processing and 
> archiving!
> 
> Hong Sun, thank you.
> 
> Ivan
> 
> Begin forwarded message:
> 
> From: Hong Sun <hong.sun@agfa.com>
> Subject: [Moderator Action] Language Tag Case Conflict (between RDF1.1 and 
> BCP47)
> Date: March 29, 2013 10:43:18 GMT+01:00
> To: public-rdf-wg@w3.org
> 
> Dear All, 
> 
> I am working on processing text with language tags, but reading the RDF 1.1 
> specification, I found a conflict in choosing the case for a 
> language tag. 
> 
> In RDF1.1 
> http://www.w3.org/TR/2013/WD-rdf11-concepts-20130115/#dfn-language-tag 
> It is stated 
> """ 
> a non-empty language tag as defined by [BCP47]. The language tag must be 
> well-formed according to section 2.2.9 of [BCP47], and must be normalized 
> to lowercase. 
> """ 
> which is together with the following example: 
> show:218 show:localName "Cette Série des Années Septante"@fr-be .  # literal with a region subtag
> 
> 
> But taking a look at BCP47 
> http://tools.ietf.org/html/bcp47#section-2.2.9, it states 
> """ 
> For example, one might use a tag such as "no-QQ", where 'QQ' 
>    is one of a range of private use ISO 3166-1 codes to indicate an 
>    otherwise undefined region. 
> """ 
> 
> An even clearer recommendation is given in this document in 
> http://tools.ietf.org/html/bcp47#section-2.1.1 
> """ 
> All subtags, including extension and private 
>    use subtags, use lowercase letters with two exceptions: two-letter 
>    and four-letter subtags that neither appear at the start of the tag 
>    nor occur after singletons.  Such two-letter subtags are all 
>    uppercase (as in the tags "en-CA-x-ca" or "sgn-BE-FR") and four- 
>    letter subtags are titlecase (as in the tag "az-Latn-x-latn"). 
> """ 
> 
> In short, it seems that: 
> according to RDF1.1, we should use de-ch, 
> whereas BCP47 recommends de-CH, 
> and meanwhile RDF1.1 also states that the language tag must be well-formed 
> according to [BCP47]. 
> 
> So there is a conflict: which exactly should we use? 
> 
> 
> In addition, among the other specifications, Turtle does not care about case, 
> while N3 now also uses lowercase for subtags, e.g. de-ch. 
> 
> http://www.w3.org/2000/10/swap/grammar/n3.n3 
> """ 
> # was: "[a-zA-Z][a-zA-Z0-9]*(-[a-zA-Z0-9]+)?"; 
> langcode        cfg:matches          "[a-z]+(-[a-z0-9]+)*"; # http://www.w3.org/TR/rdf-testcases/#language 
>                 cfg:canStartWith         "a". 
> """ 
> 
> Is it possible to treat the language tag as case-insensitive, as Andy 
> Seaborne suggested in 
> http://lists.w3.org/Archives/Public/public-rdf-wg/2013Feb/0275.html ?
> 
> Thanks! 
> 
> Kind Regards,
> 
> Hong Sun | Agfa HealthCare
> Researcher | HE/Advanced Clinical Applications Research
> T  +32 3444 8108
> 
> http://www.agfahealthcare.com
> http://blog.agfahealthcare.com
> 
> 
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> FOAF: http://www.ivan-herman.net/foaf.rdf
> 

-- 
-ericP

Received on Friday, 29 March 2013 16:52:46 UTC