- From: Hong Sun <hong.sun@agfa.com>
- Date: Tue, 2 Apr 2013 10:46:20 +0200
- To: eric@w3.org
- Cc: David Wood <david@3roundstones.com>, "Eric Prud'hommeaux" <ericw3c@gmail.com>, gavin@carothers.name, Ivan Herman <ivan@w3.org>, public-rdf-comments Comments <public-rdf-comments@w3.org>, jjc@syapse.com
- Message-ID: <OF191FDF83.9AD29A20-ONC1257B41.002BECC9-C1257B41.003030AC@agfa.com>
From Eric,
>Of course, this also depends on RDF Concepts changing "and must be
>normalized to lowercase." to something like ". While BCP47
>section-2.1.1 specifies the appropriate case for various sub-language
>forms, RDF treats as equal all variations in case. For example, a
>literal "中国" with a language tag of "zh-Hans" is the same term as
>that literal with a language tag of "zh-hans" or "zh-HANS"".

--Indeed, my confusion comes from "The language tag must be well-formed according to section 2.2.9 of [BCP47], and must be normalized to lowercase". If the wording can be adapted along the lines Eric suggested, there will be no confusion.

From Jeremy,
>Crucially, the result of comparing two language tags should not be sensitive to the case of the original input.
>My view, is that the answer to Hong Sun is that if it is important to him for other reasons to leave the country code in upper case, then he should,
>and he can get correct RDF 1.1 conformant operation by systematic case normalizing language tags using any case normalization he so chooses including
>that from BCP47. RDF 1.0 is clear on that point, and to the extent that RDF 1.1 does not allow that, it is a defect.

--It is not important to me whether the tag is lower case or upper case; I just want to make sure that when I convert data to RDF, it follows the right format.

Following what you stated, can I draw the following conclusions?
1. "text"@en-GB is equal to "text"@en-gb
2. It is better to use "text"@en-gb in RDF 1.1

Thanks!

kind regards,
Hong

From: Eric Prud'hommeaux <eric@w3.org>
To: Hong Sun/AXIFX/AGFA@AGFA
Cc: gavin@carothers.name, David Wood <david@3roundstones.com>, Ivan Herman <ivan@w3.org>, public-rdf-comments Comments <public-rdf-comments@w3.org>
Date: 03/29/2013 05:52 PM
Subject: [PROPOSAL: Turtle: EDITORIAL. Concepts: Non-ED] Re: Language Tag Case Conflict (between RDF1.1 and BCP47)
Sent by: "Eric Prud'hommeaux" <ericw3c@gmail.com>

* Hong Sun <hong.sun@agfa.com> [2013-03-29 16:50+0100]
> Thanks Gavin,
>
> You are right, it is mainly for the RDF1.1 concept.

if we're expecting that Turtle will be a lot of people's RDF HowTo,
perhaps it's worth a bit of informative text in the bottom of
2.5.1 Quoted Literals <http://www.w3.org/TR/turtle/#turtle-literals>
[[
Note that RDF treats language tags as case-insensitive. "中国"@zh-Hans
and "中国"@zh-hans are treated as the same node. Per BCP47
section-2.1.1, the latter is not a correct representation.
]]

Of course, this also depends on RDF Concepts changing "and must be
normalized to lowercase." to something like ". While BCP47
section-2.1.1 specifies the appropriate case for various sub-language
forms, RDF treats as equal all variations in case. For example, a
literal "中国" with a language tag of "zh-Hans" is the same term as
that literal with a language tag of "zh-hans" or "zh-HANS"".

This nudges people in the right direction, but doesn't tell them what
case to expect if they SPARQL for lantag("中国"@zh-Hans). I believe
most systems will give you the case used in the first variant that
they encountered so this is the strongest statement we can make.

The I18N folks might look askance at having ill-formed language tag
examples in Concepts and Turtle, but may, if it's too late to demand
BCP47 conformance, see that as preferable to ambiguity.

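To make the proposed wording concrete, here is a minimal Python sketch of term equality under that rule: lexical forms are compared exactly, language tags without regard to case. The Literal type and the same_term function are hypothetical names used only for illustration; they are not taken from any RDF library or from the specifications.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    """Hypothetical stand-in for a language-tagged RDF literal."""
    lexical: str
    lang: Optional[str] = None  # None means "no language tag"


def same_term(a: Literal, b: Literal) -> bool:
    """Compare lexical forms exactly and language tags case-insensitively."""
    if a.lexical != b.lexical:
        return False
    if a.lang is None or b.lang is None:
        # Equal only when both literals lack a language tag.
        return a.lang is b.lang
    # Language tags are ASCII, so str.lower() is sufficient for comparison.
    return a.lang.lower() == b.lang.lower()
```

Under this comparison "中国"@zh-Hans, "中国"@zh-hans and "中国"@zh-HANS are one term. The sketch only compares; it leaves the stored case of each tag untouched.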
> Kind regards,
> Hong
>
>
> From: Gavin Carothers <gavin@carothers.name>
> To: David Wood <david@3roundstones.com>
> Cc: Ivan Herman <ivan@w3.org>, public-rdf-comments Comments
> <public-rdf-comments@w3.org>, Hong Sun/AXIFX/AGFA@AGFA
> Date: 03/29/2013 04:30 PM
> Subject: Re: Language Tag Case Conflict (between RDF1.1 and BCP47)
>
>
> On Fri, Mar 29, 2013 at 8:13 AM, David Wood <david@3roundstones.com>
> wrote:
> Hi Hong Sun and Ivan,
>
> Do I understand correctly that these comments apply to Turtle?
>
> Editors option: No, concerns about the case sensitivity, or lack of case
> sensitivity do not affect Turtle. Turtle is very careful to avoid doing
> anything other than refer to RDF Concepts 1.1. The grammar allows for any
> case of language tag segments and specifically passes the tag along as a
> "unicode string"; no attempt to normalize the language is made as part of
> the Turtle specification. That's up to the rest of the RDF stack. We
> chickened out.
>
> The only change would be to examples in the Turtle if the WG does decide
> one way or the other.
>
> Cheers,
> Gavin
>
>
> Regards,
> Dave
> --
> http://about.me/david_wood
>
>
> On Mar 29, 2013, at 06:09, Ivan Herman <ivan@w3.org> wrote:
>
> Hong Sun does not have the right credentials to send the mail to the WG
> mailing list, so I forward this to the comment list for processing and
> archiving!
>
> Hong Sun, thank you.
>
> Ivan
>
> Begin forwarded message:
>
> From: Hong Sun <hong.sun@agfa.com>
> Subject: [Moderator Action] Language Tag Case Conflict (between RDF1.1 and
> BCP47)
> Date: March 29, 2013 10:43:18 GMT+01:00
> To: public-rdf-wg@w3.org
>
> Dear All,
>
> I am working on processing text with language tags, but reading the RDF 1.1
> specification, I found there is a conflict in choosing the case for a
> language tag.
>
> In RDF1.1
> http://www.w3.org/TR/2013/WD-rdf11-concepts-20130115/#dfn-language-tag
> It is stated
> """
> a non-empty language tag as defined by [BCP47]. The language tag must be
> well-formed according to section 2.2.9 of [BCP47], and must be normalized
> to lowercase.
> """
> which is together with the following example:
> show:218 show:localName "Cette Série des Années Septante"@fr-be . # literal with a region subtag
>
>
> But taking a look at BCP47
> http://tools.ietf.org/html/bcp47#section-2.2.9 , it states
> """
> For example, one might use a tag such as "no-QQ", where 'QQ'
> is one of a range of private use ISO 3166-1 codes to indicate an
> otherwise undefined region.
> """
>
> An even clearer recommendation is given in this document in
> http://tools.ietf.org/html/bcp47#section-2.1.1
> """
> All subtags, including extension and private
> use subtags, use lowercase letters with two exceptions: two-letter
> and four-letter subtags that neither appear at the start of the tag
> nor occur after singletons. Such two-letter subtags are all
> uppercase (as in the tags "en-CA-x-ca" or "sgn-BE-FR") and four-
> letter subtags are titlecase (as in the tag "az-Latn-x-latn").
> """
>
> In short, it seems that:
> according to RDF1.1, we should use de-ch,
> in BCP47, it recommends to use de-CH,
> and meanwhile RDF1.1 also states language tag must be well-formed
> according to [BCP47].
>
> So now there is a conflict, and which exactly should we use?
>
>
> In addition, in the other specifications, Turtle does not care about the
> case, while N3 now also uses lower case for sub-tag, e.g. de-ch.
>
> http://www.w3.org/2000/10/swap/grammar/n3.n3
> """
> # was: "[a-zA-Z][a-zA-Z0-9]*(-[a-zA-Z0-9]+)?";
> langcode cfg:matches "[a-z]+(-[a-z0-9]+)*"; # http://www.w3.org/TR/rdf-testcases/#language
>     cfg:canStartWith "a".
> """
>
> Is it possible to treat the language tag as case-insensitive? As Andy
> Seaborne suggested in
> http://lists.w3.org/Archives/Public/public-rdf-wg/2013Feb/0275.html
>
> Thanks!
>
> Kind Regards,
>
> Hong Sun | Agfa HealthCare
> Researcher | HE/Advanced Clinical Applications Research
> T +32 3444 8108
>
> http://www.agfahealthcare.com
> http://blog.agfahealthcare.com
> Click on link to read important disclaimer:
> http://www.agfahealthcare.com/maildisclaimer
>
>
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> FOAF: http://www.ivan-herman.net/foaf.rdf
>

--
-ericP
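
For anyone who wants to emit the case recommended in BCP47 section 2.1.1 while still comparing tags case-insensitively, the formatting rule quoted in the thread can be sketched in Python roughly as follows. bcp47_case is a hypothetical helper, not an existing library function. It does not validate tags against the registry, and it reads "after singletons" as "immediately following a one-character subtag", which is an assumption on my part.

```python
def bcp47_case(tag: str) -> str:
    """Apply the BCP47 section 2.1.1 case conventions to a language tag.

    Hypothetical helper: formats only, performs no well-formedness check.
    """
    formatted = []
    after_singleton = False  # True when the previous subtag was one character
    for index, subtag in enumerate(tag.split("-")):
        if index > 0 and not after_singleton and len(subtag) == 2:
            formatted.append(subtag.upper())       # e.g. region: "ch" -> "CH"
        elif index > 0 and not after_singleton and len(subtag) == 4:
            formatted.append(subtag.capitalize())  # e.g. script: "hans" -> "Hans"
        else:
            formatted.append(subtag.lower())
        after_singleton = len(subtag) == 1
    return "-".join(formatted)
```

For example, bcp47_case("de-ch") gives "de-CH" and bcp47_case("ZH-HANS") gives "zh-Hans". Lowercasing both tags remains enough for the equality test, whichever case is chosen for output.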
Received on Tuesday, 2 April 2013 08:46:44 UTC