Re: Fwd: Language Tag Case Conflict (between RDF1.1 and BCP47) from Andy Seaborne on 2014-01-20 (public-rdf-wg@w3.org from January 2014)

From: Andy Seaborne <andy@apache.org>
Date: Mon, 20 Jan 2014 13:52:57 +0000
To: public-rdf-wg@w3.org
Message-ID: <52DD2A39.8050301@apache.org>
We have finalized the text except for clearly editorial fixes; we have
done the testing and gathered reports.  Any change that is visible by
anyone's reading invalidates that work, and we would have to go round
the cycle from Last Call (LC) again.

The proposed text is a change to RDF - it moves language tag equality 
from the value space to the stored form (it connects the MAY text to the 
MUST NOT text).
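
To illustrate the distinction, here is a minimal sketch in Python. It is not text from either spec: the names LangLiteral, same_value and same_stored_form are hypothetical, and the model assumes a store that keeps the tag exactly as supplied while comparing tags in lower case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LangLiteral:
    """Hypothetical language-tagged literal; the tag is kept as supplied."""
    lexical: str
    lang: str

    def value(self):
        # Value-space view: the language tag is always compared in lower case.
        return (self.lexical, self.lang.lower())


def same_value(a: LangLiteral, b: LangLiteral) -> bool:
    """Equality in the value space (case of the tag is ignored)."""
    return a.value() == b.value()


def same_stored_form(a: LangLiteral, b: LangLiteral) -> bool:
    """Equality of the stored form (character by character)."""
    return (a.lexical, a.lang) == (b.lexical, b.lang)


a = LangLiteral("colour", "en-GB")
b = LangLiteral("colour", "en-gb")
print(same_value(a, b), same_stored_form(a, b))
```

Under this model the two literals are equal in the value space but differ in stored form; the proposed MUST NOT would place a requirement on the second comparison as well.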

However good the idea is, it's too late (and I wanted to insist on BCP47
normalization!).

	Andy


On 20/01/14 10:13, Richard Cyganiak wrote:
> Taking this to the WG list.
>
> From a technical point of view, this is a reasonable comment and might be worth considering; at least a pointer to the BCP 47 normalisation strategy would be helpful for implementers.
>
> But from a process point of view, the window for RDF 1.1 might be closed? What options still exist at this point?
>
> Best,
> Richard
>
>
>
> Begin forwarded message:
>
>> Resent-From: public-rdf-comments@w3.org
>> From: "Vladimir Alexiev" <vladimir.alexiev@ontotext.com>
>> Subject: Re: Language Tag Case Conflict (between RDF1.1 and BCP47)
>> Date: 20 January 2014 09:02:20 GMT
>> To: <public-rdf-comments@w3.org>
>> Reply-To: <vladimir.alexiev@ontotext.com>
>>
>> RDF1.1 talks about lowercasing of language tags:
>> http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal
>> "Lexical representations of language tags may be converted to lower case. The value space of language tags is always in lower case."
>>
>> Normalizing tags makes them easier to compare. Lowercasing is a CHEAP way to normalize. However, there is a BETTER way, I'll call it BCP47-normalization:
>>
>> BCP47 2.1.1. "Formatting of Language Tags" says:
>> "Although case distinctions do not carry meaning in language tags, consistent formatting and presentation of language tags will aid users.  The format of subtags in the registry is RECOMMENDED as the form to use in language tags.  This format generally corresponds to the common conventions for the various ISO standards from which the subtags are derived."
>> and goes on to describe that:
>> - script codes are in title case (e.g. 'Cyrl' Cyrillic).
>> - country codes are in upper case (e.g. 'MN' Mongolia).
>>
>> I posted an algorithm to do BCP47-normalization (in Perl) in Sep 2013:
>> https://rt.cpan.org/Public/Ticket/Attachment/1267147/670949/lang_normalize.pl
>>
>> There are issues posted against these implementations:
>> - Sesame RIO:
>> https://openrdf.atlassian.net/browse/SES-1659
>> https://openrdf.atlassian.net/browse/SES-1999
>> - perl RDF::Trine::Node::Literal:
>> https://rt.cpan.org/Public/Bug/Display.html?id=88964
>> - Note: Jena appears to store the lang tag as provided, which IMHO is better than storing as lowercase:
>> http://grepcode.com/file_/repo1.maven.org/maven2/org.apache.jena/jena-core/2.11.0/com/hp/hpl/jena/graph/impl/LiteralLabelImpl.java/?v=source
>>
>> I therefore propose to change the above RDF1.1 text to:
>>
>> "Lexical representations of language tags MAY be normalized, according to BCP47 section 2.1.1. "Formatting of Language Tags" (country codes in upper case, script codes capitalized, the rest in lower case).
>> Language tags MAY also be normalized by converting all to lower case, but BCP47 normalization is preferred.
>> No matter which method is chosen, the semantics of language tags MUST NOT depend on case.
>> In particular, implementations MUST NOT store as separate statements, two statements that differ only by the case of language tags."
>>
>> I also propose to say this about SPARQL lang()
>> http://www.w3.org/TR/rdf-sparql-query/#func-lang
>> "lang() MAY normalize the language tag as described in RDF 1.1 Concepts and Abstract Syntax sec 3.3 Literals.
>> It is recommended that lang() normalizes the literal according to BCP47 section 2.1.1, and not by converting it all to lower case."
>>
>> Best regards!
>> --
>> Vladimir Alexiev, PhD, PMP
>> Lead, Data and Ontology Management Group
>> Ontotext Corp, www.ontotext.com
>> Sirma Group Holding, www.sirma.com
>> Email: vladimir.alexiev@ontotext.com, skype:valexiev1
>> Mobile: +359 888 568 132, SMS: 359888568132@sms.mtel.net
>> Landline: +359 (988) 106 084, Fax: +359 (2) 975 3226
>> Calendar: https://www.google.com/calendar/embed?src=vladimir%40sirma.bg
>>
>>
>>
>>
>
>
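
The BCP47 section 2.1.1 formatting convention quoted above can be sketched without access to the registry. This is a minimal Python sketch and is not the Perl script linked in the message. The function name bcp47_format is hypothetical. It assumes a syntactically valid tag, and it assumes that every subtag following the first singleton (such as "x" or "u") stays in lower case.

```python
def bcp47_format(tag: str) -> str:
    """Apply BCP47 2.1.1 case conventions to a well-formed language tag.

    Assumption: input is already a syntactically valid tag; no validation
    or registry lookup is performed.
    """
    out = []
    seen_singleton = False  # assumption: lower-case everything after a singleton
    for i, subtag in enumerate(tag.split("-")):
        low = subtag.lower()
        if i > 0 and not seen_singleton and len(low) == 2:
            out.append(low.upper())        # region subtag, e.g. "MN"
        elif i > 0 and not seen_singleton and len(low) == 4:
            out.append(low.capitalize())   # script subtag, e.g. "Cyrl"
        else:
            out.append(low)                # language, variants, extensions
        if len(low) == 1:
            seen_singleton = True
    return "-".join(out)


for raw in ("mn-cyrl-mn", "EN-ca-X-CA", "sgn-be-fr"):
    print(raw, "->", bcp47_format(raw))
```

Because case carries no meaning in language tags, two tags can still be compared by lower-casing both sides, whichever form is stored.
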
Received on Monday, 20 January 2014 13:53:29 UTC