Re: Reversing the debate. from Andy Seaborne on 2011-09-27 (public-rdf-wg@w3.org from September 2011)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Tue, 27 Sep 2011 08:55:07 +0100
To: public-rdf-wg@w3.org
Message-ID: <4E81815B.5050506@epimorphics.com>
On 27/09/11 08:19, Jan Wielemaker wrote:
> Sandro,
>
> On 09/27/2011 07:01 AM, Sandro Hawke wrote:
>> Jan, FYI I strongly agree with your intuitions here -- perhaps not
>
> :-) When I saw the first bits of this discussion I was under the
> impression there would be a lot of debate, but in the end it would
> land around the 3a proposal. I couldn't see it otherwise ...
> Seems I was wrong :-(
>
>> surprisingly given the many long hours I've spent happily coding with
>> SWI Prolog. About two weeks ago, I was arguing this position using
>> somewhat different tactics in email; finally I spent an hour on the
>> phone in which folks -- mostly Andy and Gavin -- convinced me that while
>> this option (3a, giving us lang:en) may be architecturally appealing,
>> there are details that would require a lot of work to get right, to give
>> us something comfortable and sensible for users, and I gave up. Alas,
>> the only bit I remember right now is case sensitivity, that
>> "chat"@en="chat"@EN in SPARQL, but it's probably not practical to make
>> "chat"^^lang:en="chat"^^lang:EN in SPARQL. This puts a real (if minor)
>> problem for users up against an architectural-purity argument, and I
>
> Does it? AFAIK, XML language specifiers are indeed case insensitive,
> so what is wrong with "chat"@EN --> "chat"^^lang:en? Canonizing cannot
> be a bad idea.

The information for the language may not be in the URI string:

ex:foo owl:sameAs lang:en .
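
A rough sketch of that point, not taken from the thread: once an alias like
the one above exists, inspecting the datatype URI string no longer recovers
the language. The namespace `LANG_NS`, the URI for `ex:foo`, and both helper
functions are assumptions made up for illustration.

```python
# Hypothetical namespace for language datatypes; not defined by any spec.
LANG_NS = "http://example.org/lang/"


def tag_from_uri(uri):
    """Recover a language tag by string inspection of the datatype URI."""
    if uri.startswith(LANG_NS):
        return uri[len(LANG_NS):].lower()
    return None  # the URI string carries no language information


# Models the triple: ex:foo owl:sameAs lang:en .
same_as = {"http://example.org/foo": LANG_NS + "en"}


def tag_with_aliases(uri, aliases):
    """Follow one owl:sameAs-style alias step before inspecting the string."""
    return tag_from_uri(aliases.get(uri, uri))
```

String inspection alone returns nothing for `ex:foo`; the alias has to be
resolved first, which is inference rather than simple term comparison.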

Language matching should also involve canonicalization: e.g.

en-BU -> en-MM

RFC 4646 sec 4.4
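
A minimal sketch of what such canonicalization involves, under stated
assumptions: the one-entry `DEPRECATED_REGIONS` table stands in for the
full IANA Language Subtag Registry, and the subtag classification by length
is a simplification of the RFC's grammar. The function names are
hypothetical.

```python
# Illustrative subset only; a real implementation would load the full
# IANA Language Subtag Registry.
DEPRECATED_REGIONS = {"BU": "MM"}  # the en-BU -> en-MM example above


def canonicalize(tag):
    """Normalize case and replace deprecated region subtags (simplified)."""
    subtags = tag.split("-")
    out = [subtags[0].lower()]  # primary language subtag is lowercase
    in_extension = False
    for s in subtags[1:]:
        if len(s) == 1:
            # Singleton: everything after it is extension or private use.
            in_extension = True
        if in_extension:
            out.append(s.lower())
        elif len(s) == 4 and s.isalpha():
            out.append(s.title())  # script subtag, e.g. Hant
        elif len(s) == 2 and s.isalpha():
            region = s.upper()  # region subtag, e.g. MM
            out.append(DEPRECATED_REGIONS.get(region, region))
        else:
            out.append(s.lower())
    return "-".join(out)


def tags_match(a, b):
    """Compare two language tags after canonicalization."""
    return canonicalize(a) == canonicalize(b)
```

Under this sketch `tags_match("en-BU", "en-MM")` holds, which a plain
case-folded string comparison of the two tags would not give.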

People have spent a lot of time on Tags for Identifying Languages in ISO 
639, UN M.49 etc.  They do care.

>> don't like to be on the side against the users.
>
> As a user of a system where identity is in URIs and which provides a
> powerful mechanism to say things about URIs, I would be disappointed
> to see language identifiers (!) not being represented as URIs.
>
> Could you, Andy and Gavin get the key counter arguments together?
>

See Jeremy's message:

http://lists.w3.org/Archives/Public/public-rdf-wg/2011May/0425.html

The details of language tags don't map onto datatypes very well (e.g.
scripts, which are different lexical forms and so don't work with
sub-datatypes). It would need a significant amount of WG time to
attempt an option 3 approach.

Personally, I don't think there is a solution that respects the work
people have put into language identification in ISO 639 etc.  But if
there is, option 2* does not completely preclude option 3* as later
work, because it only adds the datatype at the root of the subtree.

And what about RDF/XML (rdf:datatype, xml:lang) for option 3?

For me, the rdf:LangString proposal (any option 2) at least gives all
literals a datatype, which is a (small) step forward, and we're changing
plain literals anyway.

 Andy
Received on Tuesday, 27 September 2011 07:55:37 UTC