Re: Reversing the debate. from Jan Wielemaker on 2011-09-27 (public-rdf-wg@w3.org from September 2011)

From: Jan Wielemaker <J.Wielemaker@vu.nl>
Date: Tue, 27 Sep 2011 09:19:23 +0200
To: Sandro Hawke <sandro@w3.org>
CC: Pat Hayes <phayes@ihmc.us>, RDF Working Group WG <public-rdf-wg@w3.org>
Message-ID: <4E8178FB.8010406@vu.nl>

Sandro,

On 09/27/2011 07:01 AM, Sandro Hawke wrote:
> Jan, FYI I strongly agree with your intuitions here -- perhaps not

:-)  When I saw the first bits of this discussion I was under the
impression there would be a lot of debate, but in the end it would
land around the 3a proposal.  I couldn't see it otherwise ...
Seems I was wrong :-(

> surprisingly given the many long hours I've spent happily coding with
> SWI Prolog.  About two weeks ago, I was arguing this position using
> somewhat different tactics in email; finally I spent an hour on the
> phone in which folks -- mostly Andy and Gavin -- convinced me that while
> this option (3a, giving us lang:en) may be architecturally appealing,
> there are details that would require a lot of work to get right, to give
> us something comfortable and sensible for users, and I gave up.  Alas,
> the only bit I remember right now is case sensitivity, that
> "chat"@en="chat"@EN in SPARQL, but it's probably not practical to make
> "chat"^^lang:en="chat"^^lang:EN in SPARQL.  This puts a real (if minor)
> problem for users up against an architectural-purity argument, and I

Does it?  AFAIK, XML language specifiers are indeed case insensitive,
so what is wrong with "chat"@EN --> "chat"^^lang:en?  Canonicalizing
cannot be a bad idea.
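
A minimal sketch of that canonicalization, assuming a hypothetical
namespace http://example.org/lang/ behind the lang: prefix (no such base
has been agreed on in this thread) and assuming lower case as the
canonical form. The function name is made up for illustration.

```python
# Hypothetical namespace for the lang: prefix; nothing like it has been agreed.
LANG_BASE = "http://example.org/lang/"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def to_typed_literal(value, lang_tag=None):
    """Map a plain or language-tagged literal onto a (value, datatype IRI) pair."""
    if lang_tag is None:
        # "foo" becomes "foo"^^xsd:string
        return (value, XSD_STRING)
    # Language tags compare case-insensitively, so fold the case once,
    # before minting the IRI; "chat"@EN and "chat"@en then coincide.
    return (value, LANG_BASE + lang_tag.lower())
```

With this mapping, equality of "chat"@en and "chat"@EN is ordinary
equality of the resulting pairs; no case-insensitive IRI comparison is
needed at query time.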

> don't like to be on the side against the users.

As a user of a system where identity is in URIs and which provides a
powerful mechanism to say things about URIs, I would be disappointed
to see language identifiers (!) not being represented as URIs.
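
As a rough illustration of what "saying things about" a language URI
could look like, here is a sketch that uses plain tuples as triples.
Both namespaces and the variantOf property are hypothetical, invented
only for this example.

```python
# Both namespaces and the property name below are hypothetical.
LANG = "http://example.org/lang/"
EX = "http://example.org/vocab#"

# Statements about languages, written as (subject, predicate, object).
triples = {
    (LANG + "en-us", EX + "variantOf", LANG + "en"),
    (LANG + "en-gb", EX + "variantOf", LANG + "en"),
    (LANG + "nl-be", EX + "variantOf", LANG + "nl"),
}


def variants_of(language_iri, graph):
    """Return, sorted, the language IRIs declared as variants of language_iri."""
    return sorted(
        s for (s, p, o) in graph
        if p == EX + "variantOf" and o == language_iri
    )
```

Whether such relations are worth standardizing is a separate question;
the point is only that a URI gives them something to attach to.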

Could you, Andy and Gavin get the key counter arguments together?

 --- Jan

>
>       -- Sandro
>
>
> On Mon, 2011-09-26 at 22:09 +0200, Jan Wielemaker wrote:
>> Hi Pat,
>>
>> On 09/26/2011 07:34 PM, Pat Hayes wrote:
>>> Perhaps the best way to resolve this interminable debate would be to
>>> start from the other end. Rather than implementors pointing out the
>>> horribleness of various proposals, could we have a list of what
>>> implementors would consider to be the least objectionable behavior? I
>>
>> I fear there is no single obvious consensus among implementors :-(
>>
>>> myself have no idea why "xxx@lll" is so very much worse than "xxx"
>>> paired with the datatype langbase:tag, but I am quite willing to be
>>
>> Nice challenge. It strikes me as odd, which might be more of an
>> intuition than science. We already have a two-dimensional space,
>> consisting of a value and a datatype on one hand and a very similar
>> two-dimensional space consisting of a value (string) and a language tag.
>> Fortunately, this is not a three dimensional space, but just a two
>> dimensional one because all language tagged strings are (implicitly) of
>> type xsd:string (or some rdf:langString subtype). I.e., there is no such
>> thing as "1.0"@en^^xsd:float vs. "1,0"@nl^^xsd:float (oops, forget that;
>> I see TONS of WORMS ...).
>>
>> It is clear that we want to support operations mostly on the plain
>> string value, such as search and comparison. That is, I don't want a
>> search for @ to succeed on "foo"@en. Also, I don't want "foo"@en to be
>> lexically smaller than "foo"@nl. So, from an implementation point of
>> view, I probably want to maintain the two-dimensional space where the
>> value ("foo") is separated from the tag [1]. It would make me very happy
>> if the 2.67 dimensional space (value + datatype/language tag/nothing) is
>> reduced to a simple two-dimensional space: value+URL. Changing "foo"
>> into "foo"^^xsd:string is a good step here. Changing "foo"@en into
>> "foo"^^lang:en would be a nice and consistent second step, putting all
>> literals in a nice two-dimensional space without exceptions.
>>
>> In addition, I believe that having a URL for a language opens some
>> nice opportunities to model relations between languages in RDF.
>>
>>> told that there is a consensus among implementors that this is so (or
>>> whatever in fact is the consensus) and then I am sure I can design an
>>> RDF modification which will realize that desired behavior and have a
>>> reasonably coherent semantics.
>>>
>>> I would however observe that as tagged literals are exceptional, and
>>
>> In what sense exceptional?  I think there are lots of use-cases where
>> language tags play a vital role.
>>
>>> as we are proposing to make some kind of change to the existing spec,
>>> that *some* amount of change to existing code might have to be
>>> contemplated. If no changes are allowed at all to any existing
>>> deployed code, the WG should just pack up now, define RDF2 to be the
>>> same as RDF1 and declare its business done. We all have other things
>>> to do, I am sure.
>>
>> That is far too sceptical for me :-) Just map @tag into ^^lang:tag and
>> define some more mappings for related SPARQL constructs and I think that
>> everything is much more orthogonal and simple.  Yes, we will have infinite
>> debates on the relations between @en, @en-US, @en-GB, etc., but we won't
>> be able to resolve these anyway.  Just declare these out of the scope of
>> this working group.  At least we provide whoever wants to model languages
>> with URLs about which they can make statements.
>>
>>  Cheers --- Jan
>>
>> [1] I'm part of the camp where operations on strings are considered
>> both slow and dangerous ...  Data processing systems should try to
>> avoid looking into strings as much as possible.
>>
>>> Pat
>>>
>>> On Sep 26, 2011, at 4:51 AM, Jan Wielemaker wrote:
>>>
>>>> On 09/26/2011 11:28 AM, Richard Cyganiak wrote:
>>>>> You understate the issues.
>>>>>
>>>>> Every existing application that uses the Literal.getLexicalForm()
>>>>> call of some API to get at the xxx part of xxx@lll would have to
>>>>> be changed, because the lexical form of xxx@lll is now xxx@lll.
>>>>>
>>>>> That's a complete non-starter.
>>>>
>>>> I fully agree. Also note that APIs for (notably in-core) RDF stores
>>>> can now typically work on a single shared representation of the
>>>> literal. If we add a tag to the literal many of the operations will
>>>> have to create a copy without the tag. I'm not saying this cannot
>>>> be solved, but I fear it will be neither natural nor pretty, especially for
>>>> existing stores that did not anticipate this in their design
>>>> phase.
>>>>
>>>> I must admit that I'm only following this from the sideline. As an
>>>> implementor I'm starting to get worried about some wild ideas
>>>> though. The solution I still like best is that foo@tag is the same
>>>> as "foo"^^langbase:tag, where langbase is some to be decided prefix
>>>> for language identifiers.  Any implementation should be fairly
>>>> comfortable with that (typically it will just simplify things).
>>>>
>>>> I understand things get complicated if we want to attach semantics
>>>> to these datatypes, so I'd propose not to do that. Most likely
>>>> others will make an attempt.
>>>>
>>>> Regards --- Jan
>>>>
>>>>
>>>>
>>>>
>>>
>>> ------------------------------------------------------------
>>> IHMC                     (850)434 8903 or (650)494 3973
>>> 40 South Alcaniz St.     (850)202 4416   office
>>> Pensacola                (850)202 4440   fax
>>> FL 32502                 (850)291 0667   mobile
>>> phayesAT-SIGNihmc.us     http://www.ihmc.us/users/phayes
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
Received on Tuesday, 27 September 2011 07:20:06 UTC