Re: Reversing the debate. from Jan Wielemaker on 2011-09-26 (public-rdf-wg@w3.org from September 2011)

From: Jan Wielemaker <J.Wielemaker@vu.nl>
Date: Mon, 26 Sep 2011 22:09:48 +0200
To: Pat Hayes <phayes@ihmc.us>
CC: RDF Working Group WG <public-rdf-wg@w3.org>
Message-ID: <4E80DC0C.4010303@vu.nl>

Hi Pat,

On 09/26/2011 07:34 PM, Pat Hayes wrote:
> Perhaps the best way to resolve this interminable debate would be to
> start from the other end. Rather than implementors pointing out the
> horribleness of various proposals, could we have a list of what
> implementors would consider to be the least objectionable behavior? I

I fear there is no single obvious consensus among implementors :-(

> myself have no idea why "xxx@lll" is so very much worse than "xxx"
> paired with the datatype langbase:tag, but I am quite willing to be

Nice challenge. It strikes me as odd, though that may be more intuition
than science. We already have a two-dimensional space consisting of a
value and a datatype on the one hand, and a very similar two-dimensional
space consisting of a value (string) and a language tag on the other.
Fortunately, together these do not form a three-dimensional space but
just a two-dimensional one, because all language-tagged strings are
(implicitly) of type xsd:string (or some rdf:langString subtype). I.e.,
there is no such thing as "1.0"@en^^xsd:float vs. "1,0"@nl^^xsd:float
(oops, forget that; I see TONS of WORMS ...).

It is clear that we want to support operations mostly on the plain
string value, such as search and comparison. That is, I don't want a
search for @ to succeed on "foo"@en. Also, I don't want "foo"@en to be
lexically smaller than "foo"@nl. So, from an implementation point of
view, I probably want to maintain the two-dimensional space where the
value ("foo") is separated from the tag [1]. It would make me very happy
if the 2.67-dimensional space (value + datatype/language tag/nothing)
were reduced to a simple two-dimensional space: value+URL. Changing "foo"
into "foo"^^xsd:string is a good first step here. Changing "foo"@en into
"foo"^^lang:en would be a nice and consistent second step, putting all
literals in a nice two-dimensional space without exceptions.
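
To make the value+URL idea concrete, here is a minimal Python sketch, not a proposal for any real API. The names Literal, normalize and LANG_BASE are made up for illustration, and the namespace behind the lang: prefix is a placeholder, since no such prefix has been decided.

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
# Hypothetical namespace for the "lang:" prefix; nothing like it is defined yet.
LANG_BASE = "http://example.org/lang/"


class Literal(NamedTuple):
    """A literal as a plain pair: lexical value + datatype URL."""
    value: str
    datatype: str


def normalize(value: str,
              datatype: Optional[str] = None,
              lang: Optional[str] = None) -> Literal:
    """Map the three surface forms onto a single (value, URL) pair.

    "foo"        -> ("foo", xsd:string)
    "foo"^^<dt>  -> ("foo", <dt>)
    "foo"@en     -> ("foo", lang:en)
    """
    if lang is not None and datatype is not None:
        raise ValueError("a literal has either a datatype or a language tag")
    if lang is not None:
        # Language tags are case-insensitive, so fold them to lower case.
        return Literal(value, LANG_BASE + lang.lower())
    return Literal(value, datatype if datatype is not None else XSD_STRING)


english = normalize("foo", lang="en")
dutch = normalize("foo", lang="nl")

# The tag never leaks into the value, so a search for "@" finds nothing
# and the two values compare as equal strings.
print("@" in english.value)          # False
print(english.value == dutch.value)  # True
```

Search and comparison then only ever touch the value field, while the tag sits in the second component next to the ordinary datatypes.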

In addition, I believe that having a URL for a language opens some
nice opportunities to model relations between languages in RDF.

> told that there is a consensus among implementors that this is so (or
> whatever in fact is the consensus) and then I am sure I can design an
> RDF modification which will realize that desired behavior and have a
> reasonably coherent semantics.
>
> I would however observe that as tagged literals are exceptional, and

In what sense exceptional?  I think there are lots of use-cases where
language tags play a vital role.

> as we are proposing to make some kind of change to the existing spec,
> that *some* amount of change to existing code might have to be
> contemplated. If no changes are allowed at all to any existing
> deployed code, the WG should just pack up now, define RDF2 to be the
> same as RDF1 and declare its business done. We all have other things
> to do, I am sure.

That is far too sceptical for me :-) Just map @tag into ^^lang:tag and
define some more mappings for related SPARQL constructs, and I think
everything becomes much more orthogonal and simple.  Yes, we will have
endless debates on the relations between @en, @en-US, @en-GB, etc., but we
won't be able to resolve these anyway.  Just declare them out of scope for
this working group.  At least we provide whoever wants to model languages
with URLs about which they can make statements.
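
As a rough sketch of what one such mapping for a SPARQL construct could look like: the language tag can be read back from the datatype URL, returning the empty string when there is none, which is what SPARQL's LANG() does for untagged literals. The function name lang_of and the LANG_BASE namespace are assumptions for illustration only.

```python
# Hypothetical namespace for the "lang:" prefix; an assumption, not a standard.
LANG_BASE = "http://example.org/lang/"


def lang_of(datatype: str) -> str:
    """Recover the language tag from a datatype URL.

    Returns the tag for URLs under LANG_BASE and "" for anything else.
    """
    if datatype.startswith(LANG_BASE):
        return datatype[len(LANG_BASE):]
    return ""


print(lang_of(LANG_BASE + "en-gb"))
print(repr(lang_of("http://www.w3.org/2001/XMLSchema#string")))
```

Relations such as en-GB being a variant of en are deliberately not handled here; they would be statements about the URLs themselves.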

	Cheers --- Jan

[1] I'm part of the camp where operations on strings are considered
both slow and dangerous ...  Data processing systems should try to
avoid looking into strings as much as possible.

> Pat
>
> On Sep 26, 2011, at 4:51 AM, Jan Wielemaker wrote:
>
>> On 09/26/2011 11:28 AM, Richard Cyganiak wrote:
>>> You understate the issues.
>>>
>>> Every existing application that uses the Literal.getLexicalForm()
>>> call of some API to get at the xxx part of xxx@lll would have to
>>> be changed, because the lexical form of xxx@lll is now xxx@lll.
>>>
>>> That's a complete non-starter.
>>
>> I fully agree. Also note that APIs for (notably in-core) RDF stores
>> can now typically work on a single shared representation of the
>> literal. If we add a tag to the literal many of the operations will
>> have to create a copy without the tag. I'm not saying this cannot
>> be solved, but I fear it will be neither natural nor pretty, especially for
>> existing stores that did not anticipate this in their design
>> phase.
>>
>> I must admit that I'm only following this from the sidelines. As an
>> implementor I'm starting to get worried about some wild ideas
>> though. The solution I still like best is that "foo"@tag is the same
>> as "foo"^^langbase:tag, where langbase is some to-be-decided prefix
>> for language identifiers.  Any implementation should be fairly
>> comfortable with that (typically it will just simplify things).
>>
>> I understand things get complicated if we want to attach semantics
>> to these datatypes, so I'd propose not to do that. Most likely
>> others will make an attempt.
>>
>> Regards --- Jan
>>
>
> ------------------------------------------------------------
> IHMC                    (850)434 8903 or (650)494 3973
> 40 South Alcaniz St.    (850)202 4416   office
> Pensacola               (850)202 4440   fax
> FL 32502                (850)291 0667   mobile
> phayesAT-SIGNihmc.us    http://www.ihmc.us/users/phayes
>
Received on Monday, 26 September 2011 20:10:31 UTC