Re: Reversing the debate. from Sandro Hawke on 2011-09-27 (public-rdf-wg@w3.org from September 2011)

From: Sandro Hawke <sandro@w3.org>
Date: Tue, 27 Sep 2011 01:01:37 -0400
To: Jan Wielemaker <J.Wielemaker@vu.nl>
Cc: Pat Hayes <phayes@ihmc.us>, RDF Working Group WG <public-rdf-wg@w3.org>
Message-ID: <1317099697.364.39.camel@waldron>

Jan, FYI I strongly agree with your intuitions here -- perhaps not
surprisingly given the many long hours I've spent happily coding with
SWI Prolog.  About two weeks ago, I was arguing this position using
somewhat different tactics in email; finally I spent an hour on the
phone in which folks -- mostly Andy and Gavin -- convinced me that while
this option (3a, giving us lang:en) may be architecturally appealing,
there are details that would require a lot of work to get right, to give
us something comfortable and sensible for users, and I gave up.  Alas,
the only bit I remember right now is case sensitivity, that
"chat"@en="chat"@EN in SPARQL, but it's probably not practical to make
"chat"^^lang:en="chat"^^lang:EN in SPARQL.  This puts a real (if minor)
problem for users up against an architectural-purity argument, and I
don't like to be on the side against the users.
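
A minimal sketch of the case-sensitivity mismatch, not a SPARQL
implementation. It assumes literals are modelled as plain pairs and that
the language datatypes live under a made-up base IRI (LANG_BASE is
hypothetical; no prefix was ever settled on). Language tags compare
case-insensitively, while datatype IRIs are compared as ordinary strings:

```python
# Hypothetical base IRI for language datatypes; no such prefix was agreed.
LANG_BASE = "http://example.org/lang/"


def tagged_equal(a, b):
    """Compare (lexical, language tag) pairs; tags are case-insensitive."""
    (value_a, tag_a), (value_b, tag_b) = a, b
    return value_a == value_b and tag_a.lower() == tag_b.lower()


def typed_equal(a, b):
    """Compare (lexical, datatype IRI) pairs; the IRI is matched exactly."""
    return a == b


# "chat"@en versus "chat"@EN
same_tagged = tagged_equal(("chat", "en"), ("chat", "EN"))

# "chat"^^lang:en versus "chat"^^lang:EN
same_typed = typed_equal(("chat", LANG_BASE + "en"), ("chat", LANG_BASE + "EN"))
```

Here same_tagged comes out true and same_typed false, which is the gap a
user would trip over.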

     -- Sandro


On Mon, 2011-09-26 at 22:09 +0200, Jan Wielemaker wrote:
> Hi Pat,
> 
> On 09/26/2011 07:34 PM, Pat Hayes wrote:
> > Perhaps the best way to resolve this interminable debate would be to
> > start from the other end. Rather than implementors pointing out the
> > horribleness of various proposals, could we have a list of what
> > implementors would consider to be the least objectionable behavior? I
> 
> I fear there is no single obvious consensus among implementors :-(
> 
> > myself have no idea why "xxx@lll" is so very much worse than "xxx"
> > paired with the datatype langbase:tag, but I am quite willing to be
> 
> Nice challenge. It strikes me as odd, which might be more of an
> intuition than science. We already have a two-dimensional space,
> consisting of a value and a datatype on one hand and a very similar
> two-dimensional space consisting of a value (string) and a language tag.
> Fortunately, this is not a three-dimensional space, but just a
> two-dimensional one because all language-tagged strings are (implicitly) of
> type xsd:string (or some rdf:langString subtype). I.e., there is no such
> thing as "1.0"@en^^xsd:float vs. "1,0"@nl^^xsd:float (oops, forget that;
> I see TONS of WORMS ...).
> 
> It is clear that we want to support operations mostly on the plain
> string value, such as search and comparison. That is, I don't want a
> search for @ to succeed on "foo"@en. Also, I don't want "foo"@en to be
> lexically smaller than "foo"@nl. So, from an implementation point of
> view, I probably want to maintain the two-dimensional space where the
> value ("foo") is separated from the tag [1]. It would make me very happy
> if the 2.67 dimensional space (value + datatype/language tag/nothing) is
> reduced to a simple two-dimensional space: value+URL. Changing "foo"
> into "foo"^^xsd:string is a good step here. Changing "foo"@en into
> "foo"^^lang:en would be a nice and consistent second step, putting all
> literals in a nice two-dimensional space without exceptions.
> 
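The value+URL model described above can be sketched roughly as follows.
This is an illustration under stated assumptions, not anyone's actual
store: LANG_BASE is a hypothetical prefix, and normalize() is a made-up
helper that folds plain, typed, and language-tagged literals into a
(value, datatype IRI) pair.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
# Hypothetical prefix for language datatypes; none has been decided.
LANG_BASE = "http://example.org/lang/"


def normalize(value, datatype=None, lang=None):
    """Fold the three literal shapes into one (value, datatype IRI) pair."""
    if datatype is not None and lang is not None:
        raise ValueError("a literal has a datatype or a language tag, not both")
    if lang is not None:
        return (value, LANG_BASE + lang)   # "foo"@en  -> "foo"^^lang:en
    if datatype is not None:
        return (value, datatype)           # already in value+URL form
    return (value, XSD_STRING)             # "foo"     -> "foo"^^xsd:string


literals = [
    normalize("foo", lang="nl"),
    normalize("foo", lang="en"),
    normalize("bar"),
]

# Searching looks only at the value, so "@" does not match "foo"@en.
hits = [lit for lit in literals if "@" in lit[0]]

# Ordering uses only the value, so the tag never decides which "foo" is first.
ordered = sorted(literals, key=lambda lit: lit[0])
```

Because the tag never enters the value, search and comparison need no
string surgery to get at "foo".
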
> In addition, I believe that having a URL for a language opens some
> nice opportunities to model relations between languages in RDF.
> 
> > told that there is a consensus among implementors that this is so (or
> > whatever in fact is the consensus) and then I am sure I can design an
> > RDF modification which will realize that desired behavior and have a
> > reasonably coherent semantics.
> >
> > I would however observe that as tagged literals are exceptional, and
> 
> In what sense exceptional?  I think there are lots of use-cases where
> language tags play a vital role.
> 
> > as we are proposing to make some kind of change to the existing spec,
> > that *some* amount of change to existing code might have to be
> > contemplated. If no changes are allowed at all to any existing
> > deployed code, the WG should just pack up now, define RDF2 to be the
> > same as RDF1 and declare its business done. We all have other things
> > to do, I am sure.
> 
> That is far too sceptical for me :-) Just map @tag into ^^lang:tag and
> define some more mappings for related SPARQL constructs and I think that
> everything is much more orthogonal and simple.  Yes, we will have infinite
> debates on the relations between @en, @en-US, @en-GB, etc., but we won't
> be able to resolve these anyway.  Just declare these out of the scope of
> this working group.  At least we provide whoever wants to model languages
> with URLs about which they can make statements.
> 
>  Cheers --- Jan
> 
> [1] I'm part of the camp where operations on strings are considered
> both slow and dangerous ...  Data processing systems should try to
> avoid looking into strings as much as possible.
> 
> > Pat
> >
> > On Sep 26, 2011, at 4:51 AM, Jan Wielemaker wrote:
> >
> >> On 09/26/2011 11:28 AM, Richard Cyganiak wrote:
> >>> You understate the issues.
> >>>
> >>> Every existing application that uses the Literal.getLexicalForm()
> >>> call of some API to get at the xxx part of xxx@lll would have to
> >>> be changed, because the lexical form of xxx@lll is now xxx@lll.
> >>>
> >>> That's a complete non-starter.
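
A rough sketch of why callers would break. The function names are
hypothetical stand-ins for an API's lexical-form accessor, not any real
library: one returns what such a call yields today, the other what it
would yield if the tag became part of the lexical form.

```python
def lexical_form_today(value, lang):
    """Current behaviour: the lexical form of "xxx"@lll is just "xxx"."""
    return value


def lexical_form_with_tag(value, lang):
    """Hypothetical behaviour if the tag is folded into the lexical form."""
    return f"{value}@{lang}"


def strip_tag(lexical):
    """What every caller would then need: cut at the last "@", if any."""
    value, sep, _tag = lexical.rpartition("@")
    return value if sep else lexical


recovered = strip_tag(lexical_form_with_tag("xxx", "lll"))

# The same helper mangles an untagged string that merely contains "@".
mangled = strip_tag("jan@example.org")
```

The second call returns "jan", so the workaround is only safe when the
caller already knows the literal carried a tag.
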
> >>
> >> I fully agree. Also note that APIs for (notably in-core) RDF stores
> >> can now typically work on a single shared representation of the
> >> literal. If we add a tag to the literal many of the operations will
> >> have to create a copy without the tag. I'm not saying this cannot
> >> be solved, but I fear it will be neither natural nor pretty, especially for
> >> existing stores that did not anticipate this in their design
> >> phase.
> >>
> >> I must admit that I'm only following this from the sideline. As an
> >> implementor I'm starting to get worried about some wild ideas
> >> though. The solution I still like best is that "foo"@tag is the same
> >> as "foo"^^langbase:tag, where langbase is some to-be-decided prefix
> >> for language identifiers.  Any implementation should be fairly
> >> comfortable with that (typically it will just simplify things).
> >>
> >> I understand things get complicated if we want to attach semantics
> >> to these datatypes, so I'd propose not to do that. Most likely
> >> others will make an attempt.
> >>
> >> Regards --- Jan
> >>
> >
> > ------------------------------------------------------------
> > IHMC                      (850)434 8903 or (650)494 3973
> > 40 South Alcaniz St.      (850)202 4416   office
> > Pensacola                 (850)202 4440   fax
> > FL 32502                  (850)291 0667   mobile
> > phayesAT-SIGNihmc.us      http://www.ihmc.us/users/phayes
Received on Tuesday, 27 September 2011 05:01:51 UTC