Re: example of options 3 & 4 simplifying code (ACTION-86) from Sandro Hawke on 2011-09-09 (public-rdf-wg@w3.org from September 2011)

From: Sandro Hawke <sandro@w3.org>
Date: Fri, 09 Sep 2011 11:19:02 -0400
To: Richard Cyganiak <richard@cyganiak.de>
Cc: Pat Hayes <phayes@ihmc.us>, RDF Working Group WG <public-rdf-wg@w3.org>, Ivan Herman <ivan@w3.org>
Message-ID: <1315581542.2095.299.camel@waldron>
On Fri, 2011-09-09 at 07:01 +0200, Richard Cyganiak wrote:
> On 8 Sep 2011, at 23:01, Sandro Hawke wrote:
> > My argument is about who bears the cost, not whether there is a cost.
> > Language tagging is complicated; I just don't want people to have to be
> > aware of it *at all* until/unless they are using it or writing complete
> > general-purpose libraries.
> 
> I don't understand this.
> 
> So let's assume a scenario where everyone just uses untagged string literals.
> 
> Who in this scenario are the people bearing the unnecessary cost from the i18n support of RDF?

The people who have to write or understand the extra branches in my
example code, or pass the additional language-tag parameter.  And, more
importantly, anyone who tries to understand RDF and must learn about
language tagging, because they have to pay attention to it whenever
dealing with any literals.

I think it's easier to understand a very-simple RDF with a library of
datatypes -- some of which get complicated -- than a
somewhat-more-complicated RDF.

> >> Also, your proposal seems to be motivated by a desire to reduce the total number of distinct parts in a literal from three to two. Why not go further and reduce them to one? Surely that would be superior to your proposal by your own metric?
> > 
> > To quote Occam and/or Einstein: Things should be as simple as possible,
> > but not simpler.
> 
> They already are as simple as possible. Thanks for “proving” my point!
> 
> >> Also, "foo"@en and "foo"@EN and "foo"@eN are all the same literal in Turtle, SPARQL and N-Triples. Would "foo"^^rdfl:en, "foo"^^rdfl:EN and "foo"^^rdfl:eN be the same or different in your proposal?
> > 
> > It's like "0.1"^^xs:decimal and "00.1"^^xs:decimal, I think.  
> 
> No. "0.1"^^xsd:decimal and "00.1"^^xsd:decimal are *distinct* in the abstract syntax. "foo"@en and "foo"@EN are the *same* in the abstract syntax.
> 
> rdfl:en and rdfl:EN are *distinct* URIs in the abstract syntax.
> 
> So either you end up with a situation where language tags are sometimes case sensitive and sometimes not. Or you end up with a situation where IRIs are sometimes case sensitive and sometimes not.

I'm proposing the former.  I'm proposing that language tag case
insensitivity become optional in RDF, exactly like removing leading
zeros is.   That is, I'm proposing it be handled in the datatype
reasoning, instead of in the parser.
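A minimal sketch of what "handled in the datatype reasoning, instead of in the parser" could look like. It assumes Option 3 encodes a tag as a datatype IRI under the `http://www.w3.org/ns/lang/` namespace used in the example code later in this thread. Term equality compares IRIs exactly, so `"foo"^^lang:en` and `"foo"^^lang:EN` stay distinct terms, just as `"0.1"^^xs:decimal` and `"00.1"^^xs:decimal` do. An optional value-level comparison then folds the tag's case, the same way decimal reasoning ignores leading zeros. The `Literal` type and the function names are hypothetical.

```python
from decimal import Decimal
from typing import NamedTuple

XS = "http://www.w3.org/2001/XMLSchema#"
LANG = "http://www.w3.org/ns/lang/"  # assumed Option 3 namespace for language tags


class Literal(NamedTuple):
    """Hypothetical two-part literal under Option 3: lexical form + datatype IRI."""
    lexical: str
    datatype: str


def same_term(a: Literal, b: Literal) -> bool:
    """Abstract-syntax identity: lexical forms and datatype IRIs compared exactly."""
    return a == b


def value(lit: Literal):
    """Map a literal to a comparable value, or None if this sketch has no rule for it."""
    if lit.datatype == XS + "decimal":
        return Decimal(lit.lexical)  # "00.1" and "0.1" both become Decimal("0.1")
    if lit.datatype.startswith(LANG):
        tag = lit.datatype[len(LANG):]
        return (lit.lexical, tag.lower())  # tags compared case-insensitively
    return None


def same_value(a: Literal, b: Literal) -> bool:
    """Equality after datatype reasoning; unknown datatypes fall back to term identity."""
    va, vb = value(a), value(b)
    if va is None or vb is None:
        return same_term(a, b)
    return va == vb


en = Literal("foo", LANG + "en")
EN = Literal("foo", LANG + "EN")
print(same_term(en, EN), same_value(en, EN))  # False True

d1 = Literal("0.1", XS + "decimal")
d2 = Literal("00.1", XS + "decimal")
print(same_term(d1, d2), same_value(d1, d2))  # False True
```

In this sketch a parser never touches the tag's case; only code that opts into `same_value` needs to know that language tags are case-insensitive.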

> > If you implement language tag processing, then you'll
> > smoosh/normalize those.
> 
> Why “if”? Language tag processing is a normative part of RDF, so any conforming implementation has to implement this proposed “IRI case smushing”.
> 
> If your proposal is to normatively define a profile of RDF that removes i18n support, then say so. Spec-wise that would be easy to do.

As I see it, yes, this is a (desirable) side-effect of Option 3.  I
don't think language tag reasoning should be any more mandatory than
simple reasoning about datatypes (such as concluding that
"1"^^xs:int == "1"^^xs:integer).

I note that RIF and OWL2 agreed on a set of datatypes for which
reasoning is not optional.   I suggest language tag processing be on
this list, and perhaps this list be additionally called out in the RDF
specs.
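The "optional unless listed" idea can be sketched as a reasoner that always applies a required set of datatype rules and applies the rest only when an implementation opts in. Which datatypes belong in the required set is exactly the open question here; the `REQUIRED`/`OPTIONAL` sets, the `Literal` type, the family names and the function names below are assumptions for illustration, covering only the `"1"^^xs:int` / `"1"^^xs:integer` case from the text plus language tags.

```python
from typing import FrozenSet, NamedTuple, Optional

XS = "http://www.w3.org/2001/XMLSchema#"
LANG = "http://www.w3.org/ns/lang/"  # assumed Option 3 namespace for language tags

# Hypothetical split: families every implementation must reason about,
# and families left to implementations that choose to support them.
REQUIRED: FrozenSet[str] = frozenset({"lang"})
OPTIONAL: FrozenSet[str] = frozenset({"integer"})


class Literal(NamedTuple):
    lexical: str
    datatype: str


def family(datatype: str) -> Optional[str]:
    """Group datatypes sharing a value space (every xs:int value is an xs:integer value)."""
    if datatype in (XS + "int", XS + "integer"):
        return "integer"
    if datatype.startswith(LANG):
        return "lang"
    return None


def canonical(lit: Literal, enabled: FrozenSet[str]) -> tuple:
    """Value-space key when the literal's family is enabled, else a plain term key."""
    fam = family(lit.datatype)
    if fam is None or fam not in enabled:
        return ("term", lit)
    if fam == "integer":
        return ("integer", int(lit.lexical))
    return ("lang", lit.lexical, lit.datatype[len(LANG):].lower())


def equal(a: Literal, b: Literal, extra: FrozenSet[str] = frozenset()) -> bool:
    """Compare literals using the required families plus any opted-in optional ones."""
    enabled = REQUIRED | (extra & OPTIONAL)
    return canonical(a, enabled) == canonical(b, enabled)


i = Literal("1", XS + "int")
n = Literal("1", XS + "integer")
print(equal(i, n))                                # False: integer reasoning is off
print(equal(i, n, extra=frozenset({"integer"})))  # True once opted in
print(equal(Literal("chat", LANG + "fr"), Literal("chat", LANG + "FR")))  # True: required
```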
    
    -- Sandro


> Best,
> Richard
> > 
> >> Also, this:
> >> http://lists.w3.org/Archives/Public/public-rdf-wg/2011May/0425.html
> > 
> > I understand that paper to show that we can't use existing RDF inference
> > machinery to address language tag processing.  That's okay; it would have
> > been nice, but I'm not suggesting we do so.  I'm not saying by using
> > language tags as datatypes we get any subsumption or other reasoning for
> > free.  The only benefit, and one I continue to think is worthwhile but
> > certainly not the most important thing in the RDF world, is to move all
> > the complexity of language tags -- including their very existence -- out
> > of RDF itself and into a family of datatypes.
> > 
> > I'm really sorry, again, that I wasn't able to engage on this a few
> > weeks ago, for personal reasons.
> > 
> >      -- Sandro
> > 
> >> Best,
> >> Richard
> >> 
> >> 
> >> On 8 Sep 2011, at 14:13, Sandro Hawke wrote:
> >> 
> >>> On Thu, 2011-09-08 at 10:12 +0200, Richard Cyganiak wrote:
> >>>> On 7 Sep 2011, at 19:34, Sandro Hawke wrote:
> >>>>> I argued in today's meeting, off the cuff, that option 2 (in Pat's
> >>>>> email [1]) offers only aesthetic improvements, while options 3 and 4
> >>>>> will result in simpler code.  
> >>>> 
> >>>> Please provide some example code for:
> >>>> 
> >>>> Option 3:
> >>>> 
> >>>> - checking whether a literal is a string
> >>> 
> >>> LANG = "http://www.w3.org/ns/lang/"
> >>> XS = "http://www.w3.org/2001/XMLSchema#"
> >>> 
> >>> def is_string(node):
> >>>     # A string is xs:string or any datatype in the language-tag namespace.
> >>>     return is_literal(node) and (
> >>>         node.datatype == XS + "string" or
> >>>         node.datatype.startswith(LANG))
> >>> 
> >>>> - returning the language tag of a language-tagged string
> >>> 
> >>> def lang_tag(node):
> >>>     # The tag is whatever follows the language namespace in the datatype IRI.
> >>>     assert node.datatype.startswith(LANG)
> >>>     return node.datatype[len(LANG):]
> >>> 
> >>>> Option 4:
> >>>> 
> >>>> - returning the lexical form of a literal
> >>> 
> >>> node.lexrep
> >>> 
> >>> I agree this is a significant compatibility problem, since it will
> >>> return chat@fr.
> >>> 
> >>>    -- Sandro
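The `node.lexrep` compatibility problem can be illustrated with the `rdf:PlainLiteral` datatype that OWL 2 and RIF already define, whose lexical forms append `@` and a (possibly empty) language tag to the text, so a French "chat" has lexical form `chat@fr`. Assuming Option 4 used a similar encoding, code that wants the old lexical form would have to split on the last `@`. The helper below is a hypothetical shim, not part of any proposal.

```python
from typing import Optional, Tuple


def split_plain_literal(lexrep: str) -> Tuple[str, Optional[str]]:
    """Split an rdf:PlainLiteral-style lexical form into (text, language tag).

    Only the last '@' separates the two parts, because the text itself may
    contain '@'. An empty tag (as in "chat@") means the literal has no tag.
    """
    text, sep, tag = lexrep.rpartition("@")
    if not sep:
        raise ValueError(f"not an rdf:PlainLiteral lexical form: {lexrep!r}")
    return text, (tag or None)


print(split_plain_literal("chat@fr"))           # ('chat', 'fr')
print(split_plain_literal("bob@example.org@"))  # ('bob@example.org', None)
```

Every consumer that only wanted the text would need a step like this, which is the cost the reply above concedes.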
> >>> 
> >>>> Thanks,
> >>>> Richard
Received on Friday, 9 September 2011 15:19:11 UTC