Re: Proposal for ISSUE-12, string literals from Andy Seaborne on 2011-05-12 (public-rdf-wg@w3.org from May 2011)

From: Andy Seaborne <andy.seaborne@epimorphics.com>
Date: Thu, 12 May 2011 18:28:02 +0100
To: public-rdf-wg@w3.org
Message-ID: <4DCC18A2.90706@epimorphics.com>

On 12/05/11 17:49, Richard Cyganiak wrote:
> On 12 May 2011, at 17:05, Pat Hayes wrote:
>> Hmm, on second thoughts and a more careful reading, I am no longer
>> sure I like the "MAY replace any literal with a canonical form"
>> idea.
>
> Note that the term “canonical forms” is defined *in the proposal*,
> and covers *only* plain string literals. So this does *only* license
> the replacement of funky string datatypes with plain literals. It
> does *not* license any other replacements, like those you mention
> below.
>
> I mentioned the idea of *extending* this to also cover other XSD
> literals, but that clearly goes beyond ISSUE-12, and is not necessary
> to address ISSUE-12.
>
> I offer a scaled-down rewording. So instead of this:
>
>>> §8 “Some literals are canonical forms. Implementations MAY
>>> replace any literal with a canonical form if both are
>>> syntactically different, but have the same value. All plain
>>> literals, with or without language tag, are canonical forms.”
>
> How about this:
>
>>> §8 “Implementations MAY replace xsd:string typed literals and
>>> rdf:PlainLiteral typed literals with a plain literal that has the
>>> same value.”
>
> That would be sufficient if we don't want to do anything about other
> XSD literals.
>
> That said, I find your argument against literal canonicalization not
> compelling.
>
>> If this is a licence for some other engine to tidy up the literals
>> in my RDF, then I vote against this idea. Who knows why I might
>> have chosen to use a non-canonical form? Some people might use the
>> number of leading zeros to encode precision information, for
>> example.
>
> We don't have to cater for inappropriate use of the technology. You
> could use the same kind of reasoning to argue that "foo" and
> "foo"^^xsd:string must be kept distinct because someone might use the
> difference to encode access control information. That's absurd. Show
> me someone who does it.
>
> (You don't *actually* encode precision information this way in your
> own data, do you???)
>
>> It just seems inappropriate to give a global licence to 'tidy up'
>> other people's data.
>
> Why?
>
>> And why do we need this? The datatype definitions already provide
>> for the relevant equalities, if someone wants to keep their data
>> semantically tidy.
>
> It's about keeping it *syntactically* tidy -- removing unnecessary
> syntactic variation, so that the syntax matches the semantics. It
> gives implementers license to simplify implementations.
>
> And as you say, it's replacing one form with another form that
> “means” the same thing (under D-Entailment), so I really don't see
> the problem.
>
> Finally, it's a MAY. Implementers who think it's inappropriate don't
> have to do it. Users who think it's inappropriate can vote with their
> legs / with apt-get.
>
> Best, Richard

Yes, although what query writers want is not so much equality as a way
to say "don't care", especially when they don't know the data has
language tags in it.

i.e. "foo" (in the query) does not match "foo"@en

You can do it with something like:

  { ?s ?p ?o . FILTER (str(?o) = "foo") }

but that is really coding round the issue and presumes you know to 
insert that idiom.
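
To make the mismatch concrete, here is a toy model in Python. It is only
a sketch, not how any SPARQL engine is implemented: the Literal type, the
two helper functions and the sample data are all made up for
illustration. A literal is modelled as a lexical form plus an optional
language tag, exact matching compares both, and the str() idiom compares
only the lexical form.

```python
from typing import List, NamedTuple, Optional, Tuple


class Literal(NamedTuple):
    """Hypothetical stand-in for an RDF literal: lexical form plus optional language tag."""
    lexical: str
    lang: Optional[str] = None


Triple = Tuple[str, str, Literal]

# Made-up sample data: one plain literal and two language-tagged ones.
data: List[Triple] = [
    ("s1", "p", Literal("foo")),
    ("s2", "p", Literal("foo", "en")),
    ("s3", "p", Literal("bar", "en")),
]


def match_exact(triples: List[Triple], wanted: Literal) -> List[str]:
    """Subjects whose object is the same term (lexical form AND language tag)."""
    return [s for (s, _p, o) in triples if o == wanted]


def match_by_str(triples: List[Triple], text: str) -> List[str]:
    """Subjects whose object has the given lexical form, ignoring the language tag.

    This mimics the FILTER (str(?o) = "foo") idiom.
    """
    return [s for (s, _p, o) in triples if o.lexical == text]


exact = match_exact(data, Literal("foo"))   # "foo" does not match "foo"@en
by_str = match_by_str(data, "foo")          # the str() idiom matches both

print(exact)   # ['s1']
print(by_str)  # ['s1', 's2']
```

The second function finds s2 only because the caller knew to ignore the
tag, which is the "presumes you know the idiom" problem.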


Adding datatypes, presumably with the entailment relationships
included, does not help much: it adds more solutions to
  { ?s <p> ?o }
which itself can be confusing even if
  { <s> <p> "foo" }
now matches.

A difference between entailment and query is that entailment says "can 
pattern X be true?" and query says "these are the ways (variables, 
values) that make pattern X true".
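
A small Python sketch of that distinction, under loud assumptions: terms
are plain strings, the graph is assumed to hold both spellings of the
literal (as if some entailment regime had materialised them), and the
ask/select helpers are hypothetical names, not any engine's API.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def is_var(term: str) -> bool:
    """Variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple) -> Optional[Dict[str, str]]:
    """Return the variable bindings that make pattern equal triple, or None."""
    binding: Dict[str, str] = {}
    for p, t in zip(pattern, triple):
        if is_var(p):
            if binding.get(p, t) != t:  # same variable must bind to the same term
                return None
            binding[p] = t
        elif p != t:
            return None
    return binding


def ask(pattern: Triple, triples: List[Triple]) -> bool:
    """Entailment-style question: can the pattern be true at all?"""
    return any(match(pattern, t) is not None for t in triples)


def select(pattern: Triple, triples: List[Triple]) -> List[Dict[str, str]]:
    """Query-style question: every binding that makes the pattern true."""
    results = []
    for t in triples:
        b = match(pattern, t)
        if b is not None:
            results.append(b)
    return results


# Assumption: both forms of the same value are present in the graph.
graph: List[Triple] = [
    ("s", "p", '"foo"'),
    ("s", "p", '"foo"^^xsd:string'),
]

holds = ask(("s", "p", '"foo"'), graph)        # the yes/no answer is fine
solutions = select(("?s", "p", "?o"), graph)   # but the value shows up twice

print(holds)           # True
print(len(solutions))  # 2
```

The yes/no answer is unaffected by the extra form, while the list of
solutions grows, which is where the confusion for query users comes from.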

	Andy
Received on Thursday, 12 May 2011 17:30:58 UTC