Re: normalization issues with Turtle spec tests from Ruben Verborgh on 2013-04-07 (public-rdf-comments@w3.org from April 2013)

From: Ruben Verborgh <ruben.verborgh@ugent.be>
Date: Sun, 7 Apr 2013 08:46:21 +0200
To: Eric Prud'hommeaux <eric@w3.org>
Cc: public-rdf-comments@w3.org, Gregg Kellogg <gregg@greggkellogg.net>, gavin@carothers.name
Message-Id: <A6A2C9F0-9044-4B64-AA2F-29322F19BA33@ugent.be>

Dear Eric,

> The tests are enforcing checking that the term generated from e.g. '"+1"^^xsd:integer' is distinct from '1' (which is the same as '"1"^^xsd:integer').
>  https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#dfn-literal-equality
> This might be an opportunity to comment out some code.

1) Do you perhaps know the reason for this choice, and has this changed somewhere along the way?
If I take
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
    <a> <b> "1"^^xsd:integer.
    <a> <b> "+1"^^xsd:integer.
    <a> <b> "0001"^^xsd:integer.
and put it through cwm, I get
    <a>     <b> 1,
                1,
                1 .
and if I put that again through cwm, I get
    <a>     <b> 1 .
However, does the section you point to mean that this changed, so that '"1"^^xsd:integer' and the others are no longer equivalent to '1'?
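
To make the distinction in question concrete, here is a minimal Python sketch that contrasts term equality (same lexical form and datatype) with value equality for xsd:integer. It is not taken from cwm or any RDF library; the tuple representation and the names term_equal and value_equal are assumptions made for illustration only.

```python
import re

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Lexical space of xsd:integer: an optional sign followed by ASCII digits.
_INTEGER_RE = re.compile(r"[+-]?[0-9]+")


def term_equal(a, b):
    """Term equality: lexical form and datatype IRI must match exactly."""
    return a == b


def value_equal(a, b):
    """Value equality for two xsd:integer literals given as (lexical, datatype)."""
    (lex_a, dt_a), (lex_b, dt_b) = a, b
    if dt_a != XSD_INTEGER or dt_b != XSD_INTEGER:
        raise ValueError("this sketch only handles xsd:integer")
    if not (_INTEGER_RE.fullmatch(lex_a) and _INTEGER_RE.fullmatch(lex_b)):
        raise ValueError("ill-typed xsd:integer literal")
    # Compare the integer values rather than the spelling.
    return int(lex_a) == int(lex_b)


one = ("1", XSD_INTEGER)
plus_one = ("+1", XSD_INTEGER)
padded = ("0001", XSD_INTEGER)
```

Under these assumptions, term_equal(one, plus_one) is False while value_equal(one, plus_one) and value_equal(one, padded) are both True, which is the difference between what the tests check and what cwm appears to do on the second pass.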

2) Directly above the "literal equality" section, https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#dfn-language-tag says:
"The language tag must be well-formed according to section 2.2.9 of [BCP47], and must be normalized to lowercase."
So the test langtagged_LONG_with_subtag seems wrong to expect '@en-UK' as the answer.
Furthermore, the definition of "literal equality" might be slightly off then. Are the following equal?
- "test"@en-uk
- "test"@en-UK
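
As a sketch of what the question amounts to in code: BCP47 language tags are case-insensitive, so one possible reading is to compare the tag case-insensitively while comparing the lexical form exactly. Whether the spec intends this is precisely what is being asked; the function name and the (lexical form, language tag) tuples below are hypothetical.

```python
def same_language_literal(a, b):
    """Compare (lexical form, language tag) pairs, ignoring case in the tag only."""
    (lex_a, lang_a), (lex_b, lang_b) = a, b
    return lex_a == lex_b and lang_a.lower() == lang_b.lower()


lower = ("test", "en-uk")
upper = ("test", "en-UK")
```

With a plain comparison, lower == upper is False; with same_language_literal(lower, upper) the two are treated as equal.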

3) The whole equality thing makes it quite tricky to find triples. For instance, if I search for:
   triples.find(any, any, 1), what should be returned?
- triples with an object of '1'?
- triples with an object of '"1"^^xsd:integer'?
- triples with an object of '"+1"^^xsd:integer'?
- triples with an object of '"01"^^xsd:integer'?
- triples with an object of '"000001"^^xsd:integer'?
How do existing implementations deal with this?

So yes, I might comment out some code.
But then the result will either be more difficult to work with for the library user (because of inequalities),
or far less performant (as I’d have to index a normalized version and still store and return the original).
I really wonder why the choice against normalization was made.
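
The second option (indexing a normalized version while still storing and returning the original) could look roughly like the following Python sketch. The TripleStore class, normalize, and find_by_object are hypothetical names, not the API of any existing library, and only well-formed xsd:integer literals are canonicalized here.

```python
from collections import defaultdict

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def normalize(term):
    """Return a lookup key; only xsd:integer literals are canonicalized.

    Assumes the lexical form is well-formed (int() raises ValueError otherwise).
    """
    if isinstance(term, tuple) and term[1] == XSD_INTEGER:
        return (str(int(term[0])), XSD_INTEGER)
    return term


class TripleStore:
    def __init__(self):
        self._triples = []                    # original triples, untouched
        self._by_object = defaultdict(list)   # normalized object -> triples

    def add(self, s, p, o):
        triple = (s, p, o)
        self._triples.append(triple)
        self._by_object[normalize(o)].append(triple)

    def find_by_object(self, o):
        """Return original triples whose object has the same normalized key."""
        return list(self._by_object.get(normalize(o), []))


store = TripleStore()
for lexical in ("1", "+1", "0001"):
    store.add("a", "b", (lexical, XSD_INTEGER))

matches = store.find_by_object(("1", XSD_INTEGER))
```

Here matches contains all three triples with their original lexical forms intact. The cost is the extra index and a normalization step on every insert and lookup, which is the performance concern raised above.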

> I use SWObjects with a command line "-d test.nt --compare ref.nt"
Wonderful, I will try that. Seems much easier.

Best,

Ruben

Received on Sunday, 7 April 2013 06:50:24 UTC