RE: tex-01 new proposal from Jeremy Carroll on 2003-04-04 (w3c-rdfcore-wg@w3.org from April 2003)

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Fri, 4 Apr 2003 15:33:41 +0200
To: <Patrick.Stickler@nokia.com>, <jjc@hplb.hpl.hp.com>, <dave.beckett@bristol.ac.uk>
Cc: <bwm@hplb.hpl.hp.com>, <w3c-rdfcore-wg@w3.org>
Message-ID: <BHEGLCKMOHGLGNOKPGHDCEHBCBAA.jjc@hpl.hp.com>

First,
apologies for having submitted two different proposals for this issue; I
suspect that causes a bit of confusion. Also, the two proposals have two
different sets of tests, but you could swap the tests around so ....


The four tests in the "tex-01 new proposal" are intended to show that, even
if the concepts document says that case is retained in the abstract syntax,
the nature of graph equivalence is such that:

_:a eg:p "foo"@en-US .
_:a eg:p "foo"@en-us .

is equivalent to

_:a eg:p "foo"@en-US .

This seems to be the essence of our tidiness decision when applied to
lang-tags.
I express this as four parser tests because we don't have tests that say
N-triple-doc-1 = N-triple-doc-2

but this can be achieved by saying both N-triple-docs =
a-single-RDF/XML-doc.
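
As a rough illustration of this indirect comparison, here is a sketch only.
The helper names parse_nt_line, triple_key and same_graph are invented for
this example, blank-node labels are compared literally rather than up to
renaming, and the RDF/XML side is stood in for by an already-parsed triple:

```python
import re

# Hypothetical, very reduced N-Triples line pattern: it only handles the
# '_:a eg:p "foo"@tag .' shape used in this message.
NT_LINE = re.compile(r'^(\S+)\s+(\S+)\s+"([^"]*)"@([A-Za-z0-9-]+)\s*\.$')

def parse_nt_line(line):
    """Return (subject, predicate, lexical form, language tag), case preserved."""
    match = NT_LINE.match(line.strip())
    if match is None:
        raise ValueError("unsupported line: " + line)
    return match.groups()

def triple_key(triple):
    """Comparison key that ignores the case of the language tag only."""
    subj, pred, lex, lang = triple
    return (subj, pred, lex, lang.lower())

def same_graph(doc_a, doc_b):
    """Compare two triple collections as sets under the key above."""
    return {triple_key(t) for t in doc_a} == {triple_key(t) for t in doc_b}

doc1 = [parse_nt_line(l) for l in ['_:a eg:p "foo"@en-US .',
                                   '_:a eg:p "foo"@en-us .']]
doc2 = [parse_nt_line('_:a eg:p "foo"@en-US .')]
# Stand-in for what a parser would produce from the single RDF/XML document.
reference = [("_:a", "eg:p", "foo", "en-us")]

# Each N-Triples document is tested against the same reference ...
print(same_graph(doc1, reference), same_graph(doc2, reference))
# ... so the two N-Triples documents are equivalent to each other.
print(same_graph(doc1, doc2))
```

The parsed triples keep the case they were written with; only the comparison
key folds the language tag.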


Dave seems to want to make this completely clear by having N-triples use only
lowercase for language tags. The test would then be:

_:a eg:p "foo"@en-us .
_:a eg:p "foo"@en-us .

is equivalent to

_:a eg:p "foo"@en-us .
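
A minimal sketch of what lowercasing the tags as Dave suggests might look
like, assuming a hypothetical lowercase_lang_tag step applied to each
N-Triples line before comparison (the regular expression is a simplification
for the lines shown here, not a full N-Triples grammar):

```python
import re

# Simplified: a closing quote, "@", then the language tag characters.
LANG_TAG = re.compile(r'("@)([A-Za-z0-9-]+)')

def lowercase_lang_tag(nt_line):
    """Lowercase only the language tag; the literal's text is left alone."""
    return LANG_TAG.sub(lambda m: m.group(1) + m.group(2).lower(), nt_line)

lines = ['_:a eg:p "foo"@en-US .', '_:a eg:p "foo"@en-us .']
# After normalization, plain string comparison is enough to collapse them.
normalized = {lowercase_lang_tag(line) for line in lines}
print(normalized)
```

With this approach the original spelling is lost at parse time, which is the
trade-off compared with keeping case and comparing case-insensitively.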

I proposed the "tex-01 new proposal" in response to last week's discussion in
which there was some sense that maybe we should entirely accept Tex's
comment and *not* normalize the case of the language tag.


My suspicion is that these tests are more trouble than they are worth, when
the alternative entailment tests serve just as well to show that language tags
are case insensitive, without requiring us to clarify some issues to do with
N-triples (on which we do not appear to agree).

Inline comments:
> > tex01/test002.rdf
> > <rdf:RDF>
> >  <rdf:Description rdf:nodeID="a" xml:lang="en-US" eg:p="foo"/>
> >  <rdf:Description rdf:nodeID="a" xml:lang="en-us" eg:p="foo"/>
> > </rdf:RDF>
> >
> > tex01/test002.nt
> > _:a eg:p "foo"@en-US .
>
> If case is syntactically significant, why wouldn't the parser
> produce
>
>    _:a eg:p "foo"@en-US .
>    _:a eg:p "foo"@en-us .

It would - but the graph equivalence test should treat the two literals
tidily.

>
> If we had two separate graphs, one with
>
>    _:a eg:p "foo"@en-US .
>
> and the other with
>
>    _:a eg:p "foo"@en-us .
>
> would the graph merge discard the 'en-us' triple? Why?
>

Yes, one or the other.

Under the "tex-01 new proposal" the two language tags are equal, so the two
triples are equal, so the set
{
_:a eg:p "foo"@en-US .
_:a eg:p "foo"@en-us .
}
contains one member,

and it's probably
_:a eg:p "foo"@EN-us .
:)
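
To make the "one member" point concrete, here is a small sketch in which a
language-tagged literal keeps the case it was written with but compares
case-insensitively on the tag. The class name LangLiteral and its fields are
assumptions for illustration, not anything defined in the working drafts:

```python
from dataclasses import dataclass

@dataclass(frozen=True, eq=False)
class LangLiteral:
    """Plain literal with a language tag; the original case is retained."""
    lexical: str
    lang: str

    def _key(self):
        # Only the language tag is case-folded, never the lexical form.
        return (self.lexical, self.lang.lower())

    def __eq__(self, other):
        if not isinstance(other, LangLiteral):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())

graph = set()
graph.add(("_:a", "eg:p", LangLiteral("foo", "en-US")))
graph.add(("_:a", "eg:p", LangLiteral("foo", "en-us")))

print(len(graph))                      # the two triples count as one
surviving = next(iter(graph))[2].lang  # which spelling is kept is an
print(surviving)                       # implementation detail of the store
```

The set holds a single triple, and nothing in the equality rule itself says
which spelling of the tag that triple carries.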


> It seems to me that either case is syntactically significant or
> it's not, and if it is, then the graph syntax should fully respect
> and preserve all variations of case.
>
That's why I made the "yet another tex-01 proposal": given that Tex has let
us off the hook, we should make as few changes to the LC doc as we can get
away with.

> If case is not syntactically significant, such that it is normalized
> in some fashion, then test002 seems more reasonable (though I would
> suggest lowercase as the default/preferred choice).
>
> On this one, I don't see any harm in having parsers transpose lang
> tags to lowercase and NTriples only having lowercase lang tags.
>
I do not like NTriples being changed in this way; that seems to make things
worse from the point of view of Tex's comment.

Jeremy

Received on Friday, 4 April 2003 08:34:24 UTC