RE: 2001-09-07#6: ns qualified parseType values from Graham Klyne on 2001-09-19 (w3c-rdfcore-wg@w3.org from September 2001)

From: Graham Klyne <Graham.Klyne@MIMEsweeper.com>
Date: Wed, 19 Sep 2001 15:18:22 +0100
To: "dehora" <dehora@eircom.net>
Cc: "'W3C Rdfcore'" <w3c-rdfcore-wg@w3.org>
Message-Id: <5.1.0.14.2.20010919145138.037750f0@joy.songbird.com>

At 01:17 PM 9/19/01 +0100, dehora wrote:
> > ** 1st complication:  some old parsers may misinterpret
> > rdf:Resource and
> > Literal;  I think that results in loss of functionality rather than
> > complete failure of interoperability, but it's still messy.
>
>In the sense that they'll treat rdf:Literal as a single string?

My typo above.  I meant to say "some old parsers may misinterpret 
rdf:Resource *as* Literal"

[...]

> >
> > ** 2nd complication:  what does this affect?  Does it mean that the
> > contained literal must be canonical XML [...]?
>
>Yes, I mean this. RDF-XML writers that want to mark RDF-XML Literals as
>having a canonical parseType _must_ canonicize the Literal on
>serialization. It's the writer's job to make sense. The meaning of
>parseType in my mind is a lexing/parsing clue not a transformation clue.

Oh, I didn't take it that way.  In which case, why bother?  RDF writers can 
write canonical XML anyway if they so choose.  I don't see the value in 
labelling it as such.

>I could live with this, Jeremy has also suggested as much. But it seems
>odd that we don't eat our own food.

I agree with your sentiment, *but* we are stuck with the unadorned forms 
for compatibility purposes and there doesn't seem to be a clear migration 
path so I'm suggesting what seems to be an approach with slightly less room 
for confusion.

> > I find point (3) is more difficult.  Why do we want to
> > canonicalize XML?  I
> > think the answer is to define a form of equivalence that can
> > be tested by
> > string comparison.  I'd rather target the equivalence issue
> > directly, and I
> > think this is an issue separate from rdf:parseType since it
> > also arises
> > with non-XML literals (e.g. dates, numbers, etc.).  Anyway,
> > canonicalization doesn't completely solve the problem in the case of
> > rdf:parseType="Resource", where the logical definition of
> > equivalence would
> > be two literals yielding the same RDF graph.
>
>[I don't consider whatever is at the end of parseType="Resource" to be a
>literal with structure, it's expanded as RDF.]

True.

>Literals will require a canonical form, or if one prefers, a shared
>abstraction of a literal, for two reasons I see:
>
>-equivalence operations as you point out.
>-serialization and roundtripping.

By "serialization and roundtripping" do you mean "roundtripping via 
serialization"?

I think this matter of round-tripping is somewhat bound up with the 
question of equivalence.  That is, when roundtripping from form A to form B 
and back to form A, one wants the result to be "equivalent" to the 
original.  Which begs a definition of equivalence.

I have a second concern about getting too insistent on roundtripping:  in 
some cases, the alternative forms just don't convey all the same 
information -- some information is bound to be lost.  (Again, a suitable 
definition of equivalence provides us with a way to test whether any 
information loss is significant to the purpose at hand.)
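
To make the round-tripping point concrete, here is a minimal sketch, assuming
Python 3.8+ and its standard xml.etree.ElementTree module.  The sample XML
literal and the helper name roundtrip() are hypothetical, and Canonical XML
(C14N 2.0) is used only as one possible definition of "equivalent", not as a
decided one:

```python
import xml.etree.ElementTree as ET

def roundtrip(xml_text: str) -> str:
    """Form A (text) -> form B (parsed tree) -> form A (text) again."""
    return ET.tostring(ET.fromstring(xml_text), encoding="unicode")

# Hypothetical literal: single-quoted attribute, extra space, unsorted attributes.
original = "<a  y='2' x=\"1\"><b></b></a>"
returned = roundtrip(original)

# The round trip does not preserve the exact characters...
same_string = (original == returned)

# ...but the two texts agree under one candidate equivalence:
# equality of their canonical forms.
same_canonical = (
    ET.canonicalize(xml_data=original) == ET.canonicalize(xml_data=returned)
)
```

Whether differences that canonicalization discards (quoting style, attribute
order) count as significant information loss depends on the purpose at hand.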

[...]

>My approach has been to look at Unicode rather than data structures
>(though I'm aware that some people have made good arguments for literal
>as structure) for the following reasons:

I wasn't suggesting moving away from Unicode strings as the representation, 
just introducing concepts of equivalence that are not simple string equality.

Simple example, for decimal numbers the following might be considered 
equivalent:
    "55"
    "055"
    "+55"
though they're clearly different strings.  This can be stated without 
dictating an alternative representation, though it does presume some 
syntactic structure and corresponding interpretation.  (Maybe subsumed by 
the "qLiteral -> LV" mapping of the model theory?)
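
A minimal sketch of that idea in Python, assuming the standard decimal module
as the interpretation of the strings; the function name decimal_equivalent()
is hypothetical and not taken from any RDF specification:

```python
from decimal import Decimal

def decimal_equivalent(a: str, b: str) -> bool:
    """Compare two literal strings by the decimal value they denote.

    The strings themselves stay the representation; only the
    comparison goes through an interpretation of their syntax.
    """
    return Decimal(a) == Decimal(b)

literals = ["55", "055", "+55"]

# All three denote the same value under this interpretation...
all_equivalent = all(decimal_equivalent(literals[0], s) for s in literals)

# ...even though no two of them are equal as strings.
all_distinct_strings = len(set(literals)) == len(literals)
```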

[...]

>With regard to stuff like dates and numbers, that's another issue again:
>datatyping. I observe that if we allow namespaced parseType values,
>people can simply slot in XML Schema simple types for ad hoc literal
>processing. You get XML Schema pretty much for free this way.

Again, I like the sentiment here.  But RDF M&S requires that any literal 
with an rdf:parseType attribute is well-formed XML.  And I don't think a 
simple string like "55" qualifies.  Maybe this is fixable?  But if we try 
to fix this in the way you suggest, I think we have to relax the 
requirement that unrecognized rdf:parseType values must have well-formed 
XML content.
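
As a rough illustration of the well-formedness point, here is a sketch that
assumes the literal is checked as a standalone XML document (one reading of
the M&S requirement, not the only possible one).  It uses Python's standard
xml.etree.ElementTree parser, and the helper name is_well_formed_xml() is
hypothetical:

```python
import xml.etree.ElementTree as ET

def is_well_formed_xml(text: str) -> bool:
    """Return True if text parses as a standalone XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# A bare number has no root element, so it fails this check...
bare_number_ok = is_well_formed_xml("55")

# ...while the same value wrapped in a (hypothetical) element passes.
wrapped_number_ok = is_well_formed_xml("<n>55</n>")
```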

#g


------------------------------------------------------------
Graham Klyne                    MIMEsweeper Group
Strategic Research              <http://www.mimesweeper.com>
<Graham.Klyne@MIMEsweeper.com>
------------------------------------------------------------
Received on Wednesday, 19 September 2001 10:33:22 UTC