RE: weekly call for agenda items from Jeremy Carroll on 2002-11-21 (w3c-rdfcore-wg@w3.org from November 2002)

From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
Date: Thu, 21 Nov 2002 09:41:23 +0100
To: "pat hayes" <phayes@ai.uwf.edu>, "Jeremy Carroll" <jjc@hplb.hpl.hp.com>
Cc: <w3c-rdfcore-wg@w3.org>
Message-ID: <BHEGLCKMOHGLGNOKPGHDGELFCAAA.jjc@hpl.hp.com>
Summary:

For me, the move in namespace from rdfs:XMLLiteral to rdf:XMLLiteral may
impact the order in which material is presented in the model theory, but
should not impact *any* entailments.
If it does, then I would want to revisit that decision.

In particular, correct treatment of rdf:XMLLiteral (this canonicalization
stuff) should not be required for rdf-entailment. Having it as an optional
extra, supported in datatype aware systems, was part of the intent of moving
it to be a datatype.

I had two comments against the first concepts WD saying that XML Literal was
too central in the abstract syntax. Both came from W3C people. I note that
we have changed the namespace of the term (from rdfs to rdf) at the request
of the W3C rep.
If that has the side-effect of making XMLLiteral more central again, then I
suggest we request either Dan to consult the community they represent. We
may find that there isn't anyone proposing this change.

======================


> Well, make it more concrete.

The current WD is very concrete! I will work through a simple example.

> The value space consists of XML
> thingies. Can there be cases of two different XML literal strings
> denoting the same one of those thingies?

Yes e.g. "a<em/>"^^rdf:XMLLiteral and "a<em></em>"^^rdf:XMLLiteral

From the WD we use a language identifier of "" and create the two (Unicode)
strings
"<rdf-wrapper xml:lang=''>a<em/></rdf-wrapper>"
and
"<rdf-wrapper xml:lang=''>a<em></em></rdf-wrapper>"
(Using the five part concatenation formula).

These are then encoded in UTF-8 (in this case we can read them as ASCII).
These UTF-8 strings form two 'XML documents' (in the sense of
http://www.w3.org/TR/REC-xml#sec-documents which also avoids the Platonic
issues).


We canonicalize both (as in the L2V mapping in the WD) and we get:
"<rdf-wrapper xml:lang=\"\">a<em></em></rdf-wrapper>\n"
(using N-triple escape notation - again this is a UTF-8 string).

That is, both strings map to the same canonical XML document.

FYI: differences are:
  - the ' quotes for attribute values got mapped to " quotes
  - the "<em/>" empty tag got mapped to a start tag followed by an end tag
  - the whitespace outside the document element (rdf-wrapper) was
normalized to a single newline after the end-tag.

Canonicalization will also make other changes.
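
For anyone who wants to try the example above, here is a minimal sketch in
Python. It is not the WD's definition of the L2V mapping: the function names
wrap and l2v are made up for illustration, and it assumes that the standard
library's xml.etree.ElementTree.canonicalize (Python 3.8+, which implements
C14N 2.0) is close enough for this case. Its output may differ in detail from
the string shown above, for instance in trailing whitespace.

```python
from xml.etree.ElementTree import canonicalize  # available in Python 3.8+


def wrap(literal: str, lang: str = "") -> str:
    """Five-part concatenation: prefix, lang, middle, literal, suffix."""
    return "<rdf-wrapper xml:lang='" + lang + "'>" + literal + "</rdf-wrapper>"


def l2v(literal: str, lang: str = "") -> bytes:
    """Hypothetical stand-in for L2V: wrap, canonicalize, encode as UTF-8."""
    return canonicalize(wrap(literal, lang)).encode("utf-8")


first = l2v("a<em/>")
second = l2v("a<em></em>")

# The two lexical forms differ, but their canonical documents coincide.
print(first == second)  # True
```

The comparison is done on bytes, matching the view that a canonical XML
document is a UTF-8 byte sequence rather than a string of code points.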


>That is, are there any cases
> that ought to trigger the inference rule rdfD-2 for XMLLiteral?

Yes (not that I've looked at that rule yet).

> (Assume there are no lang tags.) If not, I propose that we just say
> that the value space is the same as the lexical space.

No, it's not.

> But if there
> are any rdf:XMLLiteral-datatyping entailments then I ought to say
> what they are and incorporate them into rdf-entailment.

I think not. I see datatyping as an optional layer on top. We have said that
systems should (lowercase) support XSD (including rdf:XMLLiteral), that's
good.
Also when we went down this canonicalization route we were very aware that
there are perfectly good RDF implementations that cannot tell when two XML
literals are identical. That is, the canonicalization stuff is a cost to the
cheap and cheerful implementor, and requiring them to even understand it
before understanding the basics of the model theory seems mistaken.

I would hope that the DPH can take away something from the model theory - I
would be surprised if it were XML canonicalization.

>
> If there are lang tags, does having different lang tags guarantee
> that the canonical XML docs are distinct, or can there be cases where
> the lang tags dissolve into nothing and leave the docs identical?

Different lang tags mean different documents (modulo case - since we now
normalize language tags on input, I think).
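
A rough sketch of that claim, under stated assumptions: the helper name
canonical_doc is hypothetical, lowercasing stands in for whatever case
normalization is applied on input, and Python's
xml.etree.ElementTree.canonicalize (C14N 2.0) stands in for the WD's
canonicalization. Because the language tag ends up as the value of xml:lang on
the wrapper element, it survives canonicalization.

```python
from xml.etree.ElementTree import canonicalize  # available in Python 3.8+


def canonical_doc(literal: str, lang: str) -> bytes:
    """Wrap the literal with its language tag and canonicalize the result."""
    lang = lang.lower()  # assumed case normalization of the tag on input
    doc = "<rdf-wrapper xml:lang='" + lang + "'>" + literal + "</rdf-wrapper>"
    return canonicalize(doc).encode("utf-8")


# Tags differing only in case give the same document after normalization.
same = canonical_doc("a<em/>", "en-GB") == canonical_doc("a<em/>", "EN-gb")

# Genuinely different tags give different documents.
different = canonical_doc("a<em/>", "en") != canonical_doc("a<em/>", "fr")

print(same, different)  # True True
```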

> If
> the former, then the MT can treat XMLLiterals just like plain
> literals but with an XML syntax check added, which would be very nice
> and easy.
>

No - it doesn't work.

The old version of the abstract syntax did require canonicalization for
syntactic well-formedness - with that then yes they are quite like plain
literals. Since we only really need it for equality, and we only really need
equality for semantic reasons, moving the C14N into the L2V mapping has been
an improvement.



> >I don't know whether anyone would care to argue whether a document
> >is or is not an XSD string.
>
> Lets agree that they are not, as far as we are concerned. After all,
> literal strings are not XSD strings either. Saying what XSD-anything
> is, is up to XMLS to do, not our job.

*We* certainly won't say, but I think we are not saying that literal strings
are not xsd:string either.

>
> >I would think not ... an xsd:string is a sequence of unicode code
> >points, whereas a document is a sequence of bytes (a canonical XML
> >document is a sequence of bytes in the UTF-8 character encoding).
>



Jeremy
Received on Thursday, 21 November 2002 03:41:39 UTC