RE: weekly call for agenda items from pat hayes on 2002-11-21 (w3c-rdfcore-wg@w3.org from November 2002)

From: pat hayes <phayes@ai.uwf.edu>
Date: Thu, 21 Nov 2002 14:18:56 -0600
To: "Jeremy Carroll" <jjc@hplb.hpl.hp.com>
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <p05111b20ba02eedc4f08@[10.0.100.86]>
>Summary:
>
>For me, the move in namespace from rdfs:XMLLiteral to rdf:XMLLiteral may
>impact the order in which material is presented in the model theory, but
>should not impact *any* entailments.
>If it does, then I would want to revisit that decision.

Well, obviously it moves some things that were datatype entailments 
into rdf-entailment. Only the XMLLiteral cases, of course.

>
>In particular, correct treatment of rdf:XMLLiteral (this canonicalization
>stuff) should not be required for rdf-entailment. Having it as an optional
>extra, supported in datatype aware systems, was part of the intent of moving
>it to be a datatype.
>
>I had two comments against the first concepts WD saying that XML Literal was
>too central in the abstract syntax. Both came from W3C people. I note that
>we have changed the namespace of the term (from rdfs to rdf) at the request
>of the W3C rep.
>If that has the side-effect of making XMLLiteral more central again, then I
>suggest we request either Dan to consult the community they represent. We
>may find that there isn't anyone proposing this change.
>
>======================
>
>
>>  Well, make it more concrete.
>
>The current WD is very concrete! I will work through a simple example.
>
>>  The value space consists of XML
>>  thingies. Can there be cases of two different XML literal strings
>>  denoting the same one of those thingies?
>
>Yes e.g. "a<em/>"^^rdf:XMLLiteral and "a<em></em>"^^rdf:XMLLiteral
>
>From the WD we use a language identifier of "" and create the two (Unicode)
>strings
>"<rdf-wrapper xml:lang=''>a<em/></rdf-wrapper>"
>and
>"<rdf-wrapper xml:lang=''>a<em></em></rdf-wrapper>"
>(Using the five part concatenation formula).
>
>These are then encoded in UTF-8 (in this case we can read them as ASCII).
>These UTF-8 strings form two 'XML documents', in the sense of
>http://www.w3.org/TR/REC-xml#sec-documents (which also avoids the Platonic
>issues).
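A minimal sketch of the wrapping and encoding steps just described. It assumes the five parts are exactly the pieces visible in the two example strings (wrapper start tag up to the language, the language identifier, the close of the start tag, the lexical form, and the wrapper end tag); the helper name `wrap_xml_literal` is hypothetical and not taken from the WD.

```python
def wrap_xml_literal(lexical_form: str, lang: str = "") -> bytes:
    # Hypothetical helper: five-part concatenation as inferred from the
    # example strings, followed by UTF-8 encoding to get an XML document.
    parts = [
        "<rdf-wrapper xml:lang='",  # 1: start of the wrapper start tag
        lang,                       # 2: language identifier ("" if none)
        "'>",                       # 3: end of the wrapper start tag
        lexical_form,               # 4: the literal's lexical form
        "</rdf-wrapper>",           # 5: wrapper end tag
    ]
    # A Unicode string becomes a sequence of bytes (the XML document).
    return "".join(parts).encode("utf-8")


doc1 = wrap_xml_literal("a<em/>")
doc2 = wrap_xml_literal("a<em></em>")
print(doc1)  # b"<rdf-wrapper xml:lang=''>a<em/></rdf-wrapper>"
print(doc2)  # b"<rdf-wrapper xml:lang=''>a<em></em></rdf-wrapper>"
```

At this stage the two documents are still different byte sequences; only canonicalization makes them equal.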
>
>
>We canonicalize both (as in the L2V mapping in the WD) and we get:
>"<rdf-wrapper xml:lang=\"\">a<em></em></rdf-wrapper>\n"
>(using N-triple escape notation - again this is a UTF-8 string).
>
>That is, both strings map to the same canonical XML document.
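A rough sketch of the lexical-to-value step, assuming Python 3.8+. Note the swapped-in technique: `xml.etree.ElementTree.canonicalize` implements Canonical XML 2.0, not the Canonical XML recommendation the WD refers to, so incidental details of the output (for example whether a trailing newline is emitted) may differ from the result described above. The equality of the two values is the point being illustrated; `xml_literal_value` is a hypothetical name.

```python
import xml.etree.ElementTree as ET  # canonicalize() needs Python 3.8+


def wrap_xml_literal(lexical_form: str, lang: str = "") -> str:
    # Five-part concatenation, matching the example wrapper strings.
    return "<rdf-wrapper xml:lang='" + lang + "'>" + lexical_form + "</rdf-wrapper>"


def xml_literal_value(lexical_form: str, lang: str = "") -> bytes:
    """Sketch of an L2V mapping: wrap, canonicalize, encode as UTF-8.

    Uses C14N 2.0 (ElementTree) as a stand-in for the canonicalization
    named in the WD.
    """
    canonical = ET.canonicalize(wrap_xml_literal(lexical_form, lang))
    return canonical.encode("utf-8")


v1 = xml_literal_value("a<em/>")
v2 = xml_literal_value("a<em></em>")
print(v1)
print(v1 == v2)  # True: two lexical forms, one value
```

Even under C14N 2.0, the quote style and the empty-element tag are normalized as described above.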
>
>FYI: differences are:
>   - the ' quotes for attribute values got mapped to " quotes
>   - the "<em/>" empty tag got mapped to a start tag followed by an end tag
>   - the whitespace outside the document element (rdf-wrapper) was
>normalized to a single newline after the end-tag.
>
>Canonicalization will also make other changes.
>
>
>>That is, are there any cases
>>  that ought to trigger the inference rule rdfD-2 for XMLLiteral?
>
>Yes (not that I've looked at that rule yet).
>
>>  (Assume there are no lang tags.) If not, I propose that we just say
>>  that the value space is the same as the lexical space.
>
>No, it's not.

OK.

>
>>  But if there
>>  are any rdf:XMLLiteral-datatyping entailments then I ought to say
>>  what they are and incorporate them into rdf-entailment.
>
>I think not. I see datatyping as an optional layer on top. We have said that
>systems should (lowercase) support XSD (including rdf:XMLLiteral), that's
>good.

No, we have now said that this is built into the RDF namespace, and 
RDFS includes all of RDF.

>Also when we went down this canonicalization route we were very aware that
>there are perfectly good RDF implementations that cannot tell when two XML
>literals are identical.

I don't think that position is now tenable.

>That is the canonicalization stuff is a cost to the
>cheap and cheerful implementor, and requiring them to even understand it
>before understanding the basics of the model theory seems mistaken.
>
>I would hope that the DPH can take away something from the model theory - I
>would be surprised if it were XML canonicalization.
>
>>
>>  If there are lang tags, does having different lang tags guarantee
>>  that the canonical XML docs are distinct, or can there be cases where
>>  the lang tags dissolve into nothing and leave the docs identical?
>
>Different lang tags means different documents (modulo case - since we now
>normalize language tags on input, I think)
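A small sketch of the language-tag point, reusing the same ElementTree C14N 2.0 stand-in. It assumes, as suggested above, that language tags are lower-cased on input; the function name is hypothetical. Tags that differ only in case collapse to one value, while genuinely different tags survive canonicalization in the `xml:lang` attribute.

```python
import xml.etree.ElementTree as ET


def xml_literal_value(lexical_form: str, lang: str = "") -> str:
    # Assumption: language tags are normalized to lower case on input.
    lang = lang.lower()
    wrapper = "<rdf-wrapper xml:lang='" + lang + "'>" + lexical_form + "</rdf-wrapper>"
    return ET.canonicalize(wrapper)


en = xml_literal_value("a<em/>", "en")
en_upper = xml_literal_value("a<em/>", "EN")
fr = xml_literal_value("a<em/>", "fr")
print(en == en_upper)  # True: the case difference is normalized away
print(en == fr)        # False: xml:lang is kept by canonicalization
```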
>
>>  If
>>  the former, then the MT can treat XMLLiterals just like plain
>>  literals but with an XML syntax check added, which would be very nice
>>  and easy.
>>
>
>No - it doesn't work.
>
>The old version of the abstract syntax did require canonicalization for
>syntactic well-formedness - with that then yes they are quite like plain
>literals. Since we only really need it for equality, and we only really need
>equality for semantic reasons, moving the C14N into the L2V mapping has been
>an improvement.
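To make the "plain literal plus a syntax check" objection concrete, here is a hedged comparison of two hypothetical equality tests, assuming no language tags and again using ElementTree's C14N 2.0 as a stand-in. The syntax-only approach accepts both literals as well-formed but still treats them as distinct, while comparing canonicalized values finds them equal.

```python
import xml.etree.ElementTree as ET


def wrap(lexical_form: str) -> str:
    # Wrapper with an empty language identifier (no lang tag).
    return "<rdf-wrapper xml:lang=''>" + lexical_form + "</rdf-wrapper>"


def naive_equal(a: str, b: str) -> bool:
    """Plain-literal treatment: check well-formedness, then compare strings."""
    ET.fromstring(wrap(a))  # raises ET.ParseError if not well-formed
    ET.fromstring(wrap(b))
    return a == b


def value_equal(a: str, b: str) -> bool:
    """Value treatment: canonicalize inside the L2V step, then compare."""
    return ET.canonicalize(wrap(a)) == ET.canonicalize(wrap(b))


print(naive_equal("a<em/>", "a<em></em>"))  # False
print(value_equal("a<em/>", "a<em></em>"))  # True
```

Only the second test agrees with the value-space answer given earlier in the thread.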
>
>
>
>>  >I don't know whether anyone would care to argue whether a document
>>  >is or is not an XSD string.
>>
>>  Lets agree that they are not, as far as we are concerned. After all,
>>  literal strings are not XSD strings either. Saying what XSD-anything
>>  is, is up to XMLS to do, not our job.
>
>*We* certainly won't say, but I think we are not saying that literal strings
>are not xsd:string either.

True, I mis-spoke.

>
>>
>>  >I would think not ... an xsd:string is a sequence of Unicode code
>>  >points, whereas a document is a sequence of bytes (a canonical XML
>>  >document is a sequence of bytes in the UTF-8 character encoding).
>>
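The code-points-versus-bytes distinction in the quoted remark can be illustrated with Python's `str` and `bytes` types, used here only as stand-ins for an xsd:string value and a UTF-8 document respectively.

```python
# Stand-in for an xsd:string value: a sequence of Unicode code points.
text = "caf\u00e9"
# Stand-in for a UTF-8 document: a sequence of bytes.
doc = text.encode("utf-8")

print(len(text))                    # 4 code points
print(len(doc))                     # 5 bytes: U+00E9 takes two bytes in UTF-8
print(doc.decode("utf-8") == text)  # True: same characters, different kinds of object
```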
>
>
>
>Jeremy


-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola              			(850)202 4440   fax
FL 32501           				(850)291 0667    cell
phayes@ai.uwf.edu	          http://www.coginst.uwf.edu/~phayes
s.pam@ai.uwf.edu   for spam
Received on Thursday, 21 November 2002 15:19:01 UTC