Re: xmlsch-02: (was: Agenda for RDFCore WG Telecon 2003-09-19 (more on xmlsch-02)) from Patrick Stickler on 2003-09-24 (w3c-rdfcore-wg@w3.org from September 2003)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Wed, 24 Sep 2003 09:44:38 +0300
To: <w3c-rdfcore-wg@w3.org>
Message-ID: <BB971806.245%patrick.stickler@nokia.com>
  > At 08:46 19/09/03 -0500, Dan Connolly wrote:
  > > http://lists.w3.org/Archives/Public/www-xml-schema-comments/2003JulSep/0084.html
  >
  > That seems pretty clear about what should be in the abstract syntax
  > (i.e. " 3 " is NOT an integer lexical form).
 
For what it's worth, Henry Thompson and I chatted about this over lunch
yesterday (he was visiting Nokia to give an XML Schema tutorial) and
his take was that " 3 " was simply an error and the negative entailment
test case we have is valid, and that the meaning of " 3 "^^xsd:integer
is unknown.

Applications which are expressing typed literals using XML Schema
datatypes must "do the right thing" and produce valid lexical forms.
Folks doing data mining, harvesting, etc. are not exempt from this.
Both RDF and XML Schema operate on a "higher level" than plain XML
and that means that there are certain requirements if one is to
express information at that level.

So, my understanding of his position is that we've done things
correctly and the implementations that incorrectly make the
entailments in question because they employ whitespace processing
on the lexical forms themselves need to be fixed.
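
To illustrate the difference between the two behaviours, here is a
minimal Python sketch. It is not taken from any RDF toolkit; the function
names are hypothetical, and the regular expression is my reading of the
xsd:integer lexical space (optional sign, then decimal digits).

```python
import re

# Assumed lexical space of xsd:integer: an optional sign followed by
# one or more decimal digits. No whitespace is permitted.
XSD_INTEGER = re.compile(r"[+-]?[0-9]+")


def is_valid_integer_lexical_form(lexical: str) -> bool:
    """Strict check: test the string exactly as it appears in the graph."""
    return XSD_INTEGER.fullmatch(lexical) is not None


def is_valid_after_collapse(lexical: str) -> bool:
    """Lenient check: apply whiteSpace=collapse first, then validate.

    Tabs, line feeds and carriage returns become spaces, runs of spaces
    are merged, and leading/trailing spaces are removed.
    """
    collapsed = re.sub(r"[ \t\n\r]+", " ", lexical).strip(" ")
    return is_valid_integer_lexical_form(collapsed)


print(is_valid_integer_lexical_form("3"))    # True
print(is_valid_integer_lexical_form(" 3 "))  # False
print(is_valid_after_collapse(" 3 "))        # True
```

The strict check corresponds to the position described above; the lenient
one is the kind of processing that leads to the unwanted entailment.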


  > But it still seems to beg a question about interpretation of the RDF/XML
  > ... Michael suggests that if whitespace normalization is required/allowed,
  > we have to say so explicitly, but doing so would seem to embed aspects of
  > XML schema datatypes into the purely syntactic handling of RDF/XML.

We have a unique situation in that folks are using XML Schema datatypes,
yet RDF is not an XML Schema application, so parsers do not do XML
Schema-based processing.

The bottom line is that, no matter *what* the datatype one is using,
one must use only valid lexical forms, and the XML Schema specs are
clear about what the lexical spaces of its types are (even if it was
unclear about some other things).

We simply need to make it clear that RDF/XML is not going to go
through XML Schema processing, so users should not presume any
whitespace processing.
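
As a rough illustration of what "no XML Schema processing" means in
practice, the sketch below reads the property value with a plain XML
parser (Python's standard xml.etree.ElementTree, standing in for an
RDF/XML parser). The http://example.org/ namespace and the variable names
are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# The datatype URI is written out in full to avoid needing a DTD for &xsd;.
RDF_XML = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/">
  <rdf:Description rdf:about="http://example.org/S">
    <ex:p rdf:datatype="http://www.w3.org/2001/XMLSchema#integer"> 3 </ex:p>
  </rdf:Description>
</rdf:RDF>
"""

root = ET.fromstring(RDF_XML)
prop = root.find(".//{http://example.org/}p")

# A plain XML parser hands back the element content untouched, so the
# surrounding spaces are part of the lexical form.
lexical_form = prop.text
print(repr(lexical_form))  # ' 3 '
```

Nothing in this pipeline knows about the whiteSpace facet of xsd:integer,
so the lexical form that reaches the graph is " 3 ", not "3".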

  > My inclination is say that the denotation of datatyped literals with
  > lexical forms not valid for the datatype is unconstrained (but that there
  > must be *some* denotation for a given interpretation).  This means that:
  > 
  >     <rdf:Description rdf:about="ex:S">
  >       <ex:p rdf:datatype="&xsd;integer"> 3 </ex:p>
  >     </rdf:Description>
  > 
  > entails:
  > 
  >     <rdf:Description rdf:about="ex:S">
  >       <ex:p rdf:datatype="&xsd;integer">3</ex:p>
  >     </rdf:Description>
  > 
  > is not defined by the RDF specification,
  >
  > but that self-entailment:
  > 
  >     <rdf:Description rdf:about="ex:S">
  >       <ex:p rdf:datatype="&xsd;integer"> 3 </ex:p>
  >     </rdf:Description>
  > 
  > entails:
  > 
  >     <rdf:Description rdf:about="ex:S">
  >       <ex:p rdf:datatype="&xsd;integer"> 3 </ex:p>
  >     </rdf:Description>
  > 
  > always holds.
 
I would expect that this is now the case. While we don't know
what an ill-formed typed literal denotes, the same ill-formed
typed literal should still consistently denote the same thing
wherever it occurs since it would be a single tidied node in
the graph. No?
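
A small sketch of that reasoning, under heavy simplifying assumptions: a
typed literal is modelled as a (lexical form, datatype URI) pair, a graph
as a set of ground triples, and entailment as a purely syntactic subset
test. It does not model datatype-aware entailment at all (for instance,
it would also treat "03" and "3" as different), and all names are
hypothetical.

```python
from dataclasses import dataclass

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass(frozen=True)
class TypedLiteral:
    """A typed literal identified by its exact lexical form and datatype."""
    lexical: str
    datatype: str


def graph(*triples):
    """Build a graph; a set merges identical triples into one."""
    return frozenset(triples)


def syntactically_entails(g1, g2):
    """Ground graphs only: every triple of g2 must already be in g1."""
    return g2 <= g1


padded = graph(("ex:S", "ex:p", TypedLiteral(" 3 ", XSD_INTEGER)))
plain = graph(("ex:S", "ex:p", TypedLiteral("3", XSD_INTEGER)))

print(syntactically_entails(padded, padded))  # True
print(syntactically_entails(padded, plain))   # False
```

Because the two occurrences of " 3 "^^xsd:integer compare equal, they are
the same node and self-entailment holds regardless of what the literal
denotes; nothing licenses the step to "3"^^xsd:integer.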

  > This leaves wiggle-room for systems that apply whitespace facet
  > normalization of lexical forms to be permissible without imposing it on all
  > implementations, and also allowing a minimal coherent handling
  > (self-entailment) of unrecognized datatypes.

I remain uncomfortable about sanctioning such wiggle-room. If
it's not a valid RDF+XSD entailment, it should never be made by
a conformant RDF processor -- or at least, it should be clear
that it is nonstandard and nonportable behavior.

While there may be some degree of short term inconvenience for
those who want to harvest "knowledge" from non-RDF sources, being
strict in this regard will benefit the RDF and SW communities in
the long run since it highlights the need to be sure of the knowledge
one is asserting. Just because one can write a script that coerces
web content into something that looks like RDF does not mean that
the knowledge reflected by the RDF is valid. Forcing folks to be
sure about what they are asserting, by requiring valid lexical forms,
is IMO a wise thing to do.

So, let's leave the negative entailment tests, and implementors
will need to fix their stuff accordingly.

Patrick
Received on Wednesday, 24 September 2003 02:44:57 UTC