Re: Rough draft of datatypes with whitespace text from Patrick Stickler on 2003-10-01 (w3c-rdfcore-wg@w3.org from October 2003)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Wed, 01 Oct 2003 15:39:17 +0300
To: ext Jeremy Carroll <jjc@hpl.hp.com>, <w3c-rdfcore-wg@w3.org>
Message-ID: <BBA0A5A5.1A91%patrick.stickler@nokia.com>
On 2003-10-01 14:16, "ext Jeremy Carroll" <jjc@hpl.hp.com> wrote:

> 
> 
> See
> http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/#section-Datatypes
> for current text.
> 
> 1: delete note:
> [[
> In [XML-SCHEMA1], white space normalization occurs during validation according
> to the value of the whiteSpace facet. The lexical-to-value mapping used in
> RDF datatyping occurs after this, so that the whiteSpace facet has no effect
> in RDF datatyping.
> ]]
> 
> 2: Replace:
> [[
> A datatype consists of a lexical space, a value space and a lexical-to-value
> mapping. 
> ]]
> with
> [[
> A datatype consists of a lexical space, a value space, a lexical-to-value
> mapping, and optionally a whitespace facet.
> ]]

Eh? When did the WG agree that whitespace facets were part of *RDF*
datatyping?!

It has (finally) been clarified that whitespace facets in XML Schema
are not relevant to the L2V mapping and are not applied to lexical
forms. So why are we mentioning them?

> 
> 3: Before:
> [[
> A datatype is identified by one or more URI references.
> ]]
> 
> [[
> The whitespace facet, if any, is one of the values ???? [TBD] as defined in
> [XML Schema1]
> ]]
> 
> 
> 3:
> Between:
> [[
> A datatype is identified by one or more URI references.
> ****
> RDF may be used with any datatype definition that conforms to this
> abstraction, even if not defined in terms of XML Schema.
> ]]
> 
> Add the following:
> [[
> To define the operational of an RDF datatype we use the terms
> extended lexical-to-value mapping, and extended lexical space.
> 
> For datatypes without a whitespace facet, the extended lexical-to-value
> mapping is identical to the lexical-to-value mapping, and the extended
> lexical space is identical to the lexical space.

Eh??!

> Otherwise, the extended lexical-to-value mapping consists of first
> applying whitespace normalization following [XML Schema1] as
> appropriate for the
> whitespace facet
> and then applying the lexical-to-value mapping.
> The extended lexical space is the domain of the extended lexical-to-value
> mapping. i.e. a string is in the extended lexical space if and only if the
> result of whitespace normalization of that string is in the lexical space.
> ]]
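
For illustration only, the proposed text quoted above can be read as the
composition sketched below. This is a minimal Python sketch under stated
assumptions, not anything the WG has specified: the function names, the
choice of xsd:integer as the example datatype, and the regular expression
standing in for its lexical space are all hypothetical. The three facet
values (preserve, replace, collapse) are the ones XML Schema defines.

```python
import re

def normalize_whitespace(text, facet):
    """Whitespace normalization in the style of the XML Schema whiteSpace facet."""
    if facet == "preserve":
        return text
    # "replace": tab, line feed, and carriage return each become one space
    replaced = re.sub(r"[\t\n\r]", " ", text)
    if facet == "replace":
        return replaced
    if facet == "collapse":
        # after "replace", runs of spaces shrink to one and the ends are trimmed
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown whitespace facet: {facet!r}")

# Assumed stand-in for the xsd:integer lexical space and L2V mapping.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")

def integer_l2v(lexical):
    if not INTEGER_LEXICAL.fullmatch(lexical):
        raise ValueError(f"not in the lexical space: {lexical!r}")
    return int(lexical)

def extended_l2v(text, l2v, facet=None):
    """Proposed extended mapping: normalize first (if a facet exists), then apply L2V."""
    if facet is None:
        return l2v(text)  # no facet: identical to the plain L2V mapping
    return l2v(normalize_whitespace(text, facet))
```

Under this reading, `extended_l2v(" 42\n", integer_l2v, "collapse")` accepts
a string that `integer_l2v` alone rejects, which is exactly the widening of
the lexical space that the proposal introduces.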
> 
> 4.
> Specify that rdf:XMLLiteral does not have a whitespace facet
> 
> 5. 
> Specify that XSD datatypes have the whitespace facet as defined by XSD.
> 
> 6.
> In other documents (particularly semantics) globally substitute lexical space
> by extended lexical space and lexical-to-value mapping by extended
> lexical-to-value mapping.

??!!!

I'm sorry, but was this decided during the last telecon? I skimmed over
the minutes but didn't see any such substantial change.

Or is this just a proposal?

Can someone please provide some explicit pointers if this reflects
a WG decision?

What is being proposed above (a) is unnecessary, (b) makes RDF datatyping
fail to be a subset of the XML Schema datatyping model, and (c) seems
solely motivated by the *laziness* of those creating RDF/XML to be
*precise* about what they are saying.

Those who are creating RDF/XML expressed using XML Schema datatypes
need to be precise about the values being expressed -- and if someone
is doing Web scraping and pulling stuff from less-precise contexts
(less precise than either RDF or XML Schema) then they need to do
validation on those denotations -- and the ability to do validation
also means they can produce normalized lexical forms when generating
the RDF/XML.
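
As a rough sketch of that producer-side step, a scraper can collapse
whitespace and validate before it emits anything. The assumptions here are
mine and not from any WG document: Python, xsd:integer as the datatype, the
hypothetical helper name `typed_literal`, a regular expression standing in
for the lexical space, and N-Triples-style output.

```python
import re

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

def typed_literal(scraped):
    """Return an N-Triples-style xsd:integer literal, or raise if the input is invalid."""
    # Collapse whitespace once, on the producer side (XML Schema "collapse":
    # tab/LF/CR become spaces, runs of spaces shrink to one, ends are trimmed).
    lexical = re.sub(r" +", " ", re.sub(r"[\t\n\r]", " ", scraped)).strip(" ")
    # Validate against the (assumed) lexical space of xsd:integer.
    if not re.fullmatch(r"[+-]?[0-9]+", lexical):
        raise ValueError(f"cannot be expressed as xsd:integer: {scraped!r}")
    return f'"{lexical}"^^<{XSD_INTEGER}>'
```

The consumer then only ever sees lexical forms that are already in the
lexical space, so no whitespace handling is needed in the L2V mapping itself.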

Adding these extensions to cater to folks too sloppy to do validation
of stuff they are scraping off the web or elsewhere and just throwing
it into a form that simply *looks* like RDF but does not merit the
precision of RDF is just plain wrong.

RDF (and XML Schema) demands higher precision than just HTML, or
well-formed XML, or many other encodings -- and any time you move
data from a less precise to a more precise context, there is a
*cost*. And for XML Schema Datatypes, that cost is validation/normalization
of lexical forms -- regardless of their source.

Sorry Jeremy, if I seem to be jumping on you personally. I'm not.
(unless of course, it's your own proposal ;-)

Sheesh... you miss one telecon...   ;-)

Patrick
Received on Wednesday, 1 October 2003 08:39:26 UTC