- From: David Booth <dbooth@w3.org>
- Date: Fri, 27 Feb 2004 17:34:22 -0500
- To: Bijan Parsia <bparsia@isr.umd.edu>, "David Orchard" <dorchard@bea.com>
- Cc: <www-ws-desc@w3.org>, "'Mark Baker'" <distobj@acm.org>,
Bijan,
While I agree with your observation that RDF does not help you do data
validation, to some extent I think it misses the point.
IMO, the most important use of XML schemas in WS is not to facilitate
runtime message validation by an XML parser. It is to clearly indicate the
intended message format, so that requester and provider entities can be
sure that their corresponding software agents will know:
(a) how to generate the right data format;
(b) how to access the data that they need (i.e., where
in the parse tree); and
(c) what datatype it is.
IMO, runtime message validation by an XML parser is a distant secondary
consideration. Why? Because nearly every responsible application MUST do
its own application-specific data validation anyway, before it makes use of
the incoming data. An XML schema in conjunction with a validating parser
can somewhat reduce the data validation burden, but in general cannot
eliminate it. Furthermore, there is a risk to doing part of the
validation in the XML parser: The code that performs the app-specific data
validation is separated from the XML Schema, and this means that they can
get out of sync if the schema is later modified. Hence, it is safest if
the code does *not* rely on the XML parser to validate. For this and
performance reasons, many production XML applications turn off parser
validation.
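
To make the point above concrete, here is a minimal sketch (not from any real service) of an application that parses with a non-validating parser and does its own checks. The element names, the `read_order` function and the `stock_on_hand` parameter are all hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical incoming message; the element names are illustrative only.
MESSAGE = "<order><item>widget</item><quantity>3</quantity></order>"


def read_order(xml_text, stock_on_hand):
    """Parse an order and apply application-specific validation."""
    # ElementTree checks well-formedness only; no schema validation occurs.
    root = ET.fromstring(xml_text)

    item = root.findtext("item")
    quantity_text = root.findtext("quantity")
    if item is None or quantity_text is None:
        raise ValueError("order must contain <item> and <quantity>")

    # Datatype check performed by the application itself.
    try:
        quantity = int(quantity_text)
    except ValueError:
        raise ValueError("quantity is not an integer") from None

    # A rule that depends on application state, so a static schema
    # could not express it in any case.
    if not 1 <= quantity <= stock_on_hand:
        raise ValueError("quantity must be between 1 and the stock on hand")

    return {"item": item, "quantity": quantity}


print(read_order(MESSAGE, stock_on_hand=10))
```

All of the checks live in one place in the code, so there is no second copy of the rules in a schema to drift out of sync.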
Returning to the list above, an XML schema is very good for indicating the
location of data items in the parse tree, along with their data
types. But if RDF is used, then the format of the data (in the sense of
where each item sits in the parse tree) is largely a non-issue, and RDF
can also indicate the data types.
More comments inline . . .
>At 11:59 AM 2/15/2004 -0500, Bijan Parsia wrote:
>>
>>On Feb 15, 2004, at 1:31 AM, David Orchard wrote:
>>
>>. . .
>>Well, the reason that I want "ignore unknowns" is because I know that
>>"ignore unknowns" has been deployed on the web for >10 years and it works
>>for versioning. If there's another solution, I'm really really really
>>interested in it.
>
>The extra bit, perhaps, is the validation. Although required known fields
>are ubiquitous where you have ignore unknowns :)
+1
The reason the "ignore unknown" rule for optional extensions works so well
for so many languages is that they have implicitly been making two very
important assumptions: (a) that a statement in a given language represents
a set of assertions; and (b) that the language makes the "open world"
assumption. (The "open world assumption" means that a statement in the
language is not assumed to tell you *everything* that could possibly be
true about its subject. There may be additional things true about its
subject (for example, additional behavior) beyond what the statement
asserts.) Of course, sets of assertions in an open world are what RDF is
all about, so this comes naturally to RDF.
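
As a rough illustration of "ignore unknowns" over a set of assertions, the sketch below models statements as (subject, predicate, object) tuples in plain Python. No RDF library is used, and the `ex:` names, the `KNOWN` set and the `process` function are hypothetical.

```python
# Predicates that this (hypothetical) Version 1 consumer understands.
KNOWN = {"ex:name", "ex:price"}


def process(assertions):
    """Collect the assertions we understand; skip the rest."""
    understood = {}
    for subject, predicate, obj in assertions:
        if predicate in KNOWN:
            understood.setdefault(subject, {})[predicate] = obj
        # Open world: an unrecognized predicate is simply one more
        # assertion about the subject, so it is skipped, not rejected.
    return understood


v1_message = [
    ("ex:item1", "ex:name", "widget"),
    ("ex:item1", "ex:price", "9.99"),
]

# A later version adds an assertion the Version 1 consumer never heard of.
v2_message = v1_message + [("ex:item1", "ex:color", "blue")]

# The old consumer extracts the same information from both versions.
print(process(v1_message) == process(v2_message))
```

Because the consumer never assumed the statement told it everything about `ex:item1`, the extra assertion cannot contradict anything it already relied on.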
>> . . .
>>So, what are the requirements:
>>1. Types that are valid have type information
>>2. Types that are not known do not break validation
>>3. Types allow for arbitrary extensibility in ways not predicted by the
>>Version N schema author.
>>4. Types that are not known and optional can be added without breaking
>>compatibility (same as #2?)
>>5. Types that are known and not allowed break validation.
>>
>>Assuming that these are roughly the requirements for doing compatible
>>versioning, Bijan, what would the RDF/XML look like to express these
>>assurances?
>
>Can't. Not even with OWL. . . . .
But if you relax the validation requirement, and instead focus on the twin
tasks of (1) ensuring that the parties agree on what data to expect, and
(2) *processing* the data, then I think RDF *does* meet these requirements
very well.
In other words, if we re-write the above requirements as:
1. Data items that are expected have type information
2. Data items that are not expected do not break processing
3. Input data allows for arbitrary extensibility in ways
not predicted by the Version N schema author.
4. Data items that are not known and optional can be added
without breaking compatibility (same as #2?)
5. (N/A if validation requirement is relaxed)
and we can add a further requirement that may have been implicit earlier:
6. A machine-processable document can clearly indicate what
data items (and types) are expected
then I think RDF meets the requirements nicely.
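
The rewritten requirements might be sketched as follows. This is only an illustration under stated assumptions: the `EXPECTED` declaration stands in for a machine-processable document (requirement 6), the property names and the `extract` function are hypothetical, and incoming data is modeled as a plain dictionary rather than real RDF.

```python
from datetime import date

# Hypothetical machine-processable declaration of the expected data
# items and their datatypes (requirements 1 and 6).
EXPECTED = {
    "ex:quantity": "xsd:integer",
    "ex:shipDate": "xsd:date",
}

# How this sketch maps each declared datatype to a Python conversion.
CONVERTERS = {
    "xsd:integer": int,
    "xsd:date": date.fromisoformat,
}


def extract(assertions, expected=EXPECTED):
    """Return typed values for the expected items only."""
    values = {}
    for prop, datatype in expected.items():
        if prop not in assertions:
            raise KeyError(f"expected data item missing: {prop}")
        values[prop] = CONVERTERS[datatype](assertions[prop])
    # Items not listed in `expected` are never examined, so they
    # cannot break processing (requirements 2 through 4).
    return values


incoming = {
    "ex:quantity": "3",
    "ex:shipDate": "2004-03-01",
    "ex:giftWrap": "true",  # unexpected, optional extension
}

print(extract(incoming))
```

Note that nothing here rejects the message for containing `ex:giftWrap`, which is why requirement 5 no longer applies.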
>The hard bit is what "valid" means. . . . .
+1
In short, the ability to have your XML parser validate incoming messages
against a schema is nice, and can be useful. But it is only a part of the
story, and IMO it is a much less important part than other considerations.
--
David Booth
W3C Fellow / Hewlett-Packard
Telephone: +1.617.253.1273
Received on Friday, 27 February 2004 17:34:26 UTC