- From: David Booth <dbooth@w3.org>
- Date: Fri, 27 Feb 2004 17:34:22 -0500
- To: Bijan Parsia <bparsia@isr.umd.edu>, "David Orchard" <dorchard@bea.com>
- Cc: <www-ws-desc@w3.org>, "'Mark Baker'" <distobj@acm.org>,
Bijan,

While I agree with your observation that RDF does not help you do data validation, to some extent I think it misses the point. IMO, the most important use of XML schemas in WS is not to facilitate runtime message validation by an XML parser. It is to clearly indicate the intended message format, so that requester and provider entities can be sure that their corresponding software agents will know: (a) how to generate the right data format; (b) how to access the data that they need (i.e., where in the parse tree); and (c) what datatype it is.

IMO, runtime message validation by an XML parser is a distant secondary consideration. Why? Because nearly every responsible application MUST do its own application-specific data validation anyway, before it makes use of the incoming data. An XML schema in conjunction with a validating parser can somewhat reduce the data validation burden, but in general cannot eliminate it.

Furthermore, there is a risk to doing part of the validation in the XML parser: The code that performs the app-specific data validation is separated from the XML Schema, and this means that they can get out of sync if the schema is later modified. Hence, it is safest if the code does *not* rely on the XML parser to validate. For this and performance reasons, many production XML applications turn off parser validation.

Returning to the list above, an XML schema is very good for indicating the location of data items in the parse tree, along with their data types. But if RDF is used, then the format of the data is largely a non-issue, and RDF can also indicate the data types.

More comments inline . . .

>At 11:59 AM 2/15/2004 -0500, Bijan Parsia wrote:
>>
>>On Feb 15, 2004, at 1:31 AM, David Orchard wrote:
>>
>>. . .
>>Well, the reason that I want "ignore unknowns" is because I know that
>>"ignore unknowns" has been deployed on the web for >10 years and it works
>>for versioning. If there's another solution, I'm really really really
>>interested in it.
>
>The extra bit, perhaps, is the validation. Although required known fields
>are ubiquitous where you have ignore unknowns :)

+1

The reason the "ignore unknown" rule for optional extensions works so well for so many languages is that they have implicitly been making two very important assumptions: (a) that a statement in a given language represents a set of assertions; and (b) that the language makes the "open world" assumption. (The "open world assumption" means that a statement in the language is not assumed to tell you *everything* that could possibly be true about its subject. There may be additional things true about its subject (for example, additional behavior) beyond what the statement asserts.) Of course, sets of assertions in an open world are what RDF is all about, so this comes naturally to RDF.

>> . . .
>>So, what are the requirements:
>>1. Types that are valid have type information
>>2. Types that are not known do not break validation
>>3. Types allow for arbitrary extensibilty in ways not predicted by the
>>Version N schema author.
>>4. Types that are not known and optional can be added without breaking
>>compatibility (same as #2?)
>>5. Types that are known and not allowed break validation.
>>
>>Assuming that these are roughly the requirements for doing compatibile
>>versioning, Bijan, what would the RDF/XML look like to express these
>>assurances?
>
>Can't. Not even with OWL. . . . .

But if you relax the validation requirement, and instead focus on the twin tasks of (1) ensuring that the parties agree on what data to expect, and (2) *processing* the data; then I think RDF *does* meet these requirements very well. In other words, if we re-write the above requirements as:

1. Data items that are expected have type information
2. Data items that are not expected do not break processing
3. Input data allows for arbitrary extensibility in ways not predicted by the Version N schema author.
4. Data items that are not known and optional can be added without breaking compatibility (same as #2?)
5. (N/A if validation requirement is relaxed)

and we can add a further requirement that may have been implicit earlier:

6. A machine-processable document can clearly indicate what data items (and types) are expected

then I think RDF meets the requirements nicely.

>The hard bit is what "valid" means. . . . .

+1

In short, the ability to have your XML parser validate incoming messages against a schema is nice, and can be useful. But it is only a part of the story, and IMO it is a much less important part than other considerations.

--
David Booth
W3C Fellow / Hewlett-Packard
Telephone: +1.617.253.1273
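
As a rough illustration of rewritten requirements 1, 2, 4 and 6, here is a minimal Python sketch of a receiver that reads only the data items it expects, converts them to their declared types, ignores anything it does not know, and then does its own application-specific check. It uses plain XML with parser validation off rather than RDF, and everything named in it (the `EXPECTED` table, the `process` function, the `order` message and its element names, the "quantity must be positive" rule) is hypothetical, made up purely for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical table of expected data items and their types
# (stands in for the machine-processable description of requirement 6).
EXPECTED = {"orderId": int, "quantity": int, "customer": str}


def process(message_xml):
    """Return the expected data items from a message, ignoring unknown ones."""
    root = ET.fromstring(message_xml)  # non-validating parse
    data = {}
    for child in root:
        convert = EXPECTED.get(child.tag)
        if convert is None:
            # Unknown item: skip it rather than fail (requirements 2 and 4).
            continue
        # Expected item: apply its declared type (requirement 1).
        data[child.tag] = convert(child.text or "")

    missing = EXPECTED.keys() - data.keys()
    if missing:
        raise ValueError(f"missing expected items: {sorted(missing)}")

    # Application-specific validation, done by the application itself
    # instead of being delegated to the XML parser.
    if data["quantity"] <= 0:
        raise ValueError("quantity must be positive")
    return data


if __name__ == "__main__":
    msg = (
        "<order><orderId>7</orderId><quantity>3</quantity>"
        "<customer>Ann</customer><giftWrap>true</giftWrap></order>"
    )
    # giftWrap is an unknown extension and is silently ignored.
    print(process(msg))
```

The unknown `giftWrap` element does not break processing, while a missing or non-positive `quantity` is still rejected by the application's own check.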
Received on Friday, 27 February 2004 17:34:26 UTC