Re: SV: What's a valid instance...James Clark from noah_mendelsohn@us.ibm.com on 2007-03-22 (xmlschema-dev@w3.org from March 2007)

From: <noah_mendelsohn@us.ibm.com>
Date: Thu, 22 Mar 2007 10:50:51 -0400
To: "Bryan Rasmussen" <BRS@itst.dk>
Cc: "Pete Cordell" <petexmldev@tech-know-ware.com>, xmlschema-dev@w3.org
Message-ID: <OF7B34CDEE.3CDA4C4F-ON852572A6.0050A152-852572A6.00519051@lotus.com>
I believe that in your example, the <bar/> is not valid, because there's 
no global declaration for it.  James is right, though, that if both <foo> 
and <bar> are declared global, there's no standard way to indicate in a 
schema document that one or the other is always the root.   If you made 
the schema:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified"
  xmlns="http://www.example.com"
  targetNamespace="http://www.example.com">

  <xs:element name="foo">
    <xs:complexType/>
  </xs:element>

  <xs:element name="bar">
    <xs:complexType/>
  </xs:element>

</xs:schema>

it would indeed validate either <foo/> or <bar/>.  I expect that's James' 
concern.

This has been discussed many times.  In my mind, it was a tradeoff.  By 
allowing validation to begin anywhere, we made it easier to talk about 
scenarios in which you want to do incremental validation of parts of a 
document.  If we had added a distinguished <xsd:root>, which we could have 
done or could still do, we'd essentially just need another switch on the 
API, along the lines of a boolean insistWereValidatingFromRoot.  You'd set 
that true when validating a whole document, and false when knowingly 
validating a part.   Also, some declarations can be a root in some 
contexts and not in others.  Consider a purchase order carried as the body 
of a SOAP envelope:  it's likely the root if it's being checked as it's 
being prepared, but it is instead a child of <soap:Body> if it's checked 
in the context of its containing message.
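To make the tradeoff concrete, here is a minimal sketch of how such a switch might behave. Everything in it is hypothetical: XSD 1.0 has no <xsd:root>, and the Schema class, declared_root field and may_start_here function are illustrative names, not part of any real processor API. Element names use Clark notation ("{namespace}local").

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Schema:
    """Hypothetical schema model; XSD 1.0 has nothing like declared_root."""
    global_elements: frozenset[str]      # Clark names, e.g. "{ns}foo"
    declared_root: str | None = None     # the hypothetical <xsd:root>


def may_start_here(schema: Schema, element: str, insist_from_root: bool) -> bool:
    """Return True if strict validation may begin at `element`.

    insist_from_root plays the role of the hypothetical
    insistWereValidatingFromRoot switch: True for a whole document,
    False when knowingly validating a fragment.
    """
    if element not in schema.global_elements:
        return False  # strict assessment needs a matching global declaration
    if insist_from_root and schema.declared_root is not None:
        return element == schema.declared_root
    return True  # today's XSD behavior: any global element may start validation


NS = "{http://www.example.com}"
schema = Schema(frozenset({NS + "foo", NS + "bar"}), declared_root=NS + "foo")
print(may_start_here(schema, NS + "bar", insist_from_root=True))   # False
print(may_start_here(schema, NS + "bar", insist_from_root=False))  # True
```

The same purchase-order declaration would pass with the switch on when checked on its own, and with it off when validated as a fragment pulled out of a SOAP message.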

Still, James is very right:  we've missed the opportunity to indicate in 
the schema document what the expected root is, and that can indeed be 
useful sometimes.  Of all the interesting tradeoffs between RELAX NG 
and W3C XML Schema, I'm a little surprised to see this one suggested as 
the most significant in determining a choice by the IETF.    FYI, the 
pertinent part of the specification reads [1]:

--------
5.2 Assessing Schema-Validity
With a schema which satisfies the conditions expressed in Errors in Schema 
Construction and Structure (§5.1) above, the schema-validity of an element 
information item can be assessed. Three primary approaches to this are 
possible: 
1 The user or application identifies a complex type definition from among 
the {type definitions} of the schema, and appeals to Schema-Validity 
Assessment (Element) (§3.3.4) (clause 1.2);
2 The user or application identifies a element declaration from among the 
{element declarations} of the schema, checks that its {name} and {target 
namespace} match the [local name] and [namespace name] of the item, and 
appeals to Schema-Validity Assessment (Element) (§3.3.4) (clause 1.1);
3 The processor starts from Schema-Validity Assessment (Element) (§3.3.4) 
with no stipulated declaration or definition, and either ·strict· or ·lax· 
assessment ensues, depending on whether or not the element information and 
the schema determine either an element declaration (by name) or a type 
definition (via xsi:type) or not.
The outcome of this effort, in any case, will be manifest in the 
[validation attempted] and [validity] properties on the element 
information item and its [attributes] and [children], recursively, as 
defined by Assessment Outcome (Element) (§3.3.5) and Assessment Outcome 
(Attribute) (§3.2.5). It is up to applications to decide what constitutes 
a successful outcome.
--------
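A toy model of approach 3 may help show why processors can report James' <bar/> as free of errors. This is a simplified sketch, not a processor: it only handles an element with no attributes or children, the function name assess_unstipulated is hypothetical, and content_ok stands in for the real content checks. The two returned values correspond to the [validation attempted] and [validity] properties named in the quoted text.

```python
from __future__ import annotations

from typing import NamedTuple


class Outcome(NamedTuple):
    validation_attempted: str  # "full", "partial" or "none"
    validity: str              # "valid", "invalid" or "notKnown"


def assess_unstipulated(element: str, global_elements: set[str],
                        has_xsi_type: bool = False,
                        content_ok: bool = True) -> Outcome:
    """Toy model of approach 3 for an element with no attributes or children."""
    if element in global_elements or has_xsi_type:
        # A declaration (by name) or a resolvable xsi:type is found: strict.
        return Outcome("full", "valid" if content_ok else "invalid")
    # Nothing is found: lax assessment, so nothing is checked and no error arises.
    return Outcome("none", "notKnown")


GLOBALS = {"{http://www.example.com}foo"}  # James Clark's example schema
print(assess_unstipulated("bar", GLOBALS))
# Outcome(validation_attempted='none', validity='notKnown')
print(assess_unstipulated("{http://www.example.com}foo", GLOBALS))
# Outcome(validation_attempted='full', validity='valid')
```

Under this reading, <bar/> is neither valid nor invalid; it is simply not known, and a tool that only reports errors has nothing to report.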

Note that in all of these, the application invoking the schema processor 
indicates the element to be validated.  For many, many applications, it is 
easy and natural to ensure that it's the expected root (<html> for a 
browser, <soap:Envelope> for something processing a SOAP message, 
the child of <soap:Body> if that's being validated separately, 
etc.)
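As a sketch of the application-side check described above, assuming the expected root is the {http://www.example.com}foo element from James' example, an application can parse the document with Python's standard xml.etree.ElementTree and refuse it before any schema processor is involved. The check_root name and EXPECTED_ROOT constant are illustrative.

```python
import xml.etree.ElementTree as ET

# Assumed expected root, taken from James Clark's example schema.
EXPECTED_ROOT = "{http://www.example.com}foo"


def check_root(document: str, expected: str = EXPECTED_ROOT) -> ET.Element:
    """Parse `document` and refuse it unless its root is the expected element.

    ElementTree reports names in Clark notation, "{namespace}local", so a
    no-namespace <foo/> or a <bar/> will not match.
    """
    root = ET.fromstring(document)
    if root.tag != expected:
        raise ValueError(f"unexpected root element {root.tag!r}, expected {expected!r}")
    return root  # hand this element to the schema processor


check_root('<foo xmlns="http://www.example.com"/>')  # accepted
```

Keeping the root check in the application, rather than in the schema, is what lets the same declarations serve both whole documents and fragments.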

Noah

[1] http://www.w3.org/TR/2004/PER-xmlschema-1-20040318/#validation_outcome

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------








"Bryan  Rasmussen" <BRS@itst.dk>
Sent by: xmlschema-dev-request@w3.org
03/22/2007 10:13 AM
 
        To:     "Pete Cordell" <petexmldev@tech-know-ware.com>, <xmlschema-dev@w3.org>
        cc:     (bcc: Noah Mendelsohn/Cambridge/IBM)
        Subject:        SV: What's a valid instance...James Clark



My understanding was that he was partially correct, because a processor, 
absent user input specifying otherwise, could decide whether or not to do 
lax validation.  For example, XSV would, I believe, do as he indicated, 
because it would decide to validate laxly. 

But I don't remember this rather maddening part of the past discussion 
well. 

It is, I suppose, considered up to the user to determine whether a 
"no errors" result came from lax validation or from strict validation, 
which would probably lead to a bunch of users doing this incorrectly. 

At any rate I always require strict validation. 
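Since the specification leaves it "up to applications to decide what constitutes a successful outcome", a strict-only policy like this one can be stated explicitly. The sketch below assumes the processor exposes the [validation attempted] and [validity] values for the root element (not every API does); the accept function and its require_strict flag are hypothetical.

```python
def accept(validation_attempted: str, validity: str, require_strict: bool = True) -> bool:
    """Decide whether the root element's assessment outcome counts as success.

    validation_attempted is "full", "partial" or "none";
    validity is "valid", "invalid" or "notKnown".
    """
    if validity == "invalid":
        return False  # an error was found: reject under any policy
    if require_strict:
        # Strict policy: the root and all its descendants were strictly assessed and valid.
        return validation_attempted == "full" and validity == "valid"
    # Lenient policy: "no errors reported", which also accepts notKnown.
    return True


print(accept("none", "notKnown"))                        # False
print(accept("none", "notKnown", require_strict=False))  # True
```

The lenient branch is the trap described above: a laxly assessed root with no declaration yields no errors and would be accepted.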

Cheers,
Bryan Rasmussen

-----Original Message-----
From: xmlschema-dev-request@w3.org
[mailto:xmlschema-dev-request@w3.org] On Behalf Of Pete Cordell
Sent: 22 March 2007 14:23
To: xmlschema-dev@w3.org
Subject: What's a valid instance...James Clark



I thought I would ask why the IETF is inclined to move away from W3C XML 
Schema, and one of the members kindly directed me towards the following 
e-mail by James Clark:

http://www.imc.org/ietf-xml-use/mail-archive/msg00217.html

One of the issues he states is (best to just quote it):

--------------------from James' e-mail--------------------

7. In W3C XML Schema there is no way to specify what is allowed as the
root element.  W3C XML Schema does not define a single notion of
validity of a document with respect to a schema.  There are different
varieties of validation (lax and strict) and many different ways to
validate a document against a schema.  From a W3C XML Schema alone, it
is not possible to know what is a valid document.

For example, consider a totally trivial schema like this:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified"
  xmlns="http://www.example.com"
  targetNamespace="http://www.example.com">

<xs:element name="foo">
  <xs:complexType/>
</xs:element>

</xs:schema>

Now consider a totally bogus document like this:

<bar/>

Believe it or not, the W3C XML Schema processors that I have tried
report this as valid!  The definition of validity is so flexible in
W3C XML Schema as to seriously impact interoperability.  If an
application was relying on the W3C XML Schema validation to screen
out incorrect input, it would be in serious trouble.

With RELAX NG, this sort of bogosity does not arise: ...

--------------------end of from James' e-mail--------------------

(I didn't really need to include the last line, but just love the word 
'bogosity!')

By my understanding he is wrong on this.  The only valid document 
(ignoring whitespace) would surely be:

    <foo xmlns="http://www.example.com"/>

Trouble is, James Clark is one of those people who I would imagine are 
rarely wrong, so I thought I'd better ask!

Many thanks,

Pete.
--
=============================================
Pete Cordell
Tech-Know-Ware Ltd
for XML to C++ data binding visit
http://www.tech-know-ware.com/lmx/
http://www.codalogic.com/lmx/
=============================================
Received on Thursday, 22 March 2007 14:51:19 UTC