where is identity of schema documents/components defined?

XML Schema Part 1: Structures, at the end of section 4.2.1, says:

   Note: The above is carefully worded so that multiple <include>ing
   of the same schema document will not constitute a violation of
   clause 2 of Schema Properties Correct (§3.15.6), but applications
   are allowed, indeed encouraged, to avoid <include>ing the same
   schema document more than once to forestall the necessity of
   establishing identity component by component.

What definition of "same document" does XML Schema use?

For example, given these two schema documents:

   at http://somewhere/A.xsd:

   <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" >
     <xsd:include schemaLocation="B.xsd" />
     <xsd:include schemaLocation="./B.xsd" />
   </xsd:schema>

   at http://somewhere/B.xsd:

   <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" >
     <xsd:element name="e"/>
   </xsd:schema>

how exactly do you decide whether there is a duplicate declaration of
element "e" or not?

Is a schema parser/checker/processor supposed to:
- recognize that the URI references "B.xsd" and "./B.xsd" both resolve
   to the same (non-relative) URI (http://somewhere/B.xsd),
- consider the second include element to refer to the same document
   that was already included by the first include element (and probably
   not read it again, or at least track that it is re-reading previously
   read declarations rather than reading new ones), and
- not consider there to be two declarations of element "e"?
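To make the three steps above concrete, here is a minimal Python sketch of that URI-keyed policy. It is only an illustration of what the question proposes, not behavior the XML Schema specification is known to mandate. The class `SchemaAssembler`, its `include` method, and the `element_names` argument (standing in for declarations already parsed out of the fetched document, so nothing is actually downloaded) are all hypothetical. It relies on Python's `urllib.parse.urljoin`, which since Python 3.5 follows RFC 3986 resolution, including removal of `./` segments.

```python
from urllib.parse import urljoin


class SchemaAssembler:
    """Hypothetical sketch: treat two includes as the same document when
    their schemaLocation values resolve to the same absolute URI."""

    def __init__(self):
        self.loaded = set()   # absolute URIs of schema documents already read
        self.elements = {}    # element name -> absolute URI that declared it

    def include(self, base_uri, schema_location, element_names):
        # Step 1: resolve the reference against the including document's URI.
        uri = urljoin(base_uri, schema_location)
        # Step 2: a document that was already read is not read again.
        if uri in self.loaded:
            return False
        self.loaded.add(uri)
        # Step 3: only declarations from a newly read document can clash.
        for name in element_names:
            if name in self.elements:
                raise ValueError(f"duplicate declaration of element {name!r}")
            self.elements[name] = uri
        return True


asm = SchemaAssembler()
asm.include("http://somewhere/A.xsd", "B.xsd", ["e"])    # reads B.xsd, records e
asm.include("http://somewhere/A.xsd", "./B.xsd", ["e"])  # same absolute URI, skipped
```

Because the check happens per document rather than per declaration, two `<element name="e"/>` declarations inside one newly read document would still be reported as duplicates under this policy.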

I know that the definition of sameness is a tricky issue in XML and the
web world (e.g., XML namespace names are compared before any
relative-reference resolution, but using relative references as
namespace names is deprecated).

Where is the line drawn in XML Schema for considering two references
to refer to the same document?

- Is it by URI references before URI resolution?  (I assume not.)

- Is it by (non-relative) URI after resolution of any relative
   references?  (I assume at least this much resolution.  A further
   question is whether the resulting URI strings are compared character
   by character or whether escape sequences are normalized first, and,
   if so, whether Unicode equivalences are also considered; I assume
   not.)

- Is it by identical content (e.g., at the level of characters or of
   the XML Infoset)?  For example, what if the include elements referred
   to "B.xsd" and "C.xsd", and reading http://.../C.xsd returned exactly
   the same HTTP response (body and even headers) as reading
   http://.../B.xsd?  (I assume not.)
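The first two comparison levels in the list above can be contrasted with a short Python sketch. The helper names `normalize_escapes` and `same_at_each_level` are hypothetical. The escape handling follows RFC 3986's syntax-based normalization (decode escapes of unreserved characters, upper-case the hex digits of the rest). It is applied after resolution as a simplification, so an escaped dot segment such as `%2E/` would not be removed. Content-based comparison (the third option) is left out because it requires fetching the documents, and no Unicode equivalence is attempted.

```python
import re
from urllib.parse import urljoin

BASE = "http://somewhere/A.xsd"
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def normalize_escapes(uri):
    """Decode %XX escapes of unreserved characters and upper-case the hex
    digits of all other escapes (RFC 3986, section 6.2.2)."""
    def fix(match):
        ch = chr(int(match.group(1), 16))
        return ch if ch in UNRESERVED else "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, uri)


def same_at_each_level(ref_a, ref_b, base=BASE):
    """Report whether two schemaLocation values match at each level."""
    resolved_a, resolved_b = urljoin(base, ref_a), urljoin(base, ref_b)
    return {
        "reference": ref_a == ref_b,
        "resolved": resolved_a == resolved_b,
        "escape-normalized": normalize_escapes(resolved_a)
        == normalize_escapes(resolved_b),
    }


print(same_at_each_level("B.xsd", "./B.xsd"))
print(same_at_each_level("B.xsd", "%42.xsd"))  # %42 is an escaped "B"
```

"B.xsd" and "./B.xsd" differ as raw references but agree once resolved; "B.xsd" and "%42.xsd" agree only after escape normalization. Which of these levels, if any, a schema processor must use is exactly what the question asks.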

Since the passage quoted at the top is only a note, and the phrase
"same document" apparently appears nowhere else in the specification,
the real question is probably not how that phrase is defined but this:
Where and how exactly does the XML Schema specification address this
issue?  If you encounter "<element name='e' />" and then encounter
"<element name='e' />" again, what determines whether you have
encountered the same thing a second time (not a duplicate and not an
error) or a second thing (a duplicate and an error)?

Thanks,
Daniel

Received on Thursday, 28 April 2005 15:07:22 UTC