Scope of IDs from noah_mendelsohn@us.ibm.com on 2003-08-14 (public-xml-id@w3.org from August 2003)

From: <noah_mendelsohn@us.ibm.com>
Date: Thu, 14 Aug 2003 11:24:54 -0400
To: public-xml-id@w3.org
Cc: xml-dist-app@w3.org, lehors@us.ibm.com, w3c-xml-schema-ig@w3.org
Message-ID: <OFBFD80434.F39EC04E-ON85256D82.005154E6@lotus.com>

Congratulations on the publication of the ID Requirements working draft 
[1].  Although this comment relates in part to SOAP, it is my personal 
opinion and does not represent an official position of the XMLP WG (or my 
employer, for that matter).

My concern is that the working draft should clarify the scope of the ID 
mechanisms to be considered, and thus should make clear the use cases 
to be handled.

Traditional XML IDs [2] are effectively scoped to the document:  IDs must be 
unique to the document, and IDREFs or fragment identifiers "targeting" 
such an ID are basically used to pick out markup within the document.  The 
situation in XML Schema is somewhat complicated by the fact that Schema 
validation can in principle be attempted on any particular element 
information item [3], but schema does its best to reproduce the document 
scoping of XML [4].  In the common case where the root of the schema 
assessment is the root element of a document, the xsd:ID type has 
uniqueness constraints and reference scope similar to those of XML ID.
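
As a rough illustration of document scoping, the following Python sketch 
treats the whole document as a single ID scope.  It is not how an XML or 
Schema processor is implemented: a real processor learns which attributes 
are of type ID from a DTD or schema, whereas the attribute names "id" and 
"idref" and the function name check_document_ids are assumptions made 
only for this example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_document_ids(xml_text, id_attr="id", ref_attr="idref"):
    """Return (duplicate_ids, dangling_refs), using the document as the only scope.

    Assumption: attributes named id_attr / ref_attr play the roles of ID / IDREF.
    """
    root = ET.fromstring(xml_text)
    # Count every ID value anywhere in the document, root element included.
    ids = Counter(
        el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None
    )
    duplicates = sorted(value for value, count in ids.items() if count > 1)
    # A reference is dangling when no element in the document carries that ID.
    dangling = sorted(
        {
            el.get(ref_attr)
            for el in root.iter()
            if el.get(ref_attr) is not None and el.get(ref_attr) not in ids
        }
    )
    return duplicates, dangling


doc = '<doc><a id="A"/><b id="B"/><c id="A"/><r idref="B"/><s idref="Z"/></doc>'
print(check_document_ids(doc))  # (['A'], ['Z'])
```

Here the second "A" is a conflict no matter where in the document it 
appears, and "Z" fails to resolve because no element anywhere declares it.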

By contrast, and I think this is the source of some confusion, SOAP IDs 
[5] are not scoped to a whole document, at least not quite in the sense 
above.  SOAP IDs are an aspect of and scoped to the SOAP encoding, which 
must be "activated" by use of an "encodingStyle"  attribute.  Thus in the 
following document, only some of the ids and hrefs can be used (comments 
in the document show which are OK).

<soap:Envelope>

  <!-- NOT OK:  no IDs on Envelope Markup -->
  <soap:Header enc:id="E">
  </soap:Header>

  <soap:Body>
   <myelement1 soap:encodingStyle="http://www.w3.org/2003/05/soap-encoding">
     <!-- OK, in the scope of encoding style--> 
     <a enc:id="A"/>
     <!-- OK --> 
     <q enc:href="A"/>
   </myelement1>

   <myelement2>
     <!-- NOT OK: not in scope of SOAP encoding --> 
     <x enc:id="x"/>
   </myelement2>
 
   <myelement3 soap:encodingStyle="http://www.w3.org/2003/05/soap-encoding">
     <!-- OK --> 
     <b enc:id="B"/>
   </myelement3>

   <myelement4 >
     <!-- NOT OK: not in scope of SOAP encoding --> 
     <r enc:href="A"/>
   </myelement4>

  </soap:Body>

</soap:Envelope>
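
The scoping rule illustrated by the comments above can be sketched in 
Python as follows.  This is only an approximation under stated 
assumptions: the namespace URIs bound to the "soap" and "enc" prefixes are 
assumed (the example does not declare them), the attribute names id and 
href are taken from the example as written, and classify is a 
hypothetical helper name.  The sketch reports only whether each attribute 
sits inside an active SOAP encoding scope; it does not try to resolve 
references.

```python
import xml.etree.ElementTree as ET

# Assumed namespace bindings for the "soap" and "enc" prefixes.
SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"
SOAP_ENC = "http://www.w3.org/2003/05/soap-encoding"
STYLE = f"{{{SOAP_ENV}}}encodingStyle"
MARKS = (f"{{{SOAP_ENC}}}id", f"{{{SOAP_ENC}}}href")


def classify(element, style=None, out=None):
    """Collect (local tag name, attribute value, in_scope) for each enc:id / enc:href."""
    out = [] if out is None else out
    # The nearest encodingStyle on the element itself or an ancestor applies.
    style = element.get(STYLE, style)
    for name in MARKS:
        if element.get(name) is not None:
            local = element.tag.split("}")[-1]
            out.append((local, element.get(name), style == SOAP_ENC))
    for child in element:
        classify(child, style, out)
    return out


message = f"""
<soap:Envelope xmlns:soap="{SOAP_ENV}" xmlns:enc="{SOAP_ENC}">
  <soap:Header enc:id="E"/>
  <soap:Body>
    <myelement1 soap:encodingStyle="{SOAP_ENC}">
      <a enc:id="A"/>
      <q enc:href="A"/>
    </myelement1>
    <myelement2>
      <x enc:id="x"/>
    </myelement2>
  </soap:Body>
</soap:Envelope>
""".strip()

for tag, value, ok in classify(ET.fromstring(message)):
    print(tag, value, "OK" if ok else "NOT OK")
```

Run on this shortened version of the message, the sketch marks the 
attributes on Header and x as NOT OK and those on a and q as OK, matching 
the comments in the example.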

For the record, such tangled activation and deactivation of encoding would 
be very unlikely in practice.  Most users open the encoding scope once and 
do all their work within that.  Still, the key point is that the whole 
purpose of SOAP IDs is different from XML and Schema IDs, or at best it's 
a constrained use of the concept.  The purpose of SOAP IDs is to mark up a 
particular type of graph encoding, and the SOAP syntax says that such 
graph nodes may be defined only by markup within the scope of a suitable 
encoding style.  To see why this is important, consider a situation in 
which someone someday invents a second encoding style, perhaps for a 
slightly different sort of graph, and both are used in the same document:

<soap:Envelope>

  <soap:Header >
   <myheader soap:encodingStyle="http://www.w3.org/2003/05/soap-encoding">
     <!-- OK --> 
     <a enc:id="A"/>
     <!-- OK --> 
     <q enc:href="A"/>
   </myheader>
  </soap:Header>

  <soap:Body>
   <myelement2 soap:encodingStyle="http://www.w3.org/2006/01/NEWENCODING2">
     <!-- OK --> 
     <a enc2:id="A"/>
     <!-- OK --> 
     <q enc2:href="A"/>
   </myelement2>
  </soap:Body>

</soap:Envelope>

Here we have two completely disjoint graphs, one in myheader encoded with 
today's encoding, another in the body encoded with the new encoding.  The 
specification for NEWENCODING2 has the freedom to define its own 
reference markup (enc2:id, enc2:href) and to decide that it represents an 
ID scope disjoint from the one defined by the current SOAP encoding. Thus, 
the fact that the body uses another id="A" might not be a conflict.  This 
is in fact very handy if SOAP messages are to be composed modularly;  it 
would be a nuisance to have to rewrite all the IDs in the body just to 
avoid conflict with the header. 
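
A small Python sketch may make the difference between per-encoding and 
per-document ID scopes concrete.  It rests on several assumptions that go 
beyond the example: each encodingStyle URI is treated as one ID scope, 
each encoding's id attribute is assumed to live in a namespace equal to 
that URI, the enc2 prefix is assumed to be bound to the hypothetical 
NEWENCODING2 URI, and ids_by_encoding is an invented helper name.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"  # assumed "soap" binding
ENC1 = "http://www.w3.org/2003/05/soap-encoding"      # assumed "enc" binding
ENC2 = "http://www.w3.org/2006/01/NEWENCODING2"       # hypothetical "enc2" binding
STYLE = f"{{{SOAP_ENV}}}encodingStyle"


def ids_by_encoding(element, style=None, scopes=None):
    """Map each encodingStyle URI to the IDs declared within its scope."""
    scopes = defaultdict(list) if scopes is None else scopes
    style = element.get(STYLE, style)  # nearest encodingStyle applies
    if style is not None:
        # Assumption: the id attribute is in the encoding's own namespace.
        value = element.get(f"{{{style}}}id")
        if value is not None:
            scopes[style].append(value)
    for child in element:
        ids_by_encoding(child, style, scopes)
    return scopes


def duplicates(values):
    """Return the sorted values that occur more than once."""
    return sorted({v for v in values if values.count(v) > 1})


message = f"""
<soap:Envelope xmlns:soap="{SOAP_ENV}" xmlns:enc="{ENC1}" xmlns:enc2="{ENC2}">
  <soap:Header>
    <myheader soap:encodingStyle="{ENC1}"><a enc:id="A"/><q enc:href="A"/></myheader>
  </soap:Header>
  <soap:Body>
    <myelement2 soap:encodingStyle="{ENC2}"><a enc2:id="A"/><q enc2:href="A"/></myelement2>
  </soap:Body>
</soap:Envelope>
""".strip()

scopes = ids_by_encoding(ET.fromstring(message))
per_scope = {uri: duplicates(ids) for uri, ids in scopes.items()}
whole_document = duplicates([v for ids in scopes.values() for v in ids])
print(per_scope)       # no duplicates inside either encoding scope
print(whole_document)  # ['A'] if the document were a single scope
```

Checked scope by scope, neither graph contains a duplicate; only when the 
whole document is treated as one scope does "A" appear to collide.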

Thus, SOAP IDs are really solving a different (though related) problem 
from XML IDs, which is why in my opinion the XMLP group adopted its own 
attributes.  I think it would be helpful if your requirements document 
acknowledged these sorts of use cases and made clear whether they are in 
or out of scope for your work.  My suggestion would be:  you should focus 
mainly on document scope and traditional uses of XML ID;  you should 
consider use cases like SOAP at least enough to convince yourself that 
separate mechanisms are indeed the right way to solve SOAP's problem.  If 
so, say so and keep working on the document level problem.  If you decide 
that a unifying mechanism is appropriate for all such use cases, then I 
think you have to take on the whole problem.  In either case, I think you should 
set a requirement or goal to clearly outline in any final recommendation 
or finding the intended range of uses for any mechanisms you might 
propose.

Thank you as always for the opportunity to comment.  I am taking the 
liberty of cross posting to both the schema and distApp mailing lists, but 
I suggest that further responses be sent only to public-xml-id to avoid a 
cross-posting mess. 

Noah

[1] http://www.w3.org/TR/2003/WD-xml-id-req-20030806/
[2] http://www.w3.org/TR/REC-xml#id
[3] http://www.w3.org/TR/xmlschema-1/#validation_outcome
[4] http://www.w3.org/TR/xmlschema-1/#cvc-id
[5] http://www.w3.org/TR/soap12-part2/#encodingedgesandnodes

------------------------------------------------------------------
Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142
------------------------------------------------------------------
Received on Thursday, 14 August 2003 11:28:14 UTC