- From: <noah_mendelsohn@us.ibm.com>
- Date: Sun, 24 Apr 2005 08:47:58 -0400
- To: Dan Connolly <connolly@w3.org>
- Cc: www-tag@w3.org
Dan Connolly wrote:

> Noah said he could see both sides...
> [[ NM: I think both views of this are right,
> there is a case to be said that the infoset way is
> architecturally better ...

Yes, because it allows our recommendations to apply to in-memory representations which have never been serialized, or to other cases in which you wish a non-XML serialization of the data.

> OTOH, Dan is right as
> well and we need to provide [details missing] ]]
> -- http://www.w3.org/2001/tag/2005/04/19-minutes#item05

> I think the [details missing] was something about
> interoperability at the bytes-on-the-wire level.

Right, that's what I meant. Wire-level or file-format-level definitions are what's needed to achieve actual interoperation. I believe Mike Champion covered this well in his followup note. Such interoperation is critical to the success of the Web and of XML. That's exactly what's "threatened" by the promotion of Binary XML as an alternative serialization: existing implementations that work with XML today will fail to interoperate with this new form of XML.

Your discussion of XML Schema seems to miss the simplest sense in which XML Schema is Infoset-based: the instances to be validated are modeled as Infosets. Consider, for example, an XQuery-based system in which it might be asserted that certain XML fragments resulting from a query are to be schema-valid per some element declaration. Because the Schema Recommendation is Infoset-based, the validation can be performed against any in-memory representation that might be convenient. If the Schema Recommendation were written to validate only XML documents, then the implementation would either have to actually serialize the result to enable validation, or would have to ensure that it produced results that are the same as if serialization had been done.
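To make the in-memory point concrete, here is a rough sketch in Python. It is not a schema processor: the ElementDeclaration class, the is_valid function, and the http://example.org/ns namespace are all made up for illustration, and an ElementTree element stands in only loosely for an element information item. The toy understands a single type, integer, checked against the optional-sign-then-digits lexical form. The point is only that the element being checked is built directly in memory and is never serialized or parsed.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementDeclaration:
    """Hypothetical, drastically simplified stand-in for an element declaration."""
    name: str       # qualified name in ElementTree's "{namespace}local" form
    type_name: str  # this toy only understands "integer"


def is_valid(element: ET.Element, decl: ElementDeclaration) -> bool:
    """Check an in-memory element against the toy declaration.

    Nothing here reads or writes serialized XML; the check works on the
    tree structure alone.
    """
    if element.tag != decl.name:
        return False
    if decl.type_name != "integer":
        raise ValueError(f"unsupported type in this sketch: {decl.type_name}")
    if len(element) > 0:
        # A simple-typed element may not have element children.
        return False
    text = (element.text or "").strip(" \t\r\n")
    # Lexical form of an integer: optional sign followed by decimal digits.
    return re.fullmatch(r"[+-]?[0-9]+", text) is not None


# Assumed example namespace and element name.
COUNT = "{http://example.org/ns}count"
decl = ElementDeclaration(name=COUNT, type_name="integer")

# Built programmatically, as a query engine might build a result fragment.
good = ET.Element(COUNT)
good.text = "42"

bad = ET.Element(COUNT)
bad.text = "forty-two"

print(is_valid(good, decl))
print(is_valid(bad, decl))
```

A processor defined only over XML documents would have to write these elements out as text and parse them back (or prove equivalence to having done so) before it could say anything about them.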
In any case, my reference to Infosets was primarily with regard to the data to be validated, and only indirectly to the representations of schema documents themselves. Since you've also gone into the latter, here are a few additional comments on your note:

> I just happened to be looking at how URIs interact
> with XML specifications, and I discovered
> (rediscovered?) that XML Schema has conformance
> clauses at three levels:
>
> (1) the component level, where even the
> infoset representation of a
> schema is abstracted away

Yes, but I think it's worth using the terminology of the Recommendation, and then clarifying a few details. The Schema Recommendation establishes the following terminology:

schema: the information needed (in addition to the instance itself) to perform a validation. Note that a single schema integrates declarations for multiple namespaces, and for non-namespaced constructs, in a uniform and symmetrical way. Each validation uses one schema and one instance, regardless of the number of namespaces involved.

component: the schema is organized into components, mostly for reasons of clarity. Thus, there is a component for each element declaration, each type, etc. Like the schema as a whole, components are abstract. They tell you the information you need to perform a validation, not the form in which that information is to be stored or communicated. As an example, if you know the qualified name of an element and the fact that its type is xsd:integer (and a few other bits), you can validate the element as an integer. At this level, we don't constrain the manner in which you set down or communicate the name and type of the element.

schema document: an element information item with qualified element name <xsd:schema> (using usual namespace bindings).
We further state that in the common case where such an element is the root element of an XML document, and where that document is "on the web" (has been given a URI as opposed to, say, being offered directly through a Java InputStream), the media type should be "application/xml".

With that background, I can quibble a bit with the above. Saying "abstracted away" implies that there was in all cases an Infoset from which to abstract, but that is not the primary focus of this level of conformance. The primary focus of establishing a level of component-based conformance is to deal with the (less common) case where schema information has been directly synthesized using some form other than schema documents. Imagine a dynamic API along the lines of "createElementDeclaration"; as long as it lets you specify the element name, the type, and whatever else the component requires, our Recommendation applies. In such cases, the Infoset has not been abstracted away, because it never existed.

>
> (2) "conformance to the XML Representation
> of Schemas" which is actually at the
> infoset level

Right. This is actually about Schema Documents, and these are organized by target namespace. Note that, when inheritance across namespaces is involved, much of the information for a given component may be inherited from components not overtly declared in the document in hand. This is another sense in which the use of the phrase "abstracted away" is a bit misleading. The component constructed from the markup in a given schema document (infoset) may have information well beyond that found in the markup. That inherited information may have come from other schema documents, or from synthetic components (e.g. from the API postulated above).

> plus another that we didn't get into in the
> teleconference:
>
> (3) it has an explicit conformance clause for
> processors that aren't running on some
> disconnected LAN that has its own DNS root,
> but have access to to the captial-I
> Internet.
I'd have to check, but I don't think we said it quite that way (I'm in the car at the moment and can't easily get to the details). As I recall, we turn that logic upside down relative to your summary and deal first with cases where DNS is not an issue at all. We start with a general discussion of the case where there is a schema document infoset, and thus a corresponding XML 1.0 serialization as a schema document. We can deal with such documents regardless of whether they have ever been given a URI and have in that sense been on the Web at all. For example, if you built a Java-based system and just used ordinary Java filesystem I/O to access the XML streams, we would consider those conforming schema documents and the Recommendation would apply. Likewise for a relational database that stored such schema documents in tables, and named them with (non-URI-based) primary keys. Since processors have discretion to find such documents in a processor-specific manner, we don't have to say anything about how one processor or another chooses files to use for its schemas.

With that layer of conformance in hand, we add the Web on top. We say that there is a particular but very important case where the documents have been given URIs and are accessible through the mechanisms of the Web. In this case we call for use of media type application/xml, and for the usual mechanisms of the Web to be used for retrieval. We call this third level "fully conforming", in part to encourage its use.

I'm not sure I see where you are picking up a suggestion to use private DNS roots. There is the case where you have referenced schema documents in an instance or schema by URI using schemaLocations. I don't think we particularly encourage the use of private DNS roots, except insofar as we recognize that certain disconnected systems may wish to have rather specially managed proxy caches of schema documents.
So, it would be reasonable for a relational database to maintain in its store a set of {URI, schema-document} pairs to be used as caches for representations of the named documents. I don't >think< that implies a private DNS root. I believe that such systems can be considered fully conforming insofar as the caches are legitimate proxies for the schema document web resources.

> Interesting stuff.
> http://www.w3.org/TR/xmlschema-1/#concepts-conformance

Not "interesting" in the sense of the Confucian curse, I hope?

Noah

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Received on Sunday, 24 April 2005 12:48:08 UTC