- From: <noah_mendelsohn@us.ibm.com>
- Date: Sun, 24 Apr 2005 08:47:58 -0400
- To: Dan Connolly <connolly@w3.org>
- Cc: www-tag@w3.org
Dan Connolly wrote:
> Noah said he could see both sides...
> [[ NM: I think both views of this are right,
> there is a case to be said that the infoset way is
> architecturally better ...
Yes, because it allows our recommendations to apply to in-memory
representations which have never been serialized, or to other cases in
which you wish a non-XML serialization of the data.
> OTOH, Dan is right as
> well and we need to provide [details missing] ]]
> -- http://www.w3.org/2001/tag/2005/04/19-minutes#item05
> I think the [details missing] was something about
> interoperability at the bytes-on-the-wire level.
Right, that's what I meant. Wire-level or file-format-level definitions
are what's needed to achieve actual interoperation. I believe Mike
Champion covered this well in his followup note. Such interoperation is
critical to the success of the Web and of XML. That's exactly what's
"threatened" by the promotion of Binary XML as an alternative
serialization: existing implementations that work with XML today will
fail to interoperate with this new form of XML.
Your discussion of XML schema seems to miss the simplest sense in which
XML Schema is Infoset-based: the instances to be validated are modeled as
Infosets. Consider, for example, an XQuery-based system in which it
might be asserted that certain XML fragments resulting from a query are to
be schema-valid per some element declaration. Because the Schema
Recommendation is Infoset-based, the validation can be performed against
any in-memory representation that might be convenient. If the Schema
Recommendation were written to validate only XML documents, then the
implementation would either have to actually serialize the result to enable
validation, or would have to ensure that its implementation produced
results that are the same as if serialization had been done. In any
case, my reference to Infosets was primarily with regard to the data to be
validated, and only indirectly to the representations of schema documents
themselves.
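
To make the contrast concrete, here is a minimal Python sketch. It assumes a toy rule (a hypothetical is_valid_quantity check) standing in for real schema validation, and uses a findall call as a stand-in for a query result. The Infoset-style function works directly on the in-memory items; the document-only variant needs bytes, so the caller has to serialize first.

```python
import xml.etree.ElementTree as ET


def is_valid_quantity(element):
    """Hypothetical rule standing in for schema validation of one element:
    a childless <quantity> whose content is ASCII digits only."""
    text = (element.text or "").strip(" \t\r\n")
    return (
        element.tag == "quantity"
        and len(element) == 0
        and text.isascii()
        and text.isdigit()
    )


def validate_infoset(element):
    """Infoset-style entry point: accepts an in-memory element as-is."""
    return is_valid_quantity(element)


def validate_document(xml_bytes):
    """Document-only entry point: accepts serialized XML, so it must parse."""
    return validate_infoset(ET.fromstring(xml_bytes))


# Build a tree in memory; it is never written out as a document.
order = ET.Element("order")
ET.SubElement(order, "quantity").text = "3"
ET.SubElement(order, "quantity").text = "three"

results = order.findall("quantity")  # stand-in for a query result

# Validate the in-memory fragments directly.
direct = [validate_infoset(e) for e in results]

# With a document-only rule, each fragment is serialized and re-parsed first.
round_trip = [validate_document(ET.tostring(e)) for e in results]
```

Both lists hold the same verdicts; the second path simply pays for a serialization step that the Infoset-based formulation makes unnecessary.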
Since you've also gone into the latter, here are a few additional comments
on your note:
> I just happened to be looking at how URIs interact
> with XML specifications, and I discovered
> (rediscovered?) that XML Schema has conformance
> clauses at three levels:
>
> (1) the component level, where even the
> infoset representation of a
> schema is abstracted away
Yes, but I think it's worth using the terminology of the recommendation,
and then clarifying a few details. The schema Recommendation establishes
the following terminology:
schema: the information needed (in addition to the instance itself) to
perform a validation. Note that a single schema integrates declarations
for multiple namespaces, and for non-namespaced constructs in a uniform and
symmetrical way. Each validation uses one schema and one instance,
regardless of the number of namespaces involved.
component: the schema is organized into components, mostly for reasons of
clarity. Thus, there is a component for each element declaration, each
type, etc. Like the schema as a whole, components are abstract. They
tell you the information you need to perform a validation, not the form in
which that information is to be stored or communicated. As an example, if
you know the qualified name of an element and the fact that its type is
xsd:integer (and a few other bits), you can validate the element as an
integer. At this level, we don't constrain the manner in which you set
down or communicate the name and type of the element.
schema document: an element information item with qualified element name
<xsd:schema> (using usual namespace bindings). We further state that in
the common case where such an element is the root element of an XML
document, and where that document is "on the web" (has been given a URI as
opposed to, say, being offered directly through a Java InputStream), the
media type should be "application/xml".
With that background, I can quibble a bit with the above. Saying
"abstracted away" implies that there was in all cases an Infoset from
which to abstract, but that is not the primary focus of this level of
conformance. The primary focus of establishing a level of component-based
conformance is to deal with the (less common) case where schema
information has been directly synthesized using some form other than
schema documents. Imagine a dynamic API along the lines of
"createElementDeclaration"; as long as it lets you specify the element
name, the type, and whatever else the component requires, our
Recommendation applies. In such cases, the Infoset has not been
abstracted away, because it never existed.
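
A rough Python sketch of that situation follows. Every name in it (ElementDeclaration, create_element_declaration, validate) is hypothetical and is not an API defined by the Recommendation, and only a simplified lexical check for xsd:integer is modeled. The point is only that the component is synthesized directly, with no schema document behind it.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Simplified lexical form of xsd:integer: optional sign, then ASCII digits.
_INTEGER = re.compile(r"[+-]?[0-9]+")


@dataclass(frozen=True)
class ElementDeclaration:
    """Hypothetical stand-in for an element declaration component."""
    namespace: str
    local_name: str
    type_name: str  # expanded name, e.g. "{...XMLSchema}integer"


def create_element_declaration(namespace, local_name, type_name):
    """Hypothetical factory: synthesizes a component without any markup."""
    return ElementDeclaration(namespace, local_name, type_name)


def validate(element, decl):
    """Check an in-memory element against a declaration (integer type only)."""
    if decl.namespace:
        expected_tag = f"{{{decl.namespace}}}{decl.local_name}"
    else:
        expected_tag = decl.local_name
    if element.tag != expected_tag:
        return False
    if decl.type_name == f"{{{XSD_NS}}}integer":
        # Leading and trailing XML whitespace is ignored before the check.
        text = (element.text or "").strip(" \t\r\n")
        return len(element) == 0 and _INTEGER.fullmatch(text) is not None
    raise NotImplementedError(f"type not modeled in this sketch: {decl.type_name}")


# The declaration never existed as an <xsd:element> in a schema document.
decl = create_element_declaration(
    "http://example.org/ns", "count", f"{{{XSD_NS}}}integer"
)

elem = ET.Element("{http://example.org/ns}count")
elem.text = " 42 "
ok = validate(elem, decl)
```

Here the qualified name and the type are all the sketch needs to validate the element as an integer; how that information was set down or communicated is left open.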
>
> (2) "conformance to the XML Representation
> of Schemas" which is actually at the
> infoset level
Right. This is actually about Schema Documents, and these are organized
by target namespace. Note that, when inheritance across namespaces is
involved, much of the information for a given component may be inherited
from components not overtly declared in the document in hand. This is
another sense in which the use of the phrase "abstracted away" is a bit
misleading. The component constructed from the markup in a given schema
document (infoset) may have information well beyond that found in the
markup. That inherited information may have come from other schema
documents, or from synthetic components (e.g. from the API postulated
above).
> plus another that we didn't get into in the
> teleconference:
>
> (3) it has an explicit conformance clause for
> processors that aren't running on some
> disconnected LAN that has its own DNS root,
> but have access to the capital-I
> Internet.
I'd have to check, but I don't think we said it quite that way (I'm in the
car at the moment and can't easily get to the details). As I recall, we
turn that logic upside down relative to your summary and deal first with
cases where DNS is not an issue at all. We start with a general
discussion of the case where there is a schema document infoset, and thus
a corresponding XML 1.0 serialization as a schema document. We can deal
with such documents regardless of whether they have ever been given a URI
and have in that sense been on the Web at all. For example, if you built
a Java-based system and just used ordinary Java filesystem I/O to access
the XML streams, we would consider those conforming schema documents and
the recommendation would apply. Likewise for a relational database that
stored such schema documents in tables, and named them with
(non-URI-based) primary keys. Since processors have discretion to find
such documents in a processor-specific manner, we don't have to say anything
about how one processor or another chooses files to use for its schemas.
With that layer of conformance in hand, we add the web on top. We say
that there is a particular but very important case where the documents
have been given URIs and are accessible through the mechanisms of the Web.
In this case we call for use of media type application/xml, and for the
usual mechanisms of the Web to be used for retrieval. We call this third
level "fully conforming", in part to encourage its use.
I'm not sure I see where you are picking up a suggestion to use private
DNS roots. There is the case where you have referenced schema documents
in an instance or schema by URI using schemaLocations. I don't think we
particularly encourage the use of private DNS roots, except insofar as we
recognize that certain disconnected systems may wish to have rather
specially managed proxy caches of schema documents. So, it would be
reasonable for a relational database to maintain in its store a set of
{URI,schema-document} pairs to be used as caches for representations of
the named documents. I don't >think< that implies a private DNS root. I
believe that such systems can be considered fully conforming insofar as
the caches are legitimate proxies for the schema document web resources.
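
As an illustration only, such a store of {URI,schema-document} pairs might look like the Python sketch below. The class name, the offline_fetch helper, and the example.org URIs are all assumptions made for the example. Lookups are keyed by the original URI, so nothing about the arrangement requires a private DNS root.

```python
class SchemaDocumentCache:
    """Hypothetical store of {URI, schema-document} pairs."""

    def __init__(self, fetch):
        # fetch: callable taking a URI and returning the document bytes,
        # used only when the URI is not already in the store.
        self._fetch = fetch
        self._store = {}

    def put(self, uri, document):
        """Preload a representation of the resource named by uri."""
        self._store[uri] = document

    def resolve(self, uri):
        """Return the cached document, fetching it once if it is absent."""
        if uri not in self._store:
            self._store[uri] = self._fetch(uri)
        return self._store[uri]


def offline_fetch(uri):
    """Fallback for a disconnected system: there is nothing to retrieve."""
    raise LookupError(f"no cached schema document for {uri}")


cache = SchemaDocumentCache(fetch=offline_fetch)
cache.put(
    "http://example.org/schemas/order.xsd",
    b'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>',
)

# A schemaLocation that names the URI is satisfied from the local store.
doc = cache.resolve("http://example.org/schemas/order.xsd")
```

On a connected system the fetch callable could perform ordinary Web retrieval instead; the cache is a legitimate proxy only to the extent that its entries match what the URI would actually return.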
> Interesting stuff.
> http://www.w3.org/TR/xmlschema-1/#concepts-conformance
Not "interesting" in the sense of the Confucian curse, I hope?
Noah
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Received on Sunday, 24 April 2005 12:48:08 UTC