W3C home > Mailing lists > Public > w3c-dist-auth@w3.org > October to December 2003

RE: How to use DTDs, or not to (was: RE: ACL and lockdiscovery)

From: Dennis E. Hamilton <dennis.hamilton@acm.org>
Date: Fri, 10 Oct 2003 13:45:13 -0700
To: "Julian Reschke" <julian.reschke@gmx.de>, "Lisa Dusseault" <lisa@xythos.com>, "'Geoffrey M Clemm'" <geoffrey.clemm@us.ibm.com>, <w3c-dist-auth@w3.org>
Message-ID: <FFEPLLNFAHGBKNENFGPAGECLDDAA.dennis.hamilton@acm.org>

Ah well, I have learned just enough to be dangerous.

Julian, thanks for the response.  

I have continued to look at this since I wrote yesterday's note.  Here are responses to your points, followed by a brain dump about reconciliation of DAV and XML:

1.	Yes, I forgot that DAV: is also the URI that is spoken of as the "DAV namespace." 

1.1	I find it better to refer to the "DAV namespace" or "DAV namespace URI" rather than the "DAV: namespace" especially because namespaces are often versioned (rather than revised [;<), so the identification in XML will be with different URIs over time.  Ditto for the lock-token namespace.

1.2	In the context of this discussion, the DAV namespace is not a barrier to having a DTD.  It is prefixes (that is QNames) that must be dealt with carefully when a DTD is used.  However, this is completely possible, as the XML Schema folk demonstrated.

2.	You have your finger on a critical and important point.  

2.1	It is not possible to create a generic DTD that can be used for validation of any DAV XML 1.0 "document".  That's specifically because of the RDF hack and the ability to ad lib property QNames.  (I call anything where there is an application technique that maps QNames to URIs as the RDF hack, since RDF seems to have been the first to do it. This creates a variety of problems, some of which are noted in a recent review of the latest RDF working documents.  See <http://lists.w3.org/Archives/Public/www-rdf-comments/2003OctDec/0017.html>.  I don't want to go deeper into this, though it has become an interoperability concern because of a problem concerning when the special application knowledge has to be applied to handle an RDF (or any RDF hack) correctly. [Aside: There is now an approach to (lexical) datatypes in the latest RDF working documents that could well be adapted to DAV property values at some point.]

2.2	It is possible to create a DTD for some specific (family of) DAV XML 1.0 "document".  It might be a loose DTD, in that there are more XML 1.0 documents that satisfy it than are acceptable as DAV XML "documents."  

2.3	It might not be possible to express the content model of <dav:prop> for other reasons, at least as an accurately-constrained grammar.  XML is rather limited in the power of language that can be defined by markup declarations.  Although it looks equivalent to BNF, it isn't.  There are further constraints that limit XML markup vocabularies to ones that can always be handled by greedy, non-backtracking parsers.  The limitations on mixed content models reflect these required simplifications.  Nevertheless, (2.1) is compelling in the case of DAV application of XML 1.0.

2.4	It is important to keep case (2.1) and  case (2.2) distinct.  With regard to wanting valid XML documents, if that is indeed the desire, there is a broader question.  Namely,


3.	What's a DAV XML 1.0 "document"?

3.1	I have been using quotes because there is the notion of an XML 1.0 document, and then there are notions of applications where XML, including XML documents, may occur.  

3.2	What I don't find in WebDAV (I haven't looked at 2518bis) is any clear specification of the relationship of DAV request bodies, DAV response bodies, and, for that matter, DAV documents-as-content, as an application of XML 1.0.

3.3	In the examples in the DAV specification, an XML declaration (<?xml version="1.0" ...?>) is always present.  Although not required by the XML 1.0 Specification, this is tantamount to a declaration that an XML 1.0 document is present (or something equivalent to an XML external entity).  This is a strong statement with regard to what one can count on:
	3.3.1	The only entity declarations that may be presumed by default 
	3.3.2 The well-formedness condition (and existence of a single, though un-named, root element)
	3.3.3 The reservation of attributes and elements with names of the form xml... (including all capitalizations) as exclusively defined by the XML specification or other approved specifications by the XML authority (whoever that might be) -- that is, use of xmlns in all of its variants, use of xml:lang, xml:base the one for whitespace control, etc.
	3.3.4 The definition of what a Document Type Declaration (the <!DOCTYPE ...> structure) is and the rules for markup declarations (for element types, attribute types, entity references, ...).

3.4	An example of (3.2) is the fact that it is not clear that the XML declaration is required for DAV XML documents and whether a Document Type Declaration is forbidden (as it is for SOAP, if I recall correctly).  I can't find where the DAV specification settles the matter, so I don't know if a DAV processor is expected to deal with it if it is there, and whether or not validation is expected or not.

3.5	Because DAV XML 1.0 documents are for an application of XML, and DAV processors are expected to deal with some specific things (such as the DAV version of the RDF hack), there are further considerations.

	3.5.1 These are out-of-band agreements that are not part of XML 1.0 compliance but that impact how the DAV use of XML is to be interpreted.  An important consideration is whether the application is presumed to follow the basic XML processing.  That is, the application could be viewed as fed by an XML processor that is designed to operate in an application-ignorant manner.  

	3.5.2 There is some application context of a transport nature that applies even to the XML 1.0 embedding in HTTP header and response bodies.  This is a problem that SOAP has addressed (I am not sure "dealt with" holds, though) and the SOAP HTTP binding and the WS-I Base Profile 1.0a are worthy of review with regard to coherence between HTTP, MIME types, and DAV XML 1.0 bodies.  (I keep thinking there is a mistake in the WS-I conclusion about byte-order marks, but it is more valuable to have a standard that can be consistently followed than be right about that.  There is a miss-reading of the XML 1.0 specification about this, and it has been perpetuated in the folklore around SOAP.)  

	3.5.3 Side Note: There is no prohibition on unusual encodings for XML documents.  The DAV specification goes too far in presuming there is a limitation.  There is a requirement that a processor always support utf-8 and utf-16.  In the absence of an encoding declaration [in the XML declaration or from context] it must be one or the other.  Note that the default encoding for MIME content type text/xml is ASCII. Note further that encoding="ASCII" does not entail encoding="utf-8".  And all of the DAV specification examples do match up encodings properly.    Whatever the encoding methodology, XML 1.0 documents are always taken to be expressed in Unicode:  That is, the abstraction of the character stream out of the medium is always Unicode.  This means that encoding characters that have no counterpart in (the XML-specified subset of) Unicode may not be used.  This is important with regard to XML 1.0 processors which should probably only see Unicode, on the way in and on the way out.  It would seem that there is a minimum subset of Unicode that any encoding must have corresponding character codes for in order to carry arbitrary XML 1.0 documents at all.

	3.5.4 Because DAV XML 1.0 is an application of XML, it is wise to consider that all XML well-formedness and any DTD validation (if invited) or non-validation (even with a Document Type Declaration present) rules will have been carried out first, before the XML stream is delivered for further application employment (e.g, via a validating DOM processor).  And there may be a desire to carry DAV XML 1.0 in some neutral way as pure XML 1.0 documents.  So there is need for care here.

	3.5.5 I am unclear on its status, but there is a proposal (at least an I-D) to use MIME-type designations like application/xml+DAV as a way to signify that there is an application-processing agreement applicable to the XML being carried in the medium.  This, by the way, would be useful as a type for DAV XML 1.0 documents as content, as well as an useful signal in content-type headers.  (Something for the DAV job jar.)

3.6	Coming back to the specific case of DTD, 

	3.6.1	It might be more appropriate to specifically reject the use of Document Type Declarations in DAV XML 1.0 documents that are used in HTTP bodies and headers and certainly for a MIME type that asserts DAV application.  

	3.6.2 This deals with section 17.7 of the specification too, because there are therefore no external entities and a DAV processor does not have to accommodate such things (or any internal entities other than the standard default ones, &amp; &lt;, ..., plus the encoding notation for Unicode characters).  [You might want to review the XML 1.1 Working Draft in this regard too, just to see if there is anything to anticipate here.]

	3.6.3 Although a Document Type Declaration might be prohibited, it is still possible to describe what the validity is expected to be if a DAV XML 1.0 document were submitted to an XML validator with a DTD constructed according to some prescribed scheme.  There is a hack for this by treating the DAV XML 1.0 document, separated out into a file with its XML declaration, as an external entity of a tiny document that validates the DAV XML 1.0 document by inclusion.  I have used that to apply different XSL transformations to the same XML document without modifying the XML document to add XSL processing instructions.
	I know I saw this trick demonstrated in a W3C document, but I have been unsuccessful finding it again.  (I remember the trick, just not the source.)

OK, end of brain dump.


Relax.  My next class starts on October 16 and I will quiet down for another 8 weeks ...

-- Dennis

-----Original Message-----
From: Julian Reschke [mailto:julian.reschke@gmx.de]
Sent: Friday, October 10, 2003 02:44
To: dennis.hamilton@acm.org; Lisa Dusseault; 'Julian Reschke'; 'Geoffrey
M Clemm'; w3c-dist-auth@w3.org
Subject: RE: How to use DTDs, or not to (was: RE: ACL and lockdiscovery)

> From: w3c-dist-auth-request@w3.org
> [mailto:w3c-dist-auth-request@w3.org]On Behalf Of Dennis E. Hamilton
> Sent: Friday, October 10, 2003 12:01 AM
> To: Lisa Dusseault; 'Julian Reschke'; 'Geoffrey M Clemm';
> w3c-dist-auth@w3.org
> Subject: RE: How to use DTDs, or not to (was: RE: ACL and lockdiscovery)

Hi Dennis,

just a few line-comments.

> ...
> 	(a) the ANY case in an element markup declaration means any
> element that is declared in the DTD, not some undeclared or
> unspecified element.
> ...

Thanks for the reminder. I had forgotten about that. It basically means that
by definition you can't express the content model of <prop> using a DTD.

> ...
> I notice that "DAV:" is referred to as the namespace, when,
> according to the Namespace specification and other specifications
> since (such as XML Schema), it is technically a prefix.  That is,
> attribute
> 	xmlns:DAV="some-URI-for-the-WebDAV-namespace"
> establishes the "DAV:" prefix for QNames that refer to local
> names of the WebDAV namespace.
> ...

No, no. "DAV:" is indeed the namespace URI for WebDAV. There are some cases
where it's *also* used as prefix, but that's just a coincidence.

> ...

Regards, Julian
Received on Friday, 10 October 2003 16:45:49 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 2 June 2009 18:44:05 GMT