RE: Closing XML Protocol Last Call issue 395

From: <noah_mendelsohn@us.ibm.com>
Date: Thu, 5 Dec 2002 18:16:24 -0500
To: "Martin Gudgin" <mgudgin@microsoft.com>
Cc: henrikn@microsoft.com, xml-dist-app@w3.org
Message-ID: <OFA4D85264.8FC3A92E-ON85256C86.007F35F7@lotus.com>

Right, that's the question.  The quote from section 2.8 is:

"If the XML document has a document type declaration, then the information 
set contains a single document type declaration information item. Note 
that entities and notations are provided as properties of the document 
information item, not the document type declaration information item.

A document type declaration information item has the following properties:

* [system identifier] The system identifier of the external subset, as it 
appears in the DOCTYPE declaration, without any additional URI escaping 
applied by the processor. If there is no external subset this property has 
no value. 
* [public identifier] The public identifier of the external subset, 
normalized as described in 4.2.2 External Entities [XML]. If there is no 
external subset or if it has no public identifier, this property has no 
value. 
* [children] An ordered list of processing instruction information items 
representing processing instructions appearing in the DTD, in the original 
document order. Items from the internal DTD subset appear before those in 
the external subset. 
* [parent] The document information item."

So, in the case of an internal subset that defines entities, and nothing 
else, we would get a mandatory document type declaration info item, but 
with no values for any properties but [parent]? 

Seems a bit strange to me, but I agree it could be read that way.  If so, 
I suppose there is no issue.  I think what's making me nervous is that I 
can't find an info item anywhere for parsed entities.  That's what leads 
me to feel that you can't quite tell from the Infoset whether they are 
there or not, and therefore whether a serialization might not include them 
after all.   I read you to say:  right, you can't tell much about the 
entities, element declarations, etc., but the absence of any document type 
declaration info item does let you infer that there were none.  As I say, 
this seems strange, since the whole drift of the infoset design seems to 
be to not tell you whether they were there.  On the other hand, if 
everyone reads it that way, I suppose I can go along. 
On balance, I would prefer the clarification, if only in a note.  If we've 
had to do this level of reasoning to prove that <!DOCTYPE > can't go in 
the serialization, I fear that others may not see it that way either. 

Related question that I raised before:  I think we're all agreed that we 
intend receipt of <!DOCTYPE to result in an env:SENDER error.  Where do we 
say whether that error is a MAY/MUST/SHOULD?  I think it should be a MUST 
fault, as the message received is incoherent and known to be buggy.  Where 
do we indicate that the error MUST be generated?  Thanks.
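
As a rough illustration of the receiver behaviour being asked for, here is a minimal sketch in Python. It assumes a receiver built on the standard library's expat bindings; the function name fault_code_for and the sample messages are hypothetical, and the fault code is written as env:Sender, the SOAP 1.2 spelling of the env:SENDER error named above. It only detects the DOCTYPE; whether reporting it is a MUST is exactly the open question.

```python
import xml.parsers.expat

# Assumption: SOAP 1.2 spells the fault code "env:Sender".
SENDER_FAULT = "env:Sender"


def fault_code_for(message: bytes):
    """Return SENDER_FAULT if the message carries a <!DOCTYPE ...>, else None.

    Hypothetical helper: a real receiver would build a full SOAP fault.
    """
    seen = []
    parser = xml.parsers.expat.ParserCreate()

    # expat calls this handler once when it meets a document type
    # declaration, with or without an internal subset.
    def on_doctype(name, system_id, public_id, has_internal_subset):
        seen.append(name)

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(message, True)
    return SENDER_FAULT if seen else None


with_dtd = b'<!DOCTYPE Envelope [ <!ENTITY e "x"> ]><Envelope>&e;</Envelope>'
without_dtd = b'<Envelope>x</Envelope>'

result_with_dtd = fault_code_for(with_dtd)
result_without_dtd = fault_code_for(without_dtd)
```

Note that the check works on the lexical form, before any Infoset is built, which is why it does not depend on how section 2.8 is read.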

Noah Mendelsohn                              Voice: 1-617-693-4036
IBM Corporation                                Fax: 1-617-693-8676
One Rogers Street
Cambridge, MA 02142

"Martin Gudgin" <mgudgin@microsoft.com>
12/05/02 05:07 PM

        To:     <noah_mendelsohn@us.ibm.com>
        cc:     "Henrik Frystyk Nielsen" <henrikn@microsoft.com>, <xml-dist-app@w3.org>
        Subject:        RE: Closing XML Protocol Last Call issue 395

So the question really is this:

Does <!DOCTYPE soap:Envelope [ <!-- entity decls here --> ]> ( i.e.
JUST an internal subset ) result in a Document Type Declaration
Information Item appearing at the infoset level? My reading of Section
2.8 of the infoset spec says Yes.
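
One way to probe the question empirically is to see what a concrete parser reports for such a document. The sketch below uses Python's xml.dom.minidom; the sample document is hypothetical, and the DOM is only an analogue of the Infoset, so this shows one implementation's behaviour rather than a normative reading of section 2.8.

```python
from xml.dom import minidom

# Hypothetical message: a DOCTYPE with JUST an internal subset that
# declares one parsed entity, which the body then references.
DOC = (
    '<!DOCTYPE Envelope [ <!ENTITY greeting "hello"> ]>'
    '<Envelope><Body>&greeting;</Body></Envelope>'
)

dom = minidom.parseString(DOC)

# The DOM counterpart of the document type declaration information item.
doctype = dom.doctype
has_doctype = doctype is not None

# No external subset, so there is no system or public identifier.
system_id = doctype.systemId
public_id = doctype.publicId

# The parsed entity has been expanded; the reference itself is gone.
body = dom.documentElement.firstChild
body_text = body.firstChild.data

print(has_doctype, system_id, public_id, body_text)
```

Here the doctype node is present even though neither identifier has a value, while the entity reference has been replaced by its text.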


> -----Original Message-----
> From: noah_mendelsohn@us.ibm.com [mailto:noah_mendelsohn@us.ibm.com] 
> Sent: 05 December 2002 11:44
> To: Martin Gudgin
> Cc: Henrik Frystyk Nielsen; xml-dist-app@w3.org
> Subject: RE: Closing XML Protocol Last Call issue 395
> Gudge writes:
> > I'm not sure why this issue revolves around the
> > internal subset. We explicitly prohibit the Document
> > Type Declaration Information Item from appearing.
> So far, so good.  We agree.
> >> If there is no DTD then there is no internal
> >> or external subset. 
> Let's be a little careful.  Our infosets are synthetic.  They come before 
> the lexical form is even considered.  Clearly we disallow the info item. 
> What this means for any possible serialization in any possible binding is 
> unclear.
> > Lexically one cannot have <!DOCTYPE ... in a SOAP message.
> Now we're talking about something binding specific.  Assume we're talking 
> about >the< SOAP HTTP binding. 
> >> The only parts of the DTD that are reflected
> >> in the infoset are unparsed entities, notations 
> >> and PIs appearing in the DTD.
> Right, so if I had a lexical form with an internal subset declaring a 
> parsed entity, then that would not show up in the Infoset when I parsed 
> the document.  I couldn't tell that there had been an internal or external 
> subset.
> Now, go the other way.  We say in the HTTP binding that we want 
> (indirectly through RFC 3023) the XML 1.x serialization of the infoset. 
> But if what I say in the para above is right (and I'm not sure about it), 
> that's ambiguous.  There are lexical forms with an internal subset that 
> correspond to the Infoset that has no DTD information item.  That is the 
> source of my concern.  If there is even a hint of this ambiguity, I think 
> our binding (or the RFC if appropriate) needs to say explicitly: 
> "<!DOCTYPE ... > MUST NOT appear."
> I feel like I may be confused, but in the meantime, I remain concerned 
> that there is an ambiguity.  If someone sent an instance with an internal 
> subset, but that parsed into an Infoset with no Doctype Info Item, I'm not 
> sure where I'd point in the spec to say "you broke the rules."  What am I 
> missing?  Thanks.
> ------------------------------------------------------------------
> Noah Mendelsohn                              Voice: 1-617-693-4036
> IBM Corporation                                Fax: 1-617-693-8676
> One Rogers Street
> Cambridge, MA 02142
> ------------------------------------------------------------------
Received on Thursday, 5 December 2002 18:18:35 UTC
