- From: Larry Masinter <LMM@acm.org>
- Date: Tue, 10 Dec 2002 14:32:04 -0800
- To: <www-tag@w3.org>
- Cc: <ietf-xml-use@imc.org>
In reply to: http://lists.w3.org/Archives/Public/www-tag/2002Dec/0119.html

Since we spent some time on this topic when discussing the "XML Guidelines" document (http://www.imc.org/ietf-xml-use), I thought I would respond to the points in the XMLP WG response. I am not opposed to the XMLP WG disallowing an "Internal Subset" in SOAP messages; I just think it would be good to be clearer about the justification. Also, if these are good syntactic restrictions for XMLP, then they are likely to be good syntactic restrictions for other XML applications in protocols.

Our concern wasn't that "there should be no subsets" so much as that "there should not be widely varying subsets". So, if XMLP has good reason for syntactic restrictions, those restrictions should be documented separately, so that a separate class of XML processors that generate and consume the restricted class of XML documents could be more widely supported, not just in SOAP processors. I understand that the XMLP constraints only apply to the HTTP binding of SOAP, and that it is possible to define different bindings with different properties.

> ....... Doing general entity substitution beyond that mandated by
> XML 1.0 (e.g. &lt;) implies a degree of buffer management, often
> data copying, etc. which can be a noticeable burden when going for
> truly high performance. This performance effect has been reported
> by workgroup members who are building high performance SOAP
> implementations.

We couldn't find a first-hand account of such performance effects in implementations that allow entity substitutions but where no entity substitutions are actually used; if you have one, it would be very helpful if you could share it. It was easy to imagine that there might be issues of code footprint, but not cases where there was actually a performance impact when entity definitions weren't actually used. Were there details with any of these reports of 'performance effects'?

> Furthermore, a DTD in the Infoset would become another piece of the
> message. We would have questions to answer: what are the rules for
> relaying through an intermediary?

It would be useful to define XMLP in terms of the 'canonical Infoset': the Infoset of the RFC 3076 Canonical XML of the document. In particular, Canonical XML has all entities expanded and DTDs removed.

> what are the rules for relaying through an intermediary? If something
> comes into an intermediary as an entity reference, must it go out as
> an entity reference? If that header is removed by the intermediary,
> must one check whether it is the last use of the entity and should
> the outbound DTD have the definition removed?

If intermediary processing is defined in terms of a canonical Infoset and not the concrete syntax, then there is no requirement that entity references be preserved. On the other hand, intermediaries might even introduce Internal Subset DTDs, e.g., when forwarding messages.

> What does all this do to digital signatures? If we allowed an
> internal subset, should we change our rules to allow attributes to be
> defaulted?

Digital signatures are defined in terms of Canonical XML, which has all entities resolved and DTDs removed.

> All of this is complication.

It would seem less complicated to say "XML" than "XML except no Internal Subset DTDs". The complexity is in making exceptions.
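To make the 'canonical Infoset' suggestion above concrete, here is a small sketch. The element, attribute, and entity names are invented for illustration, and the whitespace and ordering details of RFC 3076 are glossed over; the point is only what happens to the internal subset.

    <!-- A message as sent, with an internal subset that defines a
         general entity and a defaulted attribute. -->
    <!DOCTYPE msg [
      <!ENTITY corp "Example Corporation">
      <!ATTLIST item priority CDATA "normal">
    ]>
    <msg>
      <item>Order for &corp;</item>
    </msg>

    <!-- Roughly its Canonical XML form: the DTD is gone, the entity
         reference has been expanded, and the defaulted attribute has
         been made explicit. -->
    <msg>
      <item priority="normal">Order for Example Corporation</item>
    </msg>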
> Security is another concern. Although we have not formally
> demonstrated that XML with internal subset is less secure, several
> members of the workgroup shared an intuition that entity
> substitution, attribute defaulting, and other manipulation of the
> message content was more likely to lead to security exposures,
> denial of service attacks (e.g. the billion laughs entity attack),
> etc.

Any message from any unauthenticated source introduces the potential for a denial of service attack, merely from the possibilities of overly long URI paths, element names, attribute values, content, etc. When parsing any message from an unauthenticated source, it's necessary to ensure that parsing the message doesn't consume undue resources in the receiver. The parsing and substitution of entity definitions is just one of many such considerations. It isn't much more work to ensure that there aren't a billion laughs than it is to ensure that a URL with a billion hahas isn't being consumed.

> Our reasons for disallowing reference to external DTDs were similar
> to those given above for the internal subset. In addition, we felt
> that it would not in general be appropriate to require a SOAP
> processor to open a connection to the Web in order to retrieve
> external DTDs.

I think external DTDs are a different consideration.

I also wonder about the justification for forbidding processing instructions. Yes, they are ambiguous, because their scope is not defined. But it is the nature of processing instructions to have private semantics, and the ambiguity of scope is only one of many difficulties. Wouldn't it be sufficient to note that processing instructions are unreliable, should be ignored by receivers, may appear in XML messages, and should not be sent? By themselves they are harmless.

> Again, we kept it simple by ruling them out.

The philosophy of 'everything that is not mandatory is forbidden' doesn't actually make for robust protocol design or simplify it. By making an exception to standard XML, you make things more complicated. What is the complexity cost of receivers ignoring processing instructions vs. explicitly checking for them and disallowing them?
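For concreteness, the construct at issue is just a node like the one in the sketch below (the PI target and the surrounding element are invented for illustration). A receiver that works from the Infoset can skip the processing instruction and treat the message exactly as if it weren't there; rejecting the message requires an extra check.

    <msg>
      <?example-trace keep="false"?>
      <item>Order for Example Corporation</item>
    </msg>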
Received on Tuesday, 10 December 2002 17:32:35 UTC