- From: Giovanni Campagna <scampa.giovanni@gmail.com>
- Date: Mon, 25 May 2009 17:19:50 +0200
I really like this proposal, because entities are not the only thing you can do with DTDs. You also get attribute tokenization and normalization, attribute defaulting, and content models.

In particular, people in this group often say that namespaces are difficult for authors to use. Given the appropriate DOCTYPE declaration (for example XHTML11 plus MathML 2 plus SVG11), namespaces and their attributes are no longer a problem for authors, because attribute defaulting can supply the xmlns declarations.

Secondly, attribute normalization at the language level should provide consistent processing for special attributes (id and class in XHTML10/11).

Further, content models could be used for warnings in the developer console (though XML schemas are probably better here) and surely could be used for better well-formedness error messages. E.g., an unclosed <img> tag would be reported immediately after the opening tag, and not at the location of the parent's close tag. (This only applies if the XML fragment is not well-formed.)

On the other side, we have legacy XML content and the fact that many pages refer directly to W3C DTDs. Luckily, the XML specification has a feature that lets the page indicate that external declarations are not needed: the "standalone" declaration.

- standalone=yes means that neither the external subset nor external entities are needed. Processing of the internal subset stops at the first unread (external) parameter entity. General entity references (other than amp, gt, lt, quot, apos and those declared in the internal subset) are a well-formedness error. This is the minimum required behaviour of a non-validating parser.
- standalone=no means that the document relies on external data and cannot be processed without it. All subsets must be read and processed (including attribute and element declarations) and all parameter entities resolved (either internal or external). External general entities referenced in the document are replaced with the appropriate content.
- no standalone declaration could mean "standalone=yes" (not conforming with XML), "standalone=no" (not backward compatible), or a third way, such that only internal entities and entities with a known public identifier are used. The DOCTYPE is processed if and only if it is a known entity and there are no unread parameter entities in the internal subset.

Entity retrieval is based on the public identifier, if that is known to the application, or on the system identifier if "standalone=no". Entities that cannot be retrieved (because of network errors or unsupported/malformed IRIs) are kept as EntityReference nodes in the DOM for general entities (this means that the ampersand followed by the entity name followed by a semicolon is rendered, as per XHTML1.0), and stop the processing of the DTD for parameter entities.

This proposal should solve a lot of problems (shown above), making it possible to uncover the full potential of XML1.0 while avoiding a DOS on w3.org and keeping existing content working.

Giovanni
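
As a rough illustration of the three standalone cases described above, the choice of where to load the external subset from could be sketched like this. It is only a sketch under stated assumptions: the function name external_subset_source, the KNOWN_PUBLIC_IDS catalog and the "catalog"/"network" labels are all hypothetical, and the rule about unread parameter entities in the internal subset is left out for brevity.

```python
from typing import Optional, Tuple

# Hypothetical catalog: public identifiers for which the application
# ships a local copy of the DTD, so no request to w3.org is needed.
KNOWN_PUBLIC_IDS = {
    "-//W3C//DTD XHTML 1.1//EN",
}


def external_subset_source(
    standalone: Optional[str],
    public_id: Optional[str],
    system_id: Optional[str],
) -> Optional[Tuple[str, str]]:
    """Decide where the external subset comes from, or None to skip it.

    standalone is "yes", "no", or None when the declaration is absent.
    """
    # standalone=yes: no external subset and no external entities are read.
    if standalone == "yes":
        return None
    # Declaration absent or "no": a known public identifier is served
    # from the local catalog.
    if public_id in KNOWN_PUBLIC_IDS:
        return ("catalog", public_id)
    # Only an explicit standalone=no falls back to the system identifier.
    if standalone == "no" and system_id:
        return ("network", system_id)
    # Declaration absent and unknown public identifier: skip the DTD.
    return None
```

With this policy, a legacy page that has no standalone declaration and points at a well-known W3C DTD is resolved locally, while a page with an unknown DTD triggers a network fetch only if it explicitly says standalone="no".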
Received on Monday, 25 May 2009 08:19:50 UTC