- From: Rick Jelliffe <ricko@allette.com.au>
- Date: Wed, 22 Oct 2003 19:51:09 +1000
- To: Chris Lilley <chris@w3.org>
- Cc: xml-editor@w3.org, W3c I18n Group <w3c-i18n-ig@w3.org>, w3c-xml-plenary@w3.org
Chris Lilley wrote:

> In my view, adding another XML conformance level below well formed is
> not an erratum. Its a major change to the language.

Actually, the class "not well-formed but processed as well-formed anyway" exists in the XML Spec, because not all WF errors need be reported by a non-validating processor. Only validating parsers report all WF errors.**

<GIST>
I have thought through the comments that Chris (and Noah and others) have raised, and gone through the spec again. I think that there is an alternative way to achieve the same effect that I (and others such as Martin) think is good (that this should be handled by XML APIs, transparently to application programmers), still limited to standard character names, but which also does not create an apparent new conformance level, nor change any definitions of WF and Valid, nor change the status of any document.

How to do this impossible thing? Instead of the previous proposal, just append something like the following paragraph after the first paragraph of XML 4.4.3 Included If Validating

http://www.w3.org/TR/REC-xml#include-if-valid

"Applications *may* replace any Unexpanded Entity Reference Information Items[1] which have no replacement text or system identifier defined with the value of the ISO/HTML 4 or the ISO/MathML standard character entities of the same name."

[1] http://www.w3.org/TR/xml-infoset/#infoitem.rse

and then the XML WG should make it clear (when releasing the rationale for this) that the appropriate place for this to occur is at the SAX processor rather than on an application-by-application basis.

Specifications for applications such as XSLT and XML Schema may also specify that Unexpanded Entity Reference Items may/must be replaced with default values before schema-processing, to give added impetus.

XML needs an erratum to make it clear that this is currently allowed, because people (including myself) have not been clear on it, and W3C applications might like to also put in similar errata.
The aim is to encourage this into SAX, where it belongs. There is already text in 4.4.3 to clarify what applications may do, but this important case has been left out and is currently causing confusion.
</GIST>

Currently people think they are banned from doing anything with an Unexpanded Entity Reference Information Item by Draconian considerations, whereas actually the XML Spec is silent. My mistake has been to conflate the "XML Processor" with "the thing that produces a SAX stream". We can keep the definition of XML the same, but get SAX parsers to expand default entities, with the justification "this is application behaviour, but implemented tightly with the parser".

Now expanding Unexpanded Entity Reference Information Items is definitely something that applications are allowed to do:

"Browsers, for example, when encountering an external parsed entity reference, might choose to provide a visual indication of the entity's presence and retrieve it for display only on demand."

http://www.w3.org/TR/REC-xml#include-if-valid

But this does not cover the case of what an application can do when there is no replacement text or SYSTEM or PUBLIC identifier for an Unexpanded Entity Reference Information Item.

So this is something that currently slips between the cracks of XML Infoset and the XML spec: the XML spec is not concerned with information but parsing, the infoset is not concerned with parsing. There is nothing in either spec that I can see that prevents an application from attempting to dereference Unexpanded Entity Reference Information Items using the standard entity sets.

So, actually, this is something that, I guess, XML Schema and XSLT and XQuery etc could all specify independently. Or it could be made part of some notional layer between XML processing and the infoset.
But I believe the simplest thing, and the thing that would make it available to the broadest range of XML users, would be to clarify that it is allowed so that SAX (in particular) can add the feature and we can all go back to our business. That XML APIs do not make this available currently shows that they have a mistaken view of what is required by an XML processor; mistakes in what is allowed by XML processors are grounds for an erratum.

I guess this approach also moves a bit towards Michael's suggestion, in that it says the answer lies in something *after* XML proper but somehow before Schema processing. In practice, I think it is better to encourage this into generic SAX processors, though if the schema spec also makes it a requirement of schema processing, that does no harm that I can see (because the substitutions can occur as a layer before other schema processing, and specs such as XSLT 1 or Schematron that may want the effect without heavyweight schema-processing can get it).

As for the concern that it would be bad if some documents that were non-WF become WF, I think the rewording deals with that.

There may also be some value in adopting a stripped down version of Richard's proposal, and reserving a special attribute such as @xmlEntityDefaulting="true" which allows a non-validating parser to perform this error recovery but makes other parsers barf (due to the attribute name starting with "xml"), but I don't think it is needed.

Background quotes from the XML Spec's Conformance section

http://www.w3.org/TR/REC-xml#sec-conformance

"The behavior of a validating XML processor is highly predictable; it must read every piece of a document and report all well-formedness and validity violations. Less is required of a non-validating processor; it need not read any part of the document other than the document entity."

and

"For maximum reliability in interoperating between different XML processors, applications which use non-validating processors should not rely on any behaviors not required of such processors. Applications which require facilities such as the use of default attributes or internal entities which are declared in external entities should use validating XML processors."

Cheers
Rick Jelliffe

** See http://www.w3.org/TR/REC-xml#sec-conformance

"Certain well-formedness errors, specifically those that require reading external entities, may not be detected by a non-validating processor. Examples include the constraints entitled Entity Declared, Parsed Entity, and No Recursion, as well as some of the cases described as forbidden in 4.4 XML Processor Treatment of Entities and References."
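
As a rough illustration of the SAX-level defaulting proposed above, here is a minimal Python sketch. It is not part of any spec or SAX distribution. The class name DefaultingHandler and its "unresolved" list are hypothetical. It assumes the parser reports an undeclared reference through the standard SAX skippedEntity callback (whether and when that happens depends on the parser), and it uses Python's html.entities.name2codepoint table, which covers only the HTML 4 names, not the ISO/MathML sets. The callbacks are invoked by hand here rather than by a parser.

```python
from html.entities import name2codepoint
from xml.sax.handler import ContentHandler


class DefaultingHandler(ContentHandler):
    """Hypothetical handler: replaces skipped entity references with the
    HTML 4 character of the same name, when one exists."""

    def __init__(self):
        super().__init__()
        self.text = []        # character data seen so far
        self.unresolved = []  # skipped names with no standard default

    def characters(self, content):
        self.text.append(content)

    def skippedEntity(self, name):
        # Look the name up in the HTML 4 entity table (assumption: this
        # table stands in for the "standard character entities").
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            # No default available: leave it to the application.
            self.unresolved.append(name)
        else:
            # Deliver the default as ordinary character data.
            self.characters(chr(codepoint))


# Simulate the events a SAX parser might deliver for "caf&eacute;&nosuchentity;".
handler = DefaultingHandler()
handler.characters("caf")
handler.skippedEntity("eacute")
handler.skippedEntity("nosuchentity")
result = "".join(handler.text)
```

Because the substitution happens in the handler, the application consuming the text never sees the unexpanded reference, and the definitions of well-formed and valid are untouched.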
Received on Wednesday, 22 October 2003 05:51:14 UTC