Re: XHTML character entity support

On Tue, Nov 3, 2009 at 8:43 AM, Henri Sivonen <hsivonen@iki.fi> wrote:
> On Nov 3, 2009, at 16:32, Simon Pieters wrote:
>
>>> Do you have a reference in the XML
>>> specification that provides support for your contention?
>>
>> "If the entity is external, and the processor is not attempting to
>> validate the XML document, the processor MAY, but need not, include the
>> entity's replacement text. If a non-validating processor does not include
>> the replacement text, it MUST inform the application that it recognized, but
>> did not read, the entity."
>
> And Opera then renders &, the entity name and ; in response to the XML
> Processor informing it about the skipped entity.
>
>> The point is then reiterated twice:
>>
>> "Note that non-validating processors are not obligated to read and process
>> entity declarations occurring in parameter entities or in the external
>> subset; for such documents, the rule that an entity must be declared is a
>> well-formedness constraint only if standalone='yes'."
>>
>> "Certain well-formedness errors, specifically those that require reading
>> external entities, may fail to be detected by a non-validating processor.
>> Examples include the constraints entitled Entity Declared, ..."
>
> What happens in Gecko is that the entity resolver feeds expat a zero-length
> stream. Hence, expat *thinks* it hasn't skipped any external entity.
> Therefore, it halts due to the "Entity Declared" WFC, since it can't claim
> that it has not processed the external entities.
>
> Both the XML Processor in Opera and the XML Processor in Gecko do the right
> thing per XML. The XML Processor in Gecko has been fooled into processing a
> zero-length stream. The XML Processor in Opera knows it has skipped an
> external entity.
>
> One might argue that Gecko's entity resolver is bogus, but the XML Processor
> isn't.
>
>> On Tue, 03 Nov 2009 15:02:04 +0100, Shelley Powers
>> <shelley.just@gmail.com> wrote:
>>
>>> Oops, again. Opera does generate an XML parsing failure when it comes
>>> across an undefined entity when using the XHTML5 doctype.
>>
>> There is no "XHTML5 doctype". Any or no doctype can be used in XHTML5 and
>> the spec does not give a preference.
>>
>> The rule only applies for entities that are declared in the external
>> subset.
>
> When no external entity is referenced, even an XML parser that skips
> external entities knows it has skipped none. Therefore, it has to report the
> reference to an undeclared entity and cannot appeal to having skipped an
> external entity.
>
> --
> Henri Sivonen
> hsivonen@iki.fi
> http://hsivonen.iki.fi/
>
>
>
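
For anyone who wants to poke at the two cases Henri describes, here is a
rough sketch using expat through Python's xml.parsers.expat bindings. The
sample documents, the "x.dtd" system identifier, and the classify() helper
are my own assumptions for illustration, not anything taken from Opera or
Gecko. No external entity resolver is installed, so the external subset is
never read.

```python
from xml.parsers import expat

# Hypothetical sample documents; "x.dtd" is never fetched.
WITH_DTD = '<!DOCTYPE html SYSTEM "x.dtd"><html>&nbsp;</html>'
WITHOUT_DTD = '<html>&nbsp;</html>'


def classify(doc):
    """Parse doc and report whether an undeclared entity was skipped
    or treated as a well-formedness error."""
    skipped = []
    parser = expat.ParserCreate()
    # Called when the parser recognizes an entity reference but has not
    # read a declaration for it (the "recognized, but did not read" case).
    parser.SkippedEntityHandler = lambda name, is_param: skipped.append(name)
    try:
        parser.Parse(doc, True)
    except expat.ExpatError as err:
        return ("error", expat.ErrorString(err.code))
    return ("skipped", skipped)


if __name__ == "__main__":
    print(classify(WITH_DTD))
    print(classify(WITHOUT_DTD))
```

As I understand it, the first document should come back as a skipped
entity, because the parser knows there is an external subset it did not
read, while the second should be a fatal error, because there is nothing
the parser could have skipped.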


I'm going to focus on the part of this discussion that is relevant to
the HTML WG: there are rules for how undefined entities are handled,
and these rules are defined in the XML specification. There may be
some issues of interpretation, but such issues are specific to the XML
spec, not the HTML5 spec.

As such, no further explanations or additional specifications are
necessary in HTML5.

Am I correct in this?

Shelley

Received on Tuesday, 3 November 2009 20:07:14 UTC