Re: reference not terminated by REFC delimiter

On Sun, 11 Jan 2004, Bjoern Hoehrmann wrote:

>   a XHTML document containing <p>&amp</p> is not well-formed but the
> Validator even claims it is valid. Ok, it often does so, but for this
> particular case both the :80 and the :8001 emit a "reference not
> terminated by REFC delimiter" warning. It should not be too hard to emit
> an error instead for XHTML documents.

I wonder what the formal status of the requirement that a reference be
terminated by REFC really is. Certainly SGML-based HTML allows its
omission in many cases, like the above. There is no note of any change in
this respect in the informative section "4. Differences with HTML 4"
of the XHTML 1.0 specification. This seems to be a defect in the
specification, since the XML specification makes it clear that the
semicolon is required:
http://www.w3.org/TR/REC-xml#sec-references
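
For what it is worth, the XML side of this is easy to check with any
conforming XML parser. The following is only a minimal sketch, assuming
Python's standard xml.etree.ElementTree module (which wraps expat); the
helper name is_well_formed is mine, not part of any library:

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc: str) -> bool:
    """Return True if the XML parser accepts doc, False on a parse error."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        # Raised for any well-formedness violation, including a
        # reference that lacks its terminating semicolon.
        return False

print(is_well_formed("<p>&amp;</p>"))  # reference terminated by ";"
print(is_well_formed("<p>&amp</p>"))   # semicolon omitted
```

The first document is accepted and the second is rejected, which is
the behaviour the XML specification calls for.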

On the other hand, if we play the game that XML is a profile of SGML
(as kludged by the Web adaptations annex), then it seems that
<p>&amp</p> is valid (naturally assuming a suitable DTD).
Or can someone find a clause in the SGML standard that makes
REFC omissibility dependent on some expression in an SGML declaration
and a clause in the XML specification which uses such an expression
to make semicolons mandatory?

Actually, only now do I realize that such a game is a losing proposition.
All the talk about XML as a "subset" or "profile" of SGML had confused me.
XML and SGML are logically independent of each other (though naturally
related historically and conceptually). Whether an SGML validator can be
tuned to act as an XML validator depends on the nature of the differences
and on the implementation techniques of the validator. But since the REFC
issue is so common in practice, yet not properly handled in the W3C markup
validator, the question arises: has anyone studied systematically
what else there is in XML validation that needs to be handled differently
from SGML validation?
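
As an illustration of the kind of extra check I mean, references
lacking the semicolon could be flagged in a separate pass. This is a
naive sketch in Python, not a description of how the W3C validator
works; the names UNTERMINATED_REF and unterminated_references are
hypothetical, and the pattern ignores comments, CDATA sections and
non-ASCII name characters:

```python
import re

# "&name", "&#123" or "&#x1F" that is NOT followed by ";".
# The lookahead also rejects further name characters, so that the
# regex cannot match a mere prefix of a properly terminated reference.
UNTERMINATED_REF = re.compile(
    r"&(#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z_:][A-Za-z0-9_.:-]*)"
    r"(?![A-Za-z0-9_.:;-])"
)

def unterminated_references(text: str) -> list[str]:
    """Return every reference in text that lacks the closing semicolon."""
    return [m.group(0) for m in UNTERMINATED_REF.finditer(text)]

print(unterminated_references("<p>&amp</p>"))
print(unterminated_references("<p>&amp; &#38; &#x26;</p>"))
```

A validator running in XML mode could then report each hit as an error
rather than a warning.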

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/

Received on Monday, 12 January 2004 06:58:34 UTC