
XML vs. SGML and REFEND REFC (was: URI and escape mechanism)

From: Terje Bless <link@pobox.com>
Date: Thu, 11 Jul 2002 07:53:16 +0200
To: W3C Validator <www-validator@w3.org>
cc: Nick Kew <nick@webthing.com>, Karl Dubost <karl@w3.org>, Masayasu Ishikawa <mimasa@w3.org>
Message-ID: <r01050300-1015-7770404D949311D68A3000039300CF5C@[192.168.1.7]>

Karl Dubost <karl@w3.org> wrote:

>At 0:47 +0100 2002-07-11, Nick Kew wrote:
>>On Wed, 10 Jul 2002, Karl Dubost wrote:
>>>http://www.example.org/check?uri=http://www.example.net/path/to/yourfile.html&lang=en
>>>
>>>If we escape "&" only the validator will be fine... but the RFC seems
>>>to say you have to escape also the "/"
>>
>>The validator is validating markup.  The fact that "/" is reserved in
>>URI (perfectly legal, but reserved) has no bearing on the validity of
>>the HTML.
>
>Except that the validator on port 8001 do not react on a non-escape
>ampersand.

Oh this is just _perfect_! :-)   (bear with me for a moment)


Which non-escaped ampersand was that? I see no non-escaped ampersand in
that file. :-)

In fact, the only error I can find on that page -- well, a warning not an
error actually -- is "reference not terminated by REFC delimiter"[0]. You
didn't forget to escape the ampersand; what you did do was forget to
properly terminate your entity reference to the &lang; entity!

From REC-xhtml1-20000126/xhtml-symbol.ent:

<!ENTITY lang "&#9001;"> <!-- left-pointing angle bracket
                              = bra, U+2329 ISOtech -->

So as you can see, the Validator is doing exactly as it should.
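
For anyone who wants to spot this class of mistake before handing a page to
the parser, here is a rough sketch in Python. It is only an illustration and
not how SP detects the problem: the regular expression and the helper name
unterminated_refs are made up for this example, and it ignores numeric
character references, comments, and CDATA sections.

```python
import re

# "&" followed by a name, then an optional ";" (the REFC delimiter).
# The name pattern is a simplification of the real SGML/XML name rules.
REF = re.compile(r"&([A-Za-z_:][A-Za-z0-9._:-]*)(;?)")

def unterminated_refs(text):
    """Return the names of entity references that lack a closing ';'."""
    return [m.group(1) for m in REF.finditer(text) if m.group(2) != ";"]

url = ("http://www.example.org/check?uri="
       "http://www.example.net/path/to/yourfile.html&lang=en")
print(unterminated_refs(url))                              # the raw URL
print(unterminated_refs(url.replace("&", "&amp;")))        # the escaped URL
```

On the URL from this thread the first call reports "lang", which is exactly
the reference the parser sees; once the ampersand is written as "&amp;" the
list comes back empty.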


Now aren't you happy that NCSA came up with "&" as the CGI parameter
delimiter? (I'd make snide comments about the SGML "WG" and the XML WG's
eagerness to disown any SGML heritage, but I suspect they have their asses
well covered on this issue; cf. the messages from Martin Bryan, Tim Bray,
and Len Bullard[1]). :-)
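
Since "&" is both the CGI parameter delimiter and the entity reference open
delimiter, a link like the one above needs two separate layers of escaping:
URI escaping for the nested "uri" value, then markup escaping for the
attribute value. A minimal sketch in Python, assuming the example URLs from
this thread; the helper name build_check_href is hypothetical.

```python
from html import escape
from urllib.parse import urlencode

def build_check_href(target, lang):
    """Build an href value that is safe to paste into an HTML attribute."""
    # Layer 1: URI escaping. urlencode percent-encodes reserved characters
    # such as ":" and "/" inside each parameter value.
    query = urlencode({"uri": target, "lang": lang})
    url = "http://www.example.org/check?" + query
    # Layer 2: markup escaping. The "&" between parameters becomes "&amp;"
    # so the parser no longer sees an entity reference to "lang".
    return escape(url, quote=True)

print(build_check_href("http://www.example.net/path/to/yourfile.html", "en"))
```

The first layer is what the RFC is talking about; only the second layer has
any bearing on whether the markup validates.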


Any real SGML-heads reading this list? Mimasa? Can we "cheat" and make REFC
required to catch these mistakes that way? Anyone know the right magic
pixie dust to sprinkle on xml1.dcl to make it so? Can anyone see any
unintended consequences of doing that (IOW "will it break anything we don't
want broken")?

Is it sufficient to set "OPTIONS CONTENT REFEND REFC" in the SGML
Declaration[2]? Or does that just exclude "RE" from the list of possible
entity reference terminators?

Or is this a bug in SP's XML support? The (Informative) Annex L to ISO 8879
describes the XML requirement for a REFC as an additional requirement to
those already defined in Annex K (WebSGML). Perhaps the right solution is
to hack up SP to promote this "-wrefc" (implicit in -wxml) to an error
instead of a warning? If so, I'd like some kind of authoritative reference
to cite in the CVS checkin message / SF bug entry. :-)



[0] - "REFC" being defined as ";" in the SGML Declaration for XML.

[1] - <http://lists.w3.org/Archives/Public/w3c-sgml-wg/1996Sep/0042.html>

[2] - "Annex L: Application Requirements for XML"
      "ISO 8879//NOTATION Application Requirements for XML//EN"
      <http://www.y12.doe.gov/sgml/wg8/document/97alex/wsgml-L1.htm>
-- 
We've gotten to a point where a human-readable,  human-editable text format
for structured data has become a complex nightmare where somebody can safely
say "As many threads on xml-dev have shown, text-based processing of XML is
hazardous at best" and be perfectly valid in saying it.     -- Tom Bradford
Received on Thursday, 11 July 2002 02:00:35 GMT
