Re: Whitespace

Tim Bray writes:
 > At 06:55 PM 5/7/97 CDT, Michael Sperberg-McQueen wrote:
 > >On Wed, 7 May 1997 06:41:14 -0400 Peter Murray-Rust said:
 > >>This is doubtless not news to any of you, but it's a shock to me, that
 > >>WF documents and validated documents ***GIVE DIFFERENT OUTPUT***.  I
 > >>am sure that this will be a rich source of confusion.
 > >
 > >Yes, it will.  What may not be obvious is that despite that confusion,
 > >this behavior really was the best available at the time.
 > 
 > Actually, it's worse than Peter thinks.  There are at least three ways
 > in which DTD-less and DTD-ful processing can produce different 
 > results:
 > 
 >  1. White space in element content

That is easy to fix by selecting a single whitespace handling method
in the XML profile for SGML. `Keep-all-whitespace' is ugly, but
workable; a better rule is to simply ignore any newline directly
after a '>' or directly before a '<'. The important thing is that this
rule becomes part of the XML profile, and does not depend on the XML
document itself.
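
To make the second rule concrete, here is a rough sketch in Python. The
function name is made up for illustration, and the sketch works on the raw
text with regular expressions, so it assumes every '>' and '<' is a tag
delimiter; a real parser would have to leave comments, CDATA sections and
the like alone.

```python
import re

def strip_tag_adjacent_newlines(text: str) -> str:
    """Drop a single newline directly after '>' or directly before '<'.

    Hypothetical helper: it does not know about comments, CDATA
    sections or processing instructions, it only looks at characters.
    """
    # One line break (CRLF, LF or CR) immediately after a '>'.
    text = re.sub(r">(\r\n|\n|\r)", ">", text)
    # One line break immediately before a '<'.
    text = re.sub(r"(\r\n|\n|\r)<", "<", text)
    return text

# Line breaks next to tags vanish; the one inside the text stays.
sample = "<p>\nhello\nworld\n</p>"
result = strip_tag_adjacent_newlines(sample)  # "<p>hello\nworld</p>"
```

Note that the function never looks at a DTD: the outcome depends only on
the characters in the document, which is the point of putting the rule in
the profile.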

 >  2. Default attributes

The previous XML-lang draft had a handy macro <?xml default...?> that
I find very useful, especially in dealing with XML-link, where a lot
of elements have fixed attributes.

Without it, a document like this

    <!doctype foo "foo">
    <foo/>

with this DTD

    <!element foo any>
    <!attlist foo att (def) def>

is not valid (for some definition of "valid"), since the DTD says that
the "att" attribute cannot be #implied. Note that it could be omitted
if this were SGML-1986, but in XML it cannot.
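
For what it is worth, the difference Tim is pointing at can be sketched in
a few lines of Python. This is only a toy model: the DTD_DEFAULTS table and
the attributes_seen function are invented here to stand in for the attlist
above, and no real parser is involved.

```python
# Hypothetical table standing in for <!attlist foo att (def) def>.
DTD_DEFAULTS = {"foo": {"att": "def"}}

def attributes_seen(element, specified, defaults=None):
    """Return the attributes an application would see for one element.

    `specified` holds what is written in the start tag; `defaults` is
    the per-element default table, or None when no DTD was read.
    """
    merged = dict((defaults or {}).get(element, {}))
    merged.update(specified)  # explicitly written values win
    return merged

# <foo/> processed without and with the DTD:
without_dtd = attributes_seen("foo", {})              # {}
with_dtd = attributes_seen("foo", {}, DTD_DEFAULTS)   # {"att": "def"}
```

The same start tag yields two different attribute sets, depending only on
whether the DTD was read.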

 >  3. Attribute values that are space/case normalized only if you
 >     read the DTD and know they are NMTOKEN or ID or something.

This is another thing that will have to be added to the XML profile
for SGML: all attributes are always treated as CDATA and never
normalized. NMTOKEN, NUMBER, etc. can still be used for validation,
but do not influence the parsing. I.e., in the XML data model the
attributes foo="7" and foo="07" are different, even though some
applications may treat them the same.
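
A small illustration, using Python's xml.etree.ElementTree merely as a
stand-in for a parser that is given no DTD and so hands back attribute
values as plain strings (the element name "e" is arbitrary, and this is
not the SGML profile itself):

```python
import xml.etree.ElementTree as ET

# Two documents that differ only in how the attribute value is written.
a = ET.fromstring('<e foo="7"/>').get("foo")
b = ET.fromstring('<e foo="07"/>').get("foo")

# In the data model the values are just strings, and they differ.
same_in_data_model = (a == b)               # False

# An application that knows foo is a number may still equate them.
same_for_application = (int(a) == int(b))   # True
```

The comparison as numbers is a decision of the application, made after
parsing; the parser's output is the same whether or not anyone declared
foo to be a NUMBER.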


Bert
-- 
  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/pub/WWW/People/Bos/                      INRIA/W3C
  bert@w3.org                             2004 Rt des Lucioles / BP 93
  +33 4 93 65 77 71               06902 Sophia Antipolis Cedex, France

Received on Sunday, 11 May 1997 13:30:38 UTC