Re: Shorthand for default attributes (was: Re: Whitespace) from Bert Bos on 1997-05-14 (w3c-sgml-wg@w3.org from May 1997)

From: Bert Bos <bbos@mygale.inria.fr>
Date: Wed, 14 May 1997 21:19:56 +0200 (MET DST)
To: w3c-sgml-wg@w3.org
Message-Id: <199705141919.VAA07984@mygale.inria.fr>

David Durand writes:
 > At 8:19 PM +0200 5/13/97, Bert Bos wrote:
 > >Alex Milowski writes:
 > > > Hmmm, looks similar to #CURRENT which is in SGML but not in XML.
 > > >
 > > > I have never found a *really good* reason for #CURRENT.  It would seem that
 > > > #CURRENT is also a "shorthand" for a container.
 > >
 > >There are at least two problems with #current: (1) it is not
 > >hierarchical, and (2) it is in the DTD and not in the document
 > >instance.
 > 
 > Both of these are critical.
 > 
 > > > We want parsing to be easy in XML--not harder because of special cases.  I
 > > > believe this is one of the reasons why #CURRENT was not included (among
 > >other
 > > > reasons that we don't need to go over again).
 > >
 > >I agree. It is easier to parse if there are no default attributes, and
 > >I like to keep the language as small as possible, but if there are
 > >many elements that need the same attribute value, then specifying that
 > >value in only one place is better in terms of document maintenance: it
 > >helps to avoid errors and is easier to change later.
 > 
 > That's why defaults are currently in. If you read the back-issues of the
 > list, you will see that we are waiting on ISO to allow separate ATTLIST
 > definitions, so that internal subsets can contain declarations for default
 > Attribute values.

Internal subsets are bad, for two reasons:

  1. they are hard to parse (I guess the syntax can be changed, but the
     second reason still remains:)

  2. they mix constraints on the format with shortcuts, defaults and
     parsing instructions.

 > 
 > If not, a variety of other solutions have been floated. But it looks as
 > though ISO is going to help us.
 > 
 > >
 > >The trade-off is whether such a mechanism will complicate the
 > >applications. I think declaring defaults for attributes locally, with
 > >scope until the end of the element in which they are declared, is not
 > >complicated to implement (it's a stack, and every parser needs a stack
 > >anyway), while providing a lot of benefit.
 > 
 > Yes, but it creates unbounded linear dependencies, forcing the parsing of
 > an entire document from the beginning, with all entity references
 > resolved. A State-independent solution allows "lazy" entity parsing, and
 > re-use of partial documents as well-formed XML fragments.

True, in the worst case, but there are several arguments why this is
not a big problem:

  - The vast majority of documents are small; on the Web that is even
    more true than elsewhere.

  - When you parse backwards (up in the tree), you can stop as soon as
    you find an appropriate definition. Especially in a large
    document, that is likely to be well before you reach the root.

  - You can arbitrarily limit namespaces by putting a !doctype
    somewhere. When you go up the tree and reach a !doctype node, you
    don't have to go further up in the tree (backwards in the
    document), because this is a hard limit on the
    namespace. (Confession: my software currently doesn't implement it
    this way... I'll fix it ASAP)

  - For many, if not most applications you'll need the full tree
    anyway, or at least you'll need to know all the ancestors of an
    element (the stack). This is true, e.g., of most TEI xpointers and
    of CSS style sheets.

  - The vast majority of parsers will parse from start to end. Parsing
    from the middle out is hard, and people that can do it can also do
    whatever is needed to find declarations. Indeed, I think that
    applications that need parsing from the middle out will simply put
    in their profile that defaulting is not allowed, or provide
    mechanisms that limit the amount of backwards search to a fixed
    length.

And the alternative isn't much better. Instead of parsing backwards,
you have to go back to the start of the document and parse up to the
first element.
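
The upward search described above can be sketched roughly as follows.
This is only an illustration under assumptions: the names Node,
lookup_default, defaults and is_doctype are hypothetical and do not
come from any existing parser, and keying the defaults by an
(element name, attribute name) pair is just one plausible choice.

```python
from typing import Dict, Optional, Tuple


class Node:
    """Hypothetical tree node; not taken from any real XML parser API."""

    def __init__(
        self,
        name: str,
        parent: Optional["Node"] = None,
        defaults: Optional[Dict[Tuple[str, str], str]] = None,
        is_doctype: bool = False,
    ) -> None:
        self.name = name
        self.parent = parent
        # Defaults declared locally, in scope until the end of this element.
        self.defaults = defaults or {}
        # Assumption: a node carrying a !doctype bounds the upward search.
        self.is_doctype = is_doctype


def lookup_default(node: Node, element: str, attr: str) -> Optional[str]:
    """Walk up the ancestors and return the nearest matching default."""
    current: Optional[Node] = node
    while current is not None:
        key = (element, attr)
        if key in current.defaults:
            return current.defaults[key]  # stop at the first definition found
        if current.is_doctype:
            return None  # hard limit: do not search above a !doctype node
        current = current.parent
    return None  # reached the root without finding a declaration
```

With a root that declares ("p", "lang") as "en" and a nested section
that redeclares it, a lookup from a p inside the section stops at the
section and never reaches the root; a subtree marked with is_doctype
does not see the root's declaration at all.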

 > 
 > Alex's point (1) is so decisive that no other arguments are needed.

Which point is that?

 > 
 > >I don't really mind the XML at the start, I was just trying to save a
 > >few bytes:-)
 > >
 > >And about your world: I haven't found any applications for PIs either,
 > >and I think that there aren't actually any worlds that need them.
 > 
 > PIs are allowed to users in XML (which I believe to be a mistake, but I'll
 > simply never use them, and propagandize against their use, when I have a
 > chance). However, their use in XML itself has been very sensible: they
 > allow us to add the equivalent of new declarations and DTD features to XML
 > -- without having to use SGML-incompatible (or SGML-invisible) syntax.

I agree with you there, but there is a fallacy in calling them "PIs",
since "PI" is a term from SGML, and in SGML PIs are not targeted at
SGML parsers, but at the applications built on top of the parsers.

You're defining XML; you need a widget to define something that is
common to, and obligatory for all XML parsers. You can use whatever
syntax you like. Who cares whether it looks like SGML or not?

 > 
 > The use of namespaces for the PIs helps to control conflicts with the
 > "feature" of user-available PIs (which applications are _always_ entitled
 > to ignore, unlike the XML notations).

"Entitled to ignore" - that looks like a recipe for
incompatibility. Parser X ignores them and parser Y doesn't: now my
application that I developed on top of parser X suddenly stops working
when I switch to parser Y...

 > 
 > >PIs have been suggested for embedding stylistic information into
 > >documents in which people wanted to distinguish between stylistic and
 > >other information. But it turned out that style sheets are much
 > >better.
 > >
 > >They have been suggested for `meta-data', i.e., data *about* the
 > >document as opposed to the data contained in the document, but since
 > >PIs lack structure, that didn't work very well either. A link to
 > >external metadata, or even simply a <meta> element, is much better.
 > 
 > Since these straw men are not behind the syntax chosen, this is not an issue.
 > 
 > >
 > >As soon as you use PIs, you have to define a syntax for the string
 > >inside the PI. That's stupid, because XML already gives you a nice and
 > >flexible syntax, so why invent another?
 > We need a syntax distinct from tag syntax, because otherwise we are
 > imposing structure (and namespace pollution, and possibly unneeded
 > features) on user DTDs  -- whose applications are completely unpredictable
 > to us now.

Very true. (Except that the spec already does that, doesn't it, by
reserving the prefix "xml-" in attribute names. I don't complain: on
the contrary!)



Bert
-- 
  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/pub/WWW/People/Bos/                      INRIA/W3C
  bert@w3.org                             2004 Rt des Lucioles / BP 93
  +33 4 93 65 77 71               06902 Sophia Antipolis Cedex, France
Received on Wednesday, 14 May 1997 15:20:14 UTC