Re: Shorthand for default attributes (was: Re: Whitespace) from David Durand on 1997-05-14 (w3c-sgml-wg@w3.org from May 1997)

From: David Durand <dgd@cs.bu.edu>
Date: Wed, 14 May 1997 17:46:00 -0500
To: w3c-sgml-wg@w3.org
Message-Id: <v03007801af9feb1757ab@[205.181.197.79]>
Paul Prescod has already reiterated my point about checking the back issues
of the list:

We have already faced and decided a number of the points you are attempting
to question. Frequently we have not chosen the solutions that I prefer, but
we have chosen _good compromises_ given the goals that Paul also referred
to.

At the least, to revisit these issues you should make sure that you are
attesting new arguments, or an application that will fail under the current
decisions. We spent upwards of 3 months of high traffic on whitespace, all
told, I think, so while it's a lot of work, no-one is going to spend much
time on it without some new information.

While I think that some of the goals of this group have forced us to
technically less-elegant solutions, they _are_ the goals, and we have all
bled to meet them with the best solutions possible.

Anyway, just for fun, I'll address a few of your comments.

At 9:19 PM +0200 5/14/97, Bert Bos wrote:
>Internal subsets are bad, for two reasons:
>
>  1. they are hard to parse (I guess the syntax can be changed, but the
>     second reason still remains:)

SGMl compatibility has decided the syntax, for good or ill.

>  2. they mix constraints on the format with shortcuts, defaults and
>     parsing instructions.

I'm not sure what you mean: they define attlists, entities, and elements.

You don't need to use them if you don't need any of the features a DTD
offers (validation of structure, attribute defaulting, attribute value
checking, external or general entities).

If you want a DTD, but don't need document-specific mods, you still don't
need internal subsets.
> I wrote earlier:
> > Yes, but it creates unbounded linear dependencies, forcing the parsing of
> > an entire document from the beginning, with all entitiy references
> > resolved. A State-independent solution allows "lazy" entity parsing, and
> > re-use of partial documents as well-formed XML fragments.
>
>True, in the worst case, but there are several arguments why this is
>not a big problem:
>
>  - The vast majority of documents is small, on the Web that is even
>    more true than elsewhere.

Bad argument for XML (SGML on the web), wh/ has specially tailored SGML
constraints so that large documents can be served and partially parsed, or
parsed on demand.

>  - When you parse backwards (up in the tree), you can stop as soon as
>    you find an appropriate definition. Especially in a large
>    document, that is likely to be well before you reach the root.

If you are looking for an implied attribute, you either need to see the
declaration, or scan _all the way to the beginning_. This is a fundamental
problem, as you can't tell _what_ attributes might have been omitted on an
element.

>  - You can arbitrarily limit namespaces by putting a !doctype
>    somewhere. When you go up the tree and reach a !doctype node, you
>    don't have to go further up in the tree (backwards in the
>    document), because this is a hard limimt on the
>    namespace. (Confession: my software currently doesn't implement it
>    this way... I'll fix it ASAP)

But this is not the way DOCTYPE works. We are SGML-compatible, not just
"inspired by SGML". If we were in the latter case, we could probably find a
more elegant solution, but that doesn't really signify...


>  - For many, if not most applications you'll need the full tree
>    anyway, or at least you'll need to know all the ancestors of an
>    element (the stack). This is true, e.g., of most TEI xpointers and
>    of CSS style sheets.
You can address within a well-formed resource, even if that is only a part
of the actual document...
The whole tree is great if you want it, but XML is carefully tailored so
that an entity's content can serve as a document at need, without worrying
about long range dependencies (even the DTD's dependencies can have only
_very limited_ effects).

>  - The vast majority of parser will parse from start to end. Parsing
>    from the middle out is hard, and people that can do it can also do
>    whatever is needed to find declarations. Indeed, I think that
>    applications that need parsing from the middle out will simply put
>    in their profile that defaulting is not allowed, or provide
>    mechanisms that limit the amount of backwards search to a fixed
>    length.

If you just get an entity from the middle of a document, you are parsing
"from the middle" without having to do right-to-left parsing. This still
cannot work in the presence of unbounded sequential dependencies.

>And the alternative isn't much better. Instead of parsing backwards,
>you have to go back to the start of the document and parse up to the
>first element.

not relevant.

> > Alex's point (1) is so decisive that no other arguments are needed.
>
>Which point is that?

I re-quote, again:
> >There are at least two problems with #current: (1) it is not
 > >hierarchical,
In  other words, the short for what I just said.

>I agree with you there, but there is a fallacy in calling them "PIs",
>since PIs are a term from SGML, and in SGML they are not targeted at
>SGML parsers, but at the applications built on top of the parsers.

XML PIs are as well (other than the ones in XML syntax, which would have
had their own syntax if SGML-compatibility were not a primary goal). I
think allowing appliocation specific markup is a mistake, but the majority
of the ERB decided that it meets a need that is more important than
avoiding the bad effects of PIs. I don't understand this need, but I can
accept it.

>You're defining XML, you need a widget to define something that is
>common to, and obligatory for all XML parsers. You can use whatever
>syntax you like. Who cares whether it looks like SGML or not?

We do.


>"Entitled to ignore" - that looks like a recipe for
>incompatibility. Parser X ignores them and parser Y doesn't: now my
>application that I developed on top of parser X suddenly stops working
>when I switch to parser Y...

If you Use PIs (other than the ones that are part of XML) you have
explciitly declared that you _don't care_ if the PI is processed by other
software. This is either because you know what software is using it, or
because you know that ignoring the information will not harm the
interpretation of the data outside the PI. Same as SGML... (funny)


  -- David

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________
Received on Wednesday, 14 May 1997 17:46:48 UTC