Re:Can we be more concrete... from David G. Durand on 1997-01-02 (w3c-sgml-wg@w3.org from January 1997)

From: David G. Durand <dgd@cs.bu.edu>
Date: Thu, 2 Jan 1997 11:57:26 -0500
To: w3c-sgml-wg@www10.w3.org
Message-Id: <v02130502aef18c910ce4@[165.90.139.110]>
At 6:49 AM 1/2/97, Digitome Ltd. wrote:

>>We may yet have to bite the bullet and allow stylesheets within
>>documents: which is a very bad idea, I think.

    I think this is a bullet we can leave unbitten after all -- HTTP 1.1
and persistent connections should be commonplace by the time XML is
actually deployed. So we should assume that HTTP GETs != HTTP session
establishments. This makes the issue of separate operations _much_ less of
a problem.
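
To make the distinction concrete, here is a minimal Python sketch in which
two GETs (a document and a separately stored stylesheet) travel over one
persistent HTTP/1.1 connection. The resource paths, their contents, and the
throwaway local server are assumptions invented for illustration; nothing
here comes from an XML proposal.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical resources: a document and its separately stored stylesheet.
RESOURCES = {
    "/doc.xml": b"<doc><title>Hello</title></doc>",
    "/doc.style": b"title: bold",
}


class Handler(BaseHTTPRequestHandler):
    # Speaking HTTP/1.1 makes the connection persistent by default.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = RESOURCES.get(self.path)
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        # A Content-Length lets the client find the end of the response
        # without the server closing the connection.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_both():
    """Fetch every resource over a single connection.

    Returns (bodies by path, whether the same socket served all requests).
    """
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    try:
        results = {}
        sockets = []
        for path in RESOURCES:
            conn.request("GET", path)
            results[path] = conn.getresponse().read()
            sockets.append(conn.sock)
        reused = sockets[0] is not None and all(s is sockets[0] for s in sockets)
        return results, reused
    finally:
        conn.close()
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    bodies, reused = fetch_both()
    print(sorted(bodies), "same socket:", reused)
```

The second GET costs a request and a response, not a new session, which is
why fetching a stylesheet as a separate entity need not be expensive.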

>Can you explain why [dgd: it's a bad idea] David?

    Well, it's a bit of a philosophical issue, so we should probably not
waste too much time here. The basic idea of SGML is to decouple processing
from document instances as completely as possible: if we put a stylesheet
directly into a document, we have recoupled processing and the document.
This means that at the least we have an implicit notion of _right_
stylesheet, since there is one distinguished stylesheet that has to be
explicitly overridden. This will warm the cockles of the hearts of ad-men,
but acts to impair document re-use.

   If we are committing to processing-time binding of processing
specifications, I'd rather not mix that with authoring-time binding of such
specs. There's also the more minor question of whether it will encourage a
presentational style of markup by contributing to a troff-macro mindset on
authors' parts.

    Basically, it offends my sense that architectural separations should be
as separate as possible, so that components are trivial to replace.
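
As a rough illustration of processing-time binding, the Python sketch below
keeps the document free of any style information and lets the processor
choose a stylesheet when it renders. The "stylesheets" are plain
dictionaries and the element names are made up; this is an assumption-laden
toy, not a proposed stylesheet language.

```python
import xml.etree.ElementTree as ET

# The document instance carries no processing information at all.
DOC = "<memo><title>Status</title><body>All is well.</body></memo>"

# Two independent, hypothetical stylesheets: element name -> output template.
SCREEN = {"title": "== {} ==", "body": "{}"}
SPEECH = {"title": "Heading: {}.", "body": "{}"}


def render(xml_text, stylesheet):
    """Render a document with whatever stylesheet the processor supplies."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        template = stylesheet.get(element.tag)
        # Elements without a rule (here, <memo>) produce no output of their own.
        if template is not None and element.text:
            lines.append(template.format(element.text.strip()))
    return lines


if __name__ == "__main__":
    print(render(DOC, SCREEN))
    print(render(DOC, SPEECH))
```

Neither stylesheet is the distinguished one: swapping SCREEN for SPEECH
touches only the call site, never the document.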

>>But I don't see how we can
>>require DTDs for linking, and claim to have eliminated them from XML -- at
>>least in the web context, linking is a hard requirement.
>
>(Trying to anticipate the thought processes of non-SGML savvy developers:-)
>
>XML describes a tree structure.
>A sub-tree, when considered a root is a perfectly valid tree.
>
>Simple to understand stuff. "Technology FOO used to describe Technology FOO".
>Most developers grin when they read things like that. It occurs all over
>the place in software they use on a daily basis.

This is why we finally allowed delayed processing of entities. For
well-formed documents this is now true (as it is not, for SGML), and for
valid documents it is also true, though you can't validate when parsing a
subtree without additional knowledge of the subtree's position in the
larger structure.
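
A small Python sketch of the well-formed case, using the standard library's
non-validating parser: a sub-tree is cut out of a document, serialized, and
parsed again as a document in its own right. The sample document and the
helper name are hypothetical. Validation is deliberately not shown, since
that would need the DTD plus knowledge of where the sub-tree sat in the
larger structure.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<book>"
    "<chapter id='c1'><title>Intro</title><para>Text</para></chapter>"
    "</book>"
)


def subtree_as_document(xml_text, path):
    """Extract the element at `path` and re-parse it as a standalone document."""
    root = ET.fromstring(xml_text)
    node = root.find(path)
    if node is None:
        raise ValueError("no element matches path: " + path)
    serialized = ET.tostring(node, encoding="unicode")
    # Re-parsing succeeds only if the serialized sub-tree is well-formed
    # when treated as a complete document with its own root.
    return ET.fromstring(serialized)


if __name__ == "__main__":
    chapter = subtree_as_document(DOC, "chapter")
    print(chapter.tag, chapter.get("id"), chapter.find("title").text)
```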

>Going further:-
>
>.  A single XML tree can cleanly contain multiple trees. One for
>  "real" data. Others for data about data. Bits about bits (in
>Negroponte-speak)

My little discourse at the beginning expresses why I don't like this. It's
the mixing of document semantics and content with processing information
that bothers me.

>.  Hypertext info. (i.e. AF stuff) can be encoded in XML and housed in
>   a sub-tree. This sub-tree serves to express how the body sub tree is
>   decorated with attributes for Hypertext semantics.

   If we put only what I have called the "declarative" parts of hypertext
information in the document this might be acceptable, though it is likely to
be confusing if we put some type information (attribute declarations) in
the DTD, and some in the instance (declarations about what they mean).

   Logically, we haven't violated declarative markup, and
processing-independent instances by this, though it is still messy.


>.  Rendering Style info can be encoded in a sub-tree.

I'd rather not. Style info is processing and not semantic, and so belongs
outside the document.

>.  Hitherto-unthought-of-killer-application semantics can be
>   encoded in a sub-tree.

Semantics maybe, processing specs, preferably not. But of course we are
here out of the standards realm and into opinion on _good_ styles of XML
authoring and application development.

>For *valid* as distinct from well-formed XML I have the option of achieving
>this via fixed attributes in the DTD as per HyTime approach.
>
>Pros:-
>        Single HTTP hit.
   HTTP 1.1 should fix this.
>        Developer friendly (IMO)
   I don't know that separate entities are actually that unfriendly.
Parsing with several special-purpose parsers is sometimes easier than
parsing the union of the languages...
>        No need for the DTD at Browser end in order to do Hypertext or any
>        other document architecture.
   True. But if we are only worried about rendering without a DTD,
>
>Cons:-
>        Duplication of hypertext info in every instance of a given AF set.

    Another advantage of loose coupling.

>        How is the hypertext sub-tree recognised. Reserved GI names?
    Reserved attributes, presumably. Using HT this way would also require a
DTD specially constructed not only to declare how links are, but also to
declare how the descriptions of how links are tagged are tagged. This
could get confusing.

>        How do browsers distinguish between data and meta-data for rendering
>        purposes?

This is why splitting information between the DTD and the instance could be
annoying. Of course the stylesheet could describe the non-rendering of
meta-tags pretty easily, so it would not be a severe problem (though the
description might get confusing).
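
A minimal Python sketch of that idea, assuming a made-up one-rule stylesheet
format in which an element name can be mapped to "suppress". The element
names (meta, note, xref) and the rule syntax are hypothetical and serve only
to show that the decision not to render meta-data can live entirely outside
the document.

```python
import xml.etree.ElementTree as ET

DOC = """<doc>
  <meta><note>links use the target attribute</note></meta>
  <body><para>See <xref target="s2">section 2</xref>.</para></body>
</doc>"""

# Hypothetical stylesheet: the document itself never says "do not render me".
STYLESHEET = {"meta": "suppress"}


def rendered_text(element, stylesheet):
    """Collect character data, skipping any sub-tree the stylesheet suppresses."""
    if stylesheet.get(element.tag) == "suppress":
        return ""
    parts = [element.text or ""]
    for child in element:
        parts.append(rendered_text(child, stylesheet))
        # Text following a child belongs to the parent, so it is kept even
        # when the child itself was suppressed.
        parts.append(child.tail or "")
    return "".join(parts)


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    print(" ".join(rendered_text(root, STYLESHEET).split()))
```

With an empty stylesheet the note inside meta would be rendered along with
the body, so the rule, not the markup, is what separates data from
meta-data for display purposes.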

>Given that displaying a document with hypertext links is a "rendering" do
>we have to consider how Hypertext functionality will impact on rendering
>functionality? An XML savvy browser will receive its Hypertext meta-data
>somehow. It needs to know that it should not be rendered.

This is why the stylesheet may be the best place for both rendering _and_
linking behavior.

>
>Finally, now that I am losing the run of myself completely, why not allow
>an XML encoding of the DTD to be an optional way of creating valid XML???

This was considered and abandoned as too incompatible with SGML. It would
have made adding meta-information easier in some ways, but this is not a
decision that is likely to be revisited.

  -- David


I am not a number. I am an undefined character.
_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________
Received on Thursday, 2 January 1997 11:50:49 UTC