Re: XML and EDI from Steven R. Newcomb on 1997-06-05 (w3c-sgml-wg@w3.org from June 1997)

From: Steven R. Newcomb <srn@techno.com>
Date: Thu, 5 Jun 1997 19:47:58 -0400
To: w3c-sgml-wg@w3.org
Message-Id: <199706052347.TAA02498@bruno.techno.com>
> I think you need to give some more concrete examples than "wow" :-)

OK.

For a document to declare that it inherits the characteristics of a
DTD (an architecture) is very, very similar to a <!DOCTYPE
declaration.  (In fact, that's exactly what a doctype declaration
does.)

A doctype declaration determines the DTD in effect, which governs the
element structure as evidenced by the generic identifiers of the
elements.  In contrast, an *inherited architecture* does not necessarily
constrain the structure of every element in the document; it only
constrains the structure of those elements that have an attribute
(usually named after the inherited architecture) whose value is the
generic identifier that would be used by an element in a document that
is actually governed (in the normal way) by the inherited DTD.  For
example, let's say we're writing an XML document that has no DTD, but
we want some of the elements to be processed exactly as if they were
certain well-known element types found in the HTML architecture (i.e.,
the HTML DTD).  According to ISO/IEC 10744 TC1 Annex A.3 (a draft
version can be found at
ftp://ftp.ornl.gov/pub/sgml/wg8/hytime/1920.sgm [search for "A.3"]),
you can declare the HTML architecture as a notation, and as the value
of the "ArcFormA" attribute of that notation, you give the name of the
attribute used to give the "architectural form name" (the element type
in the HTML architecture) to which you intend that element to conform.
I'm leaving out the details, but here's the gist:

<!NOTATION HTML30 ....>
<!ATTLIST #NOTATION HTML30 ...
    ArcFormA NAME html ...>

and in the document, the element

<foo html=a zorp=blah url="http://www.motherhood.org">

can now be recognized and processed as an HTML <a> element, because
html=a, even though its GI is not "a" and it has a "zorp" attribute
that is extraneous as far as the HTML architecture is concerned.  The
fact that an inherited architectural form can be further specialized
(perhaps by adding a zorp attribute) is why an inherited architecture
is called an "enabling" architecture, and, by contrast, the DTD
referenced by a doctype declaration is called a "constraining"
architecture.  (You can't necessarily tell, by looking at a DTD,
whether it is intended to be used as a constraining architecture, an
enabling architecture, or both.  Any DTD can be used as an enabling
architecture, as well as a constraining (<!DOCTYPE...) one.)

In order to allow enabling architectures to be declared, notation
attributes are needed to specify how the mapping from the inherited
architecture to this document's elements is specified.  (And for
several other reasons having to do with the housekeeping necessary to
facilitate inheritance and multiple inheritance, any one of which
would be sufficient justification for the use of notation attributes.)

I say again: XML needs notation attributes.  The importance of having
this feature, in terms of the ultimate usefulness of XML, can hardly
be overstated.  

---------------------

> I can conceive of a world in which object inheritance is performed without
> SGML notation data attributes, and I have no difficulty in doing so.

But there is a standard way of doing it, and there is no reason to do
it some other way.  More to the point, for an inheriting document or
architecture to characterize an inherited architecture as a notation
is simply good SGML.

> ...explain exactly why this cannot be done with CDATA attributes or
> with processing instructions or in some other way.

You can do anything with anything.  It's a good idea to consider
carefully the advantages and disadvantages of all ways of doing it.
That work has already been done by some pretty good minds.  You owe
it to yourself to take a look at what they've done.

> > The usefulness of inheritance for all kinds of purposes (and not least
> > for EDI) is too great to ignore; it is one of the most useful and
> > attractive aspects of SGML.

> Well, that's very subjective.  I for one would strongly disagree.  I'd
> happily get by without them.  In fact, I generally do.  So do most of
> our customers.

I think you'd find it hard to justify to your customers an attitude
that object-oriented information management, with the possibility of
inheriting the semantics of multiple architectures (each of which may
already be implemented in reusable software), is something they should
continue to "happily get by without".

> Remember, we are not going for the high-end complex SGML applications here.
> We are going for pervasive, easy to implement, easy to understand and
> easy to describe.

Well..., I don't know about that.  Is it really so hard for people to
understand that they can have rigorous no-fail interchangeability AND
arbitrary local extensions in their information architectures, and
that they may save themselves a lot of software money and
implementation headaches by using inheritance?  Maybe so.  I think
not.  I'll wager that if you ask EDI people what's the biggest problem
with EDI, they'll tell you "inflexibility."  If you ask most of the
people who have ever had to write SGML documents to a DTD that they
didn't control what they hated most about SGML, it's always the DTD,
and the rigidity that it forced upon them in the name of information
interchange.  With inherited ("enabling") architectures, you can have
it both ways: perfect interchangeability as well as flexibility (read:
locally controlled mods, extensions, and even whole DTDs).  XML made
the DTD optional, but that's the easy part.  On what basis will
interchange occur?

> > There is no good reason not to do it in XML.

> It makes the specification longer.  It adds cost and complexity to
> implementations.  It brings with it a whole new set of convepts to explain
> and understand.  And it is, as far as I can tell, totally unnecessary.
> There is no reason _to_ implement notation data attributes.

I guess that's true, if you take the position that inheritance is
useless and inexplicable.  If, however, you take the position that
inheritance is mighty important, as I believe, then you have to decide
how you're going to interchange information about inheritance.
Regardless of whether you use notation data attributes or some other
mechanism, it will add cost and complexity, and you'll have to explain
and understand it, too.

> > For a discussion of why architectural inheritability is overwhelmingly
> > important, you may want to read my (now slightly dated) paper, "SGML
> > Architectures: Implications and Opportunities for Industry" at
> > http://www.techno.com/sgmlarchitecture.html.
> 
> This paper does not mention using notation data attributes

That's right.  That paper says nothing about how a document declares
the fact that it inherits the syntactical and semantic features of
another DTD.  As I said, it discusses reasons why the notion of
inheritance is overwhelmingly important.

Best regards,

--Steve

             Steven R. Newcomb   President
         voice +1 716 271 0796   TechnoTeacher, Inc.
           fax +1 716 271 0129   (courier: 23-2 Clover Park,
      Internet: srn@techno.com    Rochester NY 14618)
           FTP: ftp.techno.com   P.O. Box 23795
    WWW: http://www.techno.com   Rochester, NY 14692-3795 USA
Received on Thursday, 5 June 1997 19:52:44 UTC