Re: ERB decisions on the LINKTYPE proposal from Steve Pepper on 1997-03-03 (w3c-sgml-wg@w3.org from March 1997)

From: Steve Pepper <pepper@FALCH.NO>
Date: Mon, 3 Mar 1997 19:57:31 +0100
To: w3c-sgml-wg@w3.org
Message-Id: <3.0.1.32.19970303195723.00690ca4@falch.no>

Thanks to Jon and Tim for providing some rationale for the ERB decisions.

The arguments against using LINK syntax seem to be falling into a number
of distinct categories:

1) Malediction
2) Confusion caused by the very word LINK
3) "Nobody understands it"
4) "LINK can't do the job anyway"
5) Verbosity and lack of clarity of the syntax
6) Difficulty of implementation
7) "LINK is unnecessary if WG8 gives us multiple attlists"


1) Malediction
--------------
Although this is currently the largest category, I sincerely hope we all
agree that we should be looking for the reasons behind LINK's bad image,
rather than indulging in name calling.


2) Confusion caused by the very word LINK
-----------------------------------------
David, Jon and Tim have all raised this issue, and it is quite pertinent.
I have already considered it and explained my solution in my reply to
David. Just as a reminder:

   <!DOCTYPE  tei.2 public "-//TEI//DTD P3//EN">
   <!PROCSPEC xml-proc tei.2 #IMPLIED [
   <!ATTLIST  xref
              xml-link CDATA #fixed "xml-tlink">
   <!PROCDEF #INITIAL xref>
   ]>

I am in other words proposing that we exploit the fact that XML uses a
fixed SGML declaration to change the reserved names LINKTYPE and LINK to
PROCSPEC and PROCDEF respectively. An XML document will then consist of
an (optional) document type definition, an (optional) processing
specification, and the instance itself.


3) "Nobody understands it"
--------------------------
This is the key argument, I suppose. If as Tim says "only 17 people in
the world understand LINK", I see that we have an uphill battle on our
hands. The first question is, is this WG prepared to _try_ to understand
it before rejecting it? The next question is, is it the _concepts_ that
are problematic or the _syntax_? (Another interesting question is whether
those who oppose my proposal number themselves among the 17...)

I don't want to waste people's time if only Sam Hunting, one (unnamed) ERB
member and I think this is worth pursuing. If there ARE others, I would
like to ask them to show their hands; otherwise I will shut up and go back
into hibernation!


4) "LINK can't do the job anyway"
---------------------------------
This was Martin Bryan's argument and it is off the mark for the simple
reason that the LINK-based solution is intended to solve the problem at the
document type (or element type) level, not at the level of the individual
elements. (That is why my example uses a FIXED attribute -- as did Steve
DeRose's original example of the so-called "ideal" solution.)

Now, I appreciate that there will be a need to specify values for xml-link
attributes at the individual element level in some documents, in which case
we are no longer talking about algorithmically associating processing
information with structure (i.e. LINK). But I still contend that there is
and will continue to be a very important class of documents for which useful
XML functionality can be added at the element type level. Examples of this
could be something like Jon's Solaris documentation and some of the vast
corpora of TEI based information.


5) Verbosity and lack of clarity of the syntax
----------------------------------------------
Jon talks about "voodoo", "ISO obfuscation at its worst", "gibberish"
and "apparent nonsense" because of the two lines

    <!LINKTYPE xml-link tei.2 #IMPLIED [
and
    <!LINK #INITIAL xref>

I submit that we have already accepted at least as much "gibberish" in
order to keep XML SGML-compliant. Take for example

   <!DOCTYPE tei.2 ...

Totally unnecessary if you ask me. XML only permits one document type
declaration, and the document element is known as soon as we hit the first
start-tag (because XML doesn't allow tag omission). So all that is needed
is the element and entity declarations at the head of the document, in the
manner of #DEFINEs and #INCLUDEs. (Now wouldn't that appeal to Jon's
"highly competent computer scientists"!)

The same goes for the <!ATTLIST gibberish and the requirement to specify
the generic identifier in an attribute definition list declaration. Since
XML doesn't allow name groups instead of generic identifiers, why not just
put the attribute definition list inside the element declaration and save
ourselves a few extra syntax tokens?

I am being facetious, of course. The serious point I want to make is that
I support this project because I care about SGML. I am willing to take the
trouble to explain why specifying a reduced subset of a powerful language
requires carrying a little extra baggage, and I emphatically do *not*
accept epithets like "voodoo" and "ISO obfuscation".


6) Difficulty of implementation
-------------------------------
David drew attention to the fact that link attributes have their own name
space. I am not enough of a computer scientist to know how big a problem
this is, but I suspect that it is exaggerated in this case. Perhaps someone
more knowledgeable could provide some insight?


7) "LINK is unnecessary if WG8 gives us multiple attlists"
----------------------------------------------------------
I do not believe this is the case. (So Jon is wrong to suggest that I agree
with the ERB that multiple attlists are the ideal solution -- I put the
word "ideal" in inverted commas in my posting.)

Why? Well, that isn't easy to explain to anyone who has not "understood" the
basic point of LINK: that it is a smart idea to separate the specification
of the structural relationships (the DTD) from the specification of the
structure-related processing to be performed for a particular purpose (the
LPD).

Yesterday [1] I wanted to send my SGML document to an ICADD-aware processor.
To do so, I had to add a whole bunch of fixed attribute declarations to my
DTD. Today I want to send it to an XML processor and I am being told that
I must pile in yet more fixed attributes in order to accomplish this.

Tomorrow and in the future I will think of ever new ways to process my
information: I don't want to have to revise the DTD every time I do this.

Even if it were *my* DTD, I wouldn't want to overburden it with
ever-increasing numbers of fixed attributes every time a new form of
processing turned up. What I *would* like to do is express that processing
information in a modular and extensible way and "plug in" the relevant spec
at processing time. That is what LINK does. Nothing more and nothing less.

That is the general argument for LINK. There is also a specifically
XML-related argument for (at least) a limited subset:

Assuming WG8 gives us multiple attlists, people will start using these in
their internal subsets. They will soon tire of adding these declarations
to every single document. There will be a tendency to build them into the
DTD itself -- maybe not even as separate declarations, but as additional
attribute definition lists inside the main attlist declaration for the
element type in question: XML-processing attributes along with the more
general structural attributes.

Then one day, along comes the opportunity to deliver the document to some
XML processor that doesn't require the whole DTD, just a well-formed XML
document -- and, of course, the XML-processing attributes. They then have
to go back to the DTD and extricate the XML-related attributes. If they are
wise, they will put them in a separate entity which is referenced in the
DTD but also available for direct inclusion in well-formed documents. At
this point they will have discovered the advantage of separating structure
from processing. They will have implemented LINK, albeit in an uncontrolled
and informal way.

How much easier if the formal distinction between structure and processing
had been built into the XML spec from the word go, so that they could arrive
directly at the correct solution instead of beating around the bush!

Then they would have the flexibility to deliver:

  - just the well-formed instance (WFI),
  - the WFI and the DTD,
  - the WFI and the XML processing specification (LPD),
  - the WFI, the DTD *and* the LPD

depending on the requirements of the processor.

And they would have the freedom to define different XML processing specs
for the _same_document_type_ and plug them in and out as needed.
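
As a rough illustration of this plug-in idea (a sketch in Python rather
than SGML, and in no way a statement of how a parser must work): each
processing spec is modelled as a table from element type to the attributes
it fixes for that type, and a spec is applied to a well-formed instance at
processing time. The names apply_spec, XML_PROC and PRINT_PROC, and the
"render" attribute, are invented for the example; only xref and xml-link
come from my example above.

```python
import xml.etree.ElementTree as ET

# A well-formed instance with no DTD and no processing attributes.
INSTANCE = '<tei.2><p>See <xref target="s2">section 2</xref>.</p></tei.2>'

# Two hypothetical processing specs for the same document type.
# Each maps an element type to the attributes fixed for that type.
XML_PROC = {"xref": {"xml-link": "xml-tlink"}}
PRINT_PROC = {"xref": {"render": "italic"}}  # invented for illustration


def apply_spec(instance, spec):
    """Parse the instance and attach the spec's attributes by element type.

    Modelled on #FIXED attributes: the value given in the spec is applied
    to every element of the named type. The instance text is not changed.
    """
    root = ET.fromstring(instance)
    for elem in root.iter():
        for name, value in spec.get(elem.tag, {}).items():
            elem.set(name, value)
    return root


if __name__ == "__main__":
    # Plug in one spec or the other without touching the instance or a DTD.
    for spec in (XML_PROC, PRINT_PROC):
        print(ET.tostring(apply_spec(INSTANCE, spec), encoding="unicode"))
```

The instance and the (absent) DTD stay the same in both runs; only the
spec that is plugged in differs.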

For these reasons, I believe the LINK-based approach is *vastly superior*
to the multiple attlist approach.


Finally: A NEW PROPOSAL
-----------------------
So is there anything that can be done that will give us all this power,
and remove the syntax-related objections raised by Jon and Tim? I think
there is:

Instead of lobbying WG8 for multiple attlists (OK, then: as well as doing
that), we lobby for a simplification of the LINK syntax that would allow
my example to be expressed as simply as follows:

   <!DOCTYPE  tei.2 public "-//TEI//DTD P3//EN">
   <!PROCSPEC xml-proc [
   <!ATTLIST  xref
              xml-link CDATA #fixed "xml-tlink">
   ]>

I believe this is possible. Would it bring anyone around?

Regards,

Steve

[1] http://www.falch.no/people/pepper/link.htm


--
Steve Pepper, SGML Architect, <pepper@falch.no>
Falch Infotek a.s, Postboks 130 Kalbakken, N-0902 Oslo, Norway
http://www.falch.no/  tel://+47 2290 2733  fax://+47 2290 2599
"Whirlwind Guide": http://www.falch.no/people/pepper/sgmltool/
Received on Monday, 3 March 1997 14:00:50 UTC