RE: on "Versioning XML Languages"

I'd also like to add my voice to the request that the XML Schema-specific bits be removed from any documents on versioning XML vocabularies produced by the TAG. The main problem I have with keeping them is that W3C XML Schema does not directly support the most common way XML vocabularies are versioned in the wild (change the version attribute and add more elements and attributes) without seemingly absurd contortions and limitations.
 
I'd rather not see a TAG document on versioning contain rationalizations of what are basically design flaws in W3C XML Schema, nor would it make sense to promote them as good architecture for the WWW.
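
To make the "in the wild" pattern concrete, here is a minimal sketch in Python. The "order" vocabulary, its element and attribute names, and the read_order helper are all invented for illustration; they do not come from the finding or from any real schema. The sketch shows a consumer written against version 1.0 that keeps working when a 1.1 producer bumps the version attribute and adds an attribute and an element.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: every name below is made up for this example.
V1 = '<order version="1.0"><partnum>43</partnum></order>'
V2 = ('<order version="1.1" priority="high">'
      '<partnum>43</partnum><giftwrap>yes</giftwrap></order>')

def read_order(xml_text):
    """Read the fields a version 1.0 consumer knows about; skip the rest."""
    root = ET.fromstring(xml_text)
    return {
        "version": root.get("version"),
        "partnum": root.findtext("partnum"),
    }

print(read_order(V1))
print(read_order(V2))
```

The consumer never looks at the "priority" attribute or the "giftwrap" element, so the newer document is read the same way as the older one apart from the reported version.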

________________________________

From: www-tag-request@w3.org on behalf of Dan Connolly
Sent: Fri 10/17/2003 5:44 AM
To: www-tag@w3.org
Subject: on "Versioning XML Languages"




My comments on

http://www.w3.org/2001/tag/doc/versioning.html
 of 18 Sep 2003

Summary: The enumeration of strategies in section 2 Versioning
Strategies is good and important stuff, but the thesis of
the finding is either buried or off the mark, the boxed
points are insufficiently justified, and it needs a
lot of editorial work (terminology is loose, items
in references section aren't cited in the body, etc.).

caveat: I didn't read the whole thing; I sorta lost
the story line around section 5 or 6. I could stand
for the XML Schema specific bits to be split out
into a separate document.

Comments as I read it:

|XML is designed for the creation of tag sets, languages of elements and
attributes.

The term "tag set" is introduced here but not used elsewhere.

suggest: XML, Extensible Markup Language, provides common constructs,
elements and attributes, for use in a large and growing variety of
languages.

| XML is self-describing in the minimal sense that any XML parser can
recognize the namespaced elements and attributes, attribute values, and
text content of a document.

Hmm... I don't see how that makes it self-describing. suggest
striking that.

XML documents that include DTDs are self-describing in that the DTD
part describes the other part. XML documents can participate
in a self-describing Web by way of namespace names and namespace
documents (I prefer the term "grounded in the web" to
"self-describing" for that sense anyway). I suppose you could
say that <partnum>43</partnum> is somewhat self-descriptive,
but only to an agent that has some prior understanding of
the term "partnum".

| It is designed for the combination of languages in instance documents.

suggest: Its self-similar syntax supports documents composed from
multiple languages.

| This paper discusses how developers can design with extensibility and
change in mind, making backward-compatible and forward-compatible
changes possible in the future.

"This paper discusses..." -- now *that's* self-describing. Too meta for
my tastes. Do we need this sort of preface material in addition to
the abstract? I think not.

Hmm... right into 1.1 Terminology without actually establishing the
thesis of the finding. Surely the thesis is something about evolution
of XML languages being a necessary part of the continuous evolution
of the Web, yes? Perhaps you make that point later...

I'm skipping the 1.1 Terminology section; I have a long-standing
distaste for the definitions-up-front style a la ISO specs.
I much prefer definitions in context; collect them in a glossary
at the end if you like, but don't make me slog through them
before they're motivated. "Component" is defined in this
section but not used elsewhere. What's up with that?

Hmm... here's a candidate for the thesis:

  The primary motivation to allow instances of a language to be
  extended is to decentralize the task of designing, maintaining,
  and implementing extensions. It allows senders to change the
  instances without going through a centralized authority.

But I'm not sure that's the main point to be made about language
evolution. I think the main point is that agents in the web
come and go, with varying capabilities. New language features
appear to express new capabilities and concepts, but old
agents don't go away... at least not right away.

Are "instances of a language" really extended? I suppose
by "instances of a language" you mean documents. Documents
don't change; they're like numbers. The number 4 never
changes. Well... there's another sense of the word "document"
a la file (more generally: resource), which does have state,
but I don't think that's what you're talking about here.
I think you're using "instance" as a synonym (or specialization)
of 'representation'.

A thought: an extensible language is one with some syntax
reserved for future use. To extend a language is to
say what some of the reserved parts of the syntax mean.
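
A rough sketch of that thought, as a toy model in Python. The Language class, its methods, and the field names are hypothetical and only illustrate the idea: names without an assigned meaning are treated as reserved, and extending the language means assigning a meaning to one of them.

```python
# Hypothetical toy language: a message is a dict of field name -> string value.
class Language:
    def __init__(self, defined):
        # Names that already have a meaning: name -> function that interprets the value.
        self.defined = dict(defined)

    def extend(self, name, handler):
        """Return a new language that gives meaning to a previously reserved name."""
        if name in self.defined:
            raise ValueError(f"{name!r} already has a meaning")
        return Language({**self.defined, name: handler})

    def interpret(self, message):
        """Interpret the defined names; list the reserved ones separately."""
        understood = {k: self.defined[k](v)
                      for k, v in message.items() if k in self.defined}
        reserved = sorted(k for k in message if k not in self.defined)
        return understood, reserved

v1 = Language({"partnum": int})
v2 = v1.extend("quantity", int)

msg = {"partnum": "43", "quantity": "2"}
print(v1.interpret(msg))
print(v2.interpret(msg))
```

Under v1 the "quantity" field is still reserved and is only reported; under v2 it has a meaning and is interpreted. The message itself is the same in both cases.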

Hmm... 1.4 Why Do Languages Change? seems to miss the main
point too. The main reason languages change is that the
agents that use them change, and the things that languages
are used to talk about (i.e. life, the world, business,
poetry, media, data, research, etc.) change.

| Using QNames to identify words in the WordNet database, for example,
or the names of functions and operators in XPath2 are examples of "just
name" languages.

How so? WordNet is a rich structure of generalizations and
specializations, not just a list of names. The functions and
operators in XPath2 are surely more than just a list of names;
the names are connected to a data model, to datatype semantics,
etc.

|This is by no means an exhaustive list. Nor are these categories
completely clear cut.

Then what's the point of this section 1.6 Kinds of Languages?

|Applications are expected to behave properly

Hmm... elsewhere in the webarch doc and this finding we speak
of "agents". Now we have "Applications". Is this separate term
really called for?

|4.1 An Example
|Throughout this paper, we'll motivate our discussion of versioning with
an ongoing example.

Yes, please! Please give readers the example (or at least a start)
before asking them to slog through definitions.

(As I said in earlier feedback, I'd like this finding to take a
more historical approach: tell stories of what we know about
the history of evolution of data formats, from RFC822, to
HTML, to XML, to SOAP and so on)

| The processor must understand ...

First agent, then application, now processor. I don't see motivation
for the distinct terms.

| Any Namespace: The language SHOULD provide for extension in any
namespace.

I'm not sure I agree with that. I certainly don't see enough
justification that I could convince some WG of that point
based on this text.

This point is justified by only one example, and a hypothetical
one at that.

Also, SHOULD/MUST/MAY are for agents in protocols. Languages don't
do things; they just are. We don't say "numbers should be
greater than 4". Either they are or they aren't.

"The language" seems odd too... which language? Do you mean
"All languages"?

| Full Extensibility: All XML Elements SHOULD allow any attributes and
allow any elements in their content models.

Again, I'm not sure I agree and I don't see enough justification
here to convince typical WG members.

| The key value of the extension strategy described above is that
existing XML documents can be extended without having to change existing
implementations.

Yes! There's your thesis. Well, generalize it a bit so it's
not specific to "the strategy described above".

And, editorially, we're now up to 4 terms for the
same concept: agent, application, processor, and implementation.

| Must Ignore: Receivers MUST ignore

There's #5!

I'm confused by that use of MUST. Taken out of context, it seems
to refer to *all* receivers of any kind. But it's in a "good practice"
box, not a "constraint" box, and it's prefaced by "For many
applications...".
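
For reference, a minimal sketch of what one receiver following a "must ignore" rule might look like, using Python's standard xml.etree.ElementTree. The namespaces, element names, and the receive function are assumptions made up for this example, not taken from the finding; it shows only one receiver's policy, not a claim about all receivers.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and element names, invented for this sketch.
KNOWN_NS = "http://example.org/ns/name"
KNOWN_TAGS = {f"{{{KNOWN_NS}}}first", f"{{{KNOWN_NS}}}last"}

DOC = (
    '<name xmlns="http://example.org/ns/name" '
    'xmlns:ext="http://example.org/ns/ext">'
    '<first>Jane</first>'
    '<ext:middle>Q</ext:middle>'
    '<last>Doe</last>'
    '</name>'
)

def receive(xml_text):
    """Process child elements this receiver understands; ignore the others."""
    root = ET.fromstring(xml_text)
    understood, ignored = {}, []
    for child in root:
        if child.tag in KNOWN_TAGS:
            # ElementTree tags look like "{namespace}local"; keep the local part.
            local = child.tag.split("}", 1)[1]
            understood[local] = child.text
        else:
            ignored.append(child.tag)
    return understood, ignored

print(receive(DOC))
```

The receiver keeps a record of what it skipped, which makes the ignoring visible, but it does not fail on the unknown element from the extension namespace.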





--
Dan Connolly, W3C http://www.w3.org/People/Connolly/

Received on Thursday, 23 October 2003 12:16:59 UTC