Re: A particle is a term and a term is a particle ... circular definition? from C. M. Sperberg-McQueen on 2012-06-15 (xmlschema-dev@w3.org from June 2012)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Fri, 15 Jun 2012 00:56:52 -0400
To: "Costello, Roger L." <costello@mitre.org>
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>, "xmlschema-dev@w3.org" <xmlschema-dev@w3.org>
Message-Id: <5B2E56D4-41A0-4A26-AB03-AAEC27F50B04@blackmesatech.com>
On Jun 14, 2012, at 2:59 PM, Costello, Roger L. wrote:

> Hi Folks,
> 
> In section 2.2.3.2 of the Structures specification it says:
> 
> (1)  A particle is a term ... consisting of either an element declaration, a wildcard or a model group ...
> 
> (2)  Term is ... any of the three kinds of components that can appear in particles.

A slightly different choice of elision may make the distinction being
drawn clearer:

"A particle is a term in the grammar ..., together with occurrence constraints."

"Term is used to refer to any of the three kinds of components which can 
appear in particles."


> Huh?
> 
> A particle is a term which is a particle which is a term which is ...

Not quite. A particle is a term plus occurrence constraints.
A term is the part of the particle that is not the occurrence
constraints:  an element declaration, a wildcard, or a 
model group.

Consider a slightly different case.  

In regular expressions, the expressions a+ and a* differ
in one way (+ vs *), but are the same in one way (a vs a).
The expressions a+ and b+ also differ in one way and are
the same in one way.  If you are going to talk about the structure
of regular expressions, and make rules about their structure
and meaning, you may find you need terms to distinguish
the expression a+ from its two distinct component parts.
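To make the analogy concrete, here is a minimal Python sketch that splits a quantified expression into its base and its quantifier. The helper name split_quantified is hypothetical. It only handles the tiny case above: one literal character followed by an optional +, * or ?.

```python
import re

# Hypothetical helper, only for single-character atoms with an optional
# postfix quantifier (+, * or ?); real regex grammars are far richer.
_QUANTIFIED = re.compile(r"^([A-Za-z0-9])([+*?]?)$")

def split_quantified(expr: str) -> tuple:
    """Return (atom, quantifier); quantifier is '' when absent."""
    m = _QUANTIFIED.match(expr)
    if m is None:
        raise ValueError(f"not a single quantified atom: {expr!r}")
    return m.group(1), m.group(2)

a_plus, a_star, b_plus = (split_quantified(e) for e in ("a+", "a*", "b+"))
# a+ vs a*: same atom, different quantifier
print(a_plus[0] == a_star[0], a_plus[1] == a_star[1])  # True False
# a+ vs b+: different atom, same quantifier
print(a_plus[0] == b_plus[0], a_plus[1] == b_plus[1])  # False True
```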

Similarly for bits of content models.  If a content model says that
an element can contain one or more 'a' elements, it will sometimes
be helpful to be able to distinguish cleanly between the reference
to the 'a' element and the reference to 'one or more of those'.

In XSD, the larger expression (the analog of a+) is called a 
'particle', and the basic part of the particle, to which the 
occurrence constraints expressed by minOccurs and maxOccurs
apply, is called the 'term' of the particle.  The choice of names
for these concepts is inevitably a little arbitrary.
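To see where the two parts come from in schema-document syntax, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes a made-up schema fragment and a hypothetical helper, split_particle, which only handles element references. The minOccurs and maxOccurs attributes (both default to 1 in XSD) supply the particle's occurrence constraints. The ref attribute identifies the term they apply to.

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"

# Made-up fragment: one or more 'a' elements, then exactly one 'b'.
FRAGMENT = f"""
<xs:sequence xmlns:xs="{XS_NS}">
  <xs:element ref="a" minOccurs="1" maxOccurs="unbounded"/>
  <xs:element ref="b"/>
</xs:sequence>
"""

def split_particle(el: ET.Element) -> tuple:
    """Split one <xs:element ref=...> into (occurrence constraints, term).

    Simplified sketch: local declarations (name=...) are not handled,
    and 'unbounded' is represented as None.
    """
    min_occurs = int(el.get("minOccurs", "1"))
    raw_max = el.get("maxOccurs", "1")
    max_occurs = None if raw_max == "unbounded" else int(raw_max)
    term = el.get("ref")  # what the occurrence constraints apply to
    return (min_occurs, max_occurs), term

root = ET.fromstring(FRAGMENT.strip())
for child in root.findall(f"{{{XS_NS}}}element"):
    print(split_particle(child))
# ((1, None), 'a')
# ((1, 1), 'b')
```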


> ...
> 
> This is even more interesting:
> 
> (3)  A basic term is an Element Declaration or a Wildcard.

So, any term that is not a model group.

> 
> (4)  A basic particle is a Particle whose term is a basic term.
> 
> Huh?
> 
> Let's examine (4) shall we? A basic particle is a Particle (which according to (1) is a term) whose term (which according to (2) is a particle) is a basic term.
> 
> This is just gibberish as far as I can tell.

If you have followed what I said above, it should be clear that it's
not gibberish but a straightforward statement.  Particles consist
of terms plus occurrence constraints, and one classification of
particles (basic vs. non-basic) depends on the nature of the 
particle's term (also basic vs. non-basic).
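The term/particle split and the basic/non-basic classification can also be sketched as a small Python data model. This is a minimal sketch, not the spec's component model. The class and function names are hypothetical, each component keeps only the properties relevant here, and max_occurs=None stands in for 'unbounded'.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Much-simplified stand-ins for the three kinds of term.
@dataclass(frozen=True)
class ElementDeclaration:
    name: str

@dataclass(frozen=True)
class Wildcard:
    namespace: str = "##any"

@dataclass(frozen=True)
class ModelGroup:
    compositor: str   # "sequence", "choice" or "all"
    particles: tuple  # the particles this group contains

# A term is any of the three kinds of component that can appear in a particle.
Term = Union[ElementDeclaration, Wildcard, ModelGroup]

@dataclass(frozen=True)
class Particle:
    min_occurs: int            # occurrence constraints ...
    max_occurs: Optional[int]  # ... None means unbounded
    term: Term                 # ... applied to this term

def is_basic_term(t: Term) -> bool:
    # Basic term: an element declaration or a wildcard (not a model group).
    return isinstance(t, (ElementDeclaration, Wildcard))

def is_basic_particle(p: Particle) -> bool:
    # A particle's classification depends only on its term.
    return is_basic_term(p.term)

one_or_more_a = Particle(1, None, ElementDeclaration("a"))
group = Particle(1, 1, ModelGroup("sequence", (one_or_more_a,)))
print(is_basic_particle(one_or_more_a))  # True
print(is_basic_particle(group))          # False
```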

> 
> Why can't this stuff be written in a simple, concise manner?

In the case of this text?  Sheer human fallibility on the 
part of the editors.  Sorry about that.  

Send any WG chair an infallible superhuman editor
and they'll thank you for it.  But until you can secure a reliable
supply of them, WG chairs are stuck appointing editors who will
screw things up.  In this case, (at least) two editors screwed up:
one by drafting sentences that leave some smart readers
high and dry, and the other by failing to see the problem and
revise the sentences to give better guidance to the
reader.  Since I was one of those editors, I'll say yes, you're
right, this should be clearer.  If a careful reader is confused,
then the text should work harder to throw that reader a lifeline.

But I'll also confess (human fallibility again) that until you provided
concrete evidence that these sentences have confused a 
careful reader, I thought they were perfectly clear.  And it's
still not clear to me how to redraft them so the new sentences do
a better job than the existing text.

> Why introduce terminology that has no apparent benefit?

Naming things is an important step in making them easier to
talk about, think about, and understand; introducing terminology for
key concepts is one of the most important functions the text of
any spec performs.  Good terminological choices help
the WG responsible for maintaining a spec, as well as readers 
of the spec, to clarify their thoughts.  

I think that sometimes it may be hard to perceive the advantage of
even good terminological choices at first glance; in those cases, the
answer to your question is:  because it may have benefits that are
not apparent.

In other cases, the apparent lack of benefit is due to a failure of
understanding on the part of the reader.  There are a lot of
rules in XSD that apply to terms and there are different rules
that apply to the particles which enclose those terms.  Trying to
express those rules without the terms "particle" and "term" would 
not make the spec any clearer; there are plenty of examples in the
XSD spec where the introduction of suitable terminology would 
simplify the text dramatically.

> 
> Why are terms used before they are defined? For example, the term "particle" is used in 2.2.1.3 but isn't defined until 2.2.3.2 and model group is used in 2.2.1.3 but isn't defined until 2.2.3.1.

In the general case, I think the reason is that it's not always possible
to sequence definitions of complex sets of terms so that no 
definition or discussion appeals to any terms defined later in the
sequence.  Here, both of the terms you mention are first introduced
at the beginning of section 2.2 as components of XSD schemas,
so the reader of section 2.2.1.3 can plausibly be expected to know
that "particle" denotes a kind of schema component, even if the
reader may not yet know much about what kind of component a 
particle is.  And that level of understanding really ought to suffice
for understanding what 2.2.1.3 says about particles.

> 
> Why is this specification 380 pages long? 

That one's easy.

Because the 1.0 WG did not have the time it would have taken to 
make it shorter.  And the 1.1 WG was unable, for reasons of backward
compatibility, to perform the kind of conceptual simplification that
would be necessary to make the text shorter; instead, the editorial
revisions in 1.1 mostly took the form of simplified syntax, some
modest refactoring of the prose, and the addition of explanatory 
material which made the text even longer.

Shorter specs are nicer, when they are possible and when they do
the job.  Sometimes, however, the time comes when it seems necessary
to ship an imperfect spec rather than delay it further.  



-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com 
* http://cmsmcq.com/mib                 
* http://balisage.net
****************************************************************
Received on Friday, 15 June 2012 04:57:19 UTC