comments on 12 April 2011 Last Call draft of XML processor profiles from C. M. Sperberg-McQueen on 2011-04-15 (public-xml-processing-model-comments@w3.org from April 2011)

From: C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com>
Date: Fri, 15 Apr 2011 14:18:47 -0600
To: public-xml-processing-model-comments@w3.org
Cc: "C. M. Sperberg-McQueen" <cmsmcq@blackmesatech.com>
Message-Id: <C92D390F-2A78-41C0-A966-D8759057642F@blackmesatech.com>

I congratulate the XML Processing Model working group on 
the publication of the last call draft of the XML processor
profiles specification.  I attach a document with comments on
the draft; a markup-free version is appended to this mail
for the use of those averse to the use of the typographic
arts.

Thank you for your work on this document.

-CMSMcQ

-- 
****************************************************************
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com 
* http://cmsmcq.com/mib                 
* http://balisage.net
****************************************************************

...........................


Comments on 12 April 2011 draft of

XML processor profiles

C. M. Sperberg-McQueen, Black Mesa Technologies LLC

15 April 2011

   1. Choice of facets for characterizing processors
   2. Respect for the stand-alone declaration
   3. Validating processors
   4. Definitions of terms
   5. Are the profiles disjoint?
   6. Identification of xml:id attributes as IDs
   7. Processing of external declarations
   8. Providing information items
   9. Data models and information sets
  10. Rigidity
  11. Relation of profiles to current practice
  12. Implementability of the spec
  13. Conformance clause
  14. Documentation of implementation-defined features
  15. The information expressed in XML documents
  16. The information classes
  17. Recursive XInclude processing
  18. Minor editorial points, typos, etc.

This document comments on the Last Call draft of 12 April 2011 of the
XML processor profiles specification (hereinafter XPP). The comments
are the sole responsibility of the author, who is not here speaking
for any other persons or organizations.

I have tried to organize my comments into distinct observations that
can be dealt with separately, but I have been unable to make them
wholly discrete and orthogonal. For each topic I have tried to
indicate whether I believe it to be a substantive or an editorial
point, but the distinction is not always helpful and I have not tried
to apply it in all cases. Trivial points of style and spelling are
gathered together at the end.  


1. Choice of facets for characterizing processors

Any formulation of profiles for a specification chooses to clump some
things together and to separate others, by placing those things (here,
XML processors) either into the same class (processors satisfying a
given profile) or into different classes. By defining profiles in
terms of certain chosen criteria and not in terms of other criteria,
any such spec makes an implicit choice among the infinite number of
facets that could be used for characterizing the objects being
classified. It is almost always helpful if that choice is made
explicit rather than implicit. The current draft leaves the choice
unexplained, its rationale unarticulated.

The introduction should describe at least briefly the rationale for
the choice made and characterize both the dimensions along which the
profiles defined here distinguish among processors and also some of
the more obvious dimensions along which the profiles do not
distinguish processors. At the very least, the spec needs to
acknowledge explicitly that some properties have been left out of
account instead of being made the basis for defining different
profiles, for example by saying explicitly that properties P, Q, and R
are not taken into consideration in defining the profiles.

For example: in many kinds of practical work the most important
characteristic among processors for any given programming language is
probably the distinction between processors with event-based and those
with tree-based interfaces, as exemplified by the difference between
the SAX and DOM interfaces. A reader of XPP might plausibly expect,
therefore, that if XPP is intended for practical use, that distinction
will show up in the definition of profiles, in order to allow the
profiles to provide a useful characterization of processors. What
should such a reader infer from the failure of XPP even to mention
this distinction? That you did not think the distinction important?
That you thought it was not practically relevant? That it is too
difficult to specify cleanly? Or that it didn't occur to you? Right
now the text of the spec is compatible with each of these inferences;
I think that if you explain your choice you may be able to eliminate
at least the last one.
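
To make the distinction concrete, here is a minimal sketch in Python
(an illustration only; nothing in XPP prescribes it). It assumes the
standard-library modules xml.sax (event-based) and xml.dom.minidom
(tree-based); the handler class and the function names are made up
for the example. Both routes recover the same element names, but they
hand them to the application in quite different ways.

```python
import xml.sax
from xml.dom import minidom

DOC = b"<doc><p>one</p><p>two</p></doc>"


class NameCollector(xml.sax.ContentHandler):
    """Event-based: the processor pushes start-element events at us."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def names_via_events(data):
    """Collect element names from a stream of SAX events."""
    handler = NameCollector()
    xml.sax.parseString(data, handler)
    return handler.names


def names_via_tree(data):
    """Tree-based: the processor returns a whole tree, which we then walk."""
    dom = minidom.parseString(data)
    return [el.tagName for el in dom.getElementsByTagName("*")]
```

A profile that says nothing about this difference treats the two
interfaces as interchangeable, which may or may not be the intent.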

N.B. this issue is not about the choice of facets (concerning which
there are other comments elsewhere) but about the need to say clearly
that a choice has been made and to indicate why some facets were
chosen and others not. Improving the choice of facets, as I hope you
will, will not excuse you from the need to justify the choice, or at
least identify and explain it.

SUGGESTED FIX: explicitly acknowledge that XPP involves a choice among
possible ways of characterizing processors; identify the processor
properties used as the basis for the classification proposed and
identify at least some potential properties which are not used in the
classification. Explain the basis for the choice.


2. Respect for the stand-alone declaration

It would be helpful, I think, for the processor profiles to
distinguish more carefully the different behaviors possible with
regard to the stand-alone declaration in the input XML document.

  - All declarations are read and handled appropriately, so documents
    with standalone='no' are processed without information loss.

  - No external declarations are read if standalone='yes'; if
    standalone='no' then external declarations are read, so all
    documents are processed without information loss.

  - No external declarations are read; if standalone='yes', the
    document is processed without information loss, and if
    standalone='no', the processor signals an inability to process the
    document without the possibility of information loss.

  - No external declarations are read, so documents with
    standalone='yes' are processed without information loss, and
    information will typically be lost in the processing of documents
    with standalone='no'. (Since documents may have standalone='no'
    even if standalone='yes' would be permitted, there can be cases
    where no information is lost in practice.)

In particular, it would be helpful for users of XML and for writers of
specifications for XML-based processing to distinguish the last case
from the others, in order to exclude it.

SUGGESTED FIX: augment the basic profile to require either that
external declarations be read when necessary or that the processor
signal an inability to handle non-standalone documents
properly. Optionally also keep the profile now called basic, giving it
a new name (personally, I could go for “sub-optimal”, but some people
might think that that name was ungenerous).
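
The second alternative in the suggested fix (signal an inability
rather than silently lose information) might look roughly like the
following Python sketch. It is an illustration under stated
assumptions, not a definitive implementation: it uses the
standard-library xml.parsers.expat module, which by default does not
fetch the external subset; the exception class and function name are
hypothetical; and it looks only at the external subset named in the
DOCTYPE declaration, ignoring external parameter entities referenced
from the internal subset.

```python
import xml.parsers.expat


class CannotProcessSafely(Exception):
    """Hypothetical signal: external declarations exist but were not read."""


def check_without_external_declarations(data: bytes) -> bool:
    """Parse without reading external declarations; refuse to guess.

    Returns True when the document has no external subset or declares
    standalone='yes'. Raises CannotProcessSafely otherwise.
    """
    seen = {"standalone": -1, "external_subset": False}

    def on_xml_decl(version, encoding, standalone):
        # expat reports 1 for 'yes', 0 for 'no', and -1 when omitted.
        seen["standalone"] = standalone

    def on_doctype(name, system_id, public_id, has_internal_subset):
        seen["external_subset"] = system_id is not None

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)  # the external subset is not fetched here

    if seen["external_subset"] and seen["standalone"] != 1:
        raise CannotProcessSafely("external declarations were not read")
    return True
```

A processor exhibiting the fourth behavior in the list above would
simply omit the final check.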


3. Validating processors

Why do none of the defined profiles include validation? Are there no
validating XML parsers? Or is it the view of the authors that in
characterizing an XML processor it is unimportant whether the
processor performs validation or not?

This reader particularly objects to the use of the word recommended to
name a processor profile which does not involve validation of the
input. It might be better to avoid the value judgement intrinsic to
the word. But if you are going to make the value judgement, then I
think you should make the correct value judgement, which is that
validation is a more valuable service than non-validating
well-formedness checking.

SUGGESTED FIX: Define at least one profile for validating
processors. Either eliminate the name recommended or reserve it for
validating processors.


4. Definitions of terms

The specification has a section on terminology; I think this is
helpful. It could be made more helpful if the terminology section were
more systematic.

A section on terminology can help the reader of a spec by identifying
the key concepts of the specification and defining the terms used to
denote them. It can help the authors of the spec by forcing them to
attempt a coherent statement of what they mean by a given term; the
effort of providing a formal definition is often repaid by the
discovery of inconsistencies or infelicities in the authors'
understandings of key concepts. In the current draft, however, the
terminology section does not have much success with either of these
tasks. Some important concepts appealed to in the current draft are
not defined at all; others are hyperlinked to other specs but the
definitions of the terms are not repeated in the current document,
which makes the document unnecessarily hard to read.

Among the terms that ought to be defined, since they are crucial to
the intellectual work of the spec, are these.

  - XML processor (the definition given in the XML spec is quoted in
    section 1; it might usefully be given again in the terminology
    section)

  - rigid (used to characterize mappings from XML documents to data
    model)

  - profile (what is a profile? How is it distinguished from a thing
    which is not a profile? Is my cup of coffee an XML processor
    profile? Why not?)

  - processor profile (ditto)

  - data model (in addition to specifying what is meant by this term
    in XPP, the spec should probably also take a little more care to
    distinguish data models and data model instances)

  - faithful provision

  - expose (is faithful provision the act of exposing? or is it
    possible to expose information in a way that does not constitute
    faithful provision? faithless provision?)

  - construction (esp. of data models)

  - identification as IDs (esp. of xml:id attributes)

  - reading (esp. of external markup declarations)

  - processing (esp. of XML documents and of external markup
    declarations)

  - packaging (of information; is packaging the same as faithful
    provision and/or exposure?)

  - provide (of information items and properties; identical to or
    different from packaging? faithful provision? exposure?)

  - implementation-defined

It is not a coincidence that defining these terms well will require
clarity in the central concepts they denote. If the concepts are
already sufficiently clear to the authors of the spec, you owe it to
your readers to share that clarity with them. If they are not
currently clear enough, then you owe it to the potential users of your
spec to sharpen them; muddy concepts lead to poor designs. A careful,
explicit definition of profile will make it easier for readers to see
whether the profiles described later are well defined or not, and
easier to judge their utility. It might also help the authors of the
spec improve the specification of the profiles.

Among the terms that are appealed to but not defined locally are

  - well-formed

  - namespace well-formed

SUGGESTED FIX: Use the terminology section to provide explicit
definitions for your key terms. Use the exercise of defining the terms
to clarify your key concepts. Revise the rest of the spec to reflect
the increased clarity of the analysis.


5. Are the profiles disjoint?

Is it intended that every processor conform to at most one profile
from among those defined here? Or is the set of processors conforming
to the minimum profile intended to be a superset of those conforming
to the basic profile?

It would be helpful to say, one way or the other.

SUGGESTED FIX: Say, one way or the other.


6. Identification of xml:id attributes as IDs

In 2.1, 2.2, and 2.3, what does it mean to “[identify] all xml:id
attributes as IDs as required by [xml:id Version 1.0]”? How does one
distinguish between a processor which satisfies this requirement and
one which fails to satisfy it?

Is identification of xml:id attributes as IDs distinct from
“processing” them as IDs? As far as I can tell, it's much easier to
find rules in the xml:id spec that define processing than to find
rules that define identification of attributes as IDs.

SUGGESTED FIX: define identification as an ID as it is used
here. (Since [attribute type] is a class-A property, I think that
identifying an attribute as an ID simply means that attributes named
xml:id are given [attribute type] = ID. If that is what you mean, say
so. Do not assume that the reader of XPP has perfect recall of the
full text of the Infoset and xml:id specs.)
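
Under the reading proposed in the parenthesis above, a test that
distinguishes the two kinds of processor could be sketched as follows
in Python. This is an assumption-laden illustration: it uses the
standard-library xml.etree.ElementTree, which exposes no attribute
types of its own, so the function (a hypothetical name) simply
reports what a processor identifying xml:id attributes as IDs would
be expected to report. It does not attempt the value normalization
or uniqueness checking that the xml:id spec also discusses.

```python
import xml.etree.ElementTree as ET

# ElementTree spells the attribute xml:id in "{namespace name}local" form.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def reported_attribute_types(data):
    """List (element, attribute, type) triples for every attribute.

    The type is "ID" for xml:id attributes and None otherwise; None
    stands in for "no declaration was read for this attribute".
    """
    root = ET.fromstring(data)
    return [
        (el.tag, name, "ID" if name == XML_ID else None)
        for el in root.iter()
        for name in el.attrib
    ]
```

A processor that fails the requirement would report no type for
xml:id attributes in the absence of a declaration.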


7. Processing of external declarations

Sections 2.3 and 2.4 specify “reading and processing all external
markup declarations”; what kind of processing is involved? Is there a
difference between reading and processing? (The fact that both are
mentioned may suggest that there is; the spec's habit of using three
words where one would do may suggest that there is not.)

SUGGESTED FIX: Define reading and processing, as they apply to
external markup declarations. If they are synonymous, drop one of
them.


8. Providing information items

Section 3 describes class A as “Items and properties which must be
provided by all profiles.”

This reader finds the relative clause confusing. In ordinary English,
I would have thought that XML processors provide access to, or expose,
the information in the XML document. The words quoted suggest that it
is not processors but profiles that provide (access to) the
information, which in turn persuades me that you are either using one
or more of the terms item, property, provide, and profile in some
special sense distinct from ordinary-language usage, or that you are
using words too carelessly. If nothing else, this sentence seems to
underscore the desirability of defining terms explicitly.

SUGGESTED FIX: Decide what it means to provide information, and
whether the act is performed by processors or by profiles, or (by a
kind of type overloading) both. Define the relevant terms. Recast the
spec to use the terms as defined.


9. Data models and information sets

At several points, the words of the spec suggest that XPP believes
itself to be in the business of talking about how processors map from
sequences of characters matching the document production in the XML
spec to data models (or more probably instances of data models). But
nothing in the spec actually talks about anything that seems
recognizable as a data model (in my understanding of any of the many
ways the term is normally used); the most visible differences among the
profiles have to do with how much information they provide, and in
some cases with whether particular items in the input are
characterized in accordance with their declarations or not.

SUGGESTED FIX: Either define what you mean by construction of a data
model and recast the definitions of the profiles so that they do in
fact say how it is to be done, or stop talking about data models and
admit that XPP is about which subset of the defined infoset is
provided by processors.


10. Rigidity

Section 1 reads in part

    Such definitions [of XML applications] have suffered to some
    extent from an uncertainty inherent in using that kind of
    foundation, in that the mapping XML processors perform from XML
    documents to data model is not rigid.

This way of putting things seems to convey an implied rebuke to the
authors of the XML specification and to reflect an assumption that
they were trying to define a rigid mapping, but that they failed. The
rebuke may be warranted (although I think the failure to specify the
information a parser must provide is more clearly an error than the
decision not to specify a data model), but the historical assumption
is false. I think it would be historically more accurate to say that
the authors of the XML specification intended the XML specification to
be compatible with a wide variety of data models and processor
interfaces and that the XML specification, as a result, provides a
flexibility and generality that the authors of other specifications
have not always found helpful. In search of flexibility, the XML spec
leaves to other specifications some responsibilities which,
empirically, other specifications have not always bothered to
discharge.

This is primarily a rhetorical issue, but it seems important because
it touches on the purpose of this specification and the task it sets
itself to solve.

SUGGESTED FIX: If you think the XML spec screwed up, say so cleanly
and say how XPP proposes to mitigate the damage. If you think the XML
spec got it right but later users of XML have screwed up by not
meeting their responsibilities, then say so cleanly and say how XPP
proposes to mitigate the damage. If you don't think anyone screwed up,
strictly speaking, but the situation can still be improved, then find
a non-pejorative characterization of how the current situation arose
and explain in neutral terms how it can be improved.


11. Relation of profiles to current practice

When profiles are defined for a new specification, they involve
predictions about which kinds of variation in processor behavior are
likely to be interesting and useful to developers and users. In the
case of new specifications, there is no existing practice that could
be appealed to as a justification of the classification or profiles,
or to provide examples of software fitting one profile or another.

That is not the case here, and I think the specification should not
progress until an empirical survey of existing processor
characteristics is performed, as a simple way of field-testing the
profiles defined here for applicability in the real world and of
clarifying the intent of the profiles by providing examples, where
applicable, of existing interfaces or processors that satisfy the
profile.

In particular, I could have sworn (but I am too lazy to look it up
now) that I had used some parser interfaces which did not provide
access to namespace prefixes, and other interfaces which provided only
inconvenient access to namespace names. Is a set of profiles which
assumes that namespace name, local name, and prefix are always all
three provided a good match for a world in which some parser
implementors give their users a choice (prefix plus local name or
namespace name plus local name)?
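
One interface of this kind can be shown in a few lines of Python.
The sketch below is an illustration only; it relies on the
standard-library xml.etree.ElementTree, which reports element names
in the form "{namespace name}local name" and does not expose the
prefix on the parsed element. The function name and the example
namespace URI are made up.

```python
import xml.etree.ElementTree as ET

DOC = '<x:doc xmlns:x="http://example.org/ns"><x:p/></x:doc>'


def expanded_names(data):
    """Return (namespace name, local name) pairs in document order."""
    pairs = []
    for el in ET.fromstring(data).iter():
        if el.tag.startswith("{"):
            # Split "{uri}local" into its two parts.
            uri, local = el.tag[1:].split("}", 1)
        else:
            uri, local = None, el.tag
        pairs.append((uri, local))
    return pairs
```

Nothing in the values returned records that the prefix in the source
was "x", and the namespace declaration does not appear among the
element's attributes either.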

Note that actually classifying real parsers will require a crisp
definition of what it means to make a particular information item
available; that will be a good thing, although it is likely to involve
some work.

SUGGESTED FIX: Identify ten or twenty existing XML processors with
different behaviors (for purposes of this exercise, all conforming SAX
processors may well turn out to be alike; ditto for conforming DOM
parsers). Using the definitions given in XPP, identify which
profile(s) each parser matches, if any. If there are significant
numbers of parsers which match no profile, consider whether the
profiles need to be revised to provide a better connection with
existing practice. Use a non-normative document to provide examples of
processors matching the different profiles.


12. Implementability of the spec

The status section says

    [T]his specification is not implementable as such ....

This makes no sense.

The specification defines processor profiles for XML processors. On
the face of it, it seems entirely possible for XML processors to make
meaningful claims to having implemented one or more of the profiles
defined here — not only meaningful, but desirable as a way of
simplifying communication between software provider and software
user. If a vendor claims that an XML parser provides an interface
which matches the modest processor profile defined here, it would (or
so it seems) be quite possible to put that claim to the test and
decide rationally whether the claim is true or not. In what sense,
then, is this spec not implementable?

The status section says, further down, that XPP “is intended for use
by other specifications which themselves define one or more XML
languages”. I take that to mean that the idea is that just as XSD, for
example, now specifies that its input documents must be exposed in a
way that makes certain infoset properties available, specs might in
future say that their upstream processor must conform to this or that
processor profile. But if XPP denies that an XML processor can
implement XPP, it must follow that the processor cannot conform to
XPP, or to any profile defined by XPP. So how can your intended
clients use XPP to characterize the class of processors they require?

As things stand, they cannot; XPP provides no conformance rules for
XML processors, so it is not in fact very useful as a tool for
identifying classes of processors.

SUGGESTED FIX: Remove the claim that XPP is not implementable.


13. Conformance clause

The conformance clause suggests that other specifications can make use
of this one by using words such as “Conforming implementations must
construct input data models from XML documents as required by the
recommended XML processor profile.” There are at least two problems
with this formulation.

First, as noted elsewhere, XPP does not in fact define what a data
model is or how to construct one. So it's hard to see just how to
construe the phrase “construct input data models from XML documents as
required by the ... profile”.

Second, requirement is merely a name used in specification writing to
denote criteria which must be satisfied by an object making true
claims of conformance to a specification. (To quote the ISO/IEC
Directives for the structure and drafting of international standards,
a requirement is an “expression in the content of a document conveying
criteria to be fulfilled if compliance with the document is to be
claimed and from which no deviation is permitted”.) If a spec does not
define conformance for a given class of objects, it follows logically
that the spec does not (and logically cannot) define requirements in
this sense for objects of that class.

Perhaps you are using requirement in some other sense? But no, section
1.1 specifies that you use the word “as described in [RFC 2119]”. RFC
2119 unfortunately provides only an implicit definition for the term
requirement, which has the unfortunate additional property of being
circular:

    MUST     This word, or the terms "REQUIRED" or "SHALL", 
    mean that the definition is an absolute requirement of the
    specification

From the equation of required with must and shall, however, it seems
likely that the intent of RFC 2119 is similar to that of
the ISO/IEC Directives. That is, as far as I can tell, the usual usage
of the terms in W3C and IETF specifications. So I infer that the word
requirement really does mean here a property whose absence will
invalidate any claim to conformance.
    
Now, XPP explicitly denies that processors can conform to XPP. It
follows logically that neither XPP nor the profiles defined in XPP
define requirements for XML processors, and that any spec that
requires XML processors to conform to a given processor profile
defined in XPP is asking them to perform the impossible, to conform to
a specification which denies that they can possibly conform to it, and
also making a vacuous requirement, that they satisfy the requirements
of a specification which formally defines no requirements.

SUGGESTED FIX:

  - Replace the phrase “construct input data models ...” with some
    words which are given a meaningful definition by the spec. Or
    alternatively define what is meant by data model, construct, etc.

  - Define criteria for conformance of XML processors to the profiles
    defined.


14. Documentation of implementation-defined features

In section 3, the definitions of classes V and X say that processors
“should document whether they provide this information to applications
or not.”

The term implementation-defined is used by other specifications, both
within and outside the W3C, to characterize features or behaviors
which conforming processors are required to document as part of a
claim to conform to the specification. If the term is taken to have
that meaning here (XPP does not define the term, so it's hard to be
sure), then the statement that processors “should” document their
behavior is logically inconsistent: if the behavior is
implementation-defined, then the correct verb is must, not should.

If the intent is to specify that behavior is allowed to vary from
processor to processor and that processors are not required to
document their behavior as a condition of conformance, then the term
implementation-dependent is less flagrantly inconsistent with usage in
other W3C specifications. It, too, however, is incompatible with the
following sentence, since by default implementation-dependent is used
to characterize features and behaviors which should not be documented,
since users of the technology in question are to be discouraged from
relying on particular processor-dependent behaviors. In any case, I do
not believe there can be a good reason for a processor not to provide
the documentation in question, so I think should is out of place. It
should be a requirement, and the verb to use is must.

I note in passing that the use of should here is logically
incompatible with XPP's failure to define conformance criteria for XML
processors.

SUGGESTED FIX: Add a definition of implementation-defined compatible
with that used in XPath 2.0 and related specs, and delete the sentence
saying that processors “should” document the behavior in
question. (Optionally add a redundant statement saying that they
“must” document the behavior, or better a note observing that it is a
consequence of the behavior's being implementation-defined that the
implementation must define it.)


15. The information expressed in XML documents

Section 3 begins

    For the profile definitions above and the invariants below, we
    categorize the information expressed in XML documents into a
    number of (overlapping) classes.

This is incorrect. What is characterized in section 3 is not the
information expressed in an XML document, but the particular subset of
that information for which the Infoset spec defines names. The two are
the same neither in theory nor in practice.

SUGGESTED FIX: Replace the sentence quoted with one that's not
false. Perhaps “For the profile definitions above and the invariants
below, we categorize the information identified and named in [XML
Information Set] into a number of (overlapping) classes.”


16. The information classes

A reader might plausibly wonder about the principles which guided the
classification of information items and properties in section 3. At
least, this reader wonders. After reading the spec, I'm still
wondering. An explicit statement of the principles which guided the
classification should be provided.

Some description of why the letters A, B, P, V, and X were chosen as
the names for their classes would also help.

It would be preferable, I think, for the classes to be characterized
in terms of their content rather than solely in terms of which
processor profiles are required to expose them. As it is, the
statements that class A consists of “Items and properties which must
be provided by all profiles” and class B of “Items and properties
which must be provided by 2.3 The modest XML processor profile and 2.4
The recommended XML processor profile” look as if they are intended to
serve as definitions, but as definitions they are wholly unsuitable
and as normative statements they are wholly redundant with 2.1 through
2.4.

SUGGESTED FIX: Characterize classes A, B, etc. not in terms of which
profiles they are associated with but in terms of what information
they contain. Either explain the choice of letters, or label the
classes A through F. Optionally make the classes disjoint to reduce
confusion.


17. Recursive XInclude processing

It might be helpful to readers to remind them in a note that XInclude
requires recursive processing of include elements, so that the output
of a processor matching the ‘recommended’ profile will be guaranteed
never to contain xi:include elements.

SUGGESTED FIX: Say it explicitly. Do not assume your readers have
memorized the XInclude spec.
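
The effect of recursive processing can be sketched in Python as
follows. This is a hand-written illustration, not the XInclude
algorithm itself: it handles only whole-document XML inclusion via
href, ignores fallback, xpointer, and parse="text", does not handle
an included document whose root is itself an include element, and
reads from a hypothetical in-memory table of resources instead of
files or URLs. The function name and resource names are made up.

```python
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

# Hypothetical in-memory resources standing in for files or URLs.
RESOURCES = {
    "a.xml": '<a xmlns:xi="http://www.w3.org/2001/XInclude">'
             '<xi:include href="b.xml"/></a>',
    "b.xml": "<b>leaf</b>",
}


def expand(elem, resources, ancestors=()):
    """Replace include elements below elem, recursing into what is included."""
    for index, child in enumerate(list(elem)):
        if child.tag == XI_INCLUDE:
            href = child.get("href")
            if href in ancestors:
                raise ValueError("inclusion loop via " + href)
            included = ET.fromstring(resources[href])
            # Recurse first, so the included tree is itself include-free.
            expand(included, resources, ancestors + (href,))
            included.tail = child.tail
            elem[index] = included
        else:
            expand(child, resources, ancestors)
    return elem
```

Because each included tree is expanded before it is spliced in, no
include element reachable through the resources table survives in
the result.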


18. Minor editorial points, typos, etc.

Some typographic and editorial problems caught my eye.

  1 The term profile is not used in this specification to denote any
    thing other than the processor profiles defined here. The terms
    profile and processor profile denote the same thing. So in most of
    the twenty-five or so occurrences of the phrase “processor
    profile”, the first word supplies no information or meaning not
    already supplied by the second word. The spec would be shorter and
    easier to read if the first word were struck from, say, twenty or
    so of those occurrences.

  2 The status section says

        [T]his specification is not implementable as such ....

    What does “as such” mean here? Normally, one would take “such” to
    have an anaphoric reference, so the sentence would be equivalent
    to saying “this specification is not implementable as a
    specification”, but the meaning of this rephrasing is also opaque
    to me. How would that be different from not being implementable?
    Perhaps the anaphoric reference is to the concept of
    implementation, so the phrase ought to be expanded to “not
    implementable by means of an implementation in the strict
    sense”. This does not seem very promising, either.

    Perhaps the simplest repair of the stylistic problem would be to
    delete “as such” without replacement. (But the stylistic problem
    is not the only problem here; see 12, “Implementability of the
    spec”.)

  3 In section 1, horizontal ellipses are used with whitespace between
    the full stops but without whitespace before or after the ellipsis.

    For “a software module. . .used”, read “a software module
    ... used” or optionally “a software module … used” (the latter
    using the standard hellip entity for character U+2026).

  4 Section 1 reads in part

        Another kind of uncertainty stems from the growth of the XML
        family of specifications: if the input document includes uses
        of XInclude, for instance.

    This is not a well-formed English sentence. Perhaps a continuation
    of the sentence has been lost? Something like “, then the results
    provided by the XML processor may vary among processors, so that
    the application does not know what to expect”?

  5 All the manuals of style I know frown on subdividing sections of a
    document into fewer than two subsections. Section 1.1 on
    terminology should either be given a sibling, or folded into its
    parent section, or promoted to be a sibling of its current
    section.

    Since terminology is not really part of the background of the
    specification, the last possibility seems best.

  6 In 1.1, the paragraph about base URI says the term is used “as it
    is defined in [RFC 3986]”. But RFC 3986 does not provide any
    definition properly so called for the term base URI. It specifies
    rules for establishing and using a base URI, but it does not
    “define” it.

    I think what is meant is that XPP assumes that the base URI is
    established and used as specified in RFC 3986. So perhaps read

        A base URI is an absolute URI against which relative URIs are
        applied; this specification assumes that base URIs are
        established and used as specified in [RFC 3986].

    But you should probably also decide whether XPP assumes it or
    requires it.

  7 In the first paragraph of 2, the phrase “the steps necessary to
    construct a data model from a well-formed and namespace
    well-formed XML document” seems ill chosen. The descriptions that
    follow are not, in fact, procedural in nature, so steps doesn't
    seem right. Nor do they in fact redeem the promise of information
    on how to construct a data model (or even a data model instance).

    In principle, I'd like to propose better wording, but I can't
    because I don't know what you are trying to say here. I think you
    are mostly just trying to characterize the sections which follow
    by talking about what the profiles do or are. Unfortunately, I
    also don't understand precisely what you mean by the word
    “profile”. Judging by this phrase's flamboyant failure to connect
    with anything that actually happens in sections 2.1 through 2.4,
    you may be experiencing some trouble in that area, too.

  8 In 2.1, the mention of information set classes A, B′, P, and X
    comes out of nowhere; this reader felt completely blind-sided.

    It would be better if somewhere closer to the top of the document
    there were some words to say something like

        Profiles are defined in terms of a processor's behavior with
        regard to external markup declarations, its support or lack of
        support for xml:id and XInclude, and the information it
        provides to its downstream applications. For this purpose,
        section 3 of this specification partitions the information
        items and properties defined by [XML Information Set] into
        classes; each profile specifies which classes of information
        are exposed by processors in that class.

  9 In 2, the clauses about faithful provision of the information in
    the document all take the form “Faithful provision of the
    information ... corresponding to information items and properties
    ...”.

    Perhaps it would suffice to provide, or expose, the information
    items and properties specified.

    If it is absolutely necessary to provide not the information items
    and properties themselves but instead information corresponding to
    (but, implicitly, not identical to?) the specified items and
    properties, then I think the spec has an obligation to explain
    clearly what the difference is, and why exposing the items and
    properties does not satisfy the requirements of the spec. In
    particular, you need to provide an answer to the reader who is
    asking “How can a piece of information correspond to an
    information item without being indistinguishable from it (qua
    information) and thus without being that information item?”

    The editors might do well to review their dusty copies of Strunk
    and White's Elements of style, especially the maxim “Omit needless
    words”, and to revise accordingly. If they do, the individuals
    corresponding to their readers will feel an emotion corresponding
    to gratitude. (Or, at least, a diminished desire to seek out sharp
    objects and perform dangerous acts with them.)

 10 2.4 reads (rule 4):

        Replacement of all include elements in the XInclude namespace,
        and namespace, xml:base and xml:lang fixup of the result, as
        required for conformance to [XML Inclusions (XInclude) Version
        1.0 (Second Edition)];

    This sentence seems unnecessarily awkward; this reader, at least,
    found it hard to read and follow.

    Perhaps “and fixup of the namespace, xml:base, and xml:lang
    properties of the result ... ”?

 11 In section 3, the labels of the classes are reduplicated. For
    “Class AClass A” read “Class A”, and similarly for the other
    classes, and for the lists of information items in section 4.2.

 12 In 4.2.2 and 4.2.3, a number of items are described in terms like
    “Entirely, per the Element case above.” The spec would be clearer
    if full sentences were used; this reader is not certain whether
    the intended verb is “may differ” or “may be absent”, and I do not
    know what the subject of the sentence is intended to be.

    Also, my Oxford American dictionary defines “per” as meaning ‘in
    accordance with’, but what is meant here seems to be something
    more like ‘as described in’. The Collins COBUILD dictionary agrees
    with Oxford in saying that the use of “per” in this way normally
    involves things happening or being done “in the way that the plan,
    system, or set of instructions says it should be done”. But I
    don't think the description of the element case includes any
    instructions or provides any sense of what should or should not be
    done; I think “per” is out of place here.

 13 In 4.2.2 and 4.2.3, the list of differences between information
    sets (I think that's what is being listed) is made unnecessarily
    long and opaque by being organized around classes of information
    items instead of around cases of difference.

    In 4.2.3, there is a list of seven items, six of which turn out on
    inspection to carry as explanation the words “Entirely, for
    exactly the same reason”. It would be a lot easier for the reader
    to see what is going on if the list were replaced with a
    paragraph:

        Where processors conforming to the modest profile report an
        xinclude element, processors conforming to the recommended
        profile will report the result of XInclude processing, which
        will consist of zero or more element, processing instruction,
        unexpanded-entity, character, or comment information items. In
        consequence, the results reported by processors matching these
        two profiles may differ in the presence or absence of those
        information items, as well as in the presence or absence of
        attribute and namespace information items on the elements in
        question.

    Though more detailed and clearer than the current description,
    this takes less space than the current formulation.
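
    The difference can be seen in a small sketch. It assumes Python's
    xml.etree.ElementInclude as a stand-in for XInclude processing
    (that module does not perform the xml:base and xml:lang fixup
    XInclude requires); the documents and the in-memory loader are
    hypothetical and serve only to illustrate the point.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI = "http://www.w3.org/2001/XInclude"

# Hypothetical documents, invented for this illustration.
DOC = f'<doc xmlns:xi="{XI}"><xi:include href="chapter.xml"/></doc>'
CHAPTER = "<chapter><title>One</title></chapter>"

def loader(href, parse, encoding=None):
    """Hypothetical in-memory loader that knows only chapter.xml."""
    if href == "chapter.xml" and parse == "xml":
        return ET.fromstring(CHAPTER)
    return None

# Without XInclude processing: the include element itself is reported.
unexpanded = ET.fromstring(DOC)

# With XInclude processing: the include element is replaced by its result.
expanded = ET.fromstring(DOC)
ElementInclude.include(expanded, loader=loader)

unexpanded_tags = [child.tag for child in unexpanded]
expanded_tags = [child.tag for child in expanded]
print(unexpanded_tags, expanded_tags)
```

    The two trees differ in exactly the items the paragraph above
    names: an include element in one, and the included element with
    its own children in the other.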

 14 In 4.2.2 and 4.2.4 various occurrences of the word “element” are
    capitalized for no discernible reason. Samuel Johnson is dead; it
    is too late to bring back his capitalization habit.

Attachments

text/html attachment: xml-proc-profiles-comments.html

Received on Friday, 15 April 2011 20:19:31 UTC