XPointer: many comments

Comments on
"XML Pointer Language (XPointer): W3C Working Draft 9 July 1999"

Submitted to www-xml-linking-comments@w3.org by Michael Dyck
(jmdyck@netcom.ca) on July 31, 1999.

Most of my comments are piddly typographical/editorial details, but some
are more serious, tending to deal with inconsistencies within the spec
and between it and the XPath spec (especially in the sections on `range'
and `string'). Rather than attempt to sort my comments by severity, I'll
just give them all in document order.

I use "A -> B" as a shorthand for "Change A to B".

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

The "previous version" points to "TR/WD-xml-link-970731".
    Shouldn't that be "TR/1998/WD-xptr-19980303"?

------------
1.1 Language Design Goals

title:
    De-capitalize "Design Goals". (Most of the section titles capitalize
    only the first word.)

------------
1.2 Relationship to Other Documents

title:
    De-capitalize "Other Documents".

4th para:
    Put "XML 1.0 specification" in italics?

5th para:
    Put "Namespaces in XML" in italics.

7th para:
    Insert "The" before "XHTML".

9th para:
    Insert "The" before "draft".

------------
2. XPointer Usage

title:
    De-capitalize "Usage".

------------
3. The XPointer model and language

1st bullet:
    "The semantic .. is" -> "The semantics ... are"?

grammar:
    Change "XFragment" to "XFragmentIdentifier"? (It seems to me that
    the XML Fragment WG should be the ones to define a symbol named
    "XFragment".)

    "when scheme is xptr":
        Capitalize "scheme".  Put "xptr" in quotes?

------------
3.1 Character sets and escaping

last para:
    "SchemSpecificExpr" -> "SchemeSpecificExpr"  (insert "e")

    Insert comma after "i.e.".

------------
3.2 Schemes

3rd para:
    "GenLocationPath" -> "GenlLocationPath" (insert "l")

Editor's Note:
    Talking about "multiple LocationSpecs in an XPointer" is not
    consistent with the grammar given in Section 3.  For that, you
    should change the grammar to:

        XFragment    ::= BareName | Tumbler | ( LocationSpec )+
        LocationSpec ::= Scheme '(' SchemeSpecificExpr ')'
        etc

    (You could then go back to the 2nd para and change "scheme, along
    with its arguments," to "LocationSpec" if you liked.)
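
    As an illustration of the revised grammar, here is a minimal Python
    sketch that splits a fragment identifier into its LocationSpecs.
    It assumes that a Scheme is a simple name and that parentheses in a
    SchemeSpecificExpr are balanced and never occur inside quoted
    literals; neither assumption comes from the draft, and the scheme
    name "other" is made up for the example.

```python
import re

# Assumed shape of a Scheme name; the draft's real production may differ.
SCHEME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*")


def split_location_specs(fragment):
    """Split '( Scheme "(" SchemeSpecificExpr ")" )+' into (scheme, expr) pairs."""
    specs = []
    i = 0
    while i < len(fragment):
        m = SCHEME.match(fragment, i)
        if not m or m.end() >= len(fragment) or fragment[m.end()] != "(":
            raise ValueError(f"expected Scheme '(' at offset {i}")
        start = m.end() + 1          # first character of the expression
        depth, j = 1, start
        while j < len(fragment) and depth:
            if fragment[j] == "(":
                depth += 1
            elif fragment[j] == ")":
                depth -= 1
            j += 1
        if depth:
            raise ValueError("unbalanced parentheses")
        specs.append((m.group(), fragment[start:j - 1]))
        i = j                        # continue after the closing ')'
    if not specs:
        raise ValueError("at least one LocationSpec is required")
    return specs


specs = split_location_specs('xptr(id("a23")/child::par)other(foo)')
print(specs)
```

    A BareName or Tumbler would be rejected by this function, so a real
    parser would try those alternatives first.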

    "attribtues" -> "attributes"

    "An XPointer like #...":
        The "#" sign is not part of the XPointer.

    "would work": Optimistic. Maybe "could work".

    "the second <par> in the <body>":
        The convention in the rest of the spec is to put "par" and
        "body" within a <CODE> element.  The same goes for the two
        fragment identifiers and the URI also in the Editor's Note.

------------
4. Summary of XPath

2nd para:
    Surely the definition of an XPointer belongs in a normative section.

------------
4.1 XPath basics

1st box:
    For consistency with the XPath grammar:
        "axis-name" -> "AxisName"
        "node-test" -> "NodeTest"
    XPath also includes the square brackets within "Predicate" -- I
    don't know if you want to follow the XPath grammar that closely.

Editor's Note:
    "descendents" -> "descendant"

3rd para:
    "axis-name" -> "AxisName"

    "The context node is initially the document root":
        In the XPath spec, I believe this is only stated for an absolute
        location path, and left undefined for a relative location path.

    "A context in XPointer is the same thing as a context in XPath":
        Well, that isn't really true.  As section 5 points out, here()
        and origin() "in effect add information to the context of
        evaluation".

    "as described below..": delete a period.

    "The curly braces": What curly braces?

5th para:
    "the first following siblings":
        "siblings" -> "sibling"
    The second clause of this sentence might be clearer as: 'and then
    for each of those nodes, find the first following sibling of type
    "list".'

6th para:
    "portions (for ...)." -> "portions. (For ....)"

7th para:
    "SEC", "MYNOTE", "SEC": put within <CODE>.

------------
4.3 XPath Relative Axes

title:
    XPath does not use the term "relative axis".  Delete "relative".

    De-capitalize "Relative Axes".

3rd para:
    "relative axes" -> "axes" (3 times)

child & descendant:
    "pi" -> "processing instruction"

descendant-or-self:
    "constraint" -> "constraints"
    "summay" -> "summary"

parent:
    "nodes" -> "node"

ancestor-or-self:
    "of the SEC itself" -> "or the SEC itself"
    ";" -> "," (twice)

    "and then taking the last one":
        Shouldn't you take the first one? That is, the innermost
        definition takes precedence.  (If the outermost one takes
        precedence, what's the point of having inner definitions?)

    box: I don't think that this path does what you want.  Note that the
        predicate belongs to the last step. That is, the last step is
            attribute::lang[position()=last()]
        which selects the last attribute named "lang" from the context
        node, which is presumably not what you want. Instead, I think
        you need to take everything before the predicate and put it
        within parentheses.  (Note that this makes the whole expression
        a FilterExpr rather than a LocationPath.)
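
        The difference between the two groupings can be modelled in a
        few lines of Python. This is a toy model, not an XPath
        implementation: the Elem class and helper names are invented,
        and the node list is kept in nearest-first order, which is the
        ordering the "innermost first" remark above presumes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Elem:
    name: str
    attrs: dict = field(default_factory=dict)
    parent: Optional["Elem"] = None


def ancestor_or_self(node):
    """Yield the node, then each ancestor, nearest first (assumed order)."""
    while node is not None:
        yield node
        node = node.parent


def lang_attrs(elem):
    """Model of attribute::lang from one context node: zero or one result."""
    return [(elem.name, elem.attrs["lang"])] if "lang" in elem.attrs else []


def keep_last(items):
    """Model of the predicate [position()=last()] applied to one list."""
    return items[-1:]


def predicate_on_last_step(node):
    # ancestor-or-self::*/attribute::lang[position()=last()]
    # The predicate filters each context node's own attribute list.
    result = []
    for ctx in ancestor_or_self(node):
        result.extend(keep_last(lang_attrs(ctx)))
    return result


def predicate_on_whole_path(node):
    # (ancestor-or-self::*/attribute::lang)[position()=last()]
    # The predicate filters the combined list once.
    collected = []
    for ctx in ancestor_or_self(node):
        collected.extend(lang_attrs(ctx))
    return keep_last(collected)


doc = Elem("DOC", {"lang": "en"})
sec = Elem("SEC", {"lang": "fr"}, parent=doc)
par = Elem("P", {"lang": "de"}, parent=sec)

print(predicate_on_last_step(par))   # every lang attribute survives
print(predicate_on_whole_path(par))  # exactly one survives
```

        Under this ordering the single survivor is the outermost
        definition, which is why taking the first item rather than the
        last looks like the intended behaviour.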

self:
    "nodelist" -> "node list"

    "This is useful for applying multiple predicates to a single axis":
        But that's what every location step does: apply multiple
        predicates to a single axis.

    "particular" -> "particularly"

    "particular when predicates other than the first one must test a
    context node's position among all those context nodes that were
    selected by the prior predicates":
        But again, that's what always happens: a predicate tests a node
        with respect to the list of nodes that were selected by the
        previous predicates in the step.

    An example would help convey the intent.
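
    For reference, the behaviour that every step already has can be
    sketched as follows. This is a toy Python model with invented names
    (nodes are plain tuples, predicates are functions of node, position
    and list size); it only shows that each predicate sees positions
    renumbered by the predicates before it.

```python
def apply_step_predicates(candidates, predicates):
    """Filter a step's candidate list through its predicates, left to right."""
    nodes = list(candidates)
    for pred in predicates:
        size = len(nodes)
        # Positions are 1-based and relative to the list left by the
        # previous predicate, not to the original candidates.
        nodes = [n for pos, n in enumerate(nodes, start=1) if pred(n, pos, size)]
    return nodes


def is_item(node, pos, size):
    return node[0] == "item"


def is_second(node, pos, size):
    return pos == 2


children = [("item", "a"), ("note", "b"), ("item", "c"), ("item", "d")]

# Keep the items first, then take the second of those.
print(apply_step_predicates(children, [is_item, is_second]))

# Take the second child first, then require it to be an item.
print(apply_step_predicates(children, [is_second, is_item]))
```

    The first call yields the second "item"; the second call yields
    nothing, because the second child is a "note". No detour through
    the self axis is needed to get either result.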

The `namespace' axis is missing.

------------
4.4 XPath node-tests

title:
    "node-tests" -> "node tests" or "NodeTests"

1st box:
    'if the element type is "x:para"':
        This is not quite correct. I think it actually locates child
        elements whose type is of the form "y:para" where the NCName "y"
        expands (in the element's namespace context) to the same URI as
        "x" does (in the expression evaluation context).  Still, for a
        non-normative summary, perhaps it's good enough.
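
        The prefix-versus-URI distinction is easy to see with Python's
        standard xml.etree.ElementTree module, which compares expanded
        names. The namespace URI and the sample document below are made
        up for the illustration.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://example.org/ns"  # hypothetical namespace URI

# The document binds the prefix "y" to the namespace.
doc = ET.fromstring(
    f'<doc xmlns:y="{NS_URI}"><y:para>one</y:para><para>two</para></doc>'
)

# The expression context binds a different prefix, "x", to the same URI.
matches = doc.findall("x:para", namespaces={"x": NS_URI})

for m in matches:
    print(m.tag, m.text)
```

        The element written as "y:para" is found by the test "x:para",
        and the unprefixed "para" is not, because only the URI and the
        local part take part in the comparison.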

3rd para:
    The node test "node()" is not mentioned.

4th para:
    "Nodetest" -> "NodeTest"

    The principal type of the "namespace" axis is namespace.

------------
4.5 XPath Predicates

title:
    De-capitalize "Predicates".

1st para:
    "relative axes": Delete "relative". (see above)

    "node-test" -> "node test"

    "variable binding" -> "variable reference"

    "and defines XPointer's additions": No, it doesn't.

------------
4.5.1 Introduction to Predicates

title:
    De-capitalize "Predicates".

1st para:
    "node-test" -> "node test"

    "special-case" -> "special case"

point 1:
    "candidate nodes, such as ... all substrings of the content":
        As is pointed out elsewhere, these are not nodes.

    "the axes are applied" -> "the axis is applied"

    "and the results are unioned together":
        No, the union does not happen here. What gets unioned are the
        results of applying the whole step (points 1,2,3) to each node
        in the step's context node list.

point 2:
    "node-test" -> "node test" (twice)

2nd para:
    "node-test" -> "node test"

    How can no node test be specified?  Even an AbbreviatedBasis must
    have a NodeTest.

3rd para:
    "node-test" -> "node test"

    'is a name or "*".':
        This omits
            NodeType '(' ')' | 'processing-instruction' '(' Literal ')'

    Shouldn't this whole para be in section 4.4?

1st bullet:
    XPath does not define functions named "text" or "processing-
    instruction" (although it does define NodeTests that use those
    identifiers).

2nd bullet, Comparisons:
    Add "!=".

7th bullet, Numeric...:
    "and quo (quotient)": delete.

------------
4.5.2 Positional tests

1st, 5th, 7th paras:
    "predicate function" -> "boolean function"?
        (XPath doesn't use the term "predicate function", but it does
        use the term "boolean function".)

4th para:
    "locates the first <item> element":
        Don't need angle brackets.

------------
4.5.3 Local structure tests

1st para:
    "attribute-values" -> "attribute names"?
        (It's possible to test for attribute value, but that hasn't been
        "described above".)

2nd para:
    "document contexts": add "or contents"?  (An embedded location path
        can look outside or inside the context node.)

------------
4.6 Examples of axis usage

child::
    "of whose" -> "whose"

    "and parentheses": Delete.
        (I wouldn't have guessed that "axis identifier" includes the
        double-colon, but that's how it's used elsewhere. Note that
        XPath's AxisName does not include the double-colon.)

descendant::
    "the descendents axis" -> "the <CODE>descendant</CODE> axis"
        (Note two spelling changes in addition to the font change.)        

    "This is expected" -> "This axis is expected"

    "The XPathLocationPath":
        Insert space after "XPath", or delete "XPath".

ancestor::
    "typically used to obtain the parent node"
        Wouldn't one use the "parent" axis for that?        

------------
5. Xpointer extensions to XPath

title:
    "Xpointer" -> "XPointer"

1st para:
    Append period.

1st bullet:
    Append period.

3rd bullet:
    Put "here()" and "origin()" within <CODE>?

4th bullet:
    "predicate function" -> "boolean function"?

    Put "unique()" within <CODE>?

------------
5.1.1 Initialization of the context node

1st para:
    "document elements": Delete "s"

    Append period.

------------
5.1.5 Initialization of the namespace declarations

box:
    "Xpointer" -> "XPointer" (4 times).

------------
5.2 XPointer axes

1st para:
    "relative axes": Delete "relative".

------------
5.2.1 The range axis

1st para:
    "location source": This term is no longer defined.  You'll need to
        replace the sentence with something like:
        "For each node X in the result of the first argument, the second
        argument is evaluated in a context whose context node is X, and
        whose context node list is a singleton list consisting of X."
        Also, you should really say that the evaluation context of the
        first argument is that of the Range.

    What happens if the second argument selects (perhaps among other
    things) a location that is before the location selected by the first
    argument?  Is it an error, or does the Range simply not locate
    anything (for that particular result of the second argument)?

2nd para:
    "range::": Put within <CODE>.

    "It selects ...": Delete. (It's redundant.)

1st box:
    It would be helpful to point out that the `LocationPath' symbol
    comes from the XPath grammar.

    Not all of the example XPointers given so far match the
    GenlLocationPath production.  For instance

        id("MYNOTE")/ancestor::SEC

    is not a GenlLocationPath, because it's not a LocationPath, because

        id("MYNOTE")

    is not a Basis. Instead, it's a PrimaryExpr and a FilterExpr, and

        id("MYNOTE")/ancestor::SEC

    is a PathExpr.  You should probably change "LocationPath" to
    "PathExpr" (which includes LocationPath) in the GenlLocationPath
    production.

    And similarly in the Range production.

    You might want to change the name of the symbol "Range" to
    "RangeIdentifier" or "RangeLocator" or something like that,
    otherwise it becomes unclear, when speaking of "a range", whether
    one means the expression or its result. (And I think it should mean
    the latter.)

2nd box:
    This expression does not conform to the grammar, because 'range' is
    not the outermost construct.  Presumably you mean

        range::id("a23")/child[1],following-sibling[2]

    which conforms if you make the change from `LocationPath' to
    `PathExpr' suggested above.

------------
5.2.2 The string axis

    (If XPath *does* take on the functionality of this section, please 
    convey the following comments to the XPath WG.)

1st para:
    "point positions":
        This term is not defined or used elsewhere in the spec.

    "location source" -> "value [in the XPath sense] of the context node"
        Either do that throughout this section, or say something like:
            "In this section, we will use the term `location source' to
            mean `the value of the context node'."

2nd box:
    Why are only *these* productions numbered?

    "StringTerm":
        Something should say how `StringTerm' fits into the XPath
        grammar.  It appears to be another alternative for `Step', so
        perhaps it should be called `StringStep' instead.

    "SkipLit" + "Digit":
        It would be helpful to point out that these symbols come from
        the XML specification. Maybe you could give (non-normative)
        reminders of their definitions.

    "predicate" -> "Predicate" (if you're referring to the XPath symbol)

    "Digit":
        Note that the XML definition for `Digit' is a lot broader than
        just [0-9]. Thus, it is incongruous that the first digit in a
        Position or Length must be [1-9].

2nd para:
    "Matches ... are non-overlapping"
        Is this a requirement on matches to the SkipLit, or is it a 
        requirement on the results of the whole term?

        For instance, if the location source is "banana", then
            string::"an"
        locates two substrings:
            b[an]ana   and   ban[an]a
        They're non-overlapping, so there's no problem.  But consider
            string::"an",1,3
        This would seem to locate two substrings:
            b[ana]na   and   ban[ana]
        However, these overlap, so is the second one disallowed?

        Conversely, consider
            string::"ana",1,2
        The SkipLit-match yields the substrings
            b[ana]na   and   ban[ana]
        (which overlap) but the Length reduces the yield to
            b[an]ana   and   ban[an]a
        which *don't* overlap, so do both count?

        To me, "matches" suggests "matches against the SkipLit": when
        you add the modifications with respect to Position and Length,
        it doesn't feel like "matching" any more.  On the other hand,
        the blurb for "predicate" suggests that the whole SkipLit,
        Position,Length construct must yield non-overlapping strings.
        So it's unclear.
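
        The "banana" cases above can be checked mechanically. The
        following Python sketch is only a model of one reading of the
        draft: it assumes that SkipLit matches may start at any
        character, and that Position and Length are 1-based and counted
        from the start of each match. The function names are invented.

```python
def find_all(text, lit):
    """Return the start offset of every occurrence of lit, overlaps included."""
    starts = []
    i = text.find(lit)
    while i != -1:
        starts.append(i)
        i = text.find(lit, i + 1)
    return starts


def locate(text, lit, position=1, length=None):
    """Return (begin, end) offsets located by the model of string::lit,position,length."""
    if length is None:
        length = len(lit)            # default: the SkipLit match itself
    spans = []
    for start in find_all(text, lit):
        begin = start + position - 1
        spans.append((begin, begin + length))
    return spans


def overlaps(spans):
    """True if any span begins before the previous one has ended."""
    return any(prev_end > next_begin
               for (_, prev_end), (next_begin, _) in zip(spans, spans[1:]))


TEXT = "banana"
for lit, pos, length in [("an", 1, 2), ("an", 1, 3), ("ana", 1, 2)]:
    raw = locate(TEXT, lit)                 # the SkipLit matches alone
    final = locate(TEXT, lit, pos, length)  # after Position and Length
    print(lit, pos, length,
          "matches overlap:", overlaps(raw),
          "results overlap:", overlaps(final))
```

        For "an",1,3 the matches are disjoint but the results overlap;
        for "ana",1,2 it is the other way round. Which of the two lists
        the non-overlap rule applies to therefore changes the answer in
        both cases.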

SkipLit:
    "null SkipLit string":
        I don't think XML defines this sense of "null".  You could maybe
        say "a SkipLit with no characters between the delimiting quotes".

    'contains the character string "Thomas"':
        For the example to work, it is not enough that the element
        contains (somewhere) "Thomas" -- it must *begin* with "Thomas".

    box:
        This example does not conform to the given grammar, because the
        Position (3) precedes the SkipLit ("").  (In fact, this is true
        of all the examples in this section!) This one should presumably
        be
            id("x37")/string::"",3

Position:
    "specified string" -> "candidate string" (for consistency)

    "reserved position value": Delete "reserved"?
        (It's unclear what it adds to the sentence.)

Length:
    "A length of zero":
        How can this happen, given that Length must begin with a
        non-zero digit?

predicate:
    This should have a capital "P" and be within a <CODE> element.

    "Selects among the resulting list":
        XPath says: "A predicate filters a list of nodes to produce a
        new list of nodes." The PredicateExpr is evaluated with respect
        to a context, including a context node and a context node list.
        Therefore, it's not immediately clear how a Predicate (in
        general) can be used to filter a list of substrings.  Does
        *anything* other than a position test have an obvious
        re-interpretation?

    "non-overlapping": delete? See discussion above.

    "occurrences of the specified string" -> "located strings"
        ("the specified string" is unclear. One easy interpretation
        would be "the SkipLit", but this would be wrong in general. The
        modifications of Position and Length yield strings that aren't
        really "occurrences" of anything.)

3rd para ("When the context node"):
    This para should be moved earlier in this section, because the
    content that the StringTerm operates on is an important part of its
    semantics.

    "PI" -> "processing instruction" (twice)

5th box:
    This example doesn't conform to the grammar. You could change it to
        string::"Thomas Pynchon",8,0,[position()=3]
    but only if you change the grammar to allow a Length of zero.

6th box:
    Doesn't conform to the grammar. Change to:
        string::'!',1,2,[position()=5]

6th para ("For purposes..."):
    Move this para earlier as well.

    "in the element(s) in" -> "in"
        (Otherwise, you exclude the character data in the context node
        itself.)

    "current location source" -> "context node"

7th box:
    Doesn't conform. Change to:
        string::'affect',1,6,[position()=1]
    or some prefix of that, depending on what you want.    

9th box:
    Doesn't conform. Change to:
        string::"hello   world" ...        

nth para ("No case-folding ..."):
    Presumably, there should be a space in "ThomasPynchon", otherwise
    there *would* be a match.

Editor's Note:
    "add" -> "adding"

------------
5.3.1 / (root)

    This section is covered by XPath, so it should be deleted or moved
    to Section 4, Summary of XPath.

1st para:
    "which is over it" -> "which contains it"

    "RelPath" -> "RelativeLocationPath"

------------
5.3.2 id

    This too is covered by XPath.

1st para:
    "containing resource" -> "document containing the context node"

4th para:
    "along" -> "alone"

------------
5.3.3 here

1st para:
    Maybe here() should locate the text or attribute node that contains
    the XPointer, rather than the element that contains/bears that node.

    "reusable relative links when the links reside directly at one of
    their endpoints":
        This isn't really meaningful to someone who hasn't read the
        XLink spec.

------------
5.3.4 origin

1st para:
    The second occurrence of "origin()" should be within <CODE>.

------------
5.4 XPointer predicate functions

title:
    "predicate" -> "boolean"

1st para:
    "predicate" -> "boolean"

    "Unique()": Use lower-case "u". Put within <CODE>.

    "current context node list": Delete "current".

3rd para:
    "unique()": Put within <CODE>.

    "an XPath expression that counts the number of items in the context
    node list and compares it to 1":
        That is, "count()=1".  (So "unique()", as a shorthand, is only 1
        character shorter!)

------------
5.5 Locations That Are Not Simply Nodes

title:
    De-capitalize "That Are Not Simply Nodes".

1st para:
    "let the user select":
        Select what? We need a referent for "it" later in the sentence.    

    "it; perhaps" -> "it (perhaps"    

3rd para:
    "construct. axes":
        It appears that something is missing here.

------------
5.5.1 String axis semantics

    Maybe this discussion should be moved to section 5.2.2, "The string
    axis".

2nd para:
    "at the c below": Put "c" in quotes.

3rd para:
    "Even if characters are considered individual nodes, the EMPH node
    itself remains an issue: it is only *partly* included in the located
    data portion."
        Why does that make it an issue?  The (non-spanning!) XPointer
            child::CHAPTER/child::SECTION
        locates a set of nodes that is only *partly* included in any
        particular CHAPTER element. So what? Do the CHAPTER nodes
        "remain an issue"?

------------
5.5.2 Range axis semantics

    Maybe this discussion should be moved to section 5.2.1, "The range
    axis".

1st para:
    "chapter; and the" -> "chapter. The"

2nd para:
    "Like string, range": Put "string" and "range" within <CODE>.

    "Also like string": Ditto.

------------
5.5.3 Multiple spanning locations

    This also belongs in section 5.2.1, The range axis.

2nd para:
    "[The Range is] simply evaluated for each member of the context node
    list independently"
        But a Range must be the outermost construct, and as such its
        context node list is defined to be a singleton list.

3rd para:
    "span" -> "range"

    "element to" -> "elements to"

4th para:
    "Multiple locations ..."
        This sentence would be clearer as:
        "It is prohibited for the second argument of the range axis to
        produce multiple locations from a single result of the first
        argument, on grounds of simplicity."

    If an XPointer violates this prohibition, what kind of error does it
    have?

------------
5.6 Link Persistence

title:
    De-capitalize "Persistence".

    Is this section normative? It seems like it should be informative.

3rd para:
    "where relative axes are used" -> "in node tests"

------------
6. Conformance

2nd para:
    "syntactic requirements":
        append "for an XFragment"?        

------------
A. Glossary

1st para:
    "This appendix is normative."
        All of the terms in this glossary are (or should be) defined
        by other specs. Shouldn't *those* be normative and *these* be
        informative?

axis:
    "reserved name":
        It's not clear what this means.

    "Other axes define a sequence that does not depend on context, such
    as the element with a particular ID, or the abolute root of the
    document."
        Delete. (This is talking about `id()' and `/', which are not
        axes.)

Context node list:
    De-capitalize "Context"

    "A context node list can be the final result of evaluating an
    XPointer":
        XPath doesn't support this usage of the term.        

element tree:
    The only use of this term in the spec is in the subsequent
    definition of "root", which could live without it.  Moreover, it is
    incongruous to call it an "element tree" when it includes many
    things that are not elements.

linking element:
    This term is not used in the spec.

locator:
    "primarily": Delete?
        (Is this spec secondarily concerned with anything?)

predicate:
    "given a sub-resource location" -> "given an evaluation context"

    "in documents" -> "in the context"

root:
    "an abstract node" -> "a node"
        (All nodes are abstract.)

    "The root() function (see below) gives direct access to this node.":
        Delete.

------------
B. References

Other specs put the name of the reference in italics. That would be
helpful.

XLINK:
    The editors have changed.

IETF RFC 1808:
    "/rfc 1808.txt": Delete space.

TEI:
    Add "(See http://www.uic.edu/orgs/tei/p3.)".

DOM:
    The `range' construct is defined in DOM Level 2. Also, the
    convention seems to be to put the author/org first. So change to:
        World Wide Web Consortium.
        Document Object Model (DOM) Level 2 Specification.
        W3C Working Draft, 1999.
        (See http://www.w3.org/TR/WD-DOM-Level-2.)

CHUM:
    "V&agrave;(c)ronis" -> "V&eacute;ronis" (I think)

No reference to the XML spec?

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Received on Saturday, 31 July 1999 16:11:56 UTC