Comments on XPointer (Last Call Working Draft)

Comments on:
XML Pointer Language (XPointer) Version 1.0
W3C Last Call Working Draft 8 January 2001

[Once again, I invoke the Hawaii time zone to get in under the deadline!]

In general, I think the spec would be ultimately clearer (especially to
those who would introduce a new scheme) if it were structured as two specs,
one that defined the general XPointer framework, and another that defined
two schemes ('xpointer' and 'xmlns') that fit into that framework.

At the very least, it would be less confusing if the spec used different
terms for the two things. (E.g., "XFragmentIdentifier" [XFragId] for the
general framework, and "XLocator" or [the existing] "XPtrExpr" for the
`xpointer'-specific expressions.)


Throughout, change
    "[IETF I-D XMT]"
to
    "[IETF RFC 3023]"
and delete the "Internet Draft" that occasionally precedes it.

(I use "->" as an abbreviation for "should be changed to".)

----------------------------------------------------------------------------
Status of this Document

-- 4th para

"This document is intended to be taken in conjunction with [x], in order for
that document to officially designate XPointer ..."
    Awkward construction. How about just:
    "It is intended that [x] officially designate XPointer ..."

Italicize "XML Media Types"?

"However, because of the timing problem associated with publishing two
related documents on separate tracks, currently that document ..."
    It's not so much the timing problems of publishing as simply the fact
    that the XML Media Types document came into existence before the
    XPointer document reached Recommendation stage.
    Change to "Currently, that document ..."
    Maybe join it to the next sentence with "but".

"the Internet Draft"
    Change to just "it" ?

----------------------------------------------------------------------------
1 Introduction

-- 1st para

"IETF RFC 2376"
    Obsoleted by "IETF RFC 3023".

Section 5.2 (first bullet) makes it clear that "an application [may use]
XPointers in another context than as a URI reference's fragment identifier",
but it would be helpful if you said this in the introduction too.

-- 4th para, 3rd bullet:
"in URI references as fragment identifiers"
    -> "as fragment identifiers in URI references"

-- Note
"the basis of"
    Vague? Change to "the language for"?

"XPointer is intended ... only for [text/xml, etc]"
    No, not *only* those resources, as the rest of the note goes on to say.
    Maybe "primarily" or "initially", but not "only".

"a recent Internet Draft [...] suggests the use of a naming convention ..."
    -> "[IETF RFC 3023] encourages the use of the suffix "+xml" ..."

----------------------------------------------------------------------------
1.1 Origin and Goals

pass

----------------------------------------------------------------------------
1.2 Notation and Document Conventions

-- last para

"[XPath] is the normative version"
    "version" -> "reference" ? "specification" ?

----------------------------------------------------------------------------
2 XPointer Terms and Concepts

Definition: range
    "end points" -> "points"
    (The rest of the spec is consistent in using "end point" only in
    contradistinction to "start point", and not as a catch-all for both.)

Definition: location
    "range" -> "ranges"

Definition: location-set
    "ordered" -> "unordered"
    (As Andrew Watt has pointed out.)

Definition: singleton
    "A point is always a singleton. A range is always a singleton as well,"
    No, this is a confusion of categories. A point or range is a *location*,
    not a *location-set*. Thus, it is meaningless to speak of a point or
    range being a singleton or not.

    Maybe this definition should be deleted, given that the only use of the
    term outside this definition is, in fact, a mis-use. (See 5.3.2.)

Definition: sub-resource
    "is an XML document" -> "may be an XML document"
    (Since it might instead be an external parsed entity.)

----------------------------------------------------------------------------
3 XPointer Processing and Conformance
3.1 Processing Dependencies
3.2 String Conformance

pass

----------------------------------------------------------------------------
3.3 Application Conformance

I think it would be useful, for the definition of application conformance
and classes of errors, to define an "XPointer-processor" abstraction. The
definition would go something like this:
    Given a resource (purportedly an XML document or external parsed entity)
    and a string (purportedly an XPointer), an XPointer-processor yields the
    subresource (location-set) identified by the XPointer or else an error.

Then, for instance, instead of phrasing such as
    "Minimal conformance entails ..."
(which avoids even a referent for the thing whose conformance is in
question), one could say
    "An XPointer-processor is minimally conforming if it ..."

-- Definition: Minimal conformance

"series of XPointer parts" -> "FullXPtrs"

"routing XPointer parts with particular schemes to an application that
can evaluate those parts"
    This wording seems fairly implementation-specific. From the point of
    view of someone judging the conformance of XPointer-handling software,
    whether that software routes parts to other applications is immaterial.

The placement of the closing bracket is arbitrary. Move to end of sentence.

So, as long as it handles bare names, an XPointer-processor that yields a
sub-resource error for every FullXPtr is minimally conforming, since it can
claim not to understand any schemes.  In which case, why not just say that
minimal conformance only entails interpretation of bare names?

-- last para

"for the convenience of XML-based Internet media types"
    Maybe insert "other" before "XML-based".

----------------------------------------------------------------------------
3.4 Classes of XPointer Errors

-- Definition: resource error

"If a syntactically correct XPointer ... is appended to a URI that
identifies no resource, ... the XPointer has a resource error."
    This is odd. If a URI identifies no resource, then there is no media
    type, and thus no fragment identifier language (FIL). So it's a moot
    point whether the fragment identifier is an XPointer, and "even mooter"
    whether the fragment identifier is erroneous. The XPointer-processor
    would presumably not even be invoked in such a case.

As for "a URI that identifies ... a resource that is not well-formed XML",
    this includes resources of most media types (e.g., image/jpeg,
    audio/mid, text/plain), and these do not use XPointer as their FIL.
    Thus, again, it is incorrect to think of the fragment identifier as an
    XPointer.

    In fact, even if the resource *is* well-formed XML, it doesn't
    necessarily have a media type that uses XPointer as its FIL. So it
    *still* might be incorrect to think of the fragment identifier as an
    XPointer.

Moreover, the definition suggests that whether an XPointer has a resource
    error is determined when it is appended to a URI, but this is presumably
    not what was intended.  Whether a URI identifies a resource, the content
    of that resource, and the media type of that resource, all vary over
    time. Thus, whether an XPointer has a resource error would depend on
    when the URI is resolved.

I think all of these problems would be solved by rephrasing the errors in
    terms of an XPointer-processor abstraction:

    If the string passed to the processor does not match the syntax 
    specified in this document, the processor yields a syntax error.

    Otherwise, if the resource passed to the processor is not a well-formed
    XML document entity or external parsed entity, the processor yields a
    resource error.

    Otherwise, the processor attempts to evaluate the XPointer with respect
    to the resource as described in section 4. If the evaluation fails as
    discussed in 4.3 Schemes, the processor yields a sub-resource error.

Note that this doesn't mention URIs at all, which is as it should be, since
    "resource error" is a meaningful concept whether the XPointer is the
    fragment identifier of a URI or not.
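
To make the ordering of the three checks concrete, here is a rough Python
sketch of such a processor. Every name in it (process, is_valid_syntax,
evaluate, the exception classes) is my own placeholder, not anything from the
draft. The grammar check and the section 4 evaluation are passed in as
stand-ins, and the well-formedness test only covers document entities, not
external parsed entities.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


class XPointerSyntaxError(Exception):
    """The string does not match the XPointer syntax."""


class ResourceError(Exception):
    """The resource is not well-formed XML."""


class SubResourceError(Exception):
    """Evaluation did not yield a non-empty location-set."""


def process(resource_text, pointer, is_valid_syntax, evaluate):
    """Hypothetical XPointer-processor: return a location-set or raise.

    is_valid_syntax(pointer) and evaluate(document, pointer) are
    caller-supplied stand-ins; neither is defined by this sketch.
    """
    # 1. Syntax is checked first, independently of the resource.
    if not is_valid_syntax(pointer):
        raise XPointerSyntaxError(pointer)
    # 2. Otherwise, the resource must parse as well-formed XML.
    try:
        document = minidom.parseString(resource_text)
    except ExpatError as exc:
        raise ResourceError(str(exc)) from exc
    # 3. Otherwise, evaluation must locate something.
    locations = evaluate(document, pointer)
    if not locations:
        raise SubResourceError(pointer)
    return locations
```

No URI appears anywhere in the signature, which is the point: the same three
outcomes apply whether or not the string came from a fragment identifier.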

----------------------------------------------------------------------------
4 XPointer Model and Language

-- 2nd para
'such as "footnote"; or select'
    Insert "one can" before "select".

----------------------------------------------------------------------------
4.1 Character Escaping

-- 1st para
"The XPointer language is designed to be used in the context of URI
references, ..."
    Once again (as in the introduction), it would be good to say that
    XPointers can be used in other contexts as well.
    "is designed to be" -> "may be" ?

"XPointers (in URI references) also often appear ..."
    Insert "possibly" before "in".

"Finally"
    Presumably you only mean "finally in this list of contexts", but it
    suggests "finally in the order of applying the escaping".
    Change to "Moreover"?

----------------------------------------------------------------------------
4.1.1 Escaping Contexts

You should point out that an XPointer can occur in many other contexts
(a string-literal in a Java program, a comment in a LaTeX document, etc.)
and these will generally have their own escaping requirements.  The ones
outlined in this section are just the most common.

This spec is normative only for context A. In all other cases, the
specification that governs the context is the normative reference for the
necessary escaping.

Also, an XPointer may appear in many contexts at once (e.g., an XPointer in
a URI reference in an XML document in a Java string-literal), but these are
always nested in a particular order, and it is the nesting order that
determines the order in which the escaping and unescaping is done.  (The
most deeply-nested context does the first escaping and the last unescaping.)
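
The nesting rule can be sketched in a few lines of Python. This is only an
illustration under my own assumptions: the three layers use standard-library
escapers as stand-ins (JSON string quoting in place of a Java string-literal),
the set of characters left unescaped in the URI layer is arbitrary, and
context A (circumflex escaping) is left out.

```python
import json
from urllib.parse import quote, unquote
from xml.sax.saxutils import escape, unescape

# Each layer is an (escape, unescape) pair, listed from the most deeply
# nested context outward.
LAYERS = [
    (lambda s: quote(s, safe="/()=&"), unquote),  # URI reference
    (escape, unescape),                           # XML character data
    (json.dumps, json.loads),                     # string literal (stand-in)
]


def escape_all(text):
    """Apply the escapings innermost context first."""
    for esc, _ in LAYERS:
        text = esc(text)
    return text


def unescape_all(text):
    """Undo the escapings in the reverse order of application."""
    for _, unesc in reversed(LAYERS):
        text = unesc(text)
    return text
```

The only thing that matters here is that unescape_all walks the same list
backwards; the particular escapers are interchangeable.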

"The following contexts"
    This suggests that each of the four subsections considers a separate
    context, but they do not. See my comments under D.

-- A

"When an XPointer is created that addresses into an XML resource"
    This is redundant. An XPointer *can* only address into an XML resource.
    Unless you mean to exclude XPointers that don't address into anything,
    because of resource errors. But that would be silly, because you'd still
    need the escaping. I suggest changing it to:
        "When an XPtrExpr [or XPath Expr] occurs in an XPtrPart"
    because that reflects the syntactic level at which this escaping becomes
    necessary.

"Thus, occurrences of the circumflex..."
    This sentence is mostly redundant given the previous sentences, and
    completely redundant given the referenced section.

-- B

"Escaping of reserved URI characters"
    This heading is not very appropriate, since the section only considers
    the escaping of one URI-reserved character, the percent sign.

[IURI] is an expired draft. Is it appropriate to refer to it?

-- C

"If an XPointer appears in a URI reference or IURI reference in an XML
document"
    Delete "in a URI reference or IURI reference": it is immaterial to the
    need for escaping.

    You may wish to change it to "in the character data of an XML document",
    since this escaping wouldn't be necessary if an XPointer appeared in,
    for instance, a comment in an XML document. (Although in that context,
    you *would* need to somehow escape any occurrences of "--".)

"must be escaped as character references"
    Or as entity references (e.g., &quot;)
    Or you could embed them in CDATA sections.

"This escaping is removed when the XML document is parsed."
    "removed" sounds like it's removed from the file containing the XML doc.
    Maybe change to "undone" or "reversed".

-- D

"an XPointer (perhaps originating in an XML document)"
    What does "originating" mean here? If the XPointer was "originally" in
    an XML document, where is it "now"? Do you just mean "(perhaps in an XML
    document)"?

I dislike the division between B and D. Section B considers two contexts,
    URI references and IURI references, but only gives the escaping that is
    common to the two.  To get the rest of the escaping for the URI
    reference context, the user must also consult section D. Moreover, the
    escaping described there supposedly occurs when an IURI reference is
    converted to a URI reference, which casts the process in terms of IURI
    references, which is unnecessary.  (If the user doesn't understand what
    an IURI is, or knows that the situation does not allow them, s/he
    shouldn't have to think in terms of IURIs.)

    Instead, I suggest you rework these two sections so that one considers
    just IURIs, and the other just URIs.

    Also, because they are related, I would make them adjacent sections.
    (Insert D between B and C.)

"Square brackets ..."
    Move to 4.1.2.

-- last para

"in the reverse order"
    The reverse of what? You haven't specified the "forward" order. (But
    I've given some wording above that does.)

"If the result [of undoing the encodings and escapings] does not conform to
the syntactic rules for XPointers in this specification, a syntax error
results."
    But if you undo escaping A, the result may have unescaped unbalanced
    parentheses, which are not permitted by the "Parenthesis escaping" VC.
    And if you think that "the syntactic rules" does not include VCs, note
    that the bare EBNF disallows unbalanced parentheses, escaped or not.
    So either way, a syntax error would supposedly result, which is not what
    you want.
    I suppose you could say "If the result, before undoing escaping A, ..."
    but that's pretty ugly. If you adopt the idea of an XPointer-processor,
    you could say something like (very roughly):
        The XPointer-processor handles undoing A. All other escaping is
        undone (in reverse order of application) outside the processor.
        If the result (passed to the processor) does not conform ...
        the processor yields a syntax error.


----------------------------------------------------------------------------
4.1.2 URI Reference Encoding and Escaping

Various occurrences of "byte(s)":
    Change to "octet(s)", I think.

----------------------------------------------------------------------------
4.1.3 Examples of Escaping

-- 1st para
"and spaces that is"
    Awkward. Put comma after "spaces"?
    Put "containing ... spaces" in parentheses?

"appear in an XML document"
    Insert "in a URI reference" after "appear".

-- Initial
"The desired XPointer ..."
    Maybe change "XPointer" to "XPtrExpr", and remove "xpointer(" and ")"
    from the example text. (See my suggested rewording under 4.1.1 A.)

-- A
    Delete "doc.xml#", since it's part of the (I)URI reference, which
    doesn't enter into it yet. (Note that the second example doesn't have
    "doc.xml#" at stage A.)

The example is a little muddled. The string at C shows the escaping for an
    IURI reference in an XML document, whereas the string at D shows the
    escaping for a URI reference, possibly in an XML document. If D is the
    final form in this example, the string at C is irrelevant. On the other
    hand, it could be that C is the final form and D is irrelevant. I think
    you should pick one.

Ditto for the second example.

----------------------------------------------------------------------------
4.2 Forms of XPointer

-- 1st para
"as if the full form of the XPointer were specified"
    "specified" -> "given"

-- 3rd para
"Any XPointer whose evaluation returns anything other than a non-empty
location-set must signal a sub-resource error"
    This makes sense for an XPtrPart whose scheme is "xpointer", but what
    if some other scheme adds 'thingies' to the data model, and has
    SchemeSpecificExprs that yield them?

    Anyway, this sentence doesn't belong here.
    Move it to the 4th para of 4.3?

-- production [4]
    In 4.3 Schemes, you allow for the possibility that a future XML-based
    media type might choose to adopt the XPointer scheme mechanism, but
    define its own scheme instead of the "xpointer" scheme.  Consider the
    specification for the fragment identifier language for that media type:
    in defining the syntax, would it have to say something like "ignore
    the first two alternatives for XPtrPart"?

    It might be more convenient for future media types if XPointer said just

        [4] XPtrPart ::= Scheme '(' SchemeSpecificExpr ')'

    and then defined `xpointer' and `xmlns' as particular schemes, in the
    same way that future schemes will presumably have to be defined.

-- Validity constraint: Parenthesis escaping

"XPointer part"
    -> "<code>XPtrPart</code>" ?

"even within literals"
    "even" -> "typically"
    "literals" -> "a literal"

"escaped with a circumflex (^) character preceding it"
    -> "escaped by preceding it with a circumflex (^) character"

"Any other use of a circumflex results in a syntax error."
    So escaping *balanced* parentheses is invalid?
    (Software that generates xpointers might find it easier to escape *all*
    parentheses rather than figure out which ones are unbalanced.)
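
To show why a generator would prefer the blanket approach, here is a rough
Python comparison. Both functions are my own sketches, not anything the draft
defines. I am assuming a literal circumflex is escaped by doubling it, and the
second function does a naive balance scan that ignores string literals.

```python
def escape_all(expr):
    """Escape every '^', '(' and ')' by preceding it with a circumflex."""
    return "".join("^" + ch if ch in "^()" else ch for ch in expr)


def escape_unbalanced(expr):
    """Escape '^' and only those parentheses that have no partner."""
    open_stack = []    # indexes of '(' still waiting for a ')'
    unmatched = set()  # indexes of parentheses with no partner
    for i, ch in enumerate(expr):
        if ch == "(":
            open_stack.append(i)
        elif ch == ")":
            if open_stack:
                open_stack.pop()
            else:
                unmatched.add(i)
    # Any '(' left on the stack never found its ')'.
    unmatched.update(open_stack)
    return "".join(
        "^" + ch if ch == "^" or i in unmatched else ch
        for i, ch in enumerate(expr)
    )
```

The first is a one-liner; the second needs a scan and bookkeeping, and even
then its idea of which parenthesis is the unbalanced one may not match the
author's. Under the current wording, only the second produces valid output.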

-- Validity constraint: Namespace Name

"XPtrNsURI", "Name", "S", "Char", "NCName", "Expr"
    Put in "<code>...</code>".

----------------------------------------------------------------------------
4.2.1 Full XPointers

-- 1st para
"one or more [Definition: ..."
    This is awkward. I suggest deleting the whole sentence, as it just
    repeats in prose what the EBNF already says more succinctly.

"(except for nodes representing CDATA sections and entities)"
    Delete. No such nodes exist in the XPath data model.

"and access to arbitrary non-node locations"
    If you mean "locations" in its XPointer sense, then this is
    tautologically true, but if it's read with a more casual sense, then
    "arbitrary" is misleading.
    Maybe delete the phrase, and change "nodes" in the previous clause to
    "nodes and non-node locations".

----------------------------------------------------------------------------
4.2.2 Bare Names

-- 1st para
"a location step using the id() function"
    Syntactically, "id()" isn't a location step (i.e., Step), it's a
    FilterExpr.

-- 2nd bullet
"that use a markup language similar to that of HTML"
    "markup language" -> "vocabulary" ?

    But is this phrase necessary at all? I mean, even for XML resources that
    *don't* use an HTML-like vocabulary, bare names *still* provide an analog
    of HTML fragment identifier behavior.

----------------------------------------------------------------------------
4.2.3 Child Sequences

-- 1st para
"The first integer in the sequence refers to"
    "refers to" -> "locates", to match the verb used earlier in the para.

    The sentence as a whole is a bit too abbreviated. I suggest:
    "If the resource [passed to the XPointer-processor] is a document, the
    first integer in the sequence will be 1, and locates the document
    element; if the resource is an external parsed entity, the first
    integer locates one of the top-level elements."
    Really, the sentence is unnecessary, since the semantics are defined by
    the "*[n]" equivalence, but it's helpful.
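
For what it's worth, the "*[n]" equivalence is mechanical enough to write
down. The following Python sketch is mine, not the draft's: the function name
is hypothetical, the leading name is not validated as an XML Name, and
treating a leading name as an id() call is my reading of how bare names and
child sequences combine.

```python
import re

# Optional leading name, then one or more "/n" steps with n >= 1.
CHILD_SEQ = re.compile(r"^([^/]*)((?:/[1-9][0-9]*)+)$")


def child_sequence_to_xpath(seq):
    """Rewrite a child sequence as a chain of *[n] steps."""
    match = CHILD_SEQ.match(seq)
    if match is None:
        raise ValueError("not a child sequence: %r" % seq)
    name, steps = match.groups()
    prefix = 'id("%s")' % name if name else ""
    numbers = steps.split("/")[1:]  # drop the empty string before the first "/"
    return prefix + "".join("/*[%s]" % n for n in numbers)
```

So "/1/2" is rewritten step by step, and the first "/*[1]" is exactly the
"document element" case described in the suggested sentence.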

----------------------------------------------------------------------------
4.3 Schemes

-- 1st para

"XPtrPart", "Scheme", "XPtrPart".
    Put in "<code>...</code>".

"reserves all others"
    It's not clear what this means.

    Is any other scheme a syntax error? A resource error? Or is the
    XPointer-processor required to treat it as an unknown scheme (and thus
    fail that part)? Or something else?

    What about when the XPointer is used in a context other than as the
    fragment identifier of a URI reference? Then there isn't necessarily a
    media type involved.

-- 3rd para
"XPtrParts"
    Put in "<code>...</code>".

-- 1st bullet
"An unknown scheme"
    -> "The part's scheme is unknown."

-- 2nd bullet
"A scheme that is not applicable ..."
    -> "The part's scheme is not applicable ..."

"the media type of the resource"
    What if there isn't a media type involved?

-- 3rd bullet
"A scheme that does not locate ..."
    -> "The part does not locate ..."

-- 4th bullet
"If the scheme being interpreted is xpointer:"
    -> "If the part's scheme is xpointer:"

    Minimal conformance does not require interpretation of the 'xpointer'
    scheme, so this bullet does not belong here.

    Moreover, I disagree that these conditions should be grounds for a part
    to fail. (But I'll discuss that later.)

    Lastly, what does it mean for a point to be "of type attribute or
    namespace"?

-- 4th para
"consume a failed XPointer part"
    "consume" -> "skip"?
    "XPointer part" -> "XPtrPart"

"the first XPointer part"
    "XPointer part" -> "XPtrPart"

    (This is about the point where you'd want to bring in the sentence from
    4.2: "Any XPointer whose evaluation returns anything other than ...")

"the result for the XPointer as a whole has a sub-resource error"
    Delete "the result for".

"If a syntax error is detected in any part or in the construction of the
XPointer as a whole, evaluation stops and additional parts are not consumed"
    If there were a syntax error in the construction of the XPointer as a
    whole, then according to section 3.4, the application should not have
    attempted to evaluate it in the first place. (But apparently it did,
    since it must now stop evaluation.)

    Presumably, the processor is not required to detect syntax errors in the
    SchemeSpecificExpr of any part whose Scheme it doesn't understand.

    Say there is a syntax error in the SchemeSpecificExpr of a part whose
    Scheme the processor *does* understand, but that part occurs after
    another part that would succeed if evaluated. E.g., something like:

        xpointer(/)4Dgrafix-xml("unterminated literal)

    Is the processor required to detect the later syntax error (and thus
    stop with an error) before evaluating the first part, or is it required
    to detect scheme-specific syntax errors only if and when it gets to the
    part that contains the error?  (In a "routing" implementation, you
    probably want the latter.)
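
Here is roughly what I mean by a "routing" implementation, as a Python sketch.
The function and its arguments are hypothetical: the parts are assumed to be
already split into (scheme, expression) pairs, and each handler is assumed to
return a possibly empty list of locations or to raise on a scheme-specific
syntax error.

```python
def evaluate_parts(parts, handlers):
    """Evaluate (scheme, expr) pairs left to right.

    Returns the result of the first part that locates something.
    """
    for scheme, expr in parts:
        handler = handlers.get(scheme)
        if handler is None:
            continue  # unknown scheme: this part fails, move on
        # A scheme-specific syntax error surfaces only here, i.e. only if
        # evaluation actually reaches this part.
        result = handler(expr)
        if result:
            return result
    raise LookupError("sub-resource error: every part failed")
```

With this structure, a malformed later part is never looked at once an earlier
part succeeds. Requiring up-front detection would force the router to ask
every handler to parse its expression before evaluating anything.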

----------------------------------------------------------------------------
5 XPointer Extensions to XPath

Really, these extensions are specific to the 'xpointer' scheme, not the
general XPointer framework. In this section (i.e., all the 5.* sections),
most occurrences of "XPointer" (other than to refer to the specification)
should probably be changed to "XPtrExpr".

-- 1st bullet
"locations, location types, and location-sets, which subsume nodes, points,
and ranges"
    Put "which subsume ..." in parentheses and insert after "locations".

-- 2nd bullet
    This is kind of redundant given the first bullet. Maybe swap the two?

-- 4th bullet
"string-range and range-to"
    Maybe reverse the order, as that's how they appear in 5.4.

-- 6th bullet
    This should probably go after the fourth bullet, since they're both
    about ranges.

    Also, you missed the range and range-inside functions.

----------------------------------------------------------------------------
5.1 XPointer Additions to XPath Terms and Concepts

pass

----------------------------------------------------------------------------
5.2 Evaluation Context Initialization

-- 1st para
"An XPointer is evaluated to yield an object of type location-set."
    Well, the evaluation could yield another XPath object (boolean, number,
    or string), but 4.2 says that's an error.

    As I say in 4.2, this is reasonable for the 'xpointer' scheme, but not
    in general. As I suggest in 5, maybe change "XPointer" to "XPtrExpr".

"context identical ... except for the generalization of nodes to locations"
    Actually, the context must also contain properties for the locations
    that the origin() and here() functions return.

-- 1st bullet
"an XML document", "the document", "applicable document"
    Append "or external parsed entity".

-- 5th bullet
"Only functions defined in XPath or XPointer can be used in XPointers."
    Future schemes will almost certainly define their own functions, so this
    is another obvious case where "XPointers" should be changed to
    "XPtrExprs".

    (The 4th bullet can be similarly criticized, though perhaps less
    plausibly.)

----------------------------------------------------------------------------
5.2.1 Namespace Initialization

-- 1st para
    "XPointer part" -> "XPtrPart"
    "XPointer parts" -> "XPtrParts"
    "NCName", "XPtrNsURI": Put in "<code>...</code>".

-- 3rd para
"The evaluation of the following XPointer appearing in a non-XML document
(or in an XML document with no declaration of the namespace prefix x) will
result in a sub-resource error"
    This suggests that if the XPointer appeared in an XML document that
    *did* have a declaration of the namespace prefix x, it *wouldn't* result
    in a sub-resource error. I think this is incorrect, since the first para
    doesn't mention anything about namespace declarations leaking from the
    containing document into the XPointer evaluation context.
    So delete "appearing ... prefix x)".

----------------------------------------------------------------------------
5.3 The point and range Location Types

-- Note
"DOM Level 2, which is based on UTF-16 units"
    It would be more accurate to say that DOMStrings encode UCS characters
    using UTF-16, and the DOM indexes into them using 16-bit units. Thus,
    one UCS character results in one or two 16-bit units. 

"XPath and XPointer are based on UCS characters"
    In effect, implementations must behave as if they encode UCS characters
    using UTF-32 (UCS-4).

"a sequence which in DOM counts as two characters might count in XPointer
as one character"
    This is a misuse of the term "character", but I'm not sure what the
    proper term is. Something like "string indexing unit"?
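
The difference is easy to demonstrate in Python, whose strings index by
character (code point) rather than by 16-bit unit. The helper names below are
mine; the sample string just needs one character outside the Basic
Multilingual Plane.

```python
def utf16_units(text):
    """Number of 16-bit units needed to encode text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def char_index_to_utf16_offset(text, index):
    """Convert a character index into the corresponding UTF-16 unit offset."""
    return utf16_units(text[:index])


# 'a', U+1D11E MUSICAL SYMBOL G CLEF (needs a surrogate pair), 'b'
sample = "a\U0001D11Eb"
characters = len(sample)      # 3 characters
units = utf16_units(sample)   # 4 sixteen-bit units
```

An index counted in characters therefore has to be converted, as in
char_index_to_utf16_offset, before it can be used as a DOM offset.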

----------------------------------------------------------------------------
5.3.1 Definition of Point Location

-- 1st para

"A location of type point"
    Is "point" not in bold because there's already a definition for it in
    section 2?  That would be a shame, because this is the real definition.

"preceding any individual character"
    Insert "or following" after "preceding".

"The self axis of a point is the point itself."
    "is" -> "contains" (XPath terminology)

"The parent axis of a point is a location set containing a single location"
    "is a location set containing" -> "contains"

    But these two sentences don't really belong here (the contents of its
    axes are not the defining properties of a point), and anyway, they're
    redundant given the 2nd and 3rd bullets. Delete them.

-- 6th para

"(such as text nodes, comments, and processing instructions)"
    Change "such as" to "i.e.," and insert "attribute nodes" and "namespace
    nodes". (Why not be complete?)

-- 1st bullet
    Are "preceding" and "following" empty too?

-- 2nd bullet
    Presumably the "descendant-or-self" axis also contains the point itself.

-- 4th bullet
    Does "ancestor-or-self" contain the point itself and the contents of the
    "ancestor" axis?

-- 5th bullet
"A node-point's siblings are the children of the container node that are
before or after the node-point."
    But the first bullet says that the *-sibling axes are empty.

----------------------------------------------------------------------------
5.3.2 Definition of Range Location

-- 1st para
"A location of type range"
    Again, if "range" isn't bold because of the definition in 2, I think
    this is a better definition.

"in the same document"
    If the target resource is an external parsed entity, there isn't a
    document.

-- 3rd para
"a range ... (encompassing both nodes but still a singleton)"
    The word "singleton" is mis-used here. A range is not a location-set,
    therefore it is meaningless to say it is a singleton or not.
    "singleton" -> "single location"

-- 5th para
"If the points"
    Insert "start and end" before "points".

    On second thought, this sentence is subsumed by the last sentence of the
    para, so replace the whole para with this:

        The string-value of a range consists of the characters that are in
        text nodes and that are between the start and end points of the
        range.

-- 6th para
"the parent axis of a range returns ..."
    "returns" -> "contains"

    You might want to add the weirder example that a range's self axis
    contains the range's start point, and not the range itself, thus
    falsifying the (reasonable) assumption that a location's self axis
    contains the location itself.

    I think you might be better off deciding that a range's self axis
    (and its *-or-self axes) contain the range itself, and all the other
    axes are empty. That way, you don't get snookered by the arbitrariness
    of picking the start point over the end point. And you don't lose
    any capability, because you can always use "start-point" explicitly.
    For instance, if "r" selects (a location-set containing) a range, then
        r/parent::x
    (under the current semantics) would be replaced by
        start-point(r)/parent::x
    (under my semantics).

-- 7th para (Note)
"with respect to the respective boundaries"
    Clank. Delete "respective".

----------------------------------------------------------------------------
5.3.3 Covering Ranges for All Location Types

-- 4th bullet
    Insert this after the first bullet, since it's simple and deals with an
    XPointer-specific location type, like the first bullet.

----------------------------------------------------------------------------
5.3.4 Tests for point and range Locations

-- 1st para
    "NodeType": Put in "<code>...</code>".

-- production
    "[11]" -> "[38]"

-- 2nd para
    "all three types"
        "three" -> "six" (or just delete it, or say "of many types")

----------------------------------------------------------------------------
5.3.5 Document Order

-- 2nd para
"there is an immediately preceding node"
    Put "immediately preceding node" in bold. It's a definition.

"(except that there is no point defined preceding or following the root)"
    This appears to be completely irrelevant. Delete.

-- 1st bullet
"the immediately preceding node is the node immediately preceding the point"
    This is circular.

    The rest of the bullet is verbose. Replace it with:
        "For a node-point with a non-zero index n, the immediately preceding
        node is the nth child of the node-point's container-node."

-- 2nd bullet
"the container node is also the immediately preceding node"
    For consistency of wording, change to:
    "the immediately preceding node is the node-point's container node"

"the last of those [attribute or namespace] nodes"
    Append:
        "in document order. (Note that this is implementation-dependent.)"

-- 3rd bullet
"For a character-point the"
    Insert a comma after "character-point".

-- 3rd para
"For any point, the immediately following node"
    Put "immediately following node" in bold.

    Actually, this term isn't used anywhere. Delete the para.
    (Which is just as well, because I don't think "immediately after" is
    defined anywhere.)

The rest of the section is muddled. Look at it this way: you have to define
an ordering for every combination of node, point, and range:
    {node, node}   - defined by XPath
    {node, point}  - 4th para, using terms "before" and "after"
    {node, range}  - 5th para, using the term "relative order"
    {point, point} - 6th para, using the term "document order"
    {point, range} - missed
    {range, range} - 7th para, using the term "before"
So you missed the combination of point and range, and you used lots of
different terms to do it. XPath mostly just defines the "before" relation.
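
A quick way to check that the list above is complete: a minimal Python
sketch that enumerates every unordered pairing of the three location kinds.
The COVERED table is only my reading of which paragraph handles which
pairing; the names are made up for illustration.

```python
from itertools import combinations_with_replacement

KINDS = ["node", "point", "range"]

# Where each pairing gets its ordering, per the list above.
COVERED = {
    ("node", "node"): "XPath",
    ("node", "point"): "4th para",
    ("node", "range"): "5th para",
    ("point", "point"): "6th para",
    ("range", "range"): "7th para",
}

# Every unordered pair, including a kind paired with itself.
all_pairs = list(combinations_with_replacement(KINDS, 2))
missing = [pair for pair in all_pairs if pair not in COVERED]

print(len(all_pairs))  # 6
print(missing)         # [('point', 'range')]
```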

-- 6th para
"Document order for a point"
    "a point" -> "two points"

"If the immediately preceding nodes of the two points are the same, then
either the points are the same or they are both character-points with the
same container node"
    This is not true. For instance, in "<e>foo</e>", consider:
    (1) the node-point whose container node is the <e> element node, and
        whose index is 1 (the point after the text node); and
    (2) any character-point whose container node is the text node.
    For both points, the immediately preceding node is the text node, but
    they are not the same point, nor are they character-points with the same
    container node.

-- 7th para
This relies on identifying "the one range" in the bullets with "one range
location" in the first line, and "the other range" in the bullets with
"another range location" in the first line. Which I suppose isn't
unreasonable, but wouldn't it be clearer if you called them A and B, say?

----------------------------------------------------------------------------
5.4 XPointer Functions

For consistency with XPath, in every function prototype, remove the space
before the closing parenthesis.

----------------------------------------------------------------------------
5.4.1 range-to Function

-- 1st para
"For each location in the context,"
    This is misleading; there *is* only one location in the context. Delete.
    (You could say "For the location in the context" or "Given the location
    in the context" or just "Given the context", but they're all implied by
    the definition of evaluation context -- no other XPointer/XPath function
    is defined with such a phrase.)

"The start of the range"
    "start" -> "start point"

"the start-point of the context location"
    Insert "(as determined by the start-point function)" after "start-point"

"the end of the range"
    "end" -> "end point"

"the end-point of the location"
    Insert "(as determined by the end-point function)" after "end-point".

"the location found by evaluating the expression argument with respect to
that context location"
    It's not the function's job to evaluate the expression(s) that appear in
    a call to the function. XPath semantics say that the arguments are
    evaluated before the function is called, and the results are passed to
    the function.
    Change to:
        "the location that is the only member of the location-set argument".
    You could add:
        "(Note that [in accordance with XPath semantics] the Argument in the
        FunctionCall will have been evaluated with respect to the same
        context as the FunctionCall itself.)"

There should be a sentence saying that it's some kind of error if the
location-set argument doesn't contain a single location (isn't a singleton).

-- 4th para
"for each of the nodes in the current node list"
    "the current node list" is an undefined term. If we allow that readers
    will understand what you mean, you should still change "nodes" to
    "locations" and "node list" to "location list".

-- 5th para
"start-point for the element", "end-point for the element"
    "for" -> "of"

----------------------------------------------------------------------------
5.4.2 string-range() Function

-- prototype
"location-set ,"
    Delete blank before comma.

-- 1st para
"For each location in the location-set argument, string-range returns a set
of string ranges"
    Italicize "location-set".

    And presumably, the function returns (a location-set that is) the union
    of all those sets. (Note that the sets are not necessarily disjoint.)

"a set of [Definition: string ranges ..."
    Awkward to start a definition in the middle of a sentence.

    Also, it's not a very good definition.
    You could say: "A string range is a range whose start point and end
    point are both character-points."

    On the other hand, why bother? Just say "a set of ranges".

-- 3rd para

"If the string argument is not found in the string-value of the location,
... the XPointer part in which the function appears fails."
    I'm pretty sure you don't want this. For instance, consider
        string-range(//title, "Thomas Pynchon")
    If *any* <title> element doesn't contain "Thomas Pynchon", the string
    argument will not be found in the string-value of that location, and so
    (according to the above rule) the part will fail.
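
    To make the consequence concrete, here is a minimal Python sketch of a
    literal reading of that rule. The function name and the sample
    string-values are hypothetical; only the "fail if any location lacks the
    string" logic comes from the quoted sentence.

```python
def part_fails(string_values, needle):
    """Literal reading of the rule: the XPointer part fails as soon as
    any one location's string-value does not contain the needle."""
    return any(needle not in value for value in string_values)

# Hypothetical string-values of the locations selected by //title.
titles = ["Thomas Pynchon: A Bibliography", "Some Other Title"]

# One <title> lacks the string, so the whole part fails, even though
# the other <title> contains a perfectly good match.
print(part_fails(titles, "Thomas Pynchon"))  # True
```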

----

"If the third or fourth argument indicates a position that is beyond the
beginning or end of the document, the XPointer part in which the function
appears fails."
    Why? This seems unnecessarily harsh to me. Consider this situation:

    Some on-line document has a sentence that I'd like to refer to.
    It begins "There comes a time" and goes on for 160 characters.
    So I check that that string doesn't occur elsewhere in the document,
    and use:
        xpointer(string-range(/,"There comes a time",1,160))
    Later, however, the document is modified. The sentence I refer to
    is not changed at all, but the author adds another sentence beginning
    the same way. Chances are, my xpointer will now select two ranges of
    the document, which is tolerable. But if the author happens to start
    the new sentence fewer than 160 characters from the (new) end of the
    document, then (according to the rule above) the whole XPtrPart fails,
    and the XPointer has a sub-resource error.

Instead of failing the whole XPtrPart, two gentler reactions would be:

(1) "Clamp" any outside-document positions to the start or end of the
document, as appropriate. (In my example situation, the xpointer would
select two ranges, regardless of where the new sentence was added.)

(2) Simply disregard any matches that result in outside-document positions.
(In my example, the xpointer would select either two ranges or one,
depending on where the new sentence was added.)
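
The difference between the current rule and the two gentler reactions can be
sketched in Python. This is only an illustration under simplifying
assumptions: the document is modelled as one flat string, positions are plain
character offsets, and the function and policy names are mine. The length is
scaled down from 160 to 20 to keep the example short.

```python
def string_ranges(text, needle, offset, length, policy):
    """Return (start, end) offsets for each match of needle in text.

    offset is 1-based, as in string-range(). policy is "fail" (the
    draft's rule), "clamp" (reaction 1) or "skip" (reaction 2).
    Returns None when the whole part fails.
    """
    ranges = []
    pos = text.find(needle)
    while pos != -1:
        start = pos + offset - 1
        end = start + length
        if start < 0 or end > len(text):
            if policy == "fail":
                return None
            if policy == "clamp":
                start = min(max(start, 0), len(text))
                end = min(max(end, 0), len(text))
                ranges.append((start, end))
            # "skip": drop this match and carry on
        else:
            ranges.append((start, end))
        pos = text.find(needle, pos + 1)
    return ranges

# The second sentence starts fewer than 20 characters from the end.
doc = "There comes a time. " + "There comes again."

print(string_ranges(doc, "There comes", 1, 20, "fail"))   # None
print(string_ranges(doc, "There comes", 1, 20, "clamp"))  # [(0, 20), (20, 38)]
print(string_ranges(doc, "There comes", 1, 20, "skip"))   # [(0, 20)]
```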

----

What happens if the third or fourth argument indicates a position that is
within the document, but outside the string-value of the location?  For
example, with this as the document:

    <doc>Thomas <em>Pyn</em>chon</doc>

and this as the xpointer:

    string-range(/doc/em, "P", 1, 7)

Does it select "Pynchon", "Pyn", or nothing?
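
The three candidate answers, spelled out as a minimal Python sketch. The
variable names are mine, and the flat-string model of the document is an
assumption made purely for illustration; the draft does not say which
reading is intended.

```python
doc_text = "Thomas Pynchon"  # string-value of <doc>
em_text = "Pyn"              # string-value of <em>
em_start = doc_text.index(em_text)  # where <em>'s text sits in the document

match = em_text.index("P")   # position of the match inside the location
offset, length = 1, 7

# Reading 1: keep counting through the document's text.
start = em_start + match + offset - 1
through_document = doc_text[start:start + length]

# Reading 2: clamp the range to the location's own string-value.
local = match + offset - 1
clamped = em_text[local:local + length]

# Reading 3: an overrun means there is no match at all.
nothing = clamped if len(clamped) == length else ""

print(repr(through_document), repr(clamped), repr(nothing))
# 'Pynchon' 'Pyn' ''
```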

----

And you'll have to generalize "document" to "document or external parsed
entity, as appropriate".

-- 4th para
"The points of the range-locations"
    "points" -> "start and end points"
    "range-locations": delete hyphen.

-- last para
"locate ranges in elements"
    "elements" -> "the string-values of elements"

----------------------------------------------------------------------------
5.4.3 Additional Range-Related Functions
5.4.3.1 range Function

pass

----------------------------------------------------------------------------
5.4.3.2 range-inside Function

-- 1st para
"If x is a range location or a point, the x is added to the result..."
    But if x is a point, then you'd be adding a point to the result, and you
    just said that the function returns ranges.  Instead, you presumably
    want to add the collapsed range at that point.

"If x is not a range location"
    -> "If x is a node"

"and otherwise is"
    Insert "it" before "is".

----------------------------------------------------------------------------
5.4.3.3 start-point Function

-- 1st para
"of type point"
    Put "point" in "<code>...</code>"?

"to the result location-set"
    "result" -> "resulting"

-- first 3 bullets
"the start point is"
    Change to "the resulting point is", for consistency with end-point()?

-- 4th bullet
"If x is of type attribute or namespace, the XPointer part in which the
function appears fails."
    I'm mystified: why is it so wrong to ask for the start-point (or
    end-point) of an attribute or namespace location? Why can't these
    functions treat such locations just like text, comment, and
    processing instruction locations? That's what range-inside does.
    In fact, if someone really wanted to write
        start-point(@foo)
    they could get around start-point's dislike of attribute locations just
    by writing
        start-point(range-inside(@foo))
    If the latter expression isn't erroneous, why is the former?

    In fact, it seems to me that:
        start-point(range-inside(L)) = start-point(L) for all locations L
    would be a useful identity. (Ditto end-point.) Not that you'd have to
    say so explicitly; but you could, for instance, define range-inside(L)
    as the range from start-point(L) to end-point(L), roughly speaking.
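
    A toy Python model of that suggestion, to show the identity falling out
    for free. Everything here is hypothetical (the Node/Point/Range types,
    the "size" field standing in for child count or string length); it is
    not the draft's data model, just a sketch of defining range-inside in
    terms of start-point and end-point, with attribute locations treated
    like any other node.

```python
from collections import namedtuple

Node = namedtuple("Node", "kind size")  # size: child count or string length
Point = namedtuple("Point", "container index")
Range = namedtuple("Range", "start end")

def start_point(loc):
    if isinstance(loc, Point):
        return loc
    if isinstance(loc, Range):
        return loc.start
    return Point(loc, 0)          # any node kind, attributes included

def end_point(loc):
    if isinstance(loc, Point):
        return loc
    if isinstance(loc, Range):
        return loc.end
    return Point(loc, loc.size)   # any node kind, attributes included

def range_inside(loc):
    # A point yields the collapsed range at that point.
    return Range(start_point(loc), end_point(loc))

attr = Node("attribute", 3)
elem = Node("element", 2)
locations = [attr, elem, Point(elem, 1), Range(Point(elem, 0), Point(elem, 1))]

for L in locations:
    assert start_point(range_inside(L)) == start_point(L)
    assert end_point(range_inside(L)) == end_point(L)
```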

----------------------------------------------------------------------------
5.4.3.4 end-point Function

Ditto the comments for start-point.

----------------------------------------------------------------------------
5.4.4 here Function

-- 1st para
"possibilties"
    -> "possibilities"

You need to say that an invocation of the here() function is only meaningful
if the containing XPointer appears in an XML document (or external parsed
entity?), because the rest of the section seems to just assume that it does
(except for the very last sentence in the section).

-- 4th para
"If the resource in which the XPointer appears is not XML"
    This phrasing assumes that the XPointer appears in a resource, which is
    not necessarily the case. Better to say:
    "If the XPointer is not located in an XML resource"

----------------------------------------------------------------------------
5.4.5 origin Function

-- 1st para
"traversal of the link"
    There isn't really a referent for "the link".
    I suggest you swap the 3rd and 4th sentences, and change "from an XML
    document" to "from a link in an XML document".

-- 2nd para
"a containing resource"
    Delete "containing".

----------------------------------------------------------------------------
5.5 Root Node Children

The change here isn't just to the children that the root node can have --
it's that an external parsed entity is given a data model at all.

----------------------------------------------------------------------------
A References
A.1 Normative References

-- IETF RFC 2376
    Change to 3023.

----------------------------------------------------------------------------
A.2 Non-Normative References

-- IETF I-D XMT
    Change to RFC 3023, move to Normative.

----------------------------------------------------------------------------
B Working Group Members (Non-Normative)

pass

----------------------------------------------------------------------------

-Michael Dyck

Received on Tuesday, 30 January 2001 04:58:32 UTC