Comments on the XPointer Framework CR from Michael Dyck on 2002-12-12 (www-xml-linking-comments@w3.org from October to December 2002)

From: Michael Dyck <jmdyck@ibiblio.org>
Date: Wed, 11 Dec 2002 22:18:46 -0800
To: www-xml-linking-comments@w3.org
Message-id: <3DF82A46.3698D596@ibiblio.org>
XPointer Framework
W3C Proposed Recommendation 13 November 2002

------------------------------------------------------------------------
Status of this Document

para 1
"XML Mime types"
    Change "Mime" to "MIME".

------------------------------------------------------------------------
1.2 Terminology

[Definition: application]
"The occurrence and usage of XPointers, and the behavior to be applied
to resources and subresources obtained by processing those XPointers"
    Change "XPointers" to "pointers" twice.

    Delete "resources and"? Everything else indicates that you obtain
    subresources, not resources, from pointers.

[Definition: namespace binding context]
"A binding of ... prefixes to ... names."
    Change "binding" to "mapping". A "binding" refers to a single
    (prefix,name) pair, not the whole collection. (At least, that's how
    you use it later.)

"XML-namespace-defined [XML-Names] namespace prefixes"
    This is a fairly awkward phrase. How about changing the sentence to:
        "A mapping of namespace prefixes to namespace names, as defined
        in [XML-Names]."

------------------------------------------------------------------------
2 Conformance

para 3
"the information items and properties tabulated below may be relevant."
    Perhaps change "may be relevant." to:
        "are relevant to the evaluation of shorthand pointers.
        (Individual XPointer schemes will have different infoset
        requirements.)"

------------------------------------------------------------------------
3.1 Syntax

XPointer Framework Syntax
    You've gone from using an initial lowercase letter for all symbols
    (in the July 10th draft), to using an initial capital for all
    symbols. If you want to follow the XML spec's convention, the
    following symbols should not have an initial capital:
        pointer
        schemeBased
        pointerPart
        schemeData
        escapedData

------------------------------------------------------------------------
3.2 Shorthand Pointer

para 1
    "has a matching NCName as an identifier"
        Perhaps just "has a matching identifier".

"identifier" / "identified by"
    You use both the noun "identifier" and the (passive) verb "is
    identified by", which reduces clarity, I think. Moreover, that same
    verb is used elsewhere in the spec with a different meaning, which
    might lead to confusion.  I think you should try to eliminate this
    use of the verb.  The 4 points could be rephrased:
        If an element information item has [thing1], then it has
        the value of [thing2] as an identifier.
    or you could introduce a "variable" to refer to the element:
        The identifiers of an element information item (E)
        are determined as follows:
          - If E has [thing1], then the value of [thing2]
            is an identifier of E.
          - ...

point 4
    In any event, point 4 should be made parallel with the other points:
        If an element information item has an externally-determined ID,
        then it is identified by the value of that ID.
        [... it has the value of that ID as an identifier.]
        [... the value of that ID is an identifier of E.]

para after point 4
"is identified by"
    Change to "has an identifier that matches".

Note 1
"might be identified by multiple values"
    Change to "might have multiple identifiers".

Note 2
"and the creator of a pointer can, instead of a shorthand pointer, use a
scheme-based pointer or provide one or more schemes that address the
desired element in other ways."
    Change "and the creator" to "or the creator".
    Change "or provide" to "and provide".

Note 3
"the value which identified an element information item is unique within
the document"
    Change to "the identifiers of an element information item are unique
    within the document", or simply "multiple element information items
    within a document have the same identifier".

"not affected ... because ..."
    I think this note would be clearer if you said something like:
        Within a document, multiple element information items can have
        the same identifier (XML and XMLSchema allow it), but the
        semantics of a shorthand pointer are unaffected by this, because
        it always picks the first in document order.
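
    A rough Python sketch of the "first in document order" behaviour
    described above. It is only an illustration: the function name
    resolve_shorthand is hypothetical, and it assumes that an attribute
    literally named "id" supplies the identifier, whereas a real
    processor would rely on DTD- or schema-determined ID-ness.

```python
import xml.etree.ElementTree as ET

def resolve_shorthand(root, name, id_attr="id"):
    """Return the first element, in document order, whose id_attr equals name.

    Assumption: the attribute called id_attr is treated as an identifier
    without consulting a DTD or schema.
    """
    # Element.iter() walks the tree depth-first, i.e. in document order.
    for elem in root.iter():
        if elem.get(id_attr) == name:
            return elem
    return None

doc = ET.fromstring(
    '<doc><a id="x">first</a><b><c id="x">second</c></b><d id="y">third</d></doc>'
)
# Two elements carry the identifier "x"; only the earlier one is selected.
hit = resolve_shorthand(doc, "x")
```

    Because the walk stops at the first match, the later element with
    the same identifier never affects the result.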

------------------------------------------------------------------------
3.3 Scheme-Based Pointer

para 2
"If the XPointer processor does not support the scheme used in a pointer
part"
    Maybe append "(as described below)", referring to para 4.

para 4
"Abstractly, scheme names are a tuple"
    Keep it singular: "Abstractly, a scheme name is a tuple"

    (For a nice parallelism, you might change the preceding sentence to
    "Syntactically, a scheme name consists of ...")

"that Prefix"
    Change to "the Prefix".

"no corresponding prefix"
    Change to "no binding for the Prefix".
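
    To show the wording I have in mind ("a scheme name is a tuple",
    "no binding for the Prefix"), here is a small Python sketch. The
    names expand_scheme_name and context are hypothetical, the
    namespace http://example.org/ns is made up, and treating an
    unprefixed scheme name as having no namespace name is my reading,
    not a quotation from the spec.

```python
def expand_scheme_name(qname, bindings):
    """Map a scheme QName to a (namespace name, local name) tuple.

    Returns None when the QName has a prefix for which the namespace
    binding context holds no binding; a processor would then treat the
    pointer part as using an unsupported scheme.
    """
    prefix, sep, local = qname.partition(":")
    if not sep:
        # Unprefixed scheme name: taken here to have no namespace name.
        return (None, qname)
    if prefix not in bindings:
        return None
    return (bindings[prefix], local)

# The namespace binding context is a mapping; each entry is one binding.
context = {"xml": "http://www.w3.org/XML/1998/namespace"}
# An xmlns(p=http://example.org/ns) pointer part would add this binding:
context["p"] = "http://example.org/ns"
```

    With this context, "p:thing" expands to a tuple, while "q:thing"
    yields None because there is no binding for the prefix "q".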

------------------------------------------------------------------------
3.4 Namespace Binding Context

para 2
"to add a (prefix/namespace name) binding"
    Change the slash to a comma?

------------------------------------------------------------------------
4 Character Escaping

"The set of characters for XPointers"
"XPointers and IRI references containing XPointers"
"applied to XPointers"
    Change "XPointers" to "pointers" four times.

------------------------------------------------------------------------
4.1 Escaping Contexts

"The following contexts require various types of escaping to be applied
to XPointers"
    Change "XPointers" to "pointers".

B and C:
    I think it needs to be made clearer that B and C do not specify
    how to encode reserved characters in IRIs/URIs, or even (in general)
    in IRI/URI references (this spec cannot be normative for either),
    but only in pointers used as fragment identifiers. It should also
    say that they must do this because the URI/IRI specs don't
    (completely?) specify how a fragment identifier language should
    handle such characters.
    (Presumably it wasn't that clear within the Linking WG either, or
    you wouldn't have omitted section 4 from the July draft.)

    One possibility would be to create a new section (entitled, say,
    "Using Pointers as Fragment Identifiers") that yanks the relevant
    sentences from section 4, and adds at least one sentence explaining
    why this is necessary (i.e., why it isn't handled by the URI/IRI
    specs). Then the remains of 4 could simply refer to that section,
    and be completely non-normative. (Basically, it would become more
    focussed on the examples of 4.2; 4.1 would just be a set-up.)

    Separate question: if a URI/IRI containing percent-escapes appears
    in a pointer (in a pointer part using the xmlns scheme, say), and
    then that pointer is used as the fragment identifier of a URI/IRI
    reference, do the percents in the percent-escapes get further
    escaped? For example, would
        xmlns(p=http://www.7%25solution.com/ns)
    become
        whatever#xmlns(p=http://www.7%2525solution.com/ns)
    when used as a fragment identifier? I'm guessing they'd have to,
    but I thought I'd check.
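
    The double escaping I am guessing at can be reproduced with
    Python's urllib.parse.quote, which escapes every '%' it sees
    without recognising existing escape sequences. This only shows
    what that guess produces; it does not settle what the spec
    requires. The choice of characters left unescaped (the safe
    argument) is an assumption made for the illustration.

```python
from urllib.parse import quote

pointer = "xmlns(p=http://www.7%25solution.com/ns)"

# '%' is not in the safe set, so the existing "%25" is escaped again
# and comes out as "%2525". The safe set here is an assumption.
fragment = quote(pointer, safe="()=:/")
reference = "whatever#" + fragment
```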

B. Escaping and Encoding of reserved IRI characters

"Thus, when an XPointer is inserted into an IRI reference"
    Change "an XPointer" to "a pointer".

    Maybe change "is inserted into" to "is used as the fragment
    identifier of".

"is converted to UTF-8 [...] as one or more bytes"
    Would it be better to say:
        is converted to one or more bytes according to UTF-8 [...]
    If so, ditto this under C.
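
    For what it's worth, the procedure as I read it (convert the
    character to bytes according to UTF-8, then write each byte as
    %HH) can be sketched in Python as below. The function names and
    the example set of characters to escape are hypothetical, and the
    use of uppercase hex digits is an assumption.

```python
def escape_char(ch):
    """Convert one character to UTF-8 bytes and write each byte as %HH."""
    return "".join("%{:02X}".format(b) for b in ch.encode("utf-8"))

def escape_pointer(pointer, disallowed):
    """Escape only the characters listed in `disallowed`."""
    return "".join(escape_char(c) if c in disallowed else c for c in pointer)

# Illustrative set only; the real set depends on the escaping context.
escaped = escape_pointer("résumé 100%", {"é", " ", "%"})
```

    Here "é" becomes two escaped bytes (%C3%A9), while the space and
    the percent sign each become one.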

C. Escaping and Encoding of reserved URI characters

"reserved URI characters"
    The set of characters that are reserved in URIs is irrelevant,
    because the pointer doesn't appear in a URI. It appears as the
    fragment identifier of a URI reference.

"IRI references can be converted to URI references for consumption by
URI resolvers."
    This is irrelevant, I think. Even if they weren't convertible,
    you'd still have to have this section, and it would be identical.

"The disallowed characters in URI references"
    This isn't what's important; what's important are the disallowed
    characters in *fragment identifiers* in URI references.

"include [blah] except for the number sign (#)"
    Delete "the number sign (#)". Although '#' is an allowed character
    in URI refs, it is *not* allowed in URI ref fragment identifiers.

"and percent sign (%)"
    Delete this too. Although the percent sign does appear, it does
    so only in escape sequences. A "normal" occurrence of '%' must be
    escaped, so it must be considered a "disallowed character" for
    the purposes of the encoding mechanism described.

"Disallowed characters are escaped as follows:"
    Couldn't you just say that they're escaped using the same procedure
    as in B?

D. XML escaping

"It is not recommended that URI references ... be placed in XML
documents."
    I think it would be better to say:
        It is recommended that IRI references (rather than the more
        restricted URI references) be used in XML documents.

para after D
"syntactic rules for XPointers"
    Change "XPointers" to "pointers" (or <code>Pointer</code>).

------------------------------------------------------------------------
4.2 Examples of Escaping

both examples
"The following table shows the escaping ... of an XPointer"
    Change "an XPointer" to "a pointer".

"C. IRI reference converted to URI reference"
    The IRI -> URI conversion is irrelevant.
    Change to "C. Pointer in URI reference".

example 2
"xpointer(id('...'))"
    It would be a nice change to use "element(...)" instead.

A.
"The XPointer"
    Change "XPointer" to "pointer", or just delete "The XPointer".

-Michael Dyck
Received on Thursday, 12 December 2002 01:35:41 UTC