- From: Michael Dyck <michaeldyck@shaw.ca>
- Date: Mon, 04 Mar 2002 02:27:58 -0800
- To: www-xml-linking-comments@w3.org
Comments on section 5 of
XML Pointer Language (XPointer) Version 1.0
W3C Candidate Recommendation 11 September 2001

------------------------------------------------------------------------
5 XPointer Extensions to XPath

bullet 2
    Delete "and corresponding".
    Insert comma after "location types".

bullet 7 ("Allowance...")
    Rather than putting this bullet in the midst of bullets about added
    functions, maybe put it before or after them (in 4th or 8th spot).

------------------------------------------------------------------------
5.2 Evaluation Context Initialization

para 1
    "except for the generalization of nodes to locations"
        Append "and the addition of properties for here and origin".

    "XPointer applications"
        Change to "XPointer processors".

bullet 1
    "When the XPointer is a fragment identifier of a URI reference, the
    document or external parsed entity is the one identified by the URI
    portion."
        Note that this requirement must be enforced outside the XPointer
        processor. As described in 3.3, the XPointer processor is simply
        handed a resource. It is presumably this resource's root node
        that is the initial setting for the context location.

bullet 4
    "XPointer applications"
        Change to "XPointer processors".

------------------------------------------------------------------------
5.2.1 Namespace Initialization

para 1
    "Any xmlns parts attempting to override the xml prefix must be
    ignored."
        What about the xmlns prefix?

------------------------------------------------------------------------
5.3 The point and range Location Types

para 1
    "Locations of the point and range type"
        Change "type" to "types".

para 2
    See XP121 in the Linking Issue List. The decision was:
        "He is looking for clarification, but more properly this should
        come from the DOM side, not ours."
    Yes, I want clarification, but first I want correctness. I believe
    you are misusing Unicode terminology, and I think you should either
    correct it or (since it's just a Note) drop it.
------------------------------------------------------------------------
5.3.1 Definition of Point Location

para 1
    Delete "] [Definition:" in the middle of the sentence.

    "Two points are identical if"
        After "identical", insert "(equal)", since that term is sometimes
        used (e.g., in the definition of collapsed range).
        After "if", append "and only if".

para 2
    Change "applications" to "XPointer processors".

para 3
    "a text node inside an element"
        Delete "inside an element". It's redundant. Every text node is
        inside an element.

para 4
    "a non-zero index n indicates the point immediately after the nth
    child node"
        Note that the point is (in general) *not* immediately after that
        node in document order, because the node contains descendant
        nodes or points that intervene.

------------------------------------------------------------------------
5.3.2 Definition of Range Location

para 1
    Put "[Definition:" "]" around the whole sentence.
    Put "range", "start point", and "end point" in bold.

para 3
    "a range from the start of a processing instruction"
        This meaning of "start" (meaning the point to the immediate left
        of the PI) differs from that of "start-point" (meaning the
        leftmost point inside the PI). Instead of "the start of", you
        might say "immediately before".

para 5
    "between the start point and end point"
        "Between" might rate another forward reference to document order.

para 6
    "The axes of a range location are identical to the axes of its start
    point."
        On the previous XPointer draft, I raised a comment on the
        weirdness of this definition, and suggested that you'd be better
        off saying that a range's self axis (and its *-or-self axes)
        contain the range itself, and all the other axes are empty. This
        went into the Linking Issue List as XP123(g), but it appears that
        the WG's response was misplaced under XP123(e):

            Discussion: after some discussion it appeared that keeping
            ranges 'terminal' w.r.t.
            axis computation wasn't a problem and keeping all axis of a
            range being empty is not a problem in practice.
            Decision: accepted all the axis for a a range are empty
            excepted *self which are the range itself. Add a note about
            start-point() or end-point() as intermediary step for doing
            axis computation from a range .

        Perhaps the misplacement explains why the decision was not
        carried out.

------------------------------------------------------------------------
5.3.3 Covering Ranges for All Location Types

bullet 5
    "For any other kind of location"
        Append "(i.e., element, text, comment, or processing
        instruction)".

------------------------------------------------------------------------
5.3.4 Tests for point and range Locations

para 1
    "production for NodeType... by adding"
        Delete "...". Put "NodeType" in a 'code' element.

production
    Here, when you modify XPath production [38], you label it [11]. In
    5.4.1, when you modify XPath production [4], you label it [4xptr]. I
    think I prefer the latter technique. Thus, change "[11]" to
    "[38xptr]".

------------------------------------------------------------------------
5.3.5 Document Order

para 2
    "node point" (3 times)
    "character point" (2 times)
        Insert hyphen.

    "Conceptually, node points label gaps between nodes,"
        I'm not sure people will understand what you mean by "gaps". You
        might say "positions" instead.

    "while character points occur within a node, between the node points
    to the right and left of the node."
        This is also true of node-points. Wouldn't it be more accurate
        (and more parallel) to say that character-points label the
        gaps/positions between characters?

para 3
    "node point" (2 times)
        Insert hyphen.

para 4
    "character point"
        Insert hyphen.

paras 2-4
    It seems to me that these three paragraphs aren't particularly
    pertinent to document order, and mostly just try to give people a
    conceptual grasp of point locations. So they might fit better back
    in 5.3.1.

para 5
    "immediately preceding node"
        Put in bold.
"except that there is no point defined preceding or following the root" So? I don't think this affects the definition in any way. Delete it. (In what follows, I abbreviate "immediately preceding node" as "IPN".) "The following diagram..." It would be helpful to have the original XML text that this diagram represents. It appears to be: <p id='p1'>Everything_is_<em>deeply_</em>intertwingled.</p> (Although the underscores in the diagram are probably just placeholders for spaces.) I think it would be more common to put the space *after* the </em> tag rather than before it. Diagram This diagram doesn't seem particularly related to document order. It too might fit better back in 5.3.1, unless you use it to give examples of IPN and document order. There should be an attribute node for id='p1', and markers for its three character-points. The text talks about the "gaps" between nodes: it would be nice if the diagram showed such gaps! (e.g., between the right side of the 'text node 1' pentagon and the left side of the 'em' box) "postion 1 in p" "postion 0 in em" Change "postion" to "position". Note that "position" is not an XPointer term. The phrase "position 1 in p" presumably means "point with index 1 and container-node p": you should either explain this or change the wording on the diagram ("point with index 1 in p" might be okay). All indications of "startpoint" and "endpoint" disagree with the definitions of the start-point and end-point functions. For example: --- The start-point of 'text node 1' is the character-point with container node = 'text node 1' and index = 0, i.e., the point labelled '0' just before the 'E', which is not what "text node 1 startpoint" indicates. --- The start-point of 'p' is the node-point with container node = 'p' and index = 0, which is presumably what is meant by "position 0 in p", but that label does not coincide with "p startpoint". Here is a rough ASCII-art version of the diagram that fixes the above problems. 
    (You'll need to view it in a fixed-width font.)
    Please excuse the terse labels.

    [ASCII-art diagram: layout lost in this copy. Recoverable content:
    a box for p above boxes for the id attribute node (value "p1"),
    text node 1 ("Everything_is_"), em (containing text node 2,
    "deeply_"), and text node 3 ("intertwingled."); character-points
    marked under each string; node-points 0-1 in em and 0-3 in p; and
    start-point/end-point labels for id, TN1, TN2, TN3, em, and p.]

    (Vertical lines made of exclamation marks denote points.
    Vertical lines made of dots indicate where the respective nodes
    occur in document order, relative to points, assuming a reasonable
    definition of document order.)
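As a rough illustration only (it is not part of the Candidate Recommendation or of the diagram), the point model behind the diagram can be sketched in Python. The `Node` class and the `start_point` and `end_point` helpers are hypothetical names. The sketch assumes the underscores stand for spaces and, unlike 5.4.3.3 of the CR, lets attribute nodes have start and end points, as the corrected diagram does.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    kind: str                      # "element", "text", or "attribute"
    name: str                      # label used as the container name of points
    value: str = ""                # string-value (text and attribute nodes)
    children: List["Node"] = field(default_factory=list)


Point = Tuple[str, int]            # (container node name, index)


def start_point(n: Node) -> Point:
    """Point with index 0 whose container is the node itself."""
    return (n.name, 0)


def end_point(n: Node) -> Point:
    """Index is the child count for elements, else the string-value length."""
    if n.kind == "element":
        return (n.name, len(n.children))
    return (n.name, len(n.value))


# <p id='p1'>Everything is <em>deeply </em>intertwingled.</p>
id_attr = Node("attribute", "id", "p1")
tn1 = Node("text", "text node 1", "Everything is ")
tn2 = Node("text", "text node 2", "deeply ")
em = Node("element", "em", children=[tn2])
tn3 = Node("text", "text node 3", "intertwingled.")
p = Node("element", "p", children=[tn1, em, tn3])

for n in (id_attr, tn1, em, tn2, tn3, p):
    print(n.name, start_point(n), end_point(n))
```

In this model the start-point of text node 1 is index 0 inside text node 1 itself, not a point whose container is p.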
Node and point
    "A node is before a point if the node is before or equal in document
    order to the IPN of the point; otherwise, the node is after the
    point."
        Let point X be node-point 2 in the <p> node (between the <em>
        node and text node 3). Its IPN is the <em> node. So every node
        before or equal to the <em> node is before that point. That's
        fine. However, every *other* node is defined to be after the
        point, and that includes text node 2. But I really don't think
        you want text node 2 to be after point X.

Point and point
    "Two points P1 and P2 are equal if their IPNs are equal and the
    indexes of the points are equal."
        This is incorrect. Consider:
            P1 = the character-point between 'E' and 'v'
                 (container = text node 1, index = 1)
            P2 = the node-point between text node 1 and the <em> node
                 (container = <p> node, index = 1)
        For both P1 and P2, the IPN is text node 1 and the index is 1.
        So by the above definition, P1 and P2 are equal. But obviously
        they are not.

    "P1 is before P2 if P1's IPN is before P2's"
        Consider:
            P1 = node-point 2 in the <p> node ("point X" previously)
            P2 = any character-point in text node 2.
        P1's IPN is the <em> node and P2's is text node 2. The <em> node
        is before text node 2, so the above definition says that P1 is
        before P2. But I don't think you want that to be the case.

    "[P1 is before P2] if their IPNs are equal and P1's index is less
    than P2's."
        Consider:
            P1 = the node-point between text node 1 and the <em> node
                 (container = <p> node, index = 1)
            P2 = the character-point between the 'v' and the 'e'
                 (container = text node 1, index = 2)
        For both P1 and P2, the IPN is text node 1. P1's index is less
        than P2's, so P1 is supposedly before P2. But I don't think you
        want it to be.

document order in general
    The problem with these definitions stems from the definition and use
    of the IPN concept. It's very tempting to think that a point's
    "immediately preceding node" is the node that immediately precedes
    it in document order.
    If it *did* mean that, the definitions above would make a lot more
    sense (although some would still be wrong). So now you might want to
    redefine IPN so that it does mean that, but I don't think it would
    be worth the effort. I think you'd still have trouble defining the
    relative order of points with the same IPN.

    Instead, how about just giving a nice recursive definition of
    document order? Something like this:

        Let "point(C,I)" denote the point whose container node is C and
        whose index is I.
        Let "child(N,I)" denote the Ith child of node N.
        Let "doc_order(N)" denote the document order of the nodes and
        points under node N, defined as follows:

            doc_order(N):
                if N is an element node or root node:
                    Let k be the number of children of N.
                    N
                    For each namespace node S of N, doc_order(S)
                    For each attribute node A of N, doc_order(A)
                    point(N,0)
                    For each i such that 1 <= i <= k,
                        doc_order( child(N,i) )
                        point(N,i)
                if N is any other kind of node:
                    Let k be the length of the string-value of N.
                    N
                    For each i such that 0 <= i <= k,
                        point(N,i)

last para
    "Note that one consequence of these rules is that a point can be
    treated the same as the equivalent collapsed range."
        Only for the purpose of determining document order.

------------------------------------------------------------------------
5.4 XPointer Functions

para 1
    "XPointer applications"
        Change "applications" to "processors".

Throughout 5.4.x:
    For consistency with XPath, in every function prototype, remove the
    space before the closing parenthesis.

------------------------------------------------------------------------
5.4.1 range-to Function

para 1
    "For each location in the context"
        This is still misleading. Yes, I made this comment on the
        previous draft of XPointer, and yes, the WG decided (XP126(b) in
        the Linking Issue List) to keep it as is. However, I was not
        satisfied with the rationale for the decision, as detailed in
        http://lists.w3.org/Archives/Public/www-xml-linking-comments/2001AprJun/0073.html
        under "xp126-b-dyck".
        I have had no response to that posting.

    "the start point of the context location (as determined by the
    start-point function)"
        It would be better to put the parenthetical remark after "start
        point". That's what you do for "end point" in the same sentence.

    "the location"
        As I pointed out on the previous draft, and as Elliotte Rusty
        Harold has pointed out on this draft, you don't say what happens
        when the location-set argument contains other than a single
        location. Perhaps you should say that the function returns a
        location-set containing a range for each location in the
        argument location-set.

------------------------------------------------------------------------
5.4.2 string-range() Function
    Delete "()" from the section title. None of the other section
    titles for functions has parentheses.

para 1
    "For each location in the location-set argument, string-range
    returns a set of ranges..."
        This suggests that, for instance, if the location-set contains
        two locations, the function returns two sets of ranges, one for
        each location. Presumably, it really only returns one set of
        ranges, the union of those two. So I suggest rewording to
        something like:
            This function returns a location-set containing ranges
            determined as follows. For each location in the location-set
            argument, the function searches the string-value of the
            location for substrings that match the string argument.

    "An empty string"
        Maybe italicize "string" to indicate that it refers to the
        argument string, not the string-value of a location.

    "Each non-overlapping match"
        Consider searching "banana" for substrings that match "ana". One
        possible interpretation of the phrase "non-overlapping match"
        would say that there are two matches, but they overlap,
        therefore there are no non-overlapping matches. I suspect the
        intent is that there is one non-overlapping match, but this is
        not at all clear.

para 2
    "matched string" (2 times)
        Change "string" to "substring".
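The competing readings of "non-overlapping match" can be made concrete with a short Python sketch. The function names are hypothetical, and nothing here is claimed to be what the CR intends; it only shows that the readings give different answers for "banana" and "ana".

```python
import re
from typing import List


def all_matches(text: str, pattern: str) -> List[int]:
    """Start offsets of every occurrence, overlapping ones included."""
    last = len(text) - len(pattern)
    return [i for i in range(last + 1) if text.startswith(pattern, i)]


def leftmost_non_overlapping(text: str, pattern: str) -> List[int]:
    """Scan left to right and resume after each match (what re.finditer does)."""
    return [m.start() for m in re.finditer(re.escape(pattern), text)]


def isolated_matches(text: str, pattern: str) -> List[int]:
    """Keep only the occurrences that overlap no other occurrence."""
    starts = all_matches(text, pattern)
    width = len(pattern)
    return [s for s in starts
            if all(abs(s - t) >= width for t in starts if t != s)]


print(all_matches("banana", "ana"))               # [1, 3]
print(leftmost_non_overlapping("banana", "ana"))  # [1]
print(isolated_matches("banana", "ana"))          # []
```

The second reading yields one match and the third yields none, which is exactly the ambiguity raised above.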
"The default value is 1, which makes the range start immediately before the first character of the matched string." Are numbers less than 1 allowed? If so, it would be nice to give an example of such. If not, you should definitely say so. Are non-integral numbers allowed? "The fourth argument gives the number of characters in the range" Presumably this must be greater than or equal to zero. What happens if a negative number is passed in? Are non-integral numbers allowed? This sentence doesn't completely define the resulting range. Consider the document: <doc>Thomas <em>Pyn</em>chon</doc> and the function call: string-range( /doc/em, "Py", 1, 3 ) The resulting range starts at point( container = /doc/em/text(), index = 0 ) but it could end at any of: point( container = /doc/em/text(), index = 3 ) point( container = /doc/em, index = 1 ) point( container = /doc, index = 2 ) point( container = /doc/node()[3], index = 0 ) and still satisfy the constraint that there be three characters in the range. "Thus, both of the start point and end point of each range ... will be character points." This statement does not logically follow from the previous. As my example shows, there can be node-points that satisfy the constraints. "character points" Insert hyphen. para 4 "For any particular match, if the string argument is not found in the string-value of the location" This phrase doesn't make sense, because if the string argument isn't found, there *is* no match. I suggest rewording to something like: For any particular location, if no match is found, no range is added to the result for that location. For any particular match, if the third and fourth argument ... "wholly beyond" So if they indicate a range that is only *partially* beyond the beginning or end of the document or entity, a range *is* added to the result? It wouyld be good to give an example. 
"beyond the beginning or end of the document or entity" On the previous draft, I asked: What happens if the third or fourth arguments indicate a position that is within the document, but outside the string-value of the location? For example, with this as the document: <doc>Thomas <em>Pyn</em>chon</doc> and this as the xpointer: string-range(/doc/em, "P", 1, 7) Does it select "Pynchon", "Pyn", or nothing? XP127(g) in the Linking Issue List shows a recommendation: sounds clear that this will select "Pynchon", since "Element boundaries, as well as entire embedded nodes such as processing instructions and comments, are ignored" but no actual decision. The thing is, once you "leave" the string-value of location being searched, where are you? In some nearby text node presumably, i.e. still in the string-value of some higher node. In fact, it seems like the endpoints of the range are located with respect to the string-value of the whole document or external parsed entity. On a related note, consider the document: <doc>Pynchon<!-- Pyn-->chon</doc> and the function call: string-range(/doc/node(), "Pyn", 1, 7) Matches are found in the first text node and the comment node. The former will certainly add a range to the result, but what about the latter? You can imagine similar examples involving attribute, namespace, and processing instruction nodes. para 5 "character points" Insert hyphen. This sentence repeats the last sentence of para 2. para 9 "string content" "retain the structural context" These phrases are not well-defined. Maybe this paragraph should just be a Note. "For example, if the 17th occurrence of "Thomas Pynchon"..." Because this pertains to the first example, it would probably make more sense to put this paragraph after the first example. "XPointer application" Change "application" to "processor". ------------------------------------------------------------------------ 5.4.3.1 range Function "representing the covering range" Append "(see 5.3.3)". 
------------------------------------------------------------------------
5.4.3.2 range-inside Function

    "If x is ... a point, then x is added to the result location-set."
        On the previous draft, I said:
            But if x is a point, then you'd be adding a point to the
            result, and you just said that the function returns ranges.
            Instead, you presumably want to add the collapsed range at
            that point.
        XP128(a) in the Linking Issue List says "Decision: approved",
        but the decision has not been carried out.

    "character point"
        Insert hyphen.

    "If the end point is a character point then its index is the length
    of the string-value of x; otherwise its index is the number of
    children of x."
        This is somewhat circular, in that you're defining the end point
        based on a property of the end point. Of course, it works,
        because the property of it being a character-point or node-point
        is dependent only on its container node, which was specified in
        the previous sentence. Still, I think it would be clearer if you
        said something like:
            If x is an element node or root node, the index of the end
            point of the range is the number of children of x; otherwise
            its index is the length of the string-value of x.

------------------------------------------------------------------------
5.4.3.3 start-point Function
5.4.3.4 end-point Function

    "If x is of type attribute or namespace, the XPointer part in which
    the function appears fails."
        There is no reason for this. Attribute and namespace nodes are
        perfectly fine as containers for points and ranges. On the
        previous draft, I said:
            I'm mystified: why is it so wrong to ask for the start-point
            (or end-point) of an attribute or namespace location? Why
            can't these functions treat such locations just like text,
            comment, and processing instruction locations? That's what
            range-inside does.
            In fact, if someone really wanted to write
                start-point(@foo)
            they could get around start-point's bizarre dislike of
            attribute locations just by writing
                start-point(range-inside(@foo))
            If the latter expression isn't erroneous, why is the former?
        XP129(d) in the Linking Issue List gives the decision: "keep as
        is we would prefer to not add complexity at this point". My
        response to that appears in
        http://lists.w3.org/Archives/Public/www-xml-linking-comments/2001AprJun/0073.html
        under "xp129-d-dyck":
            Complexity? The following change would satisfy me:
                In the description for each of start-point() and
                end-point(), delete the bullet regarding attribute or
                namespace, and in the previous bullet, change "text,
                comment, or processing instruction" to "text, attribute,
                namespace, comment, or processing instruction".
            Can you honestly say that this adds complexity? To my
            thinking, the result is simpler than the current definition.
            Moreover, I'd say it's easier to implement.
        I have received no reply to that submission.

5.4.3.4
para 1
    "to the result location-set"
        Change "result" to "resulting".

------------------------------------------------------------------------
5.4.4 here Function

para 1
    "the XPointer part in which the here function appears fails"
        Does a resource error occur?

Note
    "The returned location for an XPointer appearing in element content
    does not have a node type of element because the XPointer is in a
    text node that is itself inside an element."
        Huh? This seems to ignore what the first bullet says.

------------------------------------------------------------------------
5.4.5 origin Function

para 1
    "a link expressed in an XML document"
        Is it important that the link be expressed in an XML document?
        (Could it be expressed in any other kind of document? Would it
        make a difference if it was?)

    I think it would be a bit clearer if the last sentence of the
    paragraph were inserted after the first sentence.
para 2
    "It is a resource error to use origin in the fragment identifier
    portion of a URI reference where a URI is also provided and
    identifies a resource different from the resource from which
    traversal was initiated, or in a situation where traversal is not
    occurring."
        Why? It seems like it would be a useful thing to do. Imagine
        that document A has emphasized words:
            <em>frimmin</em> on the <em>jimjam</em>
        and document B is a glossary for these words:
            <entry><word>frimmin</word><defn>...</defn></entry>
        and you want to create third-party links such that from any <em>
        node in A, you can initiate traversal to the corresponding
        glossary entry in B. Wouldn't you need a URI reference something
        like this?:
            B.xml#xpointer(//entry[word = origin()])
        And wouldn't that be a resource error according to the quoted
        sentence?

------------------------------------------------------------------------
5.5 Root Node Children

    "XPointer extends the XPath data model"
        Fine, but where is the data model of an external parsed entity
        defined?

------------------------------------------------------------------------

-Michael Dyck
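
A runnable sketch of the recursive document-order definition suggested under 5.3.5, written in Python. The names `Node` and `doc_order` follow the pseudocode but are otherwise hypothetical; namespace nodes are omitted for brevity, the underscores in the example are assumed to be spaces, and attribute nodes are treated as ordinary point containers. It is an illustration of the proposal, not text from the Candidate Recommendation.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple, Union


@dataclass
class Node:
    name: str
    kind: str = "element"          # "element", "root", "text", "attribute"
    value: str = ""                # string-value for non-element nodes
    attributes: List["Node"] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


# An item in document order: a node (by name) or a point (container, index).
Item = Union[str, Tuple[str, int]]


def doc_order(n: Node) -> Iterator[Item]:
    """Yield the node, then its attributes, then its points and children."""
    yield n.name
    if n.kind in ("element", "root"):
        for a in n.attributes:
            yield from doc_order(a)
        yield (n.name, 0)
        for i, child in enumerate(n.children, start=1):
            yield from doc_order(child)
            yield (n.name, i)
    else:
        for i in range(len(n.value) + 1):
            yield (n.name, i)


# <p id='p1'>Everything is <em>deeply </em>intertwingled.</p>
tn1 = Node("text node 1", "text", "Everything is ")
tn2 = Node("text node 2", "text", "deeply ")
em = Node("em", children=[tn2])
tn3 = Node("text node 3", "text", "intertwingled.")
p = Node("p", attributes=[Node("id", "attribute", "p1")],
         children=[tn1, em, tn3])

order = list(doc_order(p))
pos = {item: i for i, item in enumerate(order)}

# Point X (node-point 2 in p) comes after text node 2 and all its points.
print(pos["text node 2"] < pos[("p", 2)])
# The character-point after 'E' precedes the node-point after text node 1.
print(pos[("text node 1", 1)] < pos[("p", 1)])
```

With this ordering the three problem cases from the "Node and point" and "Point and point" comments come out the way one would expect: text node 2 and its character-points precede point X, and character-points inside text node 1 precede node-point 1 in p.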
Received on Monday, 4 March 2002 06:40:12 UTC