- From: <bugzilla@wiggum.w3.org>
- Date: Sat, 23 Jun 2007 09:55:36 +0000
- To: public-qt-comments@w3.org
- CC:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=4698

Summary: [FT] editorial: 2.1 Processing Model
Product: XPath / XQuery / XSLT
Version: Last Call drafts
Platform: All
OS/Version: All
Status: NEW
Severity: minor
Priority: P2
Component: Full Text
AssignedTo: jim.melton@acm.org
ReportedBy: jmdyck@ibiblio.org
QAContact: public-qt-comments@w3.org

2.1 Processing Model

[1] section
I think the spec might be better off with the contents of this section put elsewhere. E.g., the stuff on tokenization can be merged into 4.1; pretty much everything else is specific to full-text contains expressions, so can be merged into 2.2.1.

[2] para 1
"As part of the External Processing that is described in the XQuery Processing Model, when an XML document is parsed into an Infoset/PSVI and ultimately into a XQuery Data Model instance, a full-text process called tokenization is usually executed."
With respect to the Processing Model, tokenization is *not* part of external processing, because:
(a) there's no allowance for tokens in the Data Model, and
(b) the only place/time where the thing-to-be-tokenized and the options-by-which-to-tokenize-it are guaranteed to come together is within the query at evaluation time.
(Implementations may be able to statically determine [or guess] some combinations, and so do pre-tokenization, but that's not something that is [or should be] captured in the Processing Model.)
Replace the para with something like: "At various points in full-text processing, the processor is called upon to 'tokenize' a string."

[3] para 3
'including the definition of the term "words"'
Delete. (Avoid using the term "words".)

[4] "interprete"
Change to "interpret".

[5] list 1
"2. ... the containment hierarchy (e.g., paragraphs contain sentences, which contain words)"
I think you mean "i.e.", not "e.g.". (If that's just an *example* of a containment hierarchy, then who gets to define the actual hierarchy that the tokenizer must preserve?)

[6] para 5
"evaluated within the normal Query Processing (XQuery Processing Model),"
Odd. Delete "the"? De-capitalize "Query Processing"? Is the parenthesized text supposed to be a link? Could just delete the whole quoted phrase; it doesn't seem relevant.

[7] list 2
"3. ... which contents may be ignored"
[7a] s/which contents/whose contents/
[7b] s/may/must/

[8] para 8 (2nd after diagram)
"Tokenization normally occurs at the time of parsing of the original XML documents, for example, during the Data Model Generation process"
That may be true in the real world, but not in the Processing Model. See my comment for para 1 above.

[9] para 9, 11, ...
"Full Text expression"
When this section refers to a "Full Text expression", it specifically means a full-text contains expression. Might as well be specific.

[10] list 3
"1. ... the set of search context items"
s/set/sequence/

[11] "2. Evaluate the (optional) ignore expression, resulting in the set of ignored nodes and virtually delete the ignore nodes from the search context nodes tree."
[11a] The ignore option must be evaluated for each search context item, so 2 should be the new 4a.
[11b] s/ignore expression/ignore option/
[11c] s/nodes and virtually/nodes, and virtually/ (or "nodes. Virtually")
[11d] s/ignore nodes/ignored nodes/
[11e] s/the search context nodes tree/the search context item/

[12] "4a. Apply the tokenization algorithm"
In terms of the processing model, you can't do tokenization at this level. Each different FTPrimaryWithOptions within the FTSelection is allowed to have different FTMatchOptions, some of which affect tokenization. So theoretically, each FTWords causes its own tokenization of the search context item.

[13] '4b. Evaluate the simple "FTWord" operators'
s/FTWord/FTWords/

[14] 'against the tokenized input'
s/input/context item/ ("input" suggests an external document)

[15] "4c. ... in a bottom up fashion"
s/bottom up/bottom-up/

[16] "At each step the AllMatches instance produced by the previous steps"
s/instance/instances/

[17] "and a new instance of the AllMatches"
s/instance of the AllMatches/AllMatches instance/

[18] "the FTMatchOptions are controlling the semantics"
s/are controlling/control/

[19] "5. Convert the AllMatches instance"
s/the AllMatches instance/the topmost AllMatches instances/ (since each search context item results in one topmost AllMatches instance)
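
To make the point in [12] concrete, here is a rough Python sketch of why tokenization cannot be a single step ahead of the FTWords evaluation. Everything in it is a hypothetical stand-in, not anything the spec defines: MatchOptions models only a case-sensitivity flag, tokenize is a toy regex tokenizer, and the primaries are simply and-ed together. It assumes Python 3.9 or later.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchOptions:
    # Hypothetical stand-in for FTMatchOptions; only case sensitivity is modeled.
    case_sensitive: bool = False


def tokenize(text: str, options: MatchOptions) -> list[str]:
    # Toy tokenizer: runs of word characters, case-folded unless the
    # options ask for case-sensitive matching.
    tokens = re.findall(r"\w+", text)
    return tokens if options.case_sensitive else [t.lower() for t in tokens]


def ft_words_matches(item: str, phrase: str, options: MatchOptions) -> bool:
    # Each FTWords tokenizes the search context item under its OWN options,
    # so the same item may be tokenized several times within one selection.
    item_tokens = tokenize(item, options)
    query_tokens = tokenize(phrase, options)
    n = len(query_tokens)
    return any(
        item_tokens[i:i + n] == query_tokens
        for i in range(len(item_tokens) - n + 1)
    )


def matches_per_item(items, primaries):
    # primaries: (phrase, MatchOptions) pairs, combined here as a conjunction.
    # Returns one boolean per search context item, in sequence order.
    return [
        all(ft_words_matches(item, phrase, opts) for phrase, opts in primaries)
        for item in items
    ]
```

With the primaries ("XML", case-sensitive) and ("query", default options), the item "XML and Query" is tokenized twice, once per primary, and no single up-front tokenization would serve both.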
Received on Saturday, 23 June 2007 09:55:39 UTC