[Bug 3739] [FT] Description of tokenization (Editorial/Technical) from bugzilla@wiggum.w3.org on 2006-09-18 (public-qt-comments@w3.org from September 2006)

From: <bugzilla@wiggum.w3.org>
Date: Mon, 18 Sep 2006 19:22:58 +0000
To: public-qt-comments@w3.org
CC:
Message-Id: <E1GPOhm-0004Qw-FG@wiggum.w3.org>

http://www.w3.org/Bugs/Public/show_bug.cgi?id=3739

           Summary: [FT] Description of tokenization (Editorial/Technical)
           Product: XPath / XQuery / XSLT
           Version: Working drafts
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Full Text
        AssignedTo: jim.melton@acm.org
        ReportedBy: holstege@mathling.com
         QAContact: public-qt-comments@w3.org


== Section 1.1 (Full-Text Search and XML)
Bullet 3, final paragraph: 
"The tokenizer has to evaluate two equal strings..."
(1) Suggest replacing "evaluate" with a word that doesn't carry the same
implications in the XQuery context, perhaps "process".
(2) "equal" is troubling as well: equal in the sense of an XQuery equality
comparison under a collation, or codepoint-by-codepoint equal?  I believe we
mean the latter.
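
The two readings of "equal" in (2) can give different answers for the same
pair of strings. A minimal sketch in Python, not taken from the draft: the
function names are hypothetical, and Unicode normalization plus case folding
is only an assumed stand-in for a collation, since real collations are
implementation-defined.

```python
import unicodedata


def codepoint_equal(a: str, b: str) -> bool:
    # Python's == on str compares the two code point sequences directly.
    return a == b


def collation_like_equal(a: str, b: str) -> bool:
    # Assumption: NFC normalization plus case folding approximates what a
    # tailored collation might treat as equal. This is illustrative only.
    def key(s: str) -> str:
        return unicodedata.normalize("NFC", s).casefold()

    return key(a) == key(b)


precomposed = "caf\u00e9"   # e-acute as a single code point
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent

print(codepoint_equal(precomposed, decomposed))       # False
print(collation_like_equal(precomposed, decomposed))  # True
```

Under the codepoint reading, a tokenizer only has to behave identically on
inputs like the first comparison accepts; under the collation reading, it
would also have to agree on pairs like the one above.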

Bullets 4 and 5:
These should mention the relationship of markup to tokenization, particularly
paragraph identification.  I expect that for most XML markup it will be the
markup, not white space, that identifies paragraph boundaries.
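
The difference between the two ways of finding paragraph boundaries can be
sketched as follows. This is an illustration under stated assumptions, not
anything prescribed by the draft: the sample document, the "p" element name,
and both function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: the first paragraph contains a blank line, and no
# white space separates the two paragraph elements.
DOC = (
    "<doc><p>First paragraph,\n\nwith a blank line inside.</p>"
    "<p>Second paragraph.</p></doc>"
)


def paragraphs_by_markup(xml_text, tag="p"):
    # Each element named `tag` is treated as exactly one paragraph.
    root = ET.fromstring(xml_text)
    return ["".join(el.itertext()) for el in root.iter(tag)]


def paragraphs_by_whitespace(xml_text):
    # Markup is ignored; a blank line in the text content ends a paragraph.
    root = ET.fromstring(xml_text)
    text = "".join(root.itertext())
    return [part for part in re.split(r"\n\s*\n", text) if part]


print(paragraphs_by_markup(DOC))
print(paragraphs_by_whitespace(DOC))
```

On this input the markup-based version finds the two intended paragraphs,
while the white-space version splits the first paragraph in the middle and
runs its tail together with the second.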

Received on Monday, 18 September 2006 19:23:03 UTC