[Bug 4946] "all markup creates token boundaries"

http://www.w3.org/Bugs/Public/show_bug.cgi?id=4946

           Summary: "all markup creates token boundaries"
           Product: XPath / XQuery / XSLT
           Version: Working drafts
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Full Text
        AssignedTo: jim.melton@acm.org
        ReportedBy: jmdyck@ibiblio.org
         QAContact: public-qt-comments@w3.org


As decided at meeting 147, a sentence has been added (in the editor's draft) to
section 1.1:
    In the absence of an implementation-defined way to
    differentiate, all markup creates token boundaries.

However, XML's definition of "markup"
    <http://www.w3.org/TR/2006/REC-xml-20060816/#syntax>
is perhaps broader than what we had in mind when we said "all markup". For
instance, it seems unlikely that we meant for a character reference
to create a token boundary. Similarly for entity references and perhaps CDATA
section delimiters.
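
To illustrate the concern, here is a minimal sketch. It uses Python's xml.etree.ElementTree only as a rough stand-in for constructing a data model instance; the sample string and the function name parsed_text are made up for this example. A character reference and a pair of CDATA section delimiters each sit in the middle of a word, yet neither leaves any trace in the parsed text, so a literal reading of "all markup" would seemingly split "café" and "lait" for no visible reason.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: "&#233;" is a character reference and
# "<![CDATA[" / "]]>" are CDATA section delimiters. Both count as
# markup under the XML 1.0 definition, and both occur mid-word here.
SOURCE = "<p>caf&#233; au l<![CDATA[ai]]>t</p>"


def parsed_text(xml_source: str) -> str:
    """Return the text content of the root element after parsing."""
    return ET.fromstring(xml_source).text


# After parsing, the reference and the delimiters are gone.
print(parsed_text(SOURCE))  # café au lait
```

Nothing in the result marks where the markup used to be, which is why a boundary at those points seems unlikely to be what was intended.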

We could be more specific about which kinds of markup we mean, but instead,
maybe we shouldn't be relying on the idea of markup. Full-Text operates on
instances of the XQuery/XPath Data Model, where markup doesn't exist. So, for
example, we might say:
    In the absence of an implementation-defined indication otherwise,
    a token must not contain characters from more than one node.
(although we might have to make that more precise).
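
As a sketch of what the node-based wording might mean in practice (an illustration under stated assumptions, not proposed spec text): the code below uses Python's xml.etree.ElementTree and itertext() as an approximation of the sequence of text nodes, and a simple \w+ pattern as a placeholder tokenizer. The function names and the sample document are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Placeholder tokenizer: a token is a maximal run of word characters.
# Real Full-Text tokenization is implementation-defined.
WORD = re.compile(r"\w+")


def tokens_per_node(xml_source: str) -> list[str]:
    """Tokenize each text chunk separately, so that no token
    contains characters from more than one node."""
    root = ET.fromstring(xml_source)
    tokens = []
    for text in root.itertext():  # approximates one chunk per text node
        tokens.extend(WORD.findall(text))
    return tokens


def tokens_of_string_value(xml_source: str) -> list[str]:
    """Tokenize the concatenated string value, ignoring node boundaries."""
    root = ET.fromstring(xml_source)
    return WORD.findall("".join(root.itertext()))


SAMPLE = "<p>foo<b>bar</b>baz</p>"  # hypothetical input
print(tokens_per_node(SAMPLE))         # ['foo', 'bar', 'baz']
print(tokens_of_string_value(SAMPLE))  # ['foobarbaz']
```

The first function corresponds to the default in the suggested sentence; the second shows the kind of result an implementation-defined indication otherwise (say, treating b as inline) might produce.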

Received on Tuesday, 14 August 2007 04:57:02 UTC