- From: <bugzilla@wiggum.w3.org>
- Date: Mon, 27 Aug 2007 06:15:14 +0000
- To: public-qt-comments@w3.org
- CC:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=4697

------- Comment #1 from jmdyck@ibiblio.org 2007-08-27 06:15 -------

A specific suggestion for point [6c], which also takes care of [6a] and [6b]:

In section 1.1, second list, item 3:

---
Extract the definitions of "sentence" and "paragraph" and put them between items 2 and 3. (Append them to item 2, or make a new item, whichever you prefer.)

---
Delete the three sentences at the end of the item:

    Whatever a tokenizer for a particular language chooses to do, it must preserve the containment hierarchy: paragraphs contain sentences, which contain tokens. The tokenizer has to process two codepoint equal strings in the same way, i.e., it should identify the same tokens. Everything else about the behavior of the tokenizer is implementation-defined.

---
Move the definition of tokenization (and the subsequent constraints, and the Note re overlapping tokens) from 4.1 to replace the sentences deleted above. But instead of the 4.1 phrasing:

    paragraphs contain sentences contain words

use the 1.1 phrasing:

    paragraphs contain sentences, which contain tokens

---
As for the three sentences at the start of the item, delete or reposition or leave them, as you please. (It might be more stylistically consistent to put them after the definition.)

In section 2.1, delete the repeated paragraph and list: "Tokenization, including .. same tokens in each."
Received on Monday, 27 August 2007 06:15:19 UTC