[Bug 3939] [FT] Section 4.1.1: Example for overlapping tokens

http://www.w3.org/Bugs/Public/show_bug.cgi?id=3939


jim.melton@acm.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED




------- Comment #1 from jim.melton@acm.org  2007-02-01 21:57 -------
The Task Force has agreed to provide such an example.  In Section 4.1,
Tokenization, immediately before Section 4.1.1, we will insert a paragraph
that reads:

  For some languages, some tokenizers may identify overlapping tokens.  For
  example, the German word "Donaudampfschifffahrtskapitaensmuetzen" might be
  tokenized into the following tokens: Donaudampfschifffahrtskapitaensmuetzen,
  Donau, dampf, schiff, dampfschiff, kapitaen, muetzen, kapitaensmuetzen,
  schifffahrt, dampfschifffahrt, and perhaps others.
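
To illustrate what "overlapping tokens" means in practice, here is a minimal
sketch of a dictionary-based tokenizer that reports every known word found
inside a compound, together with its character offsets.  It is not taken from
the specification or from any implementation: the function name
overlapping_tokens, the LEXICON contents, and the choice to report offsets
are all assumptions made for this example.

```python
# Hypothetical lexicon, limited to the components named in the example above.
LEXICON = {
    "donau", "dampf", "schiff", "dampfschiff", "kapitaen", "muetzen",
    "kapitaensmuetzen", "schifffahrt", "dampfschifffahrt",
}

WORD = "Donaudampfschifffahrtskapitaensmuetzen"


def overlapping_tokens(word, lexicon):
    """Return (start, end, text) for the whole word and for every
    lexicon entry that occurs inside it (case-insensitive match)."""
    lowered = word.lower()
    length = len(lowered)
    # The full compound is always reported as a token of its own.
    found = [(0, length, word)]
    for start in range(length):
        for end in range(start + 1, length + 1):
            if (start, end) == (0, length):
                continue  # already reported above
            if lowered[start:end] in lexicon:
                found.append((start, end, word[start:end]))
    return found


def spans_overlap(a, b):
    """True if two (start, end, text) tokens share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


if __name__ == "__main__":
    for start, end, text in overlapping_tokens(WORD, LEXICON):
        print(f"{start:2d}-{end:2d}  {text}")
```

With this lexicon, "dampfschiff" (offsets 5-16) and "schifffahrt" (offsets
10-21) share the characters of "schiff", so the two tokens overlap rather than
partition the word.  A real tokenizer would likely use linguistic analysis
instead of an exhaustive substring scan.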

Received on Thursday, 1 February 2007 21:57:46 UTC