- From: <bugzilla@jessica.w3.org>
- Date: Thu, 06 Dec 2012 12:08:37 +0000
- To: public-css-bugzilla@w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=20272

Bug ID: 20272
Summary: Word Boundaries (Hyphenation) in Indian languages (UAX#29) Text Segmentation
Classification: Unclassified
Product: CSS
Version: unspecified
Hardware: PC
URL: http://w3cindia.in/ABNFValidSegmentationdocument.html#uax29
OS: Windows XP
Status: NEW
Keywords: needsAction
Severity: major
Priority: P2
Component: Text
Assignee: fantasai.bugs@inkedblade.net
Reporter: tyagi@w3.org
QA Contact: public-css-bugzilla@w3.org
CC: kojiishi@gluesoft.co.jp, somnath@w3.org, swaran@w3.org, tyagi@w3.org

Created attachment 1261
--> https://www.w3.org/Bugs/Public/attachment.cgi?id=1261&action=edit
complete description of this issues

Word Boundaries (Hyphenation)

Word boundaries are used in a number of different contexts. The most familiar ones are selection (double-click mouse selection, or “move to next word” with control-arrow keys) and “Whole Word Search” for search and replace. They are also used in database queries, to determine whether elements are within a certain number of words of one another.

Recommended solution: ABNF Valid Segmentation, plus a hyphenation dictionary (if available).

Sentence Boundaries

Recommended solution: support special sentence boundaries such as the double poorna virama (U+0965 DEVANAGARI DOUBLE DANDA, ॥), possibly combined with verse numbers (as in Sanskrit text, shlokas, etc.).

A string of Unicode-encoded text often needs to be broken up into text elements programmatically. Common examples of text elements include what users think of as characters, words, lines (more precisely, where line breaks are allowed), and sentences. The precise determination of text elements may vary according to orthographic conventions for a given script or language. The goal of matching user perceptions cannot always be met exactly because the text alone does not always contain enough information to unambiguously decide boundaries.
For example, the period (U+002E FULL STOP) is used ambiguously: sometimes for end-of-sentence purposes, sometimes for abbreviations, and sometimes for numbers. In most cases, however, programmatic text boundaries can match user perceptions quite closely, although sometimes the best that can be done is not to surprise the user.

Solution

- Grapheme Cluster Boundaries: based on ABNF Valid Segmentation, possibly extended to handle some cases (?)
- Deletion and backspace: both code point wise and by ABNF Valid Segmentation
- Mouse Selection: at both the ABNF Valid Segmentation level and the code point level

--
You are receiving this mail because:
You are the QA Contact for the bug.
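To illustrate why word boundaries in Indian languages need more than a generic "word character" test, here is a minimal Python sketch. It relies on the fact that Python's `re` module treats `\w` as letters, digits and underscore only, so the Devanagari virama (U+094D, category Mn) and most vowel signs (category Mc) are not word characters, and a naive `\w+` search cuts words apart inside conjuncts and matras. The widened class `MARK_AWARE_WORD` and the helper `words` are hypothetical illustrations, not the ABNF Valid Segmentation referenced in the report, and they do nothing about hyphenation.

```python
import re

# Python's \w matches letters (categories L*), digits and "_", but not
# combining marks: the Devanagari virama U+094D is Mn and most vowel signs
# are Mc, so a plain \w+ search splits words inside conjuncts and matras.
NAIVE_WORD = re.compile(r"\w+")

# Hypothetical widened class: \w plus the Devanagari sign, nukta, vowel-sign,
# virama, stress-mark and vocalic vowel-sign ranges. Not the report's ABNF.
MARK_AWARE_WORD = re.compile(
    r"[\w\u0900-\u0903\u093a-\u094f\u0951-\u0957\u0962\u0963]+"
)


def words(text: str, pattern: re.Pattern) -> list[str]:
    """Return every run of characters in `text` that `pattern` matches."""
    return pattern.findall(text)


if __name__ == "__main__":
    sample = "क्षत्रिय धर्म"  # two words: kshatriya, dharma
    print(words(sample, NAIVE_WORD))       # fragments split at the combining marks
    print(words(sample, MARK_AWARE_WORD))  # ['क्षत्रिय', 'धर्म']
```

A "Whole Word Search" built on the naive pattern would therefore report matches inside words such as धर्म, which is one concrete reason the report asks for a script-aware segmentation.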
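The sentence-boundary recommendation in the report above can be sketched in a few lines of Python, assuming a verse ends at U+0965 DEVANAGARI DOUBLE DANDA (॥), optionally followed by a verse number in Devanagari or ASCII digits and a closing double danda, as in "॥ १ ॥". The names `VERSE_END` and `split_verses` are hypothetical. The sketch deliberately does not break at the single danda (U+0964), which in shlokas commonly separates the two halves of a verse; a complete segmenter would still need the general sentence rules of UAX #29.

```python
import re

# Assumption: a verse ends at U+0965 DEVANAGARI DOUBLE DANDA, optionally
# followed by a verse number (Devanagari digits U+0966-U+096F or ASCII
# digits) and a closing double danda, e.g. "॥ १ ॥".
VERSE_END = re.compile(r"\u0965(?:\s*[\u0966-\u096f0-9]+\s*\u0965)?")


def split_verses(text: str) -> list[str]:
    """Split `text` after each verse terminator, keeping the terminator."""
    verses = []
    start = 0
    for match in VERSE_END.finditer(text):
        verses.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:  # text after the last terminator becomes its own segment
        verses.append(tail)
    return verses


if __name__ == "__main__":
    shloka = (
        "धर्मक्षेत्रे कुरुक्षेत्रे समवेता युयुत्सवः । "
        "मामकाः पाण्डवाश्चैव किमकुर्वत सञ्जय ॥ १ ॥ "
        "सञ्जय उवाच"
    )
    for verse in split_verses(shloka):
        print(verse)
```

Keeping the number inside the same segment as its double dandas means the verse label "॥ १ ॥" is never orphaned at the start of the next sentence.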
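The deletion and backspace recommendation in the report above distinguishes deleting one code point from deleting one segmentation unit. The Python sketch below contrasts the two on "नमस्ते". Its `CLUSTER` pattern is a simplified, hypothetical approximation of a Devanagari akshara (a consonant with optional nukta, conjuncts joined by virama, an optional vowel sign or final virama, then candrabindu, anusvara or visarga); it is not the ABNF from the linked w3cindia document, and it ignores ZWJ/ZWNJ and other scripts. The function names are likewise illustrative.

```python
import re

# Hypothetical, simplified Devanagari akshara pattern (not the report's ABNF):
# consonant + optional nukta, any number of (virama + consonant) conjunct
# links, an optional final virama or dependent vowel sign, then any
# candrabindu / anusvara / visarga.
CONSONANT = r"[\u0915-\u0939\u0958-\u095f]\u093c?"
AKSHARA = (
    CONSONANT
    + r"(?:\u094d" + CONSONANT + r")*"
    + r"(?:\u094d|[\u093e-\u094c\u0962\u0963])?"
    + r"[\u0900-\u0903]*"
)
# Independent vowels (with modifiers) and, as a fallback, any single code point.
INDEPENDENT_VOWEL = r"[\u0904-\u0914\u0960\u0961][\u0900-\u0903]*"
CLUSTER = re.compile(AKSHARA + "|" + INDEPENDENT_VOWEL + "|.", re.DOTALL)


def segment(text: str) -> list[str]:
    """Split `text` into the simplified clusters defined above."""
    return CLUSTER.findall(text)


def backspace_code_point(text: str) -> str:
    """Delete the last Unicode code point only."""
    return text[:-1]


def backspace_cluster(text: str) -> str:
    """Delete the whole last cluster, e.g. a conjunct plus its vowel sign."""
    return "".join(segment(text)[:-1])


if __name__ == "__main__":
    word = "नमस्ते"  # na, ma, ste (sa + virama + ta + vowel sign e)
    print(segment(word))               # ['न', 'म', 'स्ते']
    print(backspace_code_point(word))  # नमस्त
    print(backspace_cluster(word))     # नम
```

Deleting by code point removes only the vowel sign े and leaves "नमस्त", while deleting by cluster removes the whole syllable "स्ते"; offering both, as the report suggests, lets an editor support either behaviour.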
Received on Thursday, 6 December 2012 12:08:44 UTC