- From: <bugzilla@jessica.w3.org>
- Date: Thu, 06 Dec 2012 12:08:37 +0000
- To: public-css-bugzilla@w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=20272
Bug ID: 20272
Summary: Word Boundaries (Hyphenation) in Indian languages (UAX#29) Text Segmentation
Classification: Unclassified
Product: CSS
Version: unspecified
Hardware: PC
URL: http://w3cindia.in/ABNFValidSegmentationdocument.html#uax29
OS: Windows XP
Status: NEW
Keywords: needsAction
Severity: major
Priority: P2
Component: Text
Assignee: fantasai.bugs@inkedblade.net
Reporter: tyagi@w3.org
QA Contact: public-css-bugzilla@w3.org
CC: kojiishi@gluesoft.co.jp, somnath@w3.org, swaran@w3.org, tyagi@w3.org
Created attachment 1261
--> https://www.w3.org/Bugs/Public/attachment.cgi?id=1261&action=edit
Complete description of these issues
Word Boundaries (Hyphenation):
Word boundaries are used in a number of different contexts. The most familiar
ones are selection (double-click mouse selection, or “move to next word”
control-arrow keys), and “Whole Word Search” for search and replace. They are
also used in database queries, to determine whether elements are within a
certain number of words of one another.
Recommended solution: ABNF Valid Segmentation, plus a hyphenation dictionary (if available).
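As an illustration only, the sketch below shows what syllable-level ("akshara") segmentation could look like for Devanagari, so that word-internal break or hyphenation opportunities fall only between aksharas. The grammar is a simplified, hypothetical stand-in written for this example; it is not the ABNF from the w3cindia.in document linked above, and it ignores ZWJ/ZWNJ, Vedic signs, and several less common vowel signs. The names `segment_aksharas` and `break_offsets` are illustrative.

```python
import re

# Hypothetical, simplified Devanagari classes (an assumption, NOT the
# W3C India ABNF; ZWJ/ZWNJ, Vedic signs and rarer vowel signs are ignored).
CONSONANT = "[\u0915-\u0939\u0958-\u095F]"
NUKTA = "\u093C"
VIRAMA = "\u094D"
MATRA = "[\u093E-\u094C\u0962\u0963]"  # dependent vowel signs
MODIFIER = "[\u0900-\u0903]"           # candrabindu, anusvara, visarga
VOWEL = "[\u0904-\u0914\u0960\u0961]"  # independent vowels

# akshara := (C [nukta] virama)* C [nukta] (virama | [matra] [modifier])
#          | V [modifier]
#          | any other single code point (fallback)
AKSHARA = re.compile(
    f"(?:{CONSONANT}{NUKTA}?{VIRAMA})*{CONSONANT}{NUKTA}?"
    f"(?:{VIRAMA}|{MATRA}?{MODIFIER}?)"
    f"|{VOWEL}{MODIFIER}?"
    "|.",
    re.DOTALL,
)


def segment_aksharas(text: str) -> list[str]:
    """Split text into simplified aksharas; other code points stand alone."""
    return AKSHARA.findall(text)


def break_offsets(text: str) -> list[int]:
    """Code point offsets inside `text` that do not split an akshara.

    A hyphenation dictionary, if available, would further filter these.
    """
    offsets, pos = [], 0
    for piece in segment_aksharas(text)[:-1]:
        pos += len(piece)
        offsets.append(pos)
    return offsets
```

For example, "संस्कृत" segments as "सं", "स्कृ", "त" under this sketch, so the conjunct "स्क" is never offered as a break point.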
Sentence Boundaries
Recommended solution: Handle special sentence boundaries such as the double poorna virama (U+0965 DEVANAGARI DOUBLE DANDA), possibly combined with numbers (as in Sanskrit texts, shlokas, etc.).
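One possible treatment of the poorna virama (U+0964 DEVANAGARI DANDA) and double poorna virama (U+0965 DEVANAGARI DOUBLE DANDA) is sketched below. It assumes that a number placed between two double dandas (for example "॥ १ ॥" or "॥१॥") should stay attached to the sentence it closes. `split_sentences` and `TERMINATOR` are hypothetical names, and treating every single danda as a sentence end (even mid-shloka) is a simplifying policy choice of this sketch, not part of the recommendation.

```python
import re

DANDA = "\u0964"         # poorna virama
DOUBLE_DANDA = "\u0965"  # double poorna virama
DIGITS = "0-9\u0966-\u096F"  # ASCII and Devanagari digits

# A terminator is either a numbered verse end such as "॥ १ ॥" / "॥१॥",
# or a single danda / double danda. The numbered form is tried first so
# the number stays with the sentence it closes.
TERMINATOR = re.compile(
    rf"{DOUBLE_DANDA}\s*[{DIGITS}]+\s*{DOUBLE_DANDA}|[{DANDA}{DOUBLE_DANDA}]"
)


def split_sentences(text: str) -> list[str]:
    """Split text after each terminator, trimming surrounding whitespace."""
    sentences, start = [], 0
    for match in TERMINATOR.finditer(text):
        piece = text[start:match.end()].strip()
        if piece:
            sentences.append(piece)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```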
A string of Unicode-encoded text often needs to be broken up into text elements
programmatically. Common examples of text elements include what users think of
as characters, words, lines (more precisely, where line breaks are allowed),
and sentences. The precise determination of text elements may vary according to
orthographic conventions for a given script or language. The goal of matching
user perceptions cannot always be met exactly because the text alone does not
always contain enough information to unambiguously decide boundaries. For
example, the period (U+002E FULL STOP) is used ambiguously, sometimes for
end-of-sentence purposes, sometimes for abbreviations, and sometimes for
numbers. In most cases, however, programmatic text boundaries can match user
perceptions quite closely, although sometimes the best that can be done is not
to surprise the user.
Solution
Grapheme Cluster Boundaries: Based on ABNF Valid Segmentation, possibly extended to handle some cases (?)
Deletion and backspace: By code point as well as by ABNF Valid Segmentation
Mouse Selection: At both the ABNF Valid Segmentation and code point levels
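The difference between deleting by code point and deleting by segment can be sketched as follows. As a simplifying assumption, a "cluster" here is a base character plus any trailing combining marks (Unicode general categories Mn, Mc, Me), extended backwards across virama-joined consonants; this is a rough stand-in for ABNF Valid Segmentation, not an implementation of it. All function names are hypothetical.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)


def is_mark(ch: str) -> bool:
    # True for combining marks: general categories Mn, Mc and Me.
    return unicodedata.category(ch).startswith("M")


def last_cluster_start(text: str) -> int:
    """Index at which the final simplified cluster of `text` begins."""
    i = len(text)
    while i > 0:
        # Step back over trailing combining marks (matras, nukta, virama...).
        while i > 0 and is_mark(text[i - 1]):
            i -= 1
        if i == 0:
            break
        i -= 1  # the base character
        # A virama before the base joins it to the preceding consonant
        # (a conjunct), so keep walking backwards.
        if i > 0 and text[i - 1] == VIRAMA:
            i -= 1
            continue
        break
    return i


def backspace_code_point(text: str) -> str:
    """Delete exactly one code point."""
    return text[:-1]


def backspace_cluster(text: str) -> str:
    """Delete the whole final cluster."""
    return text[:last_cluster_start(text)]
```

On "हिन्दी", code point backspace removes only the final vowel sign "ी", while cluster backspace removes "न्दी". The same boundary function could also be used to snap mouse-selection endpoints, with code point level movement kept as a second mode.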
Received on Thursday, 6 December 2012 12:08:44 UTC