
[Bug 20272] New: Word Boundaries (Hyphenation) in Indian languages (UAX#29) Text Segmentation

From: <bugzilla@jessica.w3.org>
Date: Thu, 06 Dec 2012 12:08:37 +0000
To: public-css-bugzilla@w3.org
Message-ID: <bug-20272-5148@http.www.w3.org/Bugs/Public/>
https://www.w3.org/Bugs/Public/show_bug.cgi?id=20272

            Bug ID: 20272
           Summary: Word Boundaries (Hyphenation) in Indian languages
                    (UAX#29) Text Segmentation
    Classification: Unclassified
           Product: CSS
           Version: unspecified
          Hardware: PC
               URL: http://w3cindia.in/ABNFValidSegmentationdocument.html#
                    uax29
                OS: Windows XP
            Status: NEW
          Keywords: needsAction
          Severity: major
          Priority: P2
         Component: Text
          Assignee: fantasai.bugs@inkedblade.net
          Reporter: tyagi@w3.org
        QA Contact: public-css-bugzilla@w3.org
                CC: kojiishi@gluesoft.co.jp, somnath@w3.org,
                    swaran@w3.org, tyagi@w3.org

Created attachment 1261
  --> https://www.w3.org/Bugs/Public/attachment.cgi?id=1261&action=edit
Complete description of this issue

Word Boundaries (Hyphenation): 
Word boundaries are used in a number of different contexts. The most familiar
ones are selection (double-click mouse selection, or “move to next word”
control-arrow keys), and “Whole Word Search” for search and replace. They are
also used in database queries, to determine whether elements are within a
certain number of words of one another.

Recommended solution: ABNF Valid Segmentation and hyphenation dictionary (if
available)

Sentence Boundaries
Recommended solution: Treat some special marks as sentence boundaries, such as
the double poorna virama (U+0965 DEVANAGARI DOUBLE DANDA), possibly with
numbers (as in Sanskrit text, shlokas, etc.).

A string of Unicode-encoded text often needs to be broken up into text elements
programmatically. Common examples of text elements include what users think of
as characters, words, lines (more precisely, where line breaks are allowed),
and sentences. The precise determination of text elements may vary according to
orthographic conventions for a given script or language. The goal of matching
user perceptions cannot always be met exactly because the text alone does not
always contain enough information to unambiguously decide boundaries. For
example, the period (U+002E FULL STOP) is used ambiguously, sometimes for
end-of-sentence purposes, sometimes for abbreviations, and sometimes for
numbers. In most cases, however, programmatic text boundaries can match user
perceptions quite closely, although sometimes the best that can be done is not
to surprise the user. 
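
As an illustration of the sentence-boundary recommendation above, the following
is a minimal sketch, not the proposed ABNF-based method. It assumes that a
sentence ends at a poorna virama (U+0964) or double poorna virama (U+0965), and
that a verse number enclosed by a second double poorna virama (as in shlokas)
stays with the sentence it closes. The function name split_sentences and the
exact pattern are assumptions made for this example only.

```python
import re

DANDA = "\u0964"         # DEVANAGARI DANDA (poorna virama)
DOUBLE_DANDA = "\u0965"  # DEVANAGARI DOUBLE DANDA (double poorna virama)

# Assumed terminator: a danda or double danda, optionally followed by a verse
# number (ASCII or Devanagari digits) and a closing double danda.
_TERMINATOR = re.compile(
    "[" + DANDA + DOUBLE_DANDA + "]"
    + r"(?:\s*[0-9\u0966-\u096F]+\s*" + DOUBLE_DANDA + ")?"
)


def split_sentences(text):
    """Split text after each terminator; hypothetical helper for illustration."""
    sentences = []
    start = 0
    for match in _TERMINATOR.finditer(text):
        chunk = text[start:match.end()].strip()
        if chunk:
            sentences.append(chunk)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

Because only the danda characters are treated as terminators here, U+002E FULL
STOP never splits the text, which sidesteps the abbreviation and number
ambiguity at the cost of ignoring sentences that do end with a period.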

Solution

Grapheme Cluster Boundaries: Based on ABNF Valid Segmentation, with a possible
extension for handling some cases (?)
Deletion and backspace: By code point as well as by ABNF Valid Segmentation.
Mouse Selection: At the ABNF Valid Segmentation level and at the code point level.
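
The difference between code point level and segment level editing can be
sketched as follows. This is a simplified stand-in and does not implement the
ABNF Valid Segmentation grammar from the linked document. It assumes that
combining marks (Unicode categories Mn and Mc) attach to the preceding
character and that a letter following the Devanagari virama (U+094D) joins the
same conjunct. The names segments, backspace_code_point and backspace_segment
are hypothetical.

```python
import unicodedata

VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA (halant)


def segments(text):
    """Group code points into user-perceived units (simplified assumption)."""
    clusters = []
    for ch in text:
        category = unicodedata.category(ch)
        joins_previous = bool(clusters) and (
            # Dependent vowel signs, anusvara, virama and similar marks.
            category in ("Mn", "Mc")
            # A consonant right after a virama continues the conjunct.
            or (clusters[-1].endswith(VIRAMA) and category == "Lo")
        )
        if joins_previous:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def backspace_code_point(text):
    """Delete the last code point only."""
    return text[:-1]


def backspace_segment(text):
    """Delete the whole last segment."""
    return "".join(segments(text)[:-1])
```

For a syllable made of a consonant plus a dependent vowel sign,
backspace_code_point removes only the vowel sign, while backspace_segment
removes the whole syllable. An editor could offer both behaviours, as the
report suggests.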

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
Received on Thursday, 6 December 2012 12:08:44 GMT
