- From: Robert Sanderson <azaroth42@gmail.com>
- Date: Wed, 27 May 2015 19:30:58 -0700
- To: "Phillips, Addison" <addison@lab126.com>
- Cc: Doug Schepers <schepers@w3.org>, i18n WG <www-international@w3.org>, Richard Ishida <ishida@w3.org>, W3C Public Annotation List <public-annotation@w3.org>
- Message-ID: <CABevsUGVZ+fxbs81i_+USrsZKD31eiEodWDgxe++a03zajOZWw@mail.gmail.com>
Heh. I confused myself with thinking in American dates (6/3 to me is the 6th of March). Yes, June 3rd.

We would love to have as many people from i18n WG join as can make it!

Our call in details:

Logistics:
IRC channel is #annotation
voice WebEx via computer: https://mit.webex.com/mit/j.php?MTID=me422bef2c6690852d7d9a2cf39f591b8
or direct dial in: +1-617-324-0000, Access code: 645 413 954

We continue to use zakim to manage the queue (q+, q-, q? etc),
Details in the wiki https://www.w3.org/annotation/wiki/WebEx

Many thanks for your prompt and willing responses, Addison! Greatly appreciated :)

Rob

On Wed, May 27, 2015 at 9:03 AM, Phillips, Addison <addison@lab126.com> wrote:

> Hello Rob,
>
> Please confirm that this call would be on Wednesday, June 3rd (6 June is Saturday?). I’d be glad to participate. Others from the Internationalization WG may also wish to: you should plan for no more than three of us to turn up. Please provide participation information.
>
> Thanks,
>
> Addison
>
> *From:* Robert Sanderson [mailto:azaroth42@gmail.com]
> *Sent:* Wednesday, May 27, 2015 8:28 AM
> *To:* Phillips, Addison
> *Cc:* Doug Schepers; i18n WG; Richard Ishida; W3C Public Annotation List
> *Subject:* Re: Feedback on i28n in Rangefinder API
>
> Dear all,
>
> Apologies from Frederick and myself for letting the timing for the discussion fall off the radar.
>
> Would it be possible to join a call next week on Wednesday June 6 at 8am PST / 11am EST / 4pm UK / 5pm Europe to discuss internationalization issues regarding annotation?
>
> In particular, it would be great to make progress on the points that Addison made and also the issue that Takeshi brought up at the F2F regarding different lengths of character strings in different (programming) languages.
>
> Thanks!
>
> Rob
>
> On Tue, May 12, 2015 at 1:09 PM, Phillips, Addison <addison@lab126.com> wrote:
>
> Some comments from reading the document through initially. I understand that this is a work in progress.
>
> 'caseFolding': There is a default Unicode case folding. However, it is not applicable in all cases. For example, see the note box in [1]. Certainly a default case folding could be the default. But there should be a means of tailoring the case fold using a language tag.
>
> 'unicodeFolding': This also presents a number of difficulties. Not just canonical (NFC/NFD) equivalence but also compatibility equivalence (NFKC/NFKD) is sometimes useful. In addition, there are textual variations that are not related to Unicode character properties that searches may wish to deal with. For example, Japanese uses both katakana and hiragana phonetic scripts: one might wish to normalize these differences away when searching text. In other words, I think probably this parameter needs more thought.
>
> As an aside, there are other things that you note that users might want to ignore/not ignore when searching. This is discussed at length in UTS#10, Chapter 8 [2] and language-specific tailoring and different "weights" come into play.
>
> 'wholeWord': This seems simple at first, but some languages (Thai, Japanese, Chinese) that do not use spaces between words have a difficult relationship with this feature. This doesn't make the feature invalid, but does require a health warning that the items selected may not, in fact, always be words.
>
> Normalization in general: it may be possible that the searched text is itself not provided in a normalized form. Health warnings or solid implementation guidance is certainly necessary here.
>
> The discussion of using Unicode decomposition in section 9 might need to be carefully thought through.
> For example, the Korean Hangul script decomposes in a way that might interfere with searching operations (a character that had a Levenshtein distance of '1' when composed might have a distance as large as '4' when decomposed).
>
> The example 'character count': what exactly would be counted here? Unicode code points? Graphemes?
>
> There are invisible characters in Unicode, such as variation selectors or the new emoji skin tone characters, which may not meaningfully affect the user's intention, but might prevent searches from being successful.
>
> Anyway, food for thought. I look forward to further discussion.
>
> ~Addison
>
> [1] http://w3c.github.io/charmod-norm/#definitionCaseFolding
> [2] http://www.unicode.org/reports/tr10/#Searching
>
> > -----Original Message-----
> > From: Doug Schepers [mailto:schepers@w3.org]
> > Sent: Tuesday, May 12, 2015 11:47 AM
> > To: i18n WG; Richard Ishida; Phillips, Addison; W3C Public Annotation List
> > Subject: Feedback on i28n in Rangefinder API
> >
> > Hi, Addison, Richard, I18n–
> >
> > Oops, hit send too soon, sorry... resending.
> >
> > (BCCing the Web Annotation WG mailing list, to keep them in the loop)
> >
> > I'd like to schedule a liaison telcon between the Internationalization WG and the Web Annotation WG, to discuss issues around a client-side API for searching for strings in a web document.
> >
> > The Web Annotation WG is chartered to deliver a spec for "fuzzy anchoring", which basically means a way to link to a specific passage in a document, even if there is no ID and even if the document may have changed.
> >
> > One manifestation of this is my Rangefinder API spec [1], which is basically a find-in-page API with fuzzy matching (e.g. case folding, Levenshtein distance tolerance, Unicode normalization [2]) and location scoping.
> >
> > For the Unicode normalization, we'd like to refer normatively to the updated Charmod-Norm [3].
> > In any case, we'd like to discuss our use cases and requirements around i18n with you, for your best advice on how we should proceed.
> >
> > I spoke with Richard today, and he suggested the best next step would be have you take a look at my rough early draft of the Rangefinder API, so we have some basis for discussion. Please excuse the sketchy nature of the spec, and note that the examples are illustrative but out of date with the spec's development.
> >
> > If you want to meet, would you want to join us, or have some of us join you? We normally meet on Wednesdays at 11am ET.
> >
> > [1] http://w3c.github.io/rangefinder/
> > [2] http://w3c.github.io/rangefinder/#widl-RangeFinder-unicodeFolding
> > [3] http://www.w3.org/TR/2014/WD-charmod-norm-20140715/
> >
> > Regards–
> > –Doug
>
> --
> Rob Sanderson
> Information Standards Advocate
> Digital Library Systems and Services
> Stanford, CA 94305

--
Rob Sanderson
Information Standards Advocate
Digital Library Systems and Services
Stanford, CA 94305
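
For readers following the thread, two of the quoted points (Hangul decomposition changing edit distance, and what a "character count" actually counts) can be illustrated with a rough Python sketch. It is not taken from the Rangefinder draft. The helper names levenshtein and lengths and the sample strings are assumptions chosen for illustration, and only the standard unicodedata module is used. Grapheme counting is left out because the Python standard library has no grapheme segmenter.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Edit distance counted in Unicode code points (Python str elements)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def lengths(text: str) -> dict:
    """Report the length of `text` under several common counting units."""
    return {
        "code_points": len(text),
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        "utf8_bytes": len(text.encode("utf-8")),
    }


# Two precomposed Hangul syllables (U+D55C and U+AE00) that share no jamo.
a, b = "\uD55C", "\uAE00"
a_nfd = unicodedata.normalize("NFD", a)  # three conjoining jamo
b_nfd = unicodedata.normalize("NFD", b)  # three different conjoining jamo

composed_distance = levenshtein(a, b)            # one substitution
decomposed_distance = levenshtein(a_nfd, b_nfd)  # every jamo differs

# One user-perceived character: thumbs-up sign plus a skin tone modifier.
emoji_lengths = lengths("\U0001F44D\U0001F3FD")
```

For this particular pair the distance grows from 1 (composed) to 3 (NFD); other inputs or normalization forms may give different figures. The same emoji reports a different length in each counting unit, which is the kind of mismatch between programming languages raised at the F2F.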
Received on Thursday, 28 May 2015 02:31:30 UTC