Re: Feedback on i18n in Rangefinder API from Robert Sanderson on 2015-05-28 (public-annotation@w3.org from May 2015)

From: Robert Sanderson <azaroth42@gmail.com>
Date: Wed, 27 May 2015 19:30:58 -0700
To: "Phillips, Addison" <addison@lab126.com>
Cc: Doug Schepers <schepers@w3.org>, i18n WG <www-international@w3.org>, Richard Ishida <ishida@w3.org>, W3C Public Annotation List <public-annotation@w3.org>
Message-ID: <CABevsUGVZ+fxbs81i_+USrsZKD31eiEodWDgxe++a03zajOZWw@mail.gmail.com>
Heh. I confused myself with thinking in American dates (6/3 to me is the
6th of March).  Yes, June 3rd.
We would love to have as many people from i18n WG join as can make it!

Our call-in details:

Logistics: IRC channel is #annotation

Voice: WebEx via computer:
    https://mit.webex.com/mit/j.php?MTID=me422bef2c6690852d7d9a2cf39f591b8
or direct dial-in: +1-617-324-0000, Access code: 645 413 954

We continue to use zakim to manage the queue (q+, q-, q?, etc.). Details are
on the wiki: https://www.w3.org/annotation/wiki/WebEx


Many thanks for your prompt and willing responses, Addison! Greatly
appreciated :)

Rob

On Wed, May 27, 2015 at 9:03 AM, Phillips, Addison <addison@lab126.com>
wrote:

> Hello Rob,
>
> Please confirm that this call would be on Wednesday, June 3rd (6 June is
> Saturday?). I’d be glad to participate. Others from the
> Internationalization WG may also wish to: you should plan for no more than
> three of us to turn up. Please provide participation information.
>
> Thanks,
>
> Addison
>
> *From:* Robert Sanderson [mailto:azaroth42@gmail.com]
> *Sent:* Wednesday, May 27, 2015 8:28 AM
> *To:* Phillips, Addison
> *Cc:* Doug Schepers; i18n WG; Richard Ishida; W3C Public Annotation List
> *Subject:* Re: Feedback on i18n in Rangefinder API
>
> Dear all,
>
> Apologies from Frederick and myself for letting the timing for the
> discussion fall off the radar.
>
> Would it be possible to join a call next week on Wednesday June 6 at 8am
> PST / 11am EST / 4pm UK / 5pm Europe to discuss internationalization issues
> regarding annotation?
>
> In particular, it would be great to make progress on the points that
> Addison made and also the issue that Takeshi brought up at the F2F
> regarding different lengths of character strings in different (programming)
> languages.
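Takeshi's point can be illustrated with a short Python sketch (the helper `length_report` is hypothetical): the same three-character string has a different length depending on whether a language counts code points, UTF-16 code units, or UTF-8 bytes.

```python
def length_report(s: str) -> dict:
    """Measure the same string in three different units."""
    return {
        # Python's len() counts Unicode code points.
        "code_points": len(s),
        # JavaScript's String.length and Java's String.length() count UTF-16
        # code units; a character outside the BMP needs two (a surrogate pair).
        "utf16_units": len(s.encode("utf-16-le")) // 2,
        # Byte-oriented APIs (C strlen() on UTF-8 data, Go's len() on a
        # string) count UTF-8 bytes.
        "utf8_bytes": len(s.encode("utf-8")),
    }


# U+20BB7 (𠮷) lies outside the Basic Multilingual Plane.
print(length_report("𠮷野家"))
# {'code_points': 3, 'utf16_units': 4, 'utf8_bytes': 10}
```

A selector that records offsets or lengths therefore needs to state which unit it uses; otherwise an offset computed in one language can point somewhere else in another.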
>
> Thanks!
>
> Rob
>
> On Tue, May 12, 2015 at 1:09 PM, Phillips, Addison <addison@lab126.com>
> wrote:
>
> Some comments from reading the document through initially. I understand
> that this is a work in progress.
>
> 'caseFolding': There is a default Unicode case folding. However, it is not
> applicable in all cases. For example, see the note box in [1]. Default case
> folding could certainly be the default behavior, but there should be a means
> of tailoring the case fold using a language tag.
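A minimal sketch of language-tailored case folding, assuming Python's `str.casefold()` (which applies Unicode default full case folding) as the default. The `fold_case` helper and its handling of only Turkish and Azerbaijani are illustrative assumptions; a production implementation would more likely rely on a locale-aware library such as ICU.

```python
def fold_case(text: str, lang: str = "") -> str:
    """Case-fold text, optionally tailored by a BCP 47 language tag.

    Hypothetical helper: only the Turkic tailoring is implemented here.
    """
    primary = lang.split("-")[0].lower()
    if primary in ("tr", "az"):
        # Turkic tailoring (the "T" entries in CaseFolding.txt): capital
        # dotted I folds to plain i, and plain capital I folds to dotless ı.
        text = text.replace("\u0130", "i").replace("I", "\u0131")
    # Unicode default full case folding for everything else.
    return text.casefold()


print(fold_case("KIRMIZI"))         # kirmizi  (default folding)
print(fold_case("KIRMIZI", "tr"))   # kırmızı  (Turkish tailoring)
```

With default folding, "İSTANBUL" becomes "i" plus U+0307 COMBINING DOT ABOVE followed by "stanbul", so a search for "istanbul" would miss it unless the Turkish tailoring is applied.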
>
> 'unicodeFolding': This also presents a number of difficulties. Not just
> canonical (NFC/NFD) equivalence but also compatibility equivalence
> (NFKC/NFKD) is sometimes useful. In addition, there are textual variations
> that are not related to Unicode character properties that searches may wish
> to deal with. For example, Japanese uses both katakana and hiragana
> phonetic scripts: one might wish to normalize these differences away when
> searching text. In other words, I think this parameter probably needs more
> thought.
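The difference between canonical and compatibility normalization, plus the kana folding Addison mentions, can be sketched with Python's standard `unicodedata` module. The names `search_key` and `fold_kana` are hypothetical, and the fixed 0x60 offset only covers katakana U+30A1..U+30F6; other kana (for example U+30F7..U+30FA, or the long-vowel mark) are left untouched.

```python
import unicodedata


def fold_kana(text: str) -> str:
    """Map katakana U+30A1..U+30F6 onto the matching hiragana (offset 0x60)."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch for ch in text
    )


def search_key(text: str, compatibility: bool = False, kana: bool = False) -> str:
    """Build a comparison key: NFC always, NFKC and kana folding on request."""
    text = unicodedata.normalize("NFKC" if compatibility else "NFC", text)
    return fold_kana(text) if kana else text


print(search_key("\ufb01le"))                     # 'ﬁle' (ligature kept by NFC)
print(search_key("\ufb01le", compatibility=True)) # 'file'
print(search_key("ｶﾀｶﾅ", compatibility=True, kana=True))  # 'かたかな'
```

Halfwidth katakana only reach the hiragana key because NFKC first maps them to ordinary katakana; with canonical normalization alone they would stay distinct.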
>
> As an aside, there are other things you note that users might want to
> ignore or not ignore when searching. This is discussed at length in UTS #10,
> Section 8 [2], where language-specific tailoring and different "weights"
> come into play.
>
> 'wholeWord': This seems simple at first, but some languages (Thai,
> Japanese, Chinese) that do not use spaces between words have a difficult
> relationship with this feature. This doesn't make the feature invalid, but
> does require a health warning that the items selected may not, in fact,
> always be words.
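To show why `wholeWord` needs that warning, here is a naive Python sketch (the helper `whole_word_search` is hypothetical) that relies on regular-expression word boundaries. Because kana and kanji are all word characters to `\b`, a run of Japanese text has no internal boundaries, so a genuine word is not found.

```python
import re


def whole_word_search(text: str, word: str) -> bool:
    """Naive whole-word match using regex word boundaries."""
    return re.search(r"\b" + re.escape(word) + r"\b", text) is not None


print(whole_word_search("the cat sat", "cat"))     # True
print(whole_word_search("concatenate", "cat"))     # False, as intended
print(whole_word_search("私は東京に行く", "東京"))  # False, although 東京 is a word here
```

Dictionary-based segmentation (for example ICU's word break iterator) does better, but its results are still heuristic for these scripts.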
>
> Normalization in general: the searched text may itself not be in a
> normalized form. Health warnings or solid implementation guidance are
> certainly necessary here.
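A minimal sketch of that guidance: normalize both the searched text and the query to the same form before comparing. `normalized_contains` is a hypothetical helper built on Python's `unicodedata.normalize`.

```python
import unicodedata


def normalized_contains(haystack: str, needle: str, form: str = "NFC") -> bool:
    """Normalize both strings to the same form before a substring test."""
    return unicodedata.normalize(form, needle) in unicodedata.normalize(form, haystack)


decomposed = "cafe\u0301"   # 'e' + COMBINING ACUTE ACCENT
precomposed = "caf\u00e9"   # 'é' as a single code point
print(precomposed in decomposed)                     # False
print(normalized_contains(decomposed, precomposed))  # True
```

Normalization can change string length, so any offsets found in the normalized text would have to be mapped back to the original document.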
>
> The discussion of using Unicode decomposition in section 9 might need to
> be carefully thought through. For example, the Korean Hangul script
> decomposes in a way that might interfere with searching operations (a
> character that had a Levenshtein distance of '1' when composed might have a
> distance as large as '4' when decomposed).
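The Hangul effect can be reproduced with a plain Levenshtein implementation (a sketch; the function is not taken from the Rangefinder draft). Comparing two precomposed syllables costs one substitution, but after NFD decomposition into conjoining jamo the same pair differs in every position.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance over code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


a, b = "한", "글"  # U+D55C and U+AE00, one syllable each
print(levenshtein(a, b))  # 1
print(levenshtein(unicodedata.normalize("NFD", a),
                  unicodedata.normalize("NFD", b)))  # 3 (three jamo each)
```

A fixed distance threshold therefore means something quite different depending on whether matching runs on composed or decomposed text.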
>
> The example 'character count': what exactly would be counted here? Unicode
> code points? Graphemes?
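A quick Python illustration of why the unit matters: `len()` counts code points, so a decomposed accented letter counts twice. The `rough_user_char_count` helper is a hypothetical, deliberately crude approximation that skips combining marks; proper grapheme clusters follow UAX #29 (available in Python through the third-party `regex` module's `\X`).

```python
import unicodedata


def rough_user_char_count(s: str) -> int:
    """Count code points that are not combining marks (categories Mn, Mc, Me).

    This is NOT full UAX #29 grapheme segmentation: emoji ZWJ sequences,
    conjoining Hangul jamo and emoji modifiers are not handled.
    """
    return sum(1 for ch in s if not unicodedata.category(ch).startswith("M"))


s = "nai\u0308ve"  # 'naïve' with a decomposed diaeresis
print(len(s))                    # 6 code points
print(rough_user_char_count(s))  # 5 user-perceived characters
```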
>
> There are invisible characters in Unicode, such as variation selectors or
> the new emoji skin tone characters, which may not meaningfully affect the
> user's intention, but might prevent searches from being successful.
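One possible policy, sketched in Python under the assumption that these characters should simply be ignored for matching: strip variation selectors and emoji skin-tone modifiers from both the query and the text before comparing. `IGNORABLE` and `strip_ignorables` are hypothetical names, and stripping is a choice, not a rule; for ideographic variation sequences the selector may matter to the user.

```python
import re

# Assumed ignorable set: variation selectors VS1-VS16 (U+FE00..U+FE0F),
# VS17-VS256 (U+E0100..U+E01EF) and the five emoji skin-tone modifiers
# (U+1F3FB..U+1F3FF).
IGNORABLE = re.compile("[\uFE00-\uFE0F\U000E0100-\U000E01EF\U0001F3FB-\U0001F3FF]")


def strip_ignorables(text: str) -> str:
    """Remove the assumed-ignorable characters before matching."""
    return IGNORABLE.sub("", text)


query = "\u2764\ufe0f"  # HEAVY BLACK HEART + VS16 (emoji presentation)
text = "I \u2764 it"    # the same heart without the selector
print(query in text)                                       # False
print(strip_ignorables(query) in strip_ignorables(text))   # True
```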
>
> Anyway, food for thought. I look forward to further discussion.
>
> ~Addison
>
> [1] http://w3c.github.io/charmod-norm/#definitionCaseFolding
> [2] http://www.unicode.org/reports/tr10/#Searching
>
> > -----Original Message-----
> > From: Doug Schepers [mailto:schepers@w3.org]
> > Sent: Tuesday, May 12, 2015 11:47 AM
> > To: i18n WG; Richard Ishida; Phillips, Addison; W3C Public Annotation List
> > Subject: Feedback on i18n in Rangefinder API
> >
> > Hi, Addison, Richard, I18n–
> >
> > Oops, hit send too soon, sorry... resending.
> >
> > (BCCing the Web Annotation WG mailing list, to keep them in the loop)
> >
> > I'd like to schedule a liaison telcon between the Internationalization WG
> > and the Web Annotation WG, to discuss issues around a client-side API for
> > searching for strings in a web document.
> >
> > The Web Annotation WG is chartered to deliver a spec for "fuzzy anchoring",
> > which basically means a way to link to a specific passage in a document,
> > even if there is no ID and even if the document may have changed.
> >
> > One manifestation of this is my Rangefinder API spec [1], which is
> > basically a find-in-page API with fuzzy matching (e.g. case folding,
> > Levenshtein distance tolerance, Unicode normalization [2]) and location
> > scoping.
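For readers unfamiliar with the idea, a find-in-page with Levenshtein tolerance can be sketched as approximate substring matching (Sellers' algorithm), with optional case folding and NFC normalization applied first. This is a hypothetical Python sketch, not the Rangefinder API's interface; `fuzzy_find` and its parameters are illustrative assumptions.

```python
import unicodedata
from typing import Optional, Tuple


def fuzzy_find(text: str, pattern: str, max_edits: int,
               case_fold: bool = True, nfc: bool = True) -> Optional[Tuple[int, int]]:
    """Best approximate occurrence of pattern in text (Sellers' algorithm).

    Returns (end_offset, edits) for the lowest-cost match, where end_offset
    indexes the *preprocessed* text, or None if nothing is within max_edits.
    """
    if nfc:
        text, pattern = (unicodedata.normalize("NFC", s) for s in (text, pattern))
    if case_fold:
        text, pattern = text.casefold(), pattern.casefold()
    m = len(pattern)
    prev = list(range(m + 1))  # column for the empty text prefix
    best = None
    for j, tc in enumerate(text, start=1):
        cur = [0]  # a match may begin at any text position at zero cost
        for i in range(1, m + 1):
            cur.append(min(prev[i] + 1,                          # extra text char
                           cur[i - 1] + 1,                       # missing pattern char
                           prev[i - 1] + (pattern[i - 1] != tc)))  # (mis)match
        if cur[m] <= max_edits and (best is None or cur[m] < best[1]):
            best = (j, cur[m])
        prev = cur
    return best


print(fuzzy_find("Annotation anchoring", "ANCHORNG", 1))  # (20, 1)
```

Only the end offset is returned; recovering the start would need a traceback through the matrix. Offsets refer to the preprocessed text, since case folding and normalization can change string length (for example, `"ß".casefold()` is `"ss"`).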
> >
> > For the Unicode normalization, we'd like to refer normatively to the
> > updated Charmod-Norm [3]. In any case, we'd like to discuss our use cases
> > and requirements around i18n with you, for your best advice on how we
> > should proceed.
> >
> > I spoke with Richard today, and he suggested the best next step would be
> > to have you take a look at my rough early draft of the Rangefinder API, so
> > we have some basis for discussion. Please excuse the sketchy nature of the
> > spec, and note that the examples are illustrative but out of date with the
> > spec's development.
> >
> > If you want to meet, would you want to join us, or have some of us join
> > you? We normally meet on Wednesdays at 11am ET.
> >
> >
> > [1] http://w3c.github.io/rangefinder/
> > [2] http://w3c.github.io/rangefinder/#widl-RangeFinder-unicodeFolding
> > [3] http://www.w3.org/TR/2014/WD-charmod-norm-20140715/
> >
> > Regards–
> > –Doug
> >
>
>
>
>
>
> --
>
> Rob Sanderson
>
> Information Standards Advocate
>
> Digital Library Systems and Services
>
> Stanford, CA 94305
>



-- 
Rob Sanderson
Information Standards Advocate
Digital Library Systems and Services
Stanford, CA 94305
Received on Thursday, 28 May 2015 02:31:30 UTC