RE: Rough Draft of Robust Anchoring: the RangeFinder API from Kanai, Takeshi on 2015-02-25 (public-annotation@w3.org from February 2015)

From: Kanai, Takeshi <Takeshi.Kanai@jp.sony.com>
Date: Wed, 25 Feb 2015 10:38:45 +0000
To: Randall Leeds <randall@bleeds.info>, Doug Schepers <schepers@w3.org>, "W3C Public Annotation List" <public-annotation@w3.org>
Message-ID: <E72CF575142F6D4196D04D303E0462DE04E3CC18@JPYOKXMS120.jp.sony.com>
Hello Randall,

You are right. Normalization does not change the glyph.

What I wanted to do was to confirm the intent behind the definition of “asciiFolding” in the document. As a user of a non-Latin script, I couldn’t think of any situation in which it is necessary to map non-Latin characters to Latin characters.
So I interpreted it as canonical search/canonical matching and wrote down the algorithm for clarification purposes, although that algorithm still performs only Latin-to-Latin mapping.


As I understand it, “ascii folding” means expressing a glyph in ASCII letters. For example, “☆” (U+2606) would be mapped to “STAR”. So I think it works only for English text.
As a user of a non-Latin script, I read it as “pronunciation”. (Just in case: I’m not talking about pronunciation codes such as IPA.)
In Japanese, the star glyph would be mapped to Japanese (non-Latin) characters, but such a mapping does not always work, because how a glyph should be mapped depends on context. When content authors want to specify the folded words (or the pronunciation) explicitly, they put a “Ruby annotation” on the words, especially in trade books. Then, for example, “WWW” would be searchable with the words “World Wide Web”. See [1].

Unlike in Latin-script languages, the pronunciation of Japanese words depends on context, and each character can carry several meanings. So you could say that we are always “folding” while we read text.
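For illustration, here is a rough sketch of a naive “ascii folding”, assuming it is implemented as Unicode compatibility decomposition (NFKD) followed by discarding every non-ASCII code point. The function name `naive_ascii_fold` is hypothetical and is not part of the RangeFinder draft. Accented Latin text folds to plain ASCII, while Japanese text and “☆” simply disappear, because no decomposition leads them to Latin letters.

```python
import unicodedata

def naive_ascii_fold(text: str) -> str:
    # Hypothetical folding: compatibility-decompose (NFKD), so "é" becomes
    # "e" + COMBINING ACUTE ACCENT, then keep only ASCII code points.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if ord(ch) < 128)

print(naive_ascii_fold("café"))            # cafe
print(naive_ascii_fold("\ufb01le"))        # file (U+FB01 is the "fi" ligature)
print(repr(naive_ascii_fold("日本語")))     # '' : nothing survives
print(repr(naive_ascii_fold("☆")))         # '' : no ASCII decomposition

# The Unicode character name is one ASCII description of the star glyph.
print(unicodedata.name("\u2606"))          # WHITE STAR
```

Any ASCII rendering of “☆” or of Japanese text has to come from an external table or from author-supplied data such as ruby text, not from Unicode decomposition.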


[1]
http://www.w3.org/TR/ruby/#simple-ruby1



Thanks,
Takeshi Kanai

From: Randall Leeds [mailto:randall@bleeds.info]
Sent: Wednesday, February 25, 2015 5:42 PM
To: Kanai, Takeshi; Doug Schepers; W3C Public Annotation List
Subject: Re: Rough Draft of Robust Anchoring: the RangeFinder API

I was the one who suggested "asciiFolding" to Doug.

I wonder if Unicode normalization should be implied rather than explicit. I am not a Unicode expert, but I thought normalization did not change the glyph, only the byte representation. If that's the case, maybe it's not necessary to expose that in the API.
Any suggestions on how to handle this for non-Latin scripts are very helpful! Thank you!
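A small illustration of the normalization point, using Python's standard `unicodedata` module: “é” can be stored either as one precomposed code point (U+00E9) or as “e” followed by U+0301 COMBINING ACUTE ACCENT. Both render as the same glyph, but a plain string comparison treats them as different; after NFC they compare equal. This suggests a search API could normalize implicitly before matching, as proposed above.

```python
import unicodedata

precomposed = "caf\u00e9"   # "é" as a single code point
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent

# Same glyph on screen, but different code point sequences.
print(precomposed == decomposed)            # False
print(len(precomposed), len(decomposed))    # 4 5

# Canonical normalization (NFC) makes them identical.
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", decomposed)
print(nfc_a == nfc_b)                       # True

# The UTF-8 byte representations differ before normalization.
print(precomposed.encode("utf-8"))          # b'caf\xc3\xa9'
print(decomposed.encode("utf-8"))           # b'cafe\xcc\x81'
```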

On Wed Feb 25 2015 at 12:21:06 AM Kanai, Takeshi <Takeshi.Kanai@jp.sony.com> wrote:
Hi Doug,

I'm afraid that the definition of asciiFolding is not clear enough. Japanese characters are non-Latin, and I don't think it is possible to build a map from them to Latin characters.

I assume that what we would like to do with this attribute is so-called "canonical search" or "canonical matching".
If so, what the attribute calls for is, for example, to apply NFC (Unicode Normalization Form C [1]) first and then use the map defined in the Unicode Collation Algorithm [2]. I don't think it is necessary to write the precise algorithm into the document, but I would like to confirm whether the method above matches the intent of the attribute.
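To make the proposal concrete, a rough Python sketch of “canonical matching”: NFC first, then a comparison that ignores accents and case. Python's standard library has no Unicode Collation Algorithm implementation, so the hypothetical `fold_char` helper substitutes a crude approximation of UCA primary-strength comparison (drop nonspacing marks, then casefold); it is not the UCA map itself, and a real implementation would use a UCA/ICU collator. The names `fold_char` and `canonical_find` are illustrative only and are not part of the RangeFinder draft.

```python
import unicodedata
from typing import List, Optional, Tuple

def fold_char(ch: str) -> str:
    # Stand-in for UCA primary strength (NOT the real UCA): decompose,
    # drop nonspacing combining marks (category Mn), then casefold.
    decomposed = unicodedata.normalize("NFD", ch)
    base = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return base.casefold()

def canonical_find(text: str, query: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets into NFC(text) of the first match, or None."""
    # Step 1: canonical normalization (NFC) of both sides.
    text_nfc = unicodedata.normalize("NFC", text)
    query_nfc = unicodedata.normalize("NFC", query)

    # Step 2: fold each character, remembering which NFC character every
    # folded character came from, so a match maps back to the text.
    key_chars: List[str] = []
    origin: List[int] = []
    for i, ch in enumerate(text_nfc):
        for folded in fold_char(ch):
            key_chars.append(folded)
            origin.append(i)
    key = "".join(key_chars)
    query_key = "".join(fold_char(ch) for ch in query_nfc)

    if not query_key:
        return None
    pos = key.find(query_key)
    if pos < 0:
        return None
    return origin[pos], origin[pos + len(query_key) - 1] + 1

print(canonical_find("Le Cafe\u0301 de Flore", "CAFÉ"))  # (3, 7)
print(canonical_find("Straße", "STRASSE"))               # (0, 6)
print(canonical_find("日本語のテキスト", "テキスト"))      # (4, 8)
```

Note that dropping nonspacing marks also removes the Japanese voiced sound mark, so `fold_char("ガ")` returns "カ", which changes the word. That is one concrete case of the point in this thread that folding rules cannot be defined independently of script and context.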

[1] Unicode Normalization Forms
http://unicode.org/reports/tr15/


[2] Unicode Collation Algorithm
http://unicode.org/reports/tr10/



Thanks,
Takeshi Kanai

-----Original Message-----
From: Doug Schepers [mailto:schepers@w3.org]
Sent: Wednesday, February 25, 2015 2:48 PM
To: W3C Public Annotation List
Subject: Re: Rough Draft of Robust Anchoring: the RangeFinder API

Hi, folks–

Just a quick note. Rob asked me to move this file, to keep the deliverables organized. It's now located at:

  http://w3c.github.io/web-annotation/api/rangefinder/


Even this is a temporary location, though... I'll be moving it to specs.webplatform.org soon, and adding the annotation capability to it.

Feel free to review, but be aware that the URL is transitory.

Regards–
–Doug

On 2/24/15 1:33 PM, Doug Schepers wrote:
> Hi, folks–
>
> After talking about Robust Anchoring with many people over the course
> of the last couple years (!), with encouragement and good criticisms,
> I've refined my notion of what's needed for a client-side API for
> Robust Anchoring.
>
> I've drawn up a strawman of my current thinking for an API called
> RangeFinder [1].
>
> It's very rough in places, but I'd greatly appreciate any feedback,
> thoughts, or opinions on the spec as it stands at this stage.
>
> I'm not sure it's mature enough for this yet, but at some point, I'd
> like to engage the research and academic communities and the experts
> who've published on text search algorithms, to polish this up and make
> it not quite as embarrassing as it is currently. If anyone knows who
> we should contact in that regard, please chime in. This is a great
> opportunity to leverage all that research in the service of Web
> developers and browsers!
>
> [1] http://w3c.github.io/web-annotation/rangefinder-api/

>
> Regards–
> –Doug
>
Received on Wednesday, 25 February 2015 10:39:26 UTC