
Re: Rough Draft of Robust Anchoring: the RangeFinder API

From: Doug Schepers <schepers@w3.org>
Date: Wed, 25 Feb 2015 09:43:57 -0500
Message-ID: <54EDDFAD.3030601@w3.org>
To: "Kanai, Takeshi" <Takeshi.Kanai@jp.sony.com>, Randall Leeds <randall@bleeds.info>, W3C Public Annotation List <public-annotation@w3.org>
Hi, Takeshi–

Thanks for your feedback!

You can see some of my motivation for including "ASCII folding" in these 
articles [1][2]; this is also known as character-folding, 
accent-folding, diacritic-folding, and other names with more or less 
accuracy.
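
To make the idea concrete, here is a minimal sketch of accent-folding in 
Python. It is only an illustration under one assumption: that "folding" 
means decomposing each character (Unicode NFD) and dropping the 
combining marks. The names fold_accents and folded_contains are 
hypothetical and do not appear in the RangeFinder draft.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Strip combining marks after canonical decomposition (NFD).

    Assumption: this is one common reading of "accent folding";
    it is not the definition used by the RangeFinder draft.
    """
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns a non-zero class for combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def folded_contains(haystack: str, needle: str) -> bool:
    """Hypothetical helper: accent- and case-insensitive substring test."""
    return fold_accents(needle).casefold() in fold_accents(haystack).casefold()


if __name__ == "__main__":
    print(fold_accents("café"))
    print(folded_contains("Émile Zola", "emile"))
```

Note that this approach only helps where a character decomposes into a 
base letter plus marks. Characters with no such decomposition, such as 
kanji or "☆" (U+2606), pass through unchanged, so it does not address 
the non-Latin cases raised in this thread.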

I agree that the definition is inadequate at this point (as I 
acknowledge in the spec itself); the name is probably also terrible. 
I don't have the expertise at this point to define it better, but I'm 
very open to suggestions on improving it.

I'm especially interested in concrete suggestions, in references to 
relevant docs and specs (like the Unicode ones you point to), and in use 
cases.


(This strawman draft was mostly intended to start the conversation, 
and it has had precisely that effect. Thanks for bringing your 
expertise to the conversation; it's very helpful.)


[1] http://www.elasticsearch.org/guide/en/elasticsearch/guide/master/asciifolding-token-filter.html
[2] http://alistapart.com/article/accent-folding-for-auto-complete

Regards–
–Doug

On 2/25/15 5:38 AM, Kanai, Takeshi wrote:
> Hello Randall,
>
> You are right. The normalization did not change the glyph.
>
> What I wanted to do was to confirm the intention behind the definition
> of “asciiFolding” in the document. As a user of a non-Latin script, I
> couldn’t think of any situation where it is necessary to map non-Latin
> characters to Latin characters.
>
> Then, I interpreted it as canonical search/canonical matching, and wrote
> down the algorithm for clarification, although it was still doing
> Latin-to-Latin mapping.
>
> What I know as “ascii folding” is to express a glyph in ASCII letters.
> For example, “☆” (U+2606) would be mapped to “STAR”. So, it works only
> for English text, I think.
>
> As a non-Latin user, I interpreted it as “pronunciation”. (To be clear,
> I’m not talking about pronunciation codes such as IPA.)
>
> In Japanese, the star glyph should be mapped to Japanese characters
> (non-Latin), but the mapping does not always work appropriately: how a
> glyph should be mapped depends on context. When content authors want to
> explicitly specify folding words, or pronunciation, they put “Ruby
> annotation” on the words, especially in trade books. Then, “WWW” would
> be searchable with the words “World Wide Web”. See [1]
>
> Unlike in Latin-script languages, the pronunciation of Japanese words
> depends on the context, and each character can carry several meanings.
> So I could say that we are always folding back while we are reading
> text.
>
> [1]
>
> http://www.w3.org/TR/ruby/#simple-ruby1
>
> Thanks,
>
> Takeshi Kanai
>
> *From:*Randall Leeds [mailto:randall@bleeds.info]
> *Sent:* Wednesday, February 25, 2015 5:42 PM
> *To:* Kanai, Takeshi; Doug Schepers; W3C Public Annotation List
> *Subject:* Re: Rough Draft of Robust Anchoring: the RangeFinder API
>
> I was the one who suggested "asciiFolding" to Doug.
>
> I wonder if Unicode normalization should be implied rather than
> explicit. I am not a Unicode expert but I thought normalization did not
> change the glyph, only the byte representation. If that's the case,
> maybe it's not necessary to expose that in the API.
>
> Any suggestions on how to handle this for non-Latin scripts would be
> very helpful! Thank you!
>
> On Wed Feb 25 2015 at 12:21:06 AM Kanai, Takeshi
> <Takeshi.Kanai@jp.sony.com> wrote:
>
> Hi Doug,
>
> I'm afraid that the definition of asciiFolding is not clear enough.
> Japanese characters are non-Latin characters, but I don't think it is
> possible to make a map which points to Latin characters.
>
> I assume that what we would like to do with this attribute is so-called
> "canonical search" or "canonical matching".
> If so, what the attribute calls for is to apply NFC (Unicode
> Normalization Form C [1]) first and then use the map defined in the
> Unicode Collation Algorithm [2], for example. I don't think it is
> necessary to write down the precise algorithm in the document, but I
> would like to make sure whether the method above meets the intention of
> the attribute or not.
>
> [1] Unicode Normalization Forms
> http://unicode.org/reports/tr15/
>
> [2] Unicode Collation Algorithm
> http://unicode.org/reports/tr10/
>
>
> Thanks,
> Takeshi Kanai
>
> -----Original Message-----
> From: Doug Schepers [mailto:schepers@w3.org]
> Sent: Wednesday, February 25, 2015 2:48 PM
> To: W3C Public Annotation List
> Subject: Re: Rough Draft of Robust Anchoring: the RangeFinder API
>
> Hi, folks–
>
> Just a quick note. Rob asked me to move this file, to keep the
> deliverables organized. It's now located at:
>
> http://w3c.github.io/web-annotation/api/rangefinder/
>
> Even this is a temporary location, though... I'll be moving it to
> specs.webplatform.org <http://specs.webplatform.org> soon, and adding
> the annotation capability to it.
>
> Feel free to review, but be aware that the URL is transitory.
>
> Regards–
> –Doug
>
> On 2/24/15 1:33 PM, Doug Schepers wrote:
>> Hi, folks–
>>
>> After talking about Robust Anchoring with many people over the course
>> of the last couple years (!), with encouragement and good criticisms,
>> I've refined my notion of what's needed for a client-side API for
>> Robust Anchoring.
>>
>> I've drawn up a strawman of my current thinking for an API called
>> RangeFinder [1].
>>
>> It's very rough in places, but I'd appreciate any feedback on the spec
>> as it stands. I'd greatly appreciate any thoughts or opinions on it at
>> this stage.
>>
>> I'm not sure it's mature enough for this yet, but at some point, I'd
>> like to engage the research and academic communities and the experts
>> who've published on text search algorithms, to polish this up and make
>> it not quite as embarrassing as it is currently. If anyone knows who
>> we should contact in that regard, please chime in. This is a great
>> opportunity to leverage all that research in the service of Web
>> developers and browsers!
>>
>> [1] http://w3c.github.io/web-annotation/rangefinder-api/
>>
>> Regards–
>> –Doug
>>
>
Received on Wednesday, 25 February 2015 14:44:05 UTC
