Re: IDL: special DOMString that converts to Unicode from Brendan Eich on 2012-10-29 (public-script-coord@w3.org from October to December 2012)

From: Brendan Eich <brendan@mozilla.org>
Date: Sun, 28 Oct 2012 17:05:22 -0700
To: Simon Pieters <simonp@opera.com>, Anne van Kesteren <annevk@annevk.nl>
CC: Robin Berjon <robin@w3.org>, public-script-coord@w3.org
Message-ID: <508DC842.3070200@mozilla.org>

Simon Pieters wrote:
> On Mon, 29 Oct 2012 00:14:59 +0200, Robin Berjon <robin@w3.org> wrote:
>
>> On 26/10/2012 12:59 , Anne van Kesteren wrote:
>>> I think we should introduce UTFString to make this conversion explicit
>>> and not bother tons of standards with boilerplate language that is
>>> easily forgotten.
>>
>> [Bikeshed Advisory]
>>
>> In which case can we just call it "string"?
>
> I think it's better if it's clear that it's only supposed to be used 
> where the overhead of ensuring Unicode cleanness is necessary

Yet Anne wrote "There is quite a number of APIs that take a string and 
expect it to not contain code unit garbage (lone surrogates)", which 
probably conflicts with "only ... used where the overhead ... is 
necessary". "Quite a number" vs. "only" is a red flag.

Where is the "overhead necessary", exactly? If it's all over the place, 
we'll have a problem with developers and implementors, who won't want to 
bottleneck on (re-)checking all the time.

We'd want a checked type to formalize (if possible) that the re-checking 
has been minimized for a given implementation.

But why are we checking at all? JS allows naughty strings to be formed, 
but if there's a wire protocol that forbids them, then the API to speak 
that protocol should hide the detail of the implementation doing the 
checking. Push the checking out to the edge of the system where I/O happens.
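
A rough sketch of what checking at the edge could look like in JS. The 
names hasLoneSurrogate and sendText are hypothetical, not part of any 
proposed API, and `write` stands in for whatever actually does the I/O. 
Throwing on a bad string is only one assumed policy; replacing lone 
surrogates with U+FFFD would be another.

```javascript
// Hypothetical helper: returns true if `s` contains an unpaired
// UTF-16 surrogate code unit.
function hasLoneSurrogate(s) {
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      // High surrogate: must be followed by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // well-formed pair, skip its second half
      } else {
        return true;
      }
    } else if (c >= 0xdc00 && c <= 0xdfff) {
      // Low surrogate with no preceding high surrogate.
      return true;
    }
  }
  return false;
}

// Hypothetical edge function: the one place the check runs, right
// before the string leaves the system. Callers upstream pass ordinary
// JS strings around without re-checking.
function sendText(write, s) {
  if (hasLoneSurrogate(s)) {
    throw new TypeError("string contains a lone surrogate");
  }
  write(s);
}
```

Everything upstream of sendText keeps working with plain strings, so the 
cost of the scan is paid once per write rather than at every API boundary.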

/be

> , which "string" doesn't communicate. UTFString seems a bit clearer; 
> DOMStringWithoutLoneSurrogates is clearer still but quite long.
>

Received on Monday, 29 October 2012 00:05:59 UTC