[whatwg] finding a number...

From: Charles McCathieNevile <chaals@opera.com>
Date: Wed, 13 Dec 2006 20:28:09 +0530
Message-ID: <op.tkh6i7cewxe0ny@widsith.local>
On Wed, 13 Dec 2006 19:43:10 +0530, Mikko Rantalainen  
<mikko.rantalainen at peda.net> wrote:

> Charles McCathieNevile wrote:
>> On Wed, 13 Dec 2006 13:17:14 +0530, Henri Sivonen <hsivonen at iki.fi>  
>> wrote:
>>> On Dec 13, 2006, at 08:32, Charles McCathieNevile wrote:
>>>> possible *and no simpler* - this is too simple. Maybe assuming you  
>>>> can  parse numbers out of text is just a dumb idea as a normative  
>>>> part of a  spec.
>>> The attributes always work for any language. For English, the   
>>> textContent works as a *bonus*. It isn't that the spec fails to work  
>>> for  non-English. It is just that a particular *redundant* bonus  
>>> feature  doesn't work for non-English.
>>  The problem with this is that it means copying code the natural way   
>> doesn't work for some non-english speakers, and they have to read the  
>> spec  or guess why. [...]
>
> I think that "they have to read the spec" is a bonus, too.

Yeah, except it turns out to be wishful thinking of the kind WHATWG tries
strenuously to avoid :( And since the problem affects people who
habitually use other conventions for numbers, it turns out that many of
them don't really read English documents or mailing lists either...

> Perhaps the parser could be specified as follows:
>
> regexp for "numeric value" is [0-9 ,.]
> scan the numeric value backwards from end
> first character matching regexp [,.] is the decimal separator
>
> This would correctly interpret numbers such as
>
> 1,251,152.124
> 634.46
> 453.436.346,235

  This last is the important use case that the existing method fails.

> 23 236 435 123,121
>
> It would fail for numbers such as
>
> 1,234,456.789,012
> 1.234.456,789.012
>
> but are such formats used in any locale?

Not that I know of. Formats I know of use ".", "," or " " as separators
for integer amounts, and "," or "." for decimal separators. The only
separators I know of inside the decimal part are "-", "e" and "E". I can
imagine someone using the notation for web content in a meter, but I am
not sure that it is likely.
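
For concreteness, here is a rough sketch in Python of the scan-backwards
rule quoted above. It is only an illustration under stated assumptions:
the name parse_number, the choice to return None when there are no
digits, and the use of a float result are all hypothetical and not part
of the proposal.

```python
import re

# Characters the proposal allows in a "numeric value":
# digits, space, comma and full stop.
NUMERIC_VALUE = re.compile(r"[0-9 ,.]+")


def parse_number(text):
    """Hypothetical helper applying the scan-backwards rule to a string."""
    raw = text.strip()
    if not NUMERIC_VALUE.fullmatch(raw):
        return None
    # Scanning backwards from the end, the first "," or "." found
    # is taken to be the decimal separator.
    pos = max(raw.rfind(","), raw.rfind("."))
    if pos == -1:
        integer_part, fraction_part = raw, ""
    else:
        integer_part, fraction_part = raw[:pos], raw[pos + 1:]
    # Every other separator is treated as grouping and dropped.
    integer_digits = re.sub(r"[ ,.]", "", integer_part)
    fraction_digits = fraction_part.replace(" ", "")
    if not integer_digits and not fraction_digits:
        return None
    return float((integer_digits or "0") + "." + (fraction_digits or "0"))
```

With this sketch, "453.436.346,235" and "23 236 435 123,121" come out as
453436346.235 and 23236435123.121. Note that the rule as stated also
reads a grouped integer such as "1,234" as 1.234, since the last
separator is always taken as the decimal one.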

Of course there are a handful of other types of numbers. One thing that is  
helpful is that in Hebrew and Arabic, numbers are written LTR even though
the rest of the text isn't. I am not sure about other RTL languages -
apparently there are a couple of Indic ones. On the other hand, since I am
going to meet a handful of people this weekend who specialise in  
publishing for the Indian government, in at least their 22  
constitutionally official languages, I will try to remember to ask. One  
thing that is unhelpful is that in some languages numbers are written  
using ordinary letters. Although I suspect this use is very rare on the  
web, as I believe it is pretty much archaic in the relevant languages.

This is, of course, going down the path of specifying internationalised  
number picking - something that some people are just dead against.

cheers

Chaals

-- 
   Charles McCathieNevile, Opera Software: Standards Group
   hablo español  -  je parle français  -  jeg lærer norsk
chaals at opera.com          Try Opera 9 now! http://opera.com
Received on Wednesday, 13 December 2006 06:58:09 UTC
