Re: [bp-i18n-specdev] Should include advice on what "White-space" is.

Hi Richard,

> @spemberton what do you mean? Add the information to
> or add the information to the spec you are struggling with (for which a link would be useful, btw)?

The former, since this thread is part of a discussion of what ought to be in that spec.
Groups were asked to comment on what they would like to see in the spec,
and so I sent in two suggestions: 1) help on whitespace, and 2) help on what a 'letter' is.

The link:

> You asked this question some time ago, and when we discussed this in the i18n WG telecon
> our conclusion was that what should be counted as 'white space' depends very much on
> what the application is trying to do.

I quite agree.

> I agree with John that it's very unlikely that you'd want to remove NNBSP, whatever you're doing,
> since that's used almost like a letter in Mongolian, even though it just looks like a space or gap
> - removing it would really screw up the meaning of the text. 

I note that Unicode doesn't classify NNBSP as Mongolian, even if it is often used in that context.
So apparently the meaning of NNBSP depends on its context, and thus whether it should be removed
may depend on its context too. That is a shame: it would be better *not* to use NNBSP in Mongolian,
and to have a special character that reflects this special usage.
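
As a rough illustration of the classification point, the sketch below asks Python's standard `unicodedata` module how it describes U+202F. The helper name `describe` is hypothetical and used only for this example. The standard library does not expose the Unicode Script property, so only the character name and general category are shown.

```python
import re
import unicodedata

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE


def describe(ch):
    """Return (name, general category, str.isspace() result) for one character.

    Hypothetical helper for illustration only.
    """
    return (unicodedata.name(ch), unicodedata.category(ch), ch.isspace())


# NNBSP is a generic space separator (category Zs); nothing in its name
# or category ties it to Mongolian.
print(describe(NNBSP))

# A generic "remove all whitespace" pattern therefore removes it too,
# regardless of the context in which it appears.
print(re.sub(r"\s", "", "a" + NNBSP + "b"))
```

A context-blind whitespace test has no way to tell the Mongolian usage apart from an ordinary narrow space.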

> By the way, does your application for white space removal care about invisible characters
> (such as Mongolian variant selectors?) or are you just worried about blank gaps appearing
> between other characters? And what is the motivation for removing 'white space' in your case?

We have a meta-use case. There are certain values (in the world) that do not permit spaces in certain places, but that may nevertheless be input with spaces. Our classic example is credit card numbers, which people often type with spaces that then need to be removed. But don't fixate on that one example, because there is an unbounded number of them: numbers copied and pasted from a report where they have been formatted with spaces, addresses where only single spaces are permitted, email addresses where leading and trailing spaces need to be elided. Values often come from copy/pasting, as well as keyboard input. We don't know all the use cases people have, but we do allow people to mark values as needing to have whitespace removed or compressed on input.
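
The three operations mentioned above (remove, compress, trim) could be sketched as follows. This is a minimal sketch, not the actual implementation under discussion. The function names and the `DEFAULT_SPACES` set are assumptions made for this example, and choosing that set is exactly the open question in this thread. Here NNBSP (U+202F) is deliberately left out of it.

```python
import re

# Assumed, application-specific set of characters treated as white space:
# ASCII white space plus NO-BREAK SPACE. NNBSP (U+202F) is intentionally absent.
DEFAULT_SPACES = " \t\n\r\f\v\u00a0"


def _pattern(spaces):
    # Build a character class matching one or more of the given characters.
    return re.compile("[" + re.escape(spaces) + "]+")


def remove_spaces(value, spaces=DEFAULT_SPACES):
    """Delete every white-space character (e.g. credit card numbers)."""
    return _pattern(spaces).sub("", value)


def collapse_spaces(value, spaces=DEFAULT_SPACES):
    """Reduce each run to a single space and trim the ends (e.g. addresses)."""
    return _pattern(spaces).sub(" ", value).strip(" ")


def trim_spaces(value, spaces=DEFAULT_SPACES):
    """Strip leading and trailing white space only (e.g. email addresses)."""
    return value.strip(spaces)


print(remove_spaces("4111 1111\u00a01111\t1111"))
print(collapse_spaces("  12   Main\u00a0\u00a0St "))
print(trim_spaces(" user@example.org\n"))
```

Passing the character set as a parameter keeps the policy decision with the application, which fits the WG's conclusion that the right definition depends on what the application is trying to do.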

We could not find a good resource that helps with the definition of whitespace. Different specs seem to do different things. bp-i18n-specdev seems like an ideal candidate.

GitHub Notification of comment by spemberton

Received on Wednesday, 22 March 2017 15:22:26 UTC