Re: i18n-ISSUE-411: Definition of whitespace should come from Unicode

On Mar 7, 2015 2:04 AM, "Andrew Sullivan" <ajs@anvilwalrusden.com> wrote:
>
> On Thu, Mar 05, 2015 at 10:38:01PM -0500, John Cowan wrote:
> > No, since you ask.  We use Unicode, but we don't require that every
> > non-printing character be recognized as a delimiter.
>
> What I worry about is inconsistent handling of whitespace across
> implementations.  But anyway, I guess this isn't really the place to
> fix that up, since it'd be all over XML anyway, right?  (I guess I'm
> just sensitive to this right now because the IETF tried to do clever
> things with paring down Unicode to things we wanted, and it isn't
> working quite as we'd hoped.)

I suspect that whitespace is pretty consistently treated as the four
characters space, tab, CR and LF at this point. In 2006 I tried a more inclusive definition of
whitespace in SPARQL but folks said "what the hell is this? Everybody knows
that whitespace is four characters." Had things like non-breaking,
zero-width, all-singing space stayed in SPARQL, parsers would have required
multi-byte lexers and the interoperability of incomplete implementations
would have suffered.
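A minimal Python sketch of that trade-off follows. It assumes SPARQL 1.1's `WS` production (`#x20 | #x9 | #xD | #xA`). The function names are hypothetical. Python's `str.isspace()` stands in for "a more inclusive definition", and it is not identical to the Unicode `White_Space` property. The byte-level helper shows why the narrow definition avoids a multi-byte lexer: all four characters are below 0x80, and no byte inside a multi-byte UTF-8 sequence is.

```python
# Narrow definition, as in SPARQL 1.1: WS ::= #x20 | #x9 | #xD | #xA
SPARQL_WS = frozenset(" \t\r\n")


def is_sparql_ws(ch: str) -> bool:
    """True only for space, tab, CR and LF."""
    return ch in SPARQL_WS


def is_broad_ws(ch: str) -> bool:
    """Hypothetical 'inclusive' rule using Python's str.isspace()
    (general category Zs, or bidi class WS, B or S)."""
    return ch.isspace()


def split_tokens(text: str, is_ws) -> list[str]:
    """Split text on runs of characters for which is_ws(ch) is true."""
    tokens: list[str] = []
    current: list[str] = []
    for ch in text:
        if is_ws(ch):
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def sparql_ws_byte_offsets(data: bytes) -> list[int]:
    """Offsets of SPARQL whitespace in UTF-8 bytes, found without decoding.
    Safe because every byte of a multi-byte UTF-8 sequence is >= 0x80."""
    return [i for i, b in enumerate(data) if b in (0x20, 0x09, 0x0D, 0x0A)]


if __name__ == "__main__":
    query = "SELECT\u00a0?s WHERE { ?s ?p ?o }"  # contains a NO-BREAK SPACE
    print(split_tokens(query, is_sparql_ws))  # NBSP stays inside a token
    print(split_tokens(query, is_broad_ws))   # NBSP splits tokens
```

Under the narrow rule, the non-breaking space is simply part of a token, so two parsers agree whether or not they know anything about Unicode. Under the broad rule, the result depends on which Unicode table each implementation consulted.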

The downside is that someone typing in some script with its own whitespace
(does that exist?) must use ASCII space, but they have to anyway because
all of the language keywords are in ASCII.


Received on Saturday, 7 March 2015 07:27:11 UTC