- From: Steven Atkin <atkin@us.ibm.com>
- Date: Tue, 28 Apr 2015 07:42:04 -0400
- To: "Eric Prud'hommeaux" <eric@w3.org>
- Cc: Andrew Sullivan <ajs@anvilwalrusden.com>, "Asmus Freytag (t)" <asmus-inc@ix.netcom.com>, John Cowan <cowan@mercury.ccil.org>, cowan@ccil.org, public-ldp-comments@w3.org, www-international@w3.org
- Message-ID: <OF9FD6A5E1.FFD95F64-ON85257E35.0040397C-85257E35.00404701@us.ibm.com>
I am satisfied with using the XML definition of whitespace.
Steven Atkin, Ph.D.
STSM - Chief Globalization Architect
IBM Globalization Center of Competency
atkin@us.ibm.com
http://www-3.ibm.com/software/globalization/index.jsp
From:	"Eric Prud'hommeaux" <eric@w3.org>
To:	Steven Atkin/Austin/IBM@IBMUS
Cc:	public-ldp-comments@w3.org, Andrew Sullivan
            <ajs@anvilwalrusden.com>, cowan@ccil.org, "Asmus Freytag (t)"
            <asmus-inc@ix.netcom.com>, John Cowan <cowan@mercury.ccil.org>,
            www-international@w3.org
Date:	04/27/2015 09:13 AM
Subject:	WG response to i18n-ISSUE-411: Definition of whitespace should
            come from Unicode
Hi Steven, I believe we have closure on 409 and 410, but we haven't
heard back on 411.
* Eric Prud'hommeaux <eric@w3.org> [2015-03-27 11:57-0400]
> This thread went on to discuss the usage of the various whitespace
> characters but I believe there was consensus that programming/data
> languages should use U+0009, U+000A, U+000D, U+0020 as whitespace.
> The LDP WG believes this resolves this comment with no edits required.
> Steven Atkin, as the originator of this comment, can you confirm?
>
>
> On Sat, Mar 7, 2015 at 6:05 AM, Eric Prud'hommeaux <eric@w3.org> wrote:
> >
> > On Mar 7, 2015 9:04 AM, "Asmus Freytag (t)" <asmus-inc@ix.netcom.com> wrote:
> >>
> >> On 3/6/2015 11:26 PM, Eric Prud'hommeaux wrote:
> >>>
> >>>
> >>> On Mar 7, 2015 2:04 AM, "Andrew Sullivan" <ajs@anvilwalrusden.com> wrote:
> >>> >
> >>> > On Thu, Mar 05, 2015 at 10:38:01PM -0500, John Cowan wrote:
> >>> > > No, since you ask.  We use Unicode, but we don't require that every
> >>> > > non-printing character be recognized as a delimiter.
> >>> >
> >>> > What I worry about is inconsistent handling of whitespace across
> >>> > implementations.  But anyway, I guess this isn't really the place to
> >>> > fix that up, since it'd be all over XML anyway, right?  (I guess I'm
> >>> > just sensitive to this right now because the IETF tried to do clever
> >>> > things with paring down Unicode to things we wanted, and it isn't
> >>> > working quite as we'd hoped.)
> >>>
> >>> I suspect that whitespace is pretty consistently treated as the four
> >>> control codes at this point. In 2006 I tried a more inclusive definition of
> >>> whitespace in SPARQL but folks said "what the hell is this? Everybody knows
> >>> that whitespace is four characters." Had things like non-breaking,
> >>> zero-width, all-singing space stayed in SPARQL, parsers would have required
> >>> multi-byte lexers and the interoperability of incomplete implementations
> >>> would have suffered.
> >>>
> >>> The downside is that someone typing in some script with its own
> >>> whitespace (does that exist?) must use ASCII space, but they have to anyways
> >>> because all of the language keywords are in ASCII.
> >>
> >>
> >> For programming languages, sticking to the basic set for syntax purposes
> >> makes a certain amount of sense.
> >>
> >> When you are dealing with text data, or free-form input, this approach can
> >> be unnecessarily limiting.
> >>
> >> All the markup languages have the issue that both language syntax and text
> >> content reside in the same "plain-text" file, leading to complicated rules
> >> about which whitespace characters are part of the text content and which are
> >> to be ignored for text purpose for being syntax characters.
> >
> > I completely agree with your analysis.
> >
> >> However, Andrew's point is well taken - it's important to not let the
> >> programmer's attitude infect those parts of whatever protocol is being
> >> designed that are concerned with representing full-text data. It better be
> >> possible to not only represent all space characters (and zero width
> >> characters), but to have them act on the text in the way they are defined in
> >> Unicode when segmenting text for whatever purpose.
> >
> > That makes sense to me. I think that both XML and RDF are languages upon
> > which such applications would be built. In a sense, the only way they can
> > screw up would be to not permit non-ASCII whitespace characters. Do you
> > agree?
> >
> >> A./
> >>
> >>> > A
> >>> >
> >>> > --
> >>> > Andrew Sullivan
> >>> > ajs@anvilwalrusden.com
> >>> >
> >>
> >>
>
>
>
> --
> -ericP
> office: +1.617.258.5741
> mobile: +1.617.599.3509
>
--
-ericP
office: +1.617.599.3509
mobile: +33.6.80.80.35.59
(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.
There are subtle nuances encoded in font variation and clever layout
which can only be seen by printing this message on high-clay paper.
Received on Tuesday, 28 April 2015 11:46:27 UTC
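
For illustration only, the resolution reached in the thread (syntax-level
whitespace is U+0009, U+000A, U+000D, U+0020, which is also the XML 1.0 "S"
production) might be sketched as below. The names XML_WHITESPACE and
split_tokens are hypothetical and do not come from any specification or from
the messages above; this is a minimal sketch, not how any particular parser
is implemented. It shows that other space characters, such as U+00A0, are
preserved as content rather than treated as delimiters.

```python
# Assumption: the four characters named in the thread, which match the
# XML 1.0 "S" production. The constant name is illustrative only.
XML_WHITESPACE = frozenset("\u0009\u000A\u000D\u0020")


def split_tokens(text):
    """Split text on runs of the four XML whitespace characters only.

    Any other character, including non-ASCII spaces such as U+00A0,
    is kept as part of the current token.
    """
    tokens, current = [], []
    for ch in text:
        if ch in XML_WHITESPACE:
            # A delimiter closes the token being built, if there is one.
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


if __name__ == "__main__":
    # Tab and newline delimit; the no-break space stays inside a token.
    sample = "SELECT\t?x\nWHERE\u00A0{"
    print(split_tokens(sample))
```

Because the delimiter set is pure ASCII, a lexer along these lines can decide
token boundaries one byte at a time in UTF-8 input, which is the
interoperability argument made earlier in the thread.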