
Re: I18N-ISSUE-247: Clarify interpretation of line breaks when calculating storage size [ITS-20]

From: Anne van Kesteren <annevk@annevk.nl>
Date: Fri, 29 Mar 2013 11:07:11 +0000
Message-ID: <CADnb78hkLi2LjTQ_ivjyJJELkbwS+amY_2BE0pXWYgsH7TLmdQ@mail.gmail.com>
To: Jirka Kosek <jirka@kosek.cz>
Cc: "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>, www-international@w3.org
On Fri, Mar 29, 2013 at 10:52 AM, Jirka Kosek <jirka@kosek.cz> wrote:
> "The storage size is expressed in bytes and is provided along with the
> character set encoding and the line break type which will be used when
> the content is stored."

This does not really tell you if line breaks are normalized.
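To make the ambiguity concrete: the byte count changes depending on whether line breaks are rewritten to the declared line break type before encoding. The sketch below is illustrative only. `storage_size` and its `normalize` flag are hypothetical names, and it assumes (the quoted text does not say this) that only CR, LF, and CRLF count as line breaks.

```python
import re

# Assumption: only CRLF, lone CR, and LF are treated as line breaks here.
# Whether U+0085 or U+2028 should also count is exactly the open question.
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def storage_size(text: str, encoding: str, line_break: str, normalize: bool) -> int:
    """Hypothetical helper: bytes `text` would occupy once stored.

    If `normalize` is True, every CRLF, lone CR, and LF is first rewritten
    to `line_break`; otherwise the text is encoded exactly as given.
    """
    if normalize:
        # CRLF is listed first in the pattern, so it is replaced as one unit.
        text = _LINE_BREAK.sub(line_break, text)
    return len(text.encode(encoding))


sample = "one\ntwo\r\nthree"
print(storage_size(sample, "utf-8", "\r\n", normalize=False))   # 14
print(storage_size(sample, "utf-8", "\r\n", normalize=True))    # 15
print(storage_size(sample, "utf-16-le", "\n", normalize=True))  # 26
```

The same 13-character string yields three different sizes, which is why the specification needs to say whether normalization happens.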


> In XML content all line breaks are normalized to LF, so only LF is
> considered a line break (but in the source XML file you can use other
> representations of line breaks recognized by the version of XML you use).
> If you think that this still needs to be written explicitly in the spec,
> we can add a note along those lines.
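For reference, XML 1.0 end-of-line handling (section 2.11) only folds CRLF pairs and lone CRs into LF. A minimal sketch of that rule, with `xml10_eol` as a hypothetical name, shows that U+0085 and U+2028 are left untouched:

```python
def xml10_eol(text: str) -> str:
    """Sketch of XML 1.0 end-of-line handling (section 2.11).

    CRLF pairs and lone CRs become LF. Nothing else is touched, so
    U+0085 (NEL) and U+2028 (LINE SEPARATOR) pass through unchanged.
    """
    # Replace CRLF first so the CR of a pair is not turned into a second LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")


raw = "a\r\nb\rc\u0085d\u2028e"
print(repr(xml10_eol(raw)))  # 'a\nb\nc\x85d\u2028e'
```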

XML 1.0 does not normalize U+0085. Neither does any other sane format.
Is it converted here? What if it is combined with other line break
characters? Also, XML 1.0 still allows a CR to be inserted via &#x0D;,
which at the DOM level will be a U+000D (which is presumably what
you'd store).
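A quick check with Python's standard-library `xml.etree.ElementTree` (which parses with Expat, an XML 1.0 parser) illustrates both points. A literal CRLF is normalized, but a character reference survives as U+000D, and a literal U+0085 is left as is. The variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

# Literal CRLF in the source: end-of-line handling turns it into LF.
literal = ET.fromstring("<p>a\r\nb</p>").text

# Character references are not subject to end-of-line handling,
# so the parsed text keeps U+000D.
escaped = ET.fromstring("<p>a&#x0D;&#x0A;b</p>").text

# XML 1.0 does not treat U+0085 as a line break; it stays in the text.
nel = ET.fromstring("<p>a\u0085b</p>").text

print(repr(literal))  # 'a\nb'
print(repr(escaped))  # 'a\r\nb'
print(repr(nel))      # 'a\x85b'
```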

As for insane formats, XML 1.1 has U+2028; should that be normalized
for the purposes of storing?
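For comparison, XML 1.1 end-of-line handling (section 2.11) translates CRLF, CR followed by U+0085, a lone U+0085, a lone U+2028, and any other lone CR into a single LF. A sketch of that rule follows; `xml11_eol` is a hypothetical name. Note that it applies only to literal characters on input, so a character reference such as `&#x2028;` would still produce U+2028 after parsing, which leaves the storage question open.

```python
import re

# XML 1.1 section 2.11: each of these becomes a single LF.
# Two-character sequences are listed first so they are replaced as one unit.
_XML11_EOL = re.compile("\r\n|\r\u0085|\r|\u0085|\u2028")


def xml11_eol(text: str) -> str:
    """Sketch of XML 1.1 end-of-line handling on literal input text."""
    return _XML11_EOL.sub("\n", text)


raw = "a\r\u0085b\u0085c\u2028d\re"
print(repr(xml11_eol(raw)))  # 'a\nb\nc\nd\ne'
```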


-- 
http://annevankesteren.nl/
Received on Friday, 29 March 2013 11:07:40 UTC
