Re: I18N-ISSUE-247: Clarify interpretation of line breaks when calculating storage size [ITS-20]

On Fri, Mar 29, 2013 at 10:52 AM, Jirka Kosek <jirka@kosek.cz> wrote:
> "The storage size is expressed in bytes and is provided along with the
> character set encoding and the line break type which will be used when
> the content is stored."

This does not really tell you if line breaks are normalized.
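
To make the ambiguity concrete, here is a minimal Python sketch. The
name `storage_size`, its parameters, and the `LINE_BREAKS` labels are
hypothetical and not taken from the spec. It assumes that only LF counts
as a line break in the content and that each LF is rewritten to the
chosen line break type before encoding, which is exactly the part the
quoted sentence leaves open.

```python
# Hypothetical helper, not part of ITS 2.0: number of bytes `text` occupies
# once every LF is written out as `line_break` and encoded with `encoding`.
LINE_BREAKS = {"lf": "\n", "cr": "\r", "crlf": "\r\n"}

def storage_size(text: str, encoding: str = "utf-8", line_break: str = "lf") -> int:
    # Assumption: the input is already normalized, so only LF is a line break.
    stored = text.replace("\n", LINE_BREAKS[line_break])
    return len(stored.encode(encoding))

sample = "one\ntwo\n"
sizes = {lb: storage_size(sample, "utf-8", lb) for lb in LINE_BREAKS}
print(sizes)
```

Under these assumptions the same content needs two more bytes with CRLF
than with LF or CR, so whether a given string fits depends on how (and
whether) line breaks were normalized first.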


> In XML content all line breaks are normalized to LF so only LF are
> considered as line breaks (but in source XML file you can use other
> representations of line breaks recognized by version of XML you use). If
> you think that this still needs to be explicitly written in spec, we can
> add note along those lines.

XML 1.0 does not normalize U+0085. Neither does any other sane format.
Is it converted here? What if it is combined with other line break
characters? Also, XML 1.0 still allows insertion of CR via &#x0D;,
which at the DOM level will be U+000D (and presumably that is what
you would store).
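
A quick way to check this is a short Python sketch using the standard
library's `xml.etree.ElementTree`. The sample document is made up for
illustration, and the behaviour shown is that of one XML 1.0 parser (the
expat-based one in CPython), not a statement about what ITS should do.

```python
import xml.etree.ElementTree as ET

# Made-up sample: a literal CRLF, a CR written as a character reference,
# and a literal U+0085 (NEL), all inside one element.
doc = "<r>a\r\nb&#x0D;c\u0085d</r>".encode("utf-8")

text = ET.fromstring(doc).text
print(repr(text))

# Byte count of what the application actually sees, encoded as UTF-8.
size = len(text.encode("utf-8"))
print(size)
```

The literal CRLF comes back as a single LF, while the CR from the
character reference and the U+0085 both survive untouched, so the text
handed to the application is `'a\nb\rc\x85d'` (8 bytes in UTF-8). "Only
LF are considered as line breaks" therefore still leaves open what to do
with those two characters when computing the storage size.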

As for insane formats, XML 1.1 has U+2028; should that be normalized
for the purposes of storing?


-- 
http://annevankesteren.nl/

Received on Friday, 29 March 2013 11:07:40 UTC