Re: I18N-ISSUE-247: Clarify interpretation of line breaks when calculating storage size [ITS-20]

On 29.3.2013 12:07, Anne van Kesteren wrote:
>> In XML content all line breaks are normalized to LF so only LF are
>> considered as line breaks (but in source XML file you can use other
>> representations of line breaks recognized by version of XML you use). If
>> you think that this still needs to be explicitly written in spec, we can
>> add note along those lines.
> 
> XML 1.0 does not normalize U+0085. Neither does any other sane format.
> Is it converted here? What if combined with other line break
> characters? Also, XML 1.0 still allows insertion of CR via a character
> reference, which when you get to the DOM-level will be a U+000D (which
> is what you'd store presumably).

OK, I see where you are going. What about adding something like:

"For purposes of storage size calculations, an ITS processor MUST behave
as if line ends were normalized according to
http://www.w3.org/TR/REC-xml/#sec-line-ends (or to
http://www.w3.org/TR/xml11/#sec-line-ends if XML 1.1 is used), and only
the LINE FEED (U+000A) character is then considered a line break."

So for XML 1.0, U+0085 will not be considered a line break; the same
goes for a CR inserted via a character reference.
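
To make the XML 1.0 case concrete, here is a rough Python sketch, not
proposed spec text. It relies on the standard library parser (expat, which
is XML 1.0 only) to do the line-end normalization, and then treats only
U+000A as a line break. The storage_size() helper and its line_break
parameter are my own hypothetical names; the parameter stands in for
however the target storage represents a line break.

```python
import xml.etree.ElementTree as ET


def count_line_breaks(text):
    """Count line breaks: only LINE FEED (U+000A) qualifies."""
    return text.count("\n")


def storage_size(text, encoding="utf-8", line_break="\n"):
    """Byte size of text when each LF is stored as line_break.

    Hypothetical helper: a CR that came from a character reference and
    U+0085 are kept as ordinary characters, not as line breaks.
    """
    return len(text.replace("\n", line_break).encode(encoding))


# Source bytes: a literal CRLF, a literal lone CR, a CR written as a
# character reference, and U+0085 (UTF-8 bytes C2 85).
source = b"<p>a\r\nb\rc&#xD;d\xc2\x85e</p>"

# The XML 1.0 parser normalizes the literal CRLF and CR to LF; the
# character reference is expanded afterwards and stays U+000D.
text = ET.fromstring(source).text

print(repr(text))
print(count_line_breaks(text))
print(storage_size(text))
print(storage_size(text, line_break="\r\n"))
```

The point of going through the parser is that the calculation then sees
exactly what an application would see, so the CR from the character
reference is stored as is and does not count as a line break.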


> As for insane formats, XML 1.1 has U+2028, should that be normalized
> for the purposes of storing?

For XML 1.1, yes. But with XML 1.1 we are moving into an Alice in
Wonderland universe, which we can safely leave a little underspecified,
IMHO.
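
For comparison, a rough sketch of what XML 1.1 line-end handling would
mean before counting line breaks. Python's standard parser does not
support XML 1.1, so this works on raw text as a parser would see it
before expanding character references. The function name
normalize_xml11_line_ends() is hypothetical, and the list of sequences is
my reading of section 2.11 of the XML 1.1 spec.

```python
import re

# XML 1.1 line ends: CR LF, CR U+0085, U+0085, U+2028 and a lone CR are
# all translated to a single LF. The two-character sequences come first
# so that they become one LF rather than two.
_XML11_LINE_END = re.compile("\r\n|\r\u0085|\r|\u0085|\u2028")


def normalize_xml11_line_ends(raw_text):
    """Translate every XML 1.1 line-end sequence to a single LF."""
    return _XML11_LINE_END.sub("\n", raw_text)


raw = "a\r\nb\r\u0085c\u0085d\u2028e\rf"
normalized = normalize_xml11_line_ends(raw)

print(repr(normalized))
print(normalized.count("\n"))  # only U+000A counts as a line break
```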

   Jirka


-- 
------------------------------------------------------------------
  Jirka Kosek      e-mail: jirka@kosek.cz      http://xmlguru.cz
------------------------------------------------------------------
       Professional XML consulting and training services
  DocBook customization, custom XSLT/XSL-FO document processing
------------------------------------------------------------------
 OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 rep.
------------------------------------------------------------------
    Bringing you XML Prague conference    http://xmlprague.cz
------------------------------------------------------------------

Received on Friday, 29 March 2013 11:37:53 UTC