[Bug 6808] [Ser11] Whitespacing rules are too restrictive for the indent parameter

http://www.w3.org/Bugs/Public/show_bug.cgi?id=6808

--- Comment #9 from Henry Zongaro <zongaro@ca.ibm.com>  2010-02-01 21:37:21 ---
Section 5.1.3 of the draft of Serialization 1.1 dated 15 December 2009[3]
contains a very slightly improved variation of the text proposed in comment #3.
Here's a proposed replacement for that section, up to, but not including, the
two notes.  In it, I've tried to state some of the requirements in the
positive, to make it clearer exactly where changes to whitespace characters may
be made.

Replace the first paragraph, bulleted list and second paragraph of section
5.1.3 with the following:

-------------------------------------------------------------------------------
The indent parameter controls whether the serializer MAY adjust the whitespace
in the serialized result so that a person will find it easier to read.  If the
indent parameter has the value yes, the serializer MAY output whitespace
characters in addition to the whitespace characters in the instance of the data
model.  It MAY also elide from the output whitespace characters that occurred
in the instance of the data model or replace such whitespace characters with
other whitespace characters.  If the indent parameter has the value no, the
serializer MUST NOT output additional whitespace characters, and MUST NOT elide
or replace whitespace characters.  If the indent parameter has the value yes,
the serializer MUST use an algorithm for dealing with whitespace characters
that satisfies all of the following constraints:

* Whitespace characters MAY be added adjacent to a text node only if the text
node contains only whitespace characters.  Whitespace characters in such a text
node MAY also be elided or replaced.  For example, a tab MAY be inserted as a
replacement for existing spaces.
* Whitespace characters MAY be added, elided or replaced in the content of an
element whose type annotation is xs:untyped or xs:anyType and that has element
node children, in the content of an element whose content model is
element-only, or outside the content of any element.
* Whitespace characters MUST NOT be added, elided or replaced in the content of
an element whose content model is known to be simple or empty.
* Whitespace characters SHOULD NOT be added, elided or replaced in places where
the characters would constitute significant whitespace, for example, in the
content of an element that is annotated with a type other than xs:untyped or
xs:anyType, and whose content model is known to be mixed.
* Whitespace characters MUST NOT be added, elided or replaced in the content of
an element whose expanded QName is a member of the list of expanded QNames in
the value of the suppress-indentation parameter.
* Whitespace characters MUST NOT be added, elided or replaced in a part of the
result document that is controlled by an xml:space attribute with value
preserve. (See [XML10] for more information about the xml:space attribute.)
-------------------------------------------------------------------------------

The word "content" in the above will be made to refer to XML 1.0's definition
of "content"[4], to wit: "The text between the start-tag and end-tag is called
the element's content."
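To make the combined effect of the constraints easier to check, here is a rough
Python sketch (not part of the proposed text) that classifies whether
whitespace may be adjusted in the content of one element.  The ElementInfo
record, its fields, and the content_permission and may_add_next_to_text
functions are hypothetical; a real serializer would derive this information
from the data model instance and any schema.  The sketch also assumes that
places not covered by any of the permissive bullets are off limits.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional, Tuple

XML_WHITESPACE = frozenset(" \t\r\n")  # the four XML 1.0 whitespace characters
UNTYPED = frozenset({"xs:untyped", "xs:anyType"})


class Permission(Enum):
    MAY = "MAY"
    SHOULD_NOT = "SHOULD NOT"
    MUST_NOT = "MUST NOT"


@dataclass(frozen=True)
class ElementInfo:
    """Hypothetical summary of what a serializer knows about one element."""
    qname: Tuple[str, str]        # expanded QName as (namespace URI, local name)
    type_annotation: str          # e.g. "xs:untyped", "xs:anyType" or a schema type
    content_model: Optional[str]  # "empty", "simple", "element-only", "mixed", or None if unknown
    has_element_children: bool
    xml_space_preserve: bool      # an in-scope xml:space="preserve" applies


def content_permission(elem: Optional[ElementInfo], indent: bool,
                       suppress_indentation: FrozenSet[Tuple[str, str]] = frozenset()
                       ) -> Permission:
    """May whitespace be added, elided or replaced in the content of elem?

    elem=None stands for positions outside the content of any element.
    """
    if not indent:
        return Permission.MUST_NOT
    if elem is None:
        return Permission.MAY
    # The MUST NOT constraints override every permission checked below.
    if elem.xml_space_preserve or elem.qname in suppress_indentation:
        return Permission.MUST_NOT
    if elem.content_model in ("simple", "empty"):
        return Permission.MUST_NOT
    # Typed element with mixed content: whitespace is likely significant.
    if elem.type_annotation not in UNTYPED and elem.content_model == "mixed":
        return Permission.SHOULD_NOT
    if elem.type_annotation in UNTYPED and elem.has_element_children:
        return Permission.MAY
    if elem.content_model == "element-only":
        return Permission.MAY
    # No bullet grants permission here; this sketch reads that as "not allowed".
    return Permission.MUST_NOT


def may_add_next_to_text(text: str) -> bool:
    """Whitespace may be added next to a text node only if it is all whitespace."""
    return all(ch in XML_WHITESPACE for ch in text)
```

Checking the MUST NOT cases first mirrors the intent that suppress-indentation,
xml:space="preserve" and simple or empty content always win over the
permissive bullets.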

Following are some examples for which the rules have changed:

(i) <doc/>
(ii) <doc><!-- foo --></doc>
(iii) <doc><!-- foo --><ch/></doc>

Whitespace could be added to the content of doc in (i) or (ii) if doc is known
to have element-only content; that was not permitted at all in Serialization
1.0.

Whitespace could be added anywhere in the content of <doc> in (iii); in
Serialization 1.0 it could only be added before or after the <ch/> tag.
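A toy Python model of the three examples, listing the gaps between the children
of <doc> where whitespace may go.  It only captures the positions described
above (1.0: next to an element child; proposed 1.1: anywhere in qualifying
content) and ignores the other constraints.  The function names and the use of
plain strings for child kinds are hypothetical, and it assumes doc is known to
be element-only in (i) and (ii), and untyped (or element-only) in (iii).

```python
from typing import List, Set

# Child node kinds are plain strings here, e.g. "element", "comment", "text".
Children = List[str]


def gaps_ser_1_0(children: Children) -> Set[int]:
    """Gaps where 1.0 allowed whitespace: next to an element child only.

    Gap i lies just before children[i]; gap len(children) follows the last child.
    """
    n = len(children)
    return {
        i for i in range(n + 1)
        if (i > 0 and children[i - 1] == "element")
        or (i < n and children[i] == "element")
    }


def gaps_ser_1_1(children: Children, content_adjustable: bool) -> Set[int]:
    """Gaps where the proposed 1.1 text allows whitespace.

    content_adjustable says whether the parent's content qualifies, e.g. it is
    untyped with element children or known to be element-only.
    """
    return set(range(len(children) + 1)) if content_adjustable else set()


examples = {
    "(i)   <doc/>": ([], True),
    "(ii)  <doc><!-- foo --></doc>": (["comment"], True),
    "(iii) <doc><!-- foo --><ch/></doc>": (["comment", "element"], True),
}
for label, (kids, adjustable) in examples.items():
    print(label, sorted(gaps_ser_1_0(kids)), sorted(gaps_ser_1_1(kids, adjustable)))
```

For (iii) it reports gaps [1, 2] under 1.0 and [0, 1, 2] under the proposal,
so the newly permitted position is the one before the comment.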

[3] http://www.w3.org/TR/2009/WD-xslt-xquery-serialization-11-20091215/#xml-indent
[4] http://www.w3.org/TR/2008/REC-xml-20081126/#dt-content

Received on Monday, 1 February 2010 21:37:24 UTC