Re: What to do about newlines in attribute values?

On Fri, Sep 14, 2012 at 9:25 PM, James Clark wrote:

> On Sat, Sep 15, 2012 at 3:31 AM, Uche Ogbuji wrote:
>> And I want to reiterate that for me the compatibility goal means all
>> MicroXML docs are WF XML.  I do not think we should be constrained to have
>> a fully backward compatible data model, though I agree we should carefully
>> consider each DM incompatibility we introduce (as we're doing in this case).
> I would put things slightly differently.  I think compatibility with the
> data model is a goal, but it's not a hard constraint: we might not achieve
> it fully when it conflicts with other goals.

Good way to put it.

> This issue is very difficult because it involves conflicting goals.
> (a) normalize newlines/tabs to spaces: this conflicts with our goal to
> minimize the ugliness/weirdness in MicroXML (we ought to formulate this as
> an explicit design goal)

How about just expanding #3:

MicroXML shall be dramatically simpler than XML, as regards its
specification, syntax and data model, and shall contain fewer surprises in
these areas.

> (b) no literal tabs and newlines in attribute values: this conflicts with
> our goal of supporting authoring in plain text editors; the forbidding tabs
> aspect also conflicts (to a lesser extent) with our goal to minimize
> ugliness/weirdness
> (c) newlines in attribute values allowed and left as newlines: this
> conflicts with the goal of data model compatibility
> How bad is (c)?
> - it's an uncorrectable difference in parsing; you cannot fix it up with a
> post-parsing stage, because newlines/tabs that are entered as numeric
> character references are preserved in XML, and the data model does not tell
> you which characters came from numeric character references

Yeah, I think we'd accepted that it's pretty much uncorrectable post-parse,
but thanks for this clear formulation of why.  If we do go with (c) we
should probably add such wording to B2 of "Appendix B: Relationship to XML
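
To make the point concrete, here is a minimal sketch of the XML behaviour in
question, assuming Python's standard xml.etree.ElementTree (backed by expat)
as the XML parser. The element name "e" and attribute name "a" are made up
for illustration. A literal newline or tab in an attribute value is
normalized to a space, while the same characters entered as numeric
character references survive, and the resulting string carries no record of
which route a character took:

```python
import xml.etree.ElementTree as ET

# Newline and tab typed literally inside the attribute value.
literal = ET.fromstring('<e a="x\ny\tz"/>')

# The same characters entered as numeric character references.
referenced = ET.fromstring('<e a="x&#10;y&#9;z"/>')

# XML attribute-value normalization turns the literal characters into spaces.
print(repr(literal.get("a")))

# Character references are preserved as a real newline and a real tab.
print(repr(referenced.get("a")))
```

Under (c) both documents would yield the same value, newline and tab intact,
so a post-parsing step looking only at the data model has no way to tell
which of the two it should convert to the XML result.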

> - any difference in parsing however slight does make a difference for some
> applications like digital signatures

> - with our hardline approach in excluding XML Namespaces, this is
> currently the only data model incompatibility: there's a big difference
> between being 100% compatible and being anything less; in the former case
> users don't have to think about the issue at all
> - being 100% compatible would make the "marketing" message much crisper
> and easier to understand
> On the other hand
> - I don't think any users would actually want the XML behaviour
> - I suspect applications that deal with token lists will handle newlines
> and tabs just fine (anything that follows either the XML Schema rules on
> lists or the RELAX NG rules will handle them)
> The tab issue makes (b) less attractive to me.
> As for (a), it also seems really bad to me: this would be the only thing
> in the rules of how to construct the data model from a WF MicroXML document
> that would be totally surprising to somebody unfamiliar with XML.
> I think my preference is (c).
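
On the token list point, a rough sketch of why such applications are
unlikely to notice the difference, assuming they split on runs of XML
whitespace (space, tab, newline, carriage return) in the manner of the XML
Schema and RELAX NG list rules. The helper name "tokens" is hypothetical and
not taken from either spec:

```python
import re

# The four XML whitespace characters.
XML_WS = " \t\n\r"


def tokens(value):
    """Split an attribute value into tokens on runs of XML whitespace."""
    stripped = value.strip(XML_WS)
    if not stripped:
        return []
    return re.split("[ \t\n\r]+", stripped)


# Value as XML would deliver it (newline and tab normalized to spaces).
xml_style = tokens("alpha beta gamma")

# Value as option (c) would deliver it (newline and tab left alone).
option_c_style = tokens("alpha\nbeta\tgamma")

print(xml_style == option_c_style)
```

Both calls produce the same three tokens, so code written this way gives the
same result under either behaviour.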


Uche Ogbuji             
Founding Partner, Zepheira

Received on Saturday, 15 September 2012 05:18:25 UTC