Re: What to do about newlines in attribute values?

On Sat, Sep 15, 2012 at 3:31 AM, Uche Ogbuji <uche@ogbuji.net> wrote:

>
> And I want to reiterate that for me the compatibility goal means all
> MicroXML docs are WF XML.  I do not think we should be constrained to have
> a fully backward compatible data model, though I agree we should carefully
> consider each DM incompatibility we introduce (as we're doing in this case).
>

I would put things slightly differently.  I think compatibility with the
data model is a goal, but it's not a hard constraint: we might not achieve
it fully when it conflicts with other goals.

This issue is very difficult because it involves conflicting goals.

(a) normalize newlines/tabs to spaces: this conflicts with our goal to
minimize the ugliness/weirdness in MicroXML (we ought to formulate this as
an explicit design goal)

(b) no literal tabs and newlines in attribute values: this conflicts with
our goal of supporting authoring in plain text editors; forbidding tabs
also conflicts (to a lesser extent) with our goal to minimize
ugliness/weirdness

(c) newlines in attribute values allowed and left as newlines: this
conflicts with the goal of data model compatibility

How bad is (c)?

- it's an uncorrectable difference in parsing; you cannot fix it up with a
post-parsing stage, because newlines/tabs that are entered as numeric
character references are preserved in XML, and the data model does not tell
you which characters came from numeric character references

- any difference in parsing, however slight, does make a difference for some
applications, such as digital signatures

- with our hardline approach in excluding XML Namespaces, this is currently
the only data model incompatibility: there's a big difference between being
100% compatible and being anything less; in the former case users don't
have to think about the issue at all

- being 100% compatible would make the "marketing" message much crisper and
easier to understand
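
To make the first point concrete, here is a minimal sketch of the XML
behaviour in question, assuming Python's standard xml.etree.ElementTree
(backed by expat) as the XML parser. The element name "e" and attribute
name "a" are made up for illustration. A literal newline or tab in an
attribute value comes back as a space, while the same character written
as a numeric character reference is preserved, and nothing in the parsed
result says which was which.

```python
import xml.etree.ElementTree as ET

# Attribute value containing a literal newline and a literal tab.
literal = ET.fromstring('<e a="x\ny\tz"/>')

# The same characters entered as numeric character references.
escaped = ET.fromstring('<e a="x&#10;y&#9;z"/>')

literal_value = literal.get("a")
escaped_value = escaped.get("a")

# XML attribute-value normalization turns the literal characters into spaces.
print(repr(literal_value))   # 'x y z'

# Characters from numeric character references are left untouched.
print(repr(escaped_value))   # 'x\ny\tz'
```

Under option (c), a MicroXML parser would report the first value with the
newline and tab intact, so a post-parsing stage would have no way to tell
it apart from the second.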

On the other hand:

- I don't think any users would actually want the XML behaviour

- I suspect applications that deal with token lists will handle newlines
and tabs just fine (anything that follows either the XML Schema rules on
lists or the RELAX NG rules will handle them)
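
As a rough illustration of the token list point, the sketch below splits
an attribute value on runs of XML whitespace (space, tab, LF, CR). The
function name split_token_list is hypothetical, and this is only an
approximation of whitespace-separated list handling, not the exact
XML Schema or RELAX NG algorithm. The point is that a consumer written
this way gets the same tokens whether the separators were normalized to
spaces or left as newlines and tabs.

```python
import re

# XML whitespace characters: space, tab, line feed, carriage return.
_XML_WHITESPACE = re.compile(r"[ \t\n\r]+")


def split_token_list(value):
    """Split a whitespace-separated token list, dropping empty tokens."""
    return [token for token in _XML_WHITESPACE.split(value) if token]


# Value as XML would report it (separators normalized to spaces).
normalized = split_token_list("alpha beta gamma")

# Value as option (c) would report it (newline and tab preserved).
preserved = split_token_list("alpha\nbeta\tgamma")

print(normalized == preserved)
```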

The tab issue makes (b) less attractive to me.

As for (a), it also seems really bad to me: this would be the only thing in
the rules of how to construct the data model from a WF MicroXML document
that would be totally surprising to somebody unfamiliar with XML.

I think my preference is (c).

James

Received on Saturday, 15 September 2012 03:26:32 UTC