Re: What to do about newlines in attribute values?

On Fri, Sep 14, 2012 at 9:25 PM, James Clark <jjc@jclark.com> wrote:

>
>
> On Sat, Sep 15, 2012 at 3:31 AM, Uche Ogbuji <uche@ogbuji.net> wrote:
>
>>
>> And I want to reiterate that for me the compatibility goal means all
>> MicroXML docs are WF XML.  I do not think we should be constrained to have
>> a fully backward compatible data model, though I agree we should carefully
>> consider each DM incompatibility we introduce (as we're doing in this case).
>>
>
> I would put things slightly differently.  I think compatibility with the
> data model is a goal, but it's not a hard constraint: we might not achieve
> it fully when it conflicts with other goals.
>

Good way to put it.



> This issue is very difficult because it involves conflicting goals.
>
> (a) normalize newlines/tabs to spaces: this conflicts with our goal to
> minimize the ugliness/weirdness in MicroXML (we ought to formulate this as
> an explicit design goal)
>

How about just expanding #3:

MicroXML shall be dramatically simpler than XML, as regards its
specification, syntax and data model, and shall contain fewer surprises in
these areas.



> (b) no literal tabs and newlines in attribute values: this conflicts with
> our goal of supporting authoring in plain text editors; forbidding tabs
> also conflicts (to a lesser extent) with our goal to minimize
> ugliness/weirdness
>
> (c) newlines in attribute values allowed and left as newlines: this
> conflicts with the goal of data model compatibility
>
> How bad is (c)?
>
> - it's an uncorrectable difference in parsing; you cannot fix it up with a
> post-parsing stage, because newlines/tabs that are entered as numeric
> character references are preserved in XML, and the data model does not tell
> you which characters came from numeric character references
>

Yeah, I think we'd accepted that it's pretty much uncorrectable post-parse,
but thanks for this clear formulation of why.  If we do go with (c) we
should probably add such wording to B2 of "Appendix B: Relationship to XML
(informative)".
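
To make the "uncorrectable" point concrete, here is a minimal sketch using
Python's standard xml.etree.ElementTree (backed by expat) as a stand-in for a
conforming XML parser. The element name "e", the attribute name "a" and the
helper attr_value are made up for illustration. A literal newline in an
attribute value is normalized to a space, while one written as a numeric
character reference survives, and the resulting string does not say which was
which:

```python
import xml.etree.ElementTree as ET


def attr_value(doc: str) -> str:
    """Return the value of the (hypothetical) attribute 'a' on the root element."""
    return ET.fromstring(doc).get("a")


# Literal newline in the source text: XML attribute-value normalization
# turns it into a space.
literal = attr_value('<e a="x\ny"/>')

# The same newline written as a numeric character reference is preserved.
escaped = attr_value('<e a="x&#10;y"/>')

# An ordinary space, for comparison with the normalized literal newline.
plain_space = attr_value('<e a="x y"/>')

print(repr(literal))      # 'x y'
print(repr(escaped))      # 'x\ny'
print(literal == plain_space)  # True: the data model cannot tell them apart
```

Under option (c) a MicroXML parser would report a newline in the first case
too, and nothing in the XML data model lets a later stage recover that.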



> - any difference in parsing however slight does make a difference for some
> applications like digital signatures
>

> - with our hardline approach in excluding XML Namespaces, this is
> currently the only data model incompatibility: there's a big difference
> between being 100% compatible and being anything less; in the former case
> users don't have to think about the issue at all
>
> - being 100% compatible would make the "marketing" message much crisper
> and easier to understand
>
> On the other hand
>
> - I don't think any users would actually want the XML behaviour
>
> - I suspect applications that deal with token lists will handle newlines
> and tabs just fine (anything that follows either the XML Schema rules on
> lists or the RELAX NG rules will handle them)
>
> The tab issue makes (b) less attractive to me.
>
> As for (a), it also seems really bad to me: this would be the only thing
> in the rules of how to construct the data model from a WF MicroXML document
> that would be totally surprising to somebody unfamiliar with XML.
>
> I think my preference is (c).
>

+1
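
On the token-list point quoted above, a rough sketch of why whitespace-based
list handling is indifferent to the choice: splitting on runs of space, tab,
LF and CR gives the same tokens whether or not newlines and tabs were
normalized to spaces first. The function name tokens and the constant XML_WS
are assumptions for illustration, not taken from XML Schema or RELAX NG.

```python
import re

# The four characters XML treats as whitespace: space, tab, LF, CR.
XML_WS = " \t\n\r"


def tokens(value: str) -> list[str]:
    """Split an attribute value into whitespace-separated tokens."""
    stripped = value.strip(XML_WS)
    if not stripped:
        return []
    return re.split(f"[{XML_WS}]+", stripped)


normalized = tokens("a b c")           # what an XML parser would hand over
preserved = tokens("a\n\tb\r\nc")      # what option (c) would hand over

print(normalized == preserved)  # True
```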


-- 
Uche Ogbuji                       http://uche.ogbuji.net
Founding Partner, Zepheira        http://zepheira.com
http://wearekin.org
http://www.thenervousbreakdown.com/author/uogbuji/
http://copia.ogbuji.net
http://www.linkedin.com/in/ucheogbuji
http://twitter.com/uogbuji

Received on Saturday, 15 September 2012 05:18:25 UTC