Re: Comments on the WD - A proposed alternative from Dan Connolly on 2000-03-23 (www-xml-canonicalization-comments@w3.org from March 2000)

From: Dan Connolly <connolly@w3.org>
Date: Thu, 23 Mar 2000 16:54:24 -0600
To: www-xml-canonicalization-comments@w3.org
Message-ID: <38DAA0A0.29E8816B@w3.org>

I think this is a cool idea. The WG has been busy with other
stuff, but I hope we get around to discussing it before
too much longer.

I've already started using this idiom in various tools that
I write that generate XML, and it works nicely.

I hope that all XML generation APIs/libraries/whatever
support this -- whether for c14n or just for regular
XML writing -- before long. Maybe I'll propose it
as an option for the XSLT xml output method.

> From: Arjun Ray (aray@q2.net)
> Date: Sun, Feb 20 2000
[...]
> (2) An opportunity seems to have been forgone to make other kinds of
> comparison techniques easy to exploit.  I have the UN*X 'diff' command
> in mind, specifically.  It works with the present format, but not
> necessarily at an easily used granularity - mainly because more than
> one information item can occur on the same line.
> 
> I believe a line-oriented approach to canonicalizing the *markup* of
> the document offers just as many advantages as the current proposal,
> eliminates the factitious line-feeds after PIs, and offers "low-tech"
> benefits to the eponymous DPH and his harried brethren.
> 
> The alternative retains from the current proposal all rules regarding
> 
>   1.  Whitespace normalization in "informative" data.
>   2.  Character escaping.
>   3.  Namespace renaming and propagation to subelements, etc.
>   4.  Lexicographic ordering of attributes.
> 
> (and any others I missed:))
> 
> The difference is in how tags and PIs are represented.  Specifically
> 
>    1.  These are immediately followed by a newline:
>          a.  The generic identifier of a start-tag.
>          b.  The generic identifier of an end-tag.
>          c.  The target of a PI.
> 
>    2.   Each attribute specification is on a separate line (i.e.
>         ends with a #xA.)
> 
>    3.   These all start on a new line:
>          a.   The '>' or '/>' of a start-tag (as a consequence of
>               Rules 1 and 2).
>          b.   The '>' of an end-tag (from Rule 1).
>          c.   The '?>' terminating a PI, usually by the insertion
>               of an immediately preceding #xA.
> 
> In eliminating the new-lines following PIs in the current proposal,
> and significantly enhancing the utility of line-oriented text
> processing tools in dealing with canonicalized documents, I believe
> this alternative is worth considering.
> 
> That is, if I haven't missed something crushingly obvious:)
> 
> Arjun
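
To make the three rules above concrete, here is a minimal
sketch of a writer that follows them. It is only an
illustration under some assumptions, not anything from the WD:
the names line_element and line_pi are hypothetical, it leans
on Python's xml.etree.ElementTree, it ignores namespace
handling entirely, the escaping is simplified, and it always
writes an explicit end-tag rather than '/>'.

```python
import xml.etree.ElementTree as ET


def esc_text(s):
    # Simplified escaping for character data (assumption: not the
    # complete escaping rule set of the canonicalization draft).
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def esc_attr(s):
    # A literal newline in an attribute value is written as a
    # character reference so that one attribute stays on one line.
    return (s.replace("&", "&amp;").replace("<", "&lt;")
             .replace('"', "&quot;").replace("\n", "&#xA;"))


def line_pi(target, data=""):
    # Rule 1c: newline right after the PI target.
    # Rule 3c: '?>' starts on a new line; when there is data, that
    # takes an extra newline immediately before the '?>'.
    return "<?" + target + "\n" + (data + "\n" if data else "") + "?>"


def line_element(elem):
    # Rule 1a: newline right after the start-tag's generic identifier.
    out = ["<" + elem.tag + "\n"]
    # Rule 2: one attribute per line, in lexicographic order.
    for name in sorted(elem.attrib):
        out.append('%s="%s"\n' % (name, esc_attr(elem.attrib[name])))
    # Rule 3a: the '>' of the start-tag now begins a line.
    out.append(">")
    out.append(esc_text(elem.text or ""))
    for child in elem:
        out.append(line_element(child))
        out.append(esc_text(child.tail or ""))
    # Rules 1b and 3b: newline after the end-tag's generic
    # identifier, so its '>' begins a line too.
    out.append("</" + elem.tag + "\n>")
    return "".join(out)


if __name__ == "__main__":
    print(line_element(ET.fromstring('<doc b="2" a="1">hi<e/></doc>')))
```

With this layout each start-tag name, attribute, and end-tag
name sits on its own line, so a changed attribute value shows
up in 'diff' output as a change to that one line only.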

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/

Received on Thursday, 23 March 2000 18:03:11 UTC