General last call comments from Martin J. Duerst on 2000-02-08 (www-xml-canonicalization-comments@w3.org from February 2000)

From: Martin J. Duerst <duerst@w3.org>
Date: Tue, 08 Feb 2000 16:01:07 +0900
To: www-xml-canonicalization-comments@w3.org
Message-Id: <200002080658.PAA22368@sh.w3.mag.keio.ac.jp>
Here are some general last call comments:

- At the end of section 1, there should be a short overview of
  the spec, about one sentence per section. Most of this can be
  done by just saying that point 1 is discussed in section 2, and
  so on.

- At the start of sec. 2, after '[Infoset]', it should say whether
  canonical XML includes all the required info items, and the required
  properties thereof, or, if not, where the differences are. There are
  some details about this later for individual items, but it would
  be good to have a general summary of this relationship.

- 2.3, and others: each subsection should fully list the info items
  that are included, to increase readability and presentational
  uniformity of the spec.

- 2.4: 'For those which': what does 'those' refer to? Information
  items? Processing instructions?

- 2.5: 'could' -> 'can'.

- Both in the intro and in 2.6, there should be some text explaining
  how whitespace is treated.

- There is a spurious empty paragraph at the start of sec. 4.

- Sec. 4: 'suppose' ... 'then if': Rewrite, e.g. as 'Given a file...
  and the following XML document'...

- Sec. 5: The syntax rules at the start of this section are very
  dense. They should be explained much better.

- As n-tilde is used earlier, I guess it would be better to
  use it in place of c-cedilla as an example of a decomposable character.
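
Both characters have a canonical decomposition, so either works as an example. The sketch below uses Python's standard unicodedata module purely as an illustration (the helper name decomposition is hypothetical) to show n-tilde decomposing into a base letter plus a combining mark, just as c-cedilla does.

```python
import unicodedata

def decomposition(ch):
    """Return the NFD (canonical decomposition) of ch as U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]

for ch in ("\u00f1", "\u00e7"):  # n-tilde, c-cedilla
    print(unicodedata.name(ch), "->", decomposition(ch))
    # NFC recomposes the sequence into the original precomposed character.
    assert unicodedata.normalize("NFC", unicodedata.normalize("NFD", ch)) == ch
```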

- 5.2: 'Where an element contains two lines are...': insert 'that'
  after 'lines', i.e. 'two lines that are...'.

- 5.2: In the first list, the second bullet only mentions "&#13;",
  but not "&#xD;". The last bullet only mentions "&#x9;", but not
  "&#9;". There may be other, similar problems here. Also, the case
  of the document containing a single #xD or a single #xA should
  be discussed.
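
The decimal and hexadecimal forms are two spellings of the same character, so every rule stated for one also has to cover the other. A minimal Python sketch, assuming only the CharRef production of XML 1.0 ('&#' digits ';' or '&#x' hex digits ';'); the function name is hypothetical:

```python
import re

# Decimal (&#13;) or hexadecimal (&#xD;) numeric character reference,
# following the CharRef production of XML 1.0 (lowercase 'x' only).
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")

def expand_char_refs(text):
    """Replace each numeric character reference with the character it denotes."""
    def repl(m):
        hex_digits, dec_digits = m.group(1), m.group(2)
        return chr(int(hex_digits, 16) if hex_digits else int(dec_digits))
    return _CHAR_REF.sub(repl, text)

# Each pair denotes one and the same character.
for dec, hx in (("&#13;", "&#xD;"), ("&#10;", "&#xA;"), ("&#9;", "&#x9;")):
    assert expand_char_refs(dec) == expand_char_refs(hx)
```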

- 5.2: Code point U+000A (#xA) is Line Feed (LF) in IETF/ISO/Unicode,
  not NL. Please do not invent new names here. Also, CRLF is usually
  written without a hyphen.
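
Using the standard names, end-of-line handling as specified in XML 1.0 (section 2.11) can be stated briefly: CRLF and any CR not followed by LF each become a single LF. The Python sketch below (constant and function names are illustrative) also covers a lone CR and a lone LF, the single-character cases mentioned in the previous comment.

```python
CR = "\r"        # U+000D CARRIAGE RETURN
LF = "\n"        # U+000A LINE FEED
CRLF = CR + LF   # the two-character sequence, written without a hyphen

def normalize_line_ends(text):
    """Translate CRLF, and any CR not followed by LF, to a single LF."""
    # Replace CRLF first so its CR is not turned into an extra LF.
    return text.replace(CRLF, LF).replace(CR, LF)

print(repr(normalize_line_ends("a" + CRLF + "b" + CR + "c" + LF + "d")))
```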

- 5.6: Maybe it's worth mentioning that the binary (byte-wise) ordering
  of UTF-8 strings is identical to their ordering by Unicode code points.
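
The property is easy to check. A minimal Python sketch (function names are illustrative): sorting by raw UTF-8 bytes gives the same result as sorting by code points. The last assertion shows why the remark is worth making, since ordering by UTF-16 code units does not have this property.

```python
def utf8_order(strings):
    """Sort strings by their raw UTF-8 bytes (binary ordering)."""
    return sorted(strings, key=lambda s: s.encode("utf-8"))

def code_point_order(strings):
    """Sort strings by their sequence of Unicode code points."""
    return sorted(strings, key=lambda s: [ord(c) for c in s])

# One-, two-, three- and four-byte UTF-8 sequences, plus the empty string.
sample = ["z", "ab", "a\u00f1", "\u00e9", "\uffff", "\U0001F600", ""]
assert utf8_order(sample) == code_point_order(sample)

# UTF-16 differs: U+1F600 is encoded with surrogates (0xD83D 0xDE00),
# which sort before U+FFFF.
utf16_order = sorted(sample, key=lambda s: s.encode("utf-16-be"))
assert utf16_order != code_point_order(sample)
```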

- 5.6: [5.9 Namespaces]: This is not a reference, and therefore should
  not be formatted as a reference. Please write: 'is described in
  Section 5.9, Namespaces'.

- 5.9: Saying that this approach was chosen so that canonicalization
  is context-independent is a very good start at explaining why namespace
  canonicalization is so space-wasting. However, it is by no means
  enough. That the canonical form of an element is the same regardless
  of where it occurs in a document cannot be a goal in itself. Please
  provide a better explanation.

- 5.9: While declaring each namespace anew on each
  element might make some sense in some scenarios (depending on the
  explanation you give for the previous point), repeating one and the
  same namespace declaration on the same start tag is not justified
  by the note in the current draft, nor can I see any other reason
  for doing this.
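
One less redundant alternative is to emit a namespace declaration only where its binding changes relative to the parent, and at most once per start tag. The Python sketch below assumes a hypothetical, simplified element model (each element carries its in-scope bindings as a prefix-to-URI dict); it is not the draft's data model, and the URIs are made up.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    """Hypothetical element: qualified name, in-scope namespace bindings
    (prefix -> URI, "" for the default namespace), and child elements."""
    name: str
    in_scope: dict
    children: list = field(default_factory=list)

def declarations_to_emit(elem, parent_scope=None):
    """Yield (name, declarations) for elem and its descendants in document order.

    A binding is emitted only if it differs from the parent's binding for the
    same prefix; collecting them in a dict keyed by prefix means no start tag
    carries the same declaration twice.
    """
    parent_scope = parent_scope or {}
    needed = {prefix: uri for prefix, uri in sorted(elem.in_scope.items())
              if parent_scope.get(prefix) != uri}
    yield elem.name, needed
    for child in elem.children:
        yield from declarations_to_emit(child, elem.in_scope)

A = "urn:x-example:a"  # hypothetical namespace URIs
B = "urn:x-example:b"
doc = Element("a:doc", {"a": A}, [
    Element("a:item", {"a": A}),
    Element("b:item", {"a": A, "b": B}),
])
for name, decls in declarations_to_emit(doc):
    print(name, decls)
```

The trade-off is the one the draft's note points to: with this rule, the declarations written for an element depend on its ancestors, so a canonicalized subtree no longer looks the same wherever it occurs.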

- Acknowledgements: W3C Liaison -> W3C Staff Contact

- Acknowledgements: U. Ill: please expand. People outside the US,
  at least, won't guess what it stands for.

- Acknowledgements: Please don't acknowledge yourselves, or at least not
  only yourselves. To list the members of the WG, a title such as the
  one in the XML spec is more appropriate.


Regards,  Martin.


#-#-#  Martin J. Du"rst, World Wide Web Consortium
#-#-#  mailto:duerst@w3.org   http://www.w3.org
Received on Tuesday, 8 February 2000 01:58:38 UTC