Re: Needed: output comparitors from Dominique Hazaël-Massieux on 2004-08-16 (www-qa@w3.org from August 2004)

From: Dominique Hazaël-Massieux <dom@w3.org>
Date: Mon, 16 Aug 2004 16:12:16 +0200
To: david_marston@us.ibm.com, Lofton Henderson <lofton@rockynet.com>
Cc: www-qa@w3.org
Message-Id: <1092665536.1400.94.camel@stratustier>

Hi Dave,

On Fri, 13/08/2004 at 04:52, david_marston@us.ibm.com wrote:
> The basic XML comparator confirms that two XML documents are equal at
> the InfoSet level. Thus, it has to neutralize the order of attributes
> and namespace nodes.

Would XML Canonicalization fit this requirement?
http://www.w3.org/TR/2001/REC-xml-c14n-20010315
"""
Any XML document is part of a set of XML documents that are logically
equivalent within an application context, but which vary in physical
representation based on syntactic changes permitted by XML 1.0 [XML] and
Namespaces in XML [Names]. This specification describes a method for
generating a physical representation, the canonical form, of an XML
document that accounts for the permissible changes. Except for
limitations regarding a few unusual cases, if two documents have the
same canonical form, then the two documents are logically equivalent
within the given application context.
"""

Examples of implementations of XML Canonicalization are available at:
http://www.w3.org/Signature/2000/10/10-c14n-interop.html
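
As a rough illustration of how a canonicalization-based comparator could work, here is a minimal Python sketch. It assumes Python 3.8 or later, whose standard library provides xml.etree.ElementTree.canonicalize; note that this function implements Canonical XML 2.0, not the 2001 Recommendation cited above, and the helper name xml_equal is hypothetical.

```python
import xml.etree.ElementTree as ET


def xml_equal(doc_a: str, doc_b: str) -> bool:
    """Hypothetical helper: treat two XML documents as equal when their
    canonical forms match.

    ElementTree.canonicalize (C14N 2.0) writes attributes in a fixed order,
    always uses double quotes, and expands empty-element tags, so those
    purely syntactic differences disappear before the string comparison.
    """
    return ET.canonicalize(doc_a) == ET.canonicalize(doc_b)


if __name__ == "__main__":
    # Same attributes in a different order, different quoting,
    # and <a/> versus <a></a>.
    print(xml_equal('<a x="1" y="2"/>', "<a y='2' x='1'></a>"))  # True
    # A different attribute value is still detected.
    print(xml_equal('<a x="1"/>', '<a x="2"/>'))  # False
```

As the quoted spec says, equal canonical forms imply logical equivalence only "except for limitations regarding a few unusual cases", so this is an approximation of InfoSet equality rather than a definition of it.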

>  In some situations, it would help if it could overlook text nodes
> that are all white space.

XML Canonicalization doesn't do that, FWIW:
"""
Retain all whitespace between consecutive start tags, clean or dirty
Retain all whitespace between consecutive end tags, clean or dirty
Retain all whitespace between end tag/start tag pair, clean or dirty
Retain all whitespace in character content, clean or dirty
"""
http://www.w3.org/TR/2001/REC-xml-c14n-20010315#Example-WhitespaceInContent
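
If a whitespace-lenient comparison is wanted anyway, one approximation (again assuming Python 3.8+ and its C14N 2.0 implementation) is the strip_text option of xml.etree.ElementTree.canonicalize. It is looser than what Dave describes: it trims leading and trailing whitespace from every text node, not only from nodes that are entirely whitespace, so it should be used with care on mixed content. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def xml_equal_ignoring_ws(doc_a: str, doc_b: str) -> bool:
    """Hypothetical helper: compare canonical forms after trimming text.

    strip_text=True strips leading and trailing whitespace from each text
    node, so whitespace-only nodes such as indentation vanish.  It is
    broader than skipping only all-whitespace nodes: "<p> hi </p>" and
    "<p>hi</p>" also compare equal.
    """
    return (ET.canonicalize(doc_a, strip_text=True)
            == ET.canonicalize(doc_b, strip_text=True))


if __name__ == "__main__":
    pretty = "<a>\n  <b>x</b>\n</a>"
    compact = "<a><b>x</b></a>"
    # Plain canonicalization keeps the indentation, so the forms differ.
    print(ET.canonicalize(pretty) == ET.canonicalize(compact))  # False
    # With strip_text the whitespace-only nodes are dropped.
    print(xml_equal_ignoring_ws(pretty, compact))  # True
```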

>  A second comparator is needed to check the output of a product that
> implements the Serialization spec [1] because there are requirements
> to produce CDATA sections and other details below the InfoSet level.

Could you give a few examples of such requirements?
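
For what it's worth, the CDATA case alone shows why an InfoSet-level comparator cannot cover this: canonical XML replaces CDATA sections with their character content, so a C14N-based check cannot tell whether a serializer emitted one. A small sketch, assuming Python 3.8+ and its xml.etree.ElementTree.canonicalize (C14N 2.0); the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Two serializations of the same character content "x<y": one uses a
# CDATA section, the other escapes "<" with the predefined &lt; entity.
with_cdata = "<a><![CDATA[x<y]]></a>"
escaped = "<a>x&lt;y</a>"


def same_after_c14n(doc_a: str, doc_b: str) -> bool:
    """Hypothetical helper: InfoSet-level comparison via canonical XML."""
    return ET.canonicalize(doc_a) == ET.canonicalize(doc_b)


if __name__ == "__main__":
    # The canonical forms coincide, so the CDATA section is invisible here.
    print(same_after_c14n(with_cdata, escaped))  # True
    # The raw serializations differ; a serialization-level comparator
    # would have to look at this level instead.
    print(with_cdata == escaped)  # False
```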

> XSLT also produces HTML, and a definitive HTML comparator would be
> welcome. The two inputs would be considered equal if a browser is
> required to render them the same way.

Hmm... This looks like a dangerous criterion for comparison, since
rendering is only one of the ways HTML is used; I'm not sure what
definitive criterion should be used to compare two HTML documents,
although it may be worth looking at the SGML level, since HTML is an
SGML language.
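
A very rough approximation of a parse-level (rather than rendering-level) comparison is sketched below in Python. It is only a sketch under strong assumptions: the standard html.parser module is a tokenizer, not an SGML parser, so it neither reads the HTML DTD nor infers omitted tags ("<p>Hi" and "<p>Hi</p>" still differ). It lowercases tag and attribute names, and this sketch additionally ignores attribute order. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class _EventRecorder(HTMLParser):
    """Hypothetical helper: records a flat list of tokenizer events."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names;
        # sorting by attribute name makes attribute order irrelevant.
        ordered = tuple(sorted(attrs, key=lambda kv: kv[0]))
        self.events.append(("start", tag, ordered))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def html_events(source: str) -> list:
    recorder = _EventRecorder()
    recorder.feed(source)
    recorder.close()  # flush any trailing text
    return recorder.events


def html_equal(doc_a: str, doc_b: str) -> bool:
    return html_events(doc_a) == html_events(doc_b)


if __name__ == "__main__":
    # Case and attribute order differ; the token streams match.
    print(html_equal('<P CLASS=x ID=y>Hi</P>', '<p id="y" class="x">Hi</p>'))  # True
    # No DTD-driven tag inference, so an omitted end tag still counts.
    print(html_equal("<p>Hi", "<p>Hi</p>"))  # False
```

A real SGML-level comparison would need a DTD-aware parser to normalize omitted tags and similar minimizations.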

> I hope there will be some way for the QA Activity to make this happen.
> It shouldn't be part of the workload of an individual "substantive"
> Working Group.

I don't think the current level of resources in the QA Activity would
make this possible today, but I'm fairly sure other working groups
have built similar tools that could be reworked or reused. Lofton, I
vaguely remember you or Kirill talking about a test suite that compared
output in this way during a QA WG face-to-face meeting; does that ring
a bell? I looked at the minutes of the Tokyo F2F of the QA WG, but
didn't find any relevant detail.

Dom

> [1] http://www.w3.org/TR/xslt-xquery-serialization/
-- 
Dominique Hazaël-Massieux - http://www.w3.org/People/Dom/
W3C/ERCIM
mailto:dom@w3.org

Received on Monday, 16 August 2004 14:12:18 UTC