Re: Invalid XHTML Re: Another test suggesting change in the spec from Chimezie Ogbuji on 2007-04-23 (public-grddl-wg@w3.org from April 2007)

From: Chimezie Ogbuji <ogbujic@ccf.org>
Date: Mon, 23 Apr 2007 14:08:37 -0400
To: "Dan Connolly" <connolly@w3.org>
cc: "Jeremy Carroll" <jjc@hpl.hp.com>, "GRDDL Working Group" <public-grddl-wg@w3.org>
Message-ID: <1177351717.5534.73.camel@otherland>
Perhaps I'm missing the big picture, but it seems our 'silent' faithful
infoset stance (remember that?) conflicts with the notion of XHTML, its
'validity' requirement, and our normative dependency on XHTML.
Especially if a reading of the current text suggests that this test is
not 'acceptable':

[[
This specification is purposely silent on the question of which XML
processors are employed by or for GRDDL-aware agents. Whether or not
processing of XInclude, XML Validity, XML Schema Validity, XML
Signatures or XML Decryption take place is implementation-defined.
]]

Is it not the case that XHTML validity = XML validity?

Other comments inline

On Mon, 2007-04-23 at 12:45 -0500, Dan Connolly wrote:
> The suggestions so far haven't affected the normative
> rules/assertions, so they don't support this test either.
> 
> The rule currently starts:
> 
>   Given an XHTML family document[XHTML] with XPath root node N,
> 
> I would love it if the XHTML specs provided a definition
> of the set of documents we're interested in, but I can't
> find it.

The XHTML specs do not, but the set can easily be defined via XPath (as
you do later in this message).  

> > But currently our implementations do not in general check for DTD
> > validity, yet formally they should since the rel="transformation" thing
> > is only defined for DTD valid docs.

GRDDL.py does not validate (as the faithful infoset resolution suggests
that it is not required to do so):

Line 61: from Ft.Xml.Domlette import NoExtDtdReader as XMLParser

From (http://4suite.org/docs/CoreManual.xml#id220032044):

[[
3.1.8 NoExtDtdReader

When using NonvalidatingReader to parse a document, that document's DTD
is still opened and read to obtain information such as entity
declarations and default attribute values. You cannot suppress reading
of the internal DTD subset, but you can prevent the external subset from
being accessed by using NoExtDtdReader. This won't affect the processing
of external parameter entities defined in the internal DTD subset. Use
this object as you would use NonvalidatingReader.
]]

So, not only does it *not* validate, it also never reads the external
DTD subset.

> What _do_ the implementations check or depend on?
> MIME type, XML-wf-ness, and root element namespace?

GRDDL.py (in its current form) only checks for XML-wf-ness and
successful evaluation of the (unambiguous) XPaths outlined in the
specification.
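
To make that concrete, here is a minimal sketch of a check of that kind.
It is not the GRDDL.py code: it assumes the standard library's
xml.etree.ElementTree in place of 4Suite, the function name
transformation_links is made up for illustration, and the profile URI
and the rel="transformation" test reflect my reading of the XHTML rule
in the draft. The parser is non-validating, so the only ways a document
fails are by not being well-formed or by not matching the paths.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Assumption: the GRDDL metadata profile URI as given in the draft.
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def transformation_links(document_text):
    """Return href values of rel="transformation" links (hypothetical helper).

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed.
    No DTD validation is performed at any point.
    """
    root = ET.fromstring(document_text)
    head = root.find("h:head", {"h": XHTML_NS})
    # The profile attribute is a whitespace-separated list of URIs.
    if head is None or GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    wanted = ("{%s}a" % XHTML_NS, "{%s}link" % XHTML_NS)
    links = []
    for element in root.iter():
        if element.tag not in wanted:
            continue
        # rel is a whitespace-separated list of link types.
        if "transformation" in element.get("rel", "").split():
            href = element.get("href")
            if href:
                links.append(href)
    return links
```

The sketch returns href values as written and does not resolve them
against a base URI. A document with no title, or with block content in
the wrong place, still yields its links as long as it parses.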

> If so, I'd specify something like this...
> 
>   If an information resource has a text/html representation
>   whose body is an XML document whose root element
>   bears the local name 'html' and the
>   namespace name 'http://www.w3.org/1999/xhtml', then ...
> 

+1 On this 
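
As a rough illustration of how little that wording asks of an
implementation, the test on the root element could be sketched as
follows. This assumes the standard library's xml.etree.ElementTree
(non-validating) and a made-up function name, has_xhtml_root; the MIME
type check is left out because it happens at the HTTP layer.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def has_xhtml_root(document_text):
    """True if the text is well-formed XML rooted at {XHTML_NS}html.

    Hypothetical helper: checks well-formedness, local name and namespace
    name only. DTD validity is never consulted.
    """
    try:
        root = ET.fromstring(document_text)
    except ET.ParseError:
        # Not well-formed, so not an XML document at all.
        return False
    # ElementTree spells a qualified name as "{namespace}localname".
    return root.tag == "{%s}html" % XHTML_NS
```

A document such as
<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>
passes this test even though it has no head or title and so is not DTD
valid, which is exactly the
'well-formed-but-not-necessarily-valid' family in question.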

However, my original question remains: does our dependency on XHTML
clash with the faithful infoset 'stance'? Does it prevent us from
specifying our own family of 'well-formed-but-not-necessarily-valid'
XHTML documents?

-- 
Chimezie Ogbuji
Lead Systems Analyst
Thoracic and Cardiovascular Surgery
Cleveland Clinic Foundation
9500 Euclid Avenue/ W26
Cleveland, Ohio 44195
Office: (216)444-8593
ogbujic@ccf.org






Received on Monday, 23 April 2007 18:09:24 UTC