- From: Jeremy Carroll <jjc@hpl.hp.com>
- Date: Mon, 23 Apr 2007 20:03:26 +0100
- To: Dan Connolly <connolly@w3.org>
- CC: ogbujic@ccf.org, GRDDL Working Group <public-grddl-wg@w3.org>
Here is my code; the intent is to be liberal. I've added comments for this message. This code processes the root element of the document.

    // check mimetype: text/html or application/xhtml+xml
    boolean html = isHtmlMimetype();

    // process according to section 2 of spec
    if (grddlNamespace)
        checkRootAttrs(attr);

    if (uri != null && !uri.equals("")) {
        // root element has a namespace URI
        checkSchema(input.resolve(uri));
        // It is an HTML doc if it has an HTML mimetype or an HTML namespace;
        // this allows the XHTML2 namespace too.
        html = html || isHtmlNS(uri);
    } else if (localName.equalsIgnoreCase("html")) {
        // no namespace URI, but the root element is <html> or a case variant,
        // so it's HTML
        html = true;
    }

    // we've done enough processing except in the HTML case
    if (!html)
        throw new SeenEnoughExpectedException();

    // hmmm, if our root element is not html, then
    // the document is a mess; better tidy it up.
    if (!localName.equalsIgnoreCase("html")) {
        needTidy(); // doesn't usually return
    }

I wouldn't expect the spec to call out all of these possibilities. I suspect I should have a strict mode that switches at least some of it off.

So my code treats the document as HTML if any one of the following checks succeeds:

a) checks the mimetype (text/html, application/xhtml+xml);
b) checks the namespace URI;
c) checks the root element name, if unqualified.

It will also tidy up any mess it finds.

This differs from your suggestion, which was (a) or ((b) and (c)), rather than (a) or (b) or (c), which is what I've implemented. I've also implemented (a) as two mimetypes, and (b) as two namespaces.

Jeremy

Dan Connolly wrote:
> On Mon, 2007-04-23 at 14:08 -0400, Chimezie Ogbuji wrote:
> [...]
>> On Mon, 2007-04-23 at 12:45 -0500, Dan Connolly wrote:
> [...]
>>> What _do_ the implementations check or depend on?
>>> MIME type, XML-wf-ness, and root element namespace?
>> GRDDL.py (in its current form) only checks for XML-wf-ness and
>> successful evaluation of the (unambiguous) XPaths outlined in the
>> specification.
>
> There are no XPaths in the relevant section.
>
> Oh... wait... yes there are...
> though only in the informative
> mechanical rules...
>
> I think those rules match, for example, XHTML inside Atom;
> even inside an Atom document that says "The following
> XHTML is false/fictional/counter-factual..."
>
>>> If so, I'd specify something like this...
>>>
>>> If an information resource has a text/html representation
>>> whose body is an XML document whose root element
>>> bears the local name 'html' and the
>>> namespace name 'http://www.w3.org/1999/xhtml', then ...
>>>
>> +1 On this
>
> That quick sketch excludes the following mime types:
> text/xml
> application/xml
> application/xhtml+xml
>
> I think that's not a good way to specify it... but I do think
> the media type has to specify XML... i.e. text/plain is no good.
>
>> However, my original question remains: does our dependency on XHTML
>> clash with the faithful infoset 'stance'?
>
> No. (i.e. not as far as I can tell.)
>

--
Hewlett-Packard Limited
registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England
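To make the difference between the two detection rules discussed in this thread concrete, here is a minimal Java sketch. It assumes the representation's media type and the root element's namespace URI and local name are already known. The class and method names (`HtmlDetection`, `liberalIsHtml`, `strictIsHtml`, `baseType`) are hypothetical and are not taken from Jeremy's code or the GRDDL specification. The XHTML2 namespace URI is an assumption based on the XHTML 2.0 working drafts. The "strict" variant follows Jeremy's summary of Dan's suggestion, (a) or ((b) and (c)), with (a) limited to text/html and (b) to the XHTML 1 namespace, as in Dan's sketch.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: contrasts the liberal "(a) or (b) or (c)" test with
// the "(a) or ((b) and (c))" reading of Dan's suggestion. Names are
// illustrative only, not taken from Jeremy's code or the GRDDL spec.
class HtmlDetection {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";
    // Assumption: the namespace URI used by the XHTML 2.0 working drafts.
    static final String XHTML2_NS = "http://www.w3.org/2002/06/xhtml2/";

    // (a) the two mimetypes accepted by the liberal test
    static final Set<String> HTML_MIMETYPES =
            Set.of("text/html", "application/xhtml+xml");
    // (b) the two namespaces accepted by the liberal test
    static final Set<String> HTML_NAMESPACES = Set.of(XHTML_NS, XHTML2_NS);

    // Drop any parameters (e.g. "; charset=utf-8") and lower-case the type.
    static String baseType(String mimetype) {
        if (mimetype == null) {
            return "";
        }
        int semi = mimetype.indexOf(';');
        String base = semi >= 0 ? mimetype.substring(0, semi) : mimetype;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // Liberal: HTML if any one of (a), (b), (c) holds.
    static boolean liberalIsHtml(String mimetype, String nsUri, String localName) {
        boolean hasNs = nsUri != null && !nsUri.isEmpty();
        boolean a = HTML_MIMETYPES.contains(baseType(mimetype));
        boolean b = hasNs && HTML_NAMESPACES.contains(nsUri);
        // (c) applies only to an unqualified root; any case of "html" is accepted
        boolean c = !hasNs && "html".equalsIgnoreCase(localName);
        return a || b || c;
    }

    // Stricter: text/html, or an <html> root in the XHTML 1 namespace.
    static boolean strictIsHtml(String mimetype, String nsUri, String localName) {
        boolean a = "text/html".equals(baseType(mimetype));
        boolean b = XHTML_NS.equals(nsUri);
        boolean c = "html".equals(localName);
        return a || (b && c);
    }

    public static void main(String[] args) {
        // Unqualified <HTML> root served as application/xml
        System.out.println(liberalIsHtml("application/xml", "", "HTML")); // true
        System.out.println(strictIsHtml("application/xml", "", "HTML"));  // false
        // XHTML2-namespace root served as application/xml
        System.out.println(liberalIsHtml("application/xml", XHTML2_NS, "html")); // true
        System.out.println(strictIsHtml("application/xml", XHTML2_NS, "html"));  // false
        // XHTML 1 root served as application/xhtml+xml: both rules agree
        System.out.println(liberalIsHtml("application/xhtml+xml", XHTML_NS, "html")); // true
        System.out.println(strictIsHtml("application/xhtml+xml", XHTML_NS, "html"));  // true
    }
}
```

The first two pairs of calls show where the rules diverge: an unqualified `<HTML>` root and an XHTML2-namespace root count as HTML only under the liberal rule, via (c) and (b) respectively. The case-insensitive match in (c) mirrors the "or case variant" comment in Jeremy's code.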
Received on Monday, 23 April 2007 19:04:11 UTC