Re: ISSUE-3 PROPOSAL: Infoset coercion from Manu Sporny on 2010-07-08 (public-rdfa-wg@w3.org from July 2010)

From: Manu Sporny <msporny@digitalbazaar.com>
Date: Thu, 08 Jul 2010 08:56:10 -0400
To: RDFa WG <public-rdfa-wg@w3.org>
Message-ID: <4C35CAEA.107@digitalbazaar.com>

On 07/08/2010 07:12 AM, Ivan Herman wrote:
> I would like to understand what this means in practice.
> 
> What you seem to describe is that if
> 
> 1. I use xmlns:XXX in HTML5+RDFa (which is probably a bad idea anyway:-)
> 2. I have an XML processor that is namespace aware, then this is what I get
> 
> But what do we say if an XML processor is _not_ namespace aware?

The concern isn't with XML processors, since all the major ones,
including those used by the browser manufacturers, are Infoset-based
and are thus namespace aware.

The concern is with non-XML processors, such as SGML-based processors,
which is the category that HTML5 non-XML mode falls into. Non-XML mode
processors are not namespace aware.
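
As a rough illustration of that difference, here is a small Python
sketch. It assumes the standard library's html.parser as a stand-in for
a non-XML mode parser (it is not an HTML5-conformant parser) and
xml.etree.ElementTree as the namespace-aware XML processor; the sample
markup and variable names are made up for the example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical sample markup, valid as both XML and tag soup.
DOC = '<div xmlns:foo="http://example.org/foo#"><foo:bar>x</foo:bar></div>'


class AttrCollector(HTMLParser):
    """Records every start tag together with its raw attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


# Non-namespace-aware view: xmlns:foo is just an attribute whose name
# happens to contain a colon, and foo:bar is just a tag name.
collector = AttrCollector()
collector.feed(DOC)
collector.close()
html_view = collector.seen
# [('div', [('xmlns:foo', 'http://example.org/foo#')]), ('foo:bar', [])]

# Namespace-aware view: the declaration is consumed by the parser and
# the prefix is resolved into the element's qualified name.
root = ET.fromstring(DOC)
xml_view = [(el.tag, dict(el.attrib)) for el in root.iter()]
# [('div', {}), ('{http://example.org/foo#}bar', {})]
```

The same bytes produce two different models, which is the gap the
coercion rules would need to close.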

There is concern in the WHAT WG and HTML WG that nobody has defined the
algorithm for translating an SGML-based document model into an
Infoset-based document model.

In other words, in HTML5 (non-XML mode), nobody says how to generate a
namespace tuple... to get from this (in a non-XML HTML5 document):

   xmlns:foo="http://example.org/foo#"

to this (in the Infoset):

   [http://www.w3.org/2000/xmlns/, foo, http://example.org/foo#]
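
The mapping above could be sketched roughly as follows. This is only an
illustration, not the coercion algorithm under discussion: it assumes
Python's html.parser as a stand-in for a non-XML mode parser, and the
names NamespaceTupleCollector and namespace_tuples are hypothetical.

```python
from html.parser import HTMLParser

# Namespace that the Infoset assigns to xmlns:* declarations.
XMLNS_NS = "http://www.w3.org/2000/xmlns/"
PREFIX_MARKER = "xmlns:"


class NamespaceTupleCollector(HTMLParser):
    """Turns xmlns:* attributes into (namespace, local name, value) tuples."""

    def __init__(self):
        super().__init__()
        self.tuples = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Only attributes of the form xmlns:<prefix> qualify; a bare
            # "xmlns:" with no prefix is skipped.
            if name.startswith(PREFIX_MARKER) and len(name) > len(PREFIX_MARKER):
                prefix = name[len(PREFIX_MARKER):]
                # html.parser reports a valueless attribute as None.
                self.tuples.append((XMLNS_NS, prefix, value or ""))


def namespace_tuples(markup):
    """Return the namespace tuples declared anywhere in the markup."""
    collector = NamespaceTupleCollector()
    collector.feed(markup)
    collector.close()
    return collector.tuples


result = namespace_tuples('<p xmlns:foo="http://example.org/foo#">hi</p>')
# [('http://www.w3.org/2000/xmlns/', 'foo', 'http://example.org/foo#')]
```

Note that html.parser lowercases attribute names, so a mixed-case
prefix would come out lowercased in this sketch.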

Some people in HTML WG and WHAT WG are arguing that non-XML mode models
shouldn't be namespace aware at all because SGML isn't namespace aware.
Namespaces are a DOM2 and Infoset thing. The counter-argument to that is
that RDFa needs the namespace information - we depend on it for xmlns:
prefix declarations to work correctly, and thus don't have the luxury of
ignoring namespace information.

There is also a parallel issue - we'd like the HTML5 non-XML mode model
and the XHTML5 (XML-mode) model to be equivalent. By having namespaces
in one and not the other, the two models are not equivalent. Since the
two models are currently not equivalent, the HTML WG and WHAT WG folks
are saying that the triples generated from a document /could/ differ
between HTML5 and XHTML5 mode. That would be a bad thing.

To remedy this, we are trying to create language that makes it clear how
namespace information is extracted from an HTML5 non-XML mode document
as well as an XHTML5 XML-mode document.

We don't /need/ to modify the coercion to Infoset rules in the HTML5
spec for this to work. However, modifying the coercion to Infoset rules
is the correct technical solution.

If the HTML WG rejects the coercion to Infoset rule changes, we can
always fall back to defining how one extracts the namespace information
from an HTML5 non-XML mode document (for Infosets and for DOM Level 2).

I know this is confusing - hopefully I'll be able to do a better job
explaining this on the call today.

-- manu

-- 
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: Myth Busting Web Stacks - PHP is Faster Than You Think
http://blog.digitalbazaar.com/2010/06/12/myth-busting-php/2/
Received on Thursday, 8 July 2010 12:56:45 UTC