Re: Request to publish HTML+RDFa (draft 3) as FPWD

On Tue, Sep 22, 2009 at 9:48 AM, Shane McCarron <> wrote:
> Henri Sivonen wrote:
>> How would you characterize the ongoing denial that the syntax
>> xmlns:p="" is problematic?
>> How can the problem be meaningfully resolved when you aren't even
>> admitting there's a problem to discuss?
> Because, Henri, we don't grok the problem.  I am slowly beginning to
> understand that this might be due to our talking past one another.  The W3C
> has a Recommendation that defines the Syntax of RDFa *input* and the
> extraction of RDF triples from that *input*.  It defines this as an
> extension to XHTML.   XHTML Modularization provides the structure for a host
> language.  The Recommendation is carefully vague about how that input is
> parsed because that is properly the job of the host language.
> In the RDFa in HTML document, Manu has deferred the syntax and extraction to
> the existing Recommendation, and has deferred the parsing of the input to
> the host language specification (HTML5).
> Jonas, Maciej, and you have pointed out that (my translation here) since it
> is possible for the *input* to be altered on its way to the code that would
> perform the extraction, it is important we define the rules for that
> extraction more tightly.  In particular, it is possible that the syntax of
> an 'xmlns:' declaration attribute may not be readily available.  It is also
> possible that, depending on the form of the *input* document, the
> declaration attribute may manifest in different ways on its way through the
> toolchain (e.g., showing up as a literal 'xmlns:foo' in HTML mode, and as
> 'foo' in the XMLNS namespace in XML mode).  However, I don't think *anyone*
> has said that the declaration will not be present in some form if it was
> present in the original *input*.  And that's how the processing rules are
> written.
> Section 5.5 defines the way in which prefix mappings are defined and
> remembered by an implementation, not how they are pulled from the data
> stream by that implementation.  To the RDFa Task Force, these are
> implementation details.  Depending upon your implementation strategy and
> environment, you will need to find the things the RDFa extraction process
> cares about, and act upon them to generate the triples.  We really, really,
> really don't care how you do this.  What we care about is that each engine
> emits the same triples in the end.  That's why there is a test suite, and
> it's why there were lots of independent implementations with completely
> different strategies long before the specification was complete.
> Regardless, I agree there is room to tighten the language to ensure that
> implementors have the proper guidance, and that edge conditions, even
> pathological ones, have clear, consistent rules.  I have proposed that we
> augment the text in RDFa Syntax section 5.5 step 2 to directly address this
> problem, and am updating my proposed errata text now.  I hope that, when
> that is ready, you will continue to help by letting us know if it satisfies
> your objections.

I would say there are two separate things that are missing:

The most substantial one is how to do prefix mappings in a DOM or an
HTML document. Prefix mapping is currently defined using the
Namespaces in XML recommendation. However, this recommendation only
defines how prefix mappings are done in a serialized XML document. I
hope we can all agree that neither DOMs (an in-memory data structure)
nor HTML documents are XML documents.

For example, if I have a DOM and I want to map the prefix "foo",
which of the following algorithms should I use:
1. Call Node.lookupNamespaceURI as defined by DOM Level 3 using
"foo" as the prefix argument.
2. Walk up the parent chain looking for an element with an attribute
with localName "foo" and namespace "",
and then use the value of that attribute.
3. Walk up the parent chain looking for an element with an attribute
with name "xmlns:foo", and then use the value of that attribute.
4. Walk up the parent chain looking for either the attribute in 2 or
3, and if both are specified use some prioritization order.
5. Walk up the parent chain looking for the attribute in 2 if
the document was parsed as XHTML, or the attribute in 3 if the document
was parsed as HTML.
6. Do something else?

Any of 1 to 5 (as well as possibly 6) seems equally valid to me, and
as far as I can tell there really is no specified answer.
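
To make the ambiguity concrete, here is a minimal sketch of options 2 and
3 over a toy in-memory tree. Everything in it is an assumption for
illustration only: the Attr and Element classes stand in for a real DOM,
the "urn:example:" URIs are hypothetical, and XMLNS_NS is the standard
namespace name that namespace-aware DOMs use for xmlns declarations. The
mixed tree below is the kind a script could build, with one declaration
created namespace-aware and one created as a plain attribute.

```python
from dataclasses import dataclass, field
from typing import Optional

# Namespace name that namespace-aware DOMs assign to xmlns declarations.
XMLNS_NS = "http://www.w3.org/2000/xmlns/"


@dataclass
class Attr:
    name: str                          # qualified name, e.g. "xmlns:foo"
    value: str
    namespace: Optional[str] = None    # None for a plain (HTML-style) attribute
    local_name: Optional[str] = None


@dataclass
class Element:
    tag: str
    attrs: list = field(default_factory=list)
    parent: Optional["Element"] = None


def ancestors_or_self(el):
    """Yield el, then each ancestor up to the root."""
    while el is not None:
        yield el
        el = el.parent


def lookup_by_namespace(el, prefix):
    """Option 2: attribute with localName == prefix in the XMLNS namespace."""
    for node in ancestors_or_self(el):
        for attr in node.attrs:
            if attr.namespace == XMLNS_NS and attr.local_name == prefix:
                return attr.value
    return None


def lookup_by_name(el, prefix):
    """Option 3: attribute whose qualified name is literally 'xmlns:' + prefix."""
    for node in ancestors_or_self(el):
        for attr in node.attrs:
            if attr.name == "xmlns:" + prefix:
                return attr.value
    return None


# Hypothetical mixed tree: a namespace-aware declaration on the root and a
# plain attribute with the same qualified name on the child.
root = Element("html", [Attr("xmlns:foo", "urn:example:outer", XMLNS_NS, "foo")])
child = Element("span", [Attr("xmlns:foo", "urn:example:inner")], parent=root)

by_namespace = lookup_by_namespace(child, "foo")
by_name = lookup_by_name(child, "foo")
```

On this tree the two lookups disagree about what "foo" maps to for the
same element, which is why I think the choice has to be specified rather
than left to each implementation.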

Likewise, how do I find out what the 'cc' prefix is mapped to for the <a>
element in the following serialized HTML document?

<!DOCTYPE html>
<html xmlns:cc="">
<head><title>HTML+RDFa example</title></head>
<table xmlns:cc="">
<a rel="cc:license"
 Creative Commons License
<tr><td>Example table</td></tr>

As far as I can see the Namespaces in XML recommendation can't help me
in either situation. For the DOM it doesn't deal with in-memory data
models at all, rather it only deals with serialized XML documents. For
the HTML document, if I try to apply XML processing as prescribed by
Namespaces in XML, I conclude that 'cc' maps to
"", when in reality I suspect it should
map to "".
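
A rough sketch of why the two readings differ, under the assumption that
an HTML5 parser foster-parents the <a> out of the <table>, so that it
ends up as a child of <body> placed before the table. The Node class,
the resolve helper and the two "urn:example:" URIs are hypothetical
stand-ins, since the real URIs are not what matters here. Both trees are
built by hand rather than by a parser.

```python
class Node:
    """Toy element: a tag name, an attribute dict and a parent pointer."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or {})
        self.parent = parent


def resolve(node, prefix):
    """Walk up the parent chain for a literal 'xmlns:<prefix>' attribute."""
    key = "xmlns:" + prefix
    while node is not None:
        if key in node.attrs:
            return node.attrs[key]
        node = node.parent
    return None


# Hypothetical placeholder URIs for the outer and inner declarations.
OUTER, INNER = "urn:example:outer", "urn:example:inner"

# Tree implied by reading the markup as XML: <a> is nested inside <table>.
html_x = Node("html", {"xmlns:cc": OUTER})
table_x = Node("table", {"xmlns:cc": INNER}, parent=html_x)
a_x = Node("a", {"rel": "cc:license"}, parent=table_x)

# Tree assumed for an HTML5 parser: <a> is a child of <body>, not <table>.
html_h = Node("html", {"xmlns:cc": OUTER})
body_h = Node("body", parent=html_h)
table_h = Node("table", {"xmlns:cc": INNER}, parent=body_h)
a_h = Node("a", {"rel": "cc:license"}, parent=body_h)

xml_view = resolve(a_x, "cc")    # sees the declaration on <table>
html_view = resolve(a_h, "cc")   # only sees the declaration on <html>
```

The same bytes give two different parent chains, so the same lookup
gives two different answers for 'cc'.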

The second, IMHO lesser, problem is that no processing is defined
anywhere for non-XML documents. Even something as basic as reading a rel
attribute out of a document is only defined for XML documents. All
normative requirements refer to XML processing and thus apply no more to
HTML documents than to GIF images. I consider this less of a problem
because I think it's fairly obvious that data is read out of a DOM
using the getAttribute function, and out of an HTML document by first
parsing it to a DOM and then calling getAttribute. But it really
should be formally defined somewhere.
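
For what it's worth, the "obvious" reading looks roughly like the sketch
below, using only the Python standard library. The XML half parses to a
DOM with xml.dom.minidom and calls getAttribute. The standard library
has no HTML5 tree builder, so as a stand-in the HTML half reads the
attribute from html.parser's start-tag callback instead of from a DOM.
The sample markup and the RelCollector class are made up for this
example.

```python
from html.parser import HTMLParser
from xml.dom.minidom import parseString

MARKUP = '<p><a rel="cc:license" href="#">License</a></p>'

# XML route: parse to a DOM, then read the attribute with getAttribute.
xml_doc = parseString(MARKUP)
rel_from_dom = xml_doc.getElementsByTagName("a")[0].getAttribute("rel")


class RelCollector(HTMLParser):
    """Collects every rel attribute value seen on a start tag."""

    def __init__(self):
        super().__init__()
        self.rels = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        for name, value in attrs:
            if name == "rel":
                self.rels.append(value)


# HTML route (stand-in): read the attribute while tokenizing the markup.
collector = RelCollector()
collector.feed(MARKUP)
collector.close()
rels_from_html = collector.rels
```

Both routes yield the same rel value for this input, but nothing
normative currently says that the HTML route is the right one.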

/ Jonas

Received on Tuesday, 22 September 2009 18:15:55 UTC