Re: Request to publish HTML+RDFa (draft 3) as FPWD from Jonas Sicking on 2009-09-22 (public-rdf-in-xhtml-tf@w3.org from September 2009)

From: Jonas Sicking <jonas@sicking.cc>
Date: Tue, 22 Sep 2009 13:43:58 -0700
To: Shane McCarron <shane@aptest.com>
Cc: Henri Sivonen <hsivonen@iki.fi>, Mark Birbeck <mark.birbeck@webbackplane.com>, HTMLWG WG <public-html@w3.org>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <63df84f0909221343j4c76683bi82b840559935c0e7@mail.gmail.com>
On Tue, Sep 22, 2009 at 1:07 PM, Shane McCarron <shane@aptest.com> wrote:
>
>
> Jonas Sicking wrote:
>
> I would say there are two separate things that are missing:
>
> The most substantial one is how to do prefix mappings in a DOM or a
> HTML document. Prefix mapping is currently defined using the
> Namespaces in XML recommendation. However this recommendation only
> defines how prefix mappings are done in a serialized XML document. I
> hope we can all agree that neither DOMs (an in-memory data structure)
> nor HTML documents are XML documents.
>
> For example, if I have a DOM and I want to map the prefix "foo",
> which of the following algorithms should I use:
> 1. Call Node.lookupNamespaceURI as defined by DOM Level 3 using
> "foo" as the prefix argument.
> 2. Walk up the parent chain looking for an element with an attribute
> with localName "foo" and namespace "http://www.w3.org/2000/xmlns/",
> and then use the value of that attribute.
> 3. Walk up the parent chain looking for an element with an attribute
> with name "xmlns:foo", and then use the value of that attribute.
> 4. Walk up the parent chain looking for either the attribute in 2 or
> 3, and if both are specified use some prioritization order.
> 5. Walk up the parent chain looking for either the attribute in 2 if
> the document was parsed as XHTML, or attribute in 3 if the document
> was parsed as HTML.
> 6. Do something else?
>
> Any of 1 to 5 (as well as possibly 6) seems equally valid to me, and
> as far as I can tell there really is no specified answer.
>
>
> Nor should there be.  This presupposes a DOM-based processing model.  While
> you *might* be using a DOM, you don't need to.

Sure, but if you have a DOM, what do you do? One solution is certainly
to say that "If you have a DOM, there is no way to extract RDFa data".
This is certainly a possibility, but it does mean that it's impossible
to

> Section 2 of Manu's spec is
> pretty clear about this, as is the HTML5 spec itself.  To answer your
> question though - it would depend on what environment you were implementing
> in, surely. If you were writing an implementation against a DOM 3
> implementation that actually worked, then 1 seems pretty clever.  An area I
> would agree needs clarification is what happens in the (pathological) case
> where there are two attributes on the same element (your item 4).  The task
> force has not yet discussed this case. My gut tells me that the item in the
> XMLNS Namespace would take precedence, but that's not always reliable.
>
> Likewise, how do I find out what the 'cc' prefix is mapped to for the <a>
> element in the following serialized HTML document?
>
> <!DOCTYPE html>
> <html xmlns:cc="http://example.org/myNamespace#">
> <head><title>HTML+RDFa example</title></head>
> <body>
> <table xmlns:cc="http://creativecommons.org/ns#">
> <a rel="cc:license"
>  href="http://creativecommons.org/licenses/by-nc-nd/3.0/">
>  Creative Commons License
> </a>
> <tr><td>Example table</td></tr>
> </table>
> </body>
> </html>
>
>
> Manu's document specifies that, in the context of the HTML host language,
> the document is processed as defined in HTML.  That document requires that a
> conforming processor have parsed the *input* using the rules specified in
> HTML5.  A conforming RDFa Processor would not ever see a document with the
> above structure, right?

Ok, so you process the HTML document, and build a DOM tree, then what?
The output from the HTML parser is a DOM. Unless there is a specified
way to map that DOM to something that you can perform RDFa processing
on it seems you are stuck, no?
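
To make the question concrete, here is a minimal sketch in Python of why the
answer depends on which tree you walk. It is only an illustration, not
anything the RDFa or HTML5 drafts specify. It assumes a lookup that walks up
the parent chain for an attribute named "xmlns:cc" (the helper names are
hypothetical), and because the Python standard library has no HTML5 tree
builder, the "after HTML parsing" tree is simulated by hand-moving the <a>
element in front of the <table> and parsing the result with an XML parser.

```python
from xml.dom import minidom

# The example document exactly as serialized (it happens to be well-formed XML).
AS_WRITTEN = """<!DOCTYPE html>
<html xmlns:cc="http://example.org/myNamespace#">
<head><title>HTML+RDFa example</title></head>
<body>
<table xmlns:cc="http://creativecommons.org/ns#">
<a rel="cc:license"
 href="http://creativecommons.org/licenses/by-nc-nd/3.0/">
 Creative Commons License
</a>
<tr><td>Example table</td></tr>
</table>
</body>
</html>"""

# Assumption: a simplified, hand-written stand-in for the tree an HTML5
# parser would build, with the <a> element moved in front of the <table>.
AFTER_HTML_PARSE = """<!DOCTYPE html>
<html xmlns:cc="http://example.org/myNamespace#">
<head><title>HTML+RDFa example</title></head>
<body>
<a rel="cc:license"
 href="http://creativecommons.org/licenses/by-nc-nd/3.0/">
 Creative Commons License
</a>
<table xmlns:cc="http://creativecommons.org/ns#">
<tr><td>Example table</td></tr>
</table>
</body>
</html>"""


def nearest_declaration(node, prefix):
    """Walk up the parent chain for the closest 'xmlns:<prefix>' attribute."""
    name = "xmlns:" + prefix
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        if node.hasAttribute(name):
            return node.getAttribute(name)
        node = node.parentNode
    return None  # prefix is not declared on any ancestor


def cc_mapping_for_link(markup):
    """Parse the markup and resolve 'cc' as seen from the first <a> element."""
    link = minidom.parseString(markup).getElementsByTagName("a")[0]
    return nearest_declaration(link, "cc")


print(cc_mapping_for_link(AS_WRITTEN))
print(cc_mapping_for_link(AFTER_HTML_PARSE))
```

The same lookup gives a different mapping for each tree, so the result is
only well defined once the mapping from the parser's DOM to the RDFa
processing rules is written down somewhere.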

> As far as I can see the Namespaces in XML recommendation can't help me
> in either situation. For the DOM it doesn't deal with in-memory data
> models at all, rather it only deals with serialized XML documents. For
> the HTML document, if I try to apply XML processing as prescribed by
> Namespaces in XML, I conclude that 'cc' maps to
> "http://creativecommons.org/ns#", when in reality I suspect it should
> map to "http://example.org/myNamespace#".
>
>
> According to Manu's document, and therefore according to HTML5, the "a"
> element would be moved before the table element, and your suspicion would be
> correct.
>
>
> The second, IMHO lesser, problem is that no processing is defined
> anywhere for non-XML documents. Even something as basic as reading a
> rel attribute out of a document is only defined for XML documents. All
> normative requirements refer to XML processing and thus apply no more to
> HTML documents than to GIF images. I consider this less of a problem
> because I think it's fairly obvious that data is read out of a DOM
> using the getAttribute function. And from a HTML document by first
> parsing it to a DOM and then calling getAttribute. But it really
> should be formally defined somewhere.
>
>
> Great!  It is formally defined.  In Manu's document, and by reference in the
> HTML5 spec.  Am I missing something here?  Section 2 of Manu's document
> says, in part:
>
> "The HTML5 and XHTML5 DOM, or equivalent data structure, should be used as
> input to the RDFa processing rules. The normative language for construction
> of the HTML5 DOM and XHTML5 DOM is contained in the HTML5 specification. "
>
> Do you need additional language to make this clearer?

I really don't know how to make this clearer. The HTML specification
describes how to map a serialized HTML document to a DOM. However the
RDFa document doesn't describe how to extract RDFa data from either a
serialized HTML document or a DOM.
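
As a rough illustration of the gap, the sketch below implements candidates 2
and 3 from the list quoted earlier in this message and runs them against one
small DOM. Everything in it is an assumption made for the example: the
function names are hypothetical, the tree is built by hand with Python's
xml.dom.minidom, and the two example.org URIs are placeholders. The outer
declaration is created as a namespaced attribute (as an XML parser or a
script calling setAttributeNS would produce), the inner one as a plain
attribute named "xmlns:foo" with no namespace.

```python
from xml.dom import minidom

XMLNS_NS = "http://www.w3.org/2000/xmlns/"


def _ancestors_or_self(node):
    """Yield the element itself, then each ancestor element up to the root."""
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        yield node
        node = node.parentNode


def lookup_namespaced(node, prefix):
    """Candidate 2: attribute with localName == prefix in the xmlns namespace."""
    for element in _ancestors_or_self(node):
        if element.hasAttributeNS(XMLNS_NS, prefix):
            return element.getAttributeNS(XMLNS_NS, prefix)
    return None


def lookup_by_name(node, prefix):
    """Candidate 3: attribute whose qualified name is 'xmlns:' + prefix."""
    name = "xmlns:" + prefix
    for element in _ancestors_or_self(node):
        if element.hasAttribute(name):
            return element.getAttribute(name)
    return None


def build_demo_link():
    """Build <html><div><a/></div></html> with two kinds of declaration."""
    doc = minidom.getDOMImplementation().createDocument(None, "html", None)
    root = doc.documentElement
    # Namespaced declaration on the root element.
    root.setAttributeNS(XMLNS_NS, "xmlns:foo", "http://example.org/outer#")
    div = doc.createElement("div")
    # Non-namespaced attribute that merely has the name "xmlns:foo".
    div.setAttribute("xmlns:foo", "http://example.org/inner#")
    link = doc.createElement("a")
    root.appendChild(div)
    div.appendChild(link)
    return link


link = build_demo_link()
print(lookup_namespaced(link, "foo"))
print(lookup_by_name(link, "foo"))
```

On this tree the two candidates disagree: the namespaced lookup skips the
plain attribute on the div and finds the declaration on the root, while the
name-based lookup stops at the div. Neither result is obviously the right
one, which is the kind of undefined step described above.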

Since it seems we are still talking past each other, I'm not really
sure what to do at this point. One solution would be to wait for an
implementation to appear and then we can see whether it needs to rely on
any undefined processing steps.

/ Jonas
Received on Tuesday, 22 September 2009 20:45:01 UTC