Re: FPWD Review Request: HTML+RDFa from Mark Birbeck on 2009-09-04 (public-html@w3.org from September 2009)

From: Mark Birbeck <mark.birbeck@webbackplane.com>
Date: Fri, 4 Sep 2009 12:54:45 +0100
To: James Graham <jgraham@opera.com>
Cc: Henri Sivonen <hsivonen@iki.fi>, Anne van Kesteren <annevk@opera.com>, Manu Sporny <msporny@digitalbazaar.com>, HTML WG <public-html@w3.org>, RDFa Developers <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <640dd5060909040454t5c60cd0x2b7902421f2ef00@mail.gmail.com>
Hi James,

I think you're really going to have to be more specific. You say things
like "one will soon run into the following problem", but you don't say
what the problem is. You say "clearly the tree[s] produced...will
require different processing", but you don't say why they require
different processing.

(Let's leave aside that you think something is being swept under some
carpet, otherwise we'll never get through this discussion.)

So let's take this in two steps.

First I'll show a typical JavaScript algorithm for obtaining prefix
mappings from an element in an RDFa parser, and then we'll look at
whether that algorithm can be justified from the point of view of the
RDFa spec.

Here's some typical code:

  function getMappingsFromElement(element, mappingList) {
    var attributes = element.attributes, attrName, i;

    if (attributes) {
      for (i = 0; i < attributes.length; i++) {
        attrName = attributes[i].nodeName;

        if (attrName.substring(0, 5) === "xmlns") {
          if (attrName.length === 5) {
            // Plain "xmlns": record it under the empty prefix.
            mappingList.add("", attributes[i].nodeValue);
          } else if (attrName.substring(5, 6) === ':') {
            // "xmlns:foo": record it under the prefix "foo".
            mappingList.add(attrName.substring(6), attributes[i].nodeValue);
          }
        }
      }
    }
  }

It takes as input an [Element] (i.e., a DOM element), and a list of
mappings to keep updated. The code iterates through the element's
attributes, adding any mappings to the list. Duplicate values are
simply overwritten.
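
To make the "overwritten" behaviour concrete, here is a minimal sketch of the kind of mapping list the function expects. The code above only shows that the list has an add() method; the MappingList name, the get() method and the internal storage are assumptions for illustration.

```javascript
// Hypothetical mapping list. Only add(prefix, uri) is implied by the
// function above; the constructor name and get() are assumptions.
function MappingList() {
  // A prototype-less object, so that any prefix string is a safe key.
  this.mappings = Object.create(null);
}

// Record a prefix-to-URI mapping. Adding the same prefix again
// overwrites the earlier value.
MappingList.prototype.add = function (prefix, uri) {
  this.mappings[prefix] = uri;
};

// Look up a prefix; returns undefined if it was never added.
MappingList.prototype.get = function (prefix) {
  return this.mappings[prefix];
};

var list = new MappingList();
list.add("foo", "http://foo.example");
list.add("", "http://www.w3.org/1999/xhtml");
list.add("foo", "http://other.example"); // replaces the first "foo"
```

After these calls the list holds two entries, and looking up "foo" gives the later URI.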

Now, since this algorithm only requires DOM Level 1 features (the
attributes property on element, and the nodeName and nodeValue
properties on an attribute), it's safe to say that it will work
regardless of whether it's HTML4, HTML5, XHTML, SVG, and so on.

Of course, it's written in JavaScript, but since it's using standard
DOM1 features, it's also clear that the language is irrelevant, and
this could be implemented in any language that has a DOM1 library.

However, the big issue is whether we *should* be doing something more
if we are in XML mode. In other words, am I using a sleight-of-hand
here to achieve compatibility across different kinds of DOM by only
using DOM1 features, when actually the RDFa spec requires that I
should be using DOM2 features if possible?

The answer is no, as I've explained in the other threads, but I'll
emphasise the main points here.

The RDFa parsing algorithm does not say "go get the currently in-scope
namespace mappings", it simply says "get the values of xmlns-based
attributes and crack them open".

As it happens, even if it did say "go get the currently in-scope
namespace mappings", you couldn't do that, even in DOM2.

But in any case, the spec simply says "get the values of xmlns-based
attributes and crack them open, and we'll keep track of scoping
ourselves".
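
As a rough sketch of what "keeping track of scoping ourselves" could look like, the following walks a tree and hands each element a copy of its parent's mappings plus its own declarations. It runs on plain objects rather than a real DOM; the element shape (name, attributes, children) and the names collectMappings and walk are assumptions for illustration, not part of any spec.

```javascript
// Stand-in for the xmlns-scanning function, writing into a plain object.
// Elements are assumed to look like { name, attributes: [...], children: [...] }.
function collectMappings(element, mappings) {
  (element.attributes || []).forEach(function (attr) {
    if (attr.nodeName === "xmlns") {
      mappings[""] = attr.nodeValue;
    } else if (attr.nodeName.substring(0, 6) === "xmlns:") {
      mappings[attr.nodeName.substring(6)] = attr.nodeValue;
    }
  });
}

// Visit every element with the mappings that are in scope for it.
// Each element works on its own copy, so a child's declarations never
// leak back to its parent or across to its siblings.
function walk(element, inherited, visit) {
  var local = Object.create(null), key;
  for (key in inherited) {
    local[key] = inherited[key];
  }
  collectMappings(element, local);
  visit(element, local);
  (element.children || []).forEach(function (child) {
    walk(child, local, visit);
  });
}

var tree = {
  name: "html",
  attributes: [{ nodeName: "xmlns:foo", nodeValue: "http://foo.example" }],
  children: [
    { name: "div",
      attributes: [{ nodeName: "xmlns:foo", nodeValue: "http://inner.example" }],
      children: [] },
    { name: "p", attributes: [], children: [] }
  ]
};

// Record what "foo" maps to at each element.
var seen = {};
walk(tree, {}, function (element, mappings) {
  seen[element.name] = mappings.foo;
});
```

Here the div sees its own redeclared value for "foo", while its sibling p still sees the value inherited from html.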

This is no surprise, since the algorithm was written with the express
purpose of allowing other prefixing mechanisms to be used. Had we
relied on namespaces explicitly, then adding a new prefix mechanism
such as @prefix or @token would suddenly have forced us to track the
scope of the mappings ourselves. By building that tracking into the
algorithm from the start, we retained flexibility.

Which means that to support @prefix or @token (or any other prefix
mapping proposal), the only thing that would need to change in our
'typical' RDFa parser would be the function that I wrote above, which
maintains the list of prefix mappings.
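
To illustrate how small that change could be, here is a sketch of the same function extended to read a prefix attribute as well. This message does not define a syntax for @prefix, so the one used here (whitespace-separated "name: uri" pairs) is purely an assumption, as are the function name and the stand-in list.

```javascript
// Variant of the mapping function that also understands a hypothetical
// "prefix" attribute. Assumed syntax: "dc: http://... ex: http://...".
function getMappingsWithPrefixAttr(element, mappingList) {
  var attributes = element.attributes, attrName, tokens, i, j;

  if (!attributes) {
    return;
  }
  for (i = 0; i < attributes.length; i++) {
    attrName = attributes[i].nodeName;

    if (attrName === "xmlns") {
      mappingList.add("", attributes[i].nodeValue);
    } else if (attrName.substring(0, 6) === "xmlns:") {
      mappingList.add(attrName.substring(6), attributes[i].nodeValue);
    } else if (attrName === "prefix") {
      // Split on whitespace and read the tokens two at a time.
      tokens = attributes[i].nodeValue.split(/\s+/).filter(Boolean);
      for (j = 0; j + 1 < tokens.length; j += 2) {
        // Only accept pairs whose first token ends with a colon.
        if (tokens[j].slice(-1) === ":") {
          mappingList.add(tokens[j].slice(0, -1), tokens[j + 1]);
        }
      }
    }
  }
}

// Stand-in mapping list that records into a plain object.
var recorded = {};
var recorder = { add: function (prefix, uri) { recorded[prefix] = uri; } };

getMappingsWithPrefixAttr({
  attributes: [
    { nodeName: "xmlns:foo", nodeValue: "http://foo.example" },
    { nodeName: "prefix",
      nodeValue: "dc: http://purl.org/dc/terms/  ex: http://example.org/" }
  ]
}, recorder);
```

Nothing outside this one function needs to know which attribute a mapping came from.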

And since the algorithm for converting a CURIE to a full URI uses the
currently in-scope URI mappings -- and *not* the currently in-scope
namespaces -- then the rest of the RDFa parsing algorithm 'just
works'.
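
A simplified sketch of that conversion step follows. It only handles the plain "prefix:reference" form and ignores the special cases in the RDFa spec (safe CURIEs, the empty prefix, and so on); the function name curieToUri and the use of a plain object for the in-scope mappings are assumptions.

```javascript
// Resolve a CURIE against the in-scope URI mappings (a plain object
// here). Returns null when there is no colon or the prefix is unknown.
function curieToUri(curie, mappings) {
  var colon = curie.indexOf(":"), prefix, reference;

  if (colon === -1) {
    return null;
  }
  prefix = curie.substring(0, colon);
  reference = curie.substring(colon + 1);

  if (!Object.prototype.hasOwnProperty.call(mappings, prefix)) {
    return null;
  }
  // A CURIE expands by simple concatenation of mapping and reference.
  return mappings[prefix] + reference;
}

var inScope = { foo: "http://foo.example/" };
var resolved = curieToUri("foo:bar", inScope);
var unresolved = curieToUri("unknown:bar", inScope);
```

Note that the function never asks where the mappings came from, which is why the source of the prefixes (xmlns-based attributes or anything else) does not matter at this stage.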

Regards,

Mark

On Fri, Sep 4, 2009 at 11:52 AM, James Graham<jgraham@opera.com> wrote:
> Mark Birbeck wrote:
>
>> The original objection was that different processing is required for
>> different DOMs, and I think we've shown that's not the case; all that
>> is required is to iterate through the list of attributes, and pull
>> out those that begin "xmlns:".
>
> It seems to me this is empirically untrue. Consider the case where one tries
> to write an RDFa processor in python using lxml and html5lib with the lxml
> treebuilder. One will soon run into the following problem:
>
> >>> from lxml import etree
> >>> root = etree.fromstring("<html xmlns='http://www.w3.org/1999/xhtml' xmlns:foo='http://foo.example'></html>")
> >>> root.tag
> '{http://www.w3.org/1999/xhtml}html'
> >>> root.attrib
> {}
> >>> root.nsmap
> {None: 'http://www.w3.org/1999/xhtml', 'foo': 'http://foo.example'}
>
>
> >>> import html5lib
> >>> tree = html5lib.parse("<html xmlns='http://www.w3.org/1999/xhtml' xmlns:foo='http://foo.example'></html>", treebuilder="lxml")
> >>> root = tree.getroot()
> >>> root.tag
> '{http://www.w3.org/1999/xhtml}html'
> >>> root.attrib
> {'xmlns': 'http://www.w3.org/1999/xhtml', 'xmlnsU0003Afoo': 'http://foo.example'}
> >>> root.nsmap
> {None: 'http://www.w3.org/1999/xhtml'}
>
> Clearly the tree produced using XML and the tree produced using html5lib
> will require different processing. Using a non-namespace aware XML processor
> would still result in problems since the tag name would be different in the
> two cases.
>
> Obviously this is not, as stated, strictly a "DOM" consistency issue since
> it uses lxml rather than DOM for its tree model. Nevertheless, it does
> demonstrate why one cannot pretend that the use of xml namespaces to
> establish prefix bindings is an unimportant detail that can be swept under
> the carpet.
>



-- 
Mark Birbeck, webBackplane

mark.birbeck@webBackplane.com

http://webBackplane.com/mark-birbeck

webBackplane is a trading name of Backplane Ltd. (company number
05972288, registered office: 2nd Floor, 69/85 Tabernacle Street,
London, EC2A 4RR)
Received on Friday, 4 September 2009 11:55:28 UTC