Re: RDFa test cases

From: Philip Taylor <pjt47@cam.ac.uk>
Date: Wed, 23 Sep 2009 09:07:27 +0100
Message-ID: <4AB9D73F.9010709@cam.ac.uk>
To: Jeni Tennison <jeni@jenitennison.com>
CC: "public-rdf-in-xhtml-tf.w3.org list" <public-rdf-in-xhtml-tf@w3.org>
Jeni Tennison wrote:
> Philip,
> 
> I just updated rdfQuery to address some of the new test cases that 
> you've done. Thank you for taking the time to make the edge cases 
> explicit. The latest trunk rdfQuery output will differ from the samples 
> you give in some places, so just to detail them here and explain why:

Great, thanks for looking at this! I've made a few changes to the test
cases, and updated the test results to use the current trunk.

> 1. For 'Empty xmlns prefix', where the test case is:
>   <p xmlns:="http://example.org/" property=":test">Test</p>
> rdfQuery ignores the bogus namespace declaration and processes :test as 
> a CURIE with a missing prefix, putting it in the XHTML Vocabulary 
> namespace:
> 
>   <> <http://www.w3.org/1999/xhtml/vocab#test> "Test" .

Fixed the test case to match that.

> 2. For 'Underscore xmlns prefix', where the test case is:
> 
>   <p xmlns:_="http://example.org/" property="_:test">Test</p>
> 
> rdfQuery retains the legal namespace declaration but because the CURIE 
> begins with _:, which is how blank nodes are indicated, it's interpreted 
> as a blank node with the id 'test'. Predicates cannot be blank nodes in 
> rdfQuery, so it's treated as a bogus value and ignored. No triples are 
> created.

I don't see anything in the RDFa spec that justifies this behaviour. As
far as I can tell, xmlns:_ updates the list of in-scope mappings, and
then the CURIE is converted to a URI with the steps "Split the CURIE at
the colon to obtain the prefix and the resource. Using the prefix and
the current in-scope mappings, obtain the URI that the prefix maps to.
Concatenate the mapped URI with the resource value, to obtain an
absolute URI". The prefix is '_' and so the relevant in-scope mapping
applies, just like any other prefix, so the mapped URI comes from
xmlns:_ and the resulting absolute URI is used as the predicate.

This doesn't really sound like sensible behaviour, but it's how I
interpret the spec. Am I misunderstanding something? Does the spec need
to be fixed?
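
To make that reading concrete, here is a rough sketch of the quoted steps
taken literally. The function name resolveCurie and the plain object used
for the in-scope mappings are my own placeholders, not anything from
rdfQuery or the spec, and it ignores everything else in the processing
rules:

```javascript
// Hypothetical helper: a literal reading of the quoted CURIE-to-URI steps.
// `mappings` is a plain object standing in for the in-scope URI mappings.
function resolveCurie(curie, mappings) {
  // "Split the CURIE at the colon to obtain the prefix and the resource."
  var colon = curie.indexOf(':');
  if (colon === -1) return null; // no colon: not handled in this sketch
  var prefix = curie.substring(0, colon);
  var resource = curie.substring(colon + 1);

  // "Using the prefix and the current in-scope mappings, obtain the URI
  // that the prefix maps to."
  if (!Object.prototype.hasOwnProperty.call(mappings, prefix)) return null;

  // "Concatenate the mapped URI with the resource value."
  return mappings[prefix] + resource;
}

// The 'Underscore xmlns prefix' test case: xmlns:_ is just another mapping.
var underscoreMappings = { '_': 'http://example.org/' };
var predicate = resolveCurie('_:test', underscoreMappings);
```

Nothing in those steps treats '_' specially, which is why I end up with
http://example.org/test as the predicate.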

> 3. For 'xmlns prefix 'xml' with incorrect URI', where the test case is:
> 
>   <p xmlns:xml="http://example.org/" property="xml:test">Test</p>
> 
> rdfQuery ignores the bogus namespace declaration, but uses the built-in 
> namespace declaration for the prefix 'xml' and therefore generates the 
> triple:
> 
>   <> <http://www.w3.org/XML/1998/namespacetest> "Test" .

What built-in namespace declaration? The spec says there is "An
initially empty list of [URI mapping]s, called the [local list of URI
mappings]", and the list is only updated by xmlns:* attributes on
processed elements, so it sounds like a violation if the list has some
built-in entries and is not initially empty.

Same issue for the tests that use xmlns:xmlns="...".

The only one that should generate triples is
xmlns:xml="http://www.w3.org/XML/1998/namespace" -- XML Namespaces says
the xml prefix/URI "MAY, but need not, be declared, and MUST NOT be
bound to any other namespace name", so this particular declaration is
legitimate and should add to the list of URI mappings. (It looks like
rdfQuery is only accidentally passing this test, because of the built-in
mapping for xml, so the bugs are cancelling each other out :-) )
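
A small sketch of what I mean by the list starting empty and only being
updated by xmlns:* attributes. The helper name extendMappings and the
plain objects standing in for elements and their attributes are
assumptions for illustration only:

```javascript
// Hypothetical helper: builds the mappings for an element from its parent's
// mappings plus the element's own xmlns:* attributes. `attrs` is a plain
// object of attribute name -> value, standing in for a real DOM element.
function extendMappings(parentMappings, attrs) {
  var result = {};
  var key;
  // Inherit everything that is already in scope.
  for (key in parentMappings) {
    if (Object.prototype.hasOwnProperty.call(parentMappings, key))
      result[key] = parentMappings[key];
  }
  // Add or override mappings declared on this element.
  for (key in attrs) {
    if (Object.prototype.hasOwnProperty.call(attrs, key)
        && key.indexOf('xmlns:') === 0)
      result[key.substring('xmlns:'.length)] = attrs[key];
  }
  return result;
}

var initial = {}; // "An initially empty list of [URI mapping]s"

// No declaration at all: 'xml' is not in the list.
var undeclared = extendMappings(initial, { property: 'xml:test' });

// The one legitimate declaration: 'xml' is now in the list.
var declared = extendMappings(initial, {
  'xmlns:xml': 'http://www.w3.org/XML/1998/namespace',
  property: 'xml:test'
});
```

With no built-in entries, the xml prefix only resolves in the second case,
where the element declares it itself.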

> 7. For 'Safe CURIE containing square brackets', where the test case is:
> 
>   <p xmlns:ex="http://example.org/1/" xmlns:[ex="http://example.org/2/" 
> about="[[ex:test]]" property="ex:test">Test</p>
> 
> rdfQuery ignores the bogus namespace declaration for the prefix '[ex'. 
> The about attribute contains an illegal CURIE, so is ignored for the 
> purpose of setting the subject. The result is:
> 
>   <> <http://example.org/1/test> "Test" .

Fixed the test case.

> 8. The language-based tests aren't met, because I'm currently at a loss 
> as to how to amend rdfQuery to work out whether it's being used in an 
> HTML or XHTML setting and therefore whether xml:lang is a lang attribute 
> in the XML namespace or an attribute called 'xml:lang' in no namespace. 

You shouldn't need to know the setting.
http://whatwg.org/html5#language defines the language of an element in
terms of the DOM, independent of the syntax, so it's the same for
documents parsed from HTML and XHTML. In particular it says:

   "To determine the language of a node, user agents must look at the
nearest ancestor element (including the element itself if the node is an
element) that has a lang attribute in the XML namespace set or is an
HTML element and has a lang in no namespace attribute set. That
attribute specifies the language of the node.

   "If both the lang attribute in no namespace and the lang attribute in
the XML namespace are set on an element, user agents must use the lang
attribute in the XML namespace, and the lang attribute in no namespace
must be ignored for the purposes of determining the element's language."

which I think can be implemented roughly like this (untested, but
hopefully nearly compatible with the HTML5 spec and with various current
browsers):

  function getLang(elem) {
    var XML_NS = 'http://www.w3.org/XML/1998/namespace';

    // lang in the XML namespace takes precedence.
    if (elem.hasAttributeNS && elem.hasAttributeNS(XML_NS, 'lang'))
      return elem.getAttributeNS(XML_NS, 'lang');

    // lang in no namespace only counts on HTML elements. The hasAttribute
    // check skips non-element nodes such as the document itself.
    else if (elem.hasAttribute && elem.hasAttribute('lang') && (
        typeof elem.namespaceURI == 'undefined'
        || elem.namespaceURI === null
        || elem.namespaceURI == 'http://www.w3.org/1999/xhtml'
    ))
      return elem.getAttribute('lang');

    else if (elem.parentNode)
      return getLang(elem.parentNode);

    else
      /* could try to parse <meta http-equiv=content-language> here,
         but I can't be bothered really; report no language instead */
      return '';
  }

On a related note, I fixed one of my language test cases (the input
markup <p property=... xml:lang="aa">Test</p> in text/html ought to
produce a literal with no language), and I think the rest are all
testing for the correct output (per HTML5's definition of language).

-- 
Philip Taylor
pjt47@cam.ac.uk
Received on Wednesday, 23 September 2009 08:08:06 GMT
