- From: Philip Taylor <pjt47@cam.ac.uk>
- Date: Sat, 05 Sep 2009 13:00:31 +0100
- To: Shane McCarron <shane@aptest.com>
- CC: Mark Birbeck <mark.birbeck@webbackplane.com>, Manu Sporny <msporny@digitalbazaar.com>, HTML WG <public-html@w3.org>, RDFa Developers <public-rdf-in-xhtml-tf@w3.org>
Shane McCarron wrote:
> I would not object to providing examples of extraction algorithms as guidance.
> We already do this for CURIEs somewhere... But I do not think it is a good idea
> to normatively define code.
I agree the spec shouldn't normatively define code. When I said it
"needs to define the prefix mapping extraction algorithm in precise
detail" I was thinking of something much more abstract than real code,
though it should still be clear and unambiguous on all the relevant details.
Currently I don't see anything in the specs other than vague references
to the Namespaces in XML spec ("Since CURIE mappings are created by
authors via the XML namespace syntax [XMLNS] an RDFa processor MUST take
into account the hierarchical nature of prefix declarations" in
rdfa-syntax, "CURIE prefix mappings specified using xmlns: must be
processed using the rules specified in the [Namespaces in XML]
Recommendation" in HTML5+RDFa). I want the spec to be clearer about
exactly which rules are applied and how they are adapted for non-XML
content, because otherwise I can produce lots of test cases where I
can't work out what the spec says the output must be. (I don't care how
an implementation computes the output, I just want to know what the
output is.)
> The processing model in the current RDFa Syntax
> Recommendation is sufficiently precise for anyone to understand what must be
> done in the face of both conforming and non-conforming input. The edge
> conditions people keep bringing up (what happens if xmlns:="" is defined, etc)
> are all degenerate cases of the general case of prefix declaration that does not
> match the syntax definition. If it doesn't match the syntax definition, it is
> illegal.
Which syntax definition? In http://www.w3.org/TR/rdfa-syntax/ I can only
find a definition of the CURIE syntax, which is not relevant to the
issue of handling xmlns:="...".
(In most cases the CURIE syntax restriction is sufficient - you can't
have rel="0:test" (it will just be ignored) so it doesn't really matter
how xmlns:0="..." was processed. But you can write rel=":test", so it
matters how xmlns:="..." interacts with that. And you can write
rel="ex:test" and xmlns:ex="" (empty value, illegal in Namespaces in XML
1.0), so it matters how that is handled too.)
Presumably http://www.w3.org/TR/REC-xml-names/#NT-PrefixedAttName is the
relevant syntax definition for namespace prefix declarations, but
rdfa-syntax doesn't explicitly refer to that. It's implicit when using
RDFa in XHTML, because XHTML is based on top of xml-names and you'll get
a well-formedness error if you try writing these invalid things, but
that doesn't automatically apply when using HTML instead.
Should the non-syntactic xml-names constraints be required too? e.g.
what triples should I get if I write the following HTML:
  <p xmlns:xml="http://example.org/" property="xml:test">Test</p>
  <p xmlns:xmlns="http://www.w3.org/2000/xmlns/"
     property="xmlns:test">Test</p>
  <p xmlns:ex="http://www.w3.org/2000/xmlns/" property="ex:test">Test</p>
(which all violate the Namespace Constraints in xml-names)? I presume
these should all be ignored too, but implementers have not been doing
that, so evidently it is not sufficiently obvious.
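As a rough illustration of the checks in question, here is a minimal Python sketch that reports why an xmlns:* declaration would be illegal under xml-names. It is not taken from any spec or implementation. The function name declaration_problem and the reason strings are hypothetical, the NCName test is an ASCII-only approximation, and whether a non-XML HTML processor should apply these rules at all is exactly the open question.

```python
import re

XML_NS = "http://www.w3.org/XML/1998/namespace"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"

# Assumption: ASCII-only approximation of NCName. The real production
# also admits many non-ASCII characters.
ASCII_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def declaration_problem(attr_name, value):
    """Return a reason the xmlns:* declaration is illegal, or None."""
    if not attr_name.startswith("xmlns:"):
        return "not a prefixed namespace declaration"
    prefix = attr_name[len("xmlns:"):]
    if prefix == "":
        return "empty prefix"
    if not ASCII_NCNAME.match(prefix):
        return "prefix is not an NCName"
    if value == "":
        return "empty namespace name"
    # Namespace Constraints: reserved prefixes and namespace names.
    if prefix == "xmlns":
        return "the xmlns prefix must not be declared"
    if prefix == "xml" and value != XML_NS:
        return "xml prefix bound to another namespace name"
    if prefix != "xml" and value == XML_NS:
        return "other prefix bound to the xml namespace name"
    if value == XMLNS_NS:
        return "prefix bound to the xmlns namespace name"
    return None
```

Under these assumptions, each of the three declarations in the HTML above yields a reason rather than None, so a processor applying the constraints would drop all three mappings.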
(I've updated http://philip.html5.org/demos/rdfa/results.html with some
of these cases, to show the output of current implementations. The
pass/fail statuses are largely irrelevant and probably wrong, but the
table shows the actual output of each implementation on mouse-over.)
> If it is illegal, it is ignored. What more does one need in a
> normative spec?
For RDFa-in-HTML, I'd like it to explicitly state what "illegal" means,
e.g. whether those Namespace Constraints should be applied in
non-XML-based versions of HTML. It doesn't need to redefine things that
are defined elsewhere, but it should explicitly refer to concepts like
PrefixedAttName and Namespace Constraints that are being used by the
RDFa-in-HTML processing model, because I don't think they are obvious
otherwise.
For both RDFa-in-HTML and RDFa-in-XHTML, I'd also like it to state a
little more clearly what "ignored" means:
The "CURIE and URI Processing" section says "any value that is not a
'curie' according to the definition in the section CURIE Syntax
Definition MUST be ignored". The "Sequence" section refers to e.g. "the
URI from @about, if present, obtained according to the section on CURIE
and URI Processing", and I think it's clear it should be considered
not-present if it's not a valid CURIE. So <span about="[bogus:bogus]"
src="http://example.org/"> should ignore @about and use @src, and that's
all okay. (Some implementations still get this wrong, though.)
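The @about/@src reading described above can be sketched in Python as follows. This is a simplification under stated assumptions: the helper names resolve_uri_or_safe_curie and pick_subject are hypothetical, only the [prefix:reference] form of a safe CURIE is modelled, relative-URI resolution is omitted, and only @about and @src are considered.

```python
def resolve_uri_or_safe_curie(value, prefix_map):
    """Return a URI string, or None if the value has to be ignored."""
    if value.startswith("[") and value.endswith("]"):
        prefix, colon, reference = value[1:-1].partition(":")
        if not colon or prefix in ("", "_"):
            # Assumption: forms without a named prefix are left out here.
            raise NotImplementedError("form not covered by this sketch")
        if prefix not in prefix_map:
            return None  # no mapping in scope: the value is ignored
        return prefix_map[prefix] + reference
    return value  # treated as a plain URI


def pick_subject(attrs, prefix_map):
    """Use @about if it yields a URI, otherwise fall back to @src."""
    for name in ("about", "src"):
        if name in attrs:
            uri = resolve_uri_or_safe_curie(attrs[name], prefix_map)
            if uri is not None:
                return uri
    return None
```

With no mapping for "bogus", pick_subject({"about": "[bogus:bogus]", "src": "http://example.org/"}, {}) falls through to the @src value, which is the behaviour argued for above.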
But it also says "if @property is not present then the [skip element]
flag is set to 'true'" - is an invalid CURIE meant to be considered
not-present here too (even though there's no reference to the CURIE and
URI Processing section)? i.e. should the output from:
  <p about="http://example.com/" rel="next">
    <span property="bogus:bogus">
      <span about="http://example.net/">Test</span>
    </span>
  </p>
include the triple '<http://example.com/>
<http://www.w3.org/1999/xhtml/vocab#next> <http://example.net/>' or not?
Implementations differ.
It also says "If the [current element] contains no @rel or @rev
attribute" - is the attribute meant to be ignored (acting as if the
element didn't have the attribute at all) if it contains only invalid
CURIEs (or if it contains no values)? i.e. should the output from:
  <p xmlns:ex="http://example.org/" rel="bogus:bogus"
     property="ex:test" href="http://example.org/href">Test</p>
include the triple '<http://example.org/href> <http://example.org/test>
"Test".' or '<> <http://example.org/test> "Test".'? Implementations
again differ.
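To make the two readings concrete, here is a small Python sketch of how the subject of the @property triple would be chosen under each. It is only an illustration: the function literal_subject and its flag treat_invalid_rel_as_absent are hypothetical, attribute values are assumed to be already-resolved URIs, and the parent context is reduced to a single base value (the empty string stands for <>).

```python
def literal_subject(attrs, valid_rel_values, base, treat_invalid_rel_as_absent):
    """Pick the subject for the element's @property triple.

    valid_rel_values holds the @rel values that survived CURIE processing.
    """
    has_rel = "rel" in attrs
    if treat_invalid_rel_as_absent and not valid_rel_values:
        has_rel = False
    if has_rel:
        # With @rel, @resource/@href name the object, not the subject.
        candidates = ("about", "src")
    else:
        # Without @rel, any resource attribute can set the subject.
        candidates = ("about", "src", "resource", "href")
    for name in candidates:
        if name in attrs:
            return attrs[name]
    return base  # simplification: inherit from the parent context


attrs = {"rel": "bogus:bogus", "property": "ex:test",
         "href": "http://example.org/href"}
print(literal_subject(attrs, [], "", True))
print(repr(literal_subject(attrs, [], "", False)))
```

The first call corresponds to the '<http://example.org/href>' output and the second to the '<>' output, so the flag is the whole difference between the two groups of implementations.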
The test suite should be extended to cover these cases, if it doesn't
already (I haven't checked), in order to detect these differences between
implementations (at least one of which must be buggy). But I think
the RDFa Syntax spec should also be updated to be clear about the
expected behaviour, because I've tried to read it carefully and I'm
still not confident enough to know what the output should be.
> I could come up with a nearly infinite collection of illegal declarations for
> each of the attributes that are addressed in the RDFa Syntax specification.
> However, they would all fall into the same class - illegal. When you are doing
> testing, you don't do "exhaustive" or even "thorough" testing of anything that
> is sufficiently complex. It is impossible. Instead, you do "equivalence class
> testing". Identify a couple of use cases from each class of processing for a
> given interface, test those, and trust that the other values in the class will
> behave the same way. For example, I would not test every single possible prefix
> name when exercising a CURIE processing library. It is not just impossible, it
> is also uninteresting. I would test some good ones and make sure they work. I
> would test some bad ones and make sure they are ignored. Then I would move on.
I would want to write tests that find bugs. There are lots of different
classes of bugs when handling illegal input - you might forget to check
the prefix is non-zero length, or forget to check it's an NCName, or
forget to check the value is non-empty, or forget to check the value is
not the xml or xmlns URI, or you might use the 4th Edition of XML
instead of the 5th, etc. There are dozens of mistakes that people can
(and apparently do) make when implementing this. Those mistakes are not
all equivalent, so they should each be tested as separate equivalence
classes, and it needs a lot more than a few tests of illegal input.
(I agree that each class doesn't need to be tested exhaustively - e.g. a
few non-NCName prefixes are enough to detect bugs if implementations
aren't correctly checking for NCNames, and there's no need to test
thousands of non-NCNames because that's very unlikely to find any more
bugs. But I don't think anyone's ever proposed testing thousands of
non-NCNames, so I presume that's not really what you're concerned about.)
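The point about separate equivalence classes can be shown with a small table-driven sketch in Python. Everything here is illustrative: the CASES table, the hypothetical processor buggy_accepts, and the helper wrongly_accepted are made up for the example, it assumes all five declarations are meant to be ignored, and the XML 4th versus 5th Edition class is left out.

```python
# One representative illegal declaration per class of mistake.
CASES = [
    ("empty prefix", "xmlns:", "http://example.org/"),
    ("non-NCName prefix", "xmlns:0", "http://example.org/"),
    ("empty value", "xmlns:ex", ""),
    ("xml namespace name", "xmlns:ex",
     "http://www.w3.org/XML/1998/namespace"),
    ("xmlns namespace name", "xmlns:ex",
     "http://www.w3.org/2000/xmlns/"),
]


def buggy_accepts(attr_name, value):
    """Hypothetical processor that only checks the prefix is non-empty."""
    return attr_name != "xmlns:"


def wrongly_accepted(accepts):
    """List the classes whose illegal declaration the processor accepts."""
    return [label for label, name, value in CASES if accepts(name, value)]


print(wrongly_accepted(buggy_accepts))
```

A suite containing only the "empty prefix" case would pass this processor, while the other four rows each expose a distinct missing check.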
--
Philip Taylor
pjt47@cam.ac.uk
Received on Saturday, 5 September 2009 12:01:16 UTC