- From: Philip Taylor <pjt47@cam.ac.uk>
- Date: Sat, 05 Sep 2009 13:00:31 +0100
- To: Shane McCarron <shane@aptest.com>
- CC: Mark Birbeck <mark.birbeck@webbackplane.com>, Manu Sporny <msporny@digitalbazaar.com>, HTML WG <public-html@w3.org>, RDFa Developers <public-rdf-in-xhtml-tf@w3.org>
Shane McCarron wrote:
> I would not object to providing examples of extraction algorithms as guidance.
> We already do this for CURIEs somewhere... But I do not think it is a good idea
> to normatively define code.

I agree the spec shouldn't normatively define code. When I said it "needs to define the prefix mapping extraction algorithm in precise detail" I was thinking of something much more abstract than real code, though it should still be clear and unambiguous on all the relevant details.

Currently I don't see anything in the specs other than vague references to the Namespaces in XML spec ("Since CURIE mappings are created by authors via the XML namespace syntax [XMLNS] an RDFa processor MUST take into account the hierarchical nature of prefix declarations" in rdfa-syntax, "CURIE prefix mappings specified using xmlns: must be processed using the rules specified in the [Namespaces in XML] Recommendation" in HTML5+RDFa), and I want it to be clearer about exactly which rules are applied and how they are adapted for non-XML content, because otherwise I can produce lots of test cases where I can't work out what the spec says the output must be. (I don't care how an implementation computes the output, I just want to know what the output is.)

> The processing model in the current RDFa Syntax
> Recommendation is sufficiently precise for anyone to understand what must be
> done in the face of both conforming and non-conforming input. The edge
> conditions people keep bringing up (what happens if xmlns:="" is defined, etc)
> are all degenerate cases of the general case of prefix declaration that does not
> match the syntax definition. If it doesn't match the syntax definition, it is
> illegal.

Which syntax definition? In http://www.w3.org/TR/rdfa-syntax/ I can only find a definition of the CURIE syntax, which is not relevant to the issue of handling xmlns:="...".

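To illustrate the level of detail being asked for, here is a minimal Python sketch of one possible prefix mapping extraction rule. It assumes (the specs do not say so explicitly) that the intended syntax definition is the PrefixedAttName production of Namespaces in XML, and that declarations with an empty value are ignored. The function name, the `attrs` input shape and the ASCII-only NCName pattern are all hypothetical simplifications, not anything defined by rdfa-syntax.

```python
import re

# Simplified, ASCII-only approximation of the NCName production.
# The real production in XML also allows many non-ASCII name characters.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def extract_prefix_mappings(attrs):
    """Return {prefix: uri} for attributes shaped like xmlns:NCName.

    `attrs` is a list of (name, value) pairs for one element, as a
    non-XML HTML parser might expose them (hypothetical input shape).
    Declarations that fail a check are ignored.
    """
    mappings = {}
    for name, value in attrs:
        if not name.startswith("xmlns:"):
            continue
        prefix = name[len("xmlns:"):]
        if not NCNAME_ASCII.match(prefix):
            continue  # rejects xmlns:="..." (empty prefix) and xmlns:0="..."
        if value == "":
            continue  # assumption: an empty value is treated as illegal
        mappings[prefix] = value
    return mappings
```

Under these assumptions, an element carrying xmlns:ex="http://example.org/", xmlns:="http://example.org/" and xmlns:0="http://example.org/" yields a single mapping, for the prefix ex. A different choice at any of the three checks gives a different output, which is why the choice needs to be written down.
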
(In most cases the CURIE syntax restriction is sufficient - you can't have rel="0:test" (it will just be ignored) so it doesn't really matter how xmlns:0="..." was processed. But you can write rel=":test", so it matters how xmlns:="..." interacts with that. And you can write rel="ex:test" and xmlns:ex="" (empty value, illegal in Namespaces in XML 1.0), so it matters how that is handled too.)

Presumably http://www.w3.org/TR/REC-xml-names/#NT-PrefixedAttName is the relevant syntax definition for namespace prefix declarations, but rdfa-syntax doesn't explicitly refer to that. It's implicit when using RDFa in XHTML, because XHTML is based on top of xml-names and you'll get a well-formedness error if you try writing these invalid things, but that doesn't automatically apply when using HTML instead.

Should the non-syntactic xml-names constraints be required too? e.g. what triples should I get if I write the following HTML:

  <p xmlns:xml="http://example.org/" property="xml:test">Test</p>
  <p xmlns:xmlns="http://www.w3.org/2000/xmlns/" property="xmlns:test">Test</p>
  <p xmlns:ex="http://www.w3.org/2000/xmlns/" property="ex:test">Test</p>

(which all violate the Namespace Constraints in xml-names)? I presume these should all be ignored too, but implementers have not been doing that, so evidently it is not sufficiently obvious.

(I've updated http://philip.html5.org/demos/rdfa/results.html with some of these cases, to show the output of current implementations. The pass/fail statuses are largely irrelevant and probably wrong, but the table shows the actual output of each implementation on mouse-over.)

> If it is illegal, it is ignored. What more does one need in a
> normative spec?

For RDFa-in-HTML, I'd like it to explicitly state what "illegal" means, e.g. whether those Namespace Constraints should be applied in non-XML-based versions of HTML.
It doesn't need to redefine things that are defined elsewhere, but it should explicitly refer to concepts like PrefixedAttName and Namespace Constraints that are being used by the RDFa-in-HTML processing model, because I don't think they are obvious otherwise.

For both RDFa-in-HTML and RDFa-in-XHTML, I'd also like it to slightly more clearly state what "ignored" means:

The "CURIE and URI Processing" section says "any value that is not a 'curie' according to the definition in the section CURIE Syntax Definition MUST be ignored". The "Sequence" section refers to e.g. "the URI from @about, if present, obtained according to the section on CURIE and URI Processing", and I think it's clear it should be considered not-present if it's not a valid CURIE. So

  <span about="[bogus:bogus]" src="http://example.org/">

should ignore @about and use @src, and that's all okay. (Some implementations still get this wrong, though.)

But it also says "if @property is not present then the [skip element] flag is set to 'true'" - is an invalid CURIE meant to be considered not-present here too (even though there's no reference to the CURIE and URI Processing section)? i.e. should the output from:

  <p about="http://example.com/" rel="next">
    <span property="bogus:bogus">
      <span about="http://example.net/">Test</span>
    </span>
  </p>

include the triple '<http://example.com/> <http://www.w3.org/1999/xhtml/vocab#next> <http://example.net/>' or not? Implementations differ.

It also says "If the [current element] contains no @rel or @rev attribute" - is the attribute meant to be ignored (acting as if the element didn't have the attribute at all) if it contains only invalid CURIEs (or if it contains no values)? i.e. should the output from:

  <p xmlns:ex="http://example.org/" rel="bogus:bogus" property="ex:test" href="http://example.org/href">Test</p>

include the triple '<http://example.org/href> <http://example.org/test> "Test".' or '<> <http://example.org/test> "Test".'? Implementations again differ.

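The two readings of "present" can be made concrete with a minimal Python sketch. This is not the processing model from the spec: the function names and the `ignore_invalid` flag are hypothetical, reserved words such as "next" and the empty prefix are deliberately left out, and a CURIE is treated as invalid simply when its prefix has no mapping.

```python
def resolve_curie(value, mappings):
    """Return the expanded URI, or None if the value would be ignored."""
    prefix, sep, reference = value.partition(":")
    if not sep or prefix not in mappings:
        return None  # no colon, or an undeclared prefix such as "bogus"
    return mappings[prefix] + reference


def attribute_present(raw, mappings, ignore_invalid):
    """Decide whether a CURIE-list attribute counts as 'present'.

    `raw` is the attribute value, or None when the element lacks the
    attribute altogether.
    """
    if raw is None:
        return False
    if not ignore_invalid:
        return True  # reading 1: any attribute in the markup counts
    # reading 2: present only if at least one value survives resolution
    return any(resolve_curie(v, mappings) is not None for v in raw.split())


mappings = {"ex": "http://example.org/"}
print(attribute_present("bogus:bogus", mappings, ignore_invalid=False))  # True
print(attribute_present("bogus:bogus", mappings, ignore_invalid=True))   # False
```

With rel="bogus:bogus", reading 1 says the element has a @rel and reading 2 says it does not, which is the kind of difference that would lead to the two candidate triples in the last example above.
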
The test suite should be extended to cover these cases, in order to detect these differences between implementations (because at least one must be buggy), if it doesn't already (I haven't checked). But I think the RDFa Syntax spec should also be updated to be clear about the expected behaviour, because I've tried to read it carefully and I'm still not confident enough to know what the output should be.

> I could come up with a nearly infinite collection of illegal declarations for
> each of the attributes that are addressed in the RDFa Syntax specification.
> However, they would all fall into the same class - illegal. When you are doing
> testing, you don't do "exhaustive" or even "thorough" testing of anything that
> is sufficiently complex. It is impossible. Instead, you do "equivalence class
> testing". Identify a couple of use cases from each class of processing for a
> given interface, test those, and trust that the other values in the class will
> behave the same way. For example, I would not test every single possible prefix
> name when exercising a CURIE processing library. It is not just impossible, it
> is also uninteresting. I would test some good ones and make sure they work. I
> would test some bad ones and make sure they are ignored. Then I would move on.

I would want to write tests that find bugs. There are lots of different classes of bugs when handling illegal input - you might forget to check the prefix is non-zero length, or forget to check it's an NCName, or forget to check the value is non-empty, or forget to check the value is not the xml or xmlns URI, or you might use the 4th Edition of XML instead of the 5th, etc. There are dozens of mistakes that people can (and apparently do) make when implementing this. Those mistakes are not all equivalent, so they should each be tested as separate equivalence classes, and it needs a lot more than a few tests of illegal input.

(I agree that each class doesn't need to be tested exhaustively - e.g. a few non-NCName prefixes are enough to detect bugs if implementations aren't correctly checking for NCNames, and there's no need to test thousands of non-NCNames because that's very unlikely to find any more bugs. But I don't think anyone's ever proposed testing thousands of non-NCNames, so I presume that's not really what you're concerned about.)

-- 
Philip Taylor
pjt47@cam.ac.uk
Received on Saturday, 5 September 2009 12:01:16 UTC