- From: Shane McCarron <shane@aptest.com>
- Date: Tue, 08 Sep 2009 11:19:46 -0500
- To: Philip Taylor <pjt47@cam.ac.uk>
- CC: Mark Birbeck <mark.birbeck@webbackplane.com>, Manu Sporny <msporny@digitalbazaar.com>, HTML WG <public-html@w3.org>, RDFa Developers <public-rdf-in-xhtml-tf@w3.org>
Philip,

Thanks for taking the time to respond so thoroughly. In general I agree with Mark that the RDFa Syntax Recommendation could have done a better job of tightening its relationship to the Namespaces in XML Recommendation. I also agree that if there are pathological cases that are *important* they should be covered by a test suite. What is "pathological" and what is "important" are obviously subjective, and test suite authors spend lots of time debating such things.

I have some detailed comments inline.

Philip Taylor wrote:
> Shane McCarron wrote:
>> I would not object to providing examples of extraction algorithms as guidance. We already do this for CURIEs somewhere... But I do not think it is a good idea to normatively define code.
>
> I agree the spec shouldn't normatively define code. When I said it "needs to define the prefix mapping extraction algorithm in precise detail" I was thinking of something much more abstract than real code, though it should still be clear and unambiguous on all the relevant details.
>
> Currently I don't see anything in the specs other than vague references to the Namespaces in XML spec ("Since CURIE mappings are created by authors via the XML namespace syntax [XMLNS] an RDFa processor MUST take into account the hierarchical nature of prefix declarations" in rdfa-syntax, "CURIE prefix mappings specified using xmlns: must be processed using the rules specified in the [Namespaces in XML] Recommendation" in HTML5+RDFa), and I want it to be clearer about exactly which rules are applied and how they are adapted for non-XML content, because otherwise I can produce lots of test cases where I can't work out what the spec says the output must be. (I don't care how an implementation computes the output, I just want to know what the output is.)

Well... Hmm... My opinion differs from yours on this. That reference, while in prose, is not vague at all. It is a normative reference to a related W3C Recommendation that defines precisely the syntactic requirements for what is and is not a legal xmlns: attribute declaration (see, for example, section 3 - Declaring Namespaces). As an implementor of RDFa Syntax, it is my responsibility to ensure I am either 1) using a library to parse my input that already knows about the requirements of the Namespaces in XML Recommendation, or 2) implementing those requirements myself. Either way, the requirements are clear (and yes, my implementation is somewhat broken).

Further, since the RDFa Syntax Recommendation is concerned only with the "syntax" of those prefix declarations, and has no semantic requirements beyond that for the use of XML Namespaces, it should be clear that the parts of the Namespaces in XML Recommendation that deal with how XML Namespaces affect the declaration of elements and attributes are irrelevant to an RDFa Syntax-conforming processor.

(Note - I would be very comfortable adding such language to the RDFa Syntax Errata document immediately. I will bring it up at the next Task Force call.)
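To make that concrete, here is a rough, non-normative sketch (in Python, with a deliberately simplified, ASCII-only NCName check) of the kind of test the Namespaces in XML Recommendation implies for an xmlns: attribute. The function name and the regular expression are mine, not the spec's; the authoritative definitions are the PrefixedAttName production and the Namespace Constraints in xml-names:

    import re

    # Crude ASCII-only approximation of an NCName; the real production in
    # "Namespaces in XML" permits a much larger range of characters.
    NCNAME = re.compile(r'^[A-Za-z_][A-Za-z0-9._-]*$')

    XML_URI = 'http://www.w3.org/XML/1998/namespace'
    XMLNS_URI = 'http://www.w3.org/2000/xmlns/'

    def is_legal_prefix_declaration(attr_name, attr_value):
        """True if attr_name="attr_value" is a legal xmlns: prefix
        declaration under the Namespaces in XML 1.0 syntax and its
        Namespace Constraints."""
        if not attr_name.startswith('xmlns:'):
            return False              # not a prefix declaration at all
        prefix = attr_name[len('xmlns:'):]
        if not NCNAME.match(prefix):
            return False              # covers xmlns:="..." and xmlns:0="..."
        if attr_value == '':
            return False              # empty value is illegal in NS 1.0
        if prefix == 'xmlns' or attr_value == XMLNS_URI:
            return False              # xmlns may never be declared or bound
        if (prefix == 'xml') != (attr_value == XML_URI):
            return False              # xml only with the XML namespace, and vice versa
        return True

A declaration that fails a test like this simply contributes no CURIE prefix mapping.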
>> The processing model in the current RDFa Syntax Recommendation is sufficiently precise for anyone to understand what must be done in the face of both conforming and non-conforming input. The edge conditions people keep bringing up (what happens if xmlns:="" is defined, etc.) are all degenerate cases of the general case of a prefix declaration that does not match the syntax definition. If it doesn't match the syntax definition, it is illegal.
>
> Which syntax definition? In http://www.w3.org/TR/rdfa-syntax/ I can only find a definition of the CURIE syntax, which is not relevant to the issue of handling xmlns:="...".

True. That's what the XML Namespaces Recommendation is for, and it tightly defines the syntax. Anything that does not conform to that syntax is not a legal CURIE prefix declaration, and is therefore ignored.

> (In most cases the CURIE syntax restriction is sufficient - you can't have rel="0:test" (it will just be ignored) so it doesn't really matter how xmlns:0="..." was processed. But you can write rel=":test", so it matters how xmlns:="..." interacts with that. And you can write rel="ex:test" and xmlns:ex="" (empty value, illegal in Namespaces in XML 1.0), so it matters how that is handled too.)

The XML Namespaces Recommendation clearly says what is illegal, including xmlns:="...". The RDFa Syntax Recommendation clearly states that there is no way to define a local default CURIE prefix mapping, and that rel=":next" is interpreted in the context of the XHTML Vocabulary URI. So no, I don't think there is any room for misinterpretation or difference among implementations here.

> Presumably http://www.w3.org/TR/REC-xml-names/#NT-PrefixedAttName is the relevant syntax definition for namespace prefix declarations, but rdfa-syntax doesn't explicitly refer to that. It's implicit when using RDFa in XHTML, because XHTML is based on top of xml-names and you'll get a well-formedness error if you try writing these invalid things, but that doesn't automatically apply when using HTML instead.

RDFa Syntax DOES explicitly, normatively incorporate the XML Namespaces Recommendation. It also explicitly, normatively says that the XML Namespace *syntax* is what is used to declare CURIE prefix mappings. I wouldn't mind explaining in an erratum that the syntax is defined at http://www.w3.org/TR/REC-xml-names/#NT-PrefixedAttName - would that help address your concerns?

> Should the non-syntactic xml-names constraints be required too? e.g. what triples should I get if I write the following HTML:
>
> <p xmlns:xml="http://example.org/" property="xml:test">Test</p>
>
> <p xmlns:xmlns="http://www.w3.org/2000/xmlns/" property="xmlns:test">Test</p>
>
> <p xmlns:ex="http://www.w3.org/2000/xmlns/" property="ex:test">Test</p>
>
> (which all violate the Namespace Constraints in xml-names)? I presume these should all be ignored too, but implementers have not been doing that, so evidently it is not sufficiently obvious.

Such prefix declarations are illegal, and therefore MUST be ignored by a conforming RDFa Processor. Do all processors do so today? I doubt it. Could they? Of course. Should they? Of course. Would it break anything in the wild if they started doing so tomorrow? No way. These are good but pathological cases, and I would be happy to add test cases for them. But in the end, whether we test for these cases or not in no way changes the definition of RDFa as it was published. We have these constraints by normative reference already.
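For illustration only, here is how the "hierarchical nature of prefix declarations" plays out in a sketch of mapping extraction (Python again, with the legality test abbreviated from the earlier sketch; the helper names and the dict-of-attributes representation are mine, not taken from any spec or library). The point is simply that illegal declarations contribute nothing, and legal inner declarations shadow outer ones:

    XML_URI = 'http://www.w3.org/XML/1998/namespace'
    XMLNS_URI = 'http://www.w3.org/2000/xmlns/'

    def legal(prefix, value):
        # Abbreviated form of the legality test sketched earlier.
        return (prefix != '' and (prefix[0].isalpha() or prefix[0] == '_')
                and value != ''
                and prefix != 'xmlns' and value != XMLNS_URI
                and (prefix == 'xml') == (value == XML_URI))

    def in_scope_mappings(attrs, inherited):
        """CURIE prefix mappings in scope on an element: the inherited
        mappings, updated by the element's own legal xmlns:* declarations.
        Illegal declarations are simply ignored."""
        mappings = dict(inherited)
        for name, value in attrs.items():
            if name.startswith('xmlns:'):
                prefix = name[len('xmlns:'):]
                if legal(prefix, value):
                    mappings[prefix] = value
        return mappings

    # The three examples above all violate the Namespace Constraints,
    # so none of them contributes a mapping.
    for attrs in ({'xmlns:xml': 'http://example.org/'},
                  {'xmlns:xmlns': XMLNS_URI},
                  {'xmlns:ex': XMLNS_URI}):
        assert in_scope_mappings(attrs, {}) == {}

    # Hierarchy: a legal inner declaration shadows an outer one for its subtree.
    outer = in_scope_mappings({'xmlns:ex': 'http://example.org/'}, {})
    inner = in_scope_mappings({'xmlns:ex': 'http://example.net/'}, outer)
    assert inner == {'ex': 'http://example.net/'}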
>> If it is illegal, it is ignored. What more does one need in a normative spec?
>
> For RDFa-in-HTML, I'd like it to explicitly state what "illegal" means, e.g. whether those Namespace Constraints should be applied in non-XML-based versions of HTML. It doesn't need to redefine things that are defined elsewhere, but it should explicitly refer to concepts like PrefixedAttName and Namespace Constraints that are being used by the RDFa-in-HTML processing model, because I don't think they are obvious otherwise.

I agree that all of the prefix declaration syntax constraints should apply to both the XHTML and HTML versions of RDFa. I think they do already, because of the normative inclusion of the XML Namespaces Recommendation, but if you think it would help clarify things I am happy to 1) add an erratum as I mentioned above, and 2) support adding some explicit text to the RDFa-in-HTML working draft.

> For both RDFa-in-HTML and RDFa-in-XHTML, I'd also like it to slightly more clearly state what "ignored" means:
>
> The "CURIE and URI Processing" section says "any value that is not a 'curie' according to the definition in the section CURIE Syntax Definition MUST be ignored". The "Sequence" section refers to e.g. "the URI from @about, if present, obtained according to the section on CURIE and URI Processing", and I think it's clear it should be considered not-present if it's not a valid CURIE. So <span about="[bogus:bogus]" src="http://example.org/"> should ignore @about and use @src, and that's all okay. (Some implementations still get this wrong, though.)

I think your interpretation is the correct one, and I think there is a test case to this effect already. If @about were not ignored, it would be interpreted as @about="", which would refer to the current document and supersede @src. Michael or Manu, can you confirm there is already a test case for this?

> But it also says "if @property is not present then the [skip element] flag is set to 'true'" - is an invalid CURIE meant to be considered not-present here too (even though there's no reference to the CURIE and URI Processing section)? i.e. should the output from:
>
> <p about="http://example.com/" rel="next">
> <span property="bogus:bogus">
> <span about="http://example.net/">Test</span>
> </span>
> </p>
>
> include the triple '<http://example.com/> <http://www.w3.org/1999/xhtml/vocab#next> <http://example.net/>' or not? Implementations differ.

The rules are to be applied consistently. If there are no legal values in an attribute declaration, an implementation MUST act as if that attribute declaration were not present at all. Again, I believe there are test cases that do this now, and it surprises me that you say implementations differ on this. In the case of @property, I would support adding an erratum to clarify that it behaves as @about does, if that would satisfy your concern.

> It also says "If the [current element] contains no @rel or @rev attribute" - is the attribute meant to be ignored (acting as if the element didn't have the attribute at all) if it contains only invalid CURIEs (or if it contains no values)? i.e. should the output from:
>
> <p xmlns:ex="http://example.org/" rel="bogus:bogus" property="ex:test" href="http://example.org/href">Test</p>
>
> include the triple '<http://example.org/href> <http://example.org/test> "Test".' or '<> <http://example.org/test> "Test".'? Implementations again differ.

As above - the interpretation of illegal attribute values should be consistent throughout. @rel or @rev with no legal values MUST be treated as if the attribute were not present at all.
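To show how I read "ignored" in practice, here is another rough, non-normative sketch (Python; the function names are mine, the resolver is grossly simplified, and reserved @rel/@rev keywords and safe-CURIE brackets are omitted):

    XHTML_VOCAB = 'http://www.w3.org/1999/xhtml/vocab#'

    def resolve_curie(curie, mappings):
        """Very simplified resolver: a URI for a legal CURIE, else None."""
        prefix, sep, reference = curie.partition(':')
        if not sep:
            return None
        if prefix == '':
            # There is no way to declare a local default prefix mapping; an
            # empty prefix resolves against the XHTML vocabulary.
            return XHTML_VOCAB + reference
        if prefix in mappings:
            return mappings[prefix] + reference
        return None                   # undeclared prefix: the value is ignored

    def effective_values(attr_value, mappings):
        """Resolve a whitespace-separated CURIE list (as in @rel, @rev,
        @property), dropping any value that is not legal. An empty result
        means the processor MUST act as if the attribute were absent."""
        resolved = (resolve_curie(v, mappings) for v in attr_value.split())
        return [uri for uri in resolved if uri is not None]

    # rel="bogus:bogus" with no xmlns:bogus in scope has no legal values,
    # so the element is processed as if it had no @rel at all.
    assert effective_values('bogus:bogus', {}) == []

    # rel=":next" is legal and resolves against the XHTML vocabulary.
    assert effective_values(':next', {}) == [XHTML_VOCAB + 'next']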
> The test suite should be extended to cover these cases, in order to detect these differences between implementations (because at least one must be buggy), if it doesn't already (I haven't checked). But I think the RDFa Syntax spec should also be updated to be clear about the expected behaviour, because I've tried to read it carefully and I'm still not confident enough to know what the output should be.

Understood. We will discuss this at a Task Force meeting and see if there is a way to introduce a blanket statement via the errata. However, again, I believe there is no conflict in the spec as written currently. There is ALWAYS room for misinterpretation in every spec. We can tighten the language and attempt to make it more consistent.

>> I could come up with a nearly infinite collection of illegal declarations for each of the attributes that are addressed in the RDFa Syntax specification. However, they would all fall into the same class - illegal. When you are doing testing, you don't do "exhaustive" or even "thorough" testing of anything that is sufficiently complex. It is impossible. Instead, you do "equivalence class testing". Identify a couple of use cases from each class of processing for a given interface, test those, and trust that the other values in the class will behave the same way. For example, I would not test every single possible prefix name when exercising a CURIE processing library. It is not just impossible, it is also uninteresting. I would test some good ones and make sure they work. I would test some bad ones and make sure they are ignored. Then I would move on.
>
> I would want to write tests that find bugs. There are lots of different classes of bugs when handling illegal input - you might forget to check the prefix is non-zero length, or forget to check it's an NCName, or forget to check the value is non-empty, or forget to check the value is not the xml or xmlns URI, or you might use the 4th Edition of XML instead of the 5th, etc. There are dozens of mistakes that people can (and apparently do) make when implementing this. Those mistakes are not all equivalent, so they should each be tested as separate equivalence classes, and it needs a lot more than a few tests of illegal input.
>
> (I agree that each class doesn't need to be tested exhaustively - e.g. a few non-NCName prefixes are enough to detect bugs if implementations aren't correctly checking for NCNames, and there's no need to test thousands of non-NCNames because that's very unlikely to find any more bugs. But I don't think anyone's ever proposed testing thousands of non-NCNames, so I presume that's not really what you're concerned about.)

No, I'm not. Poor testing is my personal soap box; sorry if I came off as attacking your testing methodology. In general, I believe it is important to always identify each equivalence class. There are several in the case of the XML Namespace prefix syntax, and it is a good idea to exercise each of them. There are several in the case of CURIE interpretation in attribute values, and those should be exercised as well.
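As an illustration of what I mean by one representative per class, here is a pytest-flavoured sketch; the markup strings are mine, and extract_triples is a hypothetical hook for whatever entry point the processor under test exposes - it is not part of any existing test suite:

    import pytest

    def extract_triples(markup):
        """Hypothetical hook: wire this to the RDFa processor under test and
        return an iterable of (subject, predicate, object) tuples."""
        raise NotImplementedError

    # One representative per equivalence class of illegal prefix declaration,
    # paired with the predicate that would appear only if the illegal
    # declaration were (wrongly) honoured.
    ILLEGAL_DECLARATIONS = [
        # zero-length prefix
        ('<p xmlns:="http://example.org/" rel=":next" href="http://example.org/x">x</p>',
         'http://example.org/next'),
        # non-NCName prefix
        ('<p xmlns:0ex="http://example.org/" rel="0ex:next" href="http://example.org/x">x</p>',
         'http://example.org/next'),
        # empty namespace value
        ('<p xmlns:ex="" rel="ex:next" href="http://example.org/x">x</p>',
         'next'),
        # reserved xml prefix bound to the wrong URI
        ('<p xmlns:xml="http://example.org/" rel="xml:next" href="http://example.org/x">x</p>',
         'http://example.org/next'),
        # another prefix bound to the reserved xmlns URI
        ('<p xmlns:ex="http://www.w3.org/2000/xmlns/" rel="ex:next" href="http://example.org/x">x</p>',
         'http://www.w3.org/2000/xmlns/next'),
    ]

    @pytest.mark.parametrize('markup,bad_predicate', ILLEGAL_DECLARATIONS)
    def test_illegal_prefix_declaration_is_ignored(markup, bad_predicate):
        triples = extract_triples(markup)
        assert all(pred != bad_predicate for (subj, pred, obj) in triples)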
What I *personally* avoid is adding tests to make sure something no longer works wrong. Conformance testing is about ensuring all implementations work *right* in the presence of correct and incorrect usage. Failure or regression testing is about adding tests that exercise a reported failure. Once that reported failure is fixed, that test will never fail again; such tests exist to check that an implementation no longer works wrong. That doesn't make them bad tests, but they are almost always exercising members of a class of input that SHOULD have been exercised by conformance testing in the first place. Rather than add a hodge-podge of tests that touch on specific failure cases, I strive to define or update the related general equivalence class. That way you are categorizing the test correctly and exercising the general feature, as opposed to the specific failure.

But as I said, that's my personal soap box. I have been standing on it, beating my breast and shouting, for 25 years. For some reason, there are people who remain unconvinced. :-P

Shane P. McCarron            Phone: +1 763 786-8160 x120
Managing Director              Fax: +1 763 786-8180
ApTest Minnesota              Inet: shane@aptest.com
Received on Tuesday, 8 September 2009 16:20:51 UTC