- From: Michael Hausenblas <michael.hausenblas@deri.org>
- Date: Tue, 08 Sep 2009 17:35:18 +0100
- To: Shane McCarron <shane@aptest.com>, Philip Taylor <pjt47@cam.ac.uk>
- CC: Mark Birbeck <mark.birbeck@webbackplane.com>, Manu Sporny <msporny@digitalbazaar.com>, HTML WG <public-html@w3.org>, RDFa TF list <public-rdf-in-xhtml-tf@w3.org>
>> The "CURIE and URI Processing" section says "any value that is not a
>> 'curie' according to the definition in the section CURIE Syntax
>> Definition MUST be ignored". The "Sequence" section refers to e.g.
>> "the URI from @about, if present, obtained according to the section on
>> CURIE and URI Processing", and I think it's clear it should be
>> considered not-present if it's not a valid CURIE. So <span
>> about="[bogus:bogus]" src="http://example.org/"> should ignore @about
>> and use @src, and that's all okay. (Some implementations still get
>> this wrong, though.)
> I think your interpretation is the correct one, and I think there is a
> test case to this effect already. If it were not ignored, then @about
> would be interpreted as @about="" and that would refer to the current
> document and supersede @src. Michael or Manu, can you confirm there is
> already a test case for this?

AFAIK not precisely this case. What comes closest are TC35-37 and TC42.
FWIW, I guess we can add one TC concerning this, if desired.

Cheers,
Michael
--
Dr. Michael Hausenblas
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html

> From: Shane McCarron <shane@aptest.com>
> Date: Tue, 08 Sep 2009 11:19:46 -0500
> To: Philip Taylor <pjt47@cam.ac.uk>
> Cc: Mark Birbeck <mark.birbeck@webbackplane.com>, Manu Sporny
> <msporny@digitalbazaar.com>, HTML WG <public-html@w3.org>, RDFa TF list
> <public-rdf-in-xhtml-tf@w3.org>
> Subject: Re: FPWD Review Request: HTML+RDFa
> Resent-From: RDFa TF list <public-rdf-in-xhtml-tf@w3.org>
> Resent-Date: Tue, 08 Sep 2009 16:20:56 +0000
>
> Philip,
>
> Thanks for taking the time to respond so thoroughly. In general I agree
> with Mark that the RDFa Syntax Recommendation could have done a better
> job of tightening its relationship to the Namespaces in XML
> Recommendation.
> I also agree that if there are pathological cases that
> are *important* they should be covered by a test suite. What is
> "pathological" and what is "important" are obviously subjective, and
> test suite authors spend lots of time debating such things.
>
> I have some detailed comments in line.
>
> Philip Taylor wrote:
>> Shane McCarron wrote:
>>> I would not object to providing examples of extraction algorithms as
>>> guidance. We already do this for CURIEs somewhere... But I do not
>>> think it is a good idea to normatively define code.
>>
>> I agree the spec shouldn't normatively define code. When I said it
>> "needs to define the prefix mapping extraction algorithm in precise
>> detail" I was thinking of something much more abstract than real code,
>> though it should still be clear and unambiguous on all the relevant
>> details.
>>
>> Currently I don't see anything in the specs other than vague
>> references to the Namespaces in XML spec ("Since CURIE mappings are
>> created by authors via the XML namespace syntax [XMLNS] an RDFa
>> processor MUST take into account the hierarchical nature of prefix
>> declarations" in rdfa-syntax, "CURIE prefix mappings specified using
>> xmlns: must be processed using the rules specified in the [Namespaces
>> in XML] Recommendation" in HTML5+RDFa), and I want it to be clearer
>> about exactly which rules are applied and how they are adapted for
>> non-XML content, because otherwise I can produce lots of test cases
>> where I can't work out what the spec says the output must be. (I don't
>> care how an implementation computes the output, I just want to know
>> what the output is.)
> Well... Hmm... My opinion differs from yours on this. That reference,
> while in prose, is not vague at all. It is a normative reference to a
> related W3C Recommendation that defines precisely the syntactic
> requirements for what is and is not a legal xmlns: attribute declaration
> (see, for example, section 3 - Declaring Namespaces).
> As an implementor
> of RDFa Syntax, it is my responsibility to ensure I am either 1) using a
> library to parse my input that already knows about the requirements of
> the Namespaces in XML Recommendation, or 2) implementing those requirements
> myself. Either way, the requirements are clear (and yes, my
> implementation is somewhat broken).
>
> Further, since the RDFa Syntax Recommendation is only concerned with
> the "syntax" of those prefix declarations, and has no semantic
> requirements beyond that for the use of XML Namespaces, it should be
> clear that the parts of the Namespaces in XML Recommendation that deal with
> how XML Namespaces affect the declaration of elements and attributes are
> irrelevant for an RDFa Syntax-conforming processor.
>
> (Note - I would be very comfortable adding such language in the RDFa
> Syntax Errata document immediately. I will bring it up at the next Task
> Force call.)
>
>>
>>> The processing model in the current RDFa Syntax Recommendation is
>>> sufficiently precise for anyone to understand what must be done in
>>> the face of both conforming and non-conforming input. The edge
>>> conditions people keep bringing up (what happens if xmlns:="" is
>>> defined, etc.) are all degenerate cases of the general case of prefix
>>> declaration that does not match the syntax definition. If it doesn't
>>> match the syntax definition, it is illegal.
>>
>> Which syntax definition? In http://www.w3.org/TR/rdfa-syntax/ I can
>> only find a definition of the CURIE syntax, which is not relevant to
>> the issue of handling xmlns:="...".
> True. That's what the XML Namespaces Recommendation is for. And it
> tightly defines the syntax. Anything that does not conform to that
> syntax is not a legal CURIE prefix declaration, and therefore would be
> ignored.
>>
>> (In most cases the CURIE syntax restriction is sufficient - you can't
>> have rel="0:test" (it will just be ignored) so it doesn't really
>> matter how xmlns:0="..." was processed.
>> But you can write rel=":test",
>> so it matters how xmlns:="..." interacts with that. And you can write
>> rel="ex:test" and xmlns:ex="" (empty value, illegal in Namespaces in
>> XML 1.0), so it matters how that is handled too.)
> The XML Namespaces Recommendation clearly says what is illegal,
> including xmlns:="...". The RDFa Syntax Recommendation clearly states
> that there is no way to define a local default CURIE prefix mapping, and
> that rel=":next" is interpreted in the context of the XHTML Vocabulary
> URI. So no, I don't think there is any room for misinterpretation or
> difference among implementations here.
>
>>
>>
>> Presumably http://www.w3.org/TR/REC-xml-names/#NT-PrefixedAttName is
>> the relevant syntax definition for namespace prefix declarations, but
>> rdfa-syntax doesn't explicitly refer to that. It's implicit when using
>> RDFa in XHTML, because XHTML is based on top of xml-names and you'll
>> get a well-formedness error if you try writing these invalid things,
>> but that doesn't automatically apply when using HTML instead.
> RDFa Syntax DOES explicitly, normatively incorporate the XML Namespaces
> Recommendation. It also explicitly, normatively says that the XML
> Namespace *syntax* is what is used to declare CURIE prefix mappings. I
> wouldn't mind explaining in an erratum that the syntax is defined at
> http://www.w3.org/TR/REC-xml-names/#NT-PrefixedAttName - would that help
> address your concerns?
>>
>> Should the non-syntactic xml-names constraints be required too? e.g.
>> what triples should I get if I write the following HTML:
>>
>> <p xmlns:xml="http://example.org/" property="xml:test">Test</p>
>>
>> <p xmlns:xmlns="http://www.w3.org/2000/xmlns/"
>> property="xmlns:test">Test</p>
>>
>> <p xmlns:ex="http://www.w3.org/2000/xmlns/" property="ex:test">Test</p>
>>
>> (which all violate the Namespace Constraints in xml-names)?
>> I presume
>> these should all be ignored too, but implementers have not been doing
>> that, so evidently it is not sufficiently obvious.
> Such prefix declarations are illegal, and therefore MUST be ignored by a
> conforming RDFa Processor. Do all processors do so today? I doubt it.
> Could they? Of course. Should they? Of course. Would it break
> anything in the wild if they started doing so tomorrow? No way. These
> are good but pathological cases. I would be happy to add test cases for
> them. But in the end, whether we test for these cases or not in no way
> changes the definition of RDFa as it was published. We have these
> constraints by normative reference already.
>>
>>
>>> If it is illegal, it is ignored. What more does one need in a
>>> normative spec?
>>
>> For RDFa-in-HTML, I'd like it to explicitly state what "illegal"
>> means, e.g. whether those Namespace Constraints should be applied in
>> non-XML-based versions of HTML. It doesn't need to redefine things
>> that are defined elsewhere, but it should explicitly refer to concepts
>> like PrefixedAttName and Namespace Constraints that are being used by
>> the RDFa-in-HTML processing model, because I don't think they are
>> obvious otherwise.
> I agree that all the prefix declaration syntax constraints should apply
> to both the XHTML and HTML versions of RDFa. I think they do already
> because of the normative inclusion of the XML Namespaces Recommendation,
> but if you think it would help clarify things I am happy to 1) add an
> erratum as I mentioned above, and 2) support adding some explicit text to
> the RDFa-in-HTML working draft.
>>
>>
>> For both RDFa-in-HTML and RDFa-in-XHTML, I'd also like it to slightly
>> more clearly state what "ignored" means:
>>
>> The "CURIE and URI Processing" section says "any value that is not a
>> 'curie' according to the definition in the section CURIE Syntax
>> Definition MUST be ignored". The "Sequence" section refers to e.g.
>> "the URI from @about, if present, obtained according to the section on
>> CURIE and URI Processing", and I think it's clear it should be
>> considered not-present if it's not a valid CURIE. So <span
>> about="[bogus:bogus]" src="http://example.org/"> should ignore @about
>> and use @src, and that's all okay. (Some implementations still get
>> this wrong, though.)
> I think your interpretation is the correct one, and I think there is a
> test case to this effect already. If it were not ignored, then @about
> would be interpreted as @about="" and that would refer to the current
> document and supersede @src. Michael or Manu, can you confirm there is
> already a test case for this?
>>
>> But it also says "if @property is not present then the [skip element]
>> flag is set to 'true'" - is an invalid CURIE meant to be considered
>> not-present here too (even though there's no reference to the CURIE
>> and URI Processing section)? i.e. should the output from:
>>
>> <p about="http://example.com/" rel="next">
>>   <span property="bogus:bogus">
>>     <span about="http://example.net/">Test</span>
>>   </span>
>> </p>
>>
>> include the triple '<http://example.com/>
>> <http://www.w3.org/1999/xhtml/vocab#next> <http://example.net/>' or
>> not? Implementations differ.
> The rules are to be applied consistently. If there are no legal values
> in an attribute declaration, an implementation MUST act as if that
> attribute declaration were not present at all. Again, I believe there
> are test cases that do this now, and it surprises me that you say
> implementations differ on this. In the case of @property, I would
> support adding errata to clarify that this behaves as @about behaves if
> that would satisfy your concern.
>>
>> It also says "If the [current element] contains no @rel or @rev
>> attribute" - is the attribute meant to be ignored (acting as if the
>> element didn't have the attribute at all) if it contains only invalid
>> CURIEs (or if it contains no values)? i.e.
>> should the output from:
>>
>> <p xmlns:ex="http://example.org/" rel="bogus:bogus"
>> property="ex:test" href="http://example.org/href">Test</p>
>>
>> include the triple '<http://example.org/href>
>> <http://example.org/test> "Test".' or '<> <http://example.org/test>
>> "Test".'? Implementations again differ.
> As above - all illegal attribute interpretations should be consistent
> throughout. @rel or @rev with no legal values MUST be treated as if the
> attribute were not present at all.
>>
>> The test suite should be extended to cover these cases, in order to
>> detect these differences between implementations (because at least one
>> must be buggy), if it doesn't already (I haven't checked). But I think
>> the RDFa Syntax spec should also be updated to be clear about the
>> expected behaviour, because I've tried to read it carefully and I'm
>> still not confident enough to know what the output should be.
> Understood. We will discuss this at a Task Force meeting and see if
> there is a way to introduce a blanket statement via the errata.
> However, again, I believe there is no conflict in the spec as written
> currently. There is ALWAYS room for misinterpretation in every spec.
> We can tighten the language and attempt to make the language more
> consistent.
>>
>>
>>> I could come up with a nearly infinite collection of illegal
>>> declarations for each of the attributes that are addressed in the
>>> RDFa Syntax specification. However, they would all fall into the
>>> same class - illegal. When you are doing testing, you don't do
>>> "exhaustive" or even "thorough" testing of anything that is
>>> sufficiently complex. It is impossible. Instead, you do
>>> "equivalence class testing". Identify a couple of use cases from
>>> each class of processing for a given interface, test those, and trust
>>> that the other values in the class will behave the same way.
>>> For
>>> example, I would not test every single possible prefix name when
>>> exercising a CURIE processing library. It is not just impossible, it
>>> is also uninteresting. I would test some good ones and make sure
>>> they work. I would test some bad ones and make sure they are
>>> ignored. Then I would move on.
>>
>> I would want to write tests that find bugs. There are lots of
>> different classes of bugs when handling illegal input - you might
>> forget to check the prefix is non-zero length, or forget to check it's
>> an NCName, or forget to check the value is non-empty, or forget to
>> check the value is not the xml or xmlns URI, or you might use the 4th
>> Edition of XML instead of the 5th, etc. There are dozens of mistakes
>> that people can (and apparently do) make when implementing this. Those
>> mistakes are not all equivalent, so they should each be tested as
>> separate equivalence classes, and it needs a lot more than a few tests
>> of illegal input.
>>
>> (I agree that each class doesn't need to be tested exhaustively - e.g.
>> a few non-NCName prefixes are enough to detect bugs if implementations
>> aren't correctly checking for NCNames, and there's no need to test
>> thousands of non-NCNames because that's very unlikely to find any more
>> bugs. But I don't think anyone's ever proposed testing thousands of
>> non-NCNames, so I presume that's not really what you're concerned about.)
>>
>
> No, I'm not. Poor testing is my personal soap box. Sorry if I came off
> as attacking your testing methodology. In general, I believe it is
> important to always identify each equivalence class. There are several
> in the case of XML Namespace prefix syntax, and it is a good idea to
> exercise each of them. There are several in the case of CURIE
> interpretation in attribute values, and those should be exercised as well.
>
> What I *personally* avoid is adding tests to make sure something no
> longer works wrong.
> Conformance testing is about ensuring all
> implementations work *right* in the presence of correct and incorrect
> usage. Failure or regression testing is about adding tests that exercise
> a reported failure. Once that reported failure is fixed, that test will
> never fail again. Therefore, such tests check to make sure an
> implementation no longer works wrong. It doesn't make it a bad test,
> but such tests are almost always exercising members of a class of input
> that SHOULD have been exercised by conformance testing in the first
> place. Rather than add a hodge-podge of tests that touch on specific
> failure cases, I strive to define/update the related general equivalence
> class. That way you are categorizing the test correctly and exercising
> the general feature, as opposed to the specific failure.
>
> But as I said, that's my personal soap box. I have been standing on it,
> beating my breast and shouting, for 25 years. For some reason, there
> are people who remain unconvinced. :-P
>
> Shane P. McCarron                          Phone: +1 763 786-8160 x120
> Managing Director                            Fax: +1 763 786-8180
> ApTest Minnesota                            Inet: shane@aptest.com
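Philip's list of distinct mistakes maps naturally onto separate checks. The following is a minimal sketch in Python, not taken from the RDFa Syntax Recommendation or from any existing processor; the names `NCNAME`, `is_legal_prefix_declaration` and `in_scope_mappings` are hypothetical. It applies the PrefixedAttName syntax and the Namespace Constraints of Namespaces in XML 1.0 to each `xmlns:` attribute and silently ignores illegal declarations, as Shane describes. It assumes, as Philip's remark about the 4th and 5th Editions suggests, the XML 1.0 Fifth Edition Name production, and that attribute names arrive exactly as the parser reported them. The table at the end tests one or two representatives per equivalence class of illegal input rather than testing exhaustively.

```python
import re

XML_NS = "http://www.w3.org/XML/1998/namespace"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"

# NCName: the XML 1.0 Fifth Edition Name production minus ':' (an assumption;
# the Fourth Edition used a different character table).
_START = (r"A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
          r"\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
          r"\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF")
_CHAR = _START + r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"
NCNAME = re.compile(f"[{_START}][{_CHAR}]*")


def is_legal_prefix_declaration(attr_name: str, value: str) -> bool:
    """Check one xmlns:prefix="value" attribute against the PrefixedAttName
    syntax and the Namespace Constraints of Namespaces in XML 1.0."""
    if not attr_name.startswith("xmlns:"):
        return False  # not a prefixed namespace declaration at all
    prefix = attr_name[len("xmlns:"):]
    if NCNAME.fullmatch(prefix) is None:
        return False  # xmlns:="...", xmlns:0="...", xmlns:a:b="..."
    if value == "":
        return False  # No Prefix Undeclaring (1.0 only; XML Names 1.1 allows it)
    if prefix == "xmlns":
        return False  # the xmlns prefix MUST NOT be declared
    if prefix == "xml":
        return value == XML_NS  # xml may only be bound to its own namespace
    return value not in (XML_NS, XMLNS_NS)  # reserved namespace names


def in_scope_mappings(inherited: dict, attrs: dict) -> dict:
    """Copy the parent's mappings and add this element's legal declarations;
    illegal declarations are ignored, so the inherited binding survives."""
    mappings = dict(inherited)
    for name, value in attrs.items():
        if is_legal_prefix_declaration(name, value):
            mappings[name[len("xmlns:"):]] = value
    return mappings


# One or two representatives per equivalence class of illegal input.
ILLEGAL = {
    "empty prefix": [("xmlns:", "http://example.org/")],
    "prefix not an NCName": [("xmlns:0", "http://example.org/"),
                             ("xmlns:a:b", "http://example.org/")],
    "empty value": [("xmlns:ex", "")],
    "xml bound elsewhere": [("xmlns:xml", "http://example.org/")],
    "xmlns declared": [("xmlns:xmlns", XMLNS_NS)],
    "reserved name bound": [("xmlns:ex", XMLNS_NS), ("xmlns:ex", XML_NS)],
}
for label, cases in ILLEGAL.items():
    for name, value in cases:
        assert not is_legal_prefix_declaration(name, value), label
assert is_legal_prefix_declaration("xmlns:ex", "http://example.org/")
assert is_legal_prefix_declaration("xmlns:xml", XML_NS)
```

Keeping each class as its own row means an implementation that forgets only one check (say, the empty-value rule) fails exactly one row, which is the separation between classes of mistakes that Philip argues for.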
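Shane's blanket rule, that an attribute with no legal values must be treated as if it were not present, can be sketched in the same way. This is a simplified, hypothetical model in Python of how one element's new subject, object and skip-element flag might be chosen under that rule, loosely following the RDFa 1.0 Sequence steps Philip quotes; it is not a complete processor. It assumes `mappings` already holds only legally declared prefixes, treats every non-bracketed @about or @resource value as an absolute URI, uses a deliberately abbreviated `RESERVED_KEYWORDS` set, and omits @typeof, @content, @datatype, blank nodes, base resolution and incomplete-triple propagation. All function names are illustrative. The assertions at the end encode the answers Shane gives to Philip's three examples.

```python
XHV = "http://www.w3.org/1999/xhtml/vocab#"
# Illustrative, abbreviated subset of the reserved @rel/@rev keywords.
RESERVED_KEYWORDS = {"license", "next", "prev"}


def resolve_curie(token, mappings):
    """Return the URI for one CURIE, or None if it cannot be resolved."""
    prefix, colon, reference = token.partition(":")
    if not colon:
        return None
    if prefix == "":
        return XHV + reference  # ":next" uses the XHTML Vocabulary URI
    if prefix not in mappings:
        return None  # undeclared (or illegally declared) prefix
    return mappings[prefix] + reference


def resolve_list(value, mappings, keywords=False):
    """Resolve a whitespace-separated attribute value, dropping illegal tokens."""
    uris = []
    for token in (value or "").split():
        if keywords and token in RESERVED_KEYWORDS:
            uris.append(XHV + token)
            continue
        uri = resolve_curie(token, mappings)
        if uri is not None:
            uris.append(uri)
    return uris


def uri_or_safe_curie(value, mappings):
    """'[prefix:ref]' is a safe CURIE; anything else is taken as a URI."""
    if value is None:
        return None
    if value.startswith("[") and value.endswith("]"):
        return resolve_curie(value[1:-1], mappings)  # None if unresolvable
    return value  # relative-URI resolution against the base is omitted


def first_uri(attrs, names, mappings):
    """First usable URI among the named attributes, in priority order."""
    for name in names:
        value = attrs.get(name)
        if name in ("about", "resource"):
            value = uri_or_safe_curie(value, mappings)
        if value is not None:
            return value
    return None


def evaluate(attrs, mappings):
    """Evaluate one element, treating every attribute that has no legal
    values exactly as if it were absent. A None subject means 'inherit'."""
    rel = resolve_list(attrs.get("rel"), mappings, keywords=True)
    rev = resolve_list(attrs.get("rev"), mappings, keywords=True)
    prop = resolve_list(attrs.get("property"), mappings)
    if rel or rev:
        subject = first_uri(attrs, ["about", "src"], mappings)
        obj = first_uri(attrs, ["resource", "href"], mappings)
        skip = False
    else:
        subject = first_uri(attrs, ["about", "src", "resource", "href"], mappings)
        obj = None
        skip = subject is None and not prop  # nothing usable on this element
    return {"subject": subject, "object": obj, "rel": rel, "rev": rev,
            "property": prop, "skip": skip}


# Philip's examples under the "absent if no legal values" reading.
span = evaluate({"about": "[bogus:bogus]", "src": "http://example.org/"}, {})
assert span["subject"] == "http://example.org/"  # @about ignored, @src used

p = evaluate({"rel": "bogus:bogus", "property": "ex:test",
              "href": "http://example.org/href"}, {"ex": "http://example.org/"})
assert p["subject"] == "http://example.org/href" and p["object"] is None

middle = evaluate({"property": "bogus:bogus"}, {})
assert middle["skip"]  # an unresolvable @property does not stop the skip
```

The design point is that every branch tests the filtered lists (`rel or rev`, `prop`) rather than whether the attribute merely exists in `attrs`, which is exactly the distinction on which, by Philip's account, implementations differ.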
Received on Tuesday, 8 September 2009 16:36:03 UTC