Re: FPWD Review Request: HTML+RDFa from Michael Hausenblas on 2009-09-08 (public-html@w3.org from September 2009)

From: Michael Hausenblas <michael.hausenblas@deri.org>
Date: Tue, 08 Sep 2009 17:35:18 +0100
To: Shane McCarron <shane@aptest.com>, Philip Taylor <pjt47@cam.ac.uk>
CC: Mark Birbeck <mark.birbeck@webbackplane.com>, Manu Sporny <msporny@digitalbazaar.com>, HTML WG <public-html@w3.org>, RDFa TF list <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <C6CC4656.82A5%michael.hausenblas@deri.org>
>> The "CURIE and URI Processing" section says "any value that is not a
>> 'curie' according to the definition in the section CURIE Syntax
>> Definition MUST be ignored". The "Sequence" section refers to e.g.
>> "the URI from @about, if present, obtained according to the section on
>> CURIE and URI Processing", and I think it's clear it should be
>> considered not-present if it's not a valid CURIE. So <span
>> about="[bogus:bogus]" src="http://example.org/"> should ignore @about
>> and use @src, and that's all okay. (Some implementations still get
>> this wrong, though.)
> I think your interpretation is the correct one, and I think there is a
> test case to this effect already.  If it were not ignored, then @about
> would be interpreted as @about="" and that would refer to the current
> document and supersede @src.  Michael or Manu, can you confirm there is
> already a test case for this?

AFAIK not precisely this case. What comes closest are TC35-37 and TC42.
FWIW, I guess we can add one TC concerning this, if desired.
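
For illustration only, here is a rough sketch of what such a TC would
check, assuming a much simplified version of the subject-resolution step.
The names resolve_safe_curie and new_subject, and the ASCII-only prefix
pattern, are hypothetical and not taken from the spec or from any existing
processor. The default prefix, blank nodes, @resource/@href and relative
URI resolution are deliberately left out.

```python
import re

# Simplified, ASCII-only stand-in for the NCName production (assumption:
# the real production in Namespaces in XML permits many more characters).
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def resolve_safe_curie(value, prefixes):
    """Return a URI for an @about-style value, or None if it must be ignored.

    Only the "[prefix:reference]" form is handled here; anything not in
    square brackets is passed through as a plain URI.
    """
    if value.startswith("[") and value.endswith("]"):
        prefix, sep, reference = value[1:-1].partition(":")
        if not sep or not NCNAME.fullmatch(prefix) or prefix not in prefixes:
            return None  # not a usable CURIE: act as if the attribute were absent
        return prefixes[prefix] + reference
    return value


def new_subject(attrs, prefixes):
    """Pick the subject from @about, falling back to @src when @about is ignored."""
    if "about" in attrs:
        uri = resolve_safe_curie(attrs["about"], prefixes)
        if uri is not None:
            return uri
    return attrs.get("src")


# The case under discussion: the undeclared prefix makes @about unusable,
# so the subject comes from @src rather than from the current document.
subject = new_subject(
    {"about": "[bogus:bogus]", "src": "http://example.org/"}, prefixes={}
)
```

With the input from Philip's example, subject ends up as the @src value
rather than the current document, which is the behaviour a new TC would
need to assert.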

Cheers,
      Michael

-- 
Dr. Michael Hausenblas
LiDRC - Linked Data Research Centre
DERI - Digital Enterprise Research Institute
NUIG - National University of Ireland, Galway
Ireland, Europe
Tel. +353 91 495730
http://linkeddata.deri.ie/
http://sw-app.org/about.html



> From: Shane McCarron <shane@aptest.com>
> Date: Tue, 08 Sep 2009 11:19:46 -0500
> To: Philip Taylor <pjt47@cam.ac.uk>
> Cc: Mark Birbeck <mark.birbeck@webbackplane.com>, Manu Sporny
> <msporny@digitalbazaar.com>, HTML WG <public-html@w3.org>, RDFa TF list
> <public-rdf-in-xhtml-tf@w3.org>
> Subject: Re: FPWD Review Request: HTML+RDFa
> Resent-From: RDFa TF list <public-rdf-in-xhtml-tf@w3.org>
> Resent-Date: Tue, 08 Sep 2009 16:20:56 +0000
> 
> Philip,
> 
> Thanks for taking the time to respond so thoroughly.  In general I agree
> with Mark that the RDFa Syntax Recommendation could have done a better
> job of tightening its relationship to the Namespaces in XML
> Recommendation.  I also agree that if there are pathological cases that
> are *important* they should be covered by a test suite.  What is
> "pathological" and what is "important" are obviously subjective, and
> test suite authors spend lots of time debating such things.
> 
> I have some detailed comments in line.
> 
> Philip Taylor wrote:
>> Shane McCarron wrote:
>>> I would not object to providing examples of extraction algorithms as
>>> guidance.  We already do this for CURIEs somewhere...  But I do not
>>> think it is a good idea to normatively define code.
>> 
>> I agree the spec shouldn't normatively define code. When I said it
>> "needs to define the prefix mapping extraction algorithm in precise
>> detail" I was thinking of something much more abstract than real code,
>> though it should still be clear and unambiguous on all the relevant
>> details.
>> 
>> Currently I don't see anything in the specs other than vague
>> references to the Namespaces in XML spec ("Since CURIE mappings are
>> created by authors via the XML namespace syntax [XMLNS] an RDFa
>> processor MUST take into account the hierarchical nature of prefix
>> declarations" in rdfa-syntax, "CURIE prefix mappings specified using
>> xmlns: must be processed using the rules specified in the [Namespaces
>> in XML] Recommendation" in HTML5+RDFa), and I want it to be clearer
>> about exactly which rules are applied and how they are adapted for
>> non-XML content, because otherwise I can produce lots of test cases
>> where I can't work out what the spec says the output must be. (I don't
>> care how an implementation computes the output, I just want to know
>> what the output is.)
> Well... Hmm... My opinion differs from yours on this.  That reference,
> while in prose, is not vague at all.  It is a normative reference to a
> related W3C Recommendation that defines precisely the syntactic
> requirements for what is and is not a legal xmlns: attribute declaration
> (see, for example, section 3 - Declaring Namespaces).  As an implementor
> of RDFa Syntax, it is my responsibility to ensure I am either 1) using a
> library to parse my input that already knows about the requirements of
> the Namespaces in XML Recommendation, or 2) implementing those requirements
> myself.  Either way, the requirements are clear (and yes, my
> implementation is somewhat broken).
> 
> Further, since the RDFa Syntax Recommendation is only concerned about
> the "syntax" of those prefix declarations, and has no semantic
> requirements beyond that for the use of XML Namespaces, it should be
> clear that parts of the Namespaces in XML Recommendation that deal with
> how XML Namespaces affect the declaration of elements and attributes are
> irrelevant for an RDFa Syntax-conforming processor.
> 
> (Note - I would be very comfortable adding such language in the RDFa
> Syntax Errata document immediately.  I will bring it up at the next Task
> Force call.)
> 
>> 
>>> The processing model in the current RDFa Syntax Recommendation is
>>> sufficiently precise for anyone to understand what must be done in
>>> the face of both conforming and non-conforming input.  The edge
>>> conditions people keep bringing up (what happens if xmlns:="" is
>>> defined, etc) are all degenerate cases of the general case of prefix
>>> declaration that does not match the syntax definition.  If it doesn't
>>> match the syntax definition, it is illegal.
>> 
>> Which syntax definition? In http://www.w3.org/TR/rdfa-syntax/ I can
>> only find a definition of the CURIE syntax, which is not relevant to
>> the issue of handling xmlns:="...".
> True.  That's what the XML Namespaces Recommendation is for.  And it
> tightly defines the syntax.  Anything that does not conform to that
> syntax is not a legal CURIE prefix declaration, and therefore would be
> ignored.
>> 
>> (In most cases the CURIE syntax restriction is sufficient - you can't
>> have rel="0:test" (it will just be ignored) so it doesn't really
>> matter how xmlns:0="..." was processed. But you can write rel=":test",
>> so it matters how xmlns:="..." interacts with that. And you can write
>> rel="ex:test" and xmlns:ex="" (empty value, illegal in Namespaces in
>> XML 1.0), so it matters how that is handled too.)
> The XML Namespaces Recommendation clearly says what is illegal,
> including xmlns:="...".  The RDFa Syntax Recommendation clearly states
> that there is no way to define a local default CURIE prefix mapping, and
> that rel=":next" is interpreted in the context of the XHTML Vocabulary
> URI.  So no, I don't think there is any room for misinterpretation or
> difference among implementations here.
> 
>> 
>> 
>> Presumably http://www.w3.org/TR/REC-xml-names/#NT-PrefixedAttName is
>> the relevant syntax definition for namespace prefix declarations, but
>> rdfa-syntax doesn't explicitly refer to that. It's implicit when using
>> RDFa in XHTML, because XHTML is based on top of xml-names and you'll
>> get a well-formedness error if you try writing these invalid things,
>> but that doesn't automatically apply when using HTML instead.
> RDFa Syntax DOES explicitly, normatively incorporate the XML Namespaces
> Recommendation.  It also explicitly, normatively says that the XML
> Namespace *syntax* is what is used to declare CURIE prefix mappings.  I
> wouldn't mind explaining in an erratum that the syntax is defined at
> http://www.w3.org/TR/REC-xml-names/#NT-PrefixedAttName - would that help
> address your concerns?
>> 
>> Should the non-syntactic xml-names constraints be required too? e.g.
>> what triples should I get if I write the following HTML:
>> 
>>   <p xmlns:xml="http://example.org/" property="xml:test">Test</p>
>> 
>>   <p xmlns:xmlns="http://www.w3.org/2000/xmlns/"
>> property="xmlns:test">Test</p>
>> 
>>   <p xmlns:ex="http://www.w3.org/2000/xmlns/" property="ex:test">Test</p>
>> 
>> (which all violate the Namespace Constraints in xml-names)? I presume
>> these should all be ignored too, but implementers have not been doing
>> that, so evidently it is not sufficiently obvious.
> Such prefix declarations are illegal, and therefore MUST be ignored by a
> conforming RDFa Processor.  Do all processors do so today?  I doubt it.
> Could they?  Of course.  Should they?  Of course.  Would it break
> anything in the wild if they started doing so tomorrow?  No way.  These
> are good but pathological cases.  I would be happy to add test cases for
> them.  But in the end, whether we test for these cases or not in no way
> changes the definition of RDFa as it was published.  We have these
> constraints by normative reference already.
>> 
>> 
>>> If it is illegal, it is ignored.  What more does one need in a
>>> normative spec?
>> 
>> For RDFa-in-HTML, I'd like it to explicitly state what "illegal"
>> means, e.g. whether those Namespace Constraints should be applied in
>> non-XML-based versions of HTML. It doesn't need to redefine things
>> that are defined elsewhere, but it should explicitly refer to concepts
>> like PrefixedAttName and Namespace Constraints that are being used by
>> the RDFa-in-HTML processing model, because I don't think they are
>> obvious otherwise.
> I agree that all the prefix syntax declaration constraints should apply
> to both the XHTML and HTML versions of RDFa.  I think they do already
> because of the normative inclusion of the XML Namespaces Recommendation,
> but if you think it would help clarify things I am happy to 1) add an
> erratum as I mentioned above, and 2) support adding some explicit text to
> the RDFa-in-HTML working draft.
>> 
>> 
>> For both RDFa-in-HTML and RDFa-in-XHTML, I'd also like it to slightly
>> more clearly state what "ignored" means:
>> 
>> The "CURIE and URI Processing" section says "any value that is not a
>> 'curie' according to the definition in the section CURIE Syntax
>> Definition MUST be ignored". The "Sequence" section refers to e.g.
>> "the URI from @about, if present, obtained according to the section on
>> CURIE and URI Processing", and I think it's clear it should be
>> considered not-present if it's not a valid CURIE. So <span
>> about="[bogus:bogus]" src="http://example.org/"> should ignore @about
>> and use @src, and that's all okay. (Some implementations still get
>> this wrong, though.)
> I think your interpretation is the correct one, and I think there is a
> test case to this effect already.  If it were not ignored, then @about
> would be interpreted as @about="" and that would refer to the current
> document and supersede @src.  Michael or Manu, can you confirm there is
> already a test case for this?
>> 
>> But it also says "if @property is not present then the [skip element]
>> flag is set to 'true'" - is an invalid CURIE meant to be considered
>> not-present here too (even though there's no reference to the CURIE
>> and URI Processing section)? i.e. should the output from:
>> 
>>     <p about="http://example.com/" rel="next">
>>       <span property="bogus:bogus">
>>         <span about="http://example.net/">Test</span>
>>       </span>
>>     </p>
>> 
>> include the triple '<http://example.com/>
>> <http://www.w3.org/1999/xhtml/vocab#next> <http://example.net/>' or
>> not? Implementations differ.
> The rules are to be applied consistently.  If there are no legal values
> in an attribute declaration, an implementation MUST act as if that
> attribute declaration were not present at all.  Again, I believe there
> are test cases that do this now, and it surprises me that you say
> implementations differ on this.  In the case of @property, I would
> support adding errata to clarify that this behaves as @about behaves if
> that would satisfy your concern.
>> 
>> It also says "If the [current element] contains no @rel or @rev
>> attribute" - is the attribute meant to be ignored (acting as if the
>> element didn't have the attribute at all) if it contains only invalid
>> CURIEs (or if it contains no values)? i.e. should the output from:
>> 
>>   <p xmlns:ex="http://example.org/" rel="bogus:bogus"
>> property="ex:test" href="http://example.org/href">Test</p>
>> 
>> include the triple '<http://example.org/href>
>> <http://example.org/test> "Test".' or '<> <http://example.org/test>
>> "Test".'? Implementations again differ.
> As above - all illegal attribute interpretations should be consistent
> throughout.  @rel or @rev with no legal values MUST be treated as if the
> attribute were not present at all.
>> 
>> The test suite should be extended to cover these cases, in order to
>> detect these differences between implementations (because at least one
>> must be buggy), if it doesn't already (I haven't checked). But I think
>> the RDFa Syntax spec should also be updated to be clear about the
>> expected behaviour, because I've tried to read it carefully and I'm
>> still not confident enough to know what the output should be.
> Understood.  We will discuss this at a Task Force meeting and see if
> there is a way to introduce a blanket statement via the errata.
> However, again, I believe there is no conflict in the spec as written
> currently.  There is ALWAYS room for misinterpretation in every spec.
> We can tighten the language and attempt to make the language more
> consistent. 
>> 
>> 
>>> I could come up with a nearly infinite collection of illegal
>>> declarations for each of the attributes that are addressed in the
>>> RDFa Syntax specification.  However, they would all fall into the
>>> same class - illegal.  When you are doing testing, you don't do
>>> "exhaustive" or even "thorough" testing of anything that is
>>> sufficiently complex.  It is impossible.  Instead, you do
>>> "equivalence class testing".  Identify a couple of use cases from
>>> each class of processing for a given interface, test those, and trust
>>> that the other values in the class will behave the same way.  For
>>> example, I would not test every single possible prefix name when
>>> exercising a CURIE processing library.  It is not just impossible, it
>>> is also uninteresting.  I would test some good ones and make sure
>>> they work.  I would test some bad ones and make sure they are
>>> ignored.  Then I would move on.
>> 
>> I would want to write tests that find bugs. There are lots of
>> different classes of bugs when handling illegal input - you might
>> forget to check the prefix is non-zero length, or forget to check it's
>> an NCName, or forget to check the value is non-empty, or forget to
>> check the value is not the xml or xmlns URI, or you might use the 4th
>> Edition of XML instead of the 5th, etc. There are dozens of mistakes
>> that people can (and apparently do) make when implementing this. Those
>> mistakes are not all equivalent, so they should each be tested as
>> separate equivalence classes, and it needs a lot more than a few tests
>> of illegal input.
>> 
>> (I agree that each class doesn't need to be tested exhaustively - e.g.
>> a few non-NCName prefixes are enough to detect bugs if implementations
>> aren't correctly checking for NCNames, and there's no need to test
>> thousands of non-NCNames because that's very unlikely to find any more
>> bugs. But I don't think anyone's ever proposed testing thousands of
>> non-NCNames, so I presume that's not really what you're concerned about.)
>> 
> 
> No, I'm not.  Poor testing is my personal soap box.  Sorry if I came off
> as attacking your testing methodology.  In general, I believe it is
> important to always identify each equivalence class.  There are several
> in the case of the XML Namespace prefix syntax, and it is a good idea to
> exercise each of them.  There are several in the case of CURIE
> interpretation in attribute values, and those should be exercised as well.
> 
> What I *personally* avoid is adding tests to make sure something no
> longer works wrong. Conformance testing is about ensuring all
> implementations work *right* in the presence of correct and incorrect
> usage. Failure or regression testing is about adding tests that exercise
> a reported failure. Once that reported failure is fixed, that test will
> never fail again.  Therefore, such tests check to make sure an
> implementation no longer works wrong.  It doesn't make it a bad test,
> but such tests are almost always exercising members of a class of input
> that SHOULD have been exercised by conformance testing in the first
> place.  Rather than add a hodge-podge of tests that touch on specific
> failure cases, I strive to define/update the related general equivalence
> class.  That way you are categorizing the test correctly and exercising
> the general feature, as opposed to the specific failure.
> 
> But as I said, that's my personal soap box.  I have been standing on it,
> beating my breast and shouting, for 25 years.  For some reason, there
> are people who remain unconvinced.  :-P
> 
> Shane P. McCarron                          Phone: +1 763 786-8160 x120
> Managing Director                            Fax: +1 763 786-8180
> ApTest Minnesota                            Inet: shane@aptest.com
> 
> 
>
Received on Tuesday, 8 September 2009 16:36:04 UTC