Re: Request to publish HTML+RDFa (draft 3) as FPWD from Shane McCarron on 2009-09-21 (public-html@w3.org from September 2009)

From: Shane McCarron <shane@aptest.com>
Date: Mon, 21 Sep 2009 17:01:56 -0500
To: Maciej Stachowiak <mjs@apple.com>
CC: Jonas Sicking <jonas@sicking.cc>, Manu Sporny <msporny@digitalbazaar.com>, HTMLWG WG <public-html@w3.org>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <4AB7F7D4.5020401@aptest.com>

Maciej,

My comments inline:

Maciej Stachowiak wrote:
>
>
> Here is the only implementation conformance requirement regarding 
> prefix mapping that I could find in section 5.4:
>
> "Since CURIE mappings are created by authors via the XML namespace 
> syntax [XMLNS] an RDFa processor MUST take into account the 
> hierarchical nature of prefix declarations."
>
> I do not think this is adequate to define the processing model, 
> particularly not in a non-XML context. Indeed, this requirement 
> appears to be redundant with what Section 5.5 says (or is trying to 
> say, anyway), so I'm not sure why it is there at all.

It doesn't define a processing model.  It defines the syntax.  
Regardless, I appreciate that you feel it is extraneous AND 
insufficient.  It is there to help tie together some concepts.  See below.

>
> Section 5.5 does define the processing model in some detail (modulo 
> bugs). But even assuming the bugs are fixed, it does not define 
> anything in the context of HTML as opposed to XML.

Of course it doesn't.  Why would it?  We had no scope to define ANYTHING 
about HTML processing when that document was written.  Let's introduce 
some terms, so that we avoid more confusion.  Let's call the existing, 
approved Recommendation "RDFa Syntax".  Let's call the new candidate 
FPWD "RDFa in HTML".

RDFa Syntax is an XHTML specification.   I fear that we are conflating 
the concepts of document processing with RDFa processing.  The 
conceptual model of RDFa is one in which triples are extracted from a 
document that matches the *syntax* defined in the RDFa Syntax 
Recommendation. That extraction is achieved using the processing rules 
also defined in that Recommendation.  What happens between retrieval of 
the resource (source document) conforming to the syntax rules and the 
parsing of that document by a Conforming RDFa Processor is outside the 
scope of the RDFa Syntax document. It is the purview of the underlying 
(host language) specification.

Manu's draft augments the *syntax* rules that are defined in RDFa Syntax 
so that they are supported in HTML5, but remain *identical* in XHTML and 
HTML.  What you seem to be requesting is that we continue to extend the 
text to tighten down the definition.  I have no objection to that in the 
abstract.  However, as we do that we MUST ensure that the changes do not 
redefine behavior already defined elsewhere.  Further, we MUST ensure 
that changes are made in RDFa Syntax when that is appropriate, and in 
RDFa in HTML when it is not something that affects the base, common 
syntax or the base, common triple extraction rules.


>
> It would not be at all difficult to define this unambiguously in 
> RDFa+HTML. Here is a sample attempt by me:
>
> 'When applying the processing rules of XHTML+RDFa section 5.5 to an 
> HTML document, modify step 2 as follows. In addition to XML namespace 
> declarations, attributes in no namespace that start with the string 
> "xmlns:" create a namespace mapping if the attribute name matches the 
> PrefixedName production from [Namespaces in XML]. For each such 
> attribute, add a mapping to the [local list of URI mappings]; the 
> value to be mapped is the attribute name with the first six characters 
> (the initial "xmlns:") removed, and the value to map to is the 
> attribute value.'
>
> (Note: this allows proper namespace declarations added with 
> namespace-aware DOM APIs to still work in HTML documents. If this is 
> not desired, then simply replace "In addition to" with "Instead of".)

Okay, I understand what you are looking for.  I think that your 
suggested text is correct when talking about the DOM and Infoset 
processing.  But the processing rules in section 5.5 are not written 
from a DOM or Infoset perspective - at least not exclusively nor 
intentionally.  We really, really, really were talking about the syntax 
and then the extraction of data from structures that conform to that 
syntax.  Obviously it is possible to construct DOM trees that contain 
the relevant attributes even if there were no source document at all - 
however, a conforming RDFa processor wouldn't know the difference so... 
it would behave as if there were a source document that conformed to the 
syntax.

Regardless, if a change of this nature satisfies your objection (and 
objections like yours) I would not object to its inclusion.  I would 
probably suggest that text with similar precision be added to RDFa 
Syntax - if only as explanatory text supporting the interpretation of 
the rules in the context of the DOM.
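
To make the suggested rule concrete, here is a minimal Python sketch of 
it.  It assumes the element's attributes arrive as a plain dict of 
name/value pairs; the function name prefix_mappings and the ASCII-only 
NCNAME pattern are illustrative simplifications, not text from either 
specification (the real NCName production admits many more characters).

```python
import re

# Simplified, ASCII-only stand-in for the NCName production in
# Namespaces in XML. This is an assumption made to keep the sketch short.
NCNAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_.\-]*$")


def prefix_mappings(attributes):
    """Return {prefix: uri} for attributes whose name starts with 'xmlns:'."""
    mappings = {}
    for name, value in attributes.items():
        if not name.startswith("xmlns:"):
            continue  # this also skips the default declaration, plain "xmlns"
        prefix = name[6:]  # drop the initial "xmlns:" (six characters)
        if NCNAME.match(prefix):
            mappings[prefix] = value
    return mappings


attrs = {
    "xmlns": "http://www.w3.org/1999/xhtml",
    "xmlns:dc": "http://purl.org/dc/elements/1.1/",
    "class": "note",
}
print(prefix_mappings(attrs))
# prints {'dc': 'http://purl.org/dc/elements/1.1/'}
```

Nothing here depends on whether the attributes came from a DOM, an 
Infoset, or a raw tokenizer, which is the point of contention above.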

>
>> As to your concern about section 5.5, thanks for bringing that to our 
>> attention.  I proposed errata text to clarify that wording [1] and I 
>> expect it to be approved at the next Task Force meeting.
>>
>> [1] 
>> http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009Sep/0092.html 
>>
>
> I find that errata text a bit confusing. Here's a few issues I spotted:
>
> 1) Instead of "Mappings are provided via the PrefixedAttName 
> production as defined in [XMLNS]," it should probably say "Mappings 
> are provided by XML namespace declarations, excluding default 
> namespace declarations, as defined in [XMLNS]". I say this because 
> RDFa processing rules operate on an abstract tree model, and not at 
> the raw textual level. If you want to use a grammar rule, you have to 
> define what it applies to (the qualified name of the attribute I guess?)

Hmm... As I have indicated above, I believe that the RDFa Syntax 
specification *is* a grammar specification.  It defines extensions to 
XHTML via a module, defines a markup language based upon that module, 
and provides a DTD for that language (we have an XML Schema 
implementation of it too, for some future update). 

The grammar of the prefix declarations for RDFa is defined via the 
PrefixedAttName production.  Your suggested text achieves something very 
different - to me anyway.  The implication is that the prefixes are 
provided by XML namespace declarations... which is not *wrong* for some 
environments... but it's surely not exclusively what we meant NOR 
exclusively how it is used in the wild.

I hope that we were very careful in the Recommendation to indicate that 
it is the *syntax* of the XML Namespace declarations that is used to 
define RDFa prefix mappings.  The fact that those mappings ALSO declare 
an XML Namespace in some contexts is great... but from a syntax 
perspective we don't care.  We don't use XML Namespaces.  I have written 
a few different RDFa and generic CURIE processors now, and none of them 
used XML Namespaces.  Namespaces are just not necessary in order to do 
the extraction of triples via the processing rules in section 5.5.  I am 
sure there are tool chains where it is necessary (because some element 
of the chain has hidden the original syntax from the RDFa processor), 
but that is surely an exercise for that implementor in that style of 
toolchain, isn't it?  The specification is not aware of every possible 
way in which the *syntax* of a conforming source document is fed to a 
conforming RDFa processor.
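
As a rough illustration of why no namespace machinery is required, here 
is a sketch of CURIE expansion in Python.  It assumes the prefix 
mappings have already been collected into a dict; expand_curie is a 
hypothetical helper, and it deliberately ignores the default prefix, 
"_:" blank nodes, and safe CURIEs.

```python
def expand_curie(curie, mappings):
    """Expand 'prefix:reference' by concatenating the mapped URI and the reference."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in mappings:
        return None  # no usable mapping; the caller decides what to do
    return mappings[prefix] + reference


mappings = {"foaf": "http://xmlns.com/foaf/0.1/"}
print(expand_curie("foaf:name", mappings))
# prints http://xmlns.com/foaf/0.1/name
```

The only inputs are a string and a table of prefixes; where that table 
came from is invisible to the expansion step.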


>
> 2) The erratum text says: "The real meaning if this is only clear in 
> the context of Section 5.4.1. Scoping of Prefix Mappings, which 
> normatively includes the syntax processing rules of the Namespaces in 
> XML Recommendation," but section 5.4.1 does not appear to do that. The 
> only mention of XMLNS is in a factual dependent clause: "Since CURIE 
> mappings are created by authors via the XML namespace syntax [XMLNS]", 
> that precedes a conformance requirement to "take into account the 
> hierarchical nature of prefix declarations". As far as I can tell, 
> there is no conformance requirement to follow the syntax processing 
> rules of Namespaces in XML in general. If such a requirement was 
> intended, it should be stated clearly. Though personally, I think it's 
> better to precisely define the exact processing rules in section 5.5, 
> since Namespaces in XML is defined purely at a textual level, but RDFa 
> processing is defined on an abstract tree model, so it's not 
> necessarily obvious how to apply the rules.

I see what you are saying, and I do not mind making a further specific 
normative reference to [XMLNS] and its attendant syntax in section 5.5 
step 2.  I will try to update my proposed errata text to reflect your 
concerns.  I further agree that section 5.5 is written in a way that 
makes it possible to interpret the rules in the context of an abstract 
tree.  However, that section does not REQUIRE an abstract tree in order 
for it to be implemented.  You could, for example, implement the whole 
mess using a tokenizing parser that had callouts each time a token was 
encountered.  I did an implementation that way in Perl just for fun one 
weekend. 
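
That Perl code is not reproduced here.  The following is a fresh Python 
sketch of the same idea, assuming well-formed XHTML input and using the 
standard library's expat bindings with namespace processing left off, 
so that "xmlns:" attributes are reported as ordinary attributes.  The 
function name collect_scoped_mappings and the sample document are 
illustrative only.

```python
import xml.parsers.expat


def collect_scoped_mappings(document):
    """Return a list of (element_name, in_scope_mappings), one per start tag."""
    scopes = [{}]  # stack of prefix mappings; the top is the current scope
    seen = []

    def start(name, attrs):
        local = dict(scopes[-1])  # inherit the parent's mappings
        for attr, value in attrs.items():
            if attr.startswith("xmlns:") and len(attr) > 6:
                local[attr[6:]] = value  # a nested declaration overrides
        scopes.append(local)
        seen.append((name, local))

    def end(name):
        scopes.pop()  # leaving the element restores the parent's scope

    parser = xml.parsers.expat.ParserCreate()  # no namespace separator given
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(document, True)
    return seen


doc = (
    '<div xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<span xmlns:dc="http://example.org/other/" xmlns:ex="http://example.org/ns#"/>'
    '<p/>'
    '</div>'
)
for name, mapping in collect_scoped_mappings(doc):
    print(name, mapping)
```

The stack gives the hierarchical scoping without ever building a tree: 
the span sees its own override of "dc", while the following p sees the 
mapping declared on the div again.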

>>
>>>
>>> Needless to say, I am not satisfied that my comment on this has been 
>>> addressed. It appears to me that the xmlns processing model for HTML 
>>> remains totally undefined.
>>
>> There is no "xmlns" processing model in RDFa.  There is a syntax 
>> specification and rules for extracting prefix mappings from that 
>> syntax.  Both of those are normative, including by reference to 
>> their relevant, defining Recommendations.
>
> Namespaces in XML does not apply to HTML, it only defines grammar and 
> processing rules for well-formed XML documents. So citing Namespaces 
> in XML doesn't answer anything. It's like explaining UTF-8 by pointing 
> to a spec for UTF-16 surrogate pairs.

Well - again, obviously I disagree.  The syntax rules defined in 
Namespaces in XML could be used in ANY language if you wanted to.  It's 
just an eBNF grammar, after all.  The Namespaces in XML Recommendation 
does not define the ways in which those syntactic namespace declarations 
are mapped into a DOM, nor into the Infoset.  It defines a syntax and 
also defines the hierarchic nature of XML Namespace declarations.  
Section 5.5 step 2 is (or at least attempts to be) explicit about the 
handling of this syntax.  Section 5.5 overall defines a recursive 
processing model that incorporates the hierarchic nature of the prefixes 
declared via the syntax.  Finally, section 5.4.1 expressly refers to the 
syntax of XML Namespaces AND their hierarchical nature, in an attempt to 
ensure these concepts were clear to the reader / implementor - in 
particular to an implementor who might not be working in some abstract 
tree environment nor in some environment in which the "namespaceness" of 
the XML Namespace declarations is enforced.


>
> It's really not very hard to define the processing rules in a clear and 
> precise way. I gave an example for how to do it. This doesn't have a 
> material effect on the intent of the spec, it just makes it unambiguous.

I am sure that we are working toward the same goals here.  Several of us 
in the RDFa Task Force spent a lot of time over years trying to ensure 
that the language in the RDFa Syntax document is not prescriptive.  That 
remains my primary concern.  We have to be certain that we are not 
pre-supposing a processing model nor a processing environment.  That 
doesn't mean we can't say what we mean in more precise language.  It 
also doesn't mean we cannot provide guidance to implementors in various 
environments.  However, I am adamant that guidance be provided outside 
of the W3C Recommendation (e.g., in an implementor's guide wiki).  That 
way we can keep it up to date, extend it as we learn, and NOT put 
implementation-specific language into the general case document that a 
W3C Recommendation should (always) be.

Thanks as always for your insight!

-- 
Shane P. McCarron                          Phone: +1 763 786-8160 x120
Managing Director                            Fax: +1 763 786-8180
ApTest Minnesota                            Inet: shane@aptest.com
Received on Monday, 21 September 2009 22:02:55 UTC