Re: Review comments on HTML+RDFa (was Re: FPWD Review Request: HTML+RDFa) from Maciej Stachowiak on 2009-09-02 (public-rdf-in-xhtml-tf@w3.org from September 2009)

From: Maciej Stachowiak <mjs@apple.com>
Date: Tue, 01 Sep 2009 23:50:20 -0700
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: HTMLWG WG <public-html@w3.org>, RDFa Developers <public-rdf-in-xhtml-tf@w3.org>
Message-id: <165CBAE1-AE50-4334-88C4-38DCEAB7522A@apple.com>

Hi Manu,

Thanks for your prompt reply. And thanks again for submitting this  
draft. I think that is a very positive step.

On Sep 1, 2009, at 9:20 PM, Manu Sporny wrote:

> Maciej Stachowiak wrote:
>
>> 4 Modifications to XHTML+RDFa
>> - One concern I have with only applying the changes to HTML: what if an
>> RDFa processor has a parsed DOM, but does not know if the DOM was
>> originally created from parsing HTML or XML?
>
> Hmm... that shouldn't matter. What made you think it does matter?
>
>> It would be better if a
>> single set of rules could be used once you have a DOM, without having to
>> know what kind it is, since the DOM itself does not directly expose that
>> information.
>
> This was the intent of the section, what made you think that the rules
> for an XHTML DOM and an HTML DOM were different?

I may have just failed to understand the spec. Here's what led to my  
conclusion:

"XHTML+RDFa specifies the attributes and processing rules for  
extracting RDF from an XHTML document. This section specifies changes  
to the attributes and processing rules defined in XHTML+RDFa in order  
to support extracting RDF from HTML documents."

To me, this implies that the changes here apply only to HTML (as in  
the text/html serialization), but XHTML (even XHTML5) should be  
processed strictly according to XHTML+RDFa and nothing else.

As a concrete example, my reading was that the "lang" attribute is  
only processed for text/html documents, but "xml:lang" is only  
processed for XML documents. Thus the recommendation to include both,  
since that would be the only way to get consistent behavior. That  
seems like a case where an RDFa processor that works on a DOM would  
have to know if the DOM came from an HTML or XML serialization. Am I  
misunderstanding?

>
>> 4.2 Invalid XMLLiteral values
>>
>> - Do XMLLiteral values only need to be well-formed, or do they need to be
>> namespace well-formed?
>
> I believe the current consensus is that they just need to be
> well-formed. Did you have a technical reason why they should be one over
> the other?

I figured since namespaces are important to RDF, that ns-well-formed  
content would be desired. I don't know of a strong reason to prefer  
one or the other.

However, I have noticed what I think is another problem with this  
section. The definition for "well-formed XML" points to the definition  
of a well-formed XML document. But it appears to me that an XMLLiteral  
is an XML fragment, not an XML document, and in general an XML  
fragment need not be a well-formed XML document (for example it may  
have multiple elements and text nodes at top level instead of a single  
root). Further, serializing an element's DOM children as XHTML5 per the  
spec will not guarantee a well-formed XML document. I believe it will  
guarantee a valid XML fragment, but I'm not sure offhand where that is  
defined.

Conclusion: I think the definition here should point to a definition  
of well-formed XML fragment.
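
A small sketch of the difference, in Python with the standard library's
ElementTree. The helper names are hypothetical, and wrapping the text in
a dummy root element is only my approximation of "well-formed fragment",
not a definition from any spec.

```python
import xml.etree.ElementTree as ET


def is_well_formed_document(text):
    """True if text parses as a complete XML document (single root element)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def is_well_formed_fragment(text):
    """Approximate fragment check: the text must parse as the content
    of a dummy wrapper element (an assumption made for illustration)."""
    return is_well_formed_document("<wrapper>" + text + "</wrapper>")


# Text and two elements at top level: fine as a fragment, not as a document.
fragment = "Hello <b>big</b> <i>world</i>"
```

Here is_well_formed_document(fragment) is false while
is_well_formed_fragment(fragment) is true. Note that ElementTree's
parser is namespace-aware, so a fragment using an undeclared prefix is
rejected too, which is stricter than plain well-formedness.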

Additional comment: it seems like the serialization as XHTML5 per the  
HTML5 spec rules should always be done. A DOM fragment doesn't really  
have a notion of being well-formed XML or not - it needs to be  
serialized somehow. And it probably makes sense to use the HTML5  
algorithm regardless of whether the source DOM tree was HTML or XML.  
This might avoid the need to link to any well-formedness definitions  
(not sure though).
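
For illustration, this is roughly what "always serialize the DOM
children" looks like. The sketch uses minidom's own serializer purely as
a stand-in; it is not the HTML5 XHTML serialization algorithm, and the
function name is hypothetical.

```python
from xml.dom import minidom


def serialize_children(element):
    """Concatenate the XML serialization of each child node of element.

    Stand-in for the HTML5 XHTML serialization rules (an assumption):
    minidom's toxml() is used only to show the shape of the result.
    """
    return "".join(child.toxml() for child in element.childNodes)


doc = minidom.parseString("<p>Hello <b>big</b> <i>world</i></p>")
literal = serialize_children(doc.documentElement)
```

The resulting string has text and two elements at top level, so it is a
fragment rather than a document, whatever parser produced the DOM.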

>
>> 4.3 The xmlns: attribute
>>   - "CURIE prefix mappings specified using xmlns:" does not clearly
>> specify how attributes starting with xmlns: turn into prefix mappings.
>> The processing model for this should be defined precisely.
>
> The processing rules for converting xmlns: to prefix mappings are
> outlined in the XHTML+RDFa spec, Section 5.5:
>
> http://www.w3.org/TR/rdfa-syntax/#sec_5.5.
>
> Is that sufficient? If not, why not?

Some concerns:

- The draft references [Namespaces in XML], not Section 5.5 of RDFa in  
XHTML.

Looking at the bit of that section that's relevant:

"Next the [current element] is parsed for [URI mapping]s... Mappings  
are provided by @xmlns. The value to be mapped is set by the XML  
namespace prefix, and the value to map is the value of the attribute—a  
URI."

However, since HTML doesn't really have a notion of XML namespace  
prefix, the processing rules need to be defined in terms of the  
textual name of the attribute for HTML DOMs; you can't soundly  
reference XML-only concepts to define things for HTML.

Also, reading over this, it seems like the processing rule is wrong  
even for RDFa in XML! First, the attribute named "xmlns" does not  
establish any namespace prefix binding, it just gives the default  
namespace URI. Rather than @xmlns, the spec surely meant to say  
something like "Mappings are provided by XML namespace declarations -  
attributes that have the xmlns namespace prefix". Second, the part of  
the attribute that should define the prefix binding is the local name,  
not the XML namespace prefix - the XML namespace prefix for all  
non-default namespace declarations is the string "xmlns", and for the  
literal attribute name "xmlns" the namespace prefix is the empty  
string. It seems to me this needs to be errata'd, because the spec  
taken literally is surely incompatible with what all real RDFa  
processors do.
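
As a sketch of what a rule defined purely on the textual attribute name
might look like (which is what HTML DOMs would need), here is some
illustrative Python. The function name and the plain-dict representation
of an element's attributes are hypothetical, and I am assuming attribute
names have already been lowercased by the HTML parser.

```python
XMLNS_PREFIX = "xmlns:"


def prefix_mappings(attributes):
    """Build CURIE prefix mappings from textual attribute names.

    attributes: dict of attribute name -> value for one element
    (a hypothetical representation chosen for this sketch).
    """
    mappings = {}
    for name, value in attributes.items():
        # Only "xmlns:<something>" declares a prefix; bare "xmlns" sets the
        # default namespace and binds no prefix at all.
        if name.startswith(XMLNS_PREFIX) and len(name) > len(XMLNS_PREFIX):
            # The part after "xmlns:" (the local name) is the prefix.
            mappings[name[len(XMLNS_PREFIX):]] = value
    return mappings


attrs = {
    "xmlns": "http://www.w3.org/1999/xhtml",
    "xmlns:dc": "http://purl.org/dc/elements/1.1/",
    "class": "example",
}
mappings = prefix_mappings(attrs)
```

Here only "dc" ends up in the mapping; the bare "xmlns" attribute and
unrelated attributes contribute nothing.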



>
>> General comments:
>> - I found it very hard to follow this document, since it seems to assume
>> full knowledge of RDFa in XHTML and only defines a delta.
>
> That's correct, this spec does require full knowledge of XHTML+RDFa. The
> document attempts to not duplicate normative content between XHTML+RDFa
> and HTML5+RDFa specifications. There are very few changes needed to put
> RDFa into HTML5, so we didn't see a need to re-state large sections of
> the RDFa specification in this document. By duplicating the XHTML+RDFa
> REC language, we create a mechanism where we unnecessarily duplicate
> content at best, and at worst, we could accidentally deviate from the
> pre-existing RDFa REC language (and the test suite).

At the very least, references to the appropriate sections of  
XHTML+RDFa should be made explicit. Right now it seems there is a lot  
of implicit linkage. It also seems reviewers will have to study  
XHTML+RDFa to properly review HTML+RDFa.

>
>> As a result:
>> - It was hard for me to understand the actual processing model, so
>> that I'd understand what I had to do as an implementor.
>
> The processing model is the exact same as XHTML+RDFa, except for section
> 4.1 and 4.2 in the HTML5+RDFa document. Would expressing which steps
> section 4.1 and 4.2 refer to in the XHTML+RDFa document be beneficial?

Definitely. And also stating very clearly what should be done  
differently, and whether it applies only to HTML DOMs, or to XML DOMs  
as well.

>
>> - I had no notion of the syntax, so I wouldn't know what to do as an
>> author.
>
> The syntax is covered in detail in the XHTML+RDFa Syntax and
> Processing[1] document as well as the RDFa Primer[2] document. Are these
> not sufficient?

I would tentatively guess it's not sufficient, since those don't cover  
the syntax to use in HTML at all, and there are apparently some  
differences.

>
>>   - As a reviewer, it was impossible for me to determine if the
>> processing requirements were precisely specified, free of contradictions
>> and sane.
>
> Would making the changes you listed help alleviate this issue?

Only partly. I think what I need to do to give a sufficiently thorough  
review is to review RDFa+XHTML itself. That may take a while - it is  
considerably longer than the draft you submitted.

>
>> For example, there was the idea to use a
>> "prefix" attribute instead of xmlns: declarations to define CURIE
>> prefixes, and also the idea to allow full URIs as an alternative to
>> CURIEs. Have these ideas been rejected?
>
> Neither idea has been rejected. We're still discussing @prefix, but we
> cannot add it to XHTML+RDFa without performing a revision of the
> specification -- including the usual LC->REC process.
>
> @prefix is part of a larger set of changes that may be realized in RDFa
> 1.1 (the next version of RDFa, which we hope will unify RDFa expression
> in both HTML and XHTML). We hope that RDFa 1.1 will replace the current
> XHTML+RDFa and HTML5+RDFa FPWD with one specification document. Hence,
> why I personally think it would be better to have RDFa defined outside
> of the HTML5 specification than inside.
>
> We discussed full URI support in @rel/@rev/@property/@resource, and
> there is a technical solution that would allow it to happen, but it was
> met with some pushback in the RDFa community. I prefer to have this
> supported in RDFa, but we haven't attempted to gather consensus around
> the feature and probably won't until RDFa 1.1.
>
> RDFa 1.1 could also have a mechanism to extend the set of keywords,
> allowing more Microformats-like property names (that map to URIs), but
> again... that's a feature that may not be standardized for another year
> or so and would require a full LC->REC process.

Based on what you say, RDFa 1.1 seems potentially more interesting  
than the posted draft. Folding in text/html support in a primary spec  
instead of a delta spec, and enabling cross-serialization DOM  
consistency, both sound like big wins.

What's the timeline for RDFa 1.1? Is it necessary to wait a year? Will  
the work be hosted by an existing Working Group, or will a new one be  
formed?

You mention that a full LC->REC process is needed, but the same is  
true for the draft you posted.


Regards,
Maciej
Received on Wednesday, 2 September 2009 06:52:22 UTC