Re: Review comments on HTML+RDFa (was Re: FPWD Review Request: HTML+RDFa) from Manu Sporny on 2009-09-02 (public-rdf-in-xhtml-tf@w3.org from September 2009)

From: Manu Sporny <msporny@digitalbazaar.com>
Date: Wed, 02 Sep 2009 13:28:06 -0400
To: HTMLWG WG <public-html@w3.org>
CC: RDFa Developers <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <4A9EAB26.3040507@digitalbazaar.com>
Maciej Stachowiak wrote:
> I may have just failed to understand the spec. Here's what led to my
> conclusion:
> 
> "XHTML+RDFa specifies the attributes and processing rules for extracting
> RDF from an XHTML document. This section specifies changes to the
> attributes and processing rules defined in XHTML+RDFa in order to
> support extracting RDF from HTML documents."
> 
> To me, this implies that the changes here apply only to HTML (as in the
> text/html serialization), but XHTML (even XHTML5) should be processed
> strictly according to XHTML+RDFa and nothing else.
> 
> As a concrete example, my reading was that the "lang" attribute is only
> processed for text/html documents, but "xml:lang" is only processed for
> XML documents. Thus the recommendation to include both, since that would
> be the only way to get consistent behavior. That seems like a case where
> an RDFa processor that works on a DOM would have to know if the DOM came
> from an HTML or XML serialization. Am I misunderstanding?

Ahh, now I see how you came to that conclusion. Thanks.

The intent is to have a unified set of rules for both XHTML and HTML.
That intent is clearly not conveyed effectively in the text that you
cite, and the current HTML+RDFa FPWD spec confuses the matter. I've
added the comment to the wiki, as the paragraph should be reworded to
be clearer.

In general, an RDFa processor should not have to detect whether the DOM
came from an HTML or XML serialization. The only reason I say "In
general" is because this may not hold true for retrieving the
xmlns:<prefix> mappings -- or the RDFa processor implementation may need
to try multiple calls to the DOM to detect whether or not xmlns:<prefix>
mappings exist for a particular element.
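
To sketch what "multiple calls to the DOM" could look like, here is a
minimal Python example using the standard library's xml.dom.minidom. The
function name prefix_mappings is hypothetical, and minidom merely stands
in for whatever DOM a real processor would use. It first looks for
attributes in the xmlns namespace (what a namespace-aware XML parser
produces) and otherwise falls back to the literal attribute name (what a
DOM without namespace information, such as one built by an HTML parser,
would likely expose).

```python
from xml.dom import minidom

XMLNS_NS = "http://www.w3.org/2000/xmlns/"


def prefix_mappings(element):
    """Collect the prefix -> IRI mappings declared directly on one element."""
    mappings = {}
    attrs = element.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        if attr.namespaceURI == XMLNS_NS:
            # Namespace-aware DOM: xmlns:ex has prefix "xmlns" and local
            # name "ex". A bare xmlns="..." has no "xmlns" prefix and
            # defines no prefix mapping, so it is skipped.
            if attr.prefix == "xmlns":
                mappings[attr.localName] = attr.value
        elif attr.name.startswith("xmlns:"):
            # DOM without namespace information: use the literal name.
            mappings[attr.name[len("xmlns:"):]] = attr.value
    return mappings


# DOM produced by a namespace-aware XML parser.
xml_p = minidom.parseString(
    '<p xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:ex="http://example.org/vocab#"/>'
).documentElement

# Stand-in for an HTML-style DOM: the attribute carries no namespace.
html_like_p = minidom.Document().createElement("p")
html_like_p.setAttribute("xmlns:ex", "http://example.org/vocab#")

print(prefix_mappings(xml_p))
print(prefix_mappings(html_like_p))
```

Both calls should yield the same single mapping for "ex", which is the
cross-serialization consistency being aimed for.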

To be clear, the intent is that RDFa Processors should use the same
processing rules when processing "lang" and "xml:lang" for both HTML and
XHTML DOMs.
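
As a rough illustration of one rule set serving both kinds of DOM, the
Python sketch below (standard library xml.dom.minidom) resolves the
language in effect for an element by walking up its ancestors and
checking both attributes at every step. The function name
element_language is hypothetical, and the choice that "xml:lang" wins
when both attributes appear on the same element is an assumption made
for this example, not something settled in this thread.

```python
from xml.dom import Node, minidom


def element_language(element):
    """Return the language in effect for an element, or None if unset."""
    node = element
    while node is not None and node.nodeType == Node.ELEMENT_NODE:
        # Assumption: xml:lang is checked first, so it takes precedence.
        for name in ("xml:lang", "lang"):
            if node.hasAttribute(name):
                # An empty value is treated as "no language".
                return node.getAttribute(name) or None
        node = node.parentNode
    return None


doc = minidom.parseString(
    '<div lang="en">'
    '<p xml:lang="fr"><span>bonjour</span></p>'
    '<p>hello</p>'
    '<p lang="de" xml:lang="nl">hallo</p>'
    '</div>'
)
paragraphs = doc.getElementsByTagName("p")
span = doc.getElementsByTagName("span")[0]

print(element_language(span))           # inherited from the xml:lang parent
print(element_language(paragraphs[1]))  # inherited from the lang ancestor
print(element_language(paragraphs[2]))  # both present on the same element
```

The same function is applied regardless of which serialization the DOM
came from; nothing in it asks whether the source was HTML or XML.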

That being said, using @lang in an XHTML 1.1 document will result in a
non-conformant document:

http://www.w3.org/TR/xhtml11/changes.html

Since RDFa is defined as operations on a tree-based model (a DOM-like
structure), we can state rules that may operate on non-conformant
documents that are translated into a DOM.

Does that clarify the intent? If so, I'll attempt to author language
that makes this more clear.

>>> 4.2 Invalid XMLLiteral values
>>>
>>> - Do XMLLiteral values only need to be well-formed, or do they need to be
>>> namespace well-formed?
>>
>> I believe the current consensus is that they just need to be
>> well-formed. Did you have a technical reason why they should be one over
>> the other?
> 
> I figured since namespaces are important to RDF, that ns-well-formed
> content would be desired. I don't know of a strong reason to prefer one
> or the other.

Keep in mind that we're talking about XMLLiterals, not RDF, so while
namespaces are important for RDF, it doesn't necessarily follow that
they're important to XMLLiterals. XMLLiterals were intended as a way
to express XML markup... they also create a number of headaches for
implementers and are somewhat annoying to use, so we're thinking of
changing them for RDFa 1.1 (but there is absolutely no consensus around
doing that).

The XHTML+RDFa specification requires preservation of whitespace and
namespaces that are defined in a parent element in XMLLiterals. So, it's
possible to have a snippet of XML that is not ns-well-formed:

<p xmlns:ex="http://example.org/vocab#"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   This is a rectangle (the markup for it is stored in a triple):
   <svg xmlns="http://www.w3.org/2000/svg"
        property="ex:markup" datatype="rdf:XMLLiteral">
      <rect width="300" height="100"
            style="fill:rgb(0,0,255);stroke-width:1;
            stroke:rgb(0,0,0)"/>
   </svg>
</p>

The markup above should produce the following triple:

<> ex:markup "<rect width=\"300\" height=\"100\"
            style=\"fill:rgb(0,0,255);stroke-width:1;
            stroke:rgb(0,0,0)\"/>"^^rdf:XMLLiteral

Tests 100-103 cover these namespace/whitespace preservation cases:

http://rdfa.digitalbazaar.com/rdfa-test-harness/

However, I'll check with the RDFa TF on this, as it seems that
xmlns="http://www.w3.org/2000/svg" should be preserved in this case. I
don't remember why we don't preserve it; I may not be remembering a spec
detail correctly.
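
To make the open question concrete, here is a minimal Python sketch
(standard library xml.dom.minidom) that builds a literal the naive way,
by serializing the child nodes of the element carrying
datatype="rdf:XMLLiteral". The function name inner_markup is
hypothetical and the markup is a shortened version of the example
above. This is not presented as the serialization the spec requires: a
plain child-by-child serialization keeps whitespace but does not copy
in-scope namespace declarations, such as the SVG default namespace,
into the literal.

```python
from xml.dom import minidom

DOC = """<p xmlns:ex="http://example.org/vocab#"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   This is a rectangle:
   <svg xmlns="http://www.w3.org/2000/svg"
        property="ex:markup" datatype="rdf:XMLLiteral">
      <rect width="300" height="100"/>
   </svg>
</p>"""


def inner_markup(element):
    """Serialize an element's children, without the element itself."""
    return "".join(child.toxml() for child in element.childNodes)


svg = minidom.parseString(DOC).getElementsByTagName("svg")[0]
literal = inner_markup(svg)

# repr() makes the preserved leading and trailing whitespace visible.
print(repr(literal))
```

The printed literal contains the rect element and its surrounding
whitespace but no xmlns declaration, so a consumer of the triple alone
cannot tell that rect was an SVG element.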

> Conclusion: I think the definition here should point to a definition of
> well-formed XML fragment.

I'll make the spec text more clear on this. I don't know if there is a
definition of a well-formed XML fragment anywhere. My understanding was
that a well-formed XML fragment is any XML fragment that you can
encapsulate in a single root element and that passes the test for a
well-formed document. For example:

<foo>
 YOUR_XMLLITERAL_HERE
</foo>

If the above passes an XML well-formedness validator, then you should
generate the XMLLiteral triple. I've added a note to the wiki[1] to
address this concern.
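
The wrapping test described above can be sketched in a few lines of
Python with the standard library's expat bindings. The function name
is_well_formed_fragment is hypothetical. The parser is created without
namespace processing, so this checks plain well-formedness only, not
namespace well-formedness: a fragment that uses an undeclared prefix
still passes.

```python
import xml.parsers.expat


def is_well_formed_fragment(fragment):
    """Wrap the fragment in a single root element and try to parse it."""
    # No namespace_separator is given, so expat does no namespace checks.
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse("<foo>" + fragment + "</foo>", True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


print(is_well_formed_fragment('<rect width="300" height="100"/>'))
print(is_well_formed_fragment("plain <b>bold</b> text"))
print(is_well_formed_fragment("<b>unclosed"))
print(is_well_formed_fragment("<ex:shape/>"))
```

Only the third fragment should be rejected. The last one is accepted
even though the "ex" prefix is never declared, which is the practical
difference between the two notions of well-formedness discussed above.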

> Some concerns:

I've added these concerns to the wiki.[1]

> Also, reading over this, it seems like the processing rule is wrong even
> for RDFa in XML! The attribute named "xmlns" does not establish any
> namespace prefix binding, it just gives the default namespace URI.
> Rather than @xmlns, the spec surely meant to say something like
> "Mappings are provided by XML namespace declarations - attributes that
> have the xmlns namespace prefix". Second, the part of the attribute that
> should define the prefix binding is the local name, not the XML
> namespace prefix - the XML namespace prefix for all non-default
> namespace declarations is the string "xmlns", and for the literal
> attribute name "xmlns" the namespace prefix is the empty string. It
> seems to me this needs to be errata'd, because the spec taken literally
> is surely incompatible with what all real RDFa processors do.

Nice catch. I've raised this as an errata item for XHTML+RDFa:

http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009Sep/0015.html

We didn't have a single implementor or reviewer catch that comment,
perhaps because the test suite and examples made it clear what was being
discussed, but that doesn't mean it shouldn't be changed to be more
accurate.

>> The syntax is covered in detail in the XHTML+RDFa Syntax and
>> Processing[1] document as well as the RDFa Primer[2] document. Are these
>> not sufficient?
> 
> I would tentatively guess it's not sufficient, since those don't cover
> the syntax to use in HTML at all, and there are apparently some differences.

There shouldn't be any differences in the syntax that is processed by
the RDFa processor between the XHTML and HTML serializations. I'll
attempt to clarify this in the HTML+RDFa spec.

> Based on what you say, RDFa 1.1 seems potentially more interesting than
> the posted draft. Folding in text/html support in a primary spec instead
> of a delta spec, and enabling cross-serialization DOM consistency, both
> sound like big wins.

Yes, I believe most of the RDFa community would agree with that statement.

> What's the timeline for RDFa 1.1? Is it necessary to wait a year? Will
> the work be hosted by an existing Working Group, or will a new one be
> formed?

The current RDFa work will continue through the end of the year as a
part of the Semantic Web Deployment Working Group. At the end of the
year, it is expected that the work will be continued in an RDFa Working
Group and that group will publish a unified RDFa 1.1 specification
(covering as many languages as possible: HTML, XHTML, SVG, ODF, etc.).

> You mention that a full LC->REC process is needed, but the same is true
> for the draft you posted.

Correct, but it's simpler and more effective to do the LC->REC process
with a document that has already gone through REC and thus needs minor
modifications (XHTML+RDFa) than a completely new, 60+ page document
(RDFa 1.1) with features that are still being worked out on the drawing
board.

We want to make sure that those who are authoring RDFa in HTML today
have a valid spec to work from... sooner rather than later.

-- manu

[1]

-- 
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: The Pirate Bay and Building an Equitable Culture
http://blog.digitalbazaar.com/2009/08/30/equitable-culture/
Received on Wednesday, 2 September 2009 17:28:49 UTC