Re: Request to publish HTML+RDFa (draft 3) as FPWD from Jonas Sicking on 2009-09-22 (public-rdf-in-xhtml-tf@w3.org from September 2009)

From: Jonas Sicking <jonas@sicking.cc>
Date: Tue, 22 Sep 2009 11:14:32 -0700
To: Shane McCarron <shane@aptest.com>
Cc: Henri Sivonen <hsivonen@iki.fi>, Mark Birbeck <mark.birbeck@webbackplane.com>, HTMLWG WG <public-html@w3.org>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <63df84f0909221114k70416f6apfe48a081c05e2951@mail.gmail.com>
On Tue, Sep 22, 2009 at 9:48 AM, Shane McCarron <shane@aptest.com> wrote:
>
>
> Henri Sivonen wrote:
>>
>> How would you characterize the ongoing denial that the syntax
>> xmlns:p="http://example.com/" is problematic?
>> http://lists.w3.org/Archives/Public/public-html/2009Sep/0843.html
>> http://lists.w3.org/Archives/Public/public-html/2009Sep/0790.html
>>
>> How can the problem be meaningfully resolved when you aren't even
>> admitting there's a problem to discuss?
>
> Because, Henri, we don't grok the problem.  I am slowly beginning to
> understand that this might be due to our talking past one another.  The W3C
> has a Recommendation that defines the Syntax of RDFa *input* and the
> extraction of RDF triples from that *input*.  It defines this as an
> extension to XHTML.   XHTML Modularization provides the structure for a host
> language.  The Recommendation is carefully vague about how that input is
> parsed because that is properly the job of the host language.
>
> In the RDFa in HTML document, Manu has deferred the syntax and extraction to
> the existing Recommendation, and has deferred the parsing of the input to
> the host language specification (HTML5).
> Jonas, Maciej, and you have pointed out that (my translation here) since it
> is possible for the *input* to be altered on its way to the code that would
> perform the extraction, it is important we define the rules for that
> extraction more tightly.  In particular, it is possible that the syntax of
> an 'xmlns:' declaration attribute may not be readily available.  It is also
> possible that, depending on the form of the *input* document, the
> declaration attribute may manifest in different ways on its way through the
> toolchain (e.g., showing up as a literal 'xmlns:foo' in HTML mode, and as
> 'foo' in the XMLNS namespace in XML mode).  However, I don't think *anyone*
> has said that the declaration will not be present in some form if it was
> present in the original *input*.  And that's how the processing rules are
> written.
>
> Section 5.5 defines the way in which prefix mappings are defined and
> remembered by an implementation, not how they are pulled from the data
> stream by that implementation.  To the RDFa Task Force, these are
> implementation details.  Depending upon your implementation strategy and
> environment, you will need to find the things the RDFa extraction process
> cares about, and act upon them to generate the triples.  We really, really,
> really don't care how you do this.  What we care about is that each engine
> emits the same triples in the end.  That's why there is a test suite, and
> it's why there were lots of independent implementations with completely
> different strategies long before the specification was complete.
>
> Regardless, I agree there is room to tighten the language to ensure that
> implementors have the proper guidance, and that edge conditions, even
> pathological ones, have clear, consistent rules.  I have proposed that we
> augment the text in RDFa Syntax section 5.5 step 2 to directly address this
> problem, and am updating my proposed errata text now.  I hope that, when
> that is ready, you will continue to help by letting us know if it satisfies
> your objections.

I would say there are two separate things that are missing:

The most substantial one is how to do prefix mappings in a DOM or an
HTML document. Prefix mapping is currently defined using the
Namespaces in XML recommendation. However, this recommendation only
defines how prefix mappings are done in a serialized XML document. I
hope we can all agree that neither DOMs (an in-memory data structure)
nor HTML documents are XML documents.

For example, if I have a DOM and I want to map the prefix "foo",
which of the following algorithms should I use:
1. Call Node.lookupNamespaceURI as defined by DOM Level 3 using
"foo" as the prefix argument.
2. Walk up the parent chain looking for an element with an attribute
with localName "foo" and namespace "http://www.w3.org/2000/xmlns/",
and then use the value of that attribute.
3. Walk up the parent chain looking for an element with an attribute
whose qualified name is "xmlns:foo", and then use the value of that attribute.
4. Walk up the parent chain looking for either the attribute in 2 or
3, and if both are specified use some prioritization order.
5. Walk up the parent chain looking for either the attribute in 2 if
the document was parsed as XHTML, or the attribute in 3 if the document
was parsed as HTML.
6. Do something else?

Any of 1 to 5 (as well as possibly 6) seems equally valid to me, and
as far as I can tell there really is no specified answer.
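To make the difference between options 2 and 3 concrete, here is a minimal sketch over a toy, hand-rolled element model. The `Attr`/`Element` classes and the `lookup_strategy_*` functions are hypothetical illustrations, not any real DOM binding. It assumes what the HTML5 parsing algorithm does for attributes on HTML elements: an HTML parser stores `xmlns:foo` as a plain, un-namespaced attribute, whereas a namespace-aware XML parser stores it in the `http://www.w3.org/2000/xmlns/` namespace with localName `foo`.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional

XMLNS_NS = "http://www.w3.org/2000/xmlns/"


@dataclass
class Attr:
    """Toy stand-in for a DOM Attr (hypothetical, not a real DOM binding)."""
    namespace: Optional[str]
    prefix: Optional[str]
    local_name: str
    value: str

    @property
    def name(self) -> str:
        # Qualified name, analogous to what DOM's Attr.name reports.
        return f"{self.prefix}:{self.local_name}" if self.prefix else self.local_name


@dataclass
class Element:
    tag: str
    attrs: List[Attr] = field(default_factory=list)
    parent: Optional["Element"] = None

    def self_and_ancestors(self) -> Iterator["Element"]:
        node: Optional[Element] = self
        while node is not None:
            yield node
            node = node.parent


def lookup_strategy_2(el: Element, prefix: str) -> Optional[str]:
    """Option 2: match namespace == XMLNS_NS and localName == prefix."""
    for node in el.self_and_ancestors():
        for a in node.attrs:
            if a.namespace == XMLNS_NS and a.local_name == prefix:
                return a.value
    return None


def lookup_strategy_3(el: Element, prefix: str) -> Optional[str]:
    """Option 3: match the qualified name 'xmlns:<prefix>', ignoring namespace."""
    for node in el.self_and_ancestors():
        for a in node.attrs:
            if a.name == "xmlns:" + prefix:
                return a.value
    return None


URI = "http://example.com/"
# Assumed shape after a namespace-aware XML parser reads xmlns:foo="...":
xml_div = Element("div", [Attr(XMLNS_NS, "xmlns", "foo", URI)])
# Assumed shape after an HTML parser reads the same markup on an HTML element:
html_div = Element("div", [Attr(None, None, "xmlns:foo", URI)])

xml_child = Element("span", parent=xml_div)
html_child = Element("span", parent=html_div)

print(lookup_strategy_2(xml_child, "foo"), lookup_strategy_3(xml_child, "foo"))
# http://example.com/ http://example.com/
print(lookup_strategy_2(html_child, "foo"), lookup_strategy_3(html_child, "foo"))
# None http://example.com/
```

In this sketch both options agree on the XML-shaped tree but disagree on the HTML-shaped one, so leaving the choice unspecified lets two conforming implementations emit different triples from the same markup.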


Likewise, how do I find out what the 'cc' prefix maps to for the <a>
element in the following serialized HTML document?

<!DOCTYPE html>
<html xmlns:cc="http://example.org/myNamespace#">
<head><title>HTML+RDFa example</title></head>
<body>
<table xmlns:cc="http://creativecommons.org/ns#">
<a rel="cc:license"
 href="http://creativecommons.org/licenses/by-nc-nd/3.0/">
 Creative Commons License
</a>
<tr><td>Example table</td></tr>
</table>
</body>
</html>


As far as I can see, the Namespaces in XML recommendation can't help me
in either situation. For the DOM, it doesn't deal with in-memory data
models at all; it only deals with serialized XML documents. For
the HTML document, if I try to apply XML processing as prescribed by
Namespaces in XML, I conclude that 'cc' maps to
"http://creativecommons.org/ns#", when in reality I suspect it should
map to "http://example.org/myNamespace#": an HTML parser foster-parents
the misplaced <a> out of the <table>, so in the resulting DOM the <a>
is a child of <body>, not of <table>.
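A minimal sketch of why the two readings diverge. No real parser is used: both trees are hand-built with a hypothetical `Node` class and contain only the elements on the path to the <a>. One mirrors the XML reading of the document above (the <a> stays inside the <table>), the other mirrors what HTML5 tree construction produces after foster-parenting the <a> into the <body>. Resolution walks up the ancestors looking for a literal `xmlns:cc` attribute, i.e. option 3 from the list above; `resolve_prefix` and `build_tree` are illustrative names, not part of any spec.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

EXAMPLE_NS = "http://example.org/myNamespace#"
CC_NS = "http://creativecommons.org/ns#"


@dataclass
class Node:
    """Hypothetical minimal element: a tag, its attributes, and a parent link."""
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    parent: Optional["Node"] = None


def resolve_prefix(node: Optional[Node], prefix: str) -> Optional[str]:
    # Walk from the element up through its ancestors; the nearest
    # declaration of xmlns:<prefix> wins.
    while node is not None:
        if "xmlns:" + prefix in node.attrs:
            return node.attrs["xmlns:" + prefix]
        node = node.parent
    return None


def build_tree(foster_parent: bool) -> Node:
    html = Node("html", {"xmlns:cc": EXAMPLE_NS})
    body = Node("body", parent=html)
    table = Node("table", {"xmlns:cc": CC_NS}, parent=body)
    # XML keeps <a> where it appears in the source (inside <table>);
    # HTML5 tree construction foster-parents it into <body>, before <table>.
    link_parent = body if foster_parent else table
    return Node("a", {"rel": "cc:license"}, parent=link_parent)


xml_link = build_tree(foster_parent=False)
html_link = build_tree(foster_parent=True)
print(resolve_prefix(xml_link, "cc"))   # http://creativecommons.org/ns#
print(resolve_prefix(html_link, "cc"))  # http://example.org/myNamespace#
```

The lookup rule is identical in both cases; only the tree shape differs, which is why the answer depends on whether the mapping is defined over the serialized markup or over the parsed DOM.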


The second, IMHO lesser, problem is that no processing is defined
anywhere for non-XML documents. Even something as basic as reading a
rel attribute out of a document is only defined for XML documents. All
normative requirements refer to XML processing and thus apply no more
to HTML documents than to GIF images. I consider this less of a problem
because I think it's fairly obvious that data is read out of a DOM
using the getAttribute function, and out of an HTML document by first
parsing it to a DOM and then calling getAttribute. But it really
should be formally defined somewhere.
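As a rough analogue of "parse, then read the attribute", here is a minimal sketch using Python's standard-library `html.parser`. `HTMLParser` is only a tokenizer: it lowercases tag and attribute names but does not build a DOM or run HTML5 tree construction (so it does no foster parenting), and it has no getAttribute; the `RelCollector` class is a hypothetical helper standing in for that step.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class RelCollector(HTMLParser):
    """Collects (tag, rel, href) for every start tag that carries a rel attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, Optional[str], Optional[str]]] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # HTMLParser has already lowercased tag and attribute names;
        # attribute values are passed through as written (after unquoting).
        values = dict(attrs)
        if "rel" in values:
            self.links.append((tag, values["rel"], values.get("href")))


markup = '<A REL="cc:license" HREF="http://creativecommons.org/licenses/by-nc-nd/3.0/">CC</A>'
collector = RelCollector()
collector.feed(markup)
collector.close()
print(collector.links)
# [('a', 'cc:license', 'http://creativecommons.org/licenses/by-nc-nd/3.0/')]
```

Even this simple case embeds choices (case folding of names, what counts as the element's attributes) that a normative HTML processing definition would have to pin down.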

/ Jonas
Received on Tuesday, 22 September 2009 18:15:55 UTC