Re: @rel syntax in RDFa (relevant to ISSUE-60 discussion), was: Using XMLNS in link/@rel

Henri:

I was going to write you a private reply, just telling you what a useful 
and well-reasoned post I found this to be.  If nothing else, and without 
necessarily prejudging your conclusion about RDFa, I (and I suspect many 
other readers) learned a lot about the details of HTML/XML interoperation 
from this.  So, thank you!

Then I read the following (I infer that the nested quote is in part from 
Julian, presumably in a part of the thread that did not make it to 
www-tag):

Henri Sivonen writes:

> > P.S., I realise that this involves at least three additional
> > communities, but the TAG seems like the logical place for the initial
> > discussion and eventual coordination of this issue.
> 
> Since Steven already CCed two of those three and Julian forwarded your 
> email to the third, I've CCed all three in addition to the TAG here.

With my TAG chair hat on, this leads me to invite all concerned to think 
about whether the TAG should do more about this right now.  I will be glad 
to hear suggestions from anyone involved, including other TAG members, as 
to what if anything would be constructive for us to undertake more 
formally.  Otherwise, we'll continue to monitor this thread and contribute 
as individuals. 

Note that the TAG did previously involve ourselves in reviewing drafts of 
the CURIE specification, and our formal feedback to the working group was 
given at [1]. Since use of CURIEs is at the core of the current 
controversy, let me clarify one aspect of the TAG's note [1], the 
introduction of which reads:

"First of all, we would like to make clear that we are overall supportive 
of the publication of this work, and we do not anticipate that dealing 
with any of the following concerns should greatly slow your progress."

I haven't double-checked this with other members of the TAG, but I'm quite 
confident that, in saying that, the TAG did not intend to take a position 
pro or con as to whether CURIEs are on balance a good thing, or whether 
their use should or shouldn't be encouraged in particular circumstances 
such as with RDFa.  The above was intended specifically to signal, I 
think, that the TAG did not wish to impede the process of publishing the 
Recommendation documents being reviewed.

Noah

[1] http://lists.w3.org/Archives/Public/www-tag/2008Oct/0024.html

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------






        Henri Sivonen <hsivonen@iki.fi>
        Sent by: www-tag-request@w3.org
        02/27/2009 07:57 AM 
                 To: Julian Reschke <julian.reschke@gmx.de>, Mark Nottingham <mnot@mnot.net>
                 cc: HTMLWG WG <public-html@w3.org>, "www-tag@w3.org WG" <www-tag@w3.org>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, public-xhtml2@w3.org, (bcc: Noah Mendelsohn/Cambridge/IBM)
                 Subject: Re: @rel syntax in RDFa (relevant to ISSUE-60 discussion), was: Using XMLNS in link/@rel


Mark Nottingham wrote:

> Creative Commons just released a new spec:
>  http://wiki.creativecommons.org/Ccplus
> that has markup in this form:
>  <a xmlns:cc="http://creativecommons.org/ns#"
> rel="cc:morePermissions" href="#agreement">below</a>
> (in HTML4, one assumes, since they don't specify XHTML, and this is
> what the vast majority of users will presume).

http://wiki.creativecommons.org/images/0/06/Ccplus-technical.pdf says 
"html". The syntax is not valid in any of HTML 2.0, HTML 3.2, HTML 
4.0, HTML 4.01 or HTML5 as currently drafted.

> However, it appears that they adopted this practice from RDFa;
>  http://www.w3.org/TR/rdfa-syntax/#relValues
> which, in turn, *does* rely upon XHTML.

Indeed, RDFa is not a REC over text/html.

> However, XHTML does *not*
> specify the @rel value as a QName (or CURIE, as RDFa assumes);
>  http://www.w3.org/TR/2008/REC-xhtml-modularization-20081008/abstraction.html#dt_LinkTypes
>
> "Note that in a future version of this specification, the Working
> Group expects to evolve this type from a simple name to a Qualified
> Name (QName)."

In HTML5, as currently drafted, rel is a space character-separated 
list of tokens that are compared ASCII-case-insensitively. It is 
noteworthy that the tokens may look like URIs, although HTML5 
processing itself ascribes no URI semantics to tokens that look like 
URIs.
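To make the token model concrete, here is a minimal Java sketch of the processing described above. The class and method names (RelTokens, tokens, hasToken) are illustrative assumptions, not taken from any spec or parser; the only behavior modeled is splitting on space characters and comparing ASCII-case-insensitively.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: treats rel as a space-separated list of opaque tokens.
final class RelTokens {
    // Split on the HTML space characters: space, tab, LF, FF, CR.
    static List<String> tokens(String rel) {
        List<String> out = new ArrayList<>();
        for (String t : rel.split("[ \\t\\n\\f\\r]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Fold only A-Z to a-z; non-ASCII characters are left untouched.
    static String asciiLower(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            sb.append(c >= 'A' && c <= 'Z' ? (char) (c + 0x20) : c);
        }
        return sb.toString();
    }

    // True if rel contains the wanted token, compared ASCII-case-insensitively.
    static boolean hasToken(String rel, String wanted) {
        String w = asciiLower(wanted);
        for (String t : tokens(rel)) {
            if (asciiLower(t).equals(w)) {
                return true;
            }
        }
        return false;
    }
}
```

Under this model a token such as "cc:morePermissions" or "http://example.org/rel" is just a string: nothing is split at the colon and nothing is resolved as a URI.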

> So, that's an expectation, not a current specification.

It's not a current or drafted specification for text/html, either.

[...]
> A few observations and questions;
>
> 1) I'm more than happy to specify in the Link that in XHTML, a link
> rel value is indeed a QName, if XHTML chooses to take that position
> (although I believe a URI is a better fit than a QName here, as in
> most other places). Can we get a current reading from the XHTML world
> on this?

In XHTML5, as currently drafted, rel is a space character-separated 
list of tokens that are compared ASCII-case-insensitively. This 
matches current HTML 4.01 and XHTML 1.0 implementations.

> 2) However, it seems like RDFa is jumping the gun by assuming @rel is
> a CURIE right now. This is not promoting interoperability or shared
> architecture, because no XHTML processor that isn't aware of RDFa can
> properly identify these link relations.

I agree.

> My preference would be an
> erratum to RDFa removing this syntax, replacing them with a self-
> contained identifier (i.e. a URI). Thoughts?

More generally, I think it would make sense to issue an erratum that 
replaces all CURIEs in RDFa with the corresponding full URIs, since 
this would both
  1) Remove the reliance on attributes spelled "xmlns:foo" which are 
special in XML but not special in text/html (as text/html parsing is 
currently implemented out there and drafted in HTML5).
  2) Avoid introducing a novel prefix-based indirection mechanism with 
many of the same problems that Namespaces in XML have been observed to 
have over the last decade.

Examples of problems:
http://lists.xml.org/archives/xml-dev/200502/msg00306.html
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6475032
http://dev.ctor.org/soap4r/ticket/179
http://sourceforge.net/tracker/?func=detail&atid=454391&aid=924041&group_id=48863
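
As a rough illustration of the prefix-based indirection discussed above, the following Java sketch expands a CURIE-like token against a prefix map. It is a simplification under stated assumptions: the class name CurieSketch and the expand method are hypothetical, and the real CURIE rules (default prefixes, safe CURIEs, scoping) are not modeled.

```java
import java.util.Map;

// Hypothetical sketch: a prefixed token only has meaning relative to
// whatever prefix mappings happen to be in scope where it appears.
final class CurieSketch {
    // Returns the expanded URI, or null if the token has no colon or the
    // prefix is not bound in the supplied map.
    static String expand(String curie, Map<String, String> prefixes) {
        int colon = curie.indexOf(':');
        if (colon < 0) {
            return null;
        }
        String base = prefixes.get(curie.substring(0, colon));
        return base == null ? null : base + curie.substring(colon + 1);
    }
}
```

With "cc" bound to "http://creativecommons.org/ns#", the token "cc:morePermissions" expands to "http://creativecommons.org/ns#morePermissions". With no binding in scope the same token cannot be interpreted at all, whereas the full URI would carry its meaning with it.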

> 3) CC's adoption of *proposed* XHTML conventions from RDFa into HTML4
> via CURIEs further muddies the waters; xmlns has no meaning whatsoever
> in HTML4, so they're promoting bad practice there by circumventing the
> specified Profile mechanism. I find this aspect of this the most
> concerning, and it needs clarification (more colourful words come to
> mind, but I'll leave it there for now).

I also find the use of xmlns:foo the most concerning aspect, not just 
because it has no special meaning in HTML 4.01 on the theoretical level 
but because of the practical repercussions for software architecture.

I develop a text/html parser that implements the HTML5 parsing 
algorithm and targets five APIs for the application layer: JDK DOM 
Level 2, Java SAX2 in the namespace-aware mode, XOM, Web DOM (the one 
browsers expose via JS; targeted via Google Web Toolkit) and the 
internal content tree API of Gecko (nsINode/nsIContent; targeted via 
automated translation of the Java code into C++).

These are all namespace-aware APIs. (Note that DOM Level 1 and the DOM 
Level 1-ish Python minidom aren't namespace-aware and they are the 
APIs typically used to demonstrate RDFa interop.)

Gecko, WebKit and Presto use a namespace-aware DOM for both text/html 
and application/xhtml+xml. Thus, we can gain understanding of the 
implemented mapping of text/html into a namespace-aware representation 
from these implementations. Since attributes of the form xmlns:foo are 
not special in any way in HTML 4.01 (or 4.0, 3.2 or 2.0 for that 
matter), an attribute spelled "xmlns:foo" in text/html parses into 
["", "xmlns:foo"] as the [namespace, local] pair. (Note that the local 
name is not an XML 1.0 + Namespaces NCName.) For compatibility with 
the behavior of these existing browsers, HTML5, as drafted, specifies 
that "xmlns:foo" in text/html parses into ["", "xmlns:foo"].

Demo: http://hsivonen.iki.fi/test/moz/xmlns-dom.html

DOM Level 2 XML, on the other hand, represents an attribute spelled 
"xmlns:foo" in application/xhtml+xml as 
["http://www.w3.org/2000/xmlns/", "foo"].

Demo: http://hsivonen.iki.fi/test/moz/xmlns-dom.xhtml
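
A rough JDK DOM counterpart to the XML case above can be sketched as follows. This assumes the default JAXP parser that ships with the JDK; the class name XmlnsDomSketch and the example.org namespace URI are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: parse XML with a namespace-aware DOM parser and
// report the [namespace, local] pair of an attribute on the root element.
final class XmlnsDomSketch {
    static final String XHTML =
        "<html xmlns='http://www.w3.org/1999/xhtml'"
        + " xmlns:foo='http://example.org/foo'/>";

    static String[] namespaceAndLocal(String xml, String qName) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document doc = f.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            // Look the attribute up by its qualified (Level 1) name.
            Attr a = doc.getDocumentElement().getAttributeNode(qName);
            return new String[] { a.getNamespaceURI(), a.getLocalName() };
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Calling namespaceAndLocal(XHTML, "xmlns:foo") should yield the pair ["http://www.w3.org/2000/xmlns/", "foo"], which is the XML-side representation that the text/html parse does not produce.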

Furthermore, SAX2 in the namespace-aware mode and XOM do not represent 
what are spelled "xmlns:foo" in XML as attributes at all in the API. 
Instead, there's dedicated API surface for exposing namespace mappings 
to the application layer.
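
For SAX2, the dedicated API surface is the startPrefixMapping callback. The following sketch, again assuming the JDK's default JAXP parser and using the hypothetical class name XmlnsSaxSketch, logs prefix mappings and the number of attributes reported on each element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: record what a namespace-aware SAX2 parser reports.
final class XmlnsSaxSketch {
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        try {
            SAXParserFactory f = SAXParserFactory.newInstance();
            f.setNamespaceAware(true);
            f.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startPrefixMapping(String prefix, String uri) {
                        // Namespace declarations arrive here, not as attributes.
                        log.add("mapping " + prefix + "=" + uri);
                    }

                    @Override
                    public void startElement(String uri, String local,
                                             String qName, Attributes atts) {
                        log.add("element " + local
                            + " attributes=" + atts.getLength());
                    }
                });
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return log;
    }
}
```

For an input such as <html xmlns='http://www.w3.org/1999/xhtml' xmlns:foo='http://example.org/foo' lang='en'/>, the log contains a mapping entry for foo, and the html element is reported with a single attribute (lang), since the xmlns declarations are not exposed through the Attributes object in this mode.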

If we use the explicit mapping of DOM Level 3 to Infoset, the mapping 
of XML onto Infoset and the mappings from XML into XOM or 
namespace-aware SAX2, we have to conclude that when a DOM-oriented spec 
talks about an attribute in the "http://www.w3.org/2000/xmlns/" 
namespace, the concept maps to the namespace mapping API surface of 
SAX2 and XOM and, on the other hand, when an attribute is not in the 
"http://www.w3.org/2000/xmlns/" namespace according to a DOM-oriented 
spec, it doesn't map to the namespace mapping API surface of XOM and 
namespace-aware SAX2.

The above paragraph is relevant, because the dominant design of 
text/html parsers for non-browser applications established by John 
Cowan's TagSoup and adopted by HTML5 parsers is that they expose an XML 
API so that the application-level code is written as if working with an 
XML parser parsing an equivalent XHTML 1.0 or XHTML5 file (for HTML 4.01 
and HTML5 respectively).

This design of sharing above-parser application-level code between 
text/html and application/xhtml+xml is also in use in Gecko, WebKit 
and (based on black-box guess) Presto.

The internal API of Gecko differs from the DOM slightly: The DOM has 
three data items: namespace URI, qname (aka. Level 1 node name) and 
local name. Gecko's internal API also has three data items, but slightly 
different ones: namespace URI, *prefix* and local name. None of these 
are string data types in Gecko. The namespace URI is interned into a 
32-bit integer, and prefix & local name are interned into a specific 
interned string type that cannot be used directly where string types 
can be used. It follows that for any natively implemented feature, it 
would be highly undesirable to have to look 'inside' these values as 
strings as opposed to merely comparing pointers or integers.
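
The cost difference can be illustrated with a small Java sketch. This is not Gecko's code or API; AtomTable, Atom and the method names are hypothetical stand-ins for the general technique of interning names and namespace URIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interning table: names become shared Atom objects and
// namespace URIs become small integers.
final class AtomTable {
    static final class Atom {
        final String text;

        private Atom(String text) {
            this.text = text;
        }
    }

    private final Map<String, Atom> atoms = new HashMap<>();
    private final Map<String, Integer> namespaceIds = new HashMap<>();

    // The same string always yields the same Atom instance.
    Atom intern(String name) {
        return atoms.computeIfAbsent(name, Atom::new);
    }

    // The same URI always yields the same integer id.
    int namespaceId(String uri) {
        Integer id = namespaceIds.get(uri);
        if (id == null) {
            id = namespaceIds.size();
            namespaceIds.put(uri, id);
        }
        return id;
    }

    // Cheap: a reference comparison, no character data is read.
    static boolean sameName(Atom a, Atom b) {
        return a == b;
    }

    // Costly by comparison: has to look 'inside' the interned value.
    static boolean looksLikeNamespaceDeclaration(Atom a) {
        return a.text.startsWith("xmlns:");
    }
}
```

Matching a known attribute name is a single reference comparison in such a design, while recognizing every attribute spelled "xmlns:foo" for arbitrary foo requires inspecting the characters of each attribute name.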

I'm not suggesting that there is any foreseeable native 
implementation of RDFa-sensitive functionality in any Gecko-based 
browsers. However, I am suggesting that language design that would be 
a bad match for established browser internals is architecturally 
unsound design in case there's the slightest chance that the language 
might one day be browser-sensitive.

Going back to the design of exposing text/html as if it were XML: As I 
pointed out earlier, xmlns:foo in text/html parses, in existing 
browsers and in the HTML5 parsing algorithm as drafted today, into a 
[namespace, local] pair where the local part is not an NCName. This 
characteristic alone (i.e. without even considering the part that is 
spelled "xmlns") is enough to render the [namespace, local] pair 
unrepresentable in XML 1.0 + Namespaces.

This poses the following problems:
  1) A local name that is not an NCName cannot be serialized as XML 
1.0 in such a way that parsing the resulting XML document with a 
namespace-aware parser round-trips the non-NCName local name properly.
  2) Namespace-wise strictly correct XML tree implementations throw if 
you try to set an attribute that can't be serialized as XML 1.0 + 
Namespaces. (A demo that makes XOM throw is included below my 
signature.)
  3) Even if the API contract of an XML API could be violated and a 
local name that is impossible in XML 1.0 + Namespaces could be passed 
through, this representation would be *different* from the way an XML 
parser would expose an attribute spelled "xmlns:foo" through the same 
API. Thus, the application-layer code would have to differ for 
text/html and application/xhtml+xml.
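
Problem 2 can also be observed with the JDK's own DOM implementation. The sketch below is an illustration under the assumption that the default JAXP DOM is used with its normal error checking; the class name NonNCNameSketch and the tryCreate helper are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

// Hypothetical sketch: ask a namespace-aware DOM to create an attribute
// and report the DOMException code, or null if creation succeeded.
final class NonNCNameSketch {
    static Short tryCreate(String namespaceUri, String qualifiedName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            doc.createAttributeNS(namespaceUri, qualifiedName);
            return null;
        } catch (DOMException e) {
            return e.code;
        } catch (ParserConfigurationException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Here tryCreate(null, "xmlns:foo") is expected to report DOMException.NAMESPACE_ERR, because the name has a prefix but no namespace, while tryCreate("http://www.w3.org/2000/xmlns/", "xmlns:foo") succeeds. In other words, the pair that text/html parsing produces cannot be constructed through the namespace-aware creation method.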

The options are thus:

  1) Letting the application-layer code differ for text/html and 
application/xhtml+xml (provided that you can make the infrastructure 
not throw). This would violate the DOM Consistency design principle 
in HTML Design Principles. (For the general purpose of 
application-layer code reuse, "DOM" here should be understood to mean 
any API between the parser and application layers.) Experience with 
the lang vs. xml:lang issue should show that going down this road 
leads to divergent code paths in many places, which is bug-prone and 
bad software architecture.

  2) Changing text/html parsing to parse "xmlns:foo" into 
["http://www.w3.org/2000/xmlns/", "foo"]. This would be inconsistent 
with the behavior of existing Gecko, WebKit and Presto releases.

  3) Changing RDFa not to use attributes spelled "xmlns:foo" in either 
text/html or application/xhtml+xml. (Failing to do this for 
application/xhtml+xml would still lead to the problem of different 
code paths in application-layer code.) This could be achieved with an 
erratum changing CURIEs to full URIs.

  4) Not using RDFa in text/html at all.

- -

Due to the above considerations, I think that a vocabulary that uses 
attributes spelled "xmlns:foo" on (X)HTML elements is in architectural 
error.

> P.S., I realise that this involves at least three additional
> communities, but the TAG seems like the logical place for the initial
> discussion and eventual coordination of this issue.

Since Steven already CCed two of those three and Julian forwarded your 
email to the third, I've CCed all three in addition to the TAG here.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

import nu.xom.Attribute;
import nu.xom.Element;

public class XomTest {
     public static void main(String[] args) {
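         Element elt = new Element("html", "http://www.w3.org/1999/xhtml");
         // XOM is expected to throw here: "xmlns:foo" cannot be
         // represented as an attribute in XML 1.0 + Namespaces.
         elt.addAttribute(new Attribute("xmlns:foo", "bar"));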
     }
}

Received on Friday, 27 February 2009 19:37:45 UTC