- From: <noah_mendelsohn@us.ibm.com>
- Date: Fri, 27 Feb 2009 14:35:38 -0500
- To: Henri Sivonen <hsivonen@iki.fi>
- Cc: Julian Reschke <julian.reschke@gmx.de>, Mark Nottingham <mnot@mnot.net>, HTMLWG WG <public-html@w3.org>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, public-xhtml2@w3.org, "www-tag@w3.org WG" <www-tag@w3.org>
Henri:
I was going to write you a private reply, just telling you what a useful
and well reasoned post I found this to be. If nothing else, and without
necessarily prejudging your conclusion about RDFa, I (and I suspect many
other readers) learned a lot about the details of HTML/XML interoperation
from this. So, thank you!
Then I read the following (I infer that the nested quote is in part from
Julian, presumably in a part of the thread that did not make it to
www-tag):
Henri Sivonen writes:
> > P.S., I realise that this involves at least three additional
> > communities, but the TAG seems like the logical place for the initial
> > discussion and eventual coordination of this issue.
>
> Since Steven already CCed two of those three and Julian forwarded your
> email to the third, I've CCed all three in addition to the TAG here.
With my TAG chair hat on, this leads me to invite all concerned to think
about whether the TAG should do more about this right now. I will be glad
to hear suggestions from anyone involved, including other TAG members, as
to what if anything would be constructive for us to undertake more
formally. Otherwise, we'll continue to monitor this thread and contribute
as individuals.
Note that the TAG did previously involve ourselves in reviewing drafts of
the CURIE specification, and our formal feedback to the working group was
given at [1]. Since use of CURIEs is at the core of the current
controversy, let me clarify one aspect of the TAG's note [1], the
introduction of which reads:
"First of all, we would like to make clear that we are overall supportive
of the publication of this work, and we do not anticipate that dealing
with any of the following concerns should greatly slow your progress."
I haven't doublechecked this with other members of the TAG, but I'm quite
confident that, in saying that, the TAG did not intend to take a position
pro or con as to whether CURIEs are on balance a good thing, or whether
their use should or shouldn't be encouraged in particular circumstances
such as with RDFa. The above was intended specifically to signal, I
think, that the TAG did not wish to impede the process of publishing the
Recommendation documents being reviewed.
Noah
[1] http://lists.w3.org/Archives/Public/www-tag/2008Oct/0024.html
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Henri Sivonen <hsivonen@iki.fi>
Sent by: www-tag-request@w3.org
02/27/2009 07:57 AM
To: Julian Reschke <julian.reschke@gmx.de>, Mark
Nottingham <mnot@mnot.net>
cc: HTMLWG WG <public-html@w3.org>, "www-tag@w3.org WG"
<www-tag@w3.org>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>,
public-xhtml2@w3.org, (bcc: Noah Mendelsohn/Cambridge/IBM)
Subject: Re: @rel syntax in RDFa (relevant to ISSUE-60
discussion), was: Using XMLNS in link/@rel
Mark Nottingham wrote:
> Creative Commons just released a new spec:
> http://wiki.creativecommons.org/Ccplus
> that has markup in this form:
> <a xmlns:cc="http://creativecommons.org/ns#"
> rel="cc:morePermissions" href="#agreement">below</a>
> (in HTML4, one assumes, since they don't specify XHTML, and this is
> what the vast majority of users will presume).
http://wiki.creativecommons.org/images/0/06/Ccplus-technical.pdf says
"html". The syntax is not valid in any of HTML 2.0, HTML 3.2, HTML
4.0, HTML 4.01 or HTML5 as currently drafted.
> However, it appears that they adopted this practice from RDFa;
> http://www.w3.org/TR/rdfa-syntax/#relValues
> which, in turn, *does* rely upon XHTML.
Indeed, RDFa is not a REC over text/html.
> However, XHTML does *not*
> specify the @rel value as a QName (or CURIE, as RDFa assumes);
> http://www.w3.org/TR/2008/REC-xhtml-modularization-20081008/abstraction.html#dt_LinkTypes
>
> "Note that in a future version of this specification, the Working
> Group expects to evolve this type from a simple name to a Qualified
> Name (QName)."
In HTML5, as currently drafted, rel is a space character-separated
list of tokens that are compared ASCII-case-insensitively. It is
noteworthy that the tokens may look like URIs, although HTML5
processing itself ascribes no URI semantics to tokens that look like
URIs.
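To make the token model concrete, here is a minimal Java sketch of how a
consumer might test a rel value under those rules. The class and method
names (RelTokens, split, hasToken) are made up for illustration and are
not taken from any spec or parser; the set of space characters (space,
tab, LF, FF, CR) is my reading of the current draft.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: treats rel as a list of opaque tokens.
class RelTokens {
    // Split on runs of space characters: space, tab, LF, FF, CR.
    static List<String> split(String rel) {
        List<String> tokens = new ArrayList<>();
        for (String t : rel.split("[ \\t\\n\\f\\r]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Lowercase only A-Z, so the comparison is ASCII-case-insensitive.
    static String asciiLower(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            sb.append(c >= 'A' && c <= 'Z' ? (char) (c + 0x20) : c);
        }
        return sb.toString();
    }

    // Whole-token match; no prefix expansion and no URI resolution.
    static boolean hasToken(String rel, String wanted) {
        String w = asciiLower(wanted);
        for (String t : split(rel)) {
            if (asciiLower(t).equals(w)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasToken("Next cc:morePermissions", "next"));
        System.out.println(hasToken("cc:morePermissions", "morePermissions"));
    }
}
```

Note that "cc:morePermissions" is matched only as a whole token here;
nothing in this model looks at the part before the colon.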
> So, that's an expectation, not a current specification.
It's not a current or drafted specification for text/html, either.
[...]
> A few observations and questions;
>
> 1) I'm more than happy to specify in the Link that in XHTML, a link
> rel value is indeed a QName, if XHTML chooses to take that position
> (although I believe a URI is a better fit than a QName here, as in
> most other places). Can we get a current reading from the XHTML world
> on this?
In XHTML5, as currently drafted, rel is a space character-separated
list of tokens that are compared ASCII-case-insensitively. This
matches current HTML 4.01 and XHTML 1.0 implementations.
> 2) However, it seems like RDFa is jumping the gun by assuming @rel is
> a CURIE right now. This is not promoting interoperability or shared
> architecture, because no XHTML processor that isn't aware of RDFa can
> properly identify these link relations.
I agree.
> My preference would be an
> erratum to RDFa removing this syntax, replacing them with a self-
> contained identifier (i.e. a URI). Thoughts?
More generally, I think it would make sense to issue an erratum that
replaces all CURIEs in RDFa with the corresponding full URIs, since
this would both
1) Remove the reliance on attributes spelled "xmlns:foo" which are
special in XML but not special in text/html (as text/html parsing is
currently implemented out there and drafted in HTML5).
2) Avoid introducing a novel prefix-based indirection mechanism with
many of the same problems that Namespaces in XML have been observed to
have over the last decade.
Examples of problems:
http://lists.xml.org/archives/xml-dev/200502/msg00306.html
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6475032
http://dev.ctor.org/soap4r/ticket/179
http://sourceforge.net/tracker/?func=detail&atid=454391&aid=924041&group_id=48863
> 3) CC's adoption of *proposed* XHTML conventions from RDFa into HTML4
> via CURIEs further muddies the waters; xmlns has no meaning whatsoever
> in HTML4, so they're promoting bad practice there by circumventing the
> specified Profile mechanism. I find this aspect of this the most
> concerning, and it needs clarification (more colourful words come to
> mind, but I'll leave it there for now).
I also find the use of xmlns:foo the most concerning aspect, not just
because it has no special meaning in HTML 4.01 on the theoretical level
but because of the practical repercussions for software architecture.
I develop a text/html parser that implements the HTML5 parsing
algorithm and targets five APIs for the application layer: JDK DOM
Level 2, Java SAX2 in the namespace-aware mode, XOM, Web DOM (the one
browsers expose via JS; targeted via Google Web Toolkit) and the
internal content tree API of Gecko (nsINode/nsIContent; targeted via
automated translation of the Java code into C++).
These are all namespace-aware APIs. (Note that DOM Level 1 and the DOM
Level 1-ish Python minidom aren't namespace-aware and they are the
APIs typically used to demonstrate RDFa interop.)
Gecko, WebKit and Presto use a namespace-aware DOM for both text/html
and application/xhtml+xml. Thus, we can gain understanding of the
implemented mapping of text/html into a namespace-aware representation
from these implementations. Since attributes of the form xmlns:foo are
not special in any way in HTML 4.01 (or 4.0, 3.2 or 2.0 for that
matter), an attribute spelled "xmlns:foo" in text/html parses into
["", "xmlns:foo"] as the [namespace, local] pair. (Note that the local
name is not an XML 1.0 + Namespaces NCName.) For compatibility with
the behavior of these existing browsers, HTML5, as drafted, specifies
that "xmlns:foo" in text/html parses into ["", "xmlns:foo"].
Demo: http://hsivonen.iki.fi/test/moz/xmlns-dom.html
DOM Level 2 XML, on the other hand, represents an attribute spelled
"xmlns:foo" in application/xhtml+xml as
["http://www.w3.org/2000/xmlns/", "foo"].
Demo: http://hsivonen.iki.fi/test/moz/xmlns-dom.xhtml
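The XML side can also be checked outside a browser. The following is a
rough sketch using the JDK's bundled namespace-aware DOM parser; the
class name XmlnsDom and the http://example.org/foo# URI are placeholders
of mine, and I am assuming the JDK parser behaves like the browser XML
parsers in the demo above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of any library.
class XmlnsDom {
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";

    // Parse a tiny XHTML document and look the declaration up by
    // [namespace, local] pair.
    static Attr parseXmlnsFoo() {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            String xml = "<html xmlns='http://www.w3.org/1999/xhtml'"
                    + " xmlns:foo='http://example.org/foo#'/>";
            Document doc = f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttributeNodeNS(XMLNS_NS, "foo");
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Attr a = parseXmlnsFoo();
        System.out.println(a.getNamespaceURI() + " | " + a.getLocalName());
    }
}
```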
Furthermore, SAX2 in the namespace-aware mode and XOM do not represent
what are spelled "xmlns:foo" in XML as attributes at all in the API.
Instead, there's dedicated API surface for exposing namespace mappings
to the application layer.
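For SAX2, the dedicated API surface is ContentHandler.startPrefixMapping.
Here is a small sketch with the JDK's SAX parser, assuming the default
setting where the namespace-prefixes feature is off; the class name
XmlnsSax and the example.org URI are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; records what the handler is told.
class XmlnsSax {
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        try {
            SAXParserFactory f = SAXParserFactory.newInstance();
            f.setNamespaceAware(true);
            f.newSAXParser().parse(new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                @Override
                public void startPrefixMapping(String prefix, String uri) {
                    // The declaration arrives here, not as an attribute.
                    log.add("mapping " + prefix + " -> " + uri);
                }

                @Override
                public void startElement(String uri, String local,
                        String qName, Attributes atts) {
                    log.add("element " + local
                            + " attributes=" + atts.getLength());
                }
            });
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        String xml = "<root xmlns:foo='http://example.org/foo#' id='x'/>";
        for (String line : events(xml)) {
            System.out.println(line);
        }
    }
}
```

The only attribute the application sees in startElement is id; the
xmlns:foo declaration is reported through the mapping callback.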
If we use the explicit mapping of DOM Level 3 to Infoset, the mapping
of XML onto Infoset and the mappings from XML into XOM or namespace-
aware SAX2, we have to conclude that when a DOM-oriented spec talks
about an attribute in the "http://www.w3.org/2000/xmlns/" namespace,
the concept maps to the namespace mapping API surface of SAX2 and XOM
and, on the other hand, when an attribute is not in the
"http://www.w3.org/2000/xmlns/" namespace according to a DOM-oriented
spec, it doesn't map to the namespace mapping API surface of XOM and
namespace-aware SAX2.
The above paragraph is relevant, because the dominant design of text/
html parsers for non-browser applications established by John Cowan's
TagSoup and adopted by HTML5 parsers is that they expose an XML API so
that the application-level code is written as if working with an XML
parser parsing an equivalent XHTML 1.0 or XHTML5 file (for HTML 4.01
and HTML5 respectively).
This design of sharing above-parser application-level code between
text/html and application/xhtml+xml is also in use in Gecko, WebKit
and (based on black-box guess) Presto.
The internal API of Gecko differs from the DOM slightly: The DOM has
three datums: namespace URI, qname (aka. Level 1 node name) and local
name. Gecko's internal API also has three datums but slightly
differently: namespace URI, *prefix* and local name. None of these are
string data types in Gecko. The namespace URI is interned into a 32-
bit integer and prefix & local name are interned into a specific
interned string type that cannot be used directly where string types
can be used. It follows that for any natively implemented feature, it
would be highly undesirable to have to look 'inside' these values as
strings as opposed to merely comparing pointers or integers.
I'm not suggesting that there is any foreseeable native
implementation of RDFa-sensitive functionality in any Gecko-based
browsers. However, I am suggesting that language design that would be
a bad match for established browser internals is architecturally
unsound design in case there's the slightest chance that the language
might one day be browser-sensitive.
Going back to the design of exposing text/html as if it were XML: As I
pointed out earlier, xmlns:foo in text/html parses, in existing
browsers and in the HTML5 parsing algorithm as drafted today, into a
[namespace, local] pair where the local part is not an NCName. This
characteristic alone (i.e. without even considering the part that is
spelled "xmlns") is enough to render the [namespace, local] pair
unrepresentable in XML 1.0 + Namespaces.
This poses the following problems:
1) A local name that is not an NCName cannot be serialized as XML
1.0 in such a way that parsing the resulting XML document with a
namespace-aware parser round-trips the non-NCName local name properly.
2) Namespace-wise strictly correct XML tree implementations throw if
you try to set an attribute that can't be serialized as XML 1.0 +
Namespaces. (A demo that makes XOM throw is included below my
signature.)
3) Even if the API contract of an XML API could be violated and a
local name that is impossible in XML 1.0 + Namespaces could be passed
through, this representation would be *different* from the way an XML
parser would expose an attribute spelled "xmlns:foo" through the same
API. Thus, the application-layer code would have to differ for text/
html and application/xhtml+xml.
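Problem 2 can be seen with the JDK's own DOM as well as with XOM. This
is only a sketch, assuming the JDK's bundled DOM implementation enforces
the DOM Level 2 namespace checks; the class name StrictDomDemo is made
up. It tries to set the [no namespace, "xmlns:foo"] pair that text/html
parsing produces.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; not part of any library.
class StrictDomDemo {
    // Returns true if the DOM refuses the attribute with NAMESPACE_ERR.
    static boolean throwsNamespaceErr() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element html =
                    doc.createElementNS("http://www.w3.org/1999/xhtml", "html");
            // No namespace, but a name that looks like a declaration.
            html.setAttributeNS(null, "xmlns:foo", "bar");
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.NAMESPACE_ERR;
        } catch (ParserConfigurationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsNamespaceErr());
    }
}
```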
The options are thus:
1) Letting the application-layer code differ for text/html and
application/xhtml+xml (provided that you can make the infrastructure
not throw). This would violate the DOM Consistency design principle
in HTML Design Principles. (For the general purpose of application-
layer code reuse, "DOM" here should be understood to mean any API
between the parser and application layers.) Experience with dealing
with the lang vs. xml:lang issue should show that going down this road
leads to divergent code paths in many places, which is bug-prone and
bad software architecture.
2) Changing text/html parsing to parse "xmlns:foo" into
["http://www.w3.org/2000/xmlns/", "foo"]. This would be inconsistent
with the behavior of existing Gecko, WebKit and Presto releases.
3) Changing RDFa not to use attributes spelled "xmlns:foo" in either
text/html or application/xhtml+xml. (Failing to do this for
application/xhtml+xml would still lead to the problem of different
code paths in application-layer code.) This could be achieved with an
erratum changing CURIEs to full URIs.
4) Not using RDFa in text/html at all.
- -
Due to the above considerations, I think that a vocabulary that uses
attributes spelled "xmlns:foo" on (X)HTML elements is in architectural
error.
> P.S., I realise that this involves at least three additional
> communities, but the TAG seems like the logical place for the initial
> discussion and eventual coordination of this issue.
Since Steven already CCed two of those three and Julian forwarded your
email to the third, I've CCed all three in addition to the TAG here.
--
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
import nu.xom.Attribute;
import nu.xom.Element;

public class XomTest {
    public static void main(String[] args) {
        Element elt = new Element("html", "http://www.w3.org/1999/xhtml");
        // This is the call that makes XOM throw: "xmlns:foo" cannot be
        // an ordinary attribute in XML 1.0 + Namespaces.
        elt.addAttribute(new Attribute("xmlns:foo", "bar"));
    }
}
Received on Friday, 27 February 2009 19:36:36 UTC