Re: @rel syntax in RDFa (relevant to ISSUE-60 discussion), was: Using XMLNS in link/@rel from Mark Nottingham on 2009-02-27 (public-rdf-in-xhtml-tf@w3.org from February 2009)

From: Mark Nottingham <mnot@mnot.net>
Date: Sat, 28 Feb 2009 10:07:35 +1100
To: noah_mendelsohn@us.ibm.com
Cc: Henri Sivonen <hsivonen@iki.fi>, Julian Reschke <julian.reschke@gmx.de>, HTMLWG WG <public-html@w3.org>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, public-xhtml2@w3.org, "www-tag@w3.org WG" <www-tag@w3.org>
Message-Id: <B1EB99A4-FEE5-4DA7-8812-6BA534CEE86F@mnot.net>
Hi Noah,

It seems to me that indeed the TAG should be actively involved, since  
the primary issue here is about architectural coordination. If this  
were just an issue between two WGs or communities, that would be a  
straightforward thing, but there are too many stakeholders here.

I'm also honestly quite surprised to find that the W3C has Recommended  
multiple versions of XHTML that have different syntax for the same  
attribute; this doesn't seem like good practice, and may deserve  
looking into.

Cheers,


On 28/02/2009, at 6:35 AM, noah_mendelsohn@us.ibm.com wrote:

> Henri:
>
> I was going to write you a private reply, just telling you what a useful
> and well reasoned post I found this to be.  If nothing else, and without
> necessarily prejudging your conclusion about RDFa, I (and I suspect many
> other readers) learned a lot about the details of HTML/XML interoperation
> from this.  So, thank you!
>
> Then I read the following (I infer that the nested quote is in part from
> Julian, presumably in a part of the thread that did not make it to
> www-tag):
>
> Henri Sivonen writes:
>
>>> P.S., I realise that this involves at least three additional
>>> communities, but the TAG seems like the logical place for the
>>> initial discussion and eventual coordination of this issue.
>>
>> Since Steven already CCed two of those three and Julian forwarded
>> your email to the third, I've CCed all three in addition to the TAG
>> here.
>
> With my TAG chair hat on, this leads me to invite all concerned to think
> about whether the TAG should do more about this right now.  I will be glad
> to hear suggestions from anyone involved, including other TAG members, as
> to what if anything would be constructive for us to undertake more
> formally.  Otherwise, we'll continue to monitor this thread and contribute
> as individuals.
>
> Note that the TAG did previously involve ourselves in reviewing drafts of
> the CURIE specification, and our formal feedback to the working group was
> given at [1]. Since use of CURIEs is at the core of the current
> controversy, let me clarify one aspect of the TAG's note [1], the
> introduction of which reads:
>
> "First of all, we would like to make clear that we are overall supportive
> of the publication of this work, and we do not anticipate that dealing
> with any of the following concerns should greatly slow your progress."
>
> I haven't double-checked this with other members of the TAG, but I'm quite
> confident that, in saying that, the TAG did not intend to take a position
> pro or con as to whether CURIEs are on balance a good thing, or whether
> their use should or shouldn't be encouraged in particular circumstances
> such as with RDFa.  The above was intended specifically to signal, I
> think, that the TAG did not wish to impede the process of publishing the
> Recommendation documents being reviewed.
>
> Noah
>
> [1] http://lists.w3.org/Archives/Public/www-tag/2008Oct/0024.html
>
> --------------------------------------
> Noah Mendelsohn
> IBM Corporation
> One Rogers Street
> Cambridge, MA 02142
> 1-617-693-4036
> --------------------------------------
>
>
>
>
>
>
>        Henri Sivonen <hsivonen@iki.fi>
>        Sent by: www-tag-request@w3.org
>        02/27/2009 07:57 AM
>                 To: Julian Reschke <julian.reschke@gmx.de>, Mark
> Nottingham <mnot@mnot.net>
>                 cc: HTMLWG WG <public-html@w3.org>, "www-tag@w3.org  
> WG"
> <www-tag@w3.org>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>,
> public-xhtml2@w3.org, (bcc: Noah Mendelsohn/Cambridge/IBM)
>                 Subject: Re: @rel syntax in RDFa (relevant to ISSUE-60
> discussion), was: Using  XMLNS in link/@rel
>
>
> Mark Nottingham wrote:
>
>> Creative Commons just released a new spec:
>> http://wiki.creativecommons.org/Ccplus
>> that has markup in this form:
>> <a xmlns:cc="http://creativecommons.org/ns#"
>> rel="cc:morePermissions" href="#agreement">below</a>
>> (in HTML4, one assumes, since they don't specify XHTML, and this is
>> what the vast majority of users will presume).
>
> http://wiki.creativecommons.org/images/0/06/Ccplus-technical.pdf says
> "html". The syntax is not valid in any of HTML 2.0, HTML 3.2, HTML
> 4.0, HTML 4.01 or HTML5 as currently drafted.
>
>> However, it appears that they adopted this practice from RDFa;
>> http://www.w3.org/TR/rdfa-syntax/#relValues
>> which, in turn, *does* rely upon XHTML.
>
> Indeed, RDFa is not a REC over text/html.
>
>> However, XHTML does *not*
>> specify the @rel value as a QName (or CURIE, as RDFa assumes);
>>
> http://www.w3.org/TR/2008/REC-xhtml-modularization-20081008/abstraction.html#dt_LinkTypes
>>
>> "Note that in a future version of this specification, the Working
>> Group expects to evolve this type from a simple name to a Qualified
>> Name (QName)."
>
> In HTML5, as currently drafted, rel is a space character-separated
> list of tokens that are compared ASCII-case-insensitively. It is
> noteworthy that the tokens may look like URIs, although HTML5
> processing itself ascribes no URI semantics to tokens that look like
> URIs.
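
To make the token model concrete, here is a minimal Java sketch of rel handled as a space-separated list of tokens with ASCII-case-insensitive matching. The class and method names (RelTokens, parse, hasRel) are made up for illustration and the whitespace set is an assumption; this is not text from any spec.

```java
import java.util.ArrayList;
import java.util.List;

public class RelTokens {
    // Split a rel value on runs of whitespace and lowercase ASCII letters only.
    static List<String> parse(String rel) {
        List<String> tokens = new ArrayList<>();
        for (String raw : rel.split("[ \\t\\n\\f\\r]+")) {
            if (!raw.isEmpty()) {
                tokens.add(asciiLower(raw));
            }
        }
        return tokens;
    }

    // Lowercase A-Z only; other characters are left untouched.
    static String asciiLower(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            sb.append(c >= 'A' && c <= 'Z' ? (char) (c + ('a' - 'A')) : c);
        }
        return sb.toString();
    }

    // True if the rel value contains the wanted token, ignoring ASCII case.
    static boolean hasRel(String rel, String wanted) {
        return parse(rel).contains(asciiLower(wanted));
    }

    public static void main(String[] args) {
        // "cc:morePermissions" is one opaque token here; no prefix is expanded.
        System.out.println(parse("license cc:morePermissions"));
        System.out.println(hasRel("license cc:morePermissions", "CC:MOREPERMISSIONS"));
    }
}
```

Under this model a colon inside a token carries no meaning, so a processor that knows nothing about RDFa never consults any xmlns:cc declaration.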
>
>> So, that's an expectation, not a current specification.
>
> It's not a current or drafted specification for text/html, either.
>
> [...]
>> A few observations and questions;
>>
>> 1) I'm more than happy to specify in the Link that in XHTML, a link
>> rel value is indeed a QName, if XHTML chooses to take that position
>> (although I believe a URI is a better fit than a QName here, as in
>> most other places). Can we get a current reading from the XHTML world
>> on this?
>
> In XHTML5, as currently drafted, rel is a space character-separated
> list of tokens that are compared ASCII-case-insensitively. This
> matches current HTML 4.01 and XHTML 1.0 implementations.
>
>> 2) However, it seems like RDFa is jumping the gun by assuming @rel is
>> a CURIE right now. This is not promoting interoperability or shared
>> architecture, because no XHTML processor that isn't aware of RDFa can
>> properly identify these link relations.
>
> I agree.
>
>> My preference would be an
>> erratum to RDFa removing this syntax, replacing them with a self-
>> contained identifier (i.e. a URI). Thoughts?
>
> More generally, I think it would make sense to issue an erratum that
> replaces all CURIEs in RDFa with the corresponding full URIs, since
> this would both
>  1) Remove the reliance on attributes spelled "xmlns:foo" which are
> special in XML but not special in text/html (as text/html parsing is
> currently implemented out there and drafted in HTML5).
>  2) Avoid introducing a novel prefix-based indirection mechanism with
> many of the same problems that Namespaces in XML have been observed to
> have over the last decade.
>
> Examples of problems:
> http://lists.xml.org/archives/xml-dev/200502/msg00306.html
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6475032
> http://dev.ctor.org/soap4r/ticket/179
> http://sourceforge.net/tracker/?func=detail&atid=454391&aid=924041&group_id=48863
>
>> 3) CC's adoption of *proposed* XHTML conventions from RDFa into HTML4
>> via CURIEs further muddies the waters; xmlns has no meaning whatsoever
>> in HTML4, so they're promoting bad practice there by circumventing the
>> specified Profile mechanism. I find this aspect of this the most
>> concerning, and it needs clarification (more colourful words come to
>> mind, but I'll leave it there for now).
>
> I also find the use of xmlns:foo the most concerning aspect, not just
> because it has no special meaning in HTML 4.01 on the theoretical level
> but because of the practical repercussions for software architecture.
>
> I develop a text/html parser that implements the HTML5 parsing
> algorithm and targets five APIs for the application layer: JDK DOM
> Level 2, Java SAX2 in the namespace-aware mode, XOM, Web DOM (the one
> browsers expose via JS; targeted via Google Web Toolkit) and the
> internal content tree API of Gecko (nsINode/nsIContent; targeted via
> automated translation of the Java code into C++).
>
> These are all namespace-aware APIs. (Note that DOM Level 1 and the DOM
> Level 1-ish Python minidom aren't namespace-aware and they are the
> APIs typically used to demonstrate RDFa interop.)
>
> Gecko, WebKit and Presto use a namespace-aware DOM for both text/html
> and application/xhtml+xml. Thus, we can gain understanding of the
> implemented mapping of text/html into a namespace-aware representation
> from these implementations. Since attributes of the form xmlns:foo are
> not special in any way in HTML 4.01 (or 4.0, 3.2 or 2.0 for that
> matter), an attribute spelled "xmlns:foo" in text/html parses into
> ["", "xmlns:foo"] as the [namespace, local] pair. (Note that the local
> name is not an XML 1.0 + Namespaces NCName.) For compatibility with
> the behavior of these existing browsers, HTML5, as drafted, specifies
> that "xmlns:foo" in text/html parses into ["", "xmlns:foo"].
>
> Demo: http://hsivonen.iki.fi/test/moz/xmlns-dom.html
>
> DOM Level 2 XML, on the other hand, represents an attribute spelled
> "xmlns:foo" in application/xhtml+xml as
> ["http://www.w3.org/2000/xmlns/", "foo"].
>
> Demo: http://hsivonen.iki.fi/test/moz/xmlns-dom.xhtml
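
For readers without a browser at hand, the XML side of this can also be checked with the namespace-aware DOM parser bundled with the JDK. This is only a sketch: the class name XmlnsDomDemo and the example.org URI are made up, and it exercises an XML parser, not a text/html one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlnsDomDemo {
    // Parse XML with a namespace-aware DOM parser and return the
    // [namespace, local] pair of the attribute spelled "xmlns:foo".
    static String[] describe(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Attr attr = doc.getDocumentElement().getAttributeNode("xmlns:foo");
            return new String[] { attr.getNamespaceURI(), attr.getLocalName() };
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The namespace URI below is a placeholder chosen for this example.
        String[] pair = describe("<html xmlns:foo='http://example.org/foo#'/>");
        System.out.println("[" + pair[0] + ", " + pair[1] + "]");
    }
}
```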
>
> Furthermore, SAX2 in the namespace-aware mode and XOM do not represent
> what are spelled "xmlns:foo" in XML as attributes at all in the API.
> Instead, there's dedicated API surface for exposing namespace mappings
> to the application layer.
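
A small sketch of the SAX2 case, using the JDK's bundled parser in namespace-aware mode with its default feature settings. The class name XmlnsSaxDemo and the example.org URI are placeholders; the point is only that the declaration arrives through startPrefixMapping and is absent from the Attributes object.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class XmlnsSaxDemo {
    // Record prefix mappings and per-element attribute counts as seen by
    // a namespace-aware SAX2 handler.
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        @Override
                        public void startPrefixMapping(String prefix, String uri) {
                            log.add("mapping " + prefix + "=" + uri);
                        }

                        @Override
                        public void startElement(String uri, String local,
                                String qName, Attributes atts) {
                            log.add("element " + local + " attributes=" + atts.getLength());
                        }
                    });
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        // Only lang should be counted as an attribute of the element.
        System.out.println(events("<html xmlns:foo='http://example.org/foo#' lang='en'/>"));
    }
}
```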
>
> If we use the explicit mapping of DOM Level 3 to Infoset, the mapping
> of XML onto Infoset and the mappings from XML into XOM or namespace-
> aware SAX2, we have to conclude that when a DOM-oriented spec talks
> about an attribute in the "http://www.w3.org/2000/xmlns/" namespace,
> the concept maps to the namespace mapping API surface of SAX2 and XOM
> and, on the other hand, when an attribute is not in the
> "http://www.w3.org/2000/xmlns/" namespace according to a DOM-oriented
> spec, it doesn't map to the namespace mapping API surface of XOM and
> namespace-aware SAX2.
>
> The above paragraph is relevant, because the dominant design of text/
> html parsers for non-browser applications established by John Cowan's
> TagSoup and adopted by HTML5 parsers is that they expose an XML API so
> that the application-level code is written as if working with an XML
> parser parsing an equivalent XHTML 1.0 or XHTML5 file (for HTML 4.01
> and HTML5 respectively).
>
> This design of sharing above-parser application-level code between
> text/html and application/xhtml+xml is also in use in Gecko, WebKit
> and (based on black-box guess) Presto.
>
> The internal API of Gecko differs from the DOM slightly: The DOM has
> three datums: namespace URI, qname (aka. Level 1 node name) and local
> name. Gecko's internal API also has three datums but slightly
> differently: namespace URI, *prefix* and local name. None of these are
> string data types in Gecko. The namespace URI is interned into a 32-
> bit integer and prefix & local name are interned into a specific
> interned string type that cannot be used directly where string types
> can be used. It follows, that for any natively implemented feature, it
> would be highly undesirable to have to look 'inside' these values as
> strings as opposed to merely comparing pointers or integers.
>
> I'm not suggesting that any native implementation of RDFa-sensitive
> functionality is foreseeable in any Gecko-based browser. However, I am
> suggesting that language design that would be a bad match for
> established browser internals is architecturally unsound if there's
> the slightest chance that the language might one day be
> browser-sensitive.
>
> Going back to the design of exposing text/html as if it were XML: As I
> pointed out earlier, xmlns:foo in text/html parses, in existing
> browsers and in the HTML5 parsing algorithm as drafted today, into a
> [namespace, local] pair where the local part is not an NCName. This
> characteristic alone (i.e. without even considering the part that is
> spelled "xmlns") is enough to render the [namespace, local] pair
> unrepresentable in XML 1.0 + Namespaces.
>
> This poses the following problems:
>  1) A local name that is not an NCName cannot be serialized as XML
> 1.0 in such a way that parsing the resulting XML document with a
> namespace-aware parser round-trips the non-NCName local name properly.
>  2) Namespace-wise strictly correct XML tree implementations throw if
> you try to set an attribute that can't be serialized as XML 1.0 +
> Namespaces. (A demo that makes XOM throw is included below my
> signature.)
>  3) Even if the API contract of an XML API could be violated and a
> local name that is impossible in XML 1.0 + Namespaces could be passed
> through, this representation would be *different* from the way an XML
> parser would expose an attribute spelled "xmlns:foo" through the same
> API. Thus, the application-layer code would have to differ for text/
> html and application/xhtml+xml.
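
Problem 2 can also be sketched against the DOM implementation bundled with the JDK, which, as far as I can tell, likewise refuses an attribute named "xmlns:foo" that is not in the xmlns namespace. The class and method names here are made up for illustration, and other DOM implementations may behave differently.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

public class XmlnsCreateAttributeDemo {
    // Returns true if the DOM refuses to create a no-namespace attribute
    // whose qualified name is "xmlns:foo".
    static boolean rejectsNoNamespaceXmlns() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            try {
                // Mirrors the ["", "xmlns:foo"] pair produced by text/html parsing.
                doc.createAttributeNS(null, "xmlns:foo");
                return false;
            } catch (DOMException e) {
                return e.code == DOMException.NAMESPACE_ERR;
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + rejectsNoNamespaceXmlns());
    }
}
```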
>
> The options are thus:
>
>  1) Letting the application-layer code differ for text/html and
> application/xhtml+xml (provided that you can make the infrastructure
> not throw). This would violate the DOM Consistency design principle
> in HTML Design Principles. (For the general purpose of application-
> layer code reuse, "DOM" here should be understood to mean any API
> between the parser and application layers.) Experience with dealing
> with the lang vs. xml:lang issue should show that going down this road
> leads to divergent code paths in many places, which is bug-prone and
> bad software architecture.
>
>  2) Changing text/html parsing to parse "xmlns:foo" into
> ["http://www.w3.org/2000/xmlns/", "foo"]. This would be inconsistent
> with the behavior of existing Gecko, WebKit and Presto releases.
>
>  3) Changing RDFa not to use attributes spelled "xmlns:foo" in either
> text/html or application/xhtml+xml. (Failing to do this for
> application/xhtml+xml would still lead to the problem of different
> code paths in application-layer code.) This could be achieved with an
> erratum changing CURIEs to full URIs.
>
>  4) Not using RDFa in text/html at all.
>
> - -
>
> Due to the above considerations, I think that a vocabulary that uses
> attributes spelled "xmlns:foo" on (X)HTML elements is in architectural
> error.
>
>> P.S., I realise that this involves at least three additional
>> communities, but the TAG seems like the logical place for the initial
>> discussion and eventual coordination of this issue.
>
> Since Steven already CCed two of those three and Julian forwarded your
> email to the third, I've CCed all three in addition to the TAG here.
>
> -- 
> Henri Sivonen
> hsivonen@iki.fi
> http://hsivonen.iki.fi/
>
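> import nu.xom.Attribute;
> import nu.xom.Element;
>
> // Demo: XOM refuses an attribute whose name is spelled "xmlns:foo".
> public class XomTest {
>     public static void main(String[] args) {
>         Element elt = new Element("html", "http://www.w3.org/1999/xhtml");
>         // The next line throws: "xmlns:foo" cannot be an ordinary
>         // attribute in XML 1.0 + Namespaces.
>         elt.addAttribute(new Attribute("xmlns:foo", "bar"));
>     }
> }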
>
>
>
>


--
Mark Nottingham     http://www.mnot.net/
Received on Friday, 27 February 2009 23:10:26 UTC