A Response to Comments from Ben Adida on 2006-08-01 (public-rdf-in-xhtml-tf@w3.org from August 2006)

From: Ben Adida <ben@mit.edu>
Date: Tue, 01 Aug 2006 09:06:58 -0400
To: Bjoern Hoehrmann <derhoermi@gmx.net>
CC: public-rdf-in-xhtml-tf@w3.org, w3c-html-wg@w3.org, public-swbp-wg@w3.org
Message-ID: <44CF51F2.4070902@mit.edu>
Bjoern,

A while ago (18 months or so) [1], you sent an email with concerns
regarding RDFa (previously written RDF/A). Though there have been some
responses to your concerns in the past [2], we thought it would be
useful to completely review the issues you brought up now that the XHTML
and RDFa documents are maturing. RDFa has greatly evolved as a result of
your comments and many others. I hope that this evolution, as described
in the comments below, addresses your concerns.

> Dear HTML Working Group,
> 
> I've read http://www.formsplayer.com/notes/rdf-a.html#div158715232 
> and the motivation of this work is still not clear to me.

This work began in the HTML Working Group in late 2003, starting with a
note by Mark Birbeck [4]. In 2004, Mark and Steven presented a paper on
this topic [5]. Around the same time (early 2004), the W3C tech plenary
in Cannes included a joint session between the then-new SWBPD WG and the
HTML WG, where Mark presented his proposal and Dan Connolly
presented GRDDL [6].

As a result of this meeting, and as a result of new customer demand (in
particular from Creative Commons, which I represent), the RDF-in-HTML
task force was revived in late 2004 with updated goals and requirements
[7,8].

> The section states the following:
> 
> [...] RDF/XML [58][RDF-SYNTAX] provides sufficient flexibility to
> represent all of the abstract concepts in RDF [59][RDF-CONCEPTS].
> However, it presents two challenges; first it is difficult or
> impossible to validate documents that contain RDF/XML using XML
> Schemas or DTD's, which makes it difficult to import RDF/XML into
> other markup languages. [...]
> 
> "RDF/A Syntax" does not work with DTDs either as it requires to have
> XML Namespace declarations. For XML Schema, I do not really see any
> problem that would be relevant.

The point is that there is no W3C technology that allows XHTML with
RDF/XML to be validated. Although XHTML Modularization includes
techniques to make fixed modules of elements from other namespaces
validatable with DTDs, the statement you quote indicates the
completeness of the problem: there is not a single way to include
RDF/XML inside XHTML1, not via XML Schema, not via DTDs.

Note that, even if this were possible, it would fail the requirements
cited in the above motivations, namely the idea that a clickable link
can be made into an RDF statement without fully duplicating the
underlying data.

> Validating "RDF/A Syntax" using XML Schema only works in so far as
> that the artifical containers (meta, link, etc.) can be validated,
> their content however can't. I thus fail to see how RDF/A is
> considered an improvement in this regard.

It is not our goal to do semantic validation via XML Schema. XML Schema
provides *syntactic* validation, and we expect that OWL will provide the
semantic validation for the RDFa triples. The same approach is taken
with RDF/XML and N3: semantic validation is left to OWL.

> [...] The second challenge is that the syntax of RDF/XML is too
> unwieldy for use as a mechanism for adding metadata to a document
> about the document itself. [...]
> 
> That might be true, but the RDF/A syntax is much more complicated to
> me than using the widely understood RDF/XML syntax. There is no
> obvious way in which RDF/A can be used without understanding a whole
> lot of specs.

This is largely a matter of taste. We have aimed RDFa at HTML authors,
and we expect that they will find the RDFa syntax much clearer than RDF/XML.

Certainly, your comment at the time made sense, but we believe the
complexity has been addressed.

> [...] The resolution is usually to 'hard-wire' attributes directly
> into the XML language, to represent specific concepts. For example,
> in XHTML 1.1 and HTML there is a cite attribute. The attribute allows
> an author to add information to a document to indicate the origin of
> a quote. The following example comes from [HTML], although it has
> been reformatted as XHTML [XHTML]:
> 
> [...]
> 
> The problem here is that we have had to add a specific attribute to 
> designate citation, and further, both the browser and some metadata 
> processor need to have knowledge of this attribute, and its position 
> within the mark-up. [...]
> 
> Here I fail again to see how this problem is solved by RDF/A, in
> order to do something meaningful with the metadata the browser would
> still need to have knowledge about whatever replaces the cite
> attribute, the same goes for the "metadata processor" if it does more
> than generating "triples" from the document. The RDF/A proposal in
> fact makes things much worse as there is no obvious way for authors
> to encode specific information and no obvious way for implementers to
> implement something meaningful.

It seems you are worried mostly about RDF, not so much about its
serialization as RDFa. The goal of RDFa is to enable the extraction of
triples from an HTML document with a generic parser. How those triples
are interpreted by the client application is not within the scope of
RDFa, nor any other RDF serialization, nor even RDF itself. The goal is
to have a framework for expressing the metadata. What a client
application chooses to do with this metadata is not meant to be
specified here.

Furthermore, the RDFa syntax means that you can extend the metadata
properties expressed in an XHTML document without changing the schema.
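
To make the idea of a generic parser concrete, here is a minimal sketch
in Python. It is an illustration only, not the RDFa processing rules:
the class name LinkTripleExtractor, the base argument, and the decision
to handle only rel/href pairs are all assumptions made for brevity. The
point it tries to show is that the extractor carries no list of known
properties; it passes through whatever predicate the author wrote.

```python
from html.parser import HTMLParser


class LinkTripleExtractor(HTMLParser):
    """Hypothetical sketch: collect (subject, predicate, object) from rel/href.

    This is not the RDFa processing model. It only shows that the parser
    needs no built-in knowledge of any particular vocabulary.
    """

    def __init__(self, base):
        super().__init__()
        self.base = base    # subject used for every statement (assumption)
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = attributes.get("rel")
        href = attributes.get("href")
        if rel and href:
            # The predicate is copied through as written; no schema lookup.
            self.triples.append((self.base, rel, href))


def extract(markup, base):
    """Return the triples found in markup, using base as the subject."""
    parser = LinkTripleExtractor(base)
    parser.feed(markup)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    sample = (
        '<p>Licensed under a '
        '<a rel="cc:license" '
        'href="http://creativecommons.org/licenses/by-sa/2.0/">CC license</a>. '
        'See also <a rel="ex:anything" href="http://example.org/x">this</a>.</p>'
    )
    for triple in extract(sample, "http://example.org/page"):
        print(triple)
```

A property from a vocabulary invented tomorrow (ex:anything in the
sample) is extracted exactly like cc:license, with no change to the
parser or to the document's schema.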

> It further reduces the author's chances to make use of the meta data 
> himself e.g. by using sXBL, XSLT, scripting or CSS for the specific
> meta data.

Let's separate those issues. You can certainly process the triples and
convert them with XSLT (using GRDDL). Again, though, if you want to do
something meaningful, you have to dig into the semantics. If you want an
extensible metadata language, you have to assume that the parser cannot
know ahead of time every possible statement that might be made.

With regard to the second issue, CSS, see the comments below.

> With
>
>   <blockquote cite="...">...
>
> it is easy to write a style sheet like
>
>   @media print
>   {
>     blockquote[cite]:after { content: "(Source: " attr(cite) ")" }
>   }
>
> With
>
>   <blockquote xmlns:dc="...">
>     <link rel="dc:source" href="..." />
>     ...
>
> that's pretty much impossible. CSS does not provide means to match on
> QName attribute values (which would be required to select the relevant
> <link> element) and at least with CSS < 3.0 it is neither possible to
> place the value of the href attribute after the end of blockquote.
> With CSS 3.0 this might become possible,
>
>   @media print
>   {
>    blockquote > link /* ... */ { string-set: source attr(href) }
>    blockquote[cite]::after { content: "(Source: " string(source) ")" }
>   }
>
> (If this works at all, if string(source) evaluates to an empty string
> if there is no link child element and the content property value thus
> does not fail it would render a broken "(Source: )" which is
> undesirable).
> 
> Inventing some quick ad-hoc syntax to match QName attribute values, a
>  new pseudo-class :qname-attr(name, namespace-prefix, value), this
> might then become
>
>   @namespace dc url( http://purl.org/dc/elements/1.1/ );
>   @media print
>   {
>     blockquote > link[href]:qname-attr(rel, dc, source)
>     {
>       string-set: source attr(href)
>     }
>     blockquote::after { content: "(Source: " string(source) ")" }
>   }

You can still use <cite> in XHTML2. Thus, if you don't care about RDF
and want to style a <cite>, you can use the same XHTML and CSS as before.

CSS may eventually develop the ability to style HTML elements according
to the RDF triples they express. Whether CSS chooses to make the styling
of RDFa-qualified elements part of its scope is, of course, not part of
our scope.

> And the style sheet would not work for other documents that use 
> something different than dc:source for the same purpose. This is 
> obviously much more complicated than the alternate solution which is
> very odd for something set out to simplify things.

Again, whether and how CSS chooses to style RDFa-qualified elements is
not part of our scope.

I want to stress that our goal isn't to simplify stylesheets; it is to
enable, as simply as possible, the expression of RDF in HTML.

Once again, to summarize:

1) if you don't care about embedding semantic metadata, don't use RDFa
and you can go on using CSS in any way you like

2) if you *do* care about embedding semantic metadata and want to use
CSS, you still can, of course. CSS may want to one day undertake
selectability by RDF, but that's outside the scope of this discussion.

Nothing in RDFa creates a new problem here. It's only opening the door
to RDF in HTML.

> [...] Whilst this approach gives unlimited flexibility, it has not
> been widely adopted outside the RDF community. It is certainly more 
> difficult for an SVG or HTML author to learn, and so their documents 
> tend to contain little 'extra' metadata. [...]
> 
> This is obviously wrong. First, HTML does not allow to embed RDF/XML,
>  neither does any XHTML Recommendation so far, so authors cannot use 
> the alternate syntax which means the complexity of the approach does 
> not hinder its adoption, it unavailability does.

This is dependent on whether you see the world through standards-colored
glasses or through practice-colored ones. In practice, people have been
including RDF in all sorts of strange ways, none of which is truly
satisfying. The need is there. It is being addressed in non-standard,
ugly ways. We are trying to address it with a real standard, in a
reasonably pretty way.

> Second, the vast majority of HTML authors is not looking for funny 
> ways to bloat their documents with "'extra' metadata". Adding such
> data required additional effort for which there need to be good
> reasons to convince them to do it. HTML authors don't add this meta
> data because there are no such good reasons as the user agents
> relevant to them (browsers and search engines) do not make use of
> such data. I do not understand how RDF/A Syntax is going to change
> that, for the blockquote cite case it seems pretty obvious that RDF/A
> further reduces the chances of meaningful implementations.

If there is no facility, no browser or search engine will implement it,
and so, by definition, no one will use it. By providing this facility,
we can break this vicious circle, giving browsers and search engines
something to implement, and therefore giving authors a reason to use it.

The use cases for RDFa are clear, and the popular demand is also quite
clear. As an example, one should note that there are, to date, more than
50,000,000 HTML pages that are licensed under a Creative Commons
license. Even though Creative Commons attempts to push the envelope in
terms of standards support (XHTML 1.1 and RDF since 2002) there is still
no standards-based way of expressing this licensing act *within* the
HTML document itself - a critical CC requirement. RDFa will allow them
to do so with a single additional attribute in the HTML. That's simple,
efficient, and very much in demand.

Add to that the 30,000,000 blogs tracked by Technorati and the ways in
which simple RDF metadata extensions including Dublin Core properties,
trackback, and FOAF could help interpret and process this data in new
and incredibly powerful ways. This community is currently using
Microformats, which clearly shows that they are interested in expressing
structured metadata. RDFa will let that drive be transformed into truly
interoperable, semantic statements.

The need to provide semantic meaning to existing, rendered HTML is not
artificial; it is very real. RDFa accomplishes this in a manner which we
think is natural to people who are familiar with HTML markup. As further
confirmation of this, Mike Shaver of Mozilla recently expressed strong
interest in finding ways to support RDFa in Firefox.

> It is further noteworthy that for authors already aware of RDF it is
> certainly more difficult to learn RDF/A than to use RDF/XML directly,
> and in case of SVG, the SVG community has in fact lots of experience
> with using embedded RDF/XML in SVG. It seems in fact obvious that the
> SVG approach is much easier to use,
>
>   <rdf:Description about=" http://example.org/myfoo "
>     dc:title       = "MyFoo Financial Report"
>     dc:description = "$three $bar $thousands $dollars $from 1998 ..."
>     dc:publisher   = "Example Organization"
>     dc:date         = "2000-04-11"
>     dc:format       = "image/svg+xml"
>     dc:language     = "en"
>   >
>     <dc:creator>
>     <rdf:Bag>
>       <rdf:li>Irving Bird</rdf:li>
>       <rdf:li>Mary Lambert</rdf:li>
>     </rdf:Bag>
>     </dc:creator>
>
>   </rdf:Description>
>
> versus probably something like
>
>   <head>
>     <meta about=" http://example.org/myfoo " property="dc:title"
>           content = "MyFoo Financial Report" />
>     <meta about=" http://example.org/myfoo " property="dc:description"
>           content = "$three $bar $thousands $dollars $from 1998 ..." />
>     <meta about=" http://example.org/myfoo " property="dc:publisher"
>           content = "Example Organization" />
>     <meta about=" http://example.org/myfoo " property="dc:date"
>           content = "2000-04-11" />
>     <meta about=" http://example.org/myfoo " property="dc:format"
>           content = "image/svg+xml" />
>     <meta about=" http://example.org/myfoo " property="dc:language"
>           content = "en" />
>
>     <meta about=" http://example.org/myfoo " property="dc:creator">
>       <link rel = "rdf:Bag">
>         <meta property="rdf:li" content = "Irving Bird" />
>         <meta property="rdf:li" content = "Mary Lambert" />
>       </link>
>     </meta>

The above example is probably written with a much earlier version of
RDFa in mind, although, even then, it forgoes all shorthand, which
obviously makes it look much more complicated than it needs to be.
Here's how we would write it now:

===========
<div about="http://example.org/myfoo">
    <meta property="dc:title">MyFoo Financial Report</meta>
    <meta property="dc:description">$three $bar $thousands $dollars $from 1998 ...</meta>
    <meta property="dc:publisher">Example Organization</meta>
    <meta property="dc:date">2000-04-11</meta>
    <meta property="dc:format">image/svg+xml</meta>
    <meta property="dc:language">en</meta>

    <link rel="dc:creator" href="#authors" />
    <ul id="authors">
        <li>Irving Bird</li>
        <li>Mary Lambert</li>
    </ul>
</div>
===========

As you can see, the structure is almost identical to the RDF/XML, with a
few more markup characters but no data duplication.
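
For readers who want to see how the about and property attributes in
the example above could be read by a program, here is a rough Python
sketch. It is not the RDFa processing model: the class and function
names are hypothetical, it assumes well-formed XHTML in which every
element is closed, it assumes property elements contain only text, and
it simply skips property elements that have no about in scope.

```python
from html.parser import HTMLParser


class ScopedLiteralExtractor(HTMLParser):
    """Hypothetical sketch: 'about' sets the subject for nested elements;
    an element with 'property' contributes its text as a literal."""

    def __init__(self):
        super().__init__()
        self.scope = []       # subject in effect for each open element
        self.pending = None   # (subject, predicate, text chunks) being read
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        inherited = self.scope[-1] if self.scope else None
        subject = attributes.get("about", inherited)
        self.scope.append(subject)
        prop = attributes.get("property")
        if prop and subject is not None:
            self.pending = (subject, prop, [])

    def handle_data(self, data):
        if self.pending is not None:
            self.pending[2].append(data)

    def handle_endtag(self, tag):
        if self.scope:
            self.scope.pop()
        if self.pending is not None:
            subject, prop, chunks = self.pending
            # Collapse runs of whitespace so wrapped text becomes one literal.
            literal = " ".join("".join(chunks).split())
            self.triples.append((subject, prop, literal))
            self.pending = None


def extract_literals(markup):
    """Return (subject, predicate, literal) tuples found in markup."""
    parser = ScopedLiteralExtractor()
    parser.feed(markup)
    parser.close()
    return parser.triples


if __name__ == "__main__":
    sample = (
        '<div about="http://example.org/myfoo">'
        '<meta property="dc:title">MyFoo Financial Report</meta>'
        '<meta property="dc:date">2000-04-11</meta>'
        '</div>'
    )
    for triple in extract_literals(sample):
        print(triple)
```

The subject is written once, on the enclosing div, and each literal
appears once, as element content, which is the "no data duplication"
property described above.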

However, your point that authors familiar with RDF/XML may prefer the
SVG mechanism is certainly well-taken.  The history of the HTML language
is very different from the history of the SVG language and we've had to
accommodate that history.

Remember also that the goal of RDFa is not to make RDF metadata
inclusion in HTML simpler, it's to make it *possible* in the first
place. That said, we've also made it quite simple, given additional
constraints such as the idea that visible, rendered HTML can also
serve the role of a literal.

Consider the following snippet of HTML:

=========
This web page was authored by
<a rel="dc:author" href="http://ben.adida.net">Ben Adida</a>
and is licensed under a
<a rel="cc:license"
href="http://creativecommons.org/licenses/by-sa/2.0/">Creative Commons
License</a>.
=========

This is clearly recognizable HTML. Even the REL attribute should be
familiar to HTML authors, as it has existed since HTML 4. You get both a
clickable, perfectly-rendered HTML page as well as two fully correct RDF
statements about the page, without ever having to maintain duplicate
versions of the data. It's difficult to imagine a simpler way to express
this wealth of information so succinctly and clearly.
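
To spell out the two statements, here is a small Python sketch that
expands the prefixed names and prints the result as N-Triples. Several
things in it are assumptions rather than part of the snippet: the
snippet does not show its xmlns declarations, so the prefix bindings
below are guesses (the dc binding is the Dublin Core elements namespace,
the cc binding is assumed), the page URI is a placeholder, and the
helper names are hypothetical.

```python
# Prefix bindings are assumptions for illustration; the snippet in the
# message does not show its xmlns declarations.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "cc": "http://web.resource.org/cc/",
}


def expand(qname, prefixes):
    """Turn 'prefix:local' into a full URI using the given bindings."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"no binding for prefix in {qname!r}")
    return prefixes[prefix] + local


def to_ntriples(subject, statements, prefixes):
    """Format (predicate, object URI) pairs as N-Triples lines."""
    return [
        f"<{subject}> <{expand(predicate, prefixes)}> <{obj}> ."
        for predicate, obj in statements
    ]


if __name__ == "__main__":
    page = "http://example.org/page"  # hypothetical URI of the page itself
    statements = [
        ("dc:author", "http://ben.adida.net"),
        ("cc:license", "http://creativecommons.org/licenses/by-sa/2.0/"),
    ]
    for line in to_ntriples(page, statements, PREFIXES):
        print(line)
```

Each anchor element yields one statement whose subject is the page,
whose predicate comes from rel, and whose object comes from href.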

> I thus do not buy the "unwieldy" cited in the motivation. Even if it
> were more like
>
>   <head>
>     <meta property="dc:title"
>           content = "MyFoo Financial Report" />
>     <meta property="dc:description"
>           content = "$three $bar $thousands $dollars $from 1998 ..." />
>     <meta property="dc:publisher"
>           content = "Example Organization" />
>     <meta property="dc:date"
>           content = "2000-04-11" />
>     <meta property="dc:format"
>           content = "image/svg+xml" />
>     <meta property="dc:language"
>           content = "en" />
>
>     <meta property="dc:creator">
>       <link rel = "rdf:Bag">
>         <meta property="rdf:li" content = "Irving Bird" />
>         <meta property="rdf:li" content = "Mary Lambert" />
>       </link>
>     </meta>
>
> It should still be obvious that this syntax is much worse than
> RDF/XML, the entire idea of encoding everything in attribute values
> is flawed, I've already filed that as comments
> 
>   http://www.w3.org/mid/418eabe3.466861160@smtp.bjoern.hoehrmann.de
>   http://www.w3.org/mid/4184c275.318109327@smtp.bjoern.hoehrmann.de

One can debate whether the syntax is worse or better than RDF/XML. The
point remains that there is significant demand for direct HTML embedding
of semantic metadata with reuse of rendered data. RDF/XML does not
fulfill this need.

> on the latest XHTML 2.0 Working Draft. Thus, please reject the comment
>
>   http://www.w3.org/mid/BE26E3E2-2C2B-11D9-837B-0003939247DC@mit.edu
>
> made by the Semantic Web Best Practices and Deployment Working Group
> and substantively re-work the meta data mechanisms in the latest
> XHTML 2.0 Working Draft to meet the needs of the HTML authoring
> community.

Many of your comments have been implicitly addressed with subsequent
revisions of RDFa. We invite you to review the latest RDFa working draft
and reevaluate your comments in light of those developments.

The needs of the web authoring community at large, both presentation and
semantic, include our use cases [9]. We believe RDFa, especially in its
latest form [10], fits those needs very well without hindering HTML
authors who wish to ignore these new developments.

-Ben Adida
Chair, RDF-in-XHTML Task Force

[1] http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2004Nov/0007
[2] http://lists.w3.org/Archives/Member/w3c-html-wg/2005AprJun/0067
[3] http://lists.w3.org/Archives/Public/public-qa-dev/2006Jul/0011
[4] http://www.formsplayer.com/notes/xhtml-meta-data-02.html
[5] http://www.idealliance.org/papers/dx_xmle04/papers/04-04-02/04-04-02.html
[6] http://www.w3.org/TeamSubmission/2005/SUBM-grddl-20050516/
[7] http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2004Sep/0017
[8] http://www.w3.org/2001/sw/BestPractices/HTML/2004-10-12-tf.html
[9] http://www.w3.org/2001/sw/BestPractices/HTML/2004-10-12-tf.html
[10] http://www.w3.org/TR/xhtml-rdfa-primer/
Received on Tuesday, 1 August 2006 13:07:30 UTC