The Complexity Argument (was: Re: Request to publish HTML+RDFa (draft 3) as FPWD)

Hi Jonas,

Thanks for the feedback. I've noted (on the RDFa wiki) that I should go
through your e-mail in detail and pick out the issues that should be
addressed via the spec while ensuring compatibility with XHTML+RDFa.

I'd like to address some of the higher-level points you made and not
focus on specific spec changes in this e-mail. My goal is to not start a
perma-thread discussing RDFa's acceptable level of complexity. I'd like
to express that there are similar concerns among all of us and that many
of your core beliefs are shared among the semantic web and RDFa community.

The "technology X is too complex for authors and therefore we shouldn't
deploy it" argument is one that each standards community deals with
constantly. Rather than attempt to refute that argument, I'd like to
explain why I don't think it's an argument that has a clear "winner".

I say this as someone who has consistently argued for simplicity over
complexity when possible, both in the Microformats community and again
in the RDFa community. If there is one thing that Microformats, RDFa and
Microdata have in common, it's the desire to make the markup as simple
as possible for authors while not sacrificing use cases that are
important to each community. There are many talented people working on
this semantics problem and "acceptable complexity" is a very hard thing
to nail down when you're talking about web authors. It really boils down
to the use cases that you're attempting to solve and RDFa is attempting
to solve a much larger set of use cases than Microformats or Microdata.

Whether or not those use cases are worth solving is debatable and is
something that we have spent, and continue to spend, a great deal of
time on.

RDFa is more complex than Microformats and Microdata. It is more complex
because the set of use cases is more complex. Follow-your-nose,
vocabulary validation, data typing, and inferencing are just a few of
the design goals for RDFa, based on the requirements in the use cases.

It's fairly clear that RDFa is more complex than Microformats and
Microdata, and I would say that is true because it solves a larger set
of problems. What is not clear is whether it is too complex for the
majority of its adopters. No amount of discussion is going to discover
if this statement is true or not... we'll just have to wait and see. If
it is true, RDFa will fail in the marketplace. If it is not true, then
we're good.

To look at this another way, one could claim that HTML5, Javascript,
canvas, or SVG is too complex for regular web authors. Those
technologies are certainly far more complex than RDFa, yet we see
widespread use of each of those technologies on the web.

If you think RDFa is too complex, please propose an alternative or
propose alternatives to the way RDFa works that are backwards compatible
with XHTML+RDFa 1.0. We have some fairly large field deployments of RDFa
and the number is growing, not shrinking. We need to be very careful to
support these early adopters while making things simpler, if possible.

Some of these improvements that we're currently discussing will simplify
the markup a bit, but they make the implementations a bit more challenging:

* The ability to use URLs in CURIEs, which would allow people to not use
prefixes if they don't want to. For example:

<a rel=""

* The ability to create vocabulary bundles and also extend the default
set of reserved words, so authors could do the following:

<a rel="likes" href="">Wuf</a>

Unfortunately, those changes are going to take some time to gain
consensus and traction in the community. Some of the features may never
make it into RDFa. It's clear that there will be an RDFa 1.1 and in that
we're going to unify the RDFa in XHTML and RDFa in HTML documents. So,
we do recognize that there is room for improvement and if you'd like to
help make those improvements, please do take part in the RDFa community
to make those changes.
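
As a rough illustration of the two proposals listed above (full URLs in
place of CURIEs, and an extended set of reserved words), the lookup order
could be sketched as follows. This is only a sketch under stated
assumptions, not the RDFa processing rules: the function name
resolve_term, the example.org URI for "likes", and both mapping tables
are hypothetical and chosen purely for illustration.

```python
from typing import Dict, Optional

# Illustrative mappings only; neither table is taken from an RDFa specification.
PREFIXES: Dict[str, str] = {"foaf": "http://xmlns.com/foaf/0.1/"}
RESERVED_WORDS: Dict[str, str] = {"likes": "http://example.org/vocab#likes"}


def resolve_term(value: str,
                 prefixes: Dict[str, str] = PREFIXES,
                 reserved: Dict[str, str] = RESERVED_WORDS) -> Optional[str]:
    """Map a rel/property value to a full URI, or None if it cannot be resolved."""
    # 1. Reserved word (hypothetical extended set), e.g. rel="likes".
    if value in reserved:
        return reserved[value]

    prefix, sep, reference = value.partition(":")
    if not sep:
        return None  # a bare word that is not reserved

    # 2. Ordinary CURIE with a declared prefix, e.g. rel="foaf:knows".
    if prefix in prefixes:
        return prefixes[prefix] + reference

    # 3. Assumed behaviour for the proposal: an undeclared "prefix" followed
    #    by "//" is treated as a full URL and passed through unchanged.
    if reference.startswith("//"):
        return value

    return None
```

With these tables, resolve_term("foaf:knows") yields
"http://xmlns.com/foaf/0.1/knows", a full URL is returned unchanged, and
an undeclared prefix such as "ex:thing" resolves to None. The extra
branches are the implementation cost mentioned above: the markup gets
simpler, the resolver gets more cases.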

Jonas Sicking wrote:
> For example the CURIE mechanism means that there's a big risk that if
> part of a document is copied, it risks losing its meaning since
> prefixes might no longer be declared. It could potentially even change
> its meaning if the same prefixes are declared, but declared to a
> different value, though this seems less likely to happen.

We spent a considerable amount of time discussing cut-paste scenarios
and did as much as we could to prevent triple generation when RDFa
markup wasn't clearly specified. For example, erroneous triples are not
generated if there isn't a clear subject, predicate and object. While
this can lead to semantic data loss, it's the trade-off we made to
enable (arguably) more accurate markup.
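
The rule described above (no triple unless subject, predicate and object
are all clear) can be pictured with a minimal sketch. The names
complete_triples and Candidate are hypothetical, and a real RDFa
processor determines these values through its own processing sequence
rather than from a prepared list.

```python
from typing import Iterable, List, Optional, Tuple

# A candidate statement in which any of the three parts may be missing.
Candidate = Tuple[Optional[str], Optional[str], Optional[str]]
Triple = Tuple[str, str, str]


def complete_triples(candidates: Iterable[Candidate]) -> List[Triple]:
    """Keep only candidates whose subject, predicate and object are all present."""
    triples: List[Triple] = []
    for subject, predicate, obj in candidates:
        # A missing (None) or empty part means the statement is dropped
        # instead of being guessed at; this is the accepted data loss.
        if subject and predicate and obj:
            triples.append((subject, predicate, obj))
    return triples
```

A pasted fragment whose predicate could not be resolved would show up
here as a candidate with predicate None and would simply produce no
triple.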

RDFa, by design, also allows anybody to pre-load a list of known prefix
mappings in their application and create a separate graph of triples
with those known prefix mappings. So, if cut-paste resulting in
undefined prefix mappings is an issue for a certain class of user agents,
there is a mechanism that allows them to pre-define prefix mappings so
that fewer triples are lost to cut-paste errors in major vocabularies.
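
A minimal sketch of that mechanism, assuming an application that ships
its own table of well-known prefixes: statements whose prefix is declared
in the document go into the default graph, while statements that only
resolve through the pre-loaded table go into a separate graph. The names
KNOWN_PREFIXES, expand and build_graphs are hypothetical, and the choice
of which vocabularies to pre-load is the application's, not the spec's.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Application-supplied mappings for major vocabularies (an assumption of
# this sketch; they are not declared by the document being parsed).
KNOWN_PREFIXES: Dict[str, str] = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def expand(curie: str, mappings: Dict[str, str]) -> Optional[str]:
    """Expand prefix:reference using mappings, or return None if unknown."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in mappings:
        return None
    return mappings[prefix] + reference


def build_graphs(statements: Iterable[Triple],
                 declared: Dict[str, str],
                 known: Dict[str, str] = KNOWN_PREFIXES
                 ) -> Tuple[List[Triple], List[Triple]]:
    """Return (default_graph, recovered_graph) for (subject, curie, object) input."""
    default_graph: List[Triple] = []
    recovered_graph: List[Triple] = []
    for subject, curie, obj in statements:
        predicate = expand(curie, declared)
        if predicate is not None:
            default_graph.append((subject, predicate, obj))
            continue
        # The prefix was not declared in the document (for example, it was
        # lost in a cut-paste). Fall back to the pre-loaded table, but keep
        # the result apart from the default graph.
        predicate = expand(curie, known)
        if predicate is not None:
            recovered_graph.append((subject, predicate, obj))
    return default_graph, recovered_graph
```

Keeping the recovered triples in their own graph lets a consumer decide
how much to trust them, since the document itself never declared those
prefixes.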

> The same thing could potentially even happen if a Node is moved 
> around in the DOM.

This is true for any technology that depends on the structure of the
HTML DOM. Microformats, Microdata and RDFa are susceptible to this
issue... but so is everything else that depends on the structure of the
DOM.

> Another problem is the flexibility of RDFa to express arbitrary graphs
> of data. While this might not be a huge problem for someone authoring
> a document since you can simply choose not to express complex graphs
> someone reading a document with RDFa risks being exposed to it.

Complex markup is... complex. :)

Somebody reading the HTML5 spec source is exposed to 4MB of HTML/CSS
markup. Somebody looking at Gmail's Javascript/HTML source code, or
Facebook's page source, is exposed to a tremendous amount of
complexity. That doesn't make the technology any less useful to society,
but complexity does make it harder to understand a subset of products of
that technology.

There are trade-offs in going simpler; if there were no trade-offs, we'd
have made those changes a long time ago. So, if there is a
simplification that you can see that doesn't result in a big tradeoff
and is backwards-compatible with most of XHTML+RDFa 1.0, we'd be glad to
go that route if it results in a simpler system.

To do that, however, we need a concrete proposal.

> It's not just the ability to create arbitrary graphs that seems
> worrisome, but features like incomplete triplets, that I'm concerned
> will make it hard for authors to understand other authors' documents.

I wish I could tell you that I know whether or not RDFa is going to do
what you claim it is going to do. I don't think it will, and neither does
most of the RDFa community, but we can't predict the future with that
degree of accuracy.

> While processing using DOM Level 1 methods seems doable, it is the
> only specification that I know of where that is done. I would imagine
> that this will cause problems. For example a javascript implementation
> couldn't use the lookupNamespacePrefix function defined in DOM Level 3
> and implemented in most commonly used browsers.

Could you elaborate more on this point? Specifically, are you saying
that most Javascript implementations wouldn't be able to implement RDFa?
Or are you saying technologies that only provide DOM Level 3 wouldn't be
able to implement RDFa? Or are you saying something else?

> Again, I realize that some of these problems might not be possible to
> address, but I wanted to let you know my concerns early in the
> process.

Thank you for taking the time and posting your concerns to the list,
Jonas. Thank you especially for doing so while keeping a sense of
decorum and mutual respect that I hope will permeate discussions around
this WD as it moves through the HTML WG process.

-- manu

Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: The Pirate Bay and Building an Equitable Culture

Received on Friday, 18 September 2009 21:19:57 UTC