RDFa vs Microdata, and the separation of data and presentation

I've been pondering RDFa and Microdata and what unifying them (or replacing
them with a single new thing) might mean, in parallel to thinking about JSON
graph-serialization.

The thing that bothers me about all the approaches to embedding ids and
predicates and scopes and types and such in HTML is that, well, they involve
embedding, or attempting to embed, machine-audience data structures inside
human-audience presentation structures. Why are we doing this at all? It's
terrible and we know better, and it isn't even helpful. We may very
reasonably want to know when a bit of presentation *corresponds* to a bit of
data, but I see no argument at all for why the entire structures need or
even want to be interleaved.

Here is what might be a much, much simpler and yet better idea:

1. Add to HTML5 a new global attribute called "data". This takes, as a
value, a space-separated list of absolute or relative IRIs, which identify
data objects represented by the contents of the HTML element so marked. The
exact semantics of "represented by" are human, not technical, but we could
provide many guiding examples.

2. Add to HTML5 a new element called "DATA". The contents of this are a
canonical JSON serialization of the data structure underlying the contents
of the page, presumably including (but not limited to) the objects referred
to by "data" attributes on elements in the BODY.
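
To make the two pieces concrete, here is a rough sketch in Python of how
a consumer might read such a page. Everything specific in it is an
assumption for illustration only: the example.org IRIs, the sample page,
the DataPageParser name, and above all the shape of the JSON (an object
keyed by IRI), which the proposal leaves undefined.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical page: one DATA element holding the graph, plus "data"
# attributes tying visible elements to objects in that graph.
PAGE = """<html><head>
<data>{"http://example.org/people/glenn": {"name": "Glenn"},
       "http://example.org/lists/web-data": {"title": "Web data list"}}</data>
</head><body>
<p data="/people/glenn">Glenn</p>
<span data="/lists/web-data /people/glenn">a post to the list</span>
</body></html>"""


class DataPageParser(HTMLParser):
    """Collects the DATA element's JSON and every element's "data" IRIs."""

    def __init__(self, base_iri):
        super().__init__()
        self.base_iri = base_iri
        self.in_data = False
        self.json_chunks = []
        self.links = []  # (tag name, [absolute IRIs]) in document order

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so DATA arrives as "data".
        if tag == "data":
            self.in_data = True
            return
        value = dict(attrs).get("data")
        if value:
            # Space-separated list; relative IRIs resolve against the page.
            iris = [urljoin(self.base_iri, iri) for iri in value.split()]
            self.links.append((tag, iris))

    def handle_endtag(self, tag):
        if tag == "data":
            self.in_data = False

    def handle_data(self, text):
        if self.in_data:
            self.json_chunks.append(text)

    def graph(self):
        return json.loads("".join(self.json_chunks))


parser = DataPageParser("http://example.org/page")
parser.feed(PAGE)
parser.close()

graph = parser.graph()
for tag, iris in parser.links:
    for iri in iris:
        print(tag, iri, graph.get(iri))
```

Note that the consumer never has to walk the presentation structure to
recover the data structure: the graph comes out of DATA whole, and the
"data" attributes are only lookups into it.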

Isn't this vastly simpler to understand, produce and consume than any of the
existing embedding schemes, to at least the same benefit?

The embedding part of this is now concerned *only* with associating the
visible content and its corresponding data, so we get from 3 embedding
schemes to 1.

But maybe even more importantly, by separating the data-structure from the
embedding we eliminate the need to have embedded encodings separate from the
regular non-embedded encodings. And we provide the most compelling possible
justification for using JSON as the canonical serialization (i.e., DATA is
effectively a SCRIPT block with an implicit jsonp callback). And we
eliminate the need for content negotiation in a vast number of cases,
because a machine agent can just take the DATA from the page. And we ensure
that people use IRIs for everything, because that's how it all works. And we
start to establish the expectation that a data-backed page *should* have its
data included.

And then the task of this mailing-list/group/whatever would become very
specific: provide the rules for how the DATA element is written. That is,
it's not just *a* JSON serialization, but *the* JSON serialization. In fact,
it's not just *a* web data-graph serialization, but basically *the* web
data-graph serialization.
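
As a hint of what "*the* serialization" could mean in practice, here is
a minimal Python sketch in which two equal graphs always produce the same
text. The specific rules (sorted keys, no optional whitespace, no ASCII
escaping) are my assumptions for illustration, not anything decided, and
canonical_json is a made-up name.

```python
import json


def canonical_json(graph):
    """Serialize deterministically under the assumed rules above."""
    return json.dumps(
        graph,
        sort_keys=True,          # key order no longer depends on the producer
        separators=(",", ":"),   # no optional whitespace
        ensure_ascii=False,      # keep IRIs and text as written
    )


# The same object, built with its properties in two different orders.
a = {"http://example.org/people/glenn": {"name": "Glenn", "list": "web-data"}}
b = {"http://example.org/people/glenn": {"list": "web-data", "name": "Glenn"}}

assert canonical_json(a) == canonical_json(b)
```

The real rules would have to cover far more than this (how references
between objects are written, how types and literals appear), which is
exactly the work described above.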

glenn

Received on Friday, 1 July 2011 17:50:37 UTC