Re: GRDDL profile for RDF-A

Keith,

I'm not opposed to having a profile for RDFa, but it's important to
point out that we already have a proposed way of flagging RDFa content:
change your HTML version to XHTML1.1+RDFa. (And then your doc even
validates, which is important to a number of folks.) Even GRDDL works
here since we can specify the transform in the XHTML1.1+RDFa schema doc
(once we eventually specify this using XML Schema).

But let's talk practice here, since that's part of what you're worried
about (specifically, the efficiency of checking for the presence of RDFa). If
you build software that assumes some RDFa header flag is always there
when RDFa is present in the document, then you're going to lose big time.

The main argument is simple: we now live in a world of mashups and
widgets. There are now third-party applications that run inside
Facebook's very own HTML page. Chances are, some widgets will include
RDFa, even if the containing page does not flag the presence of RDFa. If
you want to find the structured data in the page, you're going to have
to try the RDFa parser and see what comes out. I can't imagine that
you'll get anything useful out of the structured-data web if you don't
do this.
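
To make that concrete, here is a minimal sketch in Python (standard
library only) of checking a page by actually scanning it, rather than by
trusting a flag in the header. It is not an RDFa parser: the attribute
list, the names RDFaSniffer and sniff_rdfa, and the example.org page are
all assumptions made up for illustration.

```python
from html.parser import HTMLParser

# Attributes treated as RDFa-specific for this sketch. This list is an
# assumption; a real RDFa parser follows the full processing rules.
RDFA_ATTRS = {"about", "property", "resource", "datatype", "typeof"}


class RDFaSniffer(HTMLParser):
    """Records every element that carries an RDFa-looking attribute."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        found = sorted(name for name, _ in attrs if name in RDFA_ATTRS)
        if found:
            self.hits.append((tag, found))


def sniff_rdfa(html_text):
    """Return (tag, attribute names) pairs for RDFa-looking elements."""
    sniffer = RDFaSniffer()
    sniffer.feed(html_text)
    sniffer.close()
    return sniffer.hits


# The host page declares nothing about RDFa; only the embedded widget uses it.
page = """
<html><head><title>Host page with no RDFa flag</title></head>
<body>
  <p>Ordinary host content.</p>
  <div class="widget" about="http://example.org/photo1">
    <span property="dc:title">Sunset</span>
  </div>
</body></html>
"""

print(sniff_rdfa(page))
```

The scan finds the div and the span inside the widget even though
nothing in the head of the page announces RDFa, which is the situation a
flag-based check would miss.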

This isn't an RDFa issue. It's just the way the web is: pages aren't
atomic chunks anymore, they're bags of disparate chunks of HTML, each
one of which might have been authored by a different party.

The good news is that, unlike microformats, there's only one RDFa
parser, and it's not going to change regularly over time as we use more
vocabularies. That's a key difference.

On to some details....

> HTML (I'd argue) isn't really suited for being a candidate for treating
> data as a first class citizen, because its primary use is for presenting
> documents (not units of data) to humans.

We have a notable disagreement here :) What other format would you use
for providing units of data to humans? XML+XSLT (ouch)? When units of
data are presented to a human, they need to be rendered, yet you also
need to close the loop so that I can point my mouse to the rendered
stuff and get back to the structured unit of data.

That's why, in my mind, HTML is actually a *very good* place to put some
amount of structured data. Not all structured data, but certainly data
that's meant to be interpreted by human eyes to some degree.

[...]

> you still have to provide the information twice

One specific goal of RDFa is to *not* repeat the information. I
think we've seen that this is actually quite doable for a whole bunch of
use cases, so I'd have to disagree with that point.

> I think the heart of my disagreement with this attitude towards
> @profile, is that you obviously want *RDFa* to be in First Class, and
> all *other* methods of embedding data in html to lump it in Third -
> which is pretty much the same impression I get with regards to HTML 5's
> attitude to microformats.

This is not quite a correct comparison. Microformats don't have a
consistent syntax. You can't parse microformats without knowing *which*
vocabulary you're looking for. So there's no way that kind of loose
syntax can ever be first class. eRDF still requires declaring
page-specific stuff (like namespaces) in the HEAD of the document, so it
can't be mashed up.

What I mean is this: this isn't an *attitude* that RDFa should be First
Class and other methods should be Third. It's a realization that the web
needs *some* kind of generic syntax that is mashup-compatible, and
neither microformats nor eRDF (nor any other syntax that we know of)
fits the bill.
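
As a rough illustration of what "generic and mashup-compatible" means,
here is a small Python sketch (standard library only). It reads just
about, property, and xmlns:* declarations, and it knows nothing about
any particular vocabulary. It assumes well-formed XHTML and prefix
declarations carried on the chunk's own elements; the class name, the
function name, and the example.org URIs are hypothetical, and real RDFa
processing handles far more than this.

```python
from html.parser import HTMLParser


class TinyTripleExtractor(HTMLParser):
    """Vocabulary-agnostic sketch: about + property + xmlns:* only."""

    def __init__(self):
        super().__init__()
        # One frame per open element: the subject and prefixes in scope,
        # plus the predicate and text if the element has @property.
        self.frames = [
            {"subject": None, "prefixes": {}, "predicate": None, "text": []}
        ]
        self.triples = []

    @staticmethod
    def expand(curie, prefixes):
        """Expand prefix:local using the prefixes currently in scope."""
        prefix, sep, local = curie.partition(":")
        if sep and prefix in prefixes:
            return prefixes[prefix] + local
        return curie

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent = self.frames[-1]
        prefixes = dict(parent["prefixes"])
        for name, value in attrs.items():
            if name.startswith("xmlns:") and value:
                prefixes[name[len("xmlns:"):]] = value
        subject = attrs.get("about") or parent["subject"]
        predicate = None
        if attrs.get("property"):
            predicate = self.expand(attrs["property"], prefixes)
        self.frames.append(
            {"subject": subject, "prefixes": prefixes,
             "predicate": predicate, "text": []}
        )

    def handle_data(self, data):
        # Text belongs to every open element that is collecting a literal.
        for frame in self.frames:
            if frame["predicate"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if len(self.frames) == 1:
            return  # stray close tag; ignore it
        frame = self.frames.pop()
        if frame["predicate"]:
            literal = "".join(frame["text"]).strip()
            self.triples.append(
                (frame["subject"], frame["predicate"], literal)
            )


def extract_triples(html_text):
    parser = TinyTripleExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.triples


# Two chunks from different (hypothetical) authors, each self-contained.
chunk_a = ('<div xmlns:dc="http://purl.org/dc/elements/1.1/" '
           'about="http://example.org/a">'
           '<span property="dc:title">Post A</span></div>')
chunk_b = ('<div xmlns:t="http://purl.org/dc/elements/1.1/" '
           'about="http://example.org/b">'
           '<span property="t:title">Widget B</span></div>')

for triple in extract_triples("<body>" + chunk_a + chunk_b + "</body>"):
    print(triple)
```

The two chunks use different prefixes and share nothing in the head of
the page, yet the same extractor handles both after they are pasted
together, without being told which vocabulary to look for.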

> And that's disappointing because there isn't one syntax that's going to
> be best for everyone all of the time, and there doesn't have to be a
> 'winner'

Your argument is deceptive here. Of course most people agree that "one
syntax isn't going to be best for everyone all the time." But you're
missing some of the context: we're already talking about embedding
something inside the *HTML syntax*. We already agree that, if you're
going to combine sources of HTML, you probably want the components
you're combining to be fairly consistent and use more or less the same
version of HTML, and not, say, XAML. <P> better mean a paragraph break.

If you accept the assumption that a web page is no longer an atomic
chunk of data, then it seems to me a worthy goal to aim for one fairly
generic syntax for structured data *in HTML*. There's a clear benefit
there. Of course that syntax won't be the best all the time for all
users, but if you want your HTML remixed, mashed up, taken apart and put
back together, then what else can you do but aim for one reasonable
syntax that has the important advantage of being consistent and producing
self-contained chunks?

I certainly see how in some cases, using GRDDL on some highly
domain-specific data would be better. But then you can't break up that
page and recombine it with something else. You give up generic syntax in
order to gain efficiency.

RDFa aims to be a generic enough syntax that you can do mashup-able
structured data in HTML. It won't always be the most compact, especially
in certain information-dense domains, but at least it will be generic,
and that's something we actually need.

> I apologise for making my first post to this list one of dissent, and
> I'm sorry if I'm irritating you all about an issue that has already been
> laid to bed, I just think that offering at least the *option* of a
> @profile to authors is important. It doesn't stop anyone from using or
> parsing RDFa without it, it just acknowledges that not every web page
> contains RDFa, and that RDFa isn't the only syntax for expressing RDF in
> HTML.

Happy to hear this dissent; it helps crystallize ideas and shakes us
out of any potential tunnel vision.

I hope that my dissent from your ideas can similarly cause some good
discussion here :)

-Ben

Received on Saturday, 26 May 2007 02:38:35 UTC