
Re: GRDDL profile for RDF-A

From: Keith Alexander <keithalexander@keithalexander.co.uk>
Date: Sat, 26 May 2007 12:23:18 +0100
Message-ID: <465818A6.5000309@keithalexander.co.uk>
To: Ben Adida <ben@adida.net>, RDFa <public-rdf-in-xhtml-tf@w3.org>, public-grddl-comments@w3.org

Hi Ben,

I think the fundamental reason for our differences comes down to your 
view (and probably the view of many on this list) that RDFa is 
*natural* to HTML, and that "nearly all HTML documents contain RDFa 
anyway."  (http://osdir.com/ml/org.w3c.html.rdf/2006-12/msg00022.html)

Whereas my view is that, as an author of HTML content, I want to be able to say (to any user-agent that cares) 
whether my HTML contains RDFa or not. This is because I don't view RDFa as a natural extension to HTML, 
but as an arbitrary syntax for expressing triples within it. 

Sure, a custom doctype that signifies that I'm using it goes part of the way, 
but it's a different mechanism from other (GRDDLable) syntaxes, and I suspect it's not as robust (or as simple). 
And as such, it demands special treatment over other syntaxes, which seems unnecessary. 

>  If you build software that assumes some RDFa header flag is always there
> when RDFa is present in the document, then you're going to lose big time.
As I said previously, it depends on your priorities. If the goal of the 
software is simply to find as many RDFa triples as possible, then 
obviously it is better not to get hung up on whether a profile is used 
or not.

However, if the /quality/ of the data (and/or the performance of the 
software) is important, then assuming that anything that *looks* like 
RDFa *is* RDFa could be a very bad strategy.
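
As a rough illustration of the second strategy, the following Python sketch 
only treats a page as RDFa when the author has opted in through the profile 
attribute on head. The profile URI and the names HeadProfileReader and 
author_declares_rdfa are hypothetical placeholders, not anything defined by 
the RDFa or GRDDL drafts.

```python
from html.parser import HTMLParser

# Hypothetical profile URI: no RDFa profile is named in this thread.
RDFA_PROFILE = "http://example.org/hypothetical-rdfa-profile"


class HeadProfileReader(HTMLParser):
    """Collects the URIs listed in the profile attribute of <head>."""

    def __init__(self):
        super().__init__()
        self.profiles = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "head":
            value = dict(attrs).get("profile") or ""
            # A profile attribute may hold several whitespace-separated URIs.
            self.profiles.update(value.split())


def author_declares_rdfa(html_text):
    """Return True only if the author opted in via the (assumed) profile URI."""
    reader = HeadProfileReader()
    reader.feed(html_text)
    return RDFA_PROFILE in reader.profiles
```

A crawler that cares most about recall would skip this check and parse 
everything; one that cares most about data quality would run the RDFa parser 
only when the check returns True.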
> The main argument is simple: we now live in a world of mashups and
> widgets. There are now third-party applications that run inside
> Facebook's very own HTML page. Chances are, some widgets will include
> RDFa, even if the containing page does not flag the presence of RDFa. If
> you want to find the structured data in the page, you're going to have
> to try the RDFa parser and see what comes out. I can't imagine that
> you'll get anything useful out of the structured-data web if you don't
> do this.
There is more to the web than blogs and social networking sites. The 
less trivial the data, the more important authorial intention is.
A key advantage of RDF, after all, is that you can use it to say precisely 
what you mean.

> This isn't an RDFa issue. It's just the way the web is: pages aren't
> atomic chunks anymore, they're bags of disparate chunks of HTML, each
> one of which might have been authored by a different party.
If the data in those chunks is important, then it argues more for a 
mechanism to express the authorial intention per chunk (something like 
@profile on any element perhaps, as  Jeremy suggested).

Also, if your mashup web page has any RDF-in-HTML smarts to it, it 
probably wouldn't be republishing the HTML verbatim anyway - it would 
parse out the data, and format it how it likes (e.g. see 
http://semwebdev.keithalexander.co.uk/snap.html - the page grabs 
'chunks' of eRDF from other pages, and republishes them as RDFa).
>  good news is that, unlike microformats, there's only one RDFa
> parser, and it's not going to change regularly over time as we use more
> vocabularies. That's a key difference.
A key advantage of RDF over something like microformats is the precision 
available to authorial intention - you can find or create URIs to say 
exactly what you mean. But for that to work, you need to use the *right* 
parser. (Incidentally, this is already a problem for those who 
want to use RDFa while the spec is still in a state of flux.)

>> HTML (I'd argue) isn't really suited for being a candidate for treating
>> data as a first class citizen, because its primary use is for presenting
>> documents (not units of data) to humans.
> We have a notable disagreement here :) What other format would you use
> for providing units of data to humans? XML+XSLT (ouch)? 
My apologies, I phrased that clumsily. My sentiment was not that there 
are better formats for presenting machine-readable data to humans, but 
that humans often need a different representation of some types of 
data from machines. Machines, for instance, like timestamps, whereas 
humans prefer that information represented a little differently. Humans 
often prefer to view floats rounded to a certain number of decimal places, 
and prefer to see the word "English" rather than the equivalent ISO 
639 code, and so on.
> When units of
> data are presented to a human, they need to be rendered, yet you also
> need to close the loop so that I can point my mouse to the rendered
> stuff and get back to the structured unit of data.
Yes, hence the need for workarounds like @content.
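
To illustrate what @content is working around, here is a minimal Python 
sketch that emits a span whose text is written for readers while the content 
attribute carries the machine-friendly value. The helper name 
span_with_content and the property names dc:date and dc:language are chosen 
for illustration only.

```python
from datetime import datetime, timezone
from html import escape


def span_with_content(prop, machine_value, human_text):
    """Return a span whose text is for readers and whose @content is for parsers."""
    return '<span property="{}" content="{}">{}</span>'.format(
        escape(prop, quote=True),
        escape(machine_value, quote=True),
        escape(human_text),
    )


# Machines like timestamps; humans prefer a friendlier date.
sent = datetime(2007, 5, 26, 11, 23, 18, tzinfo=timezone.utc)
date_span = span_with_content("dc:date", sent.isoformat(), sent.strftime("%d %B %Y"))

# Machines like the ISO 639 code; humans prefer the word "English".
lang_span = span_with_content("dc:language", "en", "English")

print(date_span)
print(lang_span)
```

The same value ends up written twice in the markup, once for each audience, 
which is why it reads as a workaround.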
> That's why, in my mind, HTML is actually a *very good* place to put some
> amount of structured data. Not all structured data, but certainly data
> that's meant to be interpreted by human eyes to some degree.
We don't disagree here (I think). I like embedding data in HTML as much 
as anyone. I'm just saying that machine readable data isn't a first 
class citizen in HTML, which is first and foremost for encoding 
human-readable documents. I think everyone agrees on that (that HTML 
documents should be presentable to human readers), but probably some 
disagree with my conclusion that HTML therefore ought not to be too 
tightly coupled with any one method of conveying machine readable data 
within it.
Perhaps it will help the debate if I lay out my assumptions:

1. If the function of a document format (HTML) is to convey 
information to human readers, it cannot also be *optimal* as a 
data-exchange format, even though it is still often desirable to make 
that format perform both functions.
2. Therefore compromises have to be made (for example, in the 
simplicity, verbosity, and universality of the format's syntax).
3. Therefore the compromises of some syntaxes may be more acceptable 
than others in different situations.
4. Therefore it would be disadvantageous to those who use the document 
format if any of those syntaxes became an intrinsic part of the format.

> this isn't an *attitude* that RDFa should be First
> Class and other methods should be Third. It's a realization that the web
> needs *some* kind of generic syntax that is mashup-compatible, and
> neither microformats nor eRDF (nor any other syntax that we know of)
> fits the bill.
I recognise that there are advantages to using a standardised syntax 
(reusing existing tools, and exploiting the HTML context of the data - 
like Ben Nowack's Live Clipboard, or my linked data preview demo), but 
there are also valid reasons for using other syntaxes instead.

All I'm arguing for, really, is that RDFa remain a *choice*, and that 
some care is taken not to get in the way of other options that authors 
have to express RDF in HTML.

If RDF-in-HTML is going to be at all significant, then we are still at 
an early stage in the game. Almost nobody is doing it yet, and the 
depths of possibilities are still pretty uncharted. RDFa doesn't need to 
make further experimentation and innovation in the wild harder; it can 
be both a standard, and an option. All you need to do is to provide a 
GRDDL profile and encourage authors to use it where possible.

If the non-atomic nature of 'mashed-up' web pages is a problem for RDFa 
using GRDDL, perhaps this is a wider problem for GRDDL to look at?


Received on Saturday, 26 May 2007 11:23:35 UTC
