- From: Keith Alexander <keithalexander@keithalexander.co.uk>
- Date: Sat, 26 May 2007 12:23:18 +0100
- To: Ben Adida <ben@adida.net>, RDFa <public-rdf-in-xhtml-tf@w3.org>, public-grddl-comments@w3.org
Hi Ben,

I think the fundamental reason for our differences comes down to this: your view (and probably the view of many on this list) is that RDFa is *natural* to HTML, and that "nearly all HTML documents contain RDFa anyway" (http://osdir.com/ml/org.w3c.html.rdf/2006-12/msg00022.html), whereas my view is that, as an author of HTML content, I want to be able to say (to any user-agent that cares) whether my HTML contains RDFa or not. This is because I don't view RDFa as a natural extension to HTML, but as an arbitrary syntax for expressing triples within it.

Sure, a custom doctype that signifies that I'm using it goes part of the way, but it's a different mechanism from other (GRDDLable) syntaxes, and I suspect it's not as robust (or as simple). And as such, it demands special treatment over other syntaxes, which seems unnecessary.

> If you build software that assumes some RDFa header flag is always there
> when RDFa is present in the document, then you're going to lose big time.

As I said previously, it depends on your priorities. If the goal of the software is simply to find as many RDFa triples as possible, then obviously it is better not to get hung up on whether a profile is used or not. However, if the /quality/ of the data (and/or the performance of the software) is important, then assuming that anything that *looks* like RDFa *is* RDFa could be a very bad strategy.

> The main argument is simple: we now live in a world of mashups and
> widgets. There are now third-party applications that run inside
> Facebook's very own HTML page. Chances are, some widgets will include
> RDFa, even if the containing page does not flag the presence of RDFa. If
> you want to find the structured data in the page, you're going to have
> to try the RDFa parser and see what comes out. I can't imagine that
> you'll get anything useful out of the structured-data web if you don't
> do this.

There is more to the web than blogs and social networking sites.
The less trivial the data, the more important authorial intention is. A key advantage of RDF, after all, is that you can use it to say precisely what you mean.

> This isn't an RDFa issue. It's just the way the web is: pages aren't
> atomic chunks anymore, they're bags of disparate chunks of HTML, each
> one of which might have been authored by a different party.

If the data in those chunks is important, then it argues more for a mechanism to express the authorial intention per chunk (something like @profile on any element perhaps, as Jeremy suggested). Also, if your mashup web page has any RDF-in-HTML smarts to it, it probably wouldn't be republishing the HTML verbatim anyway; it would parse out the data and format it how it likes (e.g. see http://semwebdev.keithalexander.co.uk/snap.html - the page grabs 'chunks' of eRDF from other pages and republishes them as RDFa).

> good news is that, unlike microformats, there's only one RDFa
> parser, and it's not going to change regularly over time as we use more
> vocabularies. That's a key difference.

A key advantage of RDF over something like microformats is the precision available to authorial intention - you can find or create URIs to say exactly what you mean. But for that to work, you need to use the *right* parser. (Incidentally, this is even a problem right now for those who want to use RDFa while the spec is still in a state of flux.)

>> HTML (I'd argue) isn't really suited for being a candidate for treating
>> data as a first class citizen, because its primary use is for presenting
>> documents (not units of data) to humans.
>
> We have a notable disagreement here :) What other format would you use
> for providing units of data to humans? XML+XSLT (ouch)?

My apologies, I phrased that clumsily. My sentiment was not that there are better formats for presenting machine-readable data to humans, but that humans often need a different representation of (some types of) data from machines.
Machines, for instance, like timestamps; humans prefer that information represented a little differently. Humans often prefer to view floats rounded to a certain number of decimal places; humans prefer to see the word "English" rather than the equivalent ISO 639 code, etc.

> When units of
> data are presented to a human, they need to be rendered, yet you also
> need to close the loop so that I can point my mouse to the rendered
> stuff and get back to the structured unit of data.

Yes, hence the need for workarounds like @content.

> That's why, in my mind, HTML is actually a *very good* place to put some
> amount of structured data. Not all structured data, but certainly data
> that's meant to be interpreted by human eyes to some degree.

We don't disagree here (I think). I like embedding data in HTML as much as anyone. I'm just saying that machine-readable data isn't a first class citizen in HTML, which is first and foremost for encoding human-readable documents. I think everyone agrees on that (that HTML documents should be presentable to human readers), but probably some disagree with my conclusion that HTML therefore ought not to be too tightly coupled with any one method of conveying machine-readable data within it.

Perhaps it will help the debate if I lay out my assumptions:

1. If the function of a document format (HTML) is to convey information to human readers, it cannot also be *optimal* as a data-exchange format, even though it is still often desirable to make that format perform both functions.
2. Therefore compromises have to be made (for example, in the simplicity, verbosity, and universality of the format's syntax).
3. Therefore the compromises of some syntaxes may be more acceptable than others in different situations.
4. Therefore it would be disadvantageous to those who use the document format if any of those syntaxes became an intrinsic part of the format.
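
To make the human/machine gap mentioned above (timestamps, rounded floats, ISO 639 codes, and @content as the workaround) a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `property`/`content` attribute names follow the RDFa drafts as I understand them and may not match the final syntax, `ex:price` is a made-up property, and the helper names and the small language table are hypothetical.

```python
from datetime import datetime, timezone
from html import escape

# Hypothetical lookup table: a handful of ISO 639-1 codes, for illustration only.
LANGUAGE_NAMES = {"en": "English", "fr": "French", "de": "German"}
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def span(prop: str, machine_value: str, human_text: str) -> str:
    """Build a span whose @content holds the machine value and whose text is for humans."""
    return '<span property="{}" content="{}">{}</span>'.format(
        escape(prop), escape(machine_value), escape(human_text)
    )


def render_date(prop: str, ts: datetime) -> str:
    # Machines get an ISO 8601 timestamp; humans get "26 May 2007".
    human = "{} {} {}".format(ts.day, MONTHS[ts.month - 1], ts.year)
    return span(prop, ts.isoformat(), human)


def render_float(prop: str, value: float, places: int = 2) -> str:
    # Machines get the full value; humans get it rounded for display.
    return span(prop, repr(value), "{:.{}f}".format(value, places))


def render_language(prop: str, code: str) -> str:
    # Machines get the ISO 639 code; humans get the language name if we know it.
    return span(prop, code, LANGUAGE_NAMES.get(code, code))


if __name__ == "__main__":
    print(render_date("dc:date", datetime(2007, 5, 26, 11, 23, 18, tzinfo=timezone.utc)))
    print(render_float("ex:price", 3.14159))
    print(render_language("dc:language", "en"))
```

In each case the visible text and the machine value differ, which is exactly why something like @content is needed as soon as the data lives inside a document meant for human readers.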

> this isn't an *attitude* that RDFa should be First
> Class and other methods should be Third. It's a realization that the web
> needs *some* kind of generic syntax that is mashup-compatible, and
> neither microformats nor eRDF (nor any other syntax that we know of)
> fits the bill.

I recognise that there are advantages to using a standardised syntax (reusing existing tools, and exploiting the HTML context of the data - like Ben Nowack's Live Clipboard, or my linked data preview demo), but there are also valid reasons for using other syntaxes instead.

All I'm arguing for, really, is that RDFa remain a *choice* and that some care is taken not to get in the way of other options that authors have to express RDF in HTML. If RDF-in-HTML is going to be at all significant, then we are still at an early stage in the game. Almost nobody is doing it yet, and the depths of possibilities are still pretty uncharted. RDFa doesn't need to make further experimentation and innovation in the wild harder; it can be both a standard and an option. All you need to do is to provide a GRDDL profile and encourage authors to use it where possible.

If the non-atomic nature of 'mashed-up' web pages is a problem for RDFa using GRDDL, perhaps this is a wider problem for GRDDL to look at?

Cheers,
Keith
Received on Saturday, 26 May 2007 11:23:35 UTC