
Re: RDFa Use Cases (was: RDFa and Web Directions North 2009)

From: Ian Hickson <ian@hixie.ch>
Date: Tue, 17 Feb 2009 02:16:24 +0000 (UTC)
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, Sam Ruby <rubys@intertwingly.net>, Dan Brickley <danbri@danbri.org>, Dan Connolly <connolly@w3.org>
Message-ID: <Pine.LNX.4.62.0902170130420.6186@hixie.dreamhostps.com>
On Sun, 15 Feb 2009, Manu Sporny wrote:
> 
> http://rdfa.info/wiki/rdfa-use-cases#Using_a_Data_Model_to_Generate_User_Interfaces

That's much better, yeah.

It would be helpful if the pseudo-code more obviously indicated how 
exactly the problem is solved. The problem here says "Browser UIs for 
working with web page data suck" and "[...] we could use the data in a web 
page to generate a custom browser UI". The pseudo-code's entire treatment 
of this problem is "information for that video object is used to construct 
a Firefox 3 UI". However, in practice, that's the hardest part -- 
designing good UI that doesn't "suck" takes weeks and weeks of really 
tedious and careful usability studies, with one-way glass, eye-tracking 
hardware, cardboard mock UIs, etc. How does RDFa help here?

The other thing that the code snippet glosses over is actually extracting 
information from the page. Since this is something that's going to be done 
by hundreds or thousands of people in all kinds of environments (I 
presume, anyway, correct me if I'm wrong), I would expect that getting 
useful information out of the page would need to be easy -- a few dozen lines of
code probably. With the Creative Commons data embedded as RDF/XML in HTML 
comments, this was not the case, and instead people used regular 
expressions to parse the data (!), ignoring the RDFness.
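For a sense of scale, here is a minimal Python sketch of what "a few dozen lines" of extraction might look like. It handles only a tiny subset of RDFa (the `about`, `property` and `content` attributes), ignores CURIE prefixes, `rel`/`rev`, `typeof` and the rest of the RDFa processing rules, and assumes well-formed markup. It is not a conformant RDFa processor, and the names `TinyRDFaExtractor` and `extract` are illustrative only.

```python
from html.parser import HTMLParser

# HTML elements that never get an end tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "param", "source", "track", "wbr"}


class TinyRDFaExtractor(HTMLParser):
    """Collect (subject, property, value) triples from a tiny RDFa subset."""

    def __init__(self, base_uri):
        super().__init__()
        self.subjects = [base_uri]  # subject in scope for each open element
        self.pending = []           # per open element: [subject, prop, text] or None
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # @about sets a new subject; otherwise inherit the enclosing one.
        subject = attrs.get("about") or self.subjects[-1]
        prop = attrs.get("property")
        collecting = None
        if prop and attrs.get("content") is not None:
            # @content supplies the value directly.
            self.triples.append((subject, prop, attrs["content"]))
        elif prop:
            # Otherwise the value is the element's text content.
            collecting = [subject, prop, []]
        if tag in VOID_ELEMENTS:
            return  # no end tag will follow, so push nothing
        self.subjects.append(subject)
        self.pending.append(collecting)

    def handle_data(self, data):
        # Text belongs to every enclosing element that is collecting a value.
        for collecting in self.pending:
            if collecting is not None:
                collecting[2].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not self.pending:
            return
        self.subjects.pop()
        collecting = self.pending.pop()
        if collecting is not None:
            subject, prop, text = collecting
            self.triples.append((subject, prop, "".join(text).strip()))


def extract(html, base_uri):
    parser = TinyRDFaExtractor(base_uri)
    parser.feed(html)
    parser.close()
    return parser.triples


page = """
<div about="http://example.org/video/42">
  <h1 property="dc:title">Cat plays piano</h1>
  <span property="dc:creator" content="Jane Doe">by J. Doe</span>
  <meta property="media:duration" content="PT3M12S">
</div>
"""
for triple in extract(page, "http://example.org/page"):
    print(triple)
```

Whether a complete processor, with prefix mapping and the full attribute set, stays anywhere near this small is the question the use case would need to answer.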

(The pseudo-code says that triples seen on the page are just stored in the 
triple store, but I assume that this is a simplification -- wouldn't this 
mean that, e.g., an <iframe> could affect the data of the page it is 
embedded in? That seems like a security risk.)
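One hypothetical way to avoid the <iframe> problem is to keep each document's triples in their own graph, keyed by the URL of the document they were read from, and to answer queries about a page from that page's graph only. The Python sketch below assumes that design; the class and method names are made up, and `origin()` is only a rough stand-in for a real browser's same-origin check (for example, it does not treat an explicit default port as equal to an omitted one).

```python
from urllib.parse import urlsplit


def origin(url):
    """(scheme, host, port): a rough stand-in for a browser's origin check."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.port)


class PerDocumentStore:
    """Triples partitioned by the document they were found in (hypothetical design)."""

    def __init__(self):
        self.graphs = {}  # document URL -> set of triples found in that document

    def add(self, document_url, triple):
        self.graphs.setdefault(document_url, set()).add(triple)

    def triples_for_page(self, page_url, frame_urls=(), include_same_origin_frames=False):
        """Triples from the page itself, plus same-origin frames only if asked for."""
        result = set(self.graphs.get(page_url, ()))
        if include_same_origin_frames:
            for frame_url in frame_urls:
                if origin(frame_url) == origin(page_url):
                    result |= self.graphs.get(frame_url, set())
        return result


store = PerDocumentStore()
page = "https://news.example/story"
ad_frame = "https://ads.example/banner"
store.add(page, (page, "dc:title", "Real headline"))
# A cross-origin ad frame tries to make claims about the page embedding it.
store.add(ad_frame, (page, "dc:title", "Buy cheap pills"))
print(store.triples_for_page(page, [ad_frame], include_same_origin_frames=True))
```

Even with same-origin frames opted in, the ad frame's claim is excluded because it comes from a different origin.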


> However, the list of potential solutions and issues involved have been 
> moved to a separate page:
> 
> http://rdfa.info/wiki/non-rdfa-open-data-solutions
> 
> It's a healthy exercise to document how we might solve these problems if 
> RDFa did not exist.

All the solutions on that page are basically the same as RDFa, just with a 
different syntax and a different data model. Yes, the point of RDF is the 
data model and the point of RDFa is the syntax for that data model. But 
the solution space is much bigger. For example, in practice sites that 
expose data to solve the problem in question -- scraping sites -- tend to 
actually use imperative APIs over HTTP. (Google's GData APIs, Amazon's 
APIs, the Flickr API, OpenSocial, etc.) Other solutions might include 
natural language processing, automated pattern detection, user-shared 
transformations like microsummaries or site-provided transformations like 
GRDDL, "semantic sheets" (CSS for data extraction), or having the site 
explicitly include any UI that the site thinks should be exposed
(Microsoft's venture into this space with SmartTags suggests that site 
authors may in fact not _want_ user agents to automatically expose new UI 
based on the page content).


> > and code snippets showing how the data would then be processed;
> 
> I'm assuming pseudo-code/pseudo-process or a pointer to the project 
> would be good enough for most of these cases?

Pseudo-code is an acceptable (though inferior) substitute for real code. A
pointer to a full project isn't really useful. The point here is to show 
that this kind of thing is not an undue implementation burden. I wouldn't 
worry about code snippets for things that are intended to be implemented 
only by a few user agents (whether those be browsers, extensions, or 
whatever). It only matters when many independent implementations are expected.

For example, showing the code for an HTML5 parser wasn't necessary when 
discussing syntax issues in HTML, because the expectation is that there 
will be parser libraries written for each platform, and beyond that it 
would be a non-issue. However, for WebSocket, I did check that 
implementing the server-side would be easy, because the expectation is 
that lots of people will independently implement this, and that they 
wouldn't always use a library.


> > and a discussion of ways to deal with the likely problems (e.g., for 
> > this particular use case:
> 
> These are all similar problems relevant to most use cases dealing with 
> data sharing on the web. These challenges have been broken out into a 
> separate document and could be pointed to from the RDFa Use Cases 
> Document (note that the answers haven't been filled out yet):
> 
> http://rdfa.info/wiki/common-arguments-against-rdfa

These aren't "arguments against RDFa". They are questions that any 
technology needs to have answers for. I think you are framing this as 
defending RDFa against arguments when in fact it is just due diligence 
and research.


> > how to deal with authors screwing up and encoding bad data
> 
> http://rdfa.info/wiki/Common-arguments-against-rdfa#How_does_RDFa_deal_with_authors_screwing_up_and_encoding_bad_data.3F

Does this mean that the data that is found on a page is not stored beyond 
that page? My understanding was that people wanted RDF data to be 
persisted across multiple sessions, which would lead to bad data 
"poisoning the well" in a way that no other feature in Web browsers has 
yet had to deal with. (Search engine vendors spend gigantic amounts of 
resources dealing with this problem with today's HTML -- if the goal is to 
have the same kind of processing happening client-side, then the 
technology needs to be resilient to this kind of thing or else it will 
just collapse under its own weight.)
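As a hypothetical illustration of what "resilient" might minimally involve, the Python sketch below persists each triple together with the URLs of the pages that asserted it, so a source later found to be bad can be retracted wholesale, and queries can prefer claims made from the subject's own host. The names and the host-matching heuristic are assumptions made for illustration; the sketch does nothing about the genuinely hard part, deciding which sources to distrust.

```python
from collections import defaultdict
from urllib.parse import urlsplit


class ProvenanceStore:
    """Persisted triples, each remembered with the pages that asserted it."""

    def __init__(self):
        self.sources = defaultdict(set)  # triple -> set of source URLs

    def assert_triple(self, source_url, triple):
        self.sources[triple].add(source_url)

    def retract_source(self, source_url):
        """Drop everything a distrusted source ever said."""
        for triple in list(self.sources):
            self.sources[triple].discard(source_url)
            if not self.sources[triple]:
                del self.sources[triple]

    def values(self, subject, prop, authoritative_only=False):
        """Values for (subject, prop); optionally only those asserted from the subject's host."""
        subject_host = urlsplit(subject).hostname
        found = []
        for (s, p, v), srcs in self.sources.items():
            if s != subject or p != prop:
                continue
            if authoritative_only and not any(
                    urlsplit(src).hostname == subject_host for src in srcs):
                continue
            found.append(v)
        return sorted(found)


store = ProvenanceStore()
movie = "https://films.example/title/1"
store.assert_triple(movie, (movie, "ex:rating", "PG"))
store.assert_triple("https://spam.example/x", (movie, "ex:rating", "Buy cheap pills"))
print(store.values(movie, "ex:rating"))                           # both claims
print(store.values(movie, "ex:rating", authoritative_only=True))  # only the site's own
store.retract_source("https://spam.example/x")
print(store.values(movie, "ex:rating"))                           # spam is gone
```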

HTH,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Tuesday, 17 February 2009 02:17:01 GMT
