- From: Ian Hickson <ian@hixie.ch>
- Date: Tue, 17 Feb 2009 02:16:24 +0000 (UTC)
- To: Manu Sporny <msporny@digitalbazaar.com>
- Cc: RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, Sam Ruby <rubys@intertwingly.net>, Dan Brickley <danbri@danbri.org>, Dan Connolly <connolly@w3.org>
On Sun, 15 Feb 2009, Manu Sporny wrote:
>
> http://rdfa.info/wiki/rdfa-use-cases#Using_a_Data_Model_to_Generate_User_Interfaces

That's much better, yeah.

It would be helpful if the pseudo-code more obviously indicated how exactly the problem is solved. The problem here says "Browser UIs for working with web page data suck" and "[...] we could use the data in a web page to generate a custom browser UI". The pseudo-code's entire treatment of this problem is "information for that video object is used to construct a Firefox 3 UI". However, in practice, that's the hardest part -- designing good UI that doesn't "suck" takes weeks and weeks of really tedious and careful usability studies, with one-way glass, eye-tracking hardware, cardboard mock UIs, etc. How does RDFa help here?

The other thing that the code snippet glosses over is actually extracting information from the page. Since this is something that's going to be done by hundreds or thousands of people in all kinds of environments (I presume, anyway, correct me if I'm wrong), I would expect that getting useful information out of the page would be easy -- a few dozen lines of code probably. With the Creative Commons data embedded as RDF/XML in HTML comments, this was not the case, and instead people used regular expressions to parse the data (!), ignoring the RDFness.

(The pseudo-code says that triples seen on the page are just stored in the triple store, but I assume that this is a simplification -- wouldn't this mean that, e.g., an <iframe> could affect the data of the page it is embedded in? That seems like a security risk.)

> However, the list of potential solutions and issues involved have been
> moved to a separate page:
>
> http://rdfa.info/wiki/non-rdfa-open-data-solutions
>
> It's a healthy exercise to document how we might solve these problems if
> RDFa did not exist.

All the solutions on that page are basically the same as RDFa, just with a different syntax and a different data model.
Yes, the point of RDF is the data model and the point of RDFa is the syntax for that data model. But the solution space is much bigger. For example, in practice sites that expose data to solve the problem in question -- scraping sites -- tend to actually use imperative APIs over HTTP. (Google's GData APIs, Amazon's APIs, the Flickr API, OpenSocial, etc.)

Other solutions might include natural language processing, automated pattern detection, user-shared transformations like microsummaries or site-provided transformations like GRDDL, "semantic sheets" (CSS for data extraction), or having the site explicitly include any UI that the site thinks should be exposed (Microsoft's venture into this space with SmartTags suggests that site authors may in fact not _want_ user agents to automatically expose new UI based on the page content).

> > and code snippets showing how the data would then be processed;
>
> I'm assuming pseudo-code/pseudo-process or a pointer to the project
> would be good enough for most of these cases?

Pseudo-code is an acceptable (though inferior) substitute for real code. A pointer to a full project isn't really useful. The point here is to show that this kind of thing is not an undue implementation burden.

I wouldn't worry about code snippets for things that are intended to be implemented only by a few user agents (whether those be browsers, extensions, or whatever). It's only important with a variety of implementations. For example, showing the code for an HTML5 parser wasn't necessary when discussing syntax issues in HTML, because the expectation is that there will be parser libraries written for each platform, and beyond that it would be a non-issue. However, for WebSocket, I did check that implementing the server-side would be easy, because the expectation is that lots of people will independently implement this, and that they wouldn't always use a library.
> > and a discussion of ways to deal with the likely problems (e.g., for
> > this particular use case:
>
> These are all similar problems relevant to most use cases dealing with
> data sharing on the web. These challenges have been broken out into a
> separate document and could be pointed to from the RDFa Use Cases
> Document (note that the answers haven't been filled out yet):
>
> http://rdfa.info/wiki/common-arguments-against-rdfa

These aren't "arguments against RDFa". They are questions that any technology needs to have answers for. I think you are framing this as defending RDFa against arguments when in fact it is just due diligence and research.

> > how to deal with authors screwing up and encoding bad data
>
> http://rdfa.info/wiki/Common-arguments-against-rdfa#How_does_RDFa_deal_with_authors_screwing_up_and_encoding_bad_data.3F

Does this mean that the data that is found on a page is not stored beyond that page? My understanding was that people wanted RDF data to be persisted across multiple sessions, which would lead to bad data "poisoning the well" in a way that no other feature in Web browsers has yet had to deal with. (Search engine vendors spend gigantic amounts of resources dealing with this problem with today's HTML -- if the goal is to have the same kind of processing happening client-side, then the technology needs to be resilient to this kind of thing or else it will just collapse under its own weight.)

HTH,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Tuesday, 17 February 2009 02:17:01 UTC