Re: RDFa Use Cases from Manu Sporny on 2009-02-17 (public-rdf-in-xhtml-tf@w3.org from February 2009)

From: Manu Sporny <msporny@digitalbazaar.com>
Date: Tue, 17 Feb 2009 11:01:10 -0500
To: Ian Hickson <ian@hixie.ch>
CC: RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, Sam Ruby <rubys@intertwingly.net>, Dan Brickley <danbri@danbri.org>, Dan Connolly <connolly@w3.org>
Message-ID: <499ADF46.8010005@digitalbazaar.com>
Ian Hickson wrote:
> On Sun, 15 Feb 2009, Manu Sporny wrote:
>> http://rdfa.info/wiki/rdfa-use-cases#Using_a_Data_Model_to_Generate_User_Interfaces
> 
> That's much better, yeah.

Good, glad this is moving in a desirable direction. :)

> These aren't "arguments against RDFa". They are questions that any
> technology needs to have answers for. I think you are framing this as
> defending RDFa against arguments when in fact it is just due diligence
> and research.

Agreed; I've changed the document title to "Developer FAQ":

http://rdfa.info/wiki/developer-faq

> It would be helpful if the pseudo-code more obviously indicated how 
> exactly the problem is solved. How does RDFa help here?
> The other thing that the code snippet glosses over is actually
> extracting information from the page.

I've added the exact code snippets that are used. Do they help explain
how RDFa is applied to solve the problem? If not, what is missing?

http://rdfa.info/wiki/rdfa-use-cases#Pseudo-code_Example_Using_Markup
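
Since the extraction step is the part Ian says gets glossed over, here is
a minimal sketch of what pulling triples out of markup can look like. It
is an illustration only: it handles a tiny RDFa subset (@about,
@property, @content and element text), does not expand CURIEs such as
"dc:title", and is not a conformant RDFa processor. The names
TinyRDFaExtractor and extract_triples are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never get an end tag in HTML, so they must not be pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "param", "source", "track", "wbr"}


class TinyRDFaExtractor(HTMLParser):
    """Collects (subject, predicate, object) triples from a tiny RDFa subset.

    @about sets the subject for an element and its descendants, @property names
    the predicate, and the object is @content if present, otherwise the
    element's text. CURIEs such as "dc:title" are NOT expanded to full URIs.
    """

    def __init__(self, base_url):
        super().__init__()
        self.subjects = [base_url]  # in-scope subject; the page URL by default
        self.stack = []             # open elements: (tag, subject, prop, content, text parts)
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        subject = a.get("about") or self.subjects[-1]
        prop, content = a.get("property"), a.get("content")
        if tag in VOID_ELEMENTS:
            # e.g. <meta property="..." content="...">: emit immediately.
            if prop and content is not None:
                self.triples.append((subject, prop, content))
            return
        self.subjects.append(subject)
        self.stack.append((tag, subject, prop, content, []))

    def handle_data(self, data):
        # Text counts toward every open @property element that lacks @content.
        for _tag, _subj, prop, content, parts in self.stack:
            if prop and content is None:
                parts.append(data)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or not any(f[0] == tag for f in self.stack):
            return  # ignore void and stray end tags
        # Pop until the matching start tag, tolerating unclosed children.
        while self.stack:
            t, subj, prop, content, parts = self.stack.pop()
            self.subjects.pop()
            if prop:
                obj = content if content is not None else "".join(parts).strip()
                self.triples.append((subj, prop, obj))
            if t == tag:
                break


def extract_triples(html, base_url):
    parser = TinyRDFaExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.triples
```

Feeding it '<div about="http://example.org/book"><span
property="dc:title">RDFa Primer</span></div>' yields one triple with the
book URI as subject; without @about, the page URL is the subject.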

> (The pseudo-code says that triples seen on the page are just stored in the 
> triple store, but I assume that this is a simplification -- wouldn't this 
> mean that, e.g., an <iframe> could affect the data of the page it is 
> embedded in? That seems like a security risk.)

I've added a question to the developer FAQ, along with a first cut at
answering it:

http://rdfa.info/wiki/developer-faq#Are_iframes_a_security_risk_to_RDFa.3F
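
One way to address the <iframe> concern, sketched here as an assumption
rather than as what the FAQ entry proposes, is to record where each
triple came from and to scope queries by origin, roughly the way the
same-origin policy scopes scripts. The class and method names below are
hypothetical, and the origin key is simplified to (scheme, host, port).

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url):
    """(scheme, host, port) tuple, roughly the browser same-origin key."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname,
            parts.port or DEFAULT_PORTS.get(parts.scheme))


class ProvenanceTripleStore:
    """Stores each triple with the origin of the document it came from."""

    def __init__(self):
        self.quads = set()  # (origin, subject, predicate, object)

    def add_from_document(self, document_url, triples):
        source = origin_of(document_url)
        for s, p, o in triples:
            self.quads.add((source, s, p, o))

    def visible_to(self, page_url):
        """Only triples from the page's own origin; an embedded iframe from
        another origin cannot inject data into its parent's view."""
        source = origin_of(page_url)
        return {(s, p, o) for (g, s, p, o) in self.quads if g == source}
```

With this design a cross-origin iframe still gets its own triples stored,
but they never appear in the embedding page's view of the data.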

> All the solutions on that page are basically the same as RDFa, just with a 
> different syntax and a different data model. Yes, the point of RDF is the 
> data model and the point of RDFa is the syntax for that data model. But 
> the solution space is much bigger. For example, in practice sites that 
> expose data to solve the problem in question -- scraping sites -- tend to 
> actually use imperative APIs over HTTP. (Google's GData APIs, Amazon's 
> APIs, the Flickr API, OpenSocial, etc.) Other solutions might include 
> natural language processing, automated pattern detection, user-shared 
> transformations like microsummaries or site-provided transformations like 
> GRDDL, "semantic sheets" (CSS for data extraction), or having the site 
> explicitly include any UI that the site thinks should be exposed
> (Microsoft's venture into this space with SmartTags suggests that site 
> authors may in fact not _want_ user agents to automatically expose new UI 
> based on the page content).

All of the examples you provided as alternatives, including a first cut
at the issues for each, have been added to the non-RDFa solutions page:

http://rdfa.info/wiki/non-rdfa-open-data-solutions#Screen_Scraping

>>> how to deal with authors screwing up and encoding bad data
>> http://rdfa.info/wiki/Common-arguments-against-rdfa#How_does_RDFa_deal_with_authors_screwing_up_and_encoding_bad_data.3F
> 
> Does this mean that the data that is found on a page is not stored beyond 
> that page? 

In that specific use case, yes. I've added the duration of triple
storage to the pseudo-code explanation in Step #6.

> My understanding was that people wanted RDF data to be 
> persisted across multiple sessions, which would lead to bad data 
> "poisoning the well" in a way that no other feature in Web browsers has 
> yet had to deal with. 

Some people do, some don't. I think we should assume that the RDF triple
store may be more akin to the browser cache (which can be cleared on a
whim) than to a traditional database (where clearing the data is bad).
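
To make the cache analogy concrete, here is a minimal sketch, assuming a
store whose entries may expire or be cleared at any time, so that
consumers must be able to re-extract triples from the page rather than
rely on them persisting. The class name and the max-age policy are
hypothetical; the clock is injectable only so expiry can be tested.

```python
import time


class CacheLikeTripleStore:
    """A triple store treated like the browser cache: any entry may vanish
    (expiry or a user "clear"), so callers must be able to re-extract."""

    def __init__(self, max_age_seconds, clock=time.monotonic):
        self.max_age = max_age_seconds
        self.clock = clock     # injectable so expiry can be tested
        self._added = {}       # triple -> time it was (re)added

    def add(self, triple):
        self._added[triple] = self.clock()

    def triples(self):
        now = self.clock()
        # Evict anything older than max_age before answering.
        self._added = {t: ts for t, ts in self._added.items()
                       if now - ts < self.max_age}
        return set(self._added)

    def clear(self):
        self._added.clear()
```

Because nothing is guaranteed to survive, bad data that slips in also
ages out or disappears on the next clear, which limits how long it can
"poison the well".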

> (Search engine vendors spend gigantic amounts of 
> resources dealing with this problem with today's HTML -- if the goal is to 
> have the same kind of processing happening client-side, then the 
> technology needs to be resilient to this kind of thing or else it will 
> just collapse under its own weight.)

The solution to the "poisoning the well" problem seems to be to use
digital signatures to verify data before it goes into any particular
"well". Keep in mind that even though we're saying "triple store", there
may be multiple triple stores per browser, for example "personal",
"clipboard", "google", etc. Perhaps each browser will have a per-site
triple store, similar to how the browser's cookie store operates.

I have added an item for this question here:

http://rdfa.info/wiki/Developer-faq#How_does_one_prevent_bad_triples_from_corrupting_a_local_triple_store.3F
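
The signature-gated, multi-store idea above can be sketched as follows.
This is an illustration under loud assumptions: it uses a standard-library
HMAC (a shared-secret MAC) as a stand-in for a real public-key digital
signature, the store names ("personal", a site origin) and their keys are
hypothetical, and the canonical encoding is a simple sorted JSON dump
rather than a proper RDF graph canonicalization.

```python
import hashlib
import hmac
import json
from collections import defaultdict


def canonical_bytes(triples):
    """Order-independent byte encoding of a set of (s, p, o) string triples."""
    return json.dumps(sorted(triples)).encode("utf-8")


def sign(triples, key):
    # HMAC-SHA256 stands in for a public-key signature in this sketch.
    return hmac.new(key, canonical_bytes(triples), hashlib.sha256).hexdigest()


class TripleStoreRegistry:
    """Several named stores per browser; data enters a store only if its
    signature verifies against the key trusted for that store."""

    def __init__(self, trusted_keys):
        self.trusted_keys = trusted_keys   # store name -> shared secret (bytes)
        self.stores = defaultdict(set)     # store name -> set of triples

    def admit(self, store_name, triples, signature):
        key = self.trusted_keys.get(store_name)
        if key is None:
            return False                   # no trust anchor for this store
        expected = sign(triples, key)
        if not hmac.compare_digest(expected, signature):
            return False                   # tampered or unsigned data is dropped
        self.stores[store_name].update(triples)
        return True
```

A real deployment would verify against the publisher's public key rather
than a shared secret, but the admission check sits in the same place:
before any triple reaches the store.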

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: Bitmunk 3.1 Website Launch
http://blog.digitalbazaar.com/2009/01/16/bitmunk-3-1-website-launch
Received on Tuesday, 17 February 2009 16:01:59 UTC