gap analysis from Paul Groth on 2010-08-02 (public-xg-prov@w3.org from August 2010)

From: Paul Groth <pgroth@gmail.com>
Date: Mon, 02 Aug 2010 13:04:36 +0200
To: "public-xg-prov@w3.org" <public-xg-prov@w3.org>
Message-ID: <4C56A644.4020008@gmail.com>

Hi All,

As discussed at last week's telecon, I came up with some ideas about the
gaps that need to be closed to realize the News Aggregator Scenario. I've
put these in the wiki and I append them below to help start the
discussion. Let me know what you think.

Gap Analysis - News Aggregator

For each step within the News Aggregator scenario, there are existing 
technologies or relevant research that could solve that step. For 
example, one can properly insert licensing information into a photo
using a Creative Commons license and the Extensible Metadata Platform
(XMP). One can track the origin of tweets either through retweets or
using some extraction technologies within Twitter. However, the problem
is that across multiple sites there is no common format and API to
access and understand provenance information, whether it is explicitly
or implicitly determined. To inquire about retweets or about trackbacks,
one needs to use different APIs and understand different formats.
Furthermore, there is no (widely deployed) mechanism to point to 
provenance information on another site. For example, once a tweet is
traced to the end of Twitter, there is no way to follow where that tweet
came from.

Systems largely do not document the software by which changes were made 
to data and what those pieces of software did to data. However, there 
are existing technologies that allow this to be done. For example, in a 
domain specific setting, XMP allows the transformations of images to be 
documented. More general formats such as OPM and PML allow this to be
expressed but are not currently widely deployed.

Finally, while many sites provide for identity and there are several
widely deployed standards for identity (e.g. OpenID), there are no
existing mechanisms for tying identity to objects or provenance traces.
This gap bears directly on the attribution of objects and provenance.

Summing up, there are four existing gaps to realizing the News Aggregator
scenario:

- No common standard to target for exposing and expressing provenance 
information that captures processes as well as the other content dimensions.
- No well-defined standard for linking provenance between sites (i.e. 
trackback but for the whole web).
- No guidance for how existing standards can be put together to provide
provenance (e.g. linking to identity).
- No guidance for how application developers should go about exposing
provenance in their web systems.
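
To make the first three gaps a bit more concrete, here is a rough sketch
of what a site-neutral provenance record and a cross-site trace could
look like. This is purely illustrative: the ProvenanceRecord class, its
field names, the trace function, and the example.org URLs are all
assumptions made up for this sketch, not part of XMP, OPM, PML, or any
existing API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ProvenanceRecord:
    """Hypothetical site-neutral record; all field names are illustrative."""
    artifact: str                       # URL of the item being described
    process: str                        # what was done, e.g. "retweet" or "crop"
    agent: Optional[str] = None         # identity URL of who did it (e.g. an OpenID)
    derived_from: Optional[str] = None  # URL of the source item, possibly on another site


def trace(start: str, records: Dict[str, ProvenanceRecord]) -> List[str]:
    """Follow derived_from links from start until no further record exists."""
    chain = [start]
    seen = {start}
    current = start
    while current in records and records[current].derived_from:
        source = records[current].derived_from
        if source in seen:  # guard against cyclic records
            break
        chain.append(source)
        seen.add(source)
        current = source
    return chain


# Example: a retweet on one site that traces back to a blog post on another.
records = {
    "http://microblog.example.org/status/2": ProvenanceRecord(
        artifact="http://microblog.example.org/status/2",
        process="retweet",
        agent="http://id.example.org/alice",
        derived_from="http://microblog.example.org/status/1",
    ),
    "http://microblog.example.org/status/1": ProvenanceRecord(
        artifact="http://microblog.example.org/status/1",
        process="quote",
        agent="http://id.example.org/bob",
        derived_from="http://blog.example.org/post/42",
    ),
}

for url in trace("http://microblog.example.org/status/2", records):
    print(url)
```

The point of the sketch is only that one shared record shape, carrying a
process, an agent identity, and a link that may cross site boundaries,
would let a single trace routine replace the per-site APIs described
above.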

Received on Monday, 2 August 2010 11:09:28 UTC