Re: gap analysis from Paul Groth on 2010-08-02 (public-xg-prov@w3.org from August 2010)

From: Paul Groth <pgroth@gmail.com>
Date: Mon, 02 Aug 2010 13:37:22 +0200
To: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
CC: "public-xg-prov@w3.org" <public-xg-prov@w3.org>
Message-ID: <4C56ADF2.1060507@gmail.com>
Luc,

Thanks for the comments. I agree with you that organizing the gaps 
according to the dimensions (Use/Management) is good. The dimensions 
have really provided a nice organizational tool to connect all of the 
group's work together.

Introducing querying is interesting. The question is whether you need a 
query mechanism to solve the News Aggregator Scenario, or whether just 
exposing the data is enough. For example, imagine I marked up a web page 
with provenance information in RDFa, and that markup linked to other 
provenance information: would I need a query API for that? I think you 
could solve it without a query API; you would just need a crawler.
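
To make the crawler idea concrete, here is a minimal sketch in Python. It 
is only an illustration under stated assumptions: the rel value 
"ex:provenance", the example.com-style URLs, and the names ProvLinkParser 
and crawl_provenance are all hypothetical, and an in-memory dictionary 
stands in for real HTTP fetches. It simply follows marked-up links from 
page to page until the trail ends.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical RDFa-style rel value marking a link to further provenance.
PROV_REL = "ex:provenance"


class ProvLinkParser(HTMLParser):
    """Collects href values of <a>/<link> elements whose rel includes PROV_REL."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").split()
        if tag in ("a", "link") and PROV_REL in rels and attrs.get("href"):
            self.links.append(attrs["href"])


def crawl_provenance(start_url, fetch):
    """Breadth-first walk over provenance links; fetch(url) returns HTML or None."""
    seen, order, queue = {start_url}, [], deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # trail ends here: nothing is exposed at this URL
        order.append(url)
        parser = ProvLinkParser()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Toy in-memory "web" standing in for real HTTP fetches (all URLs made up).
PAGES = {
    "http://aggregator.example/story":
        '<a rel="ex:provenance" href="http://blog.example/post">source</a>',
    "http://blog.example/post":
        '<a rel="ex:provenance" href="http://photos.example/img">photo</a>'
        '<a href="http://ads.example/">ad</a>',
    "http://photos.example/img": "<p>original upload</p>",
}

trail = crawl_provenance("http://aggregator.example/story", PAGES.get)
print(trail)
```

The crawler needs no query API on any site, only an agreed way of marking 
the links. The cost is that every question about provenance requires 
walking the links again, which is where a query mechanism might still help.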

What do you think?

Paul




Luc Moreau wrote:
> Thanks Paul for this proposal for the gap analysis.
> Twice you mention 'exposing' and I thought we could introduce 
> 'querying' provenance too.
>
> Also, maybe the gaps could be structured in content vs. APIs.
> Like this, maybe.
>
>
> Content:
> - No common standard for expressing provenance information that 
> captures processes as well as the other content dimensions.
> - No guidance for how existing standards can be put together to 
> provide provenance (e.g. linking to identity).
>
> APIs (or protocols):
> - No common API for obtaining/querying provenance information
> - No guidance for how application developers should go about exposing 
> provenance in their web systems.
> - No well-defined standard for linking provenance between sites (i.e. 
> trackback but for the whole web).
>
>
> I also wondered whether they should be structured according to the 
> provenance dimensions (so instead of API, break
> this into Use/Management).
>
> Luc
>
>
>
> On 08/02/2010 12:04 PM, Paul Groth wrote:
>> Hi All,
>>
>> As discussed at last week's telecon, I came up with some ideas about 
>> the gaps necessary to realize the News Aggregator Scenario. I've put 
>> these in the wiki and I append them below to help start the 
>> discussion. Let me know what you think.
>>
>> Gap Analysis - News Aggregator
>>
>> For each step within the News Aggregator scenario, there are existing 
>> technologies or relevant research that could solve that step. For 
>> example, one can properly insert licensing information into a photo 
>> using a Creative Commons license and the Extensible Metadata 
>> Platform. One can track the origin of tweets either through retweets 
>> or using some extraction technologies within Twitter. However, the 
>> problem is that across multiple sites there is no common format and 
>> API to access and understand provenance information, whether it is 
>> explicitly or implicitly determined. To inquire about retweets or 
>> inquire about trackbacks, one needs to use different APIs and 
>> understand different formats. Furthermore, there is no (widely 
>> deployed) mechanism to point to provenance information on another 
>> site. For example, once a tweet is traced to the end of Twitter, there 
>> is no way to follow where that tweet came from.
>>
>> Systems largely do not document the software by which changes were 
>> made to data and what those pieces of software did to data. However, 
>> there are existing technologies that allow this to be done. For 
>> example, in a domain specific setting, XMP allows the transformations 
>> of images to be documented. More general formats such as OPM and PML 
>> allow this to be expressed but are not currently widely deployed.
>>
>> Finally, while many sites provide for identity and there are several 
>> widely deployed standards for identity (e.g. OpenID), there are no 
>> existing mechanisms for tying identity to objects or provenance 
>> traces. This directly ties to the attribution of objects and provenance.
>>
>> Summing up, there are four existing gaps to realizing the News Aggregator 
>> scenario:
>>
>> - No common standard to target for exposing and expressing provenance 
>> information that captures processes as well as the other content 
>> dimensions.
>> - No well-defined standard for linking provenance between sites (i.e. 
>> trackback but for the whole web).
>> - No guidance for how existing standards can be put together to 
>> provide provenance (e.g. linking to identity).
>> - No guidance for how application developers should go about exposing 
>> provenance in their web systems.
Received on Monday, 2 August 2010 11:42:14 UTC