Re: gap analysis from Paul Groth on 2010-08-02 (public-xg-prov@w3.org from August 2010)

From: Paul Groth <pgroth@gmail.com>
Date: Mon, 02 Aug 2010 21:20:53 +0200
To: Yolanda Gil <gil@isi.edu>
CC: Luc Moreau <L.Moreau@ecs.soton.ac.uk>, "public-xg-prov@w3.org" <public-xg-prov@w3.org>
Message-ID: <4C571A95.30901@gmail.com>

Hi Yolanda,

I think the gaps I identified are crucial to solving this scenario. I
also see that at least gaps 1 and 3 that you specify are the same as
gaps that were originally defined. Personally, I don't see a particular
problem with sharing gaps between scenarios; it just means that there's
less to solve technically.

Also, I wonder if we should make a distinction between what gaps you 
would need to fill to even begin to solve a scenario and all the gaps 
that need to be filled to actually produce a solution. What I'm getting at
is that I believe there's a distinction between a platform that enables
a solution and the technology you put on top of it. For example, if you
exposed a large amount of provenance information then maybe someone 
would come along and solve the scalability problem. In the normal web, 
for instance, we had imperfections in HTML but the browser vendors were 
able to deal with them. (Also, see AltaVista and search...).

Thoughts?
Paul


Yolanda Gil wrote:
> Paul:
>
> Thanks for getting this started.  I agree with your response below, 
> and with Luc's original point of using the dimensions.
>
> I wonder if we can make things more crisp and targeted to the 
> scenario.  By that I mean that the 4 gaps you mention apply to the 
> Disease Outbreak Scenario just as well.  Perhaps the kinds of gaps 
> that are unique in this scenario are things like: 1) the 
> identity/derivation of information as it is disseminated in the 
> blogosphere and the twittersphere, 2) the scale of the provenance
> records to be processed (in DO we would have much smaller records), 3) 
> access of the information in terms of how to find the provenance of 
> each item that needs to be checked, 4) dealing with imperfections in 
> provenance, 5) making trust judgements based on provenance as there 
> will be varying information quality.
>
> So I think your 4 gaps below exist in this scenario, but are much more 
> central to the DO scenario.  Conversely, I think the 5 gaps above 
> exist in the DO scenario but are much more central to this one.
>
> Yolanda
>
>
>
> On Aug 2, 2010, at 4:37 AM, Paul Groth wrote:
>
>> Luc,
>>
>> Thanks for the comments. I agree with you that organizing the gaps 
>> according to the dimensions (Use/Management) is good. The dimensions 
>> have really provided a nice organizational tool to connect all the 
>> group's work together.
>>
>> So introducing querying is interesting. The question is: do you need
>> to have a query mechanism to solve the News Aggregator Scenario, or
>> is just exposing the data enough? For example, imagine I marked up a
>> web page with provenance information in RDFa, and it had links to
>> other provenance information. Would I need a query API for that? I
>> think you could solve that without a query API; you would just need
>> a crawler.
>>
>> What do you think?
>>
>> Paul
>>
>>
>>
>>
>> Luc Moreau wrote:
>>> Thanks Paul for this proposal for the gap analysis.
>>> Twice you mention 'exposing' and I thought we could introduce
>>> 'querying' provenance too.
>>>
>>> Also, maybe the gaps could be structured in content vs apis.
>>> Like this, maybe.
>>>
>>>
>>> Content:
>>> - No common standard for expressing provenance information that 
>>> captures processes as well as the other content dimensions.
>>> - No guidance for how existing standards can be put together to 
>>> provide provenance (e.g. linking to identity).
>>>
>>> APIs (or protocols):
>>> - No common API for obtaining/querying provenance information
>>> - No guidance for how application developers should go about 
>>> exposing provenance in their web systems.
>>> - No well-defined standard for linking provenance between sites 
>>> (i.e. trackback but for the whole web).
>>>
>>>
>>> I also wondered whether they should be structured according to the 
>>> provenance dimensions (so instead of API, break
>>> this into Use/Management).
>>>
>>> Luc
>>>
>>>
>>>
>>> On 08/02/2010 12:04 PM, Paul Groth wrote:
>>>> Hi All,
>>>>
>>>> As discussed at last week's telecon, I came up with some ideas 
>>>> about the gaps necessary to realize the News Aggregator Scenario. 
>>>> I've put these in the wiki and I append them below to help start 
>>>> the discussion. Let me know what you think.
>>>>
>>>> Gap Analysis- News Aggregator
>>>>
>>>> For each step within the News Aggregator scenario, there are 
>>>> existing technologies or relevant research that could solve that 
>>>> step. For example, one can properly insert licensing information
>>>> into a photo using a creative commons license and the Extensible 
>>>> Metadata Platform. One can track the origin of tweets either 
>>>> through retweets or using some extraction technologies within 
>>>> twitter. However, the problem is that across multiple sites there 
>>>> is no common format and api to access and understand provenance 
>>>> information whether it is explicitly or implicitly determined. To 
>>>> inquire about retweets or inquire about trackbacks one needs to use 
>>>> different apis and understand different formats. Furthermore, there 
>>>> is no (widely deployed) mechanism to point to provenance 
>>>> information on another site. For example, once a tweet is traced to 
>>>> the end of twitter there is no way to follow where that tweet came 
>>>> from.
>>>>
>>>> Systems largely do not document the software by which changes were 
>>>> made to data and what those pieces of software did to data. 
>>>> However, there are existing technologies that allow this to be 
>>>> done. For example, in a domain specific setting, XMP allows the 
>>>> transformations of images to be documented. More general formats 
>>>> such as OPM and PML allow this to be expressed but are not
>>>> currently widely deployed.
>>>>
>>>> Finally, while many sites provide for identity and there are
>>>> several widely deployed standards for identity (OpenID), there are
>>>> no existing mechanisms for tying identity to objects or provenance 
>>>> traces. This directly ties to the attribution of objects and 
>>>> provenance.
>>>>
>>>> Summing up, there are 4 existing gaps to realizing the News
>>>> Aggregator scenario:
>>>>
>>>> - No common standard to target for exposing and expressing 
>>>> provenance information that captures processes as well as the other 
>>>> content dimensions.
>>>> - No well-defined standard for linking provenance between sites 
>>>> (i.e. trackback but for the whole web).
>>>> - No guidance for how existing standards can be put together to
>>>> provide provenance (e.g. linking to identity).
>>>> - No guidance for how application developers should go about 
>>>> exposing provenance in their web systems.
Received on Monday, 2 August 2010 19:21:43 UTC