Re: gap analysis from Yolanda Gil on 2010-08-02 (public-xg-prov@w3.org from August 2010)

From: Yolanda Gil <gil@isi.edu>
Date: Mon, 2 Aug 2010 11:36:23 -0700
To: Paul Groth <pgroth@gmail.com>
Cc: Luc Moreau <L.Moreau@ecs.soton.ac.uk>, "public-xg-prov@w3.org" <public-xg-prov@w3.org>
Message-Id: <D714BFA6-E7AF-4587-B61C-10E5C97E3BA7@isi.edu>

Paul:

Thanks for getting this started. I agree with your response below,
and with Luc's original point about organizing the gaps by the dimensions.

I wonder if we can make things crisper and more targeted to the
scenario. By that I mean that the 4 gaps you mention apply just as
well to the Disease Outbreak (DO) Scenario. Perhaps the gaps that are
unique to this scenario are things like:
1) the identity/derivation of information as it is disseminated in
the blogosphere and the Twittersphere,
2) the scale of the provenance records to be processed (in DO we
would have much smaller records),
3) access to the information, in terms of how to find the provenance
of each item that needs to be checked,
4) dealing with imperfections in provenance, and
5) making trust judgements based on provenance, since information
quality will vary.

So I think your 4 gaps below exist in this scenario, but are much more  
central to the DO scenario.  Conversely, I think the 5 gaps above  
exist in the DO scenario but are much more central to this one.

Yolanda



On Aug 2, 2010, at 4:37 AM, Paul Groth wrote:

> Luc,
>
> Thanks for the comments. I agree with you that organizing the gaps
> according to the dimensions (Use/Management) is good. The dimensions
> have provided a nice organizational tool for connecting the work of
> all the groups.
>
> So introducing querying is interesting. The question is: do you need
> a query mechanism to solve the News Aggregator Scenario, or is simply
> exposing the data enough? For example, imagine I marked up a web page
> with provenance information in RDFa, and that markup had links to
> other provenance information. Would I need a query API for that? I
> think you could solve that without a query API; you would just need a
> crawler.
>
> What do you think?
>
> Paul
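Paul's suggestion that a crawler could stand in for a query API can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the RDFa on each page has already been reduced to a list of links pointing at other provenance information; fetching pages over HTTP and parsing RDFa are deliberately left out. The function name `crawl_provenance`, the `ToyWeb` alias, and the example.org URLs are invented for illustration and are not part of any proposal in this thread.

```python
from collections import deque

# Hypothetical stand-in for the web: each page URL maps to the provenance
# links assumed to have been extracted from its RDFa markup. A real crawler
# would fetch each URL and parse the RDFa; both steps are omitted here.
ToyWeb = dict[str, list[str]]


def crawl_provenance(web: ToyWeb, start: str, max_pages: int = 1000) -> list[str]:
    """Breadth-first traversal of provenance links starting at `start`.

    Returns page URLs in the order visited. Each page is visited at most
    once, so cycles in the link graph cannot cause an infinite loop.
    """
    seen = {start}
    order: list[str] = []
    queue = deque([start])
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in web.get(url, []):  # pages with no known links yield nothing
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Illustrative link graph; the tweet links back to the story, forming a cycle.
web: ToyWeb = {
    "http://news.example.org/story": [
        "http://blog.example.org/post",
        "http://photos.example.org/img1",
    ],
    "http://blog.example.org/post": ["http://twitter.example.org/tweet42"],
    "http://photos.example.org/img1": [],
    "http://twitter.example.org/tweet42": ["http://news.example.org/story"],
}

# Visits the story, the blog post, the photo, then the tweet; the tweet's
# link back to the story is skipped because the story was already seen.
print(crawl_provenance(web, "http://news.example.org/story"))
```

The `seen` set and the `max_pages` cap are choices of this sketch: they keep the traversal finite when pages cite each other, at the cost of needing to walk the link graph rather than ask a single endpoint.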
>
>
>
>
> Luc Moreau wrote:
>> Thanks Paul for this proposal for the gap analysis.
>> Twice you mention 'exposing', and I thought we could introduce
>> 'querying' provenance too.
>>
>> Also, maybe the gaps could be structured into content vs. APIs.
>> Like this, maybe.
>>
>>
>> Content:
>> - No common standard for expressing provenance information that  
>> captures processes as well as the other content dimensions.
>> - No guidance for how existing standards can be put together to  
>> provide provenance (e.g. linking to identity).
>>
>> APIs (or protocols):
>> - No common API for obtaining/querying provenance information
>> - No guidance for how application developers should go about  
>> exposing provenance in their web systems.
>> - No well-defined standard for linking provenance between sites  
>> (i.e. trackback but for the whole web).
>>
>>
>> I also wondered whether they should be structured according to the  
>> provenance dimensions (so instead of API, break
>> this into Use/Management).
>>
>> Luc
>>
>>
>>
>> On 08/02/2010 12:04 PM, Paul Groth wrote:
>>> Hi All,
>>>
>>> As discussed at last week's telecon, I came up with some ideas
>>> about the gaps that need to be filled to realize the News
>>> Aggregator Scenario. I've put these in the wiki and appended them
>>> below to help start the discussion. Let me know what you think.
>>>
>>> Gap Analysis: News Aggregator
>>>
>>> For each step within the News Aggregator scenario, there are
>>> existing technologies or relevant research that could solve that
>>> step. For example, one can properly insert licensing information
>>> into a photo using a Creative Commons license and the Extensible
>>> Metadata Platform (XMP). One can track the origin of tweets either
>>> through retweets or using extraction technologies within Twitter.
>>> However, the problem is that across multiple sites there is no
>>> common format or API to access and understand provenance
>>> information, whether it is explicitly or implicitly determined. To
>>> inquire about retweets or about trackbacks, one needs to use
>>> different APIs and understand different formats. Furthermore,
>>> there is no (widely deployed) mechanism to point to provenance
>>> information on another site. For example, once a tweet has been
>>> traced back to its earliest appearance on Twitter, there is no way
>>> to follow where it came from beyond Twitter.
>>>
>>> Most systems do not document which software changed the data or
>>> what that software did to it. However, there are existing
>>> technologies that allow this to be documented. For example, in a
>>> domain-specific setting, XMP allows the transformations of images
>>> to be documented. More general formats such as OPM (the Open
>>> Provenance Model) and PML (the Proof Markup Language) can express
>>> this but are not currently widely deployed.
>>>
>>> Finally, while many sites provide for identity and there are
>>> several widely deployed standards for identity (e.g. OpenID), there
>>> are no existing mechanisms for tying identity to objects or
>>> provenance traces. This bears directly on the attribution of
>>> objects and provenance.
>>>
>>> In summary, there are 4 existing gaps to realizing the News
>>> Aggregator scenario:
>>>
>>> - No common standard to target for exposing and expressing
>>> provenance information that captures processes as well as the
>>> other content dimensions.
>>> - No well-defined standard for linking provenance between sites
>>> (i.e. trackback but for the whole web).
>>> - No guidance for how existing standards can be put together to
>>> provide provenance (e.g. linking to identity).
>>> - No guidance for how application developers should go about
>>> exposing provenance in their web systems.
Received on Monday, 2 August 2010 18:38:02 UTC