Re: technical features from PML discussion from Yolanda Gil on 2010-11-05 (public-xg-prov@w3.org from November 2010)

From: Yolanda Gil <gil@isi.edu>
Date: Fri, 5 Nov 2010 07:58:00 -0700
To: James Myers <qqmyers@hotmail.com>
Cc: <public-xg-prov@w3.org>
Message-Id: <2C35B8E1-FF45-4752-87F8-FBAA05AE6015@isi.edu>
Jim,

Thanks.  It would be helpful to hear if any of the issues you raise  
are missing from our major items in:

	http://www.w3.org/2005/Incubator/prov/wiki/Recommendations_for_scenarios

As far as I can see, all the issues below are captured in the group's  
recommendations although you suggest a different set of terms that may  
or may not be better.  And yes, we have already converged as a group  
on the term "resource".

The group also agreed to a set of priorities for those  
recommendations, and some of the things you raise below were ranked as  
lower priority.  Are you suggesting that we change the priorities?   
That is fine, but if we do that we will have a broader set of goals  
which will take the working group more time to sort through.  Note  
that the incubator group is a pretty homogeneous set of people and  
therefore relatively uncontroversial, and I anticipate that the working  
group will be more diverse and therefore take longer to converge.  We  
should be careful.

Yolanda


On Nov 3, 2010, at 7:46 AM, James Myers wrote:

> All,
>
> Paulo, Deborah, and I were able to get together for a lengthy  
> discussion last week and made progress in identifying the spec-level  
> technical issues that the PML community's work raises. We're hopeful  
> that disentangling these issues from the larger body of their work  
> to develop tools, explore rule-based inference 'processes', etc.,  
> will enable further productive discussion. My sense is that this has  
> been harder to do with PML than with other languages due to its roots.
>
> The comments below are mine but have been discussed by the three of  
> us (credit to Paulo and Deborah, blame for the parts you don't like  
> to me :-)
>
> The PML folks have seen a lot of benefits from including a  
> characterization of 'process' steps that improves the ability of  
> humans and/or machines to understand what happened. PML has been used  
> in areas where process steps are based on inference rules  
> ('inferencesteps' have an association with the 'inferencerules' they  
> execute), which allows very rigorous analysis, but a similar argument  
> applies when the characterization is more textual - i.e. text  
> describing a step in a computational workflow. I know a number of  
> the groups using OPM and other languages under discussion have ways  
> of associating provenance with workflow templates, computational  
> codes, papers, or other process explanations, but OPM does not  
> standardize it. When we thought about what would really need to be  
> in a spec to allow these use cases, it seemed like a minimal  
> requirement would be to have a relationship between 'process' steps  
> and the recipe/template/description of the process that was executed  
> in the step. Once such a link exists, it might also be  
> possible/useful to standardize the association of the roles of 'artifacts' in  
> the 'process' steps with the recipe/template.
>
> PML has a 'source' construct which serves the purpose of  
> representing mutable resources (people, organizations, databases,  
> documents, etc) from which resources of interest are extracted (a  
> 'sourceusage' construct allows description of when, where, how the  
> extraction occurred). In PML, mutable resources can be 'Agents' or  
> 'Documents'. I also understand that 'source' has a connotation of  
> assertion/backing - good to know that your quote came from a NYTimes  
> story for example. I think I've heard echoes in other discussions of  
> both the need for capturing the mutable to fixed transition (e.g.  
> for versioning) and for documenting the idea that something is  
> backed/asserted by some agent (e.g. that the act of publishing is  
> special in this way). It is less clear what the best construct(s)  
> would be here, but I think a discussion of sourcing, versioning, and  
> publishing should be in scope for a new spec.
>
> A third area where more discussion is needed is in finalizing the  
> terms for concepts that OPM labels as 'artifact' and 'process'. An  
> 'artifact' in data is something that needs to be removed, and  
> 'process' doesn't convey the separation between a process and the  
> execution of a process - a discrete invocation of it. The PML terms  
> - 'information', 'inference step' - aren't better, but perhaps  
> 'resource' that is already being used in the group and something  
> like 'process execution', 'process step' might be alternatives.
>
> I think the common thread among these is that, at the level of a  
> specification, relatively straightforward modifications/additions  
> would add value and more powerful interoperability across languages.  
> I think it is interesting to note that although PML has very  
> different roots and hence it has been harder to map, in the end the  
> core concepts do match and the desirable mods/extensions are fairly  
> well aligned with those derived from the other provenance language  
> comparisons.
>
> In addition to these which I think are the core issues, we discussed  
> a few other things that we'd like to capture:
>
> PML includes a way of annotating parts of artifacts (e.g. through  
> describing offsets in text where a sentence can be found). I know  
> there's other work going on to standardize annotation languages that  
> do this for multiple types of media - should a W3C standard include  
> something in this area? (Maybe as a 'profile'/extension?)
>
> We wondered whether/how an account is different from an artifact (is  
> it a fourth core concept or is it a special type of artifact?) This  
> discussion was brought to the table when comparing OPM account and  
> PML capabilities to encode accounts. Our conclusion was that  
> accounts are important and that PML and OPM could represent the  
> provenance of provenance just by considering an entire graph to be  
> an 'artifact', but it was unclear to me what the consequences of  
> making an account a type of 'artifact' might be. (Seems like the  
> arguments parallel the NamedGraph discussions for RDF.)
>
> Also related to accounts, we wondered whether progress could be  
> made in thinking of accounts as 'alternate explanations' - PML tries  
> to address the fact that there may be two sets of rules that can get  
> you to a conclusion and they are explicitly 'alternate explanations'  
> and not just two 'accounts' - I think this may come back to the type  
> of discussion of labeling accounts as 'alternatives' that we had in  
> OPM at one point.
>
> I guess these last two raise the potential need for further group  
> discussion of what needs to be in the language to represent the  
> provenance of provenance (one of our W3C requirements) and what  
> relationships might be needed to work with them.
>
>   Cheers,
>
>  Jim
>
>
>
> James D. Myers, Ph.D.
> Director, Computational Center for Nanotechnology Innovations (CCNI)
> Rensselaer Technology Park, 405 Jordan Road, Troy, NY 12180-3590 USA
> Phone: 518-276-2858
> Fax: 518-276-2392
> E-mail: myersj4@rpi.edu
>
Received on Friday, 5 November 2010 14:59:05 UTC