- From: James Myers <qqmyers@hotmail.com>
- Date: Sat, 6 Nov 2010 17:31:40 -0400
- To: <public-xg-prov@w3.org>
- Message-ID: <Bay138-DS1205D4910596C6142A9328BF4D0@phx.gbl>
Yolanda,

I think they do overlap, particularly the first three. I think the 'recipe' feature aligns with Recommendation 7 on being able to represent a procedure which has been enacted and is tied to reproducibility as well (Rec. 5). Similarly I think the idea of a source aligns with Rec. 1 and perhaps Rec. 6. Changing the terminology relative to OPM is also captured by Rec. 1 and I don't think I/we recommended anything new w.r.t. the XG group - resource and "processing step" are already in Rec. 1. So - I think the biggest take-home from the last email was that the spec. level issues derived from this side discussion are not that different.

I guess I would also add that the discussion made me think that it might be useful to not be too strict about setting the spec group goals to cover only Recs 1, 2, and 3. While I share the concern of adding too much complexity and trying to standardize things that are still 'researchy', I could see that some simple things like the link from process step to the underlying procedure (without trying to standardize how such procedures are represented) might be useful and still lightweight/non-controversial. Rather than argue for any specific inclusion, I might just argue that a spec group should look across Recs 4-8 and, while recognizing that fully standardizing how such recommendations would be met is too big/complex/controversial/slow, it should see if there are lightweight additions that would minimize the gratuitous differences that would otherwise arise as different providers try to extend the new standard.

Jim

From: public-xg-prov-request@w3.org [mailto:public-xg-prov-request@w3.org] On Behalf Of Yolanda Gil
Sent: Friday, November 05, 2010 10:58 AM
To: James Myers
Cc: public-xg-prov@w3.org
Subject: Re: technical features from PML discussion

Jim,

Thanks.
It would be helpful to hear if any of the issues you raise are missing from our major items in:

http://www.w3.org/2005/Incubator/prov/wiki/Recommendations_for_scenarios

As far as I can see, all the issues below are captured in the group's recommendations although you suggest a different set of terms that may or may not be better. And yes, we have already converged as a group on the term "resource".

The group also agreed to a set of priorities for those recommendations, and some of the things you raise below were ranked as lower priority. Are you suggesting that we change the priorities? That is fine, but if we do that we will have a broader set of goals which will take the working group more time to sort through. Note that the incubator group is a pretty homogeneous set of people and therefore relatively uncontroversial, and I anticipate the working group to be more diverse and therefore take longer to converge. We should be careful.

Yolanda

On Nov 3, 2010, at 7:46 AM, James Myers wrote:

All,

Paulo, Deborah, and I were able to get together for a lengthy discussion last week and made progress in identifying the spec-level technical issues that the PML community's work raises. We're hopeful that disentangling these issues from the larger body of their work to develop tools, explore rule-based inference 'processes', etc., will enable further productive discussion. My sense is that this has been harder to do with PML than with other languages due to its roots. The comments below are mine but have been discussed by the three of us (credit to Paulo and Deborah, blame for the parts you don't like to me :-)

The PML folks have seen a lot of benefits from including a characterization of 'process' steps that improves the ability for humans and/or machines to understand what happened.
PML has been used in areas where process steps are based on inference rules ('inferencesteps' have an association with the 'inferencerules' they execute), which allows very rigorous analysis, but a similar argument applies when the characterization is more textual - i.e. text describing a step in a computational workflow. I know a number of the groups using OPM and other languages under discussion have ways of associating provenance with workflow templates, computational codes, papers, or other process explanations, but OPM does not standardize it. When we thought about what would really need to be in a spec to allow these use cases, it seemed like a minimal requirement would be to have a relationship between 'process' steps and the recipe/template/description of the process that was executed in the step. Once such a link exists, it might also be possible/useful to standardize the association of the roles of 'artifacts' in the 'process' steps with the recipe/template.

PML has a 'source' construct which serves the purpose of representing mutable resources (people, organizations, databases, documents, etc.) from which resources of interest are extracted (a 'sourceusage' construct allows description of when, where, and how the extraction occurred). In PML, mutable resources can be 'Agents' or 'Documents'. I also understand that 'source' has a connotation of assertion/backing - good to know that your quote came from a NYTimes story, for example. I think I've heard echoes in other discussions of both the need for capturing the mutable-to-fixed transition (e.g. for versioning) and for documenting the idea that something is backed/asserted by some agent (e.g. that the act of publishing is special in this way). It is less clear what the best construct(s) would be here, but I think a discussion of sourcing, versioning, and publishing should be in scope for a new spec.
A third area where more discussion is needed is in finalizing the terms for concepts that OPM labels as 'artifact' and 'process'. An 'artifact' in data is something that needs to be removed, and 'process' doesn't convey the separation between a process and the execution of a process - a discrete invocation of it. The PML terms - 'information', 'inference step' - aren't better, but perhaps 'resource', which is already being used in the group, and something like 'process execution' or 'process step' might be alternatives.

I think the common thread among these is that, at the level of a specification, relatively straightforward modifications/additions would add value and more powerful interoperability across languages. I think it is interesting to note that although PML has very different roots and hence it has been harder to map, in the end the core concepts do match and the desirable mods/extensions are fairly well aligned with those derived from the other provenance language comparisons.

In addition to these, which I think are the core issues, we discussed a few other things that we'd like to capture:

PML includes a way of annotating parts of artifacts (e.g. through describing offsets in text where a sentence can be found). I know there's other work going on to standardize annotation languages that do this for multiple types of media - should a W3C standard include something in this area? (Maybe as a 'profile'/extension?)

We wondered whether/how an account was different from an artifact (is it a fourth core concept or is it a special type of artifact?). This discussion was brought to the table when comparing OPM accounts and PML capabilities to encode accounts. Our conclusion was that accounts are important and that PML and OPM could represent the provenance of provenance just by considering an entire graph to be an 'artifact', but it was unclear to me what the consequences of making an account a type of 'artifact' might be.
(Seems like the arguments parallel the NamedGraph discussions for RDF.)

Also related to accounts, we wondered whether progress could be made in thinking of accounts as 'alternate explanations' - PML tries to address the fact that there may be two sets of rules that can get you to a conclusion and they are explicitly 'alternate explanations' and not just two 'accounts' - I think this may come back to the type of discussion of labeling accounts as 'alternatives' that we had in OPM at one point.

I guess these last two raise the potential need for further group discussion of what needs to be in the language to represent the provenance of provenance (one of our W3C requirements) and what relationships might be needed to work with them.

Cheers,
Jim

James D. Myers, Ph.D.
Director, Computational Center for Nanotechnology Innovations (CCNI)
Rensselaer Technology Park, 405 Jordan Road, Troy, NY 12180-3590 USA
Phone: 518-276-2858
Fax: 518-276-2392
E-mail: myersj4@rpi.edu
Received on Saturday, 6 November 2010 21:32:44 UTC