- From: James Myers <qqmyers@hotmail.com>
- Date: Wed, 3 Nov 2010 10:46:50 -0400
- To: <public-xg-prov@w3.org>
- Message-ID: <BAY138-DS784ECB9D3AB9D081F43C6BF4A0@phx.gbl>
All,

Paulo, Deborah, and I were able to get together for a lengthy discussion last week and made progress in identifying the spec-level technical issues that the PML community's work raises. We're hopeful that disentangling these issues from the larger body of their work to develop tools, explore rule-based inference 'processes', etc., will enable further productive discussion. My sense is that this has been harder to do with PML than with other languages due to its roots. The comments below are mine but have been discussed by the three of us (credit to Paulo and Deborah, blame for the parts you don't like to me :-) ).

The PML folks have seen a lot of benefits from including a characterization of 'process' steps that improves the ability of humans and/or machines to understand what happened. PML has been used in areas where process steps are based on inference rules ('inferencesteps' have an association with the 'inferencerules' they execute), which allows very rigorous analysis, but a similar argument applies when the characterization is more textual - i.e. text describing a step in a computational workflow. I know a number of the groups using OPM and other languages under discussion have ways of associating provenance with workflow templates, computational codes, papers, or other process explanations, but OPM does not standardize it.

When we thought about what would really need to be in a spec to allow these use cases, it seemed like a minimal requirement would be to have a relationship between 'process' steps and the recipe/template/description of the process that was executed in the step. Once such a link exists, it might also be possible/useful to standardize the association of the roles of 'artifacts' in the 'process' steps with the recipe/template.
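As an illustration only, the minimal requirement described above could be sketched as follows. Every name here (`Recipe`, `ProcessExecution`, `undeclared_roles`, the role names, and the artifact identifiers) is hypothetical and is not taken from OPM or PML; the sketch only shows a 'process' step carrying a link to the recipe it executed, plus a check that the roles of its 'artifacts' line up with the roles the recipe declares.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Recipe:
    """Hypothetical: the recipe/template/description that a step executes."""
    name: str
    roles: tuple[str, ...]  # role names the recipe declares for its artifacts


@dataclass
class ProcessExecution:
    """Hypothetical: one discrete execution of a recipe (a 'process' step)."""
    recipe: Recipe  # the step-to-recipe link discussed above
    used: dict[str, str] = field(default_factory=dict)       # role -> artifact id
    generated: dict[str, str] = field(default_factory=dict)  # role -> artifact id

    def undeclared_roles(self) -> set[str]:
        """Return roles this step uses or generates that its recipe does not declare."""
        return (set(self.used) | set(self.generated)) - set(self.recipe.roles)


# An inference-rule style recipe, in the spirit of the PML use case (assumed example).
modus_ponens = Recipe("modus-ponens", roles=("premise", "implication", "conclusion"))

step = ProcessExecution(
    recipe=modus_ponens,
    used={"premise": "artifact:1", "implication": "artifact:2"},
    generated={"conclusion": "artifact:3"},
)

# A step whose artifact roles do not match the recipe it claims to execute.
mismatched = ProcessExecution(recipe=modus_ponens, used={"axiom": "artifact:4"})
```

The same shape would apply if the recipe were a workflow template or a textual description rather than an inference rule; only the content of `Recipe` would change.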
PML has a 'source' construct which serves the purpose of representing mutable resources (people, organizations, databases, documents, etc.) from which resources of interest are extracted (a 'sourceusage' construct allows description of when, where, and how the extraction occurred). In PML, mutable resources can be 'Agents' or 'Documents'. I also understand that 'source' has a connotation of assertion/backing - it is good to know that your quote came from a NYTimes story, for example. I think I've heard echoes in other discussions of both the need for capturing the mutable-to-fixed transition (e.g. for versioning) and for documenting the idea that something is backed/asserted by some agent (e.g. that the act of publishing is special in this way). It is less clear what the best construct(s) would be here, but I think a discussion of sourcing, versioning, and publishing should be in scope for a new spec.

A third area where more discussion is needed is in finalizing the terms for the concepts that OPM labels as 'artifact' and 'process'. In data, an 'artifact' is something that needs to be removed, and 'process' doesn't convey the separation between a process and the execution of a process - a discrete invocation of it. The PML terms - 'information', 'inference step' - aren't better, but perhaps 'resource', which is already being used in the group, and something like 'process execution' or 'process step' might be alternatives.

I think the common thread among these is that, at the level of a specification, relatively straightforward modifications/additions would add value and enable more powerful interoperability across languages. I think it is interesting to note that although PML has very different roots and hence has been harder to map, in the end the core concepts do match and the desirable mods/extensions are fairly well aligned with those derived from the other provenance language comparisons.
In addition to these, which I think are the core issues, we discussed a few other things that we'd like to capture:

PML includes a way of annotating parts of artifacts (e.g. through describing offsets in text where a sentence can be found). I know there's other work going on to standardize annotation languages that do this for multiple types of media - should a W3C standard include something in this area? (Maybe as a 'profile'/extension?)

We wondered whether/how an account is different from an artifact (is it a fourth core concept or is it a special type of artifact?). This discussion came up when comparing OPM accounts and PML's capabilities for encoding accounts. Our conclusion was that accounts are important and that PML and OPM could represent the provenance of provenance just by considering an entire graph to be an 'artifact', but it was unclear to me what the consequences of making an account a type of 'artifact' might be. (Seems like the arguments parallel the NamedGraph discussions for RDF.)

Also related to accounts, we wondered whether progress could be made in thinking of accounts as 'alternate explanations' - PML tries to address the fact that there may be two sets of rules that can get you to a conclusion, and they are explicitly 'alternate explanations' and not just two 'accounts'. I think this may come back to the type of discussion of labeling accounts as 'alternatives' that we had in OPM at one point.

I guess these last two raise the potential need for further group discussion of what needs to be in the language to represent the provenance of provenance (one of our W3C requirements) and what relationships might be needed to work with them.

Cheers,
Jim

James D. Myers, Ph.D.
Director, Computational Center for Nanotechnology Innovations (CCNI)
Rensselaer Technology Park, 405 Jordan Road, Troy, NY 12180-3590 USA
Phone: 518-276-2858
Fax: 518-276-2392
E-mail: myersj4@rpi.edu
Received on Friday, 5 November 2010 13:05:04 UTC