technical features from PML discussion from James Myers on 2010-11-03 (public-xg-prov@w3.org from November 2010)

From: James Myers <qqmyers@hotmail.com>
Date: Wed, 3 Nov 2010 10:46:50 -0400
To: <public-xg-prov@w3.org>
Message-ID: <BAY138-DS784ECB9D3AB9D081F43C6BF4A0@phx.gbl>
All, 
 
Paulo, Deborah, and I were able to get together for a lengthy discussion
last week and made progress in identifying the spec-level technical issues
that the PML community's work raises. We're hopeful that disentangling these
issues from the larger body of their work to develop tools, explore
rule-based inference 'processes', etc., will enable further productive
discussion. My sense is that this has been harder to do with PML than with
other languages due to its roots.
 
The comments below are mine but have been discussed by the three of us
(credit to Paulo and Deborah, blame for the parts you don't like to me :-) 
 

The PML folks have seen a lot of benefits from including a characterization
of 'process' steps that improves the ability of humans and/or machines to
understand what happened. PML has been used in areas where process steps are
based on inference rules ('inferencesteps' have an association with the
'inferencerules' they execute), which allows very rigorous analysis, but a
similar argument applies when the characterization is more textual, e.g.
text describing a step in a computational workflow. I know a number of the
groups using OPM and other languages under discussion have ways of
associating provenance with workflow templates, computational codes, papers,
or other process explanations, but OPM does not standardize such associations. When we
thought about what would really need to be in a spec to allow these use
cases, it seemed like a minimal requirement would be to have a relationship
between 'process' steps and the recipe/template/description of the process
that was executed in the step. Once such a link exists, it might also be
possible/useful to standardize the association of the roles of 'artifacts'
in the 'process' steps with the recipe/template. 
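To make the proposed link concrete, here is a minimal Python sketch, assuming a spec exposed something like a "described by" relation from a process step to its recipe/template, plus per-role bindings checked against the roles the recipe declares. The class names (ProcessDescription, ProcessStep), the field names, and the example.org URIs are hypothetical; they are not taken from OPM or PML.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProcessDescription:
    """A recipe, template, or textual description of a process (hypothetical)."""
    uri: str
    text: str
    roles: frozenset = frozenset()  # role names the description declares


@dataclass
class ProcessStep:
    """One discrete execution of a process, linked to what was executed."""
    step_id: str
    described_by: ProcessDescription
    bindings: dict = field(default_factory=dict)  # role name -> resource id

    def bind(self, role: str, resource_id: str) -> None:
        # Accept only roles that the linked description declares.
        if role not in self.described_by.roles:
            raise ValueError(
                f"role {role!r} is not declared by {self.described_by.uri}"
            )
        self.bindings[role] = resource_id


align = ProcessDescription(
    uri="http://example.org/recipes/align-reads",
    text="Align sequencing reads against a reference genome.",
    roles=frozenset({"reads", "reference", "alignment"}),
)
step = ProcessStep("urn:example:step-42", align)
step.bind("reads", "urn:example:reads-1")
step.bind("alignment", "urn:example:bam-7")
```

Rejecting undeclared roles is only one possible design choice; a spec could equally allow them and leave validation to profiles or extensions.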
 
PML has a 'source' construct which serves the purpose of representing
mutable resources (people, organizations, databases, documents, etc) from
which resources of interest are extracted (a 'sourceusage' construct allows
description of when, where, and how the extraction occurred). In PML, mutable
resources can be 'Agents' or 'Documents'. I also understand that 'source'
has a connotation of assertion/backing: it is good to know that your quote came
from a NYTimes story, for example. I think I've heard echoes in other
discussions of both the need for capturing the mutable-to-fixed transition
(e.g. for versioning) and for documenting the idea that something is
backed/asserted by some agent (e.g. that the act of publishing is special in
this way). It is less clear what the best construct(s) would be here, but I
think a discussion of sourcing, versioning, and publishing should be in
scope for a new spec.
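As one illustration of the source/sourceusage pattern described above, the following Python sketch models a mutable source and a record of when, where, and how a fixed resource was extracted from it. It mirrors the description in this message rather than PML's actual schema, and every class, field, and URI is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class SourceKind(Enum):
    # The two kinds of mutable resource mentioned for PML.
    AGENT = "Agent"
    DOCUMENT = "Document"


@dataclass(frozen=True)
class Source:
    """A mutable resource (person, organization, database, document, ...)."""
    uri: str
    kind: SourceKind


@dataclass(frozen=True)
class SourceUsage:
    """Records when, where, and how a fixed resource was extracted from a Source."""
    source: Source
    extracted: str  # identifier of the fixed resource that was obtained
    when: datetime
    where: str      # e.g. the URL that was read
    how: str        # free text or a reference to a method


def backed_by(usages, resource_id):
    """Return the sources from which `resource_id` was extracted."""
    return [u.source for u in usages if u.extracted == resource_id]


nyt = Source("http://example.org/sources/nytimes", SourceKind.DOCUMENT)
usage = SourceUsage(
    source=nyt,
    extracted="urn:example:quote-17",
    when=datetime(2010, 11, 3, 14, 0, tzinfo=timezone.utc),
    where="http://example.org/articles/story-123",
    how="quotation copied from the online article",
)
```

Keeping the usage record separate from the source is what lets one mutable source back many fixed resources, each with its own extraction details.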

A third area where more discussion is needed is in finalizing the terms for
concepts that OPM labels as 'artifact' and 'process'. In data analysis, an
'artifact' is something spurious that needs to be removed, and 'process'
doesn't convey the separation between a process and an execution of a
process, i.e. a discrete invocation of it. The PML terms ('information',
'inference step') aren't better, but perhaps 'resource', which is already
being used in the group, and something like 'process execution' or 'process
step' might be alternatives.
 

I think the common thread among these is that, at the level of a
specification, relatively straightforward modifications/additions would add
value and enable more powerful interoperability across languages. I think it is
interesting to note that although PML has very different roots and hence it
has been harder to map, in the end the core concepts do match and the
desirable mods/extensions are fairly well aligned with those derived from
the other provenance language comparisons.
 
In addition to these, which I think are the core issues, we discussed a few
other things that we'd like to capture:

PML includes a way of annotating parts of artifacts (e.g. by describing
offsets in text where a sentence can be found). I know there's other work
going on to standardize annotation languages that do this for multiple types
of media - should a W3C standard include something in this area? (Maybe as a
'profile'/extension?)
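A minimal sketch of offset-based annotation, assuming 0-based, half-open character offsets into a plain-text resource. The TextSpanAnnotation name and its fields are hypothetical and are not PML's actual vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextSpanAnnotation:
    """Points at part of a text resource by character offsets (hypothetical).

    Offsets are 0-based and half-open: the span covers text[start:end].
    """
    resource_id: str
    start: int
    end: int

    def resolve(self, text: str) -> str:
        # Refuse offsets that do not fit inside the supplied text.
        if not 0 <= self.start <= self.end <= len(text):
            raise ValueError("offsets fall outside the text")
        return text[self.start:self.end]


doc = "Provenance matters. It tells you where data came from."
second_sentence = TextSpanAnnotation("urn:example:doc-1", 20, 54)
print(second_sentence.resolve(doc))  # prints "It tells you where data came from."
```

Offsets only stay meaningful against a fixed version of the text, which ties this back to the mutable-to-fixed distinction discussed earlier.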

We wondered whether/how an account is different from an artifact (is it a
fourth core concept or is it a special type of artifact?). This question
came up when comparing OPM accounts with PML's capabilities to
encode accounts. Our conclusion was that accounts are important and that PML
and OPM could represent the provenance of provenance just by considering an
entire graph to be an 'artifact', but it was unclear to me what the
consequences of making an account a type of 'artifact' might be. (Seems like
the arguments parallel the NamedGraph discussions for RDF.)
 
Also related to accounts, we wondered whether progress could be made in
thinking of accounts as 'alternate explanations'. PML tries to address the
fact that there may be two sets of rules that can get you to a conclusion,
and these are explicitly 'alternate explanations', not just two 'accounts'.
I think this may come back to the kind of discussion about labeling accounts
as 'alternatives' that we had in OPM at one point.
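One hedged way to picture both account questions: if an account is a named set of provenance statements with its own identifier, another account can describe it (provenance of provenance, much like RDF named graphs) and can label two accounts as alternate explanations of the same conclusion. The sketch below assumes that model; 'wasDerivedFrom' echoes OPM's edge name, while 'alternateExplanationOf', 'assertedBy', and all other names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """A named set of provenance statements (hypothetical model).

    Because the account has its own identifier, other accounts can make
    statements about it, which gives one route to provenance of provenance.
    """
    account_id: str
    statements: set = field(default_factory=set)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.statements.add((subject, predicate, obj))


def explanations_of(accounts, conclusion):
    """Return sorted ids of accounts that contain a derivation of `conclusion`."""
    return sorted(
        a.account_id
        for a in accounts
        if any(s == conclusion and p == "wasDerivedFrom" for s, p, _ in a.statements)
    )


# Two rule sets that reach the same conclusion by different routes.
rules_a = Account("acct:rules-A")
rules_a.add("ex:conclusion", "wasDerivedFrom", "ex:fact-1")
rules_b = Account("acct:rules-B")
rules_b.add("ex:conclusion", "wasDerivedFrom", "ex:fact-2")

# A third account treats the first two as resources in their own right.
meta = Account("acct:meta")
meta.add("acct:rules-A", "alternateExplanationOf", "acct:rules-B")
meta.add("acct:rules-A", "assertedBy", "ex:reasoner-1")
```

Whether the explicit 'alternateExplanationOf' label belongs in the core language or in an extension is exactly the open question raised here.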
 
I guess these last two raise the potential need for further group discussion
of what needs to be in the language to represent the provenance of
provenance (one of our W3C requirements) and what relationships might be
needed to work with them.
 
  Cheers,
 
 Jim

 

 

James D. Myers, Ph.D.
Director, Computational Center for Nanotechnology Innovations (CCNI)
Rensselaer Technology Park, 405 Jordan Road, Troy, NY 12180-3590 USA
Phone: 518-276-2858
Fax: 518-276-2392
E-mail: myersj4@rpi.edu
Received on Friday, 5 November 2010 13:05:04 UTC