RE: technical features from PML discussion from James Myers on 2010-11-06 (public-xg-prov@w3.org from November 2010)

From: James Myers <qqmyers@hotmail.com>
Date: Sat, 6 Nov 2010 17:31:40 -0400
To: <public-xg-prov@w3.org>
Message-ID: <Bay138-DS1205D4910596C6142A9328BF4D0@phx.gbl>
Yolanda,

 

I think they do overlap, particularly the first three. I think the 'recipe'
feature aligns with Recommendation 7 on being able to represent a procedure
which has been enacted and is tied to reproducibility as well (Rec. 5).
Similarly I think the idea of a source aligns with Rec. 1 and perhaps Rec.
6. Changing the terminology relative to OPM is also captured by Rec. 1 and I
don't think I/we recommended anything new w.r.t. the XG group - resource and
"processing step" are already in Rec. 1.

 

So - I think the biggest take-home from the last email was that the spec.
level issues derived from this side discussion are not that different. 

 

I guess I would also add that the discussion made me think that it might be
useful to not be too strict about setting the spec group goals to cover only
Recs 1, 2, and 3. While I share the concern of adding too much complexity and
trying to standardize things that are still 'researchy', I could see some
simple things like the link from process step to the underlying procedure
(without trying to standardize how such procedures are represented) might be
useful and still lightweight/non-controversial. Rather than argue for any
specific inclusion, I might just argue that a spec group should look across
Recs 4-8 and, while recognizing that fully standardizing how such
recommendations would be met is too big/complex/controversial/slow, it should
see if there are lightweight additions that would minimize the gratuitous
differences that would otherwise arise as different providers try to extend
the new standard.

 

Jim

 

 

From: public-xg-prov-request@w3.org [mailto:public-xg-prov-request@w3.org]
On Behalf Of Yolanda Gil
Sent: Friday, November 05, 2010 10:58 AM
To: James Myers
Cc: public-xg-prov@w3.org
Subject: Re: technical features from PML discussion

 

Jim,

 

Thanks.  It would be helpful to hear if any of the issues you raise are
missing from our major items in:

 

 
http://www.w3.org/2005/Incubator/prov/wiki/Recommendations_for_scenarios

 

As far as I can see, all the issues below are captured in the group's
recommendations although you suggest a different set of terms that may or
may not be better.  And yes, we have already converged as a group on the
term "resource".

 

The group also agreed to a set of priorities for those recommendations, and
some of the things you raise below were ranked as lower priority.  Are you
suggesting that we change the priorities?  That is fine, but if we do that
we will have a broader set of goals which will take the working group more
time to sort through.  Note that the incubator group is a pretty homogeneous
set of people and therefore relatively uncontroversial, and I anticipate the
working group to be more diverse and therefore take longer to converge.  We
should be careful.

 

Yolanda

 

 

On Nov 3, 2010, at 7:46 AM, James Myers wrote:





All, 
 
Paulo, Deborah, and I were able to get together for a lengthy discussion
last week and made progress in identifying the spec-level technical issues
that the PML community's work raises. We're hopeful that disentangling these
issues from the larger body of their work to develop tools, explore
rule-based inference 'processes', etc., will enable further productive
discussion. My sense is that this has been harder to do with PML than with
other languages due to its roots.
 
The comments below are mine but have been discussed by the three of us
(credit to Paulo and Deborah, blame for the parts you don't like to me :-) 
 

The PML folks have seen a lot of benefits from including a characterization
of 'process' steps that improves the ability of humans and/or machines to
understand what happened. PML has been used in areas where process steps are
based on inference rules ('inferencesteps' have an association with the
'inferencerules' they execute) which allows very rigorous analysis, but a
similar argument applies when the characterization is more textual - i.e.
text describing a step in a computational workflow. I know a number of the
groups using OPM and other languages under discussion have ways of
associating provenance with workflow templates, computational codes, papers,
or other process explanations, but OPM does not standardize it. When we
thought about what would really need to be in a spec to allow these use
cases, it seemed like a minimal requirement would be to have a relationship
between 'process' steps and the recipe/template/description of the process
that was executed in the step. Once such a link exists, it might also be
possible/useful to standardize the association of the roles of 'artifacts'
in the 'process' steps with the recipe/template. 
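
As a rough illustration of the minimal link described above, here is a sketch in Python. It is not taken from PML or OPM; the class names (Recipe, ProcessStep), the field names, and the helper steps_enacting are all hypothetical. The recipe is kept as an opaque URI, since the point is to link to it without standardizing how it is represented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Recipe:
    """Opaque pointer to a recipe/template/description (hypothetical name).

    Only the link is modeled; the recipe's own format is left open.
    """
    uri: str


@dataclass
class ProcessStep:
    """One discrete execution of a process (hypothetical name)."""
    step_id: str
    recipe: Optional[Recipe] = None
    # Role name (as named by the recipe) -> identifier of the resource
    # that played that role in this particular execution.
    used: Dict[str, str] = field(default_factory=dict)
    generated: Dict[str, str] = field(default_factory=dict)


def steps_enacting(steps: List[ProcessStep], recipe: Recipe) -> List[str]:
    """Return the ids of all steps that point at the given recipe."""
    return [s.step_id for s in steps if s.recipe == recipe]
```

With a structure like this, two executions of the same workflow template can be found by comparing their recipe links, and the role names give a place to tie 'artifacts' back to the template's inputs and outputs.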
 
PML has a 'source' construct which serves the purpose of representing
mutable resources (people, organizations, databases, documents, etc) from
which resources of interest are extracted (a 'sourceusage' construct allows
description of when, where, how the extraction occurred). In PML, mutable
resources can be 'Agents' or 'Documents'. I also understand that 'source'
has a connotation of assertion/backing - good to know that your quote came
from a NYTimes story for example. I think I've heard echoes in other
discussions of both the need for capturing the mutable to fixed transition
(e.g. for versioning) and for documenting the idea that something is
backed/asserted by some agent (e.g. that the act of publishing is special in
this way). It is less clear what the best construct(s) would be here, but I
think a discussion of sourcing, versioning, and publishing should be in
scope for a new spec.
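
To make the mutable-to-fixed transition concrete, here is a small Python sketch. It is only an assumption about how such a record might look, not PML's actual 'source' or 'sourceusage' schema; the names Source, SourceUsage, and record_extraction are hypothetical, and using a content hash to pin the extracted content is my own choice for illustration.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Source:
    """A mutable resource such as a person, database, or document."""
    source_id: str
    kind: str  # e.g. "Agent" or "Document"


@dataclass(frozen=True)
class SourceUsage:
    """Records when content was taken from a mutable source.

    The digest fixes what was extracted, even if the source changes later.
    """
    source: Source
    extracted_at: datetime
    content_digest: str


def record_extraction(source: Source, content: str, when: datetime) -> SourceUsage:
    """Create a fixed record of content extracted from a mutable source."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return SourceUsage(source=source, extracted_at=when, content_digest=digest)
```

Two extractions from the same source at different times would share the Source but carry different timestamps and, if the source changed in between, different digests, which is roughly the versioning case.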

A third area where more discussion is needed is in finalizing the terms for
concepts that OPM labels as 'artifact' and 'process'. An 'artifact' in data
is something that needs to be removed, and 'process' doesn't convey the
separation between a process and the execution of a process - a discrete
invocation of it. The PML terms - 'information', 'inference step' - aren't
better, but perhaps 'resource' that is already being used in the group and
something like 'process execution', 'process step' might be alternatives. 
 

I think the common thread among these is that, at the level of a
specification, relatively straight-forward modifications/additions would add
value and more powerful interoperability across languages. I think it is
interesting to note that although PML has very different roots and hence it
has been harder to map, in the end the core concepts do match and the
desirable mods/extensions are fairly well aligned with those derived from
the other provenance language comparisons.
 
In addition to these which I think are the core issues, we discussed a few
other things that we'd like to capture:

PML includes a way of annotating parts of artifacts (e.g. through describing
offsets in text where a sentence can be found). I know there's other work
going on to standardize annotation languages that do this for multiple types
of media - should a W3C standard include something in this area? (Maybe as a
'profile'/extension?)
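
For the text case, offset-based annotation of part of an artifact could be sketched as below. This is a hypothetical Python illustration, not PML's actual encoding; the names TextSpanAnnotation and annotated_text, and the choice of character offsets with an exclusive end, are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextSpanAnnotation:
    """Points at a span of a text artifact by character offsets."""
    artifact_id: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    note: str


def annotated_text(text: str, ann: TextSpanAnnotation) -> str:
    """Return the part of the text that the annotation refers to."""
    if not (0 <= ann.start <= ann.end <= len(text)):
        raise ValueError("annotation offsets fall outside the text")
    return text[ann.start:ann.end]
```

Other media types would need a different way of addressing a part (regions, time ranges), which is one reason this might fit better in a profile or extension.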

We wondered whether/how an account was different from an artifact (is it a
fourth core concept or is it a special type of artifact?) This discussion
was brought to the table when comparing OPM account and PML capabilities to
encode accounts. Our conclusion was that accounts are important and that PML
and OPM could represent the provenance of provenance just by considering an
entire graph to be an 'artifact', but it was unclear to me what the
consequences of making an account a type of 'artifact' might be. (Seems like
the arguments parallel the NamedGraph discussions for RDF.)
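
One way to picture the 'account as artifact' option is the Python sketch below. It is only an illustration under assumptions: the Account class and its fields are hypothetical, and the edge label "wasDerivedFrom" is borrowed from OPM purely as an example relation. The idea shown is that an account's identifier can itself appear as a node in another account, giving provenance of provenance without a fourth core concept.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# An edge is (subject, relation, object), all plain string identifiers.
Edge = Tuple[str, str, str]


@dataclass
class Account:
    """A provenance graph that is itself identified like an artifact."""
    account_id: str
    edges: List[Edge] = field(default_factory=list)

    def derived_from(self, node: str) -> List[str]:
        """Return what `node` was derived from, according to this account."""
        return [o for s, r, o in self.edges
                if s == node and r == "wasDerivedFrom"]


# An account describing how a result was produced.
inner = Account("account-1", [("result", "wasDerivedFrom", "raw-data")])

# A second account that treats the first account as an artifact.
outer = Account("account-2", [(inner.account_id, "wasDerivedFrom", "lab-notebook")])
```

Here outer.derived_from("account-1") answers a question about the first graph as a whole, while inner.derived_from("result") answers a question inside it. What this sketch does not settle is the open question above, namely what follows from treating every account this way.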
 
Also related to accounts, we wondered whether progress could be made in
thinking of accounts as 'alternate explanations' - PML tries to address the
fact that there may be two sets of rules that can get you to a conclusion
and they are explicitly 'alternate explanations' and not just two 'accounts'
- I think this may come back to the type of discussion of labeling accounts
as 'alternatives' that we had in OPM at one point.
 
I guess these last two raise the potential need for further group discussion
of what needs to be in the language to represent the provenance of
provenance (one of our W3C requirements) and what relationships might be
needed to work with them.
 
  Cheers,
 
 Jim

 

 

James D. Myers, Ph.D.

Director, Computational Center for Nanotechnology Innovations (CCNI)

Rensselaer Technology Park, 405 Jordan Road, Troy, NY 12180-3590 USA

Phone: 518-276-2858

Fax: 518-276-2392

E-mail: myersj4@rpi.edu
Received on Saturday, 6 November 2010 21:32:44 UTC