Re: Issue 89 - why? from Satya Sahoo on 2011-09-19 (public-prov-wg@w3.org from September 2011)

From: Satya Sahoo <satya.sahoo@case.edu>
Date: Sun, 18 Sep 2011 22:02:28 -0400
To: Graham Klyne <GK@ninebynine.org>
Cc: "Myers, Jim" <MYERSJ4@rpi.edu>, W3C provenance WG <public-prov-wg@w3.org>
Message-ID: <CAOMwk6zZuYzYLpBCVqS5K2pZrAs5Tb-fzw9wW61imFhMdu+e2g@mail.gmail.com>
Hi Graham,
My responses are interleaved.

> d1, d1v1, d1v2 are surely different things, but I don't see that modelling
> them is fundamentally different.  What sorts of things can one say about
> d1v1/d1v2 that one cannot say about d1?  And vice versa?

Thanks for correcting the typo - d1v2 and not d2v2!

In an OWL ontology, if d1 is asserted to be an instance of class "Document"
then its version information is optional. On the other hand, if d1v1 and
d1v2 are to be asserted to be instances of a class "VersionedDocument" then
a user has to associate a version number to its attribute "hasVersion".

The advantage of having a class "VersionedDocument" rather than only
"Document" is that we can now make all the kinds of assertions that Jim
wanted - "this is the latest version of the Appointment Letter", "this
version of the Appointment Letter has wrong starting date", "this version of
the Appointment Letter has the 2011 university rules on academic integrity" -
and there is a clear link between the various versions.
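
To make the above concrete, here is a minimal sketch in Python rather than
OWL. The names VersionedDocument, has_version, version_of, notes and
latest_version are hypothetical, chosen only for illustration; they are not
PROV ontology terms.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Document:
    """A document whose version information is optional."""
    name: str
    has_version: Optional[int] = None


@dataclass
class VersionedDocument(Document):
    """A document version: has_version and version_of are both required."""
    version_of: Optional[Document] = None
    notes: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject instances that lack the required version information.
        if self.has_version is None or self.version_of is None:
            raise ValueError("a VersionedDocument needs has_version and version_of")


def latest_version(versions: List[VersionedDocument], doc: Document) -> VersionedDocument:
    """Return the highest-numbered version of doc."""
    candidates = [v for v in versions if v.version_of is doc]
    return max(candidates, key=lambda v: v.has_version)


d1 = Document("Appointment Letter")
d1v1 = VersionedDocument("Appointment Letter", 1, d1, ["wrong starting date"])
d1v2 = VersionedDocument(
    "Appointment Letter", 2, d1, ["2011 university rules on academic integrity"]
)
```

With this, latest_version([d1v1, d1v2], d1) picks out d1v2, and the link from
each version back to d1 is explicit.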

A "Document" class, on the other hand, *does not need* its instances to have
version information. Hence (1) an OWL reasoner has no consistency check with
which to flag assertions that lack version information, which allows the
knowledge base to be corrupted, and (2) if there are multiple instances of
Appointment Letter with missing version information, there is very little to
distinguish one version from another - a user may have to manually review
timestamps, file names etc. (defeating the purpose of using SW technologies).
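
A rough sketch of the kind of check I have in mind, written in Python rather
than OWL. It is a plain closed-world validation over a list of records, which
is an assumption of this sketch and does not reproduce OWL reasoner
semantics; the record layout and the names missing_version and has_version
are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[object]]


def missing_version(records: List[Record]) -> List[str]:
    """Return the ids of VersionedDocument records with no has_version value."""
    return [
        str(r["id"])
        for r in records
        if r.get("type") == "VersionedDocument" and r.get("has_version") is None
    ]


knowledge_base: List[Record] = [
    {"id": "d1", "type": "Document"},  # version is optional here
    {"id": "d1v1", "type": "VersionedDocument", "has_version": 1},
    {"id": "d1v2", "type": "VersionedDocument"},  # no version: flagged
]

flagged = missing_version(knowledge_base)
```

Only the record typed as "VersionedDocument" without a version is reported;
the plain "Document" record passes because nothing is required of it.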

A similar analogy is a class "Person" versus specialized classes "Employee",
"US Senator", "Surgeon" etc.

Overall, my point is that if we decide to create an OWL ontology for PROV we
have to follow the OWL requirements.

> my point was that it doesn't prevent one from describing (say) d1v1, d1v2,
> etc. and also separately saying that they are "versions" of d1.  And it's a
> *lot* simpler than the current proposal.

Since OPMV (and even the Provenir ontology[1]) use OWL, they do exactly what
we are doing in the PROV ontology - define the "necessary" conditions for an
instance to be a member of a class.

There is no difference in how any OWL ontology will describe d1v1 and d1v2,
and the fact that they are versions of d1, in terms of "necessary" conditions
for class definition. Just to clarify, we have not discussed anything that
is not a standard way of defining an OWL ontology class (there is no PROV
ontology specific complication).

Maybe a specific example will help us understand each other's points better.
Alternatively, you (and others) are invited to attend our regular ontology
telcon at 12 noon US ET - please send me your Skype ID separately if you
would like to attend.

Thanks.

Best,
Satya

[1] http://wiki.knoesis.org/index.php/Provenir_Ontology

On Sun, Sep 18, 2011 at 5:16 PM, Graham Klyne <GK@ninebynine.org> wrote:

> On 18/09/2011 19:52, Satya Sahoo wrote:
>
>> Hi Jim and Graham,
>>
>>> If we don't distinguish at all, we have a mess - a document and a version
>>> can't be distinguished if we can't talk about fixed content and we'd then
>>> be unable to answer questions about when the document was created (with
>>> the first version or only when the text was finalized).
>>>
>>
>>
>> I believe modeling a document d1 versus modeling versions of document
>> d1v1,
>> d2v2 are two distinct notions.
>>
>
> d1, d1v1, d1v2 are surely different things, but I don't see that modelling
> them is fundamentally different.  What sorts of things can one say about
> d1v1/d1v2 that one cannot say about d1?  And vice versa?
>
>
>  The d1v1 and d2v2 are specialized (maybe
>> subclass) notions of d1. Also, modeling concepts such as d1v1, d2v2 are
>> not
>> required by all provenance applications.
>>
>>
>>> For example, OPMV avoids this whole issue by saying that the things to
>>> which provenance are applied are static [1].
>>
>> The OPMV has used the original OPM Artifact definition and hence the OPM
>> notion of "static" Artifact.
>>
>
> Certainly - my point was that it doesn't prevent one from describing (say)
> d1v1, d1v2, etc. and also separately saying that they are "versions" of d1.
>  And it's a *lot* simpler than the current proposal.
>
> #g
> --
>
>  On Sun, Sep 18, 2011 at 6:20 AM, Graham Klyne<GK@ninebynine.org>  wrote:
>>
>>  Jim,
>>>
>>>
>>> On 17/09/2011 16:15, Myers, Jim wrote:
>>>
>>>  Are you asking whether we need to distinguish between something and
>>>> 'something that can't change in some ways' to unambiguously record
>>>> provenance, or just whether frozen attributes is the best way to do
>>>> that?
>>>>
>>>> If we don't distinguish at all, we have a mess - a document and a
>>>> version
>>>> can't be distinguished if we can't talk about fixed content and we'd
>>>> then be
>>>> unable to answer questions about when the document was created (with the
>>>> first version or only when the text was finalized).
>>>>
>>>>
>>> Agreed, we need to be able to distinguish between the document and its
>>> "versions" for which some values about which we make provenance
>>> assertions
>>> are invariant.
>>>
>>>
>>>> (This is the problem with things - we don't always agree on what aspects
>>>> of a thing can change and still be recognizable as the same thing, so we
>>>> define entities for which the aspects that are important relative to the
>>>> provenance we're recording are clearly changeable or not changeable, not
>>>> open to interpretation).
>>>>
>>>> If we consider the alternatives to fixing attributes, the most obvious
>>>> would be to stick the constraint in the type/class - as we do with
>>>> document
>>>> and document-version. Either works, but you end up with a lot of type
>>>> proliferation.
>>>> 'document-version<#>-at-location<>-inEncoding<>-withEncryption<>'
>>>> is well defined relative to moving, encoding and encryption changes,
>>>> etc.
>>>> The alternative encoding is to fix the attributes. To me, the
>>>> interpretation
>>>> should be the same in both cases - a version is really a different kind
>>>> of
>>>> thing than a document even if we record it as document with a  fixed
>>>> content
>>>> attribute. (The statue and other examples make this clearer).
>>>>
>>>>
>>> I take a view that something may be a "version" of something else if it
>>> is
>>> asserted to be (*).  The important consequence of being such a "version"
>>> is
>>> that valid provenance assertions made with respect to these versions are
>>> permanent truths, and they can be said to be about some aspect of the
>>> original resource.  Beyond that, why do we need to know what are the
>>> particular constraints for a particular "version"?
>>>
>>> I guess I'm trying to dodge the philosophical minefields about what
>>> constitutes identity.  I'm more concerned with what we need as a minimum
>>> to
>>> be able to record, exchange and do useful things with provenance
>>> information.
>>>
>>> It could be that I'm missing something important here, hence my original
>>> question being phrased as "what breaks?"
>>>
>>> ...
>>>
>>> You also raise what I see as a separate issue:  "a version is really a
>>> different kind of thing than a document".  In some senses, this is almost
>>> tautologically true, but from a perspective of ontologizing, I'm not sure
>>> it's useful.  Can versions have versions (I think so).  Then we are faced
>>> with a potentially infinite regress of types, or a type that can be
>>> reflexive (if that's an allowable use) with respect to the version
>>> relationship; i.e. a type that can be both range and domain of a "has
>>> version".  To me, the latter seems to be the simpler course, unless and
>>> until we find some essential functionality that is broken in such an
>>> approach.
>>>
>>> ...
>>>
>>> (*) of course, it may be of interest to others to understand what makes
>>> something a "version" of something else, and to understand the variant
>>> and
>>> invariant elements in detail.  I'm just asking if this needs to be part
>>> of
>>> the _provenance_ discussion, or if it can be treated separately.
>>>
>>> For example, OPMV avoids this whole issue by saying that the things to
>>> which provenance are applied are static [1].  This is enough for OPMV to
>>> be
>>> useful in a significant range of applications for provenance (I
>>> understand
>>> it is used in the current UK open gov data work).  I personally think
>>> that
>>> might be too strong a constraint, but if the price of relaxing that
>>> constraint is to wade into difficult philosophical territory, then I'm
>>> not
>>> so sure it's worth it.
>>>
>>> The fact that the things OPMV describes may be different versions of some
>>> underlying thing is simply not part of this particular ontology, and it
>>> seems to work OK so far.
>>>
>>> [1] http://open-biomed.sourceforge.net/opmv/ns.html#sec-specification -
>>> see sub-section on "Artifact"
>>>
>>> #g
>>> --
>>>
>>>
>>>
>>>> -----Original Message-----
>>>>> From: public-prov-wg-request@w3.org [mailto:public-prov-wg-request@w3.org]
>>>>> On Behalf Of Graham Klyne
>>>>> Sent: Saturday, September 17, 2011 3:07 AM
>>>>> To: W3C provenance WG
>>>>> Subject: Issue 89 - why?
>>>>>
>>>>> I've been reading some of the discussion of Issue 89:
>>>>>
>>>>>    http://www.w3.org/2011/prov/track/issues/89
>>>>>
>>>>>
>>>>> which seems to my mind to be getting rather like a counting of
>>>>> angels-on-pinheads, and I wonder if we're not in danger of
>>>>> over-ontologizing here.
>>>>>
>>>>> Going back to the original issue, I see:
>>>>>
>>>>> [[
>>>>> The conceptual model defines an entity in terms of an identifier and a
>>>>> list of
>>>>> attribute-value pairs. It is indeed crucial for the asserter to
>>>>> identify
>>>>> the
>>>>> attributes that have been frozen in a given entity.
>>>>> ]]
>>>>>
>>>>> Why is it so crucial to identify what attributes have been frozen?
>>>>>
>>>>> What practical application of provenance is prevented if we don't
>>>>> require this?
>>>>>
>>>>> #g
>>>>> --
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
Received on Monday, 19 September 2011 02:03:02 UTC