Re: Doppelgänger fallacy; Was Re: OA and provenance from Stian Soiland-Reyes on 2013-08-19 (public-openannotation@w3.org from August 2013)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Mon, 19 Aug 2013 10:53:44 +0100
To: Antoine Isaac <aisaac@few.vu.nl>
Cc: public-openannotation <public-openannotation@w3.org>
Message-ID: <CAPRnXtm8vwpS_6+9j-ifov1e=eQu=a9-6BVLpBz2m5zpZcz+Lw@mail.gmail.com>
This principle would indicate that 'serializedBy' should rather be
expressed on the RDF resource or graph itself - which should (at least
in use cases such as Darwin annotations) have a distinct URI from the
oa:Annotation.

e.g.

GET http://example.com/annotations/1337

Content-Type: text/turtle

# @prefix ...
<>
    oa:serializedBy <fred> ;
    foaf:primaryTopic <#ann> .

<#ann> a oa:Annotation ;
   oa:annotatedBy <the-original-annotator> .
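
To make the separation concrete, here is a minimal sketch in plain
Python (no RDF library) that treats the example above as a list of
triples and flags any resource that carries both kinds of provenance.
The absolute URIs are just the relative ones above resolved against the
request URL. The grouping of properties into "representation" and
"concept" sets and the helper name conflated_subjects are my own
assumptions for illustration, not something the OA or PAV
specifications define.

```python
# Triples as (subject, predicate, object) tuples, mirroring the Turtle above.
DOC = "http://example.com/annotations/1337"   # the RDF document (<>)
ANN = DOC + "#ann"                            # the oa:Annotation (<#ann>)

triples = [
    (DOC, "oa:serializedBy", "http://example.com/annotations/fred"),
    (DOC, "foaf:primaryTopic", ANN),
    (ANN, "rdf:type", "oa:Annotation"),
    (ANN, "oa:annotatedBy",
     "http://example.com/annotations/the-original-annotator"),
]

# Assumed grouping (illustrative only): which properties talk about the
# digital representation and which talk about the conceptual annotation.
REPRESENTATION_PROPS = {"oa:serializedBy"}
CONCEPT_PROPS = {"oa:annotatedBy"}


def conflated_subjects(triples):
    """Return the subjects that carry both kinds of provenance property."""
    props_by_subject = {}
    for subject, predicate, _ in triples:
        props_by_subject.setdefault(subject, set()).add(predicate)
    return sorted(
        subject
        for subject, props in props_by_subject.items()
        if props & REPRESENTATION_PROPS and props & CONCEPT_PROPS
    )
```

With the two-URI layout above, conflated_subjects(triples) finds
nothing to flag. If both properties were attached to <#ann> instead,
that resource would be reported. This is only the one-to-one principle
turned into a check, not a rule that OA itself imposes.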


The philosophy of PROV is to follow this to the letter: anything with
a separate provenance is a separate prov:Entity. In PAV, however, we
saw the need for 'shortcuts' to avoid having to unchain a potentially
very complex history to find actors like the authors and the digital
creator. I see oa:*By as a similar kind of shortcut - but we might not
have specified strongly enough what we mean by 'serializing' and
'annotating' - hence two different implementations might be describing
different actors for the same scenario.



On 17 August 2013 22:34, Antoine Isaac <aisaac@few.vu.nl> wrote:
> Hi Bob,
>
> That's a great name :-)
>
> In the Dublin Core community (and several others, I expect) the principle
> that one resource should not be used as a vector for describing two entities
> is often referred to as the "one-to-one principle":
> http://wiki.dublincore.org/index.php/Glossary/One-to-One_Principle
>
> Cheers,
>
> Antoine
>
>
>
>
>> Antoine-
>>
>> I take no position on whether Paolo is indeed introducing a confusion
>> between an annotation and the digital representation of the
>> annotation. But my experience has been that such confusion is very
>> common and not just about annotations, but, as you  point out, about
>> any kind of resource. The most typical case I find among natural
>> scientists is eagerness to assign the same identifier to a physical
>> object as to a digital description of it.  Paolo himself tends to warn
>> about this in talks about annotation,  using the Eiffel Tower as an
>> example.  It's a natural thing for humans to do, because in human
>> dialogue, the context  often makes clear which is under discussion.
>> But when the context doesn't, confusion ensues.  One venue where
>> confusion is \likely/ is when informaticians are discussing a case in
>> which both require treatment.
>>
>> I think the phenomenon is so pervasive that we need a short, memorable
>> name for it, especially one I can use to bludgeon my natural science
>> colleagues who think it's \helpful/ to have a single identifier for a
>> physical thing and its digital description.
>>
>> I propose to call what you remark upon  a "doppelgänger fallacy."
>>
>> I especially relish the prospect of hopping up in a talk and saying
>> "Paolo! You, of all people, have introduced a doppelgänger fallacy on
>> slide 23!"  :-)
>>
>> Bob Morris
>>
>>
>> On Fri, Aug 16, 2013 at 8:42 AM, Antoine Isaac <aisaac@few.vu.nl> wrote:
>>>
>>> Hi Paolo,
>>>
>>> I agree that the decisions on scenario 1-2-3 are entirely up to you. And
>>> the
>>> more fundamental decisions on having one annotation or one annotation and
>>> something else (another annotation or a PROV entity) also. But as long as
>>> your short-cuts have a sound grounding! If others can't understand
>>> clearly
>>> what you've done then it's a big problem ;-)
>>>
>>> As for the mappings (pav:authoredBy is a sub-property of oa:annotatedBy,
>>> pav:curatedBy a sub-property of oa:annotatedBy) I agree with you. I used
>>> "sub-property" but I meant it in a "local" context. I.e., for all triples
>>> in
>>> your annotation case, you should generate another triple of the more
>>> general
>>> property. I think what you suggest is ok.
>>>
>>>
>>> As for pav:createdBy, sentences like:
>>>
>>> "In PAV, pav:createdBy is used for the digital artifact only. While
>>> pav:authoredBy, pav:curatedBy and so on...  are for the content of the
>>> artifact. So you can have both pav:createdBy (person that created the
>>> artifact) and pav:curatedBy (person that collected and curated its
>>> content)."
>>> and:
>>>
>>> "pav:createdBy is more specific than dct:creator as it refers only to the
>>> digital artifact."
>>> are really confusing.
>>> In RDF there is one resource that is the object of the statements, and it
>>> denotes
>>> ONE entity. You can't have two properties applied to one resource, and
>>> sometimes the resource should be interpreted as the annotation, and some
>>> other times as the digital representation of the annotation.
>>>
>>> If you want to use createdBy as a shortcut, and still subject it to the
>>> oa:Annotation resource, then your definition should rather be:
>>> "pav:createdBy is used for the creator of the digital artifact that
>>> represents
>>> the resource". And then the name should be changed because it doesn't
>>> reflect the short-cut at all. It should be something like
>>> pav:hasDigitalRepresentationCreatedBy...
>>>
>>> If you want to use createdBy on a resource that is a digital
>>> representation
>>> of the oa:Annotation (so not the oa:Annotation itself) then the wording
>>> is
>>> better. But in this case, well, you can't subject it to the resource you
>>> wanted to subject it on. And there's no point in minting a property that
>>> is
>>> not dc:creator... (of course dc:creator can be applied to any resource,
>>> be it a conceptual one or a representation).
>>>
>>> Cheers,
>>>
>>> Antoine
>>>
>>>
>>>> Hi Antoine,
>>>>
>>>>
>>>> On Fri, Aug 16, 2013 at 7:12 AM, Antoine Isaac <aisaac@few.vu.nl> wrote:
>>>>
>>>>      Hi,
>>>>
>>>>      Wow, the return of a serious discussion, in an even more complex
>>>> form,
>>>> awesome ;-)
>>>>
>>>>      First a remark on Stian's comment:
>>>>
>>>>      "and the conclusion seemed to have been that it is simpler to merge
>>>> the conceptual annotation with the formalized annotation as a
>>>>      datastructure."
>>>>
>>>>      Yes, and this was about the data structure only. The annotation is
>>>> really of conceptual nature. We just allow for attributes (e.g.
>>>> oa:serializedBy) that shortcut some provenance info. A full, correct
>>>> representation has the serialization appear as a fully-fledged (PROV)
>>>> entity, distinct from the oa:Annotation, as pictured at
>>>>
>>>> http://www.openannotation.org/spec/core/appendices.html#ProvMapping
>>>>
>>>>
>>>>      Based on this indeed pav:authoredBy is a sub-property of
>>>> oa:annotatedBy (or an equivalent, in the specific context).
>>>>
>>>>
>>>> I would say it is more an equivalent as pav:authoredBy is not only for
>>>> annotations.
>>>>
>>>>
>>>>      The question next is how to handle the extra level of "digital
>>>> annotation" - the guy who captures the annotation in the system (I'll
>>>> just
>>>> focus on the "creator" aspect, the discussion is long enough, let's
>>>> ignore
>>>> "with" "at" and "on").
>>>>
>>>>      I like Jacco's and Stian's suggestions of double annotations
>>>> (whether
>>>> one is the target or the body or the other...). It is complex, but it
>>>> represents the situation quite well. In this case both are annotators.
>>>>
>>>>
>>>> This would complicate the implementation though. In some sense, I see
>>>> the
>>>> "extra level of "digital annotation"" more as extra level of provenance
>>>> so I
>>>> can stay 'compact'.
>>>> We had a similar issue with Claims representation. You have the
>>>> conceptual
>>>> Claim and then multiple embodiments of that claim in text. It is a tough
>>>> problem.
>>>>
>>>> But sure, you could think of both of them as annotators. Not sure Darwin
>>>> would like that but I cannot talk for him :)
>>>>
>>>>      An alternative is to create one annotation (oa:annotatedBy Darwin)
>>>> and
>>>> another non-annotation resource. Something similar to the PROV entity we
>>>> have for the serialization. It would represent the act of capturing an
>>>> annotation in the system, where the student plays the creator role.
>>>>
>>>>      In any case, that's two resources.
>>>>
>>>>      But as for the serialization case, you may want to have only one
>>>> resource in a 'core' solution. Two options here:
>>>>
>>>>      1. Consider that the oa:Annotation is the result of the
>>>> intellectual
>>>> work of both Darwin and the student. In this case both are the object of
>>>> an
>>>> oa:annotatedBy. I think this choice is borderline, but in a specific
>>>> application context, where students spend hard work
>>>> deciphering/interpreting
>>>> an annotation, why not?
>>>>      If you want to use pav:curatedBy still, then you would need to have
>>>> it
>>>> a sub-property of oa:annotatedBy
>>>>
>>>>
>>>> We cannot really do that as pav:curatedBy is also used for objects that
>>>> are not annotations.
>>>> What I could think of doing now is:
>>>>
>>>> <ann1>
>>>>         oa:annotatedBy <Darwin> ;
>>>>         oa:annotatedBy <Student> ;
>>>>         pav:authoredBy <Darwin> ;
>>>>         pav:curatedBy <Student> .
>>>>
>>>> It is redundant but that way the semantics is clear for both OA and PAV
>>>> and it allows OA clients to get to the provenance.
>>>> What do you think of it?
>>>>
>>>>
>>>>      2. Consider that the role of the student is minor. In this case, I
>>>> think a property with a name like pav:curatedBy still makes sense. But
>>>> it
>>>> would be a specialization of something more general, maybe
>>>> dc:contributor.
>>>> And its semantics would in fact be that of a "short-cut" for the more
>>>> complex situation where a second annotation (or a PROV entity) exists to
>>>> represent the situation at the right granularity.
>>>>
>>>>
>>>> In PAV most of the properties are short-cuts. The idea is to have a
>>>> single
>>>> object rather than a series of them. It does not solve everything, but
>>>> it
>>>> works for many use cases.
>>>> At the moment pav:curatedBy is sub-property of prov:wasAttributedTo and
>>>> also dct:contributor. So I think we are on the same page.
>>>>
>>>>
>>>>      3. Consider that the role of Darwin is minor (very borderline
>>>> maybe).
>>>> In this case the student is the oa:annotatedBy, and Darwin a mere
>>>> dc:contributor.
>>>>
>>>>
>>>> Frankly I feel uncomfortable with this approach. But it is true, it
>>>> depends on how you intend the annotation.
>>>>
>>>>
>>>>      In any case I don't think you can do anything practical with a
>>>> solution that would only have one resource of type oa:Annotation and a
>>>> short-cut property with a name like pav:createdBy. The name and
>>>> intuitive
>>>> semantics are really too close to dc:creator and oa:annotatedBy (as the
>>>> creator of the Annotation)! In fact pav:curatedBy is much better, which
>>>> is
>>>> why I think it could be defined as a short-cut in option 2 above.
>>>>
>>>>
>>>> In PAV, pav:createdBy is used for the digital artifact only. While
>>>> pav:authoredBy, pav:curatedBy and so on...  are for the content of the
>>>> artifact.
>>>> So you can have both pav:createdBy (person that created the artifact)
>>>> and
>>>> pav:curatedBy (person that collected and curated its content).
>>>>
>>>>
>>>>      Note that PAV mentions dct:createdBy as the super-property of
>>>> pav:createdBy, which to my knowledge does not exist. In fact I really
>>>> believe PAV would benefit from removing pav:createdBy. If you need it,
>>>> re-introduce it with a better name, and clearer semantics!
>>>>
>>>>
>>>> That is just a typo in the description; it is dct:creator. pav:createdBy
>>>> is
>>>> more specific than dct:creator as it refers only to the digital
>>>> artifact.
>>>>
>>>> Best,
>>>> Paolo
>>>>
>>>
>
>



-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester
http://soiland-reyes.com/stian/work/ http://orcid.org/0000-0001-9842-9718
Received on Monday, 19 August 2013 09:54:37 UTC