PROV-ISSUE-405 (Feedback_AI): Feedback on the mapping from Antoine Isaac (DC community) [Mapping PROV-O to Dublin Core] from Provenance Working Group Issue Tracker on 2012-06-09 (public-prov-wg@w3.org from June 2012)

From: Provenance Working Group Issue Tracker <sysbot+tracker@w3.org>
Date: Sat, 09 Jun 2012 19:14:15 +0000
To: public-prov-wg@w3.org
Message-Id: <E1SdR71-0003ve-LO@tibor.w3.org>


http://www.w3.org/2011/prov/track/issues/405

Raised by: Daniel Garijo
On product: Mapping PROV-O to Dublin Core

====== PROV reference

It seems that you are using really "fresh" documents from the PROV working group. E.g. the property prov:generatedAtTime can be found in
http://dvcs.w3.org/hg/prov/raw-file/default/ontology/Overview.html
but not in the latest official working draft
http://www.w3.org/TR/prov-o/
Putting the reference to the latest draft in your docs could be handy!

====== Dublin Core as a Simple Provenance Vocabulary

I'm uncomfortable with the strict categorization of elements into "descriptive" and "provenance" metadata. For some elements, it is questionable whether they belong to one or the other. You've already addressed many doubts, but maybe you should acknowledge that your categorization is not "hard", or, if it is, give more rationale for the questionable elements...
My personal list:
- hasPart, isPartOf: Perhaps isPartOf does often have a provenance flavor, especially when it's used to link an element of a collection to that collection. But I'd argue many of their uses can be descriptive, especially for hasPart. Unless you consider a mereological description of objects (typical example: a car having wheels) to be always about provenance?
- conformsTo, rights and accessRights may reflect provenance info (though it is "derived")
- accrual properties: I wonder whether all should be in (accrualPolicy seems interesting for provenance) or out (accrualMethod could be questioned). But a mixed position seems strange.

By the way, method-wise, should there be a strict correspondence between the elements in the "provenance" category and the ones that are mapped to a PROV element in the direct mapping?
What does it say about a given element if it's in the "provenance" category but is not mapped to PROV?

Other comment:
[
It can be questioned if a resource changes by being published, however, we consider the publication as an action that affects the state of the resource and therefore it is relevant for the provenance.
]
-> if provenance is about "where does an object come from", then this one is a no-brainer!

====== Basic considerations

[
if a specialization of a document is generated by one activity and a specialization is used by a different activity later in time,
]
-> What does "specialization" mean, in practice? I know that it is a notion from PROV, but the word is highly ambiguous; a Primer would benefit from a (short) explanation here.

By the way, you yourselves are using "specialization" for something else (the extension of PROV for handling DC "nuances").

====== What is ex:doc1?

[
it is semantically incorrect to have several activities that all generate the same entity at different points in time.
]
-> Please cite the PROV context explicitly here!
Many people (I'd expect most) will gladly accept that several activities contribute to the realization of the same resource, even in a FRBR or CIDOC-CRM context, which many already see as (too) fine-grained models.
By the way, I think later you do try to relate to simpler approaches, so that must mean you think it is *not* semantically incorrect ;-)
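To make the PROV context of the quoted sentence concrete: assuming the uniqueness-of-generation constraint in the PROV-CONSTRAINTS draft (an entity has at most one generating activity), a minimal Python sketch can check a graph of plain (subject, predicate, object) tuples against it. Apart from ex:doc1, all names below (ex:writing, ex:publishing, ex:doc1_draft, ex:doc1_published) are hypothetical, chosen only to contrast the "simple" reading with the specialization-based one.

```python
from collections import defaultdict


def generation_conflicts(triples):
    """Return entities generated by more than one activity.

    Assumes PROV's uniqueness-of-generation constraint: each entity
    has at most one generating activity.
    """
    generators = defaultdict(set)
    for s, p, o in triples:
        if p == "prov:wasGeneratedBy":
            generators[s].add(o)
    return {e: acts for e, acts in generators.items() if len(acts) > 1}


# "Simple" reading: writing and publishing both generate ex:doc1.
# ex:writing and ex:publishing are hypothetical activity names.
NAIVE = {
    ("ex:doc1", "prov:wasGeneratedBy", "ex:writing"),
    ("ex:doc1", "prov:wasGeneratedBy", "ex:publishing"),
}

# Specialization-based reading: each activity generates its own
# specialization of ex:doc1 (hypothetical names again).
SPECIALIZED = {
    ("ex:doc1_draft", "prov:wasGeneratedBy", "ex:writing"),
    ("ex:doc1_draft", "prov:specializationOf", "ex:doc1"),
    ("ex:doc1_published", "prov:wasGeneratedBy", "ex:publishing"),
    ("ex:doc1_published", "prov:specializationOf", "ex:doc1"),
}

print(generation_conflicts(NAIVE))
print(generation_conflicts(SPECIALIZED))
```

The first graph is flagged for ex:doc1 with both activities; the second passes, at the cost of two extra entities linked by prov:specializationOf, which is exactly the modelling overhead discussed here.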

====== Direct mappings

dct:date rdfs:subPropertyOf prov:generatedAtTime .
This seems dubious: dct:valid is a sub-property of dct:date, which means that it is also a sub-property of prov:generatedAtTime. You correctly represent this in the mapping document, btw. But I'm quite sure this relation does not hold in general.
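The entailment problem can be sketched in a few lines of Python, assuming only the two subPropertyOf axioms discussed above and applying the RDFS rules for sub-property transitivity (rdfs5) and property inheritance (rdfs7). Triples are plain tuples, and the date value is hypothetical.

```python
from collections import defaultdict

# Only the two axioms discussed above: dct:valid under dct:date (DCMI terms),
# and dct:date under prov:generatedAtTime (the mapping under review).
SUB_PROPERTY_OF = [
    ("dct:valid", "dct:date"),
    ("dct:date", "prov:generatedAtTime"),
]


def super_properties(prop, axioms):
    """All properties that `prop` is transitively a sub-property of (rule rdfs5)."""
    parents = defaultdict(set)
    for sub, sup in axioms:
        parents[sub].add(sup)
    found, stack = set(), [prop]
    while stack:
        for sup in parents[stack.pop()]:
            if sup not in found:
                found.add(sup)
                stack.append(sup)
    return found


def entail(triples, axioms):
    """Rule rdfs7: (s p o) and p subPropertyOf q entail (s q o)."""
    out = set(triples)
    for s, p, o in triples:
        for q in super_properties(p, axioms):
            out.add((s, q, o))
    return out


# Hypothetical data: a validity date for ex:doc1, not a creation date.
data = {("ex:doc1", "dct:valid", "2012-12-31")}
for triple in sorted(entail(data, SUB_PROPERTY_OF)):
    print(triple)
```

The entailed set contains ("ex:doc1", "prov:generatedAtTime", "2012-12-31"): a validity date ends up being read as a generation time.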

dct:rightsHolder rdfs:subPropertyOf prov:wasAttributedTo .
This also seems strange at first sight. Looking at the definition for dct:rightsHolder:
"A person or organization owning or managing rights over the resource." This may include an institution that manages or stores a resource on behalf of its creator, or anyone who "owns" the resource.
I think this is compatible with PROV's super-vague meaning of attribution ("Attribution is the ascribing of an entity to an agent.", http://www.w3.org/TR/prov-dm/). But that's quite a stretch from what many Dublin Core readers will understand by "attribution". Perhaps you could give some explanation!

======= PROV Specializations (and rationale for complex mappings)

The constructs introduced and their mapping to PROV seem ok.
But I think you could add one sentence about the rationale for these specializations. I understand the need to "properly reflect the meaning of the Dublin Core terms". Yet, do we need to go for a solution that results in having the complexity of PROV patterns next to the semantic distinctions made in DC? We could just as well keep DC's granularity in terms of patterns, i.e., use the simple mappings between DC properties and the related "short-cut properties" in the PROV patterns (e.g., prov:wasAttributedTo).

This of course relates to the rationale for having complex mappings in the first place. PROV offers several options in terms of granularity, especially finer or coarser distinctions for linking agents to entities. For the same "creation data", PROV can represent direct links between persons and the created resource (prov:wasAttributedTo), links between persons and resources via an Activity (prov:wasAssociatedWith), and links between persons and an Activity via Roles.

Having all of these levels of granularity at once is probably more harmful than beneficial, given the complexity of the PROV pattern in general (especially with "specializations"!). Or are the complex mappings just an *option* you provide? If yes, a small paragraph elaborating on this would be useful for your primer. In fact, it may be enough to gather some sentences you already have scattered in different sections.
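A minimal sketch of the three levels of granularity for the same "creation data", written as Python tuples. The names ex:alice, ex:creation, ex:author and _:assoc are hypothetical; prov:qualifiedAssociation, prov:agent and prov:hadRole are assumed to be the PROV-O terms for the role-level pattern. The collapse function reduces the two finer levels to the prov:wasAttributedTo short-cut. It is a modelling shortcut, not a PROV inference: the PROV-CONSTRAINTS draft is understood to infer generation and association from attribution, not the reverse.

```python
# Hypothetical names: ex:alice, ex:creation, ex:author, _:assoc.

# Level 1: direct short-cut between the resource and the person.
SHORTCUT = {("ex:doc1", "prov:wasAttributedTo", "ex:alice")}

# Level 2: resource -> activity -> person.
VIA_ACTIVITY = {
    ("ex:doc1", "prov:wasGeneratedBy", "ex:creation"),
    ("ex:creation", "prov:wasAssociatedWith", "ex:alice"),
}

# Level 3: the association is itself qualified with a role.
VIA_ROLE = {
    ("ex:doc1", "prov:wasGeneratedBy", "ex:creation"),
    ("ex:creation", "prov:qualifiedAssociation", "_:assoc"),
    ("_:assoc", "prov:agent", "ex:alice"),
    ("_:assoc", "prov:hadRole", "ex:author"),
}


def agents_of(activity, triples):
    """Agents linked to an activity, directly or via a qualified association."""
    direct = {o for s, p, o in triples if s == activity and p == "prov:wasAssociatedWith"}
    assocs = {o for s, p, o in triples if s == activity and p == "prov:qualifiedAssociation"}
    qualified = {o for s, p, o in triples if s in assocs and p == "prov:agent"}
    return direct | qualified


def collapse(triples):
    """Reduce levels 2 and 3 to level-1 short-cut links (drops activity and role)."""
    out = {(s, p, o) for s, p, o in triples if p == "prov:wasAttributedTo"}
    for entity, p, activity in triples:
        if p == "prov:wasGeneratedBy":
            for agent in agents_of(activity, triples):
                out.add((entity, "prov:wasAttributedTo", agent))
    return out


for name, graph in [("shortcut", SHORTCUT), ("activity", VIA_ACTIVITY), ("role", VIA_ROLE)]:
    print(name, len(graph), "triples ->", sorted(collapse(graph)))
```

All three graphs (1, 2 and 4 triples) collapse to the same single short-cut triple, which gives a rough measure of how much structure the finer levels add for one and the same fact.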

======= Complex mappings, Stage 1

[
A lot of blank nodes are created, however, keep in mind that we envision a second stage that relates them and provides stable URIs for the entities.
]
-> Certainly not everyone will be ready to create and maintain URIs for all the entity/activity/role splitting in the PROV pattern. What is the application scenario for this? I guess it would depend. So maybe at this stage it's safer to say that some applications would create URIs and some would keep blank nodes. And of course many others won't use the more complex mappings at all.

Other comments:

- I don't get why you opted for a simpler mapping pattern for "Entity/Entity (How)". It's quite equivalent to the sub-property mappings you have in the "Direct mappings" section. According to the PROV model, for a simple "version" link you can create one or several creation activities, as well as half a dozen "in" and "out" views/specializations of the document, each playing a different role in these activities.
I understand you would want a simple mapping (so do I), but in this Primer perhaps you should explain a bit more clearly why you made that choice here, as opposed to the more complex mappings listed before this one.

- Is prov:Entity provided with any specific semantics? If not, then perhaps you can remove the explicit rdf:type that links to it. That would make the example graphs simpler.
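If the PROV-O properties used in the examples carry rdfs:domain prov:Entity declarations (assumed here for prov:wasGeneratedBy, prov:wasAttributedTo and prov:wasDerivedFrom), an RDFS reasoner derives the type anyway via rule rdfs2. A minimal Python sketch, with ex:alice as a hypothetical agent:

```python
# Assumed rdfs:domain declarations for a few PROV-O properties.
DOMAINS = {
    "prov:wasGeneratedBy": "prov:Entity",
    "prov:wasAttributedTo": "prov:Entity",
    "prov:wasDerivedFrom": "prov:Entity",
}


def infer_types(triples, domains):
    """Rule rdfs2: (s p o) and p rdfs:domain C entail (s rdf:type C)."""
    return {(s, "rdf:type", domains[p]) for s, p, o in triples if p in domains}


# ex:alice is a hypothetical agent.
graph = {("ex:doc1", "prov:wasAttributedTo", "ex:alice")}
explicit = ("ex:doc1", "rdf:type", "prov:Entity")
print(explicit in infer_types(graph, DOMAINS))  # True
```

Consumers without any RDFS reasoning would still need the explicit triple, so dropping it is a trade-off rather than a free simplification.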

====== Conflating PROV specializations

I understand that stage 2 of the complex mapping will "merge" many of the "ins" and "outs" nodes of individual activities. This should already be an improvement over the extreme atomization that is currently carried out. I'm looking forward to seeing the details!

However, it seems this will still result in one entity being specialized into at least as many "versions" as there will be activities. I expect many in our community will just hate having that. In fact that could be smartly related to modeling distinctions such as the ones made in FRBR.
But then (or even without it) we run into the kind of problems denounced here: http://blogs.ecs.soton.ac.uk/webteam/2010/09/02/the-modeler/ ;-)

In this respect, it would be a good idea to at least make these specialization distinctions *optional*. Is it really not possible to have several activities carried out on a single entity instance, say, ex:doc1 in your example?

======= [end]

Received on Saturday, 9 June 2012 19:14:23 UTC