PROV-ISSUE-195: Section 5.3.3.3 (PROV-DM as on Dec 5) from Provenance Working Group Issue Tracker on 2011-12-07 (public-prov-wg@w3.org from December 2011)

From: Provenance Working Group Issue Tracker <sysbot+tracker@w3.org>
Date: Wed, 07 Dec 2011 02:14:22 +0000
To: public-prov-wg@w3.org
Message-Id: <E1RY71a-0007Mh-R1@tibor.w3.org>

PROV-ISSUE-195: Section 5.3.3.3 (PROV-DM as on Dec 5)

http://www.w3.org/2011/prov/track/issues/195

Raised by: Satya Sahoo
On product:

Hi,
The following are my comments for Section 5.3.3.3 of the PROV-DM (as on Dec 5):

5.3.3.3 Complementarity Record
1. "A complementarity record is a relationship between two entities..."

Comment: Is the complementarity record a relation between two entity records or between two entities? As I mentioned earlier, there is a distinction between the entity and assertions about the entity (or entity records), especially in the case of description logic, OWL, and RDF. Hence, the characterizations of entities are records, views, or assertions about the entity and are not the same as the entity.

2. "This intuition is made more precise by considering the entities that form the representations of entities at a certain point in time. An entity record represents, by means of attribute-value pairs, a thing and its situation in the world, which remain constant over a characterization interval."

Comment: The current grammar for entity records does not include any notion of "characterization interval". Is it delimited by events or by time instants?

3. It is very hard to understand what Figure 3 conveys without an accompanying description.

4. "Suppose entity records A and B share a set P of attributes, and each of them has other attributes in addition to P. If the values assigned to each attribute in P are compatible between A and B, then we say that A is-complement-of B, and B is-complement-of A, in a symmetrical fashion."

Comment: This constraint is very loosely worded and rests on too many implicit assumptions for any Web application to interpret it consistently. It can also be easily demonstrated that it holds trivially for an arbitrary set of entities, which I believe was not the original intention.
For example, consider the following two assertions on their own:
entity(rs_m1,[ex:membership=250, ex:year=1900])
entity(rs_m2,[ex:membership=300, ex:year=1945])

What prevents asserter A from creating another record entity(rs_m1, [name="County Cricket Club"]) and asserter B from creating the record entity (rs_m2, [speaker of the house = "ABC"])? Together, the four entity records can then be used to assert wasComplementOf(rs_m1, rs_m2), which does not make any sense. There is no correlation between the identifiers used to assert the different entity records. How is a user or provenance application supposed to know when to assert the complement-of relation between two entity records?
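To make the concern concrete, here is a minimal sketch of one literal reading of the rule quoted in item 4. It assumes that "compatible" means "equal values", which the draft does not define, and the function names and the dictionary encoding of entity records are hypothetical, chosen only for illustration.

```python
def shared_attributes(a, b):
    """Return the set P of attribute names present in both records."""
    return set(a) & set(b)


def is_complement_of(a, b):
    """Literal reading of the rule in item 4.

    Assumption: 'compatible' is taken to mean 'equal values';
    the draft leaves the notion undefined.
    """
    p = shared_attributes(a, b)
    compatible = all(a[k] == b[k] for k in p)  # vacuously True when P is empty
    a_has_extra = bool(set(a) - p)             # A has attributes beyond P
    b_has_extra = bool(set(b) - p)             # B has attributes beyond P
    return compatible and a_has_extra and b_has_extra


# The four entity records from the example, encoded as dictionaries.
rs_m1_membership = {"ex:membership": 250, "ex:year": 1900}
rs_m2_membership = {"ex:membership": 300, "ex:year": 1945}
rs_m1_name = {"name": "County Cricket Club"}
rs_m2_speaker = {"speaker of the house": "ABC"}

print(is_complement_of(rs_m1_membership, rs_m2_membership))  # False
print(is_complement_of(rs_m1_name, rs_m2_speaker))           # True
```

Under this reading, the two unrelated records with no attribute in common satisfy the rule, because the compatibility check over an empty set P succeeds vacuously.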

In data integration, there is a notion of "reference reconciliation" that uniquely identifies entities based on their attribute-value pairs [1]. The current state-of-the-art reference reconciliation algorithms are highly complex, multi-step approaches, some of which rely on machine learning. How is a provenance application supposed to implement reference reconciliation for the current complementOf property defined in the DM?

[1] http://dl.acm.org/citation.cfm?id=1066168

5. "An assertion "wasComplementOf(B,A)" holds over the temporal intersection of A and B, only if:
* if a mapping can be established from an attribute X of entity record identified by B to an attribute Y of entity record identified by A, then the values of A and B must be consistent with that mapping;
* entity record identified by B has some attribute that entity record identified by A does not have."

Comment: As with the comment above, how is this constraint practical when there is no easy mechanism available for reference reconciliation?
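The two conditions quoted in item 5 can be sketched as a check of necessary conditions only. This sketch ignores the temporal intersection and assumes that the caller supplies both the attribute mapping and the notion of "consistent"; neither is provided by the DM, and all names below are hypothetical.

```python
def may_be_complement_of(b, a, mapping, consistent):
    """Check the two necessary conditions for wasComplementOf(B, A).

    b, a       -- entity records encoded as dicts (hypothetical encoding)
    mapping    -- dict from attribute names of b to attribute names of a,
                  supplied by the caller (assumption: not given by the DM)
    consistent -- predicate (value_of_b, value_of_a) -> bool, also an assumption
    A True result means only that the conditions do not rule the assertion out.
    """
    # Condition 1: mapped attribute values must be consistent with the mapping.
    for x, y in mapping.items():
        if x in b and y in a and not consistent(b[x], a[y]):
            return False
    # Condition 2: b has some attribute that a lacks, directly or via the mapping.
    return any(k not in a and mapping.get(k) not in a for k in b)


def same_value(value_b, value_a):
    """Simplest possible consistency predicate: plain equality."""
    return value_b == value_a


a = {"ex:membership": 250, "ex:year": 1900}
b = {"year": 1900, "name": "County Cricket Club"}

# With a mapping, the check has something to compare.
print(may_be_complement_of(b, a, {"year": "ex:year"}, same_value))  # True

# With no mapping at all, condition 1 is never exercised.
unrelated = {"speaker of the house": "ABC"}
print(may_be_complement_of(unrelated, a, {}, same_value))  # True
```

Without some form of reference reconciliation to produce the mapping, the first condition is never tested, so an unrelated record passes as easily as a related one.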

Thanks.

Best,
Satya

Received on Wednesday, 7 December 2011 02:14:29 UTC