- From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
- Date: Thu, 23 Feb 2012 13:51:46 +0000
- To: public-prov-wg@w3.org
Hi Graham,

I am sorry, but I don't understand which document you have reviewed.

http://dvcs.w3.org/hg/prov/raw-file/7aadc6332722/model/ProvenanceModel.html is WD3.

What needed to be reviewed is:
http://dvcs.w3.org/hg/prov/raw-file/default/model/working-copy/towards-wd4.html
http://dvcs.w3.org/hg/prov/raw-file/default/model/working-copy/prov-dm-constraints.html
http://dvcs.w3.org/hg/prov/raw-file/default/model/working-copy/prov-asn.html
as indicated on http://www.w3.org/2011/prov/wiki/ProvDMWorkingDraft4

Regards,
Luc

On 02/23/2012 01:16 PM, Graham Klyne wrote:
> Reviewing:
> http://dvcs.w3.org/hg/prov/raw-file/7aadc6332722/model/ProvenanceModel.html
>
>
> Summary: I'm sorry to say that I don't think the document even starts to bring in the kind of simplification discussed at the F2F meeting, which is required if this spec is to gain traction with web developers.
>
> I find the document is still difficult to read, and in a full morning of reviewing it I've only got as far as section 5. I think further *radical* simplification is required for the data model description, and I think it's possible without losing any essential information about the model.
>
> ...
>
> (Nit: when I load this document from a local copy of the repository, I get an error reported indicating a problem with fetching the CSS. It loads OK from the above URI. Is there a problematic relative URI reference in the source document?)
>
> ...
>
> I thought we'd agreed at F2F to provide a simple "scruffy" introduction to the DM (part 1), then introduce the requirement and refinements for more formally tractable provenance expressions that can be used to build accurate historical records over multiple related artifacts (part 2). The document I'm reading does very little that I can see to make the prov-dm more approachable, as was indicated that we need to do at the F2F.
> As far as I can tell, the only thing that has been done in this direction is to *add* a new section on interpretation. This, of itself, does nothing to simplify the DM description.
>
> I think we should be placing far more emphasis on making it as simple as we possibly can for information providers to publish provenance. Consider that the primary beneficiaries of provenance information are the *consumers* of published information, not the *publishers*, so if we make life unnecessarily hard for publishers we're shooting ourselves in the collective foot. From this, I think the initial introduction to the DM needs to be radically simplified to the extent that a developer can spend 10-15 minutes glancing at it and think "oh yes, I can easily add this to my output data". If necessary, we push some of the work of understanding what needs to be done to harmonize the data to make it more suitable for building a historical record towards the consumer.
>
> ...
>
> With this in mind:
>
> Section 2:
>
> The introductory material in section 2.1 is unhelpful, and I propose it be removed from the introduction. Most of this material is not important until we come to consider the more formal aspects of the DM. With the exception of 2.1.2.1 about events, which I think should be introduced in the PROV-DM core model section. Similarly sections 2.2 and 2.3 (maybe moving the two introductory sentences of 2.2 into section 2.4). Thus section 2 would become just a very brief intro to the notation used for describing ASN, and maybe this could be moved into the PROV-DM core section (sect 5).
>
> Section 3 looks generally useful. But it still mentions an "account record", which I understood was being dropped. It also mentions "alternateOf" and "specializationOf" which are not necessary for a "scruffy" introduction to provenance, so I suggest mention of these is dropped from here.
> I suggest dropping the sentence about core and common relations - it's just noise. With the removal of accounts, I think the whole purpose of notes/annotation records *as part of the provenance model* has become moot, and suggest that these be dropped from the spec. There's nothing to prevent annotations being added to the provenance data as rdfs:comment or rdfs:label values. I suggest dropping the mention of extensibility points: again, it's just noise at this point.
>
> Section 4: to my mind, this example section adds no useful information and doesn't help understanding of the model (on account of being harder to follow than the ASN model description), and suggest that it be dropped. Alternatively, I suggest moving it to an appendix.
>
> Section 5: this is the vital core of this document. Section 3 provides a very useful high-level overview, so this section can just get down to describing the constructs.
>
> I note that ASN is mis-named: it's not really an *abstract* syntax notation; it's quite concrete, so it's more like a (technology-neutral) functional syntax notation. @@raise separate issue for this?
>
> Section 5.1: prov-dm is a data model, not an implementation, right? So why do we need to introduce "housekeeping constructs ... to facilitate their interchange"? Suggest dropping most of the discussion of "record container", and simply introduce the "recordContainer" and "namespaceDeclaration" productions along with production for "record".
>
>
> Section 5.2.1: Entity record
>
> Suggest drop "In PROV-DM, " - it's redundant.
>
> Suggest the examples focus more on web documents, with "car" as more of an afterthought. Primary use will probably be to describe web documents, so let's keep this at front-of-mind?
>
> Suggest dropping all mentions of "asserters viewpoint" and "situation in the world" - these don't matter for the "scruffy" view of provenance.
>
> Suggest dropping the idea that the attributes somehow define the entity ("whose situation in the world is represented by the attribute-value pairs"). They're just there to provide information about the entity, and as hooks for interoperability. (I argued previously for dropping attributes completely, but was persuaded otherwise by the interoperability argument from the provenance challenges - don't try to make more of them.)
>
> Suggest drop issue mentioning "characterization interval" - I think it's now a non-issue.
>
> I think the issue of uniqueness of identifiers should be dealt with in the introduction to ASN, not under the individual elements.
>
> Under "further considerations", suggest dropping all but 3rd and 6th bullets. In the 6th bullet, I don't understand the stuff about "a namespace also declares the number of occurrences...". I have deep concern about what this might be trying to say. In any case, shouldn't this be covered under a description of the namespace, if needed?
>
> I think the material about "activities" and "plans" really doesn't belong in this section.
>
>
> Section 5.2.2 Activity record
>
> Suggest drop "In PROV-DM, " - it's redundant.
>
> Didn't we discuss replacing the start, end times by events? I don't recall the outcome - I'm just mentioning this in case it's been missed.
>
> For the example, I suggest leading on something to do with information on the web.
>
> It was a surprise to me to learn that PROV-DM has reserved attributes. If attributes are in the model to support interoperability with other provenance frameworks (which is my understanding from previous discussions), this feels like a poor design choice. Maybe it should be a separate parameter? In any case, I think the intent of this "subtyping" needs to be explained.
>
> If this is to be a "scruffy" introduction, I think the reference to start-view-end is not needed here.
> In any case, the cross-reference is almost impossible to locate in a printed copy of the spec.
>
> I think the issue of uniqueness of identifiers should be dealt with in the introduction to ASN, not under the individual elements.
>
> Suggest dropping the "further considerations" bullets.
>
> Did we not agree that activities *would* be allowable as entities (especially if entities are just stuff that can be identified)?
>
>
> Section 5.2.3, Agent record
>
> Having introduced a framework for subtyping for activities, why not use the same approach for different types of agents ... especially considering that two major agent types are defined by reference to existing foaf definitions? I suggest not asserting the claim that the agent types are mutually exclusive.
>
> Suggest drop reference to "situation in the world".
>
> Suggest drop discussion of inferences of agent records - if needed, they should come later along with a more formal ("non-scruffy") treatment of the data model.
>
>
> Section 5.2.4, Note record
>
> I think this should be dropped from the data model. I don't see that it serves any needed *provenance* function. "extra information" can be added by format-specific extensions. As such, this record type only adds noise to the specification.
>
>
> Section 5.3.1.1 generation record
>
> I believe the ASN syntax given verges on being ambiguous, and is unnecessarily tricky to parse by a human or machine consumer; e.g. consider:
>
>   wasGeneratedBy(a,b)
>   wasGeneratedBy(a,b,)
>
> The presence of the trailing comma in the second example completely changes the parse tree productions associated with a and b. I think it would be much easier if ASN simply required a dummy activity identifier to be provided; i.e. don't make aidentifier optional.
> Indeed, rather than allowing optional identifiers anywhere in the ASN, one might use a placeholder (e.g.
> '_') for any unspecified identifier, which would make the overall syntax much more regular.
>
> Since the id is used only for annotations, I suggest dropping it (see section 5.2.4 comment above).
>
> If this is to be a "scruffy" introduction, I think the reference to generation-within-activity is not needed here. In any case, the cross-reference is almost impossible to locate in a printed copy of the spec. Suggest drop this.
>
> Similarly, suggest dropping the structural constraint here.
>
>
> Section 5.3.1.2 Usage record
>
> Suggest drop "In PROV-DM, " - it's redundant.
>
> Why is there an identifier for a usage record?
>
> Suggest lead with example of consuming a web resource.
>
> Suggest drop reference to annotation record (see above note about 5.2.4)
>
> Suggest drop reference to interpretation here
>
>
> Section 5.3.2.1 Association record
>
> Para 3: Suggest drop first sentence, and simplify; i.e. just say; "Activities may reflect the execution of a plan..."
>
> Para 4, there is quite a bit of redundancy redundancy here. Suggest:
> [[
> A plan is the description of a set of actions or steps intended by one or more agents to achieve some goal. PROV-DM is not prescriptive about the nature of plans, their representation, the actions and steps they consist of, and their intended goals. A plan can be a workflow for a scientific experiment, a recipe for a cooking activity, or a list of instructions for a micro-processor execution. Plans are entities, which may have associated provenance. An activity may be associated with multiple plans, allowing for descriptions of activities initially associated with a plan, which was changed, on the fly, as the activity progresses. Plans can be successfully executed or they can fail. We expect applications to exploit PROV-DM extensibility mechanisms to capture the rich nature of plans and associations between activities and plans.
> ]]
>
> Para 5: I see no value in cross-referencing the responsibility record here. Suggest dropping this paragraph.
>
> Why is there an identifier for an association record?
>
>
> Section 5.3.2.2 Start and End records
>
> This seems to overlap with start, end parameters on an activity. It's not immediately clear how they play together.
>
> Should this record not describe an "event"? Then the id should identify the start/end event, not the record. cf. Issue 207.
>
> Identifiers should denote activities and agents, *not records*.
>
>
> Section 5.3.3.1 Responsibility record
>
> Suggest drop "To promote take-up... " and instead lead with a simple introduction of what the record describes.
>
> Para 3: It seems to me that the responsibility record should stand independently of any association record. Suggest drop "Given an activity association record... (...)"
>
> Why is there an identifier for a responsibility record?
>
>
> Section 5.3.3.2 Derivation record
>
> Suggest drop "In PROV-DM, "
>
> This whole section seems way too complicated. My understanding is that the "Common relations" section is intended to cover those useful short-cut expressions that can be expressed with less convenience in the core model. As such, I think the derivation record should be a "common" rather than a "core" relation.
>
> Aside from that, I really don't see the utility of all this stuff about precise and imprecise derivations. I think there is just one useful relation to define, roughly corresponding to "imprecise n-derivation record" here:
>
> - I note that the "imprecise 1-derivation record" and "imprecise n-derivation record" are not syntactically distinguishable, so there's no point in discussing the difference.
>
> - the "precise 1-derivation record" can be expressed using an activity, usage and generation record: I'm not convinced this alternative syntax is really buying anything worthwhile.
>
> Suggest radical simplification along these lines, and move to section 6. Don't introduce all the formal stuff until a later section handling more formal treatments.
>
>
> Section 5.3.3.3 Alternate and Specialization records
>
> In considering a "scruffy" view of provenance, these relations aren't really needed. However, they do underpin a more formal treatment in the face of dynamic resources.
>
> I would give serious consideration to introducing these later, when the more formal treatment of dynamic resources is considered.
>
>
> Section 5.3.4. Annotation record
>
> I think this serves no needed purpose, and should be dropped. (See earlier comments for section 5.2.4.)
>
>
> Section 5.4.1 Account record
>
> I understood we'd agreed to drop this.
>
>
> Section 5.4.2 Record container
>
> I think this is mainly an artifact of the ASN syntax, and should be introduced more briefly in the introductory section 5.1 (see previous comments)
>
>
> Section 5.5.1 Attribute
>
> I think the "optional-attribute-value" productions covered in section 5.2.1 (Entity) should be covered here since they apply to multiple record types.
>
> I would prefer to see attribute names presented as being IRIs in the data model, with the namespace-qualified CURIE syntax available as a convenience in the ASN presentation.
>
> I think the predefined attribute names should be dealt with in a separate section. I'm actually not convinced this is the best design choice for properties with DM-defined meaning, as opposed to (say) using separate record parameters, but that's more of a style issue than a fundamental objection.
>
> As indicated earlier, I think the whole discussion of derivation steps is too much detail, and I don't see the value, and would suggest dropping the prov:steps attribute.
>
> For attribute prov:label: why not just use rdfs:label?
>
>
> Section 5.5.2 Identifiers
>
> The text says they are *qualified* names, but in most of the examples they are not. Also, some identifiers are described as having local scope: this is not compatible with using *qualified* names which are essentially IRIs.
>
> The text describes identifiers as denoting *records* (e.g. entity record) - I think this is wrong, and in any case is inconsistent with text elsewhere in the document. They should denote "entity", "activity", "agent", etc.
>
>
> Section 5.5.3 Literal
>
> "A PROV-DM Literal represents a value whose interpretation is outside the scope of PROV-DM." What a Terrible Failure... the whole point of languages introducing literals is precisely that their interpretation *is* defined by the language. If not, they might as well be names.
>
> I think the intent is that their interpretation is defined by reference to the corresponding xsd datatype definition, or some other datatype definition, that is effectively incorporated by reference.
>
> I'd suggest that an interpretation of literals is provided by:
> - http://www.w3.org/TR/rdf-mt/#gddenot
> - http://www.w3.org/TR/rdf-mt/#DTYPEINTERP
>
> Section 5.5.4 Time
>
> No syntax production provided or indicated.
>
> I think it's unnecessary and inappropriate to indicate where time is used. It's just something to go wrong as the document evolves.
>
>
> Section 5.5.5 Asserter
>
> Do we really still need this (now accounts are gone). Suggest dropping.
>
>
> Section 5.5.6 Namespace
>
> I'd suggest covering this with the introduction of the record container syntax production
>
>
> Section 5.5.7 Location
>
> Do we have any explicit use of this? if not, I'd suggest dropping it.
>
> ...
>
> I'm out of time and stopping my review here. There's a general pattern here that I'd also apply to section 6.
>
> I'd then take section 7 and (probably) expand it into several sections ("Part 2") introducing and describing a more formal treatment of provenance that can be used to bridge from and refine the "scruffy" view to something that can be assembled and processed according to inferences that flow from the formal semantics. A key point to introduce here would be that it is possible to create provenance statements that cannot possibly satisfy the formal semantics, and to indicate what additional constraints and disciplines should be applied to ensure that they can (and hence to make the inferences that flow from those semantics valid).
>
> #g
> --
>

--
Professor Luc Moreau
Electronics and Computer Science   tel:   +44 23 8059 4487
University of Southampton          fax:   +44 23 8059 2865
Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
United Kingdom                     http://www.ecs.soton.ac.uk/~lavm
Received on Thursday, 23 February 2012 13:52:23 UTC