- From: Graham Klyne <GK@ninebynine.org>
- Date: Thu, 23 Feb 2012 18:21:31 +0000
- To: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
- CC: public-prov-wg@w3.org
On 23/02/2012 13:51, Luc Moreau wrote:
> Hi Graham,
>
> I am sorry, but I don't understand which document you have reviewed.
> http://dvcs.w3.org/hg/prov/raw-file/7aadc6332722/model/ProvenanceModel.html
> is WD3.

Yeah, I realize that now... unfortunately it had WD4 in its title :-o

#g
--

> What needed to be reviewed is:
> http://dvcs.w3.org/hg/prov/raw-file/default/model/working-copy/towards-wd4.html
> http://dvcs.w3.org/hg/prov/raw-file/default/model/working-copy/prov-dm-constraints.html
>
> http://dvcs.w3.org/hg/prov/raw-file/default/model/working-copy/prov-asn.html
>
> as indicated on http://www.w3.org/2011/prov/wiki/ProvDMWorkingDraft4
>
> Regards,
> Luc
>
>
>
> On 02/23/2012 01:16 PM, Graham Klyne wrote:
>> Reviewing:
>> http://dvcs.w3.org/hg/prov/raw-file/7aadc6332722/model/ProvenanceModel.html
>>
>> Summary: I'm sorry to say that I don't think the document even starts to bring
>> in the kind of simplification discussed at the F2F meeting, which is required
>> if this spec is to gain traction with web developers.
>>
>> I find the document is still difficult to read, and in a full morning of
>> reviewing it I've only got as far as section 5. I think further *radical*
>> simplification is required for the data model description, and I think it's
>> possible without losing any essential information about the model.
>>
>> ...
>>
>> (Nit: when I load this document from a local copy of the repository, I get an
>> error reported indicating a problem with fetching the CSS. It loads OK from
>> the above URI. Is there a problematic relative URI reference in the source
>> document?)
>>
>> ...
>>
>> I thought we'd agreed at F2F to provide a simple "scruffy" introduction to the
>> DM (part 1), then introduce the requirement and refinements for more formally
>> tractable provenance expressions that can be used to build accurate historical
>> records over multiple related artifacts (part 2).
>> The document I'm reading
>> does very little that I can see to make the prov-dm more approachable, as was
>> indicated that we need to do at the F2F. As far as I can tell, the only thing
>> that has been done in this direction is to *add* a new section on interpretation.
>> This, of itself, does nothing to simplify the DM description.
>>
>> I think we should be placing far more emphasis on making it as simple as we
>> possibly can for information providers to publish provenance. Consider that
>> the primary beneficiaries of provenance information are the *consumers* of
>> published information, not the *publishers*, so if we make life unnecessarily
>> hard for publishers we're shooting ourselves in the collective foot. From
>> this, I think the initial introduction to the DM needs to be radically
>> simplified to the extent that a developer can spend 10-15 minutes glancing at
>> it and think "oh yes, I can easily add this to my output data". If necessary,
>> we push some of the work of understanding what needs to be done to harmonize
>> the data to make it more suitable for building a historical record towards the
>> consumer.
>>
>> ...
>>
>> With this in mind:
>>
>> Section 2:
>>
>> The introductory material in section 2.1 is unhelpful, and I propose it be
>> removed from the introduction. Most of this material is not important until we
>> come to consider the more formal aspects of the DM. With the exception of
>> 2.1.2.1 about events, which I think should be introduced in the PROV-DM core
>> model section. Similarly sections 2.2 and 2.3 (maybe moving the two
>> introductory sentences of 2.2 into section 2.4). Thus section 2 would become
>> just a very brief intro to the notation used for describing ASN, and maybe
>> this could be moved into the PROV-DM core section (sect 5).
>>
>> Section 3 looks generally useful. But it still mentions an "account record",
>> which I understood was being dropped.
>> It also mentions "alternateOf" and
>> "specializationOf" which are not necessary for a "scruffy" introduction to
>> provenance, so I suggest mention of these is dropped from here. I suggest
>> dropping the sentence about core and common relations - it's just noise. With
>> the removal of accounts, I think the whole purpose of notes/annotation records
>> *as part of the provenance model* has become moot, and suggest that these be
>> dropped from the spec. There's nothing to prevent annotations being added to
>> the provenance data as rdfs:comment or rdfs:label values. I suggest dropping
>> the mention of extensibility points: again, it's just noise at this point.
>>
>> Section 4: to my mind, this example section adds no useful information and
>> doesn't help understanding of the (on account of being harder to follow than
>> the ASN model description), and suggest that it be dropped. Alternatively, I
>> suggest moving it to an appendix.
>>
>> Section 5: this is the vital core of this document. Section 3 provides a very
>> useful high-level overview, so this section can just get down to describing
>> the constructs.
>>
>> I note that ASN is mis-named: it's not really an *abstract* syntax notation;
>> it's quite concrete, so it's more like a (technology-neutral) functional
>> syntax notation. @@raise separate issue for this?
>>
>> Section 5.1: prov-dm is a data model, not an implementation, right? So why do
>> we need to introduce "housekeeping constructs ... to facilitate their
>> interchange"? Suggest dropping most of the discussion of "record container",
>> and simply introduce the "recordContainer" and "namespaceDeclaration"
>> productions along with production for "record".
>>
>>
>> Section 5.2.1: Entity record
>>
>> Suggest drop "In PROV-DM, " - it's redundant.
>>
>> Suggest the examples focus more on web documents, with "car" as more of an
>> afterthought. Primary use will probably be to describe web documents, so let's
>> keep this at front-of-mind?
>>
>> Suggest dropping all mentions of "asserters viewpoint" and "situation in the
>> world" - these don't matter for the "scruffy" view of provenance.
>>
>> Suggest dropping the idea that the attributes somehow define the entity
>> ("whose situation in the world is represented by the attribute-value pairs").
>> They're just there to provide information about the entity, and as hooks for
>> interoperability. (I argued previously for dropping attributes completely, but
>> was persuaded otherwise by the interoperability argument from the provenance
>> challenges - don't try to make more of them.)
>>
>> Suggest drop issue mentioning "characterization interval" - I think it's now a
>> non-issue.
>>
>> I think the issue of uniqueness of identifiers should be dealt with in the
>> introduction to ASN, not under the individual elements.
>>
>> Under "further considerations", suggest dropping all but 3rd and 6th bullets.
>> In the 6th bullet, I don't understand the stuff about "a namespace also
>> declares the number of occurrences...". I have deep concern about what this
>> might be trying to say. In any case, shouldn't this be covered under a
>> description of the namespace, if needed?
>>
>> I think the material about "activities" and "plans" really doesn't belong in
>> this section.
>>
>>
>> Section 5.2.2 Activity record
>>
>> Suggest drop "In PROV-DM, " - it's redundant.
>>
>> Didn't we discuss replacing the start, end times by events? I don't recall the
>> outcome - I'm just mentioning this in case it's been missed.
>>
>> For the example, I suggest leading on something to do with information on the
>> web.
>>
>> It was a surprise to me to learn that PROV-DM has reserved attributes. If
>> attributes are in the model to support interoperability with other provenance
>> frameworks (which is my understanding from previous discussions), this feels
>> like a poor design choice. Maybe it should be a separate parameter?
>> In any
>> case, I think the intent of this "subtyping" needs to be explained.
>>
>> If this is to be a "scruffy" introduction, I think the reference to
>> start-view-end is not needed here. In any case, the cross-reference is almost
>> impossible to locate in a printed copy of the spec.
>>
>> I think the issue of uniqueness of identifiers should be dealt with in the
>> introduction to ASN, not under the individual elements.
>>
>> Suggest dropping the "further considerations bullets."
>>
>> Did we not agree that activities *would* be allowable as entities (especially
>> if entities are just stuff that can be identified)?
>>
>>
>> Section 5.2.3, Agent record
>>
>> Having introduced a framework for subtyping for activities, why not use the
>> same approach for different types of agents ... especially considering that
>> two major agent types are defined by reference to existing foaf definitions? I
>> suggest not asserting the claim that the agent types are mutually exclusive.
>>
>> Suggest drop reference to "situation in the world".
>>
>> Suggest drop discussion of inferences of agent records - if needed, they
>> should come later along with a more formal ("non-scruffy") treatment of the
>> data model.
>>
>>
>> Section 5.2.4, Note record
>>
>> I think this should be dropped from the data model. I don't see that it serves
>> any needed *provenance* function. "extra information" can be added by
>> format-specific extensions. As such, this record type only adds noise to the
>> specification.
>>
>>
>> Section 5.3.1.1 generation record
>>
>> I believe the ASN syntax given verges on being ambiguous, and is unnecessarily
>> tricky to parse by a human or machine consumer; e.g. consider:
>>
>> wasGeneratedBy(a,b)
>> wasGeneratedBy(a,b,)
>>
>> The presence of the trailing comma in the second example completely changes
>> the parse tree productions associated with a and b. I think it would be much
>> easier if ASN simply required a dummy activity identifier to be provided; i.e.
>> don't make aidentifier optional. Indeed, rather than allowing optional
>> identifiers anywhere in the ASN, one might use a placeholder (e.g. '_') for
>> any unspecified identifier, which would make the overall syntax much more
>> regular.
>>
>> Since the id is used only for annotations, I suggest dropping it (see section
>> 5.2.4 comment above).
>>
>> If this is to be a "scruffy" introduction, I think the reference to
>> generation-within-activity is not needed here. In any case, the
>> cross-reference is almost impossible to locate in a printed copy of the spec.
>> Suggest drop this.
>>
>> Similarly, suggest dropping the structural constraint here.
>>
>>
>> Section 5.3.1.2 Usage record
>>
>> Suggest drop "In PROV-DM, " - it's redundant.
>>
>> Why is there an identifier for a usage record?
>>
>> Suggest lead with example of consuming a web resource.
>>
>> Suggest drop reference to annotation record (see above note about 5.2.4)
>>
>> Suggest drop reference to interpretation here
>>
>>
>> Section 5.3.2.1 Association record
>>
>> Para 3: Suggest drop first sentence, and simplify; i.e. just say; "Activities
>> may reflect the execution of a plan..."
>>
>> Para 4, there is quite a bit of redundancy redundancy here. Suggest:
>> [[
>> A plan is the description of a set of actions or steps intended by one or more
>> agents to achieve some goal. PROV-DM is not prescriptive about the nature of
>> plans, their representation, the actions and steps they consist of, and their
>> intended goals. A plan can be a workflow for a scientific experiment, a recipe
>> for a cooking activity, or a list of instructions for a micro-processor
>> execution. Plans are entities, which may have associated provenance. An
>> activity may be associated with multiple plans, allowing for descriptions of
>> activities initially associated with a plan, which was changed, on the fly, as
>> the activity progresses. Plans can be successfully executed or they can fail.
>> We expect applications to exploit PROV-DM extensibility mechanisms to capture
>> the rich nature of plans and associations between activities and plans.
>> ]]
>>
>> Para 5: I see no value in cross-referencing the responsibility record here.
>> Suggest dropping this paragraph.
>>
>> Why is there an identifier for an association record?
>>
>>
>> Section 5.3.2.2 Start and End records
>>
>> This seems to overlap with start, end parameters on an activity. It's not
>> immediately clear how they play together.
>>
>> Should this record not describe an "event"? Then the id should identify the
>> start/end event, not the record. cf. Issue 207.
>>
>> Identifiers should denote activities and agents, *not records*.
>>
>>
>> Section 5.3.3.1 Responsibility record
>>
>> Suggest drop "To promote take-up... " and instead lead with a simple
>> introduction of what the record describes.
>>
>> Para 3: It seems to me that the responsibility record should stand
>> independently of any association record. Suggest drop "Given an activity
>> association record... (...)"
>>
>> Why is there an identifier for a responsibility record?
>>
>>
>> Section 5.3.3.2 Derivation record
>>
>> Suggest drop "In PROV-DM, "
>>
>> This whole section seems way too complicated. My understanding is that the
>> "Common relations" section is intended to cover those useful short-cut
>> expressions that can be expressed with less convenience in the core model. As
>> such, I think the derivation record should be a "common" rather than a "core"
>> relation.
>>
>> Aside from that, I really don't see the utility of all this stuff about
>> precise and imprecise derivations. I think there is just one useful relation
>> to define, roughly corresponding to "imprecise n-derivation record" here:
>>
>> - I note that the "imprecise 1-derivation record" and "imprecise n-derivation
>> record" are not syntactically distinguishable, so there's no point in
>> discussing the difference.
>>
>> - the "precise 1-derivation record" can be expressed using an activity, usage
>> and generation record: I'm not convinced this alternative syntax is really
>> buying anything worthwhile.
>>
>> Suggest radical simplification along these lines, and move to section 6. Don't
>> introduce all the formal stuff until a later section handling more formal
>> treatments.
>>
>>
>> Section 5.3.3.3 Alternate and Specialization records
>>
>> In considering a "scruffy" view of provenance, these relations aren't really
>> needed. However, they do underpin a more formal treatment in the face of
>> dynamic resources.
>>
>> I would give serious consideration to introducing these later, when the more
>> formal treatment of dynamic resources is considered.
>>
>>
>> Section 5.3.4. Annotation record
>>
>> I think this serves no needed purpose, and should be dropped. (See earlier
>> comments for section 5.2.4.)
>>
>>
>> Section 5.4.1 Account record
>>
>> I understood we'd agreed to drop this.
>>
>>
>> Section 5.4.2 Record container
>>
>> I think this is mainly an artifact of the ASN syntax, and should be introduced
>> more briefly in the introductory section 5.1 (see previous comments)
>>
>>
>> Section 5.5.1 Attribute
>>
>> I think the "optional-attribute-value" productions covered in section 5.2.1
>> (Entity) should be covered here since they apply to multiple record types.
>>
>> I would prefer to see attribute names presented as being IRIs in the data
>> model, with the namespace-qualified CURIE syntax available as a convenience in
>> the ASN presentation.
>>
>> I think the predefined attribute names should be dealt with in a separate
>> section. I'm actually not convinced this is the best design choice for
>> properties with DM-defined meaning, as opposed to (say) using separate record
>> parameters, but that's more of a style issue than a fundamental objection.
>>
>> As indicated earlier, I think the whole discussion of derivation steps is too
>> much detail, and I don't see the value, and would suggest dropping the
>> prov:steps attribute.
>>
>> For attribute prov:label: why not just use rdfs:label?
>>
>>
>> Section 5.5.2 Identifiers
>>
>> The text says they are *qualified* names, but in most of the examples they are
>> not. Also, some identifiers are described as having local scope: this is not
>> compatible with using *qualified* names which are essentially IRIs.
>>
>> The text describes identifiers as denoting *records* (e.g. entity record) - I
>> think this is wrong, and in any case is inconsistent with text elsewhere in
>> the document. They should denote "entity", "activity", "agent", etc.
>>
>>
>> Section 5.5.3 Literal
>>
>> "A PROV-DM Literal represents a value whose interpretation is outside the
>> scope of PROV-DM." What a Terrible Failure... the whole point of languages
>> introducing literals is precisely that their interpretation *is* defined by
>> the language. If not, they might as well be names.
>>
>> I think the intent is that their interpretation is defined by reference to the
>> corresponding xsd datatype definition, or some other datatype definition, that
>> is effectively incorporated by reference.
>>
>> I'd suggest that an interpretation of literals is provided by:
>> - http://www.w3.org/TR/rdf-mt/#gddenot
>> - http://www.w3.org/TR/rdf-mt/#DTYPEINTERP
>>
>> Section 5.5.4 Time
>>
>> No syntax production provided or indicated.
>>
>> I think it's unnecessary and inappropriate to indicate where time is used.
>> It's just something to go wrong as the document evolves.
>>
>>
>> Section 5.5.5 Asserter
>>
>> Do we really still need this (now accounts are gone). Suggest dropping.
>>
>>
>> Section 5.5.6 Namespace
>>
>> I'd suggest covering this with the introduction of the record container syntax
>> production
>>
>>
>> Section 5.5.7 Location
>>
>> Do we have any explicit use of this?
>> If not, I'd suggest dropping it.
>>
>> ...
>>
>> I'm out of time and stopping my review here. There's a general pattern here
>> that I'd also apply to section 6.
>>
>> I'd then take section 7 and (probably) expand it into several sections ("Part
>> 2") introducing and describing a more formal treatment of provenance that can
>> be used to bridge from and refine the "scruffy" view to something that can be
>> assembled and processed according to inferences that flow from the formal
>> semantics. A key point to introduce here would be that it is possible to
>> create provenance statements that cannot possibly satisfy the formal
>> semantics, and to indicate what additional constraints and disciplines should
>> be applied to ensure that they can (and hence to make the inferences that flow
>> from those semantics valid).
>>
>> #g
>> --
>>
>>
>
Received on Thursday, 23 February 2012 18:34:18 UTC