- From: Luc Moreau <l.moreau@ecs.soton.ac.uk>
- Date: Wed, 29 Feb 2012 06:24:47 +0100
- To: public-prov-wg@w3.org
Tracker, this is now ISSUE-274

On 23/02/2012 16:59, Graham Klyne wrote:
> I now realize I spent all morning reviewing the WRONG DOCUMENT :(
>
> I've now taken a quick look at
> http://dvcs.w3.org/hg/prov/raw-file/a5f7ff3d6b30/model/working-copy/towards-wd4.html
> - I think this does start to address some of the provenance complexity
> issues, but I also think many of the comments I made do still apply:
>
> Section 2: I think much of the material here could be in the core
> specification. But it's much easier to follow than the previous
> material. The diagram is less clear to me than the older diagram, but
> I think that's just a placeholder. If the overview text is retained,
> I think it might be helpful to have the overview diagram first.
>
> Section 3: I still find the example not-very-helpful at this point.
> It uses ASN expressions before they have been defined. I'd suggest
> having it as an appendix. I find the process vs authors view approach
> is confusing.
>
> Section 4: many of my previous comments (to previous section 5) are
> addressed here, but I still think Note/annotations is superfluous, and
> derivation is over-complicated. I'm not seeing the syntax
> distinguished symbol production (that used to be
> provenanceContainer). I think several of my previous comments about
> identifiers, attributes and qualified names still apply.
>
> Out of time - need to join telecon now.
>
> #g
> --
>
>
> On 23/02/2012 13:16, Graham Klyne wrote:
>> Reviewing:
>> http://dvcs.w3.org/hg/prov/raw-file/7aadc6332722/model/ProvenanceModel.html
>>
>>
>> Summary: I'm sorry to say that I don't think the document even starts
>> to bring in the kind of simplification discussed at the F2F meeting,
>> which is required if this spec is to gain traction with web developers.
>>
>> I find the document is still difficult to read, and in a full morning
>> of reviewing it I've only got as far as section 5.
>> I think further *radical* simplification is required for the data model
>> description, and I think it's possible without losing any essential
>> information about the model.
>>
>> ...
>>
>> (Nit: when I load this document from a local copy of the repository,
>> I get an error reported indicating a problem with fetching the CSS. It
>> loads OK from the above URI. Is there a problematic relative URI
>> reference in the source document?)
>>
>> ...
>>
>> I thought we'd agreed at F2F to provide a simple "scruffy" introduction
>> to the DM (part 1), then introduce the requirement and refinements for
>> more formally tractable provenance expressions that can be used to
>> build accurate historical records over multiple related artifacts
>> (part 2). The document I'm reading does very little that I can see to
>> make the prov-dm more approachable, as was indicated that we need to do
>> at the F2F. As far as I can tell, the only thing that has been done in
>> this direction is to *add* a new section on interpretation. This, of
>> itself, does nothing to simplify the DM description.
>>
>> I think we should be placing far more emphasis on making it as simple
>> as we possibly can for information providers to publish provenance.
>> Consider that the primary beneficiaries of provenance information are
>> the *consumers* of published information, not the *publishers*, so if
>> we make life unnecessarily hard for publishers we're shooting ourselves
>> in the collective foot. From this, I think the initial introduction to
>> the DM needs to be radically simplified to the extent that a developer
>> can spend 10-15 minutes glancing at it and think "oh yes, I can easily
>> add this to my output data". If necessary, we push some of the work of
>> understanding what needs to be done to harmonize the data to make it
>> more suitable for building a historical record towards the consumer.
>>
>> ...
>>
>> With this in mind:
>>
>> Section 2:
>>
>> The introductory material in section 2.1 is unhelpful, and I propose it
>> be removed from the introduction. Most of this material is not
>> important until we come to consider the more formal aspects of the DM,
>> with the exception of 2.1.2.1 about events, which I think should be
>> introduced in the PROV-DM core model section. Similarly sections 2.2
>> and 2.3 (maybe moving the two introductory sentences of 2.2 into
>> section 2.4). Thus section 2 would become just a very brief intro to
>> the notation used for describing ASN, and maybe this could be moved
>> into the PROV-DM core section (sect 5).
>>
>> Section 3 looks generally useful. But it still mentions an "account
>> record", which I understood was being dropped. It also mentions
>> "alternateOf" and "specializationOf" which are not necessary for a
>> "scruffy" introduction to provenance, so I suggest mention of these is
>> dropped from here. I suggest dropping the sentence about core and
>> common relations - it's just noise. With the removal of accounts, I
>> think the whole purpose of notes/annotation records *as part of the
>> provenance model* has become moot, and suggest that these be dropped
>> from the spec. There's nothing to prevent annotations being added to
>> the provenance data as rdfs:comment or rdfs:label values. I suggest
>> dropping the mention of extensibility points: again, it's just noise at
>> this point.
>>
>> Section 4: to my mind, this example section adds no useful information
>> and doesn't help understanding of the model (on account of being harder
>> to follow than the ASN model description), and suggest that it be
>> dropped. Alternatively, I suggest moving it to an appendix.
>>
>> Section 5: this is the vital core of this document. Section 3 provides
>> a very useful high-level overview, so this section can just get down to
>> describing the constructs.
>>
>> I note that ASN is mis-named: it's not really an *abstract* syntax
>> notation; it's quite concrete, so it's more like a
>> (technology-neutral) functional syntax notation. @@raise separate
>> issue for this?
>>
>> Section 5.1: prov-dm is a data model, not an implementation, right? So
>> why do we need to introduce "housekeeping constructs ... to facilitate
>> their interchange"? Suggest dropping most of the discussion of "record
>> container", and simply introduce the "recordContainer" and
>> "namespaceDeclaration" productions along with production for "record".
>>
>>
>> Section 5.2.1: Entity record
>>
>> Suggest drop "In PROV-DM, " - it's redundant.
>>
>> Suggest the examples focus more on web documents, with "car" as more of
>> an afterthought. Primary use will probably be to describe web
>> documents, so let's keep this at front-of-mind?
>>
>> Suggest dropping all mentions of "asserters viewpoint" and "situation
>> in the world" - these don't matter for the "scruffy" view of
>> provenance.
>>
>> Suggest dropping the idea that the attributes somehow define the entity
>> ("whose situation in the world is represented by the attribute-value
>> pairs"). They're just there to provide information about the entity,
>> and as hooks for interoperability. (I argued previously for dropping
>> attributes completely, but was persuaded otherwise by the
>> interoperability argument from the provenance challenges - don't try to
>> make more of them.)
>>
>> Suggest drop issue mentioning "characterization interval" - I think
>> it's now a non-issue.
>>
>> I think the issue of uniqueness of identifiers should be dealt with in
>> the introduction to ASN, not under the individual elements.
>>
>> Under "further considerations", suggest dropping all but 3rd and 6th
>> bullets. In the 6th bullet, I don't understand the stuff about "a
>> namespace also declares the number of occurrences...".
>> I have deep concern about what this might be trying to say. In any
>> case, shouldn't this be covered under a description of the namespace,
>> if needed?
>>
>> I think the material about "activities" and "plans" really doesn't
>> belong in this section.
>>
>>
>> Section 5.2.2 Activity record
>>
>> Suggest drop "In PROV-DM, " - it's redundant.
>>
>> Didn't we discuss replacing the start, end times by events? I don't
>> recall the outcome - I'm just mentioning this in case it's been missed.
>>
>> For the example, I suggest leading on something to do with information
>> on the web.
>>
>> It was a surprise to me to learn that PROV-DM has reserved attributes.
>> If attributes are in the model to support interoperability with other
>> provenance frameworks (which is my understanding from previous
>> discussions), this feels like a poor design choice. Maybe it should be
>> a separate parameter? In any case, I think the intent of this
>> "subtyping" needs to be explained.
>>
>> If this is to be a "scruffy" introduction, I think the reference to
>> start-view-end is not needed here. In any case, the cross-reference is
>> almost impossible to locate in a printed copy of the spec.
>>
>> I think the issue of uniqueness of identifiers should be dealt with in
>> the introduction to ASN, not under the individual elements.
>>
>> Suggest dropping the "further considerations" bullets.
>>
>> Did we not agree that activities *would* be allowable as entities
>> (especially if entities are just stuff that can be identified)?
>>
>>
>> Section 5.2.3, Agent record
>>
>> Having introduced a framework for subtyping for activities, why not use
>> the same approach for different types of agents ... especially
>> considering that two major agent types are defined by reference to
>> existing foaf definitions? I suggest not asserting the claim that the
>> agent types are mutually exclusive.
>>
>> Suggest drop reference to "situation in the world".
>>
>> Suggest drop discussion of inferences of agent records - if needed,
>> they should come later along with a more formal ("non-scruffy")
>> treatment of the data model.
>>
>>
>> Section 5.2.4, Note record
>>
>> I think this should be dropped from the data model. I don't see that it
>> serves any needed *provenance* function. "extra information" can be
>> added by format-specific extensions. As such, this record type only
>> adds noise to the specification.
>>
>>
>> Section 5.3.1.1 generation record
>>
>> I believe the ASN syntax given verges on being ambiguous, and is
>> unnecessarily tricky to parse by a human or machine consumer; e.g.
>> consider:
>>
>>   wasGeneratedBy(a,b)
>>   wasGeneratedBy(a,b,)
>>
>> The presence of the trailing comma in the second example completely
>> changes the parse tree productions associated with a and b. I think it
>> would be much easier if ASN simply required a dummy activity identifier
>> to be provided; i.e. don't make aidentifier optional. Indeed, rather
>> than allowing optional identifiers anywhere in the ASN, one might use a
>> placeholder (e.g. '_') for any unspecified identifier, which would make
>> the overall syntax much more regular.
>>
>> Since the id is used only for annotations, I suggest dropping it (see
>> section 5.2.4 comment above).
>>
>> If this is to be a "scruffy" introduction, I think the reference to
>> generation-within-activity is not needed here. In any case, the
>> cross-reference is almost impossible to locate in a printed copy of the
>> spec. Suggest drop this.
>>
>> Similarly, suggest dropping the structural constraint here.
>>
>>
>> Section 5.3.1.2 Usage record
>>
>> Suggest drop "In PROV-DM, " - it's redundant.
>>
>> Why is there an identifier for a usage record?
>>
>> Suggest lead with example of consuming a web resource.
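
To make the Section 5.3.1.1 suggestion concrete, here is a minimal sketch (in Python, purely illustrative) of what a regularised syntax with a '_' placeholder could look like to a consumer. The function name parse_record and the argument positions (entity, activity, time) are assumptions made for this example; they are not taken from the PROV-DM draft.

```python
import re

# Hypothetical fixed-arity signature: every position must be present.
# The field names are illustrative assumptions, not the draft's grammar.
SIGNATURES = {"wasGeneratedBy": ("entity", "activity", "time")}

_RECORD = re.compile(r"^\s*(\w+)\s*\((.*)\)\s*$")


def parse_record(text):
    """Parse 'name(arg,arg,...)' where '_' marks an unspecified argument."""
    match = _RECORD.match(text)
    if match is None:
        raise ValueError(f"not a record: {text!r}")
    name, body = match.group(1), match.group(2)
    if name not in SIGNATURES:
        raise ValueError(f"unknown record type: {name!r}")
    fields = SIGNATURES[name]
    args = [arg.strip() for arg in body.split(",")]
    # No optional positions: the argument count is always the same,
    # so a trailing comma can never change how earlier arguments parse.
    if len(args) != len(fields):
        raise ValueError(f"{name} expects {len(fields)} arguments")
    if any(arg == "" for arg in args):
        raise ValueError("empty argument; use '_' for an unspecified value")
    return name, {f: (None if a == "_" else a) for f, a in zip(fields, args)}
```

With this shape, parse_record("wasGeneratedBy(e1,_,_)") yields a record whose activity and time are simply unspecified, while both wasGeneratedBy(a,b) and wasGeneratedBy(a,b,) are rejected instead of being parsed differently.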
>>
>> Suggest drop reference to annotation record (see above note about
>> 5.2.4)
>>
>> Suggest drop reference to interpretation here
>>
>>
>> Section 5.3.2.1 Association record
>>
>> Para 3: Suggest drop first sentence, and simplify; i.e. just say:
>> "Activities may reflect the execution of a plan..."
>>
>> Para 4, there's quite a bit of redundancy redundancy here. Suggest:
>> [[
>> A plan is the description of a set of actions or steps intended by one
>> or more agents to achieve some goal. PROV-DM is not prescriptive about
>> the nature of plans, their representation, the actions and steps they
>> consist of, and their intended goals. A plan can be a workflow for a
>> scientific experiment, a recipe for a cooking activity, or a list of
>> instructions for a micro-processor execution. Plans are entities, which
>> may have associated provenance. An activity may be associated with
>> multiple plans, allowing for descriptions of activities initially
>> associated with a plan, which was changed, on the fly, as the activity
>> progresses. Plans can be successfully executed or they can fail. We
>> expect applications to exploit PROV-DM extensibility mechanisms to
>> capture the rich nature of plans and associations between activities
>> and plans.
>> ]]
>>
>> Para 5: I see no value in cross-referencing the responsibility record
>> here. Suggest dropping this paragraph.
>>
>> Why is there an identifier for an association record?
>>
>>
>> Section 5.3.2.2 Start and End records
>>
>> This seems to overlap with start, end parameters on an activity. It's
>> not immediately clear how they play together.
>>
>> Should this record not describe an "event"? Then the id should identify
>> the start/end event, not the record. cf. Issue 207.
>>
>> Identifiers should denote activities and agents, *not records*.
>>
>>
>> Section 5.3.3.1 Responsibility record
>>
>> Suggest drop "To promote take-up..." and instead lead with a simple
>> introduction of what the record describes.
>>
>> Para 3: It seems to me that the responsibility record should stand
>> independently of any association record. Suggest drop "Given an
>> activity association record... (...)"
>>
>> Why is there an identifier for a responsibility record?
>>
>>
>> Section 5.3.3.2 Derivation record
>>
>> Suggest drop "In PROV-DM, "
>>
>> This whole section seems way too complicated. My understanding is that
>> the "Common relations" section is intended to cover those useful
>> short-cut expressions that can be expressed with less convenience in
>> the core model. As such, I think the derivation record should be a
>> "common" rather than a "core" relation.
>>
>> Aside from that, I really don't see the utility of all this stuff about
>> precise and imprecise derivations. I think there is just one useful
>> relation to define, roughly corresponding to "imprecise n-derivation
>> record" here:
>>
>> - I note that the "imprecise 1-derivation record" and "imprecise
>> n-derivation record" are not syntactically distinguishable, so there's
>> no point in discussing the difference.
>>
>> - the "precise 1-derivation record" can be expressed using an activity,
>> usage and generation record: I'm not convinced this alternative syntax
>> is really buying anything worthwhile.
>>
>> Suggest radical simplification along these lines, and move to section
>> 6. Don't introduce all the formal stuff until a later section handling
>> more formal treatments.
>>
>>
>> Section 5.3.3.3 Alternate and Specialization records
>>
>> In considering a "scruffy" view of provenance, these relations aren't
>> really needed. However, they do underpin a more formal treatment in the
>> face of dynamic resources.
>>
>> I would give serious consideration to introducing these later, when the
>> more formal treatment of dynamic resources is considered.
>>
>>
>> Section 5.3.4. Annotation record
>>
>> I think this serves no needed purpose, and should be dropped. (See
>> earlier comments for section 5.2.4.)
>>
>>
>> Section 5.4.1 Account record
>>
>> I understood we'd agreed to drop this.
>>
>>
>> Section 5.4.2 Record container
>>
>> I think this is mainly an artifact of the ASN syntax, and should be
>> introduced more briefly in the introductory section 5.1 (see previous
>> comments)
>>
>>
>> Section 5.5.1 Attribute
>>
>> I think the "optional-attribute-value" productions covered in section
>> 5.2.1 (Entity) should be covered here since they apply to multiple
>> record types.
>>
>> I would prefer to see attribute names presented as being IRIs in the
>> data model, with the namespace-qualified CURIE syntax available as a
>> convenience in the ASN presentation.
>>
>> I think the predefined attribute names should be dealt with in a
>> separate section. I'm actually not convinced this is the best design
>> choice for properties with DM-defined meaning, as opposed to (say)
>> using separate record parameters, but that's more of a style issue than
>> a fundamental objection.
>>
>> As indicated earlier, I think the whole discussion of derivation steps
>> is too much detail, and I don't see the value, and would suggest
>> dropping the prov:steps attribute.
>>
>> For attribute prov:label: why not just use rdfs:label?
>>
>>
>> Section 5.5.2 Identifiers
>>
>> The text says they are *qualified* names, but in most of the examples
>> they are not. Also, some identifiers are described as having local
>> scope: this is not compatible with using *qualified* names which are
>> essentially IRIs.
>>
>> The text describes identifiers as denoting *records* (e.g. entity
>> record) - I think this is wrong, and in any case is inconsistent with
>> text elsewhere in the document. They should denote "entity",
>> "activity", "agent", etc.
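
As a rough illustration of the Section 5.5.1 and 5.5.2 points (names are IRIs in the data model, with prefix:local qualified names only as a convenience in ASN), the following Python sketch expands a qualified name against declared namespaces. The function name expand_qname and the "ex" prefix are hypothetical, chosen for this example only; the rdfs namespace IRI is the standard RDF Schema one.

```python
# Namespace declarations as they might appear in a record container.
# "ex" is a made-up prefix for this sketch.
NAMESPACES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/",
}


def expand_qname(qname, namespaces):
    """Expand 'prefix:local' to a full IRI using the declared namespaces."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        # An unqualified name such as 'e1' has no global meaning here,
        # which is the incompatibility noted in the 5.5.2 comment.
        raise ValueError(f"not a qualified name: {qname!r}")
    if prefix not in namespaces:
        raise ValueError(f"undeclared prefix: {prefix!r}")
    return namespaces[prefix] + local
```

Under these assumptions expand_qname("rdfs:label", NAMESPACES) gives the RDF Schema label IRI, whereas a bare identifier like "e1" is rejected because it cannot be turned into an IRI without a declared namespace.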
>>
>>
>> Section 5.5.3 Literal
>>
>> "A PROV-DM Literal represents a value whose interpretation is outside
>> the scope of PROV-DM." What a Terrible Failure... the whole point of
>> languages introducing literals is precisely that their interpretation
>> *is* defined by the language. If not, they might as well be names.
>>
>> I think the intent is that their interpretation is defined by reference
>> to the corresponding xsd datatype definition, or some other datatype
>> definition, that is effectively incorporated by reference.
>>
>> I'd suggest that an interpretation of literals is provided by:
>> - http://www.w3.org/TR/rdf-mt/#gddenot
>> - http://www.w3.org/TR/rdf-mt/#DTYPEINTERP
>>
>> Section 5.5.4 Time
>>
>> No syntax production provided or indicated.
>>
>> I think it's unnecessary and inappropriate to indicate where time is
>> used. It's just something to go wrong as the document evolves.
>>
>>
>> Section 5.5.5 Asserter
>>
>> Do we really still need this (now accounts are gone)? Suggest dropping.
>>
>>
>> Section 5.5.6 Namespace
>>
>> I'd suggest covering this with the introduction of the record container
>> syntax production
>>
>>
>> Section 5.5.7 Location
>>
>> Do we have any explicit use of this? If not, I'd suggest dropping it.
>>
>> ...
>>
>> I'm out of time and stopping my review here. There's a general pattern
>> here that I'd also apply to section 6.
>>
>> I'd then take section 7 and (probably) expand it into several sections
>> ("Part 2") introducing and describing a more formal treatment of
>> provenance that can be used to bridge from and refine the "scruffy"
>> view to something that can be assembled and processed according to
>> inferences that flow from the formal semantics.
>> A key point to introduce here would be that it is possible to create
>> provenance statements that cannot possibly satisfy the formal
>> semantics, and to indicate what additional constraints and disciplines
>> should be applied to ensure that they can (and hence to make the
>> inferences that flow from those semantics valid).
>>
>> #g
>> --
Received on Wednesday, 29 February 2012 05:25:19 UTC