- From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
- Date: Thu, 23 Feb 2012 13:51:46 +0000
- To: public-prov-wg@w3.org
Hi Graham,
I am sorry, but I don't understand which document you have reviewed.
http://dvcs.w3.org/hg/prov/raw-file/7aadc6332722/model/ProvenanceModel.html
is WD3.
What needed to be reviewed is:
http://dvcs.w3.org/hg/prov/raw-file/default/model/working-copy/towards-wd4.html
http://dvcs.w3.org/hg/prov/raw-file/default/model/working-copy/prov-dm-constraints.html
http://dvcs.w3.org/hg/prov/raw-file/default/model/working-copy/prov-asn.html
as indicated on http://www.w3.org/2011/prov/wiki/ProvDMWorkingDraft4
Regards,
Luc
On 02/23/2012 01:16 PM, Graham Klyne wrote:
> Reviewing:
> http://dvcs.w3.org/hg/prov/raw-file/7aadc6332722/model/ProvenanceModel.html
>
>
> Summary: I'm sorry to say that I don't think the document even starts
> to bring in the kind of simplification discussed at the F2F meeting,
> which is required if this spec is to gain traction with web developers.
>
> I find the document is still difficult to read, and in a full morning
> of reviewing it I've only got as far as section 5. I think further
> *radical* simplification is required for the data model description,
> and I think it's possible without losing any essential information
> about the model.
>
> ...
>
> (Nit: when I load this document from a local copy of the repository, I
> get an error reported indicating a problem with fetching the CSS. It
> loads OK from the above URI. Is there a problematic relative URI
> reference in the source document?)
>
> ...
>
> I thought we'd agreed at F2F to provide a simple "scruffy"
> introduction to the DM (part 1), then introduce the requirement and
> refinements for more formally tractable provenance expressions that
> can be used to build accurate historical records over multiple related
> artifacts (part 2). The document I'm reading does very little that I
> can see to make the prov-dm more approachable, as was indicated that
> we need to do at the F2F. As far as I can tell, the only thing that
> has been done in this direction is to *add* a new section on
> interpretation. This, of itself, does nothing to simplify the DM
> description.
>
> I think we should be placing far more emphasis on making it as simple
> as we possibly can for information providers to publish provenance.
> Consider that the primary beneficiaries of provenance information are
> the *consumers* of published information, not the *publishers*, so if
> we make life unnecessarily hard for publishers we're shooting
> ourselves in the collective foot. From this, I think the initial
> introduction to the DM needs to be radically simplified to the extent
> that a developer can spend 10-15 minutes glancing at it and think "oh
> yes, I can easily add this to my output data". If necessary, we push
> some of the work of understanding what needs to be done to harmonize
> the data to make it more suitable for building a historical record
> towards the consumer.
>
> ...
>
> With this in mind:
>
> Section 2:
>
> The introductory material in section 2.1 is unhelpful, and I propose
> it be removed from the introduction. Most of this material is not
> important until we come to consider the more formal aspects of the
> DM, with the exception of 2.1.2.1 about events, which I think should
> be introduced in the PROV-DM core model section. Similarly sections
> 2.2 and 2.3 (maybe moving the two introductory sentences of 2.2 into
> section 2.4). Thus section 2 would become just a very brief intro to
> the notation used for describing ASN, and maybe this could be moved
> into the PROV-DM core section (sect 5).
>
> Section 3 looks generally useful. But it still mentions an "account
> record", which I understood was being dropped. It also mentions
> "alternateOf" and "specializationOf" which are not necessary for a
> "scruffy" introduction to provenance, so I suggest mention of these is
> dropped from here. I suggest dropping the sentence about core and
> common relations - it's just noise. With the removal of accounts, I
> think the whole purpose of notes/annotation records *as part of the
> provenance model* has become moot, and suggest that these be dropped
> from the spec. There's nothing to prevent annotations being added to
> the provenance data as rdfs:comment or rdfs:label values. I suggest
> dropping the mention of extensibility points: again, it's just noise
> at this point.
>
> Section 4: to my mind, this example section adds no useful
> information and doesn't help understanding of the model (on account of being
> harder to follow than the ASN model description), and suggest that it
> be dropped. Alternatively, I suggest moving it to an appendix.
>
> Section 5: this is the vital core of this document. Section 3
> provides a very useful high-level overview, so this section can just
> get down to describing the constructs.
>
> I note that ASN is mis-named: it's not really an *abstract* syntax
> notation; it's quite concrete, so it's more like a
> (technology-neutral) functional syntax notation. @@raise separate issue
> for this?
>
> Section 5.1: prov-dm is a data model, not an implementation, right?
> So why do we need to introduce "housekeeping constructs ... to
> facilitate their interchange"? Suggest dropping most of the
> discussion of "record container", and simply introduce the
> "recordContainer" and "namespaceDeclaration" productions along with
> production for "record".
>
>
> Section 5.2.1: Entity record
>
> Suggest drop "In PROV-DM, " - it's redundant.
>
> Suggest the examples focus more on web documents, with "car" as more
> of an afterthought. Primary use will probably be to describe web
> documents, so let's keep this at front-of-mind?
>
> Suggest dropping all mentions of "asserters viewpoint" and "situation
> in the world" - these don't matter for the "scruffy" view of provenance.
>
> Suggest dropping the idea that the attributes somehow define the
> entity ("whose situation in the world is represented by the
> attribute-value pairs"). They're just there to provide information
> about the entity, and as hooks for interoperability. (I argued
> previously for dropping attributes completely, but was persuaded
> otherwise by the interoperability argument from the provenance
> challenges - don't try to make more of them.)
>
> Suggest drop issue mentioning "characterization interval" - I think
> it's now a non-issue.
>
> I think the issue of uniqueness of identifiers should be dealt with in
> the introduction to ASN, not under the individual elements.
>
> Under "further considerations", suggest dropping all but 3rd and 6th
> bullets. In the 6th bullet, I don't understand the stuff about "a
> namespace also declares the number of occurrences...". I have deep
> concern about what this might be trying to say. In any case,
> shouldn't this be covered under a description of the namespace, if
> needed?
>
> I think the material about "activities" and "plans" really doesn't
> belong in this section.
>
>
> Section 5.2.2 Activity record
>
> Suggest drop "In PROV-DM, " - it's redundant.
>
> Didn't we discuss replacing the start, end times by events? I don't
> recall the outcome - I'm just mentioning this in case it's been missed.
>
> For the example, I suggest leading on something to do with information
> on the web.
>
> It was a surprise to me to learn that PROV-DM has reserved
> attributes. If attributes are in the model to support
> interoperability with other provenance frameworks (which is my
> understanding from previous discussions), this feels like a poor
> design choice. Maybe it should be a separate parameter? In any case,
> I think the intent of this "subtyping" needs to be explained.
>
> If this is to be a "scruffy" introduction, I think the reference to
> start-view-end is not needed here. In any case, the cross-reference
> is almost impossible to locate in a printed copy of the spec.
>
> I think the issue of uniqueness of identifiers should be dealt with in
> the introduction to ASN, not under the individual elements.
>
> Suggest dropping the "further considerations" bullets.
>
> Did we not agree that activities *would* be allowable as entities
> (especially if entities are just stuff that can be identified)?
>
>
> Section 5.2.3, Agent record
>
> Having introduced a framework for subtyping for activities, why not
> use the same approach for different types of agents ... especially
> considering that two major agent types are defined by reference to
> existing foaf definitions? I suggest not asserting the claim that the
> agent types are mutually exclusive.
>
> Suggest drop reference to "situation in the world".
>
> Suggest drop discussion of inferences of agent records - if needed,
> they should come later along with a more formal ("non-scruffy")
> treatment of the data model.
>
>
> Section 5.2.4, Note record
>
> I think this should be dropped from the data model. I don't see that
> it serves any needed *provenance* function. "extra information" can
> be added by format-specific extensions. As such, this record type
> only adds noise to the specification.
>
>
> Section 5.3.1.1 generation record
>
> I believe the ASN syntax given verges on being ambiguous, and is
> unnecessarily tricky to parse by a human or machine consumer; e.g.
> consider:
>
> wasGeneratedBy(a,b)
> wasGeneratedBy(a,b,)
>
> The presence of the trailing comma in the second example completely
> changes the parse tree productions associated with a and b. I think
> it would be much easier if ASN simply required a dummy activity
> identifier to be provided; i.e. don't make aidentifier optional.
> Indeed, rather than allowing optional identifiers anywhere in the ASN,
> one might use a placeholder (e.g. '_') for any unspecified identifier,
> which would make the overall syntax much more regular.
>
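A minimal sketch, in Python, of the placeholder suggestion above. The
three-position record layout, the function name parse_record and the
argument names are assumptions made for illustration only; this is not the
ASN grammar from the draft. The point it shows is that a fixed arity plus a
'_' placeholder lets a reader reject the trailing-comma form outright
instead of giving it a second meaning.

```python
import re

PLACEHOLDER = "_"  # hypothetical marker for "identifier not specified"


def parse_record(text, arity):
    """Parse 'name(arg1,arg2,...)' requiring exactly `arity` arguments.

    Every position must be filled, either with an identifier or with the
    placeholder, so the number of commas never changes the parse.
    """
    match = re.fullmatch(r"\s*(\w+)\((.*)\)\s*", text)
    if match is None:
        raise ValueError(f"not a record: {text!r}")
    name, body = match.groups()
    args = [arg.strip() for arg in body.split(",")]
    if len(args) != arity or any(arg == "" for arg in args):
        raise ValueError(f"{name} needs {arity} non-empty arguments: {text!r}")
    # Placeholders become None; everything else is kept as written.
    return name, [None if arg == PLACEHOLDER else arg for arg in args]
```

With this shape, wasGeneratedBy(e1,_,t1) parses with the middle position
unspecified, while wasGeneratedBy(a,b,) is rejected as malformed.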
> Since the id is used only for annotations, I suggest dropping it (see
> section 5.2.4 comment above).
>
> If this is to be a "scruffy" introduction, I think the reference to
> generation-within-activity is not needed here. In any case, the
> cross-reference is almost impossible to locate in a printed copy of
> the spec. Suggest drop this.
>
> Similarly, suggest dropping the structural constraint here.
>
>
> Section 5.3.1.2 Usage record
>
> Suggest drop "In PROV-DM, " - it's redundant.
>
> Why is there an identifier for a usage record?
>
> Suggest lead with example of consuming a web resource.
>
> Suggest drop reference to annotation record (see above note about 5.2.4)
>
> Suggest drop reference to interpretation here
>
>
> Section 5.3.2.1 Association record
>
> Para 3: Suggest drop first sentence, and simplify; i.e. just say:
> "Activities may reflect the execution of a plan..."
>
> Para 4, there's quite a bit of redundancy redundancy here. Suggest:
> [[
> A plan is the description of a set of actions or steps intended by one
> or more agents to achieve some goal. PROV-DM is not prescriptive about
> the nature of plans, their representation, the actions and steps they
> consist of, and their intended goals. A plan can be a workflow for a
> scientific experiment, a recipe for a cooking activity, or a list of
> instructions for a micro-processor execution. Plans are entities,
> which may have associated provenance. An activity may be associated
> with multiple plans, allowing for descriptions of activities initially
> associated with a plan, which was changed, on the fly, as the activity
> progresses. Plans can be successfully executed or they can fail. We
> expect applications to exploit PROV-DM extensibility mechanisms to
> capture the rich nature of plans and associations between activities
> and plans.
> ]]
>
> Para 5: I see no value in cross-referencing the responsibility record
> here. Suggest dropping this paragraph.
>
> Why is there an identifier for an association record?
>
>
> Section 5.3.2.2 Start and End records
>
> This seems to overlap with start, end parameters on an activity.
> It's not immediately clear how they play together.
>
> Should this record not describe an "event"? Then the id should
> identify the start/end event, not the record. cf. Issue 207.
>
> Identifiers should denote activities and agents, *not records*.
>
>
> Section 5.3.3.1 Responsibility record
>
> Suggest drop "To promote take-up... " and instead lead with a simple
> introduction of what the record describes.
>
> Para 3: It seems to me that the responsibility record should stand
> independently of any association record. Suggest drop "Given an
> activity association record... (...)"
>
> Why is there an identifier for a responsibility record?
>
>
> Section 5.3.3.2 Derivation record
>
> Suggest drop "In PROV-DM, "
>
> This whole section seems way too complicated. My understanding is that
> the "Common relations" section is intended to cover those useful
> short-cut expressions that can be expressed with less convenience in
> the core model. As such, I think the derivation record should be a
> "common" rather than a "core" relation.
>
> Aside from that, I really don't see the utility of all this stuff
> about precise and imprecise derivations. I think there is just one
> useful relation to define, roughly corresponding to "imprecise
> n-derivation record" here:
>
> - I note that the "imprecise 1-derivation record" and "imprecise
> n-derivation record" are not syntactically distinguishable, so there's
> no point in discussing the difference.
>
> - the "precise 1-derivation record" can be expressed using an
> activity, usage and generation record: I'm not convinced this
> alternative syntax is really buying anything worthwhile.
>
> Suggest radical simplification along these lines, and move to section
> 6. Don't introduce all the formal stuff until a later section
> handling more formal treatments.
>
>
> Section 5.3.3.3 Alternate and Specialization records
>
> In considering a "scruffy" view of provenance, these relations aren't
> really needed. However, they do underpin a more formal treatment in
> the face of dynamic resources.
>
> I would give serious consideration to introducing these later, when
> the more formal treatment of dynamic resources is considered.
>
>
> Section 5.3.4. Annotation record
>
> I think this serves no needed purpose, and should be dropped. (See
> earlier comments for section 5.2.4.)
>
>
> Section 5.4.1 Account record
>
> I understood we'd agreed to drop this.
>
>
> Section 5.4.2 Record container
>
> I think this is mainly an artifact of the ASN syntax, and should be
> introduced more briefly in the introductory section 5.1 (see previous
> comments)
>
>
> Section 5.5.1 Attribute
>
> I think the "optional-attribute-value" productions covered in section
> 5.2.1 (Entity) should be covered here since they apply to multiple
> record types.
>
> I would prefer to see attribute names presented as being IRIs in the
> data model, with the namespace-qualified CURIE syntax available as a
> convenience in the ASN presentation.
>
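A minimal sketch, in Python, of the split suggested above: the data model
holds full IRIs, and the prefixed form is only a notational convenience that
is expanded on reading. The function name expand_curie, the prefix "ex" and
the namespace http://example.org/ are assumptions for illustration and do
not come from the draft.

```python
def expand_curie(curie, namespaces):
    """Expand 'prefix:local' to a full IRI using declared namespaces."""
    prefix, sep, local = curie.partition(":")
    if not sep:
        raise ValueError(f"no prefix in {curie!r}")
    if prefix not in namespaces:
        raise KeyError(f"undeclared prefix {prefix!r}")
    # The model-level attribute name is the concatenated IRI.
    return namespaces[prefix] + local


# Hypothetical namespace declaration, as a record container might carry it.
declared = {"ex": "http://example.org/"}
attribute_iri = expand_curie("ex:version", declared)
```

Two documents that declare different prefixes for the same namespace then
yield the same attribute name in the data model.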
> I think the predefined attribute names should be dealt with in a
> separate section. I'm actually not convinced this is the best design
> choice for properties with DM-defined meaning, as opposed to (say)
> using separate record parameters, but that's more of a style issue
> than a fundamental objection.
>
> As indicated earlier, I think the whole discussion of derivation steps
> is too much detail, and I don't see the value, and would suggest
> dropping the prov:steps attribute.
>
> For attribute prov:label: why not just use rdfs:label?
>
>
> Section 5.5.2 Identifiers
>
> The text says they are *qualified* names, but in most of the example
> they are not. Also, some identifiers are described as having local
> scope: this is not compatible with using *qualified* names which are
> essentially IRIs.
>
> The text describes identifiers as denoting *records* (e.g. entity
> record) - I think this is wrong, and in any case is inconsistent with
> text elsewhere in the document. They should denote "entity",
> "activity", "agent", etc.
>
>
> Section 5.5.3 Literal
>
> "A PROV-DM Literal represents a value whose interpretation is outside
> the scope of PROV-DM." What a Terrible Failure... the whole point of
> languages introducing literals is precisely that their interpretation
> *is* defined by the language. If not, they might as well be names.
>
> I think the intent is that their interpretation is defined by
> reference to the corresponding xsd datatype definition, or some other
> datatype definition, that is effectively incorporated by reference.
>
> I'd suggest that an interpretation of literals is provided by:
> - http://www.w3.org/TR/rdf-mt/#gddenot
> - http://www.w3.org/TR/rdf-mt/#DTYPEINTERP
>
> Section 5.5.4 Time
>
> No syntax production provided or indicated.
>
> I think it's unnecessary and inappropriate to indicate where time is
> used. It's just something to go wrong as the document evolves.
>
>
> Section 5.5.5 Asserter
>
> Do we really still need this (now accounts are gone)? Suggest dropping.
>
>
> Section 5.5.6 Namespace
>
> I'd suggest covering this with the introduction of the record
> container syntax production
>
>
> Section 5.5.7 Location
>
> Do we have any explicit use of this? If not, I'd suggest dropping it.
>
> ...
>
> I'm out of time and stopping my review here. There's a general
> pattern here that I'd also apply to section 6.
>
> I'd then take section 7 and (probably) expand it into several
> sections ("Part 2") introducing and describing a more formal treatment
> of provenance that can be used to bridge from and refine the "scruffy"
> view to something that can be assembled and processed according to
> inferences that flow from the formal semantics. A key point to
> introduce here would be that it is possible to create provenance
> statements that cannot possibly satisfy the formal semantics, and to
> indicate what additional constraints and disciplines should be applied
> to ensure that they can (and hence to make the inferences that flow
> from those semantics valid).
>
> #g
> --
>
>
--
Professor Luc Moreau
Electronics and Computer Science tel: +44 23 8059 4487
University of Southampton fax: +44 23 8059 2865
Southampton SO17 1BJ email: l.moreau@ecs.soton.ac.uk
United Kingdom http://www.ecs.soton.ac.uk/~lavm
Received on Thursday, 23 February 2012 13:52:23 UTC