Re: ISSUE-72: Provenance Data Category

Hi Chase,
Thanks for getting back to us on this.

Regarding the ordering of its:provenanceRecord elements, I therefore 
propose adding the following sentence to the provenance section, after we 
introduce this element:

"The order of its:provenanceRecord elements within an 
its:provenanceRecords element should reflect the order in which they 
were added to the document, with the most recently added one listed first."
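
As a rough illustration of what this ordering would mean for a tool that 
adds records, here is a minimal Python sketch. It assumes the ITS 
namespace URI http://www.w3.org/2005/11/its and unprefixed attributes 
such as tool and revPerson on its:provenanceRecord; the helper name 
add_provenance_record and the attribute values are hypothetical, made up 
for this example only.

```python
import xml.etree.ElementTree as ET

# Assumed ITS namespace URI; the "its" prefix is registered for readable output.
ITS_NS = "http://www.w3.org/2005/11/its"
ET.register_namespace("its", ITS_NS)


def add_provenance_record(records, **attrs):
    """Insert a new its:provenanceRecord as the FIRST child of `records`.

    Hypothetical helper: prepending keeps the most recently added
    record listed first, as the proposed sentence recommends.
    """
    record = ET.Element(f"{{{ITS_NS}}}provenanceRecord", attrs)
    records.insert(0, record)
    return record


# Build an empty its:provenanceRecords element and add two records in turn.
records = ET.Element(f"{{{ITS_NS}}}provenanceRecords")
add_provenance_record(records, tool="ExampleMT")             # added first
add_provenance_record(records, revPerson="Example Reviser")  # added second

# Document order is now newest first, regardless of when the work was done.
order = [dict(r.attrib) for r in records]
print(ET.tostring(records, encoding="unicode"))
```

Note that the sketch only tracks when a record was added to the document; 
it makes no claim about when the translation or revision itself took place.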

Can you signal whether you are happy with this?

Then, given your comments on the time annotation issue below as well, I 
think I will be able to close this issue.

Thanks again for this comment,
Regards,
Dave

On 23/01/2013 18:17, Chase Tingley wrote:
> Hi Dave & Pablo,
>
> Thanks for the responses.  Comments inline
>
> On Tue, Jan 22, 2013 at 5:39 PM, Dave Lewis <dave.lewis@cs.tcd.ie> wrote:
>
>     Hi Chase, Kevin, all,
>     First thanks to Pablo for his response. Some further responses
>     inline below related to timing:
>
>     On 15/01/2013 17:33, Pablo Nieto Caride wrote:
>>
>>     Hi Felix, all,
>>
>>     >ii) Similarly, does the ordering of provenance records within a <provenanceRecords>
>>     element make a statement about the (temporal) order in which the
>>     records were created?  If an ordering is implied, it raises
>>     questions about the implied ordering in a document where
>>     provenance records are declared both globally and via local markup.
>>
>>     Certainly the spec does not talk about temporal order, but given
>>     that records cannot be declared both globally and via local
>>     markup for a single element, the way I see it, and to simplify
>>     things, each provenance record should be older than the previous one.
>>
>
>     I think the best we can do is offer best practice advice that, when
>     more than one its:provenanceRecord is listed in an
>     its:provenanceRecords element, their order should reflect the order
>     in which they were added to the document rather than the order in
>     which the translation (revision) actually happened.
>
>     Pablo, could you confirm that you intend the oldest one to be
>     listed last?
>
>     I don't think we can mandate that the order indicate the order in
>     which the activities indicated in the records (translation or
>     translation revision) were performed. This information may not be
>     available to the processor adding the annotation. For example, a
>     TMS may add this annotation after receiving translation revisions
>     from two different translators, each covering multiple elements but
>     without per-element timing information, so it wouldn't know the
>     order in which the actual revisions were performed. Alternatively,
>     their timings may be known for different elements, but they
>     overlap in time, so there wouldn't be an obvious order for the
>     records.
>
>
> I think this makes sense.  It's more important to me that the overall 
> semantics be clear than that the ordering work one way or another. 
> Just the knowledge that, for example, provenance records are more 
> like a list than a bag is an important detail.
>
>>     >iii) More generally, we observe that provenance records lack a date/time attribute,
>>     which makes their semantics as a form of history somewhat muddy.
>>     In practice, a single tool/agent may edit a single document
>>     multiple times in succession over an arbitrary period of time.
>>     Should these multiple "sessions" be represented by a single
>>     logical provenance record?  Or is it the intention of the spec
>>     that the agent add a provenance record for each of these sessions
>>     in which a modification is made to the document?
>>
>>     As I said in the previous point, any modification of the content
>>     should add a new provenance record; at least that is what I had in mind.
>>
>     The original requirements for the provenance data category
>     were primarily intended to identify and differentiate the
>     _agents_ involved in translating or revising translations of
>     different parts of a document. It's not clear what would be the
>     best form of timing information. Should it be the period over
>     which the agents conducted the translation (revision) or the
>     instant in time at which they completed it? As indicated above,
>     even just determining the ordering, let alone the absolute timing
>     of the activity, can be complicated, and would require collection
>     of this information to be pushed downstream to CAT tools that
>     aren't otherwise ITS aware. This might present an implementation
>     barrier if correct timing was mandated.
>
>
> Yes, you're right that this gets very messy when you consider 
> aggregating provenance data from multiple agents that may have been 
> processing in parallel.  The main point I wanted to clarify was that 
> the purpose of the data category was to identify agents as opposed to 
> "processing events".  I think this is enough for now.
>
> Thanks!
>

Received on Thursday, 24 January 2013 08:42:29 UTC