
Re: ISSUE-72: Provenance Data Category - same for locQualityIssue?

From: Phil Ritchie <philr@vistatec.ie>
Date: Sun, 27 Jan 2013 19:25:11 +0000
To: Dave Lewis <dave.lewis@cs.tcd.ie>
Cc: Chase Tingley <chase@spartansoftwareinc.com>, kevin@spartanconsultinginc.com, public-multilingualweb-lt@w3.org, public-multilingualweb-lt-comments@w3.org
Message-ID: <OFB8730429.F17A4001-ON80257B00.006A8649-80257B00.006AAD19@vistatec.ie>
Yes, that's fine with me. We should keep characteristics like these 
consistent across categories.

Phil.





From:   Dave Lewis <dave.lewis@cs.tcd.ie>
To:     Chase Tingley <chase@spartansoftwareinc.com>
Cc:     public-multilingualweb-lt-comments@w3.org, public-multilingualweb-lt@w3.org, kevin@spartanconsultinginc.com
Date:   26/01/2013 12:51
Subject:        Re: ISSUE-72: Provenance Data Category - same for locQualityIssue?



Thanks Chase.

A logical follow-on question for LocQualityIssue implementors (as the 
other data category with standoff markup containing multiple elements): 
should we make the order of locQualityIssue elements within a 
locQualityIssues standoff element reflect the order in which they were 
added, in the same way?

I.e. after the definition of locQualityIssues we add the text:
"The order of its:locQualityIssue elements within a its:locQualityIssues 
element should reflect the order with which they were added to the 
document, with the most recently added one listed first."

Phil, guys?

Regards,
Dave

On 25/01/2013 19:37, Chase Tingley wrote:
Hi Dave, 

That sounds good.

Thanks

On Thu, Jan 24, 2013 at 12:41 AM, Dave Lewis <dave.lewis@cs.tcd.ie> wrote:
Hi Chase,
Thanks for getting back to us on this.

In relation to ordering of its:provenanceRecord I propose therefore to add 
the following sentence to the provenance section, after we introduce this 
element:

"The order of its:provenanceRecord elements within a its:provenanceRecords 
element should reflect the order with which they were added to the 
document, with the most recently added one listed first."
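
A minimal sketch of what this ordering could mean for a tool that appends records, assuming the standoff element is built with Python's xml.etree.ElementTree. The helper name add_provenance_record and the attribute values ("mt-engine", "reviser-a", "reviser-b") are hypothetical and only illustrate the "most recently added first" convention:

```python
import xml.etree.ElementTree as ET

# ITS namespace; the "its" prefix is registered only for readable serialization.
ITS_NS = "http://www.w3.org/2005/11/its"
ET.register_namespace("its", ITS_NS)


def add_provenance_record(records, **attrs):
    """Insert a new its:provenanceRecord as the FIRST child of `records`,
    so the most recently added record is always listed first."""
    record = ET.Element(f"{{{ITS_NS}}}provenanceRecord", attrs)
    records.insert(0, record)
    return record


# Hypothetical history: machine translation first, then two revisions.
records = ET.Element(f"{{{ITS_NS}}}provenanceRecords")
add_provenance_record(records, tool="mt-engine")
add_provenance_record(records, revPerson="reviser-a")
add_provenance_record(records, revPerson="reviser-b")

# Document order is now newest to oldest.
order = [dict(r.attrib) for r in records]
```

Here the order reflects when each record was added to the document, not when the underlying translation or revision actually took place.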

Can you signal whether you are happy with this? 

Then, given your comments on the time annotation issue below as well, I 
think I will be able to close this issue.

thanks again for this comment,
Regards,
Dave 


On 23/01/2013 18:17, Chase Tingley wrote:
Hi Dave & Pablo, 

Thanks for the responses.  Comments inline

On Tue, Jan 22, 2013 at 5:39 PM, Dave Lewis <dave.lewis@cs.tcd.ie> wrote:
Hi Chase, Kevin, all,
First thanks to Pablo for his response. Some further responses inline 
below related to timing:

On 15/01/2013 17:33, Pablo Nieto Caride wrote:
Hi Felix, all,
 
 
>ii) Similarly, does the ordering of provenance records within a 
<provenanceRecords> element make a statement about the (temporal) order in 
which the records were created?  If an ordering is implied, it raises 
questions about the implied ordering in a document where provenance 
records are declared both globally and via local markup.
 
Certainly the spec does not talk about temporal order, but given that 
records cannot be declared both globally and via local markup for a single 
element, the way I see it, and to simplify things, each provenance record 
should be older than the previous one.

I think the best we can do is offer best practice advice that the order 
in which multiple its:provenanceRecord elements are listed in an 
its:provenanceRecords element should reflect the order in which they were 
added to the document, rather than the order in which the translation 
(or revision) actually happened. 

Pablo, could you confirm that you intend the oldest one to be listed last? 


I don't think we can mandate that the order indicate the order in which 
the activities indicated in the records (translation or translation 
revision) were performed. This information may not be available to the 
processor adding the annotation. For example, a TMS may add this annotation 
after receiving translation revisions from two different translators, each 
covering multiple elements but without per-element timing information, so 
it wouldn't know the order in which the actual revisions were performed. 
Alternatively, the timings may be known for different elements but overlap 
in time, so there wouldn't be an obvious order for the records. 
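
To illustrate the overlap case, here is a small sketch, assuming each revision is known only as a (start, end) interval. The function name strictly_before and the dates are hypothetical:

```python
from datetime import datetime


def strictly_before(a, b):
    """Return True if interval `a` ends no later than interval `b` starts."""
    return a[1] <= b[0]


# Two hypothetical revision sessions that overlap between 11:00 and 12:00.
rev_a = (datetime(2013, 1, 21, 9, 0), datetime(2013, 1, 21, 12, 0))
rev_b = (datetime(2013, 1, 21, 11, 0), datetime(2013, 1, 21, 15, 0))

# Neither session precedes the other, so the activity times alone
# give no unambiguous order for the two records.
a_first = strictly_before(rev_a, rev_b)
b_first = strictly_before(rev_b, rev_a)
```

Both comparisons are false here, which is why ordering by the time a record was added is easier to define than ordering by when the activity happened.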

I think this makes sense.  It's more important to me that the overall 
semantics be clear than that the ordering work one way or another.  Just 
the knowledge that, for example, provenance records are more like a list 
than a bag is an important detail.
 
>iii) More generally, we observe that provenance records lack a date/time 
attribute, which makes their semantics as a form of history somewhat 
muddy.  In practice, a single tool/agent may edit a single document 
multiple times in succession over an arbitrary period of time.  Should 
these multiple "sessions" be represented by a single logical provenance 
record?  Or is it the intention of the spec that the agent add a 
provenance record for each of these sessions in which a modification is 
made to the document?
 
As I said in the previous point any modification of the content should add 
a new provenance record, at least that is what I had in mind.

The original requirements for the provenance data category were primarily 
intended to identify and differentiate the _agents_ involved in 
translating, or revising translations of, different parts of a document. 
It's not clear what the best form of timing information would be: should it 
be the period over which the agents conducted the translation (or revision), 
or the instant in time at which they completed it? As indicated above, even 
just determining the ordering, let alone the absolute timing of the activity, 
can be complicated, and would require collection of this information to be 
pushed downstream to CAT tools that aren't otherwise ITS aware. This might 
present an implementation barrier if correct timing was mandated.

Yes, you're right that this gets very messy when you consider aggregating 
provenance data from multiple agents that may have been processing in 
parallel.  The main point I wanted to clarify was that the purpose of the 
data category was to identify agents as opposed to "processing events".  I 
think this is enough for now.

Thanks!






Received on Sunday, 27 January 2013 19:25:46 UTC
