RE: ISSUE-72: Provenance Data Category - same for locQualityIssue?

From: Pablo Nieto Caride <pablo.nieto@linguaserve.com>
Date: Mon, 28 Jan 2013 11:24:52 +0100
To: "'Dave Lewis'" <dave.lewis@cs.tcd.ie>, "'Phil Ritchie'" <philr@vistatec.ie>
Cc: "'Chase Tingley'" <chase@spartansoftwareinc.com>, <kevin@spartanconsultinginc.com>, <public-multilingualweb-lt@w3.org>, <public-multilingualweb-lt-comments@w3.org>
Message-ID: <042501cdfd41$b5c34fb0$2149ef10$@linguaserve.com>


I agree with you guys; we should be consistent, so the same approach seems
valid to me. Dave, I assume that we should add a note or something about
this in the spec, shouldn't we?






Absolutely. I think we'd been assuming this when we discussed it for
provenance records at the face-to-face last week, but I realised we hadn't
sought explicit consensus on this.


On 27/01/2013 19:25, Phil Ritchie wrote:

Yes, that's fine with me. We should keep characteristics like these
consistent across categories. 


From:        Dave Lewis  <mailto:dave.lewis@cs.tcd.ie>
To:        Chase Tingley  <mailto:chase@spartansoftwareinc.com>
Cc:        public-multilingualweb-lt-comments@w3.org,
public-multilingualweb-lt@w3.org, kevin@spartanconsultinginc.com 
Date:        26/01/2013 12:51 
Subject:        Re: ISSUE-72: Provenance Data Category - same for


Thanks Chase.

A logical follow-on question for LocQualityIssue implementors (as the other
data category with stand-off markup with multiple elements): Should we make
the order of locQualityIssue elements within a locQualityIssues stand-off
element reflect the order in which they were added, in the same way?

i.e. after the definition of locQualityIssues we add the text:
"The order of its:locQualityIssue elements within a its:locQualityIssues
element should reflect the order with which they were added to the document,
with the most recently added one listed first."
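
A minimal sketch of what the proposed wording would mean for a consumer of
the markup, assuming the ITS namespace URI http://www.w3.org/2005/11/its and
a made-up sample fragment (the issue types and comments are illustrative
only). If the most recently added issue is listed first, reversing document
order recovers the order in which the issues were added:

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical stand-off fragment; newest issue first, as the proposed text asks.
SAMPLE = f"""
<its:locQualityIssues xmlns:its="{ITS_NS}">
  <its:locQualityIssue locQualityIssueType="grammar"
                       locQualityIssueComment="added second"/>
  <its:locQualityIssue locQualityIssueType="misspelling"
                       locQualityIssueComment="added first"/>
</its:locQualityIssues>
""".strip()


def issues_oldest_first(issues_elem):
    """Return the its:locQualityIssue children in the order they were added.

    Assumes the producer followed the newest-first convention, so the
    order of addition is simply document order reversed.
    """
    found = issues_elem.findall(f"{{{ITS_NS}}}locQualityIssue")
    return list(reversed(found))


root = ET.fromstring(SAMPLE)
chronological = [i.get("locQualityIssueType") for i in issues_oldest_first(root)]
```

Note that this only recovers the order of annotation, not the order in which
the underlying quality checks were carried out.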

Phil, guys?


On 25/01/2013 19:37, Chase Tingley wrote: 
Hi Dave, 

That sounds good. 


On Thu, Jan 24, 2013 at 12:41 AM, Dave Lewis <dave.lewis@cs.tcd.ie> wrote: 
Hi Chase,
Thanks for getting back to us on this.

In relation to ordering of its:provenanceRecord I propose therefore to add
the following sentence to the provenance section, after we introduce this

"The order of its:provenanceRecord elements within a its:provenanceRecords
element should reflect the order with which they were added to the document,
with the most recently added one listed first."
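
As an illustration only, a producer could follow the proposed sentence by
always inserting a new record as the first child of the stand-off element.
The sketch below assumes the ITS namespace URI http://www.w3.org/2005/11/its;
the helper name add_provenance_record and the attribute values are
hypothetical:

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
ET.register_namespace("its", ITS_NS)


def add_provenance_record(records, **attrs):
    """Insert a new its:provenanceRecord as the FIRST child of `records`.

    Prepending keeps the most recently added record at the top of the list.
    """
    record = ET.Element(f"{{{ITS_NS}}}provenanceRecord", attrs)
    records.insert(0, record)
    return record


records = ET.Element(f"{{{ITS_NS}}}provenanceRecords")
add_provenance_record(records, tool="mt-engine")       # added first
add_provenance_record(records, revPerson="reviewer")   # added second

# Document order is now newest first.
order = [dict(r.attrib) for r in records]
```

Here the record added second ends up listed first, regardless of when the
translation or revision it describes actually took place.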

Can you signal whether you are happy with this?

Then, given your comments on the time annotation issue below, I think I
will be able to close this issue.

thanks again for this comment,

On 23/01/2013 18:17, Chase Tingley wrote: 
Hi Dave & Pablo, 

Thanks for the responses.  Comments inline 

On Tue, Jan 22, 2013 at 5:39 PM, Dave Lewis <dave.lewis@cs.tcd.ie> wrote: 
Hi Chase, Kevin, all,
First, thanks to Pablo for his response. Some further responses inline below
related to timing:

On 15/01/2013 17:33, Pablo Nieto Caride wrote: 

Hi Felix, all, 



>ii) Similarly, does the ordering of provenance records within a
<provenanceRecords> element make a statement about the (temporal) order in
which the records were created?  If an ordering is implied, it raises
questions about the implied ordering in a document where provenance records
are declared both globally and via local markup. 


Certainly the spec does not talk about temporal order, but given that
records cannot be declared both globally and via local markup for a single
element, the way I see it, and to simplify things, each provenance record
should be older than the previous one. 

I think the best we can do is offer best-practice advice: when more than one
its:provenanceRecord is listed in an its:provenanceRecords element, their
order should reflect the order in which they were added to the document,
rather than the order in which the translation (or revision) actually happened.

Pablo, could you confirm that you intend the oldest one to be listed last? 

I don't think we can mandate that the order indicates the order in which the
activities indicated in the records (translation or translation revision) were
performed. This information may not be available to the processor adding the
annotation. For example, a TMS may add this annotation after receiving
translation revisions from two different translators, both for multiple
elements but without per-element timing information, so it wouldn't know the
order in which the actual revisions were performed. Alternatively, their
timings may be known for different elements, but they overlap in time, so
there wouldn't be an obvious order for the records.
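
To make the overlap case concrete, here is a small sketch with two invented
revision periods (the dates and the helper name happened_before are
hypothetical). Neither period ends before the other starts, so there is no
activity-based order a processor could assign to the two records:

```python
from datetime import datetime


def happened_before(a, b):
    """True only if period `a` ended before period `b` started.

    Each period is a (start, end) tuple of datetimes.
    """
    return a[1] < b[0]


# Two translators revising different elements over overlapping periods.
rev_a = (datetime(2013, 1, 21, 9, 0), datetime(2013, 1, 21, 12, 0))
rev_b = (datetime(2013, 1, 21, 11, 0), datetime(2013, 1, 21, 15, 0))

# False here: the periods overlap, so neither activity precedes the other.
orderable = happened_before(rev_a, rev_b) or happened_before(rev_b, rev_a)
```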

I think this makes sense.  It's more important to me that the overall
semantics be clear than that the ordering work one way or another.  Just the
knowledge that, for example, provenance records are more like a list than a
bag is an important detail. 

>iii) More generally, we observe that provenance records lack a date/time
attribute, which makes their semantics as a form of history somewhat muddy.
In practice, a single tool/agent may edit a single document multiple times
in succession over an arbitrary period of time.  Should these multiple
"sessions" be represented by a single logical provenance record?  Or is it
the intention of the spec that the agent add a provenance record for each of
these sessions in which a modification is made to the document? 


As I said in the previous point, any modification of the content should add a
new provenance record; at least that is what I had in mind.

The original requirements for the provenance data category were primarily
intended to identify and differentiate the _agents_ involved in translating
or revising translations of different parts of a document. It's not clear what
would be the best form of timing information: should it be the period over
which the agents conducted the translation (or revision), or the instant in
time at which they completed it? As indicated above, even just determining the
ordering, let alone the absolute timing of the activity, can be complicated,
and would require collection of this information to be pushed downstream to
CAT tools that aren't otherwise ITS aware. This might present an
implementation barrier if correct timing was mandated.

Yes, you're right that this gets very messy when you consider aggregating
provenance data from multiple agents that may have been processing in
parallel.  The main point I wanted to clarify was that the purpose of the
data category was to identify agents as opposed to "processing events".  I
think this is enough for now. 


Received on Monday, 28 January 2013 10:25:24 UTC
