ISSUE-72: Provenance Data Category

Hi Felix, all,

 

>i) Can an element have both local provenance data (either inline or via
local standoff markup) and also reference global provenance data (declared
via global standoff markup) using the attribute specified globally via
provenanceRecordsRefPointer?  The draft does not specify.

 

The answer is NO. As Yves stated, “overriding rules takes care of that”:
any provenanceRecordsRefPointer from a global rule is overridden by the
local provenanceRecordsRef or by the inline local markup. Maybe a note
about overriding, or a reference to
http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#selection-precedence
is needed in every metadata section, just as a reminder?
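
To make the overriding behaviour concrete, here is a minimal Python sketch
of how a consumer might decide which provenance information applies to a
single element. It is only an illustration under stated assumptions: the
function name provenance_source and its return values are hypothetical,
the global rule is reduced to the name of the pointer attribute, the order
in which the two local forms are checked is arbitrary, and inheritance
from ancestor elements is ignored.

```python
import xml.etree.ElementTree as ET

ITS = "http://www.w3.org/2005/11/its"
# Local inline provenance attributes (all in the ITS namespace).
INLINE = ("person", "personRef", "org", "orgRef", "tool", "toolRef",
          "revPerson", "revPersonRef", "revOrg", "revOrgRef",
          "revTool", "revToolRef", "provRef")


def provenance_source(elem, global_pointer_attr=None):
    """Return (kind, value) for the provenance that applies to elem.

    Hypothetical helper: local markup always wins over the global rule.
    """
    inline = {}
    for name in INLINE:
        value = elem.get(f"{{{ITS}}}{name}")
        if value is not None:
            inline[name] = value
    if inline:
        return ("local-inline", inline)
    local_ref = elem.get(f"{{{ITS}}}provenanceRecordsRef")
    if local_ref is not None:
        return ("local-standoff", local_ref)
    # Only consulted when no local markup is present on the element.
    if global_pointer_attr and elem.get(global_pointer_attr) is not None:
        return ("global", elem.get(global_pointer_attr))
    return ("none", None)


doc = ET.fromstring(
    '<body xmlns:its="http://www.w3.org/2005/11/its">'
    '<par ref="#pr1">a</par>'
    '<legalnotice ref="#pr1" its:provenanceRecordsRef="#pr2">b</legalnotice>'
    '</body>')
par, legal = doc
print(provenance_source(par, "ref"))    # ('global', '#pr1')
print(provenance_source(legal, "ref"))  # ('local-standoff', '#pr2')
```

In this sketch the global pointer is never merged with local markup, which
is the "NO" above expressed as code.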

 

>ii) Similarly, does the ordering of provenance records within a
<provenanceRecords> element make a statement about the (temporal) order in
which the records were created?  If an ordering is implied, it raises
questions about the implied ordering in a document where provenance records
are declared both globally and via local markup.

 

Certainly the spec does not talk about temporal order. But given that
records cannot be declared both globally and via local markup for a single
element, the way I see it, and to simplify things, the records should be
listed newest first, so that each provenance record is older than the one
before it.

 

>iii) More generally, we observe that provenance records lack a date/time
attribute, which makes their semantics as a form of history somewhat muddy.
In practice, a single tool/agent may edit a single document multiple times
in succession over an arbitrary period of time.  Should these multiple
"sessions" be represented by a single logical provenance record?  Or is it
the intention of the spec that the agent add a provenance record for each of
these sessions in which a modification is made to the document?

 

As I said in the previous point, any modification of the content should add
a new provenance record; at least that is what I had in mind.
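
A minimal Python sketch of that reading, assuming (as suggested above, not
as stated in the draft) that each editing session adds one record and that
the newest record goes first. The helper name add_session_record and the
attribute values "mt-engine" and "qa-checker" are made up for
illustration.

```python
import xml.etree.ElementTree as ET

ITS = "http://www.w3.org/2005/11/its"
ET.register_namespace("its", ITS)


def add_session_record(records, **attrs):
    """Insert a new provenanceRecord as the first child of records.

    Hypothetical helper: one call per editing session, newest first.
    """
    record = ET.Element(f"{{{ITS}}}provenanceRecord", attrs)
    records.insert(0, record)
    return record


records = ET.fromstring(
    '<its:provenanceRecords xmlns:its="http://www.w3.org/2005/11/its"'
    ' xml:id="pr1">'
    '<its:provenanceRecord tool="mt-engine"/>'
    '</its:provenanceRecords>')

# Two later sessions, each adding its own record.
add_session_record(records, revPerson="John Smith")
add_session_record(records, revTool="qa-checker")

# The children now read, top to bottom: revTool, revPerson, tool.
print(ET.tostring(records, encoding="unicode"))
```

With this convention a reader can recover the relative order of the
sessions from document order alone, even without a date/time attribute.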

 

>iv) We would also note the complexity of implementing this data category
correctly.  For example, consider an example based on Example 63.  In this
example, an XML document contains two pieces of text, each of which has been
affected by a previous tool.  A single provenance record is encoded using
global standoff notation:

 
<text xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">
  <dc:creator>John Doe</dc:creator>
  <its:provenanceRecords xml:id="pr1">
    <its:provenanceRecord
      toolRef="http://www.onlinemtex.com/2012/7/25/wsdl/"
      org="acme-CAT-v2.3"
      revToolRef="http://www.mycat.com/v1.0/download"
      revOrg="acme-CAT-v2.3"
      provRef="http://www.examplelsp.com/excontent987/production/prov/e6354"/>
  </its:provenanceRecords>
  <its:rules version="2.0">
    <its:provRule selector="//*[@ref]" provenanceRecordsRefPointer="@ref"/>
  </its:rules>
  <title>Translation Revision Provenance Agent: Global Test in XML</title>
  <body>
    <par ref="#pr1"> This paragraph was translated from the machine.</par>
    <legalnotice ref="#pr1">This text was also translated from the
machine.</legalnotice>
  </body>
</text>

Now, a second agent modifies the file, affecting only the <legalnotice>
content.  In this case, the shared provenance record must be forked into a
duplicate record to which the second agent can be added:

 
<text xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">
  <dc:creator>John Doe</dc:creator>
  <its:provenanceRecords xml:id="pr1">
    <its:provenanceRecord
      toolRef="http://www.onlinemtex.com/2012/7/25/wsdl/"
      org="acme-CAT-v2.3"
      revToolRef="http://www.mycat.com/v1.0/download"
      revOrg="acme-CAT-v2.3"
      provRef="http://www.examplelsp.com/excontent987/production/prov/e6354"/>
  </its:provenanceRecords>
  <its:provenanceRecords xml:id="pr2">
    <its:provenanceRecord
      toolRef="http://www.onlinemtex.com/2012/7/25/wsdl/"
      org="acme-CAT-v2.3"
      revToolRef="http://www.mycat.com/v1.0/download"
      revOrg="acme-CAT-v2.3"
      provRef="http://www.examplelsp.com/excontent987/production/prov/e6354"/>
    <its:provenanceRecord
      revPerson="John Smith"
      revOrgRef="http://john-smith.qa.example.com"/>
  </its:provenanceRecords>
  <its:rules version="2.0">
    <its:provRule selector="//*[@ref]" provenanceRecordsRefPointer="@ref"/>
  </its:rules>
  <title>Translation Revision Provenance Agent: Global Test in XML</title>
  <body>
    <par ref="#pr1"> This paragraph was translated from the machine.</par>
    <legalnotice ref="#pr2">This text was translated by machine and then
post-edited.</legalnotice>
  </body>
</text>

>In this case, the tool would have the option of leaving the shared global
record and then using local standoff markup to encode the second record
(assuming that this combination of global & local records is permissible --
see above).  However, there are other cases in which the agent would need to
perform complex markup manipulations, such as a scenario in which local
inline markup (encoding a single provenance record) must be replaced with
local standoff markup that contains multiple records.

This complexity may present a barrier to consistent implementation.  It may
be worth examining whether it's possible for a newly-created provenance
record to reference previously existing provenance records (forming a
"chain") in order to minimize the amount of markup that would need to be
rewritten by compliant implementations.

 

I see the point, and I agree this adds a lot of complexity to
implementations, but as far as I’m concerned it’s the best way to do it,
and it would leave the spec as it is. Besides, IMO a chain or tree approach
with several references to other records would complicate things even
more. The only problem I see, from the point of view of someone reading the
markup, is the scenario where hundreds of modifications are performed on
one or more elements.
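
For what it is worth, the "fork" step from the example above can be
sketched in a few lines of Python. This is only an illustration under
assumptions: fork_records is a hypothetical helper, the pointer attribute
is assumed to be "ref" as in the example, the markup is reduced to the
bare minimum with a made-up tool value, and the new record is placed first
(newest first, as I suggested earlier), whereas the example above appends
it.

```python
import copy
import xml.etree.ElementTree as ET

ITS = "http://www.w3.org/2005/11/its"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def fork_records(root, elem, new_id, new_attrs, ref_attr="ref"):
    """Give elem its own copy of the shared records plus one new record."""
    old_id = elem.get(ref_attr).lstrip("#")
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    for records in root.iter(f"{{{ITS}}}provenanceRecords"):
        if records.get(XML_ID) == old_id:
            break
    else:
        raise ValueError(f"no provenanceRecords with xml:id {old_id!r}")
    fork = copy.deepcopy(records)
    fork.set(XML_ID, new_id)
    # Assumption: newest record first.
    fork.insert(0, ET.Element(f"{{{ITS}}}provenanceRecord", new_attrs))
    parent = parents[records]
    parent.insert(list(parent).index(records) + 1, fork)
    # Only the modified element is re-pointed; others keep the shared set.
    elem.set(ref_attr, "#" + new_id)
    return fork


doc = ET.fromstring(
    '<text xmlns:its="http://www.w3.org/2005/11/its">'
    '<its:provenanceRecords xml:id="pr1">'
    '<its:provenanceRecord tool="mt-engine"/>'
    '</its:provenanceRecords>'
    '<body><par ref="#pr1">a</par><legalnotice ref="#pr1">b</legalnotice></body>'
    '</text>')
legal = doc.find("body/legalnotice")
fork_records(doc, legal, "pr2", {"revPerson": "John Smith"})
```

The markup rewriting is confined to one copied element and one attribute
update, which is why I think the current design is workable, even if long
histories make the record sets verbose.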

 

Cheers,

__________________________________

Pablo Nieto Caride

Dpto. Técnico/I+D+i

Linguaserve Internacionalización de Servicios, S.A.

Tel.: +34 91 761 64 60 ext. 0422
Fax: +34 91 542 89 28 

E-mail: pablo.nieto@linguaserve.com

www.linguaserve.com

 


__________________________________

Received on Tuesday, 15 January 2013 17:33:58 UTC