Re: [ACTION-81] consider consolidation of author, revisionAgent and translationAgent

Hi Dave,

2012/5/9 Dave Lewis <dave.lewis@cs.tcd.ie>

>  Dear all,
> Here are some notes on how we might consolidate author, revisionAgent and
> translationAgent by alignment with the work of the W3C Provenance WG
> http://www.w3.org/2011/prov/wiki/Main_Page.
>
> The model of this working group is most simply summarised by figure
> http://www.w3.org/TR/prov-dm/#prov-dm-overview in the core Data Model
> specification. Essentially the provenance model is intended to allow
> recording of how *entities* were used and generated by *activities* which
> are conducted through the action of *agents*. There are then a bunch of
> relations that tie these together, such as 'wasGeneratedBy',
> 'wasDerivedFrom', 'used'; see:
> http://www.w3.org/TR/prov-dm/#prov-dm-types-and-relations
>
> The model specifies the format of provenance records, but unlike most ITS
> tags, the intent is for these records to be maintained in a dedicated
> store. Like ITS, it defines an abstract notation (PROV-DM) for such
> records, and then defines different implementations, namely a text-file
> format (PROV-N), an ontology version that can be mapped into RDF (PROV-O),
> a restful access and query mechanism returning records as HTML (PROV-AQ),
> and an XML binding (PROV-XML). These specs are going through the W3C
> process currently, with the aim of reaching Recommendation status by
> Jan'13.
>

Currently this group is chartered only until October this year
https://www.w3.org/Member/Mail/
Again I may have missed things, but are you sure about the progress of this
or could you take an action to talk to the provenance co-chairs about their
timeline?


>
> Briefly, the most direct mapping to ITS would be some sort of binding
> between host document and their elements and entities as recorded in
> provenance records. The binding will depend on the implementation used for
> the provenance, e.g. just a URL, an XPOINTER, or a file URL and an entity
> record ID within that file. Using the last of these we could imagine:
>
> <span its-prov-ref="http://www.eg.org/prov-ex1.txt" its-prov-ent="e1">My
> hovercraft is full of eels.</span>
> <span its-prov-ref="http://www.eg.org/prov-ex1.txt" its-prov-ent="e2">Mon
> aéroglisseur est plein d'anguilles.</span>
>
> where http://www.eg.org/prov-ex1.txt would contain something like:
>
> entity(e1)
> entity(e2)
>
> which in turn could be referenced by an activity a1:
>
> wasGeneratedBy(e1, a1, 2011-11-16T16:05:30) -- specifies that an entity
> was generated by an activity at a specific time
>
> activity(a1, 2011-11-16T16:05:00, 2011-11-16T16:06:00,
> [its-prov-process-type="authorContent", its-source-lang="en"] ) --
> identifies an activity, its start and stop time and other relevant
> attributes
>
> -- similarly we can define that e2 was generated by a machine translation
> process
> wasGeneratedBy(e2, a2, 2011-11-16T16:07:30)
> activity(a2, 2011-11-16T16:07:00, 2011-11-16T16:08:00,
> [its-prov-process-type="mTranslate"] )
>
> -- then we can define agents associated with these activities
> agent(Trevor, [ prov:type="Person", its-prov-agent-type="author" ] )
> agent(matrex-eng1234, [ prov:type="SoftwareAgent",
> its-prov-agent-type="smt", its-prov-src-lang="en", its-prov-tgt-lang="fr" ]
> )
>
> wasAssociatedWith(a1, Trevor)
> wasAssociatedWith(a2, matrex-eng1234)
>
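A minimal sketch of how a consumer might follow the binding above, from the
its-prov-ent value on a span to the activity and agent behind it. The records
are hard-coded dictionaries that mirror the PROV-N statements (taking the
machine translation activity to be a2); the names ProvBindings and
responsible_agent and the in-memory layout are hypothetical, not part of PROV
or ITS, and no PROV-N parsing is attempted.

```python
from html.parser import HTMLParser

# Host document fragment carrying the proposed binding attributes.
HOST = (
    '<span its-prov-ref="http://www.eg.org/prov-ex1.txt" its-prov-ent="e1">'
    "My hovercraft is full of eels.</span>"
    '<span its-prov-ref="http://www.eg.org/prov-ex1.txt" its-prov-ent="e2">'
    "Mon aéroglisseur est plein d'anguilles.</span>"
)

# Hypothetical in-memory mirror of the stand-off records.
was_generated_by = {"e1": "a1", "e2": "a2"}                     # entity -> activity
activities = {
    "a1": {"its-prov-process-type": "authorContent"},
    "a2": {"its-prov-process-type": "mTranslate"},
}
was_associated_with = {"a1": "Trevor", "a2": "matrex-eng1234"}  # activity -> agent
agents = {
    "Trevor": {"prov:type": "Person", "its-prov-agent-type": "author"},
    "matrex-eng1234": {"prov:type": "SoftwareAgent", "its-prov-agent-type": "smt"},
}


class ProvBindings(HTMLParser):
    """Collect (its-prov-ref, its-prov-ent) pairs from start tags."""

    def __init__(self):
        super().__init__()
        self.bindings = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "its-prov-ent" in attributes:
            self.bindings.append(
                (attributes.get("its-prov-ref"), attributes["its-prov-ent"])
            )


def responsible_agent(entity_id):
    """Return (process type, agent id, agent type) for an entity, or None."""
    activity_id = was_generated_by.get(entity_id)
    if activity_id is None:
        return None
    agent_id = was_associated_with[activity_id]
    return (
        activities[activity_id]["its-prov-process-type"],
        agent_id,
        agents[agent_id]["its-prov-agent-type"],
    )


parser = ProvBindings()
parser.feed(HOST)
for ref, entity_id in parser.bindings:
    print(ref, entity_id, responsible_agent(entity_id))
```

The host document only carries the two attributes; everything about who did
what, and when, stays in the referenced store.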
> So you can see that this stand-off meta data approach based on the PROV
> model means we can also record things like the suggested qualityError data
> category (
> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#qualityError
> )
>

This worries me a bit - we agreed not to conflate data categories, and what
you suggest creates a dependency between qualityError and provenance.
Wouldn't it be better to keep them separate?


>
> entity(e3, [its-ent-type="qa-error-report", its-qa-err-severity="0.5",
> its-qa-err-note="suspect terminology"]) -- actually PROV has an annotation
> structure that could be used instead of its-qa-err-note
>
> wasGeneratedBy(g1, e3, a3, 2011-11-16T16:08:30)
> wasDerivedFrom(e3, e2, a3, g1)
> activity(a3, 2011-11-16T16:08:00, 2011-11-16T16:09:00,
> [its-prov-process-type="translateQA", its-prov-qa-ruleset="LISAQA"] )
> wasAssociatedWith(a3, Pierre)
> agent(Pierre, [ prov:type="Person", its-prov-agent-type="trans-QA-checker"
> ] )
>
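A sketch of how the QA report could be looked up from the translated span:
collect the entities recorded as derived from e2 and keep those typed as QA
error reports. The dictionaries mirror the records quoted above; the function
name qa_reports_for and the data layout are hypothetical, chosen only for
illustration.

```python
# Hypothetical in-memory mirror of the QA-related records.
entities = {
    "e2": {},
    "e3": {
        "its-ent-type": "qa-error-report",
        "its-qa-err-severity": "0.5",
        "its-qa-err-note": "suspect terminology",
    },
}
was_derived_from = [("e3", "e2")]  # (derived entity, source entity)


def qa_reports_for(entity_id):
    """Return (report id, attributes) for QA reports derived from entity_id."""
    return [
        (derived, entities[derived])
        for derived, source in was_derived_from
        if source == entity_id
        and entities.get(derived, {}).get("its-ent-type") == "qa-error-report"
    ]


for report_id, attributes in qa_reports_for("e2"):
    print(report_id, attributes["its-qa-err-severity"], attributes["its-qa-err-note"])
```

In this layout the quality information is reached only through the provenance
records, which is the dependency between the two data categories raised in
the reply.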
> This approach makes it easy to have several different provenance entities
> associated with any particular doc, element or span, and heads off the
> likely high level of ITS markup overhead that may occur if several
> provenance records are applied.
>
> What is required in terms of specification is the set of additional
> attributes we want to use and their values. Essentially this would be a
> profile of the PROV specs. We may need to liaise with that working group on
> how to do this since I can't see that they have addressed this yet.
>
> Note that its-prov-process-type should result from the consideration given
> to section
> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#Process_Model
>
> In this way we can replace the author, revisionAgent, translationAgent and
> perhaps also the quality data categories by the 'its-prov-ent' data
> category to reference the entity representing the doc/element/span and then
> through profiling let the PROV spec do the rest.
>


Like above, I am worried by combining data categories. I assume that you
see a benefit in merging them, but it may create a lot of complexity for
people not interested in provenance.

Felix



>
> all comments welcome, we can discuss this more on Thursday's call.
> cheers,
> Dave
>
>


-- 
Felix Sasaki
DFKI / W3C Fellow

Received on Wednesday, 9 May 2012 06:55:39 UTC