- From: Felix Sasaki <fsasaki@w3.org>
- Date: Wed, 9 May 2012 08:55:11 +0200
- To: Dave Lewis <dave.lewis@cs.tcd.ie>
- Cc: "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>
- Message-ID: <CAL58czoYOPaYmhfp-+ibnWhNHWXTCCvrrO9+rZk8z_5KBKv97A@mail.gmail.com>
Hi Dave,

2012/5/9 Dave Lewis <dave.lewis@cs.tcd.ie>

> Dear all,
> Here are some notes on how we might consolidate author, revisionAgent and
> translationAgent by alignment with the work of the W3C Provenance WG
> http://www.w3.org/2011/prov/wiki/Main_Page.
>
> The model of this working group is most simply summarised by figure
> http://www.w3.org/TR/prov-dm/#prov-dm-overview in the core Data Model
> specification. Essentially the provenance model is intended to allow
> recording of how *entities* were used and generated by *activities* which
> are conducted through the action of *agents*. There are then a bunch of
> relations that tie these together, such as 'wasGeneratedBy',
> 'wasDerivedFrom', 'used'; see:
> http://www.w3.org/TR/prov-dm/#prov-dm-types-and-relations
>
> The model specifies the format of provenance records, but unlike most ITS
> tags, the intent is for these records to be maintained in a dedicated
> store. Like ITS, it defines an abstract notation (PROV-DM) for such
> records, and then defines different implementations, namely a text-file
> format (PROV-N), an ontology version that can be mapped into RDF (PROV-O),
> a RESTful access and query mechanism returning records as HTML (PROV-AQ),
> and an XML binding (PROV-XML). These specs are going through the W3C
> process currently, with the aim of reaching Recommendation status by
> Jan'13.
>

Currently this group is chartered only until October this year
https://www.w3.org/Member/Mail/
Again I may have missed things, but are you sure about the progress of this,
or could you take an action to talk to the provenance co-chairs about their
timeline?

>
> Briefly, the most direct mapping to ITS would be some sort of binding
> between host documents and their elements and entities as recorded in
> provenance records. The binding will depend on the implementation used for
> the provenance, e.g. just a URL, an XPOINTER, or a file URL and an entity
> record ID within that file.
> Using the last of these we could imagine:
>
> <span its-prov-ref="http://www.eg.org/prov-ex1.txt" its-prov-ent="e1">My
> hovercraft is full of eels.</span>
> <span its-prov-ref="http://www.eg.org/prov-ex1.txt" its-prov-ent="e2">Mon
> aéroglisseur est plein d'anguilles.</span>
>
> where http://www.eg.org/prov-ex1.txt would contain something like:
>
> entity(e1)
> entity(e2)
>
> which in turn could be referenced by an activity a1:
>
> wasGeneratedBy(e1, a1, 2011-11-16T16:05:30) -- specifies that an entity
> was generated by an activity at a specific time
>
> activity(a1, 2011-11-16T16:05:00, 2011-11-16T16:06:00,
> [its-prov-process-type="authorContent", its-source-lang="en"] ) --
> identifies an activity, its start and stop time and other relevant
> attributes
>
> -- similarly we can define that e2 was generated by a machine translation
> process
> wasGeneratedBy(e2, a2, 2011-11-16T16:07:30)
> activity(a2, 2011-11-16T16:07:00, 2011-11-16T16:08:00,
> [its-prov-process-type="mTranslate"] )
>
> -- then we can define agents associated with these activities
> agent(Trevor, [ prov:type="Person", its-prov-agent-type="author" ] )
> agent(matrex-eng1234, [ prov:type="SoftwareAgent",
> its-prov-agent-type="smt", its-prov-src-lang="en", its-prov-tgt-lang="fr" ]
> )
>
> wasAssociatedWith(a1, Trevor)
> wasAssociatedWith(a2, matrex-eng1234)
>
> So you can see that this stand-off metadata approach based on the PROV
> model means we can also record things like the suggested qualityError data
> category (
> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#qualityError
> )
>

This worries me a bit - we agreed not to conflate data categories, and what
you suggest creates a dependency between qualityError and provenance.
Wouldn't it be better to keep them separate?
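
For readers following the quoted example, here is a minimal sketch of how a consumer might resolve an its-prov-ent value through the stand-off records. It is not part of Dave's proposal or of any PROV tooling: the dictionaries and the describe() helper are hypothetical names, the records are hand-copied rather than parsed from PROV-N, and the second activity is taken to be a2, as the wasGeneratedBy(e2, a2, ...) record implies.

```python
# Hypothetical in-memory form of the stand-off records from the quoted example.
# Nothing here is a real PROV library; every name is illustrative only.

# wasGeneratedBy(entity, activity, time): entity -> generating activity
GENERATED_BY = {"e1": "a1", "e2": "a2"}

# activity(id, start, end, [attributes]): only the process type is kept here
ACTIVITIES = {
    "a1": {"its-prov-process-type": "authorContent"},
    "a2": {"its-prov-process-type": "mTranslate"},
}

# wasAssociatedWith(activity, agent): activity -> agents
ASSOCIATED_WITH = {"a1": ["Trevor"], "a2": ["matrex-eng1234"]}

# agent(id, [attributes])
AGENTS = {
    "Trevor": {"prov:type": "Person", "its-prov-agent-type": "author"},
    "matrex-eng1234": {"prov:type": "SoftwareAgent", "its-prov-agent-type": "smt"},
}


def describe(entity_id):
    """Follow entity -> activity -> agents for one its-prov-ent value."""
    activity_id = GENERATED_BY[entity_id]
    agent_ids = ASSOCIATED_WITH.get(activity_id, [])
    return {
        "activity": activity_id,
        "process_type": ACTIVITIES[activity_id]["its-prov-process-type"],
        "agents": {a: AGENTS[a]["its-prov-agent-type"] for a in agent_ids},
    }


if __name__ == "__main__":
    # Resolve the French span, which carries its-prov-ent="e2".
    print(describe("e2"))
```

The point of the sketch is only that the host document carries a single reference per span, while author, translation agent and process type are all looked up in the separate store.
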
>
> entity(e3, [its-ent-type="qa-error-report", its-qa-err-severity="0.5",
> its-qa-err-note="suspect terminology"]) -- actually PROV has an annotation
> structure that could be used instead of its-qa-err-note
>
> wasGeneratedBy(g1, e3, a3, 2011-11-16T16:08:30)
> wasDerivedFrom(e3, e2, a3, g1)
> activity(a3, 2011-11-16T16:08:00, 2011-11-16T16:09:00,
> [its-prov-process-type="translateQA", its-prov-qa-ruleset="LISAQA"] )
> wasAssociatedWith(a3, Pierre)
> agent(Pierre, [ prov:type="Person", its-prov-agent-type="trans-QA-checker"
> ] )
>
> This approach makes it easy to have several different provenance entities
> associated with any particular doc, element or span, and heads off the
> likely high level of ITS markup overhead that may occur if several
> provenance records are applied.
>
> What is required in terms of specification is the set of additional
> attributes we want to use and their values. Essentially this would be a
> profile of the PROV specs. We may need to liaise with that working group on
> how to do this since I can't see that they have addressed this yet.
>
> Note that its-prov-process-type should result from the consideration given
> to section
> http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#Process_Model
>
> In this way we can replace the author, revisionAgent, translationAgent and
> perhaps also the quality data categories by the 'its-prov-ent' data
> category to reference the entity representing the doc/element/span and then
> through profiling let the PROV spec do the rest.
>

Like above, I am worried by combining data categories. I assume that you see
a benefit in merging them, but it may create a lot of complexity for people
not interested in provenance.

Felix

>
> all comments welcome, we can discuss this more on Thursday's call.
> cheers,
> Dave
>
>

--
Felix Sasaki
DFKI / W3C Fellow
Received on Wednesday, 9 May 2012 06:55:39 UTC