- From: Felix Sasaki <fsasaki@w3.org>
- Date: Tue, 19 Mar 2013 17:13:20 +0100
- To: Phil Ritchie <philr@vistatec.ie>
- CC: "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>
- Message-ID: <51488EA0.700@w3.org>
Hi Phil,

On 19.03.13 16:56, Phil Ritchie wrote:
> Felix, All,
>
> A question: does the id of an enclosing <script /> element need to be
> the same as the ITS element it encloses? e.g.

I think: yes, see

http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#lqissue-implementation

"In HTML the standoff markup MUST be stored inside a script element. It
MUST have a type attribute with the value application/its+xml. Its id
attribute MUST be set to the same value as the xml:id attribute of the
locQualityIssues element it contains."

For provenance we have the same.

Best,

Felix

>
> <script type="application/its+xml" id="*lq0*">
> <its:locQualityIssues
> xmlns:its="http://www.w3.org/2005/11/its" xml:id="*lq0*">
> <its:locQualityIssue locQualityIssueType="non-conformance"
> locQualityIssueSeverity="75.7961783439491"></its:locQualityIssue>
> </its:locQualityIssues>
> </script>
>
> I suspect not.
>
> That being the case, I'm not convinced that having the script enclosed
> metadata point to the spans saves a significant amount of serialized
> footprint.
>
> Phil.
>
>
> From: Felix Sasaki <fsasaki@w3.org>
> To: "public-multilingualweb-lt@w3.org"
> <public-multilingualweb-lt@w3.org>,
> Date: 25/02/2013 18:02
> Subject: Standoff experiment plus observations
> ------------------------------------------------------------------------
>
> Hi all,
>
> Christian, Marcis and Tadej know this (apologies for the repetition) -
> but I thought others might be interested too.
>
> I played a bit with the NERD API
> http://nerd.eurecom.fr/documentation#nerdapi
>
> 1) I generated ITS "tan" via 4 annotation engines that can be accessed
> through the api: dbpedia spotlight, extractiv, lupedia, yahoo.
>
> 2 a) I also created a **non** ITS "tan" standoff version, see
> multiple-ann-with-id-plus-standoff.html . It relies on ID attributes,
> and the standoff annotations point to the IDs. This is the approach
> that we had discussed a while ago on the mailing list.
>
> 2 b) The file multiple-ann-with-standoff-refs-script.html uses our
> current localization quality issue and provenance standoff approach,
> that is: pointing from the content to annotations, here via an
> artificial x-t-ref attribute.
>
> From 2), I learned various things:
>
> - Making sure that standoff works requires a known workflow, "known"
> esp. with regard to white space handling. Otherwise the multiple
> annotation engines create multiple character offsets. So from this
> having a recommendation to leave standoff *processing* to NIF makes a
> lot of sense.
>
> - The non ITS standoff *representation* (see
> multiple-ann-with-id-plus-standoff.html) has the merit that a human
> consumer who doesn't know anything about NIF et al. (= somebody in an
> XML based localization workflow or looking at an HTML document) can
> look into the annotations and choose: Hover over the green spans of
> text, e.g. over "St Peter" as part of " held in St Peter's Basilica.
> ". The annotation from extractiv holds a more specific "its-class-ref"
> than the one from dbpedia spotlight. But only dbpedia spotlight holds
> an "its-ident-ref". So a human user consuming these annotations gets
> the most value if he combines them.
>
> - Developing applications based on the output of multiple engines is
> pretty straightforward for non NLP / NIF people if you have the output
> represented in an easy to digest format (JSON, XML, ...). I won't
> argue for standardizing that format and creating ITS "tan" standoff
> (we had that discussion). I'm mentioning this just because the merit
> of the annotations in the long term might grow if Web developers face a
> low barrier for widespread app development.
>
> - A thought I had during today's discussion of the XLIFF mapping:
> having the external standoff pointing to IDs might be a way to solve
> the XLIFF representation issue of "mrk": here the issue is again (it
> seems) that you want to apply multiple annotations to the same span of
> text (the content of "mrk") - but you can't since the "type" attribute
> can be only used once. Externalizing the annotations solves that problem.
>
> - During the discussion of multiple annotations a while ago we also
> touched upon the "direction" of the standoff: from outside to IDs (see
> multiple-ann-with-id-plus-standoff.html and 2a), or from the document
> to the standoff (current loc quality issue / provenance, see 2b)
> above). Pointing from the document (= 2b) has the drawback in HTML
> that you need a separate "script" element for each target - whereas in
> the case of 2a) you only need one script element. So for 2a) in total
> there are 58 elements, and 2b) has 101 elements.
>
> FYI: with the above observations I won't push for anything - just
> sharing my experience to see what others think.
>
> Best,
>
> Felix
>
> [attachment "multiple-ann-with-id-plus-standoff.html" deleted by
> Phil Ritchie/VISTATEC] [attachment
> "multiple-ann-with-standoff-refs-script.html" deleted by Phil
> Ritchie/VISTATEC]
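
The id rule quoted in the reply (the script element's id MUST equal the xml:id of the locQualityIssues element it contains) can be checked mechanically. The following is only a minimal sketch using the Python standard library, not something taken from the thread: the names StandoffCollector and check_standoff_ids are hypothetical, and it assumes the asterisks around "lq0" in the quoted example are emphasis markers from the mail conversion, so plain ids should be passed in.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Clark-notation name that ElementTree uses for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ITS_STANDOFF_TYPE = "application/its+xml"


class StandoffCollector(HTMLParser):
    """Collects (id attribute, raw XML text) for each ITS standoff script element."""

    def __init__(self):
        super().__init__()
        self.scripts = []      # finished (script id, XML text) pairs
        self._current = None   # [script id, text chunks] while inside a matching script

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "script" and attributes.get("type") == ITS_STANDOFF_TYPE:
            self._current = [attributes.get("id"), []]

    def handle_data(self, data):
        # Script content is delivered as raw text, possibly in several chunks.
        if self._current is not None:
            self._current[1].append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._current is not None:
            self.scripts.append((self._current[0], "".join(self._current[1])))
            self._current = None


def check_standoff_ids(html):
    """Return (script id, xml:id of the standoff root, ids_match) per standoff block."""
    collector = StandoffCollector()
    collector.feed(html)
    collector.close()
    results = []
    for script_id, xml_text in collector.scripts:
        root = ET.fromstring(xml_text.strip())  # root is e.g. its:locQualityIssues
        xml_id = root.get(XML_ID)
        matches = script_id is not None and script_id == xml_id
        results.append((script_id, xml_id, matches))
    return results
```

Calling check_standoff_ids on an HTML string returns one tuple per standoff script element; a False in the third position marks a block whose id and xml:id differ. Since the reply says provenance follows the same rule, the sketch deliberately compares against whatever the root element of the script content is rather than hard-coding locQualityIssues.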
Received on Tuesday, 19 March 2013 16:13:52 UTC