Re: Standoff experiment plus observations from Felix Sasaki on 2013-03-19 (public-multilingualweb-lt@w3.org from March 2013)

From: Felix Sasaki <fsasaki@w3.org>
Date: Tue, 19 Mar 2013 17:13:20 +0100
To: Phil Ritchie <philr@vistatec.ie>
CC: "public-multilingualweb-lt@w3.org" <public-multilingualweb-lt@w3.org>
Message-ID: <51488EA0.700@w3.org>
Hi Phil,

On 19.03.13 16:56, Phil Ritchie wrote:
> Felix, All,
>
> A question: does the id of an enclosing <script /> element need to be 
> the same as the ITS element it encloses? e.g.

I think: yes, see
http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#lqissue-implementation

"In HTML the standoff markup MUST be stored inside a script element. It 
MUST have a type attribute with the value application/its+xml. Its id 
attribute MUST be set to the same value as the xml:id attribute of the 
locQualityIssues element it contains."

For provenance we have the same.
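
For anyone who wants to check this constraint mechanically, here is a minimal sketch in Python (standard library only). It is not taken from the spec or from any ITS tool: the names ItsScriptCollector and mismatched_ids are made up for illustration, and it assumes each application/its+xml script holds exactly one well-formed XML element whose root carries the xml:id.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Expanded name of xml:id as ElementTree reports it.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


class ItsScriptCollector(HTMLParser):
    """Hypothetical helper: collect (id, text) of ITS standoff script elements."""

    def __init__(self):
        super().__init__()
        self.scripts = []        # list of (script id, raw XML text)
        self._current_id = None
        self._in_its = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "application/its+xml":
            self._in_its = True
            self._current_id = attrs.get("id")
            self._buffer = []

    def handle_data(self, data):
        # Script content arrives as raw text, not as parsed tags.
        if self._in_its:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_its:
            self.scripts.append((self._current_id, "".join(self._buffer)))
            self._in_its = False


def mismatched_ids(html):
    """Return (script id, xml:id) pairs where the two values differ."""
    collector = ItsScriptCollector()
    collector.feed(html)
    collector.close()
    problems = []
    for script_id, xml_text in collector.scripts:
        root = ET.fromstring(xml_text.strip())
        xml_id = root.get(XML_ID)
        if script_id != xml_id:
            problems.append((script_id, xml_id))
    return problems
```

Calling mismatched_ids on the text of an HTML file returns an empty list when every standoff script follows the rule quoted above, and otherwise lists the offending id pairs.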

Best,

Felix

>
> <script type="application/its+xml" id="lq0">
>   <its:locQualityIssues xmlns:its="http://www.w3.org/2005/11/its"
>       xml:id="lq0">
>     <its:locQualityIssue locQualityIssueType="non-conformance"
>         locQualityIssueSeverity="75.7961783439491"></its:locQualityIssue>
>   </its:locQualityIssues>
> </script>
>
> I suspect not.
>
> That being the case, I'm not convinced that having the script-enclosed 
> metadata point to the spans saves a significant amount of serialized 
> footprint.
>
> Phil.
>
>
>
>
>
> From: Felix Sasaki <fsasaki@w3.org>
> To: "public-multilingualweb-lt@w3.org" 
> <public-multilingualweb-lt@w3.org>,
> Date: 25/02/2013 18:02
> Subject: Standoff experiment plus observations
> ------------------------------------------------------------------------
>
>
>
> Hi all,
>
> Christian, Marcis and Tadej know this (apologies for the repetition) - 
> but I thought others might be interested too.
>
> I played a bit with the NERD API:
> http://nerd.eurecom.fr/documentation#nerdapi
>
> 1) I generated ITS "tan" via 4 annotation engines that can be accessed 
> through the API: dbpedia spotlight, extractiv, lupedia, yahoo.
>
> 2 a) I also created a **non** ITS "tan" standoff version, see 
> multiple-ann-with-id-plus-standoff.html . It relies on ID attributes, 
> and the standoff annotations point to the IDs. This is the approach 
> that we had discussed a while ago on the mailing list.
> 2 b) The file multiple-ann-with-standoff-refs-script.html uses our 
> current localization quality issue and provenance standoff approach, 
> that is: pointing from the content to annotations, here via an 
> artificial x-t-ref attribute.
>
>
> From 2), I learned various things:
>
> - Making sure that standoff works requires a known workflow, "known" 
> esp. with regard to white space handling. Otherwise the multiple 
> annotation engines create multiple character offsets. So from this, 
> having a recommendation to leave standoff *processing* to NIF makes a 
> lot of sense.
>
> - The non ITS standoff *representation* (see 
> multiple-ann-with-id-plus-standoff.html) has the merit that a human 
> consumer who doesn't know anything about NIF et al. (= somebody in an 
> XML based localization workflow or looking at an HTML document) can 
> look into the annotations and choose: Hover over the green spans of 
> text, e.g. over "St Peter" as part of "held in St Peter's Basilica.". 
> The annotation from extractiv holds a more specific "its-class-ref" 
> than the one from dbpedia spotlight. But only dbpedia spotlight holds 
> an "its-ident-ref". So a human user consuming these annotations gets 
> the most value by combining them.
>
> - Developing applications based on the output of multiple engines is 
> pretty straightforward for non NLP / NIF people if you have the output 
> represented in an easy to digest format (JSON, XML, ...). I won't 
> argue for standardizing that format and creating ITS "tan" standoff 
> (we had that discussion). I'm mentioning this just because the merit 
> of the annotations in the long term might grow if Web developers face a 
> low barrier for widespread app development.
>
> - A thought I had during today's discussion of the XLIFF mapping: 
> having the external standoff pointing to IDs might be a way to solve 
> the XLIFF representation issue of "mrk": here the issue is again (it 
> seems) that you want to apply multiple annotations to the same span of 
> text (the content of "mrk") - but you can't since the "type" attribute 
> can be only used once. Externalizing the annotations solves that problem.
>
> - During the discussion of multiple annotations a while ago we also 
> touched upon the "direction" of the standoff: from outside to IDs (see 
> multiple-ann-with-id-plus-standoff.html and 2a) above), or from the 
> document to the standoff (current loc quality issue / provenance, see 
> 2b) above). Pointing from the document (= 2b) has the drawback in HTML 
> that you need a separate "script" element for each target - whereas in 
> the case of 2a) you only need one script element. So for 2a) in total 
> there are 58 elements, and 2b) has 101 elements.
>
> FYI: with the above observations I won't push for anything - just 
> sharing my experience to see what others think.
>
> Best,
>
> Felix [attachment "multiple-ann-with-id-plus-standoff.html" deleted by 
> Phil Ritchie/VISTATEC] [attachment 
> "multiple-ann-with-standoff-refs-script.html" deleted by Phil 
> Ritchie/VISTATEC]
>
>
>
Received on Tuesday, 19 March 2013 16:13:52 UTC