
Re: ACTION-233: Update quality issue example to use the solution (XML in "script" tag) for standoff markup

From: Phil Ritchie <philr@vistatec.ie>
Date: Tue, 2 Oct 2012 13:03:20 +0100
To: Felix Sasaki <fsasaki@w3.org>
Cc: public-multilingualweb-lt@w3.org
Message-ID: <OF1D2B611F.24089B05-ON80257A8B.0041D35C-80257A8B.00423983@vistatec.ie>
My approach relies on JavaScript manipulating the DOM and constructing 
the standoff markup, but from a quick hack it looks as though I can 
construct it within the script tag.

Phil.





From:   Felix Sasaki <fsasaki@w3.org>
To:     Phil Ritchie <philr@vistatec.ie>, 
Cc:     public-multilingualweb-lt@w3.org
Date:   02/10/2012 12:33
Subject:        Re: ACTION-233: Update quality issue example to use the 
solution (XML  in "script" tag) for standoff markup





2012/10/2 Phil Ritchie <philr@vistatec.ie>
OK, understood. Hmm. I think use of the script element will break my 
implementation.

Just to be sure - does your implementation rely on javascript processing 
with this standoff approach:

<span its-loc-quality-issue=its-loc-quality-issue 
its-loc-quality-issue-comment="Sentence without capitalization" 
its-loc-quality-issue-severity=30 
its-loc-quality-issue-type=typographical></span>

FYI, the change in the toy example 
http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/html5/qaissues.js
basically meant adding a call to the XML parser and using different names 
when getting attributes, e.g. its:locQualityIssueSeverity instead of 
its-loc-quality-issue-severity. See the diff here:

[
-    var qielem = document.getElementById(qiref.substr(1));
-    var issues = qielem.childNodes;
     var issueslist = new String;
-    for(i=0; i<issues.length; i++) {
-                if(issues[i].nodeType==1) { issueslist = issueslist +
- issues[i].getAttribute('its-loc-quality-issue-type') + " "; } }
+    var parser = new DOMParser();
+    var standoffits = document.getElementById('its-standoff-1').textContent;
+    var doc = parser.parseFromString(standoffits,'application/xml');
+    var locqualityissues = doc.getElementsByTagNameNS('http://www.w3.org/2005/11/its','locQualityIssues');
+    for(i=0; i<locqualityissues.length; i++)
+    {
+        // only read issues from the standoff block that qiref points to
+        if (locqualityissues[i].getAttribute('xml:id') == qiref.substr(1))
+        {
+            var issues = locqualityissues[i].childNodes;
+            // separate counter so the outer loop index is not overwritten
+            for(var j=0; j<issues.length; j++) {
+                if(issues[j].nodeType==1) { issueslist = issueslist +
+  issues[j].getAttribute('locQualityIssueType') + " "; } }
+        }
+    }
]


Felix

 
I'll have to check. 

Phil.





From:        Felix Sasaki <fsasaki@w3.org> 
To:        Phil Ritchie <philr@vistatec.ie>, 
Cc:        public-multilingualweb-lt@w3.org 
Date:        02/10/2012 11:04 
Subject:        Re: ACTION-233: Update quality issue example to use the 
solution (XML in "script" tag) for standoff markup 





2012/10/2 Phil Ritchie <philr@vistatec.ie> 
Felix 

Before I can answer the question can you tell me what the motivation for 
using the script tags is? 

There are two motivations. One is based on 
https://www.w3.org/International/multilingualweb/lt/wiki/LSP_Localization_Chain_Side_Use_Case_Demonstration 

Here you have ITS rules files inside HTML5. This seems to be a 
requirement from Linguaserve: the rules are not linked but placed inside 
the HTML5 document. So far Linguaserve has put the rules files "just 
somewhere", which makes the HTML5 invalid. With the rules in the "script" 
element, it becomes valid again. 
The other motivation is that the standoff we had so far for HTML5 looked 
like this: 

 <span its-loc-quality-issues-ref=#lq1>c'es</span> le contenu</p> 
                <span id=lq1 
its-loc-quality-issues=its-loc-quality-issues> 
                    <span 
                        its-loc-quality-issue=its-loc-quality-issue 
                        its-loc-quality-issue-comment="Sentence without capitalization" 
                        its-loc-quality-issue-severity=30 
                        its-loc-quality-issue-type=typographical></span> 
                    <span 
                        its-loc-quality-issue=its-loc-quality-issue 
                        its-loc-quality-issue-comment="'c'es' is unknown. Could be 'c'est'" 
                        its-loc-quality-issue-severity=50 
                        its-loc-quality-issue-type=misspelling></span> 
                </span> 


 "span" is mis-used to "transport" standoff metadata in the "body" 
element. It works, but is not very clean. Hence "script" which is defined 
for that purpose, see 
http://dev.w3.org/html5/spec/the-script-element.html  
about "application/xml" and other types: 
"These types are explicitly listed here because they are poorly-defined 
types that are nonetheless likely to be used as formats for data blocks, 
and it would be problematic if they were suddenly to be interpreted as 
script by a user agent." 
Jirka mentioned this solution on the afternoon of 26 September, see 
http://www.w3.org/2012/09/26-mlw-lt-minutes.html 
(search for "current recommendation is to put the tool info xml into script 
in html") 
and pointed us to the related DOM methods 
https://developer.mozilla.org/en-US/docs/DOM/DOMParser 

Felix 


My demo in Prague used standoff without needing to wrap them in script 
tags. 

Phil.





From:        Felix Sasaki <fsasaki@w3.org> 
To:        public-multilingualweb-lt@w3.org, 
Date:        02/10/2012 09:17 
Subject:        ACTION-233: Update quality issue example to use the 
solution (XML in  "script" tag) for standoff markup 




Hi all, 

I updated the qaissue example to use XML in the script element, see 
http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-locQualityIssue-html5-local-2 

the standoff metadata is now in a dedicated "script" element. See also 
http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/html5/EX-locQualityIssue-html5-local-2.html 

http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/html5/qaissues.js 


So this works, but I have a question for the implementors who use HTML5 as 
input for processing outside the browser. 
If you process 
http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/html5/EX-locQualityIssue-html5-local-2.html 

with the validator.nu HTML5 parser, the content of "script" is not "seen" 
as XML. The output then is 

<html xmlns="http://www.w3.org/1999/xhtml">... 
<script type="application/xml" id="its-standoff-1"> 
  &lt;its:locQualityIssues xml:id="lq1" xmlns:its="http://www.w3.org/2005/11/its"&gt; 
   &lt;its:locQualityIssue 
    locQualityIssueType="misspelling" 
    locQualityIssueComment="'c'es' is unknown. Could be 'c'est'" 
    locQualityIssueSeverity="50"/&gt; 
   &lt;its:locQualityIssue 
    locQualityIssueType="typographical" 
    locQualityIssueComment="Sentence without capitalization" 
    locQualityIssueSeverity="30"/&gt; 
  &lt;/its:locQualityIssues&gt; 
</script>...</html> 

So an XML-based tool that wants to pick up the ITS standoff information 
will not find it as markup: it only sees escaped text inside "script". 
Currently, Linguaserve is using this approach 
https://www.w3.org/International/multilingualweb/lt/wiki/LSP_Localization_Chain_Side_Use_Case_Demonstration 

to embed ITS rules into an HTML file. I had hoped that the "script" 
element would be an alternative - is it? 
I'm sure this is not a difficult problem, but we probably need some 
guidance for implementors who are not used to processing HTML5. 
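
As a possible starting point for such guidance, here is a minimal sketch of 
how a tool outside the browser might handle this: treat the content of 
"script" as plain text, then hand that text to an XML parser in a second 
step. It uses only the Python standard library. The names 
StandoffScriptCollector and issue_types are hypothetical, and the sketch 
assumes the standoff sits in a script element with type="application/xml", 
as in the example above. It is not taken from any existing implementation.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
# ElementTree exposes xml:id under the predefined XML namespace
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


class StandoffScriptCollector(HTMLParser):
    """Collects the raw text of every <script type="application/xml">."""

    def __init__(self):
        super().__init__()
        self._in_xml_script = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/xml":
            self._in_xml_script = True
            self._chunks = []

    def handle_data(self, data):
        # script content arrives as unparsed text, possibly in several chunks
        if self._in_xml_script:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_xml_script:
            self.blocks.append("".join(self._chunks))
            self._in_xml_script = False


def issue_types(html_text, ref):
    """Return the locQualityIssueType values of the block `ref` points to.

    `ref` is the value of its-loc-quality-issues-ref, e.g. "#lq1".
    """
    collector = StandoffScriptCollector()
    collector.feed(html_text)
    collector.close()
    wanted = ref.lstrip("#")
    for block in collector.blocks:
        # second step: parse the collected text as XML
        root = ET.fromstring(block.strip())
        # iter() also yields the root itself if it is its:locQualityIssues
        for issues in root.iter("{%s}locQualityIssues" % ITS_NS):
            if issues.get(XML_ID) == wanted:
                return [
                    issue.get("locQualityIssueType")
                    for issue in issues.findall("{%s}locQualityIssue" % ITS_NS)
                ]
    return []
```

Calling issue_types(html_text, "#lq1") on the example file should then give 
the issue types of the "lq1" block, and an empty list for a reference that 
matches no block. 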

Felix 

-- 
Felix Sasaki 
DFKI / W3C Fellow 

************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the sender immediately by e-mail. 
www.vistatec.com
************************************************************ 



Received on Tuesday, 2 October 2012 12:03:50 UTC
