Re: ACTION-233: Update quality issue example to use the solution (XML in "script" tag) for standoff markup

2012/10/2 Phil Ritchie <philr@vistatec.ie>

> Felix
>
> Before I can answer the question can you tell me what the motivation for
> using the script tags is?


There are two motivations. The first comes from
https://www.w3.org/International/multilingualweb/lt/wiki/LSP_Localization_Chain_Side_Use_Case_Demonstration
where the ITS rules are embedded inside the HTML5 file. This seems to be a
requirement from Linguaserve: the rules are not linked, but placed inside the
HTML5 itself. So far Linguaserve has put the rules "just somewhere" in the
file, which makes it invalid HTML5. With the rules inside a "script" element,
it becomes valid again.

The second motivation is that the standoff markup we had so far for HTML5
looked like this:

    <p><span its-loc-quality-issues-ref=#lq1>c'es</span> le contenu</p>
    <span id=lq1 its-loc-quality-issues=its-loc-quality-issues>
        <span
            its-loc-quality-issue=its-loc-quality-issue
            its-loc-quality-issue-comment="Sentence without capitalization"
            its-loc-quality-issue-severity=30
            its-loc-quality-issue-type=typographical></span>
        <span
            its-loc-quality-issue=its-loc-quality-issue
            its-loc-quality-issue-comment="'c'es' is unknown. Could be 'c'est'"
            its-loc-quality-issue-severity=50
            its-loc-quality-issue-type=misspelling></span>
    </span>


Here "span" is misused to "transport" standoff metadata in the "body" element.
It works, but it is not very clean. Hence "script", which is defined for that
purpose; see
http://dev.w3.org/html5/spec/the-script-element.html
which says about "application/xml" and other types:
"These types are explicitly listed here because they are poorly-defined
types that are nonetheless likely to be used as formats for data blocks,
and it would be problematic if they were suddenly to be interpreted as
script by a user agent."

Jirka mentioned this solution on the afternoon of 26 September, see
http://www.w3.org/2012/09/26-mlw-lt-minutes.html
(search for "current recommendation is to put the tool info xml into script
in html"),
and pointed us to the related DOM methods:
https://developer.mozilla.org/en-US/docs/DOM/DOMParser
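
For processing outside the browser, the same two-step idea (get the text
content of the "script" element, then hand it to an XML parser) could look
roughly like the sketch below. It is only an illustration: it assumes Python's
standard library rather than the validator.nu parser, and the names
StandoffExtractor, extract_issues and SAMPLE are made up for this example. The
sample document is modelled on the qaissue example, not copied from the draft.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"


class StandoffExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/xml"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # raw text of each data block found so far
        self._buffer = None   # list of text chunks while inside a data block

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/xml":
            self._buffer = []

    def handle_data(self, data):
        # The HTML parser reports script content as plain text, not as markup.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def extract_issues(html_text):
    """Return the attributes of each its:locQualityIssue found in data blocks."""
    extractor = StandoffExtractor()
    extractor.feed(html_text)
    extractor.close()
    issues = []
    for block in extractor.blocks:
        # Second step: parse the extracted text with a real XML parser.
        root = ET.fromstring(block.strip())
        for issue in root.iter("{%s}locQualityIssue" % ITS_NS):
            issues.append(dict(issue.attrib))
    return issues


# Hypothetical sample input, modelled on the example discussed in this thread.
SAMPLE = """<!DOCTYPE html>
<html><head><meta charset=utf-8><title>Test</title>
<script type="application/xml" id="its-standoff-1">
  <its:locQualityIssues xml:id="lq1" xmlns:its="http://www.w3.org/2005/11/its">
   <its:locQualityIssue
    locQualityIssueType="misspelling"
    locQualityIssueComment="'c'es' is unknown. Could be 'c'est'"
    locQualityIssueSeverity="50"/>
   <its:locQualityIssue
    locQualityIssueType="typographical"
    locQualityIssueComment="Sentence without capitalization"
    locQualityIssueSeverity="30"/>
  </its:locQualityIssues>
</script></head>
<body><p><span its-loc-quality-issues-ref=#lq1>c'es</span> le contenu</p>
</body></html>"""

if __name__ == "__main__":
    for issue in extract_issues(SAMPLE):
        print(issue)
```

The point of the sketch is that the HTML5 parser never needs to understand the
XML: it only delivers the "script" content as text, and a separate XML parse
recovers the ITS elements. An XML-based tool would need an equivalent
extraction step in front of it.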

Felix


> My demo in Prague used standoff without needing to wrap them in script tags.
>
> Phil.
>
>
>
>
>
> From:        Felix Sasaki <fsasaki@w3.org>
> To:        public-multilingualweb-lt@w3.org,
> Date:        02/10/2012 09:17
> Subject:        ACTION-233: Update quality issue example to use the
> solution (XML in  "script" tag) for standoff markup
> ------------------------------
>
>
>
> Hi all,
>
> I updated the qaissue example to use XML in the script element, see
> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#EX-locQualityIssue-html5-local-2
> the standoff metadata is now in a dedicated "script" element. See also
> http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/html5/EX-locQualityIssue-html5-local-2.html
> http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/html5/qaissues.js
>
> So this works, but I have a question for the implementers using HTML5 as
> input for processing outside the browser.
> If you process
> http://www.w3.org/International/multilingualweb/lt/drafts/its20/examples/html5/EX-locQualityIssue-html5-local-2.html
> with the validator.nu HTML5 parser (http://validator.nu/), the content
> of "script" is not "seen" as XML. The output then is
>
> <html xmlns="http://www.w3.org/1999/xhtml">...
> <script type="application/xml" id="its-standoff-1">
>   &lt;its:locQualityIssues xml:id="lq1" xmlns:its="http://www.w3.org/2005/11/its"&gt;
>    &lt;its:locQualityIssue
>     locQualityIssueType="misspelling"
>     locQualityIssueComment="'c'es' is unknown. Could be 'c'est'"
>     locQualityIssueSeverity="50"/&gt;
>    &lt;its:locQualityIssue
>     locQualityIssueType="typographical"
>     locQualityIssueComment="Sentence without capitalization"
>     locQualityIssueSeverity="30"/&gt;
>   &lt;/its:locQualityIssues&gt;
> </script>...</html>
>
> So if we had an XML-based tool that wants to pick up the ITS
> standoff information, it won't work.
> Currently, Linguaserve is using this approach
> https://www.w3.org/International/multilingualweb/lt/wiki/LSP_Localization_Chain_Side_Use_Case_Demonstration
> to embed ITS rules into an HTML file. I had hoped that the "script"
> element would be an alternative - is it?
> I'm sure this is not a difficult problem, but we probably need some
> guidance for implementers who are not used to processing HTML5.
>
> Felix
>
> --
> Felix Sasaki
> DFKI / W3C Fellow
>
>



-- 
Felix Sasaki
DFKI / W3C Fellow

Received on Tuesday, 2 October 2012 10:04:28 UTC