Re: ITS reset for inherited metadata

Hi Nathan,

On 08.08.13 02:27, Nathan Glenn wrote:
> Thanks for the terminology fix (I've been wondering how to say that 
> properly)! Thanks also for the pointer to Kosek's converters. They do 
> look useful, however they aren't suitable for my application. Kosek's 
> converters are for converting between XHTML and HTML5, but I need to 
> convert general XML to valid HTML (for the purpose of later displaying 
> ITS information with the document).

Jirka's converter is available for general XML as part of the DocBook 
stylesheets; see the documentation at
http://xmlguru.cz/2013/05/docbook-and-its2
and the relevant stylesheet at
http://snapshots.docbook.org/xsl-ns/html/its.xsl
So I think you can use it in an XSLT transformation to implement your 
use case without creating invalid HTML.

> This means creating legal HTML representations of illegal HTML that 
> has ITS info attached (e.g. turning attributes into elements and 
> sticking them in the parent for display). The output HTML has to be 
> processable by general ITS/HTML tools (or at least the one we're 
> making is supposed to be general), so I can't use a homemade standoff 
> markup. 

I think the requirement "The output HTML has to be processable by 
general ITS/HTML tools" is easier to accommodate by using a general XML 
converter than by changing the ITS 2.x spec itself. And again: your 
approach of "turning attributes into elements and sticking them in the 
parent for display" is just one tool-specific approach to "flattening" 
ITS information; there may be many others.


> I'm just documenting my solution as partial for now.
> I'll see about adding this feature suggestion to the wiki.

Thanks. When you do that, could you add a link to this thread
http://lists.w3.org/Archives/Public/public-i18n-its-ig/2013Aug/0001.html
?

Best,

Felix

>
> Nathan
>
>
>
> On Tue, Aug 6, 2013 at 11:53 PM, Felix Sasaki <fsasaki@w3.org> wrote:
>
>     Hi Nathan,
>
>     thank you very much for your comment. A general remark: if you or
>     others propose new features for ITS 2.x, I would suggest that you
>     add them to the interest group wiki
>     http://www.w3.org/International/its/wiki
>     In order to do so, you will need a W3C account; see
>     http://www.w3.org/Help/Account/
>
>     Some comments below.
>
>     On 06.08.13 20:51, Nathan Glenn wrote:
>>     Hello,
>>     In the course of the WICS project I have found need for a
>>     mechanism to undo metadata inheritance for a given element. I'd
>>     like to suggest it for the next version of ITS.
>>
>>     *Use case:* I am converting ITS-decorated XML into HTML while
>>     keeping ITS markup intact.
>
>     I think the sentence above describes something different from what
>     you are actually doing: what you describe below is not "keeping ITS
>     markup intact", but keeping the ITS *information* intact (= a
>     serialization of ITS information that has been created via local
>     markup / global rules / inheritance defaults, as in the test suite
>     https://github.com/finnle/ITS-2.0-Testsuite/ITS-2.0-Testsuite/its2.0/expected/idvalue/html/idvalue1htmloutput.txt
>     ).
>     "keeping ITS markup intact" during a conversion from XML to HTML
>     is relatively simple: convert local its:* attributes to its-*.
>     Jirka Kosek has created a set of stylesheets that realize
>     roundtripping HTML <> XML
>     https://github.com/kosek/html5-its-tools
>     Global rules need to be adapted manually. But this should not be a
>     big issue: normally you will not create one rule set per XML
>     document but rather per XML format (or per template in a given
>     format).
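As a rough illustration of the "relatively simple" local-attribute conversion, here is a minimal Python sketch using only the standard library. It is not Jirka Kosek's XSLT stylesheets. It assumes that its:translate and its:dir correspond to the native HTML translate and dir attributes, that xml:lang becomes lang, and that every other its:* attribute becomes its-* with its camelCase name split into lowercase, hyphen-separated words (its:locNote becomes its-loc-note). Element names and global rules are left untouched, and the function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Assumption: these ITS attributes correspond to native HTML attributes.
# (its:dir values lro/rlo would additionally need <bdo>; not handled here.)
NATIVE_HTML = {"translate": "translate", "dir": "dir"}


def html_attr_name(its_local_name):
    """Map an its:* local name to its HTML spelling, e.g. locNote -> its-loc-note."""
    if its_local_name in NATIVE_HTML:
        return NATIVE_HTML[its_local_name]
    # Put a hyphen before every non-initial capital letter, then lowercase.
    return "its-" + re.sub(r"(?<!^)([A-Z])", r"-\1", its_local_name).lower()


def convert_local_its(root):
    """Rename local its:* attributes and xml:lang on every element, in place."""
    prefix = "{%s}" % ITS_NS
    for elem in root.iter():
        for name in list(elem.attrib):  # snapshot, because attrib is modified below
            if name.startswith(prefix):
                value = elem.attrib.pop(name)
                elem.set(html_attr_name(name[len(prefix):]), value)
            elif name == XML_LANG:
                elem.set("lang", elem.attrib.pop(name))
    return root


doc = ET.fromstring(
    '<p xmlns:its="http://www.w3.org/2005/11/its" its:locNote="Check" '
    'its:translate="no" xml:lang="de">x</p>'
)
convert_local_its(doc)
print(sorted(doc.attrib))  # ['its-loc-note', 'lang', 'translate']
```

A real converter would also have to rename the XML elements themselves and adapt global rules, which, as noted above, is usually done once per format.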
>
>
>>     Any non-element nodes that are somehow involved with ITS markup
>>     (attributes selected via global rules, comments/PIs etc. pointed
>>     to with *Pointers) are being converted into elements and pasted
>>     near the nodes they represent.
>
>     In your case, you realize "near the nodes" as "converted into
>     child elements". But nobody forces you to store the ITS
>     information that way. You could also have a tool-specific
>     mechanism that converts the ITS markup in XML into serialized ITS
>     information in HTML, along these lines:
>
>     <html><head>...
>      <script type=application/xml>
>       <itsInfos>
>        <itsInfo idref="#p1">
>         <itsMetadata translate="yes" locNote="..." ...></itsMetadata>
>         <attribute name="id">
>          <itsMetadata idValue="p1" translate="no"/>
>         </attribute>
>        </itsInfo>
>       </itsInfos>
>      </script>
>     </head>
>     <body>
>     <p id="p1">...</p>
>     </body></html>
>
>
>     The "p" element has an "id" attribute. Inside "script" there
>     is an "itsInfo" element whose "idref" attribute points to that
>     id value. In this way the serialized ITS information is not inline
>     but "standoff". Note that this is a tool-specific standoff
>     mechanism, not ITS 2.0 standoff markup as used for the Provenance
>     and Localization Quality Issue data categories. The advantage
>     compared to your approach is that putting the serialized ITS
>     information into "script" keeps the file valid HTML.
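A minimal Python sketch of how a converter could emit the tool-specific standoff block from the example above. The element names (itsInfos, itsInfo, itsMetadata, attribute) are taken from that example; the input shape (a dict keyed by element id) and the function names are hypothetical. The ITS information is assumed to have been resolved already, for instance by an ITS 2.0 processor run on the original XML, and the payload is assumed never to contain the text "</script".

```python
import xml.etree.ElementTree as ET


def build_its_infos(records):
    """Serialize already-resolved ITS information as an <itsInfos> document.

    `records` (hypothetical shape) maps an element id to a pair:
    (metadata of the element, {attribute name: metadata of that attribute}).
    """
    infos = ET.Element("itsInfos")
    for elem_id, (elem_meta, attr_meta) in records.items():
        info = ET.SubElement(infos, "itsInfo", idref="#" + elem_id)
        ET.SubElement(info, "itsMetadata", elem_meta)
        for attr_name, meta in attr_meta.items():
            attr = ET.SubElement(info, "attribute", name=attr_name)
            ET.SubElement(attr, "itsMetadata", meta)
    return ET.tostring(infos, encoding="unicode")


def wrap_in_script(payload):
    """Put the payload in a script data block so the surrounding HTML stays valid."""
    return '<script type="application/xml">' + payload + "</script>"


block = wrap_in_script(build_its_infos({
    "p1": ({"translate": "yes", "locNote": "..."},
           {"id": {"idValue": "p1", "translate": "no"}}),
}))
print(block)
```

Because the body markup is left as it is, general HTML tools and validators never see the extra elements; only a tool that knows this private format reads the script content.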
>
>
>>     In the original document, comments/PIs and usually attributes
>>     don't inherit ITS metadata. However, as elements these nodes /do/
>>     inherit ITS metadata. So in order to make the ITS metadata in the
>>     new document completely match the original, I have to undo this
>>     inheritance. This is possible for categories with default values
>>     (I can set dir="ltr", or localeFilter="*") but not for ones that
>>     don't. I could set locNote="", but that is not the same as the
>>     default (which is the non-existence of locNote). The affected
>>     categories are: localization note, language information, domain,
>>     provenance, localization quality issue, localization quality
>>     rating, MT confidence, and allowed characters.
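To make the inheritance problem concrete, here is a deliberately simplified Python model (standard library only). It resolves only Localization Note, from local its:locNote attributes on the element and its ancestors, and ignores global rules. attribute_to_child is a hypothetical stand-in for the attribute-to-element conversion described above. In the original document the title attribute carries no localization note; once it becomes a child element it picks up the parent's note, and setting locNote="" yields an empty note rather than no note at all.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
LOC_NOTE = "{%s}locNote" % ITS_NS


def inherited_loc_note(root, target):
    """Return the localization note that applies to `target`, or None.

    Simplified model: a local its:locNote applies to its element and is
    inherited by descendant elements (never by attributes). Global rules
    are ignored.
    """
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        if LOC_NOTE in node.attrib:
            return node.attrib[LOC_NOTE]
        node = parents.get(node)
    return None


def attribute_to_child(elem, attr_name):
    """Hypothetical conversion: move an attribute into a new child element."""
    child = ET.SubElement(elem, "attribute", name=attr_name)
    child.text = elem.attrib.pop(attr_name)
    return child


doc = ET.fromstring(
    '<doc xmlns:its="%s"><p its:locNote="Check this" title="Hello">x</p></doc>'
    % ITS_NS
)
p = doc.find("p")
converted = attribute_to_child(p, "title")
print(inherited_loc_note(doc, converted))  # prints: Check this
converted.set(LOC_NOTE, "")
print(repr(inherited_loc_note(doc, converted)))  # prints: ''
```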
>>
>>     *Desired ITS markup:* The markup that I really need to make this
>>     work is a selective metadata reset. In other words, I need to be
>>     able to say "don't inherit the following categories". Something
>>     like its:reset="mtConfidence;locNote;language-information". It
>>     would also be useful to have a shorthand to prevent inheritance
>>     of anything: its:reset="all". A global version would be nice, too.
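A sketch of how a processor might apply the proposed reset when computing an element's metadata. its:reset does not exist in ITS 2.0; the semicolon-separated syntax and the "all" keyword follow the suggestion above, and category names are treated as opaque strings. Two further points are assumptions made here: metadata is modelled as a flat dict per element, and values set locally on the same element still apply after the reset.

```python
from typing import Dict, Optional, Set


def parse_reset(value: str) -> Set[str]:
    """Split a (hypothetical) its:reset value, ignoring whitespace and empty items."""
    return {token.strip() for token in value.split(";") if token.strip()}


def resolve_metadata(inherited: Dict[str, str],
                     local: Dict[str, str],
                     reset: Optional[str]) -> Dict[str, str]:
    """Compute an element's metadata under the proposed reset rule.

    Categories listed in `reset` (or every category for "all") are not
    inherited from the parent; values set locally on the element still apply.
    """
    blocked = parse_reset(reset) if reset else set()
    if "all" in blocked:
        kept: Dict[str, str] = {}
    else:
        kept = {cat: val for cat, val in inherited.items() if cat not in blocked}
    kept.update(local)
    return kept


parent_meta = {"locNote": "Check this", "mtConfidence": "0.8", "translate": "no"}
print(resolve_metadata(parent_meta, {}, "mtConfidence;locNote"))
# prints {'translate': 'no'}
```

A global variant could select nodes with an XPath expression and feed the same value into resolve_metadata; that part is left out here.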
>
>
>     I think you can implement your use case in two ways:
>
>     1) When you convert to HTML, do not convert the ITS information
>     (i.e. your processing instructions or comments) into HTML, but
>     rather convert the ITS markup (see above). An ITS 2.0 processor that
>     understands HTML will then process your files without any issues.
>
>     2) If 1) doesn't work for you and you need to serialize the ITS
>     information created in XML within the HTML file, choose an
>     approach that does not require direct embedding of the ITS
>     information inside HTML. Embedding the information via elements,
>     by the way, might break not only ITS inheritance (as you describe)
>     but also other tools that check HTML (validators) or want to
>     process it (inside or outside the browser).
>
>     Best,
>
>     Felix
>
>

Received on Thursday, 8 August 2013 07:17:22 UTC