Re: ITS reset for inherited metadata

Thanks Felix. Will do!


On Thu, Aug 8, 2013 at 12:16 AM, Felix Sasaki <fsasaki@w3.org> wrote:

>  Hi Nathan,
>
> On 08.08.13 02:27, Nathan Glenn wrote:
>
> Thanks for the terminology fix (I've been wondering how to say that
> properly)! Thanks also for the pointer to Kosek's converters. They do look
> useful, however they aren't suitable for my application. Kosek's converters
> are for converting between XHTML and HTML5, but I need to convert general
> XML to valid HTML (for the purpose of later displaying ITS information with
> the document).
>
>
> Jirka's converter is available for general XML as part of the docbook
> stylesheets, see documentation at
> http://xmlguru.cz/2013/05/docbook-and-its2
> and the relevant stylesheet at
> http://snapshots.docbook.org/xsl-ns/html/its.xsl
> So I think you can use that in an XSLT transformation to implement your
> use case - without creating invalid HTML.
>
>
>  This means creating legal HTML representations of illegal HTML that has
> ITS info attached (e.g. turning attributes into elements and sticking them
> in the parent for display). The output HTML has to be processable by
> general ITS/HTML tools (or at least the one we're making is supposed to be
> general), so I can't use a homemade standoff markup.
>
>
> I think the requirement "The output HTML has to be processable by general
> ITS/HTML tools" is easier to accommodate by using a general XML converter
> than by changing the ITS 2.x spec itself. And again: your approach of
> "turning attributes into elements and sticking them in the parent for
> display" is just one tool-specific approach to "flatten" ITS information;
> there may be many others.
>
>
>
> I'm just documenting my solution as partial for now.
> I'll see about adding this feature suggestion to the wiki.
>
>
> Thanks. When you do that, could you add a link to this thread
> http://lists.w3.org/Archives/Public/public-i18n-its-ig/2013Aug/0001.html
> ?
>
> Best,
>
> Felix
>
>
>
>  Nathan
>
>
>
> On Tue, Aug 6, 2013 at 11:53 PM, Felix Sasaki <fsasaki@w3.org> wrote:
>
>>  Hi Nathan,
>>
>> thank you very much for your comment. A general remark: if you or others
>> propose new features for ITS 2.x, I would suggest that you add them to the
>> interest group wiki
>> http://www.w3.org/International/its/wiki
>> in order to do so you will need a w3c account, see
>> http://www.w3.org/Help/Account/
>>
>> Some comments below.
>>
>> On 06.08.13 20:51, Nathan Glenn wrote:
>>
>> Hello,
>> In the course of the WICS project I have found need for a mechanism to
>> undo metadata inheritance for a given element. I'd like to suggest it for
>> the next version of ITS.
>>
>>  *Use case:* I am converting ITS-decorated XML into HTML while keeping
>> ITS markup intact.
>>
>>
>>  I think the above sentence describes something other than what you are
>> actually doing: what you describe below is not "keeping ITS markup
>> intact", but keeping the ITS *information* intact (= a serialization of
>> ITS information that has been created via local markup / global rules /
>> inheritance defaults, as e.g. in the test suite
>>
>> https://github.com/finnle/ITS-2.0-Testsuite/ITS-2.0-Testsuite/its2.0/expected/idvalue/html/idvalue1htmloutput.txt
>> ).
>> "keeping ITS markup intact" during a conversion from XML to HTML is
>> relatively simple: convert local its:* attributes to its-*. Jirka Kosek has
>> created a set of stylesheets that realize roundtripping HTML <> XML
>> https://github.com/kosek/html5-its-tools
>> Global rules need to be adapted manually. But this should not be a big
>> issue: normally you will not create one rule set per XML document but
>> rather per XML format (or per template in a given format).
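
A minimal sketch of the local attribute conversion mentioned above (its:* in
XML to its-* in HTML), assuming the usual ITS 2.0 naming rule: prefix with
"its-" and replace each uppercase letter with a hyphen plus its lowercase
form. The function names and the small NATIVE_HTML set are illustrative
assumptions, not part of any existing tool, and the set only covers translate
and dir, which HTML expresses with its own attributes.

```python
import re

# Namespace URI of ITS attributes in XML documents.
ITS_NS = "http://www.w3.org/2005/11/its"

# Assumption for this sketch: data categories that HTML expresses with its
# own native attribute keep the bare name instead of getting an its- prefix.
NATIVE_HTML = {"translate", "dir"}


def its_local_name_to_html(local_name: str) -> str:
    """Map the local name of an XML its:* attribute to its HTML form."""
    if local_name in NATIVE_HTML:
        return local_name
    # locNote -> loc-note, localeFilterList -> locale-filter-list
    hyphenated = re.sub(
        r"[A-Z]", lambda m: "-" + m.group(0).lower(), local_name
    )
    return "its-" + hyphenated


def convert_attributes(attrs: dict) -> dict:
    """Convert a dict of attributes keyed in Clark notation ({ns}local).

    ITS attributes are renamed; every other attribute is copied unchanged.
    """
    prefix = "{" + ITS_NS + "}"
    converted = {}
    for name, value in attrs.items():
        if name.startswith(prefix):
            converted[its_local_name_to_html(name[len(prefix):])] = value
        else:
            converted[name] = value
    return converted
```

This only covers local markup. Global rules still need to be adapted by hand,
as noted above.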
>>
>>
>>  Any non-element nodes that are somehow involved with ITS markup
>> (attributes selected via global rules, comments/PIs etc. pointed to with
>> *Pointers) are being converted into elements and pasted near the nodes they
>> represent.
>>
>>
>>  In your case, you realize "near the nodes" as "converted into child
>> elements". But nobody forces you to store the ITS information that way. You
>> could also have a tool-specific mechanism that converts the ITS markup in
>> the XML into ITS information stored in the HTML, something like this:
>>
>> <html>...
>>  <script type=application/xml>
>>   <itsInfos>
>>    <itsInfo idref="#p1">
>>     <itsMetadata translate="yes" locNote="..:" ...></itsMetadata>
>>     <attribute name="id">
>>      <itsMetadata idValue="p1" translate="no"/>
>>     </attribute>
>>    </itsInfo>
>>   </itsInfos>
>>  </script>
>> </head>
>> <body>
>> <p id="p1">...</p>
>> </body></html>
>>
>>
>> The "p" element contains an "id" attribute. Inside "script" there is an
>> "itsInfo" element. Its "idref" attribute points to that id value. In this
>> way the serialized ITS information is not inline, but "standoff". Note that
>> this is a tool-specific standoff mechanism, not ITS 2.0 standoff markup as
>> used for provenance or localization quality issue. The advantage compared
>> to your approach is that putting the serialized ITS information into
>> "script" keeps the file valid HTML.
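
A rough sketch of how such a tool-specific standoff block could be generated.
The element and attribute names (itsInfos, itsInfo, itsMetadata, idref) are
taken from the example above; the function names and the shape of the input
mapping are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET


def build_standoff(records: dict) -> str:
    """Serialize ITS information as a standoff <itsInfos> document.

    `records` maps the id of an HTML element to a dict of ITS metadata
    (attribute name -> value) that was resolved for that element.
    """
    root = ET.Element("itsInfos")
    for element_id, metadata in records.items():
        # idref points back to the HTML element carrying that id.
        info = ET.SubElement(root, "itsInfo", {"idref": "#" + element_id})
        ET.SubElement(info, "itsMetadata", dict(metadata))
    return ET.tostring(root, encoding="unicode")


def wrap_in_script(xml_text: str) -> str:
    """Wrap the standoff XML in a script element for embedding in HTML.

    Simplification: this does not guard against "</script" appearing
    inside the serialized metadata values.
    """
    return '<script type="application/xml">' + xml_text + "</script>"
```

For example, wrap_in_script(build_standoff({"p1": {"translate": "yes"}}))
yields a script element that can be placed in the head of the page, while the
"p" element itself stays untouched.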
>>
>>
>>  In the original document, comments/PIs and usually attributes don't
>> inherit ITS metadata. However, as elements these nodes *do* inherit ITS
>> metadata. So in order to make the ITS metadata in the new document
>> completely match the original, I have to undo this inheritance. This is
>> possible for categories with default values (I can set dir="ltr", or
>> localeFilter="*") but not for ones that don't. I could set locNote="", but
>> that is not the same as the default (which is the non-existence of
>> locNote). The affected categories are: localization note, language
>> information, domain, provenance, localization quality issue, localization
>> quality rating, MT confidence, and allowed characters.
>>
>>  *Desired ITS markup:* The markup that I really need to make this work
>> is a selective metadata reset. In other words, I need to be able to say
>> "don't inherit the following categories". Something like
>> its:reset="mtConfidence;locNote;language-information". It would also be
>> useful to have a shorthand to prevent inheritance of anything:
>> its:reset="all". A global version would be nice, too.
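
To make the proposal concrete, here is a small sketch of the semantics such a
reset could have. Note that its:reset is only the suggestion made above and is
not part of ITS 2.0; the function name, the semicolon-separated syntax, and
the category identifiers used as dictionary keys are all hypothetical.

```python
def apply_reset(inherited: dict, reset_value: str) -> dict:
    """Return the inherited ITS metadata left after a hypothetical reset.

    `inherited` maps data category names to the values an element would
    normally inherit. `reset_value` is the content of the proposed
    its:reset attribute: either "all" or a semicolon-separated list of
    category names whose inheritance should be undone.
    """
    if reset_value.strip() == "all":
        # Shorthand: nothing is inherited at all.
        return {}
    dropped = {name.strip() for name in reset_value.split(";") if name.strip()}
    # Dropping the key (rather than setting it to "") models the
    # "non-existence" default described above.
    return {
        category: value
        for category, value in inherited.items()
        if category not in dropped
    }
```

With inherited = {"locNote": "x", "domain": "law"}, a value of "locNote"
leaves only the domain entry, which is different from setting locNote="".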
>>
>>
>>
>>  I think you can implement your use case in two ways:
>>
>> 1) When you convert to HTML, do not convert the ITS information (i.e.
>> your processing instructions or comments) into HTML, but rather convert the
>> ITS markup (see above). An ITS 2 processor that understands HTML will then
>> process your files without any issues.
>>
>> 2) If 1) doesn't work for you and you need to serialize the ITS
>> information created in XML within the HTML file, choose an approach that
>> does not require direct embedding of the ITS information inside HTML.
>> Embedding the information via elements, btw., might not only break ITS
>> inheritance (as you describe), but also other tools that check HTML
>> (validators) or want to process it (inside or outside the browsers).
>>
>> Best,
>>
>> Felix
>>
>
>
>

Received on Thursday, 8 August 2013 17:52:56 UTC