- From: Nathan Glenn <garfieldnate@gmail.com>
- Date: Thu, 8 Aug 2013 10:52:28 -0700
- To: Felix Sasaki <fsasaki@w3.org>
- Cc: public-i18n-its-ig@w3.org
- Message-ID: <CACs83pgEBnjpzN71u75ioVZ9xwD_B2-OZ7G9Rv=DUp3crFbpqg@mail.gmail.com>
Thanks Felix. Will do!

On Thu, Aug 8, 2013 at 12:16 AM, Felix Sasaki <fsasaki@w3.org> wrote:
> Hi Nathan,
>
> On 08.08.13 02:27, Nathan Glenn wrote:
>> Thanks for the terminology fix (I've been wondering how to say that
>> properly)! Thanks also for the pointer to Kosek's converters. They do look
>> useful; however, they aren't suitable for my application. Kosek's converters
>> are for converting between XHTML and HTML5, but I need to convert general
>> XML to valid HTML (for the purpose of later displaying ITS information with
>> the document).
>
> Jirka's converter is available for general XML as part of the DocBook
> stylesheets; see the documentation at
> http://xmlguru.cz/2013/05/docbook-and-its2
> and the relevant stylesheet at
> http://snapshots.docbook.org/xsl-ns/html/its.xsl
> So I think you can use that in an XSLT transformation to implement your
> use case without creating invalid HTML.
>
>> This means creating legal HTML representations of illegal HTML that has
>> ITS info attached (e.g. turning attributes into elements and sticking them
>> in the parent for display). The output HTML has to be processable by
>> general ITS/HTML tools (or at least the one we're making is supposed to be
>> general), so I can't use a homemade standoff markup.
>
> I think the requirement "The output HTML has to be processable by general
> ITS/HTML tools" is easier to accommodate by using a general XML converter
> than by changing the ITS 2.x spec itself. And again: your approach of
> "turning attributes into elements and sticking them in the parent for
> display" is just one tool-specific approach to "flatten" ITS information;
> there may be many others.
>
>> I'm just documenting my solution as partial for now.
>> I'll see about adding this feature suggestion to the wiki.
>
> Thanks. When you do that, could you add a link to this thread?
> http://lists.w3.org/Archives/Public/public-i18n-its-ig/2013Aug/0001.html
>
> Best,
>
> Felix
>
>> Nathan
>>
>> On Tue, Aug 6, 2013 at 11:53 PM, Felix Sasaki <fsasaki@w3.org> wrote:
>>> Hi Nathan,
>>>
>>> thank you very much for your comment. A general remark: if you or others
>>> propose new features for ITS 2.x, I would suggest that you add them to the
>>> interest group wiki
>>> http://www.w3.org/International/its/wiki
>>> In order to do so you will need a W3C account, see
>>> http://www.w3.org/Help/Account/
>>>
>>> Some comments below.
>>>
>>> On 06.08.13 20:51, Nathan Glenn wrote:
>>>> Hello,
>>>> In the course of the WICS project I have found the need for a mechanism to
>>>> undo metadata inheritance for a given element. I'd like to suggest it for
>>>> the next version of ITS.
>>>>
>>>> *Use case:* I am converting ITS-decorated XML into HTML while keeping
>>>> ITS markup intact.
>>>
>>> I think the above sentence describes something other than what you are
>>> actually doing: what you describe below is not "keeping ITS markup intact",
>>> but keeping the ITS *information* intact (= a serialization of ITS
>>> information that has been created via local markup / global rules /
>>> inheritance defaults, as e.g. in the test suite
>>> https://github.com/finnle/ITS-2.0-Testsuite/ITS-2.0-Testsuite/its2.0/expected/idvalue/html/idvalue1htmloutput.txt
>>> ).
>>> "Keeping ITS markup intact" during a conversion from XML to HTML is
>>> relatively simple: convert local its:* attributes to its-*. Jirka Kosek has
>>> created a set of stylesheets that realize round-tripping HTML <> XML:
>>> https://github.com/kosek/html5-its-tools
>>> Global rules need to be adapted manually. But this should not be a big
>>> issue: normally you will not create one rule set per XML document but
>>> rather per XML format (or per template in a given format).
>>>
>>>> Any non-element nodes that are somehow involved with ITS markup
>>>> (attributes selected via global rules, comments/PIs etc. pointed to with
>>>> *Pointers) are being converted into elements and pasted near the nodes they
>>>> represent.
>>>
>>> In your case, you realize "near the nodes" as "converted into child
>>> elements". But nobody forces you to store the ITS information that way. You
>>> could also have a tool-specific mechanism that converts the ITS markup in
>>> XML into ITS information in HTML, something like this:
>>>
>>> <html>...
>>>   <script type=application/xml>
>>>     <itsInfos>
>>>       <itsInfo idref="#p1">
>>>         <itsMetadata translate="yes" locNote="..." ...></itsMetadata>
>>>         <attribute name="id">
>>>           <itsMetadata idValue="p1" translate="no"/>
>>>         </attribute>
>>>       </itsInfo>
>>>     </itsInfos>
>>>   </script>
>>> </head>
>>> <body>
>>>   <p id="p1">...</p>
>>> </body></html>
>>>
>>> The "p" element contains an "id" attribute. Inside "script" there is an
>>> "itsInfo" element. Its "idref" attribute points to the id value. In this way
>>> the serialized ITS information is not inline, but "standoff". Note that
>>> this is a tool-specific standoff mechanism, not ITS 2.0 standoff like for
>>> provenance or localization quality issue. The advantage compared to your
>>> approach is that putting the serialized ITS information into "script" keeps
>>> the file valid HTML.
>>>
>>>> In the original document, comments/PIs and usually attributes don't
>>>> inherit ITS metadata. However, as elements these nodes *do* inherit ITS
>>>> metadata. So in order to make the ITS metadata in the new document
>>>> completely match the original, I have to undo this inheritance. This is
>>>> possible for categories with default values (I can set dir="ltr", or
>>>> localeFilter="*") but not for ones that don't have one. I could set
>>>> locNote="", but that is not the same as the default (which is the
>>>> non-existence of locNote). The affected categories are: localization note,
>>>> language information, domain, provenance, localization quality issue,
>>>> localization quality rating, MT confidence, and allowed characters.
>>>>
>>>> *Desired ITS markup:* The markup that I really need to make this work
>>>> is a selective metadata reset.
>>>> In other words, I need to be able to say
>>>> "don't inherit the following categories". Something like
>>>> its:reset="mtConfidence;locNote;language-information". It would also be
>>>> useful to have a shorthand to prevent inheritance of anything:
>>>> its:reset="all". A global version would be nice, too.
>>>
>>> I think you can implement your use case in two ways:
>>>
>>> 1) When you convert to HTML, do not convert the ITS information (i.e.
>>> your processing instructions or comments) into HTML, but rather convert the
>>> ITS markup (see above). An ITS 2 processor that understands HTML will then
>>> process your files without any issues.
>>>
>>> 2) If 1) doesn't work for you and you need to serialize the ITS
>>> information created in XML within the HTML file, choose an approach that
>>> does not require direct embedding of the ITS information inside HTML.
>>> Embedding the information via elements, btw., might not only break ITS
>>> inheritance (as you describe), but also other tools that check HTML
>>> (validators) or want to process it (inside or outside the browser).
>>>
>>> Best,
>>>
>>> Felix
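Felix's point that keeping ITS *markup* intact is mostly a matter of renaming local its:* attributes to its-* can be sketched in a few lines of Python. This is a minimal illustration, not Jirka Kosek's stylesheets. It assumes the ITS 2.0 HTML convention of an `its-` prefix with camelCase names hyphenated (locNote becomes its-loc-note), assumes that translate maps to the native HTML `translate` attribute, and does not touch global rules, which still need manual adaptation. The names `NATIVE_HTML`, `to_html_name` and `convert_local_its` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Assumption: ITS local attributes that map onto a native HTML attribute
# instead of an its-* name. Illustrative only, not an exhaustive mapping.
NATIVE_HTML = {"translate": "translate"}


def to_html_name(local_name: str) -> str:
    """Turn an ITS attribute name such as 'locNote' into 'its-loc-note'."""
    if local_name in NATIVE_HTML:
        return NATIVE_HTML[local_name]
    # ITS attribute names start lowercase, so every capital starts a new word.
    return "its-" + re.sub(r"([A-Z])", r"-\1", local_name).lower()


def convert_local_its(root: ET.Element) -> ET.Element:
    """Rename every its:* attribute in the tree in place; return the root."""
    prefix = "{" + ITS_NS + "}"
    for elem in root.iter():
        # Collect names first so the attribute dict is not mutated mid-iteration.
        for name in [n for n in elem.attrib if n.startswith(prefix)]:
            value = elem.attrib.pop(name)
            elem.set(to_html_name(name[len(prefix):]), value)
    return root
```

For example, `<p its:locNote="Check this" its:translate="no">` comes out as `<p its-loc-note="Check this" translate="no">`. Because only attributes are renamed, inheritance in the HTML output works exactly as for any other ITS-aware HTML, which is the advantage of option 1) in Felix's reply.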
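The tool-specific standoff approach Felix describes can be sketched the same way: per-id ITS information is serialized into a `<script type="application/xml">` data block, so the HTML body stays valid and free of extra elements. The element names (itsInfos, itsInfo, itsMetadata, attribute) are taken from his example and are not defined by ITS 2.0; the function `build_standoff` and its input shape (element id mapped to element metadata plus per-attribute metadata) are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_standoff(info: dict) -> str:
    """Serialize per-id ITS information into a standoff <script> data block.

    info maps an element id to (element_metadata, {attr_name: attr_metadata}),
    where each metadata value is a dict of attribute name -> string value.
    """
    infos = ET.Element("itsInfos")
    for elem_id, (elem_meta, attr_meta) in info.items():
        # idref points back at the HTML element that carries id=elem_id.
        item = ET.SubElement(infos, "itsInfo", idref="#" + elem_id)
        ET.SubElement(item, "itsMetadata", elem_meta)
        for attr_name, meta in attr_meta.items():
            attr = ET.SubElement(item, "attribute", name=attr_name)
            ET.SubElement(attr, "itsMetadata", meta)
    payload = ET.tostring(infos, encoding="unicode")
    return '<script type="application/xml">' + payload + "</script>"
```

A tool reading the page back would parse the script content as XML and resolve each `idref` against the element ids. One caveat of this design: the payload must never contain the literal text `</script>`, or the HTML parser will end the data block early.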
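Nathan's inheritance problem, and the reset he proposes, can be illustrated with a toy model. Note that its:reset is only a proposal from this thread, not an ITS 2.0 attribute. The semantics assumed here are: drop the listed inherited categories (or all of them for "all"), then apply the node's local values. The `Node` class and `resolve` function are hypothetical, and category names are treated as opaque keys. Real ITS inheritance rules vary per data category and are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    local: dict = field(default_factory=dict)   # ITS values set on this node
    reset: str = ""                             # hypothetical its:reset value
    children: list = field(default_factory=list)


def resolve(node: Node, inherited: dict, out: dict) -> dict:
    """Record each node's effective metadata in out, keyed by node name."""
    tokens = [t for t in node.reset.split(";") if t]
    if "all" in tokens:
        effective = {}
    else:
        # Copy the inherited values, minus any category named in the reset.
        effective = {k: v for k, v in inherited.items() if k not in tokens}
    effective.update(node.local)  # local markup wins over inheritance
    out[node.name] = effective
    for child in node.children:
        resolve(child, effective, out)
    return out


# An attribute converted into a child element picks up the parent's locNote...
leaky = Node("p", {"locNote": "Greeting"}, children=[Node("title-attr")])
print(resolve(leaky, {}, {})["title-attr"])
# ...unless the proposed reset drops it again.
fixed = Node("p", {"locNote": "Greeting"},
             children=[Node("title-attr", reset="locNote")])
print(resolve(fixed, {}, {})["title-attr"])
```

This also shows why setting locNote="" is not a substitute: an empty value is still a value that descendants inherit, whereas the reset removes the key entirely, matching the original "no locNote" state.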
Received on Thursday, 8 August 2013 17:52:56 UTC