- From: Robin Berjon <robin@w3.org>
- Date: Wed, 10 Jul 2013 15:36:41 +0200
- To: Daniel Glazman <daniel.glazman@disruptive-innovations.com>
- CC: HTML WG <public-html@w3.org>, Richard Ishida <ishida@w3.org>, Philippe Le Hegaret <plh@w3.org>, "public-multilingualweb-lt-comments@w3.org" <public-multilingualweb-lt-comments@w3.org>
On 10/07/2013 14:59 , Daniel Glazman wrote:
> The issue we are hitting is related to the parsing of such a document.
> In the html serialization of html5, the DOM will show a text
> node inside the script element, that node containing the textual
> representation of the whole contents of the script element; on another
> hand, the parsing of the xml serialization of the same document will
> generate a script element containing a its-namespaced subtree...
>
> I see this as problematic for two reasons:
>
> 1. I don't think the OM should change depending on the serialization
>    used
> 2. this has an impact on implementations forced to use html-flavor
>    switches for creation/edition/manipulation/serialization of inline
>    ITS rules....

That ship has sailed. There is code relying on the content of scripts
being text (that requires parsing) in HTML.

The only way of aligning the two that *might* work would be to require
XHTML processors to keep markup inside <script> as text in the DOM. I'm
not sure anyone wants to go there.

What you're actually looking for is XML data islands. It's something
that IE supports (supported?) using an <xml> element inside of which it
switches to XML parsing. I don't believe that there's overwhelming
interest in supporting that.

> We would like to have your opinion on the above. Do you think the OM
> for both html and xml serialization of a html5 document containing
> inline ITS 2.0 rules should be the same or you don't see it as an
> issue?

I won't dispute that it's unpleasant; but the alternatives are worse.

One possible way of aligning everything would be to have a JSON
serialisation for ITS. Given the language, it might not be all that hard.

    <script type='application/its+json'>
    {
      "namespaces": { "tei": "http://blah/tei" },
      "rules": [
        { "selector": "//tei:term", "translate": "yes" }
      ]
    }
    </script>

That pretty much just works, and you can define the ITS-JSON spec as a
simple mapping from JSON to XML.
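
To make the "simple mapping from JSON to XML" concrete, here is a minimal Python sketch. It is an illustration under assumptions, not a proposed spec: the function name `its_json_to_xml`, the `RULE_ELEMENTS` table, and the convention that every key other than "selector" names one ITS rule element are all hypothetical. Only the ITS namespace URI and the `its:rules` / `its:translateRule` names come from ITS 2.0 itself.

```python
import json
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"  # ITS namespace URI
ET.register_namespace("its", ITS_NS)      # serialise with the usual "its" prefix

# Hypothetical table: JSON key -> local name of the ITS rule element.
# Only the data category used in the example above is covered.
RULE_ELEMENTS = {"translate": "translateRule"}


def its_json_to_xml(text):
    """Map the (hypothetical) ITS-JSON structure to an its:rules string."""
    data = json.loads(text)
    root = ET.Element(f"{{{ITS_NS}}}rules", {"version": "2.0"})

    # The prefixes are only used inside XPath selectors, so ElementTree
    # would never declare them on its own; write the declarations as
    # plain attributes so the selectors stay resolvable.
    for prefix, uri in data.get("namespaces", {}).items():
        root.set(f"xmlns:{prefix}", uri)

    for rule in data.get("rules", []):
        selector = rule["selector"]
        for key, value in rule.items():
            if key == "selector":
                continue
            local_name = RULE_ELEMENTS.get(key)
            if local_name is None:
                raise ValueError(f"no mapping defined for {key!r}")
            ET.SubElement(
                root,
                f"{{{ITS_NS}}}{local_name}",
                {"selector": selector, key: value},
            )

    return ET.tostring(root, encoding="unicode")


example = """
{ "namespaces": { "tei": "http://blah/tei" }
, "rules": [ { "selector": "//tei:term", "translate": "yes" } ]
}
"""
xml_text = its_json_to_xml(example)
print(xml_text)
```

The point of the sketch is that the script content is read the same way by an HTML and an XHTML processor (as text handed to a JSON parser), and the XML form is derived afterwards.
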
I realise that might not be practical, just saying it's actually a
viable option.

> If you think it should be the same, do you think encapsulating
> inline 2.0 rules inside a CDATA section is a workable solution or do
> you have another suggestion?

That's one option, but you have to keep in mind that you still won't get
the same result in both serialisations. For <![CDATA[foo]]> XML parsing
will give you a node containing "foo", whereas for HTML parsing you'll
get "<![CDATA[foo]]>". Easy to strip, but still requires special-casing
(at which point I reckon you're no better off than you are now).

Also keep in mind that CDATA sections don't nest. ITS isn't text-heavy
so the risks are low, but if someone uses
<its:param name='whevs'><![CDATA[foo]]></its:param> then it won't embed.

Another option is comments. But they don't nest either.

For interop, I reckon that the best option you have (short of a JSON
serialisation) would be to keep things as they are today, but to write a
clear algorithm that processes the content properly in all cases.

-- 
Robin Berjon - http://berjon.com/ - @robinberjon
Received on Wednesday, 10 July 2013 13:36:55 UTC