- From: Ivan Herman <ivan@w3.org>
- Date: Tue, 06 Nov 2007 16:22:04 +0100
- To: Niklas Lindström <lindstream@gmail.com>
- Cc: Manu Sporny <msporny@digitalbazaar.com>, W3C RDFa task force <public-rdf-in-xhtml-tf@w3.org>
- Message-ID: <4730869C.2020305@w3.org>
Thanks!

A full canonicalization in Python is also not that easy. Getting the repeated white space characters out and stripping the first and last whitespace is a breeze. The rest becomes a real headache unless the underlying XML library does it (e.g., ordering the attributes).

I wonder whether we should really require that. What do we gain?

Ivan

Niklas Lindström wrote:
> Hi!
>
> I believe there are more options (and pitfalls) to explore in IE7.
>
> For instance, setting style="white-space: pre" on the h1 element makes
> innerHTML return a value with *most* space preserved. Added newlines
> do seem to disappear entirely though (e.g. if I replace the space in
> "The Most" with only a newline, the innerHTML presents that part as
> "TheMost". Not in the rendered page, but in the returned value..).
>
> Furthermore, getting at the content with pure DOM calls (and provided
> "white-space" is set to "pre" as above) makes things somewhat cleaner
> -- e.g. newlines are preserved. Although the *first* newline in the
> first (text) childNode of the h1 is missing.. Also note that
> programmatically setting "someElem.style.whiteSpace = 'pre'" happens
> asynchronously, so any code that would try that to get at the original
> (well, sort of..) white space has to "wait for it".. :/
>
> I just did these quick tests in case the capabilities of IE7 are what
> would put an end to any hope of keeping non-canonicalized XMLLiterals.
> There seem to be some possibilities, but perhaps not stable enough?
> So if nothing else, it should be noted that requiring normalized space
> from RDFa parsers in such a case would require manual processing (DOM
> walking + normalizing) in some (at least non-XHTML-aware..) client
> implementations.
>
> Oh, and of course, IE (including IE7) has the bad habit of
> upper-casing any HTML in both innerHTML and nodeName values, so this
> has to be accounted for as well.
>
> FWIW, this is a piece of what I added in Manu's test code
> (<http://rdfa.digitalbazaar.com/tests/xmlliteral.html>) while testing:
> ----
> var title = document.getElementById('dc-title');
> alert(title.innerHTML); // upper-cased "SUP"
> alert(title.firstChild.nextSibling.nodeName); // "SUP" here too
> title.style.whiteSpace = 'pre';
> alert("wait for it..");
> alert(escape(title.firstChild.nodeValue));
> ----
>
> Finally, if one *really* wanted to, I suppose using XMLHttpRequest to
> get the current document (i.e. refetching it) as proper XML is also a
> workaround for IE. Not exactly a great solution, but it might work..
>
> For my personal opinion, I think the ideal would be for XMLLiterals in
> RDFa to be given as they are in the source document. But if existing
> (client) implementations are to be of concern, including IE (at least
> IE7), I can understand if this ideal may have to be abandoned
> (specifically for XHTML 1.1 + RDFa). However, achieving
> canonicalization (of white space, element names as lower case,
> attribute ordering..) may be a bit of a headache anyway (in *at least*
> IE7).
>
> Best regards,
> Niklas
>
>
> On 11/6/07, Ivan Herman <ivan@w3.org> wrote:
>> I think we have to yield on this issue:-( I have updated pyRdfa to
>> canonicalize XML Literals, too...
>>
>> Test #11 is passed now...
>>
>> Ivan
>>
>> Ivan Herman wrote:
>>> Ouch, ouch, ouch! That hurts...
>>>
>>> If your findings are confirmed then indeed we have much less choice than
>>> before. I hate that!:-)
>>>
>>> Ivan
>>>
>>> P.S. I never liked programming in javascript:-(
>>>
>>> Manu Sporny wrote:
>>>> Ivan Herman wrote:
>>>>>> In other words, the following XHTML (Test Case #11):
>>>>>>
>>>>>> <div about="">
>>>>>> Author: <span property="dc:creator">Albert Einstein</span>
>>>>>> <h2 property="dc:title">
>>>>>> E = mc<sup>2</sup>: The Most Urgent Problem of Our Time
>>>>>> </h2>
>>>>>> </div>
>>>>>>
>>>>>> Should produce the following triples:
>>>>>>
>>>>>> @prefix _5:
>>>>>> <http://www.w3.org/2006/07/SWD/RDFa/testsuite/xhtml1-testcases/0011.>.
>>>>>> @prefix dc: <http://purl.org/dc/elements/1.1/>.
>>>>>>
>>>>>> _5:xhtml dc:creator "Albert Einstein";
>>>>>> dc:title """E = mc<sup>2</sup>: The Most Urgent Problem of Our Time"""
>>>>>> ^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral>.
>>>>>>
>>>>>>> So I believe we should either refer to these two ideas, or even import
>>>>>>> the prose as is, if we have to.
>>>>> Wait, that is a different issue. It is still undecided whether the
>>>>> canonicalization should apply on XML Literals. Mark's proposal is to use
>>>>> XPath for the definition of canonicalization, not (yet) on what exactly
>>>>> it applies to!
>>>> If only we had a choice, Ivan :)
>>>>
>>>> I took some time last night to do some research on how XMLLiterals could
>>>> be implemented in Javascript. Here are the results for RDFa Test Case #11:
>>>>
>>>> http://rdfa.digitalbazaar.com/tests/xmlliteral.html
>>>>
>>>> If you use Firefox's DOM and Javascript implementation to get the
>>>> contents of the H2 element, here are the results on the node:
>>>>
>>>> outerHTML: 'undefined'
>>>> innerHTML:
>>>> '\n E = mc<sup>2</sup>: The Most Urgent Problem of Our Time\n
>>>> ' (there are extra spaces after the last \n)
>>>> innerText: 'undefined'
>>>>
>>>> If you use Internet Explorer 7's DOM and Javascript implementation to
>>>> get the contents of the "E = mc^2: The Most Urgent Problem of Our Time",
>>>> here are the results on the node:
>>>>
>>>> outerHTML: '\r\n<H2 id=dc-title property="dc:title">E = mc<SUP>2</SUP>:
>>>> The Most Urgent Problem of Our Time </H2>'
>>>> innerHTML: 'E = mc<SUP>2</SUP>: The Most Urgent Problem of Our Time '
>>>> innerText: 'E = mc2: The Most Urgent Problem of Our Time '
>>>>
>>>> In short - Firefox's implementation allows you to retrieve the original
>>>> whitespace and line breaks using Javascript. IE7 does not.
>>>>
>>>> IE7 normalizes all of the whitespace before inserting it into the DOM,
>>>> which means that Javascript does not have access to the original text in
>>>> the XHTML file.
>>>>
>>>> This means that the same canonicalization rules should be used for
>>>> regular strings and XMLLiterals for RDFa-in-XHTML.
>>>>
>>>> Somebody please correct me if they have a different understanding of the
>>>> IE7 DOM.
>>>>
>>>> -- manu
>>>>
>> --
>>
>> Ivan Herman, W3C Semantic Web Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>
>>

--
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
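
The "easy" part of the canonicalization discussed in this thread (collapsing repeated whitespace, stripping the ends, and undoing IE7's upper-cased element names) might look roughly like the Python sketch below. This is only an illustration, not pyRdfa's actual code: the function names are made up for the example, the exact number of spaces in the Firefox string is assumed, and the regex treats the markup as plain text, so it does nothing about attribute ordering or whitespace that is significant inside the literal.

```python
import re


def normalize_space(text: str) -> str:
    """Collapse each run of whitespace to one space and strip both ends."""
    # str.split() with no argument splits on any whitespace run and
    # discards leading/trailing whitespace.
    return " ".join(text.split())


def lowercase_tag_names(markup: str) -> str:
    """Lower-case element names in start and end tags (naive, regex-based)."""
    return re.sub(
        r"<(/?)([A-Za-z][A-Za-z0-9]*)",
        lambda m: "<" + m.group(1) + m.group(2).lower(),
        markup,
    )


# innerHTML values modelled on the Firefox and IE7 results quoted above
# (hypothetical inputs; the indentation width is assumed).
firefox_inner = "\n      E = mc<sup>2</sup>: The Most Urgent Problem of Our Time\n    "
ie7_inner = "E = mc<SUP>2</SUP>: The Most Urgent Problem of Our Time "

from_firefox = normalize_space(firefox_inner)
from_ie7 = normalize_space(lowercase_tag_names(ie7_inner))

print(from_firefox)
print(from_firefox == from_ie7)
```

Under these assumptions both browsers' values reduce to the same string, which is the literal expected for Test Case #11. The harder parts (attribute ordering, namespace handling) would still need support from the underlying XML library.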
Received on Tuesday, 6 November 2007 15:22:09 UTC