Re: References to CSS rules in RDFa syntax document from Ivan Herman on 2007-11-06 (public-rdf-in-xhtml-tf@w3.org from November 2007)

From: Ivan Herman <ivan@w3.org>
Date: Tue, 06 Nov 2007 16:22:04 +0100
To: Niklas Lindström <lindstream@gmail.com>
Cc: Manu Sporny <msporny@digitalbazaar.com>, W3C RDFa task force <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <4730869C.2020305@w3.org>

Thanks!

A full canonicalization in Python is also not that easy. Getting the
repeated white space characters out and stripping the first and last
whitespace is a breeze. The rest becomes a real headache unless the
underlying XML library does it (e.g., ordering the attributes). I wonder
whether we should really require that. What do we gain?
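
To make the contrast concrete, here is a minimal sketch, not pyRdfa's
actual code. It assumes a modern Python (3.8 or later), whose standard
library offers xml.etree.ElementTree.canonicalize; the helper names
normalize_space and canonical_fragment are hypothetical. The first helper
is the easy part (collapse and strip white space), the second hands the
hard part (attribute ordering and the like) to the XML library.

```python
import re
from xml.etree.ElementTree import canonicalize


def normalize_space(text):
    """Collapse each run of white space to one space, then strip both ends."""
    return re.sub(r"\s+", " ", text).strip()


def canonical_fragment(xml_fragment):
    """Delegate canonicalization (e.g. attribute ordering) to the XML library.

    Assumption: the fragment is well-formed XML with a single root element.
    """
    return canonicalize(xml_fragment)


# Content of the h2 element from Test Case #11, as it appears in the source.
raw = "\n        E = mc<sup>2</sup>: The Most Urgent Problem of Our Time\n   "
print(normalize_space(raw))

# Attributes deliberately given out of order; the library sorts them.
print(canonical_fragment('<h2 property="dc:title" id="dc-title">E = mc<sup>2</sup></h2>'))
```

Without such a library function, the attribute ordering (and the rest of
the canonical form) would have to be coded by hand, which is the headache
referred to above.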

Ivan

Niklas Lindström wrote:
> Hi!
> 
> I believe there are more options (and pitfalls) to explore in IE7.
> 
> For instance, setting style="white-space: pre" on the h1 element makes
> innerHTML return a value with *most* space preserved. Added newlines
> do seem to disappear entirely though (e.g. if I replace the space in
> "The Most" with only a newline, the innerHTML presents that part as
> "TheMost". Not in the rendered page, but in the returned value..).
> 
> Furthermore, getting at the content with pure DOM calls (and provided
> "white-space" is set to "pre" as above) makes things somewhat cleaner
> -- e.g. newlines are preserved. Although the *first* newline in the
> first (text) childNode of the h1 is missing.. Also note that
> programmatically setting "someElem.style.whiteSpace = 'pre'" happens
> asynchronously, so any code that would try that to get at the original
> (well, sort of..) white space has to "wait for it".. :/
> 
> I just did these quick tests in case the capabilities of IE7 are what
> would put an end to any hope of keeping non-canonicalized XMLLiterals.
> There seem to be some possibilities, but perhaps not stable enough?
> So if nothing else, it should be noted that requiring normalized space
> from RDFa parsers in such a case would require manual processing (DOM
> walking + normalizing) in some (at least non-XHTML-aware..) client
> implementations.
> 
> Oh, and of course, IE (including IE7) has the bad habit of
> upper-casing any HTML in both innerHTML and nodeName values, so this
> has to be accounted for as well.
> 
> FWIW, this is a piece of what I added in Manu's test code
> (<http://rdfa.digitalbazaar.com/tests/xmlliteral.html>) while testing:
> ----
>    var title = document.getElementById('dc-title');
>    alert(title.innerHTML); // upper-cased "SUP"
>    alert(title.firstChild.nextSibling.nodeName); // "SUP" here too
>    title.style.whiteSpace = 'pre';
>    alert("wait for it..");
>    alert(escape(title.firstChild.nodeValue));
> ----
> 
> Finally, if one *really* wanted to, I suppose using XMLHttpRequest to
> get the current document (i.e. refetching it) as proper XML is also a
> workaround for IE. Not exactly a great solution, but it might work..
> 
> As for my personal opinion, I think the ideal would be for XMLLiterals in
> RDFa to be given as they are in the source document. But if existing
> (client) implementations are to be of concern, including IE (at least
> IE7), I can understand if this ideal may have to be abandoned
> (specifically for XHTML 1.1 + RDFa). However, achieving
> canonicalization (of white space, element names as lower case,
> attribute ordering..) may be a bit of a headache anyway (in *at least*
> IE7).
> 
> Best regards,
> Niklas
> 
> 
> On 11/6/07, Ivan Herman <ivan@w3.org> wrote:
>> I think we have to yield on this issue:-( I have updated pyRdfa to
>> canonicalize XML Literals, too...
>>
>> Test #11 is passed now...
>>
>> Ivan
>>
>> Ivan Herman wrote:
>>> Ouch, ouch, ouch! That hurts...
>>>
>>> If your findings are confirmed then indeed we have much less choice than
>>> before. I hate that!:-)
>>>
>>> Ivan
>>>
>>> P.S. I never liked programming in javascript:-(
>>>
>>> Manu Sporny wrote:
>>>> Ivan Herman wrote:
>>>>>> In other words, the following XHTML (Test Case #11):
>>>>>>
>>>>>> <div about="">
>>>>>>    Author: <span property="dc:creator">Albert Einstein</span>
>>>>>>    <h2 property="dc:title">
>>>>>>         E = mc<sup>2</sup>: The Most Urgent Problem of Our Time
>>>>>>    </h2>
>>>>>> </div>
>>>>>>
>>>>>> Should produce the following triples:
>>>>>>
>>>>>> @prefix _5:
>>>>>> <http://www.w3.org/2006/07/SWD/RDFa/testsuite/xhtml1-testcases/0011.>.
>>>>>> @prefix dc: <http://purl.org/dc/elements/1.1/>.
>>>>>>
>>>>>> _5:xhtml dc:creator "Albert Einstein";
>>>>>>   dc:title """E = mc<sup>2</sup>: The Most Urgent Problem of Our Time"""
>>>>>>           ^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral>.
>>>>>>
>>>>>>> So I believe we should either refer to these two ideas, or even import
>>>>>>> the prose as is, if we have to.
>>>>> Wait, that is a different issue. It is still undecided whether the
>>>>> canonicalization should apply on XML Literals. Mark's proposal is to use
>>>>> XPath for the definition of canonicalization, not (yet) on what exactly
>>>>> it applies to!
>>>> If only we had a choice, Ivan :)
>>>>
>>>> I took some time last night to do some research on how XMLLiterals could
>>>> be implemented in Javascript. Here are the results for RDFa Test Case #11:
>>>>
>>>> http://rdfa.digitalbazaar.com/tests/xmlliteral.html
>>>>
>>>> If you use Firefox's DOM and Javascript implementation to get the
>>>> contents of the H2 element, here are the results on the node:
>>>>
>>>> outerHTML: 'undefined'
>>>> innerHTML:
>>>> '\n        E = mc<sup>2</sup>: The Most Urgent Problem of Our Time\n
>>>>  ' (there are extra spaces after the last \n)
>>>> innerText: 'undefined'
>>>>
>>>> If you use Internet Explorer 7's DOM and Javascript implementation to
>>>> get the contents of the "E = mc^2: The Most Urgent Problem of Our Time",
>>>> here are the results on the node:
>>>>
>>>> outerHTML: '\r\n<H2 id=dc-title property="dc:title">E = mc<SUP>2</SUP>:
>>>> The Most Urgent Problem of Our Time </H2>'
>>>> innerHTML: 'E = mc<SUP>2</SUP>: The Most Urgent Problem of Our Time '
>>>> innerText: 'E = mc2: The Most Urgent Problem of Our Time '
>>>>
>>>> In short - Firefox's implementation allows you to retrieve the original
>>>> whitespace and line breaks using Javascript. IE7 does not.
>>>>
>>>> IE7 normalizes all of the whitespace before inserting it into the DOM,
>>>> which means that Javascript does not have access to the original text in
>>>> the XHTML file.
>>>>
>>>> This means that the same canonicalization rules should be used for
>>>> regular strings and XMLLiterals for RDFa-in-XHTML.
>>>>
>>>> Somebody please correct me if they have a different understanding of the
>>>> IE7 DOM.
>>>>
>>>> -- manu
>>>>
>> --
>>
>> Ivan Herman, W3C Semantic Web Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>
>>

-- 

Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Tuesday, 6 November 2007 15:22:09 UTC