- From: Karl Dubost <karl@w3.org>
- Date: Tue, 27 Feb 2007 20:30:45 +0900
This is an interesting post from Jon Udell about the two extremes of authoring HTML, with their pros and cons, and the bridges he developed.

February 19, 2007
Blogging from Word 2007, crossing the chasm [1]

"To that end, I'm developing some Python code to help me wrangle Word's default .docx format, which is a zip file containing the document in WordML and a bunch of other stuff. At the end of this entry you can see what I've got so far. I'm using this code to explore what kind of XML I can inject programmatically into a Word 2007 document, what kind comes back after a round trip through the application, how that XML relates to the HTML that gets published to WordPress, and which of these representations will be the canonical one that I'll want to store and process. So far my conclusion is that none of these representations will be the canonical one, and that I'll need to find (or more likely create) a transform to and from the canonical representation where I'll store and process all my stuff. We'll see how it goes."

[1] http://blog.jonudell.net/2007/02/19/blogging-from-word-2007-crossing-the-chasm/

I like the mention of the canonical form. It is not exactly the same canonical form as his, but it would be good to have an HTML canonical form for editing. It would help in building tools such as htmldiff, tidy serialization, and source code visualizers in editing tools. It would also help authors work the way they want with their files and still exchange those files with other parties.

    my source code layout <-- T1 --> canonical form <-- T2 --> your source code layout

T1 and T2 being formatting transformations.

-- 
Karl Dubost - http://www.w3.org/People/karl/
W3C Conformance Manager, QA Activity Lead
QA Weblog - http://www.w3.org/QA/
*** Be Strict To Be Cool ***
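To make the canonical-form idea in the message concrete, here is a minimal sketch in Python (the language Udell mentions using). It is only an illustration under stated assumptions: the names `Canonicalizer` and `canonical_form` are hypothetical, the "canonical form" chosen here (one token per line, attributes sorted by name, whitespace runs collapsed) is an arbitrary choice and not a proposed standard, and only the direction from a personal layout to the canonical form is shown. It ignores whitespace-sensitive elements such as `pre`, as well as comments and doctypes.

```python
from html import escape
from html.parser import HTMLParser


class Canonicalizer(HTMLParser):
    """Re-serialize HTML in one fixed layout (a hypothetical canonical form)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        parts = [tag]
        for name, value in sorted(attrs, key=lambda pair: pair[0]):
            if value is None:
                parts.append(name)  # attribute without a value
            else:
                parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Collapse every run of whitespace; drop whitespace-only text.
        text = " ".join(data.split())
        if text:
            self.out.append(escape(text, quote=False))


def canonical_form(source: str) -> str:
    """Return the source in the fixed layout, one token per line."""
    parser = Canonicalizer()
    parser.feed(source)
    parser.close()
    return "\n".join(parser.out)


# Two authors, two source layouts, same document.
mine = '<P Class="x"   ID="a">Hello\n   world</P>'
yours = '<p id="a"\n   class="x">\n  Hello world\n</p>'

print(canonical_form(mine) == canonical_form(yours))
```

Because both layouts reduce to the same text, a line-based diff of the canonical forms reports only real content changes, which is the property a tool like htmldiff would rely on.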
Received on Tuesday, 27 February 2007 03:30:45 UTC