- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Sat, 23 May 2009 14:34:07 +0200
- To: Philip Taylor <pjt47@cam.ac.uk>
- CC: Sam Ruby <rubys@intertwingly.net>, Shane McCarron <shane@aptest.com>, RDFa Community <public-rdfa@w3.org>, "public-rdf-in-xhtml-tf.w3.org" <public-rdf-in-xhtml-tf@w3.org>, HTML WG <public-html@w3.org>
Philip Taylor wrote:
> ...
>> That being said, it wouldn't hurt to have a section that defines
>> special aspects of processing RDFa from a DOM instead of a HTML
>> document (as a series of bytes/characters).
>
> I think it would hurt if some RDFa implementations (that used a DOM)
> extracted one set of triples, and some other implementations (that don't
> use a DOM) extracted a different set of triples, so if there are
> multiple sections defining different styles of processing then it'll
> have to be very careful to produce identical results.

Yes.

>> Is it still underspecified once we require a valid HTML5 document as
>> input?
>
> Probably not. But I wouldn't consider it acceptable to require a valid
> document as input - people make mistakes all the time, and I want them
> to get consistent (and hopefully predictable) RDF triples out of it
> regardless of what implementation they use, so the specification has to
> deal precisely with invalid input. See
> http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009May/0156.html
> for an example of someone with precisely this kind of error.

Understood; I just wanted to understand the scope of the problem.

>>> For this to make sense in real HTML implementations, the definition
>>> should be in terms of the document layer rather than the byte layer.
>>
>> Disagreed. Many implementations never build a DOM. We're not only
>> talking about browsers here.
>
> By "DOM" I generally mean any kind of tree structure of elements and
> attributes, either as an explicit data structure (DOM, XOM, ElementTree)
> or implicit (SAX). Would any RDFa implementation *not* parse the input
> HTML into that kind of structure and operate over the elements and
> attributes as distinct objects? (e.g. would they just use regular
> expressions over the input byte stream? That seems quite infeasible to
> me...)

Depends on the definition of "tree structure". I've been involved in
code that just uses a tokenizer and specialized stack, and
implementations like these will not do the re-arranging of elements the
HTML5 spec specifies for some kinds of broken input.

>>> How are xmlns:* attributes meant to be processed? E.g. what is the
>>> expected output in the following cases:
>>>
>>> <div xmlns:T="test:">
>>> <span typeof="t:x" property="t:y">Test</span>
>>> </div>
>>>
>>> <div XMLNS:t="test:">
>>> <span typeof="t:x" property="t:y">Test</span>
>>> </div>
>>> [...]
>>
>> I would expect the results to be the same for XHTML and HTML
>> serializations.
>
> It would be good to be the same as far as possible, but in general that
> is impossible to implement in a browser-based environment (or anything
> built on any HTML parser I'm familiar with), because the case of
> attributes is lost when parsing. We want to allow implementations in
> browser-based environments, and we want them to match any other
> implementations, so implementations in any other environment must handle
> case-sensitivity in the same way.

That's impossible, at least for now as RDFa-in-XHTML relies on
XML-NS-wellformedness (so XMLNS:* would be recognized as namespace
declaration, right?).

BR, Julian
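
To make the attribute-case point in the thread concrete, here is a minimal sketch of the "tokenizer and specialized stack" style of processing. It assumes Python's standard-library html.parser as a stand-in for an HTML tokenizer. The names PrefixScopeTracker, in_scope_prefixes and VOID_ELEMENTS are hypothetical, invented for this illustration; they are not taken from any RDFa specification or implementation discussed here. It feeds in the two test cases quoted above and records what the tokenizer actually delivers.

```python
from html.parser import HTMLParser

# Assumption: a partial list of HTML void elements, which never get an end tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class PrefixScopeTracker(HTMLParser):
    """Hypothetical helper: tracks in-scope xmlns:* mappings without a tree."""

    def __init__(self):
        super().__init__()
        self.scopes = [{}]   # one prefix -> IRI mapping per open element
        self.records = []    # (tag, attribute names as delivered, in-scope prefixes)

    def handle_starttag(self, tag, attrs):
        # html.parser has already lowercased tag and attribute names here,
        # so the original spelling (xmlns:T vs. XMLNS:t) is not recoverable.
        scope = dict(self.scopes[-1])
        for name, value in attrs:
            if name.startswith("xmlns:"):
                scope[name[len("xmlns:"):]] = value or ""
        self.records.append((tag, [name for name, _ in attrs], scope))
        if tag not in VOID_ELEMENTS:
            self.scopes.append(scope)

    def handle_endtag(self, tag):
        # Naive pop: no re-arranging or recovery for misnested input.
        if tag not in VOID_ELEMENTS and len(self.scopes) > 1:
            self.scopes.pop()


def in_scope_prefixes(html):
    """Return the records collected while tokenizing the given markup."""
    tracker = PrefixScopeTracker()
    tracker.feed(html)
    tracker.close()
    return tracker.records


upper_prefix = in_scope_prefixes(
    '<div xmlns:T="test:"><span typeof="t:x" property="t:y">Test</span></div>')
upper_name = in_scope_prefixes(
    '<div XMLNS:t="test:"><span typeof="t:x" property="t:y">Test</span></div>')
print(upper_prefix == upper_name)
```

With this particular tokenizer, both inputs arrive as an attribute named xmlns:t, so a consumer built on it cannot tell the two cases apart. A namespace-aware XML parser would instead see the original case, which is the mismatch the thread is concerned with. The naive end-tag pop also shows why such an implementation does not reproduce the tree re-arrangement an HTML5 parser performs on broken input.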
Received on Saturday, 23 May 2009 12:35:11 UTC