- From: Sam Ruby <rubys@intertwingly.net>
- Date: Thu, 28 May 2009 10:58:04 -0400
- To: Manu Sporny <msporny@digitalbazaar.com>
- CC: RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, HTMLWG WG <public-html@w3.org>
Manu Sporny wrote:
> Sam Ruby wrote:
>>> Perhaps what we need is a test harness that will take the document
>>> contents of each 110+ XHTML+RDFa test cases and shoves them into a
>>> series of different (X)HTML DOCTYPES to determine which test cases
>>> result in different triples based on the underlying RDFa parser and
>>> DOCTYPE?
>>>
>>> It would give us some idea of where triples deviate most often based on
>>> DOCTYPE as well as helping developers create more robust RDFa processor
>>> implementations.
>> I've left a comment on your wiki page. Whether or not it is the
>> dominant usage or not, I'm presuming that the ultimate definition of
>> RDFa is not intended to preclude Javascript implementations running in
>> the browser from ever being fully compliant. Is that a fair assumption?
>
> Yes, I think that is a fair assumption to make. We've tried to be as
> careful as possible to ensure that a wide selection of technologies are
> able to implement RDFa in XHTML. The same care should be taken when
> addressing RDFa in HTML.
>
>> The reason why I say this is that browsers have settled on a behavior
>> where the MIME type is the primary signaling mechanism, and the DOCTYPE
>> is at *most* a secondary signaling mechanism. In particular, look at
>>
>> http://dev.w3.org/html5/spec/Overview.html#the-initial-insertion-mode
>
> While I agree that this is the reality in the browser-world, we must
> also take into account that documents can and will be processed offline,
> where access to the MIME type is not possible at all times.
>
> For in-browser implementations, one could make the argument that a DOM
> that differs from its HTML or XHTML representation on disk, even if the
> on-disk representation is corrupt, constitutes a different document and
> therefore may produce a different set of triples.
>
> Put another way, if somebody applies a transformation to an XHTML
> document via XSLT, and produces a different XHTML document, we wouldn't
> expect the set of triples to necessarily be the same between the pre-
> and post-transformed document.
>
> Similarly, if a browser applies a variety of transformations to an HTML4
> document to create the HTML DOM, we shouldn't expect the same set of
> triples to always be generated from the modified document. This concept
> could be applied to harmonize RDFa with most things that html5lib generates.
>
> This same high-level approach could be taken in the (X)HTML+RDFa case -
> but, of course, the devil is in the details.

We are not talking about a corruption or a bork filter[1] here. The
intent of the HTML Working Group is to describe how HTML documents are
to be faithfully, interoperably, and consistently interpreted.

Your question above concerns different DOCTYPEs. Some RDFa toolkit
implementations are layered on top of browsers or other HTML parsers.
Assuming DOCTYPE is a major factor in how content is to be interpreted
is not consistent with current practice or the general direction that
such toolkits are heading.

If you accept that the MIME type may not always be available (certainly
seems reasonable to me), then it makes sense to pursue one of the
following:

1) get all browsers to change (likelihood: zero)
2) accept that RDFa can never be implemented in a browser (suboptimal)
3) require all RDFa in XHTML to be served as application/xml (likely to
   be ignored)
4) define a polyglot language that works well in either case, and
   supplement that with rules and tools that help people produce content
   that maximizes interoperability. (very hard)
5) [any other suggestions?]

I would like to see a test harness that varies based on MIME type, and
then for the various differences to be cataloged and discussed. That
would help in deciding which of the above paths to pursue.
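
As a rough illustration of the kind of harness described above, here is a
minimal Python sketch. It is not an existing tool: the names
catalog_differences, toy_extract, MIME_TYPES and SAMPLE_TESTS are all
hypothetical, and toy_extract is a deliberately fake stand-in for a real
RDFa processor (it only pretends that prefixed properties get lost under
text/html). A real run would plug in an actual parser with the same
(document, MIME type) -> triples shape.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Sequence, Tuple

Triple = Tuple[str, str, str]
# Assumed extractor shape: (document text, MIME type) -> iterable of triples.
Extractor = Callable[[str, str], Iterable[Triple]]

# MIME types to vary over; this particular list is an assumption.
MIME_TYPES = ("text/html", "application/xhtml+xml", "application/xml")


def catalog_differences(
    tests: Dict[str, str],
    extract: Extractor,
    mime_types: Sequence[str] = MIME_TYPES,
) -> Dict[str, Dict[str, FrozenSet[Triple]]]:
    """Return only the tests whose triples differ across MIME types.

    The result maps test name -> {MIME type -> set of triples}.
    """
    report: Dict[str, Dict[str, FrozenSet[Triple]]] = {}
    for name, document in tests.items():
        results = {m: frozenset(extract(document, m)) for m in mime_types}
        # More than one distinct triple set means the MIME type mattered.
        if len(set(results.values())) > 1:
            report[name] = results
    return report


def toy_extract(document: str, mime_type: str) -> Iterable[Triple]:
    """Fake extractor for demonstration only; NOT a real RDFa parser."""
    if "property=" not in document:
        return []
    # Pretend that prefix declarations are not honored under text/html.
    if mime_type == "text/html" and "xmlns:" in document:
        return []
    return [("_:doc", "dc:title", "Example")]


# Hypothetical sample inputs standing in for the real test cases.
SAMPLE_TESTS = {
    "prefixed": '<span xmlns:dc="http://purl.org/dc/elements/1.1/" '
                'property="dc:title">Example</span>',
    "plain": "<p>no metadata</p>",
}

if __name__ == "__main__":
    for test_name, by_mime in catalog_differences(SAMPLE_TESTS, toy_extract).items():
        print(test_name)
        for mime, triples in by_mime.items():
            print("  ", mime, sorted(triples))
```

Reporting only the tests that disagree keeps the output focused on the
cases that would need to be cataloged and discussed.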
> -- manu

- Sam Ruby

[1] http://en.wikipedia.org/wiki/Swedish_Chef
Received on Thursday, 28 May 2009 14:58:39 UTC