- From: Philip Taylor <pjt47@cam.ac.uk>
- Date: Sat, 23 May 2009 13:22:37 +0100
- To: Julian Reschke <julian.reschke@gmx.de>
- CC: Sam Ruby <rubys@intertwingly.net>, Shane McCarron <shane@aptest.com>, RDFa Community <public-rdfa@w3.org>, "public-rdf-in-xhtml-tf.w3.org" <public-rdf-in-xhtml-tf@w3.org>, HTML WG <public-html@w3.org>
Julian Reschke wrote:
> Philip Taylor wrote:
>> [...]
>> What is "the @xml:lang attribute"? Is it the attribute with local name
>
> It's unambiguous as long as we talk about a stream of characters, right?

Yes (assuming it's clear that it means the sequence of characters that
matches the 'attribute name' part of whatever grammar defines the HTML
syntax and ASCII-case-insensitively matches the string "xml:lang", and
assuming we don't worry about e.g. multiple xml:lang attributes being
(invalidly) specified on the same element).

>> "xml:lang" in no namespace (as would be produced by an HTML 5 parser
>> (and by current HTML browser parser implementations))? or the
>> attribute with local name "lang" in the namespace
>> "http://www.w3.org/XML/1998/namespace" (as would be produced by an XML
>> parser, and could be inserted in an HTML document via DOM APIs)? or
>> both (in which case both could be specified on one element, in
>> addition to "lang" in no namespace)?
>
> Both can only be specified in the DOM, but not in a serialization (or am
> I missing something?).

I think that's roughly correct: In an XML serialisation with no
scripting, you can only get the attribute "lang" in
"http://www.w3.org/XML/1998/namespace". In an HTML5 text/html
serialisation with no scripting, you can only get the attribute
"xml:lang" in no namespace. It would be easy to invent a new
serialisation that does let you declare both attributes, e.g.
http://simon.html5.org/specs/sdf

> That being said, it wouldn't hurt to have a section that defines special
> aspects of processing RDFa from a DOM instead of a HTML document (as a
> series of bytes/characters).

I think it would hurt if some RDFa implementations (that used a DOM)
extracted one set of triples, and some other implementations (that don't
use a DOM) extracted a different set of triples, so if there are multiple
sections defining different styles of processing then it'll have to be
very careful to produce identical results.

>> [...]
>> <table>
>>   <tr some-attributes-to-say-this-element-outputs-an-XMLLiteral>
>>     <td> This text goes inside the table </td>
>>     This text gets parsed to *outside* the table
>>     <td> This text goes inside the table </td>
>>   </tr>
>> </table>
>> [...]
> Is it still underspecified once we require a valid HTML5 document as input?

Probably not. But I wouldn't consider it acceptable to require a valid
document as input - people make mistakes all the time, and I want them to
get consistent (and hopefully predictable) RDF triples out of it
regardless of what implementation they use, so the specification has to
deal precisely with invalid input. See
http://lists.w3.org/Archives/Public/public-rdf-in-xhtml-tf/2009May/0156.html
for an example of someone with precisely this kind of error.

>> For this to make sense in real HTML implementations, the definition
>> should be in terms of the document layer rather than the byte layer.
>
> Disagreed. Many implementations never build a DOM. We're not only
> talking about browsers here.

By "DOM" I generally mean any kind of tree structure of elements and
attributes, either as an explicit data structure (DOM, XOM, ElementTree)
or implicit (SAX). Would any RDFa implementation *not* parse the input
HTML into that kind of structure and operate over the elements and
attributes as distinct objects? (e.g. would they just use regular
expressions over the input byte stream? That seems quite infeasible to
me...)

>> How are xmlns:* attributes meant to be processed? E.g. what is the
>> expected output in the following cases:
>>
>>   <div xmlns:T="test:">
>>     <span typeof="t:x" property="t:y">Test</span>
>>   </div>
>>
>>   <div XMLNS:t="test:">
>>     <span typeof="t:x" property="t:y">Test</span>
>>   </div>
>> [...]
>
> I would expect the results to be the same for XHTML and HTML
> serializations.

It would be good to be the same as far as possible, but in general that
is impossible to implement in a browser-based environment (or anything
built on any HTML parser I'm familiar with), because the case of
attributes is lost when parsing. We want to allow implementations in
browser-based environments, and we want them to match any other
implementations, so implementations in any other environment must handle
case-sensitivity in the same way.

-- 
Philip Taylor
pjt47@cam.ac.uk
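
The loss of attribute-name case described above can be illustrated with a
minimal Python sketch. It is only an illustration under stated assumptions:
Python's html.parser is not an HTML5-conformant parser, but it lowercases
attribute names much as browser HTML parsers do, and expat is used here
without namespace processing as a stand-in for a parser that preserves the
raw names. The helper names (AttrNames, html_attr_names, xml_attr_names) are
hypothetical and not part of any RDFa implementation.

```python
from html.parser import HTMLParser
import xml.parsers.expat


class AttrNames(HTMLParser):
    """Collects attribute names exactly as html.parser reports them."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        self.names.extend(name for name, _ in attrs)


def html_attr_names(markup):
    """Attribute names as seen through an HTML-style parser."""
    parser = AttrNames()
    parser.feed(markup)
    parser.close()
    return parser.names


def xml_attr_names(markup):
    """Raw attribute names as seen by expat with no namespace processing."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    # attrs is a dict keyed by the raw attribute name; extend() adds the keys.
    parser.StartElementHandler = lambda tag, attrs: names.extend(attrs)
    parser.Parse(markup, True)
    return names


# The two test cases quoted in the message.
upper_prefix = '<div xmlns:T="test:"><span typeof="t:x" property="t:y">Test</span></div>'
upper_xmlns = '<div XMLNS:t="test:"><span typeof="t:x" property="t:y">Test</span></div>'

if __name__ == "__main__":
    for markup in (upper_prefix, upper_xmlns):
        print(html_attr_names(markup), xml_attr_names(markup))
```

Seen through the HTML-style parser, both inputs yield the same attribute
name, so a processor built on it cannot tell "xmlns:T" from "XMLNS:t"; the
raw XML-side view still can, which is why the two environments need one
agreed rule for case.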
Received on Saturday, 23 May 2009 12:23:22 UTC