- From: Philip Taylor <pjt47@cam.ac.uk>
- Date: Thu, 17 Sep 2009 17:17:21 +0100
- To: Manu Sporny <msporny@digitalbazaar.com>
- CC: HTMLWG WG <public-html@w3.org>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>
Manu Sporny wrote:
> The 3rd draft of the HTML+RDFa specification has been released and is
> available here:
>
> http://html5.digitalbazaar.com/specs/rdfa.html

I found a few things to comment on while reading through this. (I'm mostly ignoring any high-level issues about the design of the language, and just looking at how it's being specified.)

First, the more substantive issues:

"a tree-based model" -- is that tree-based model defined anywhere? (What data types does it consist of? e.g. is an attribute just a name string plus a value string, or is it a namespace URI plus a local name plus a value string? Does an element just have a list of attributes, or does it also have a separate list of namespace declarations? The questions seem important in determining how a DOM or Infoset or XOM tree or SAX stream etc maps onto the tree-based model.)

I'm not sure what the point of section 2.1 (Modifying the Input Document) is. Section 2 already says HTML5 defines how to get from a document to a DOM, and says it's obvious how to get from a DOM to RDFa's tree-based model, so the first paragraph of section 2.1 seems redundant. As an underlying concept throughout the HTML5 spec, implementations are free to do whatever they want as long as the output is exactly the same as what HTML5 specifies, so it's already true that an HTML+RDFa implementation could internally use e.g. SAX as long as the output is equal to what's specified, so the second paragraph of 2.1 seems unnecessary.

It might still be useful to explicitly state that underlying concept, e.g. "Note: Although HTML5 is specified in terms of a DOM, HTML+RDFa processors are free to use any implementation approach as long as their RDF output matches the output specified in this document." (I think that's more general than what's in 2.1, since it doesn't talk about details like HTML5 parser data structures - all that's important is the input and output.
Also it avoids questions about what "a data structure equivalent to the HTML5 or XHTML5 DOM" really means (is a stream of SAX events an equivalent data structure? (is it even a data structure?)))

"There may be a link element contained in the head element that contains profile for the the rel attribute and http://www.w3.org/1999/xhtml/vocab for the href attribute." -- that's a broken definition, e.g. it doesn't seem to allow <link rel="PROFILE" href=...> or <link rel="profile next" href=...>. It also conflicts with section 5.2. This line should probably just be removed, since section 5.2 is enough to allow documents to use profile.

"The lang attribute must be processed in the same manner as the xml:lang attribute is [...]" -- that is confusing since the xml:lang attribute (in HTML5 text/html) is not processed in the same manner as in XHTML. (For example, <div xml:lang="en">...</div> in text/html has no language). It would be clearer to replace this with something like "Where the XHTML+RDFa specification refers to the xml:lang attribute, the language of an element must instead be determined as in the section titled The lang and xml:lang attributes in the HTML5 specification."

"When generating literals of type XMLLiteral, the processor must ensure that the output XMLLiteral is a namespace well-formed XML fragment." -- I don't see why this requirement needs to be explicitly specified for HTML+RDFa, or described with such verbosity, given that XHTML+RDFa doesn't specify it explicitly. Any processor generating RDF triples must generate valid triples, which means XMLLiterals must have a lexical form that is exclusive canonical XML (hence namespace well-formed etc), and the RDFa spec does not need to repeat any of those requirements. Given RDF's use of exclusive canonical XML, there is only a single valid serialisation of a given input tree.
So I think there's no need for HTML+RDFa to discuss various ways of getting a value - it just needs to define what that single valid serialisation is. So I think the whole section could simply require:

"When generating literals of type XMLLiteral, the lexical form of the literal must be equal to the result of applying the [Coercing an HTML DOM into an infoset] rules to the child nodes of the current element, then serialising the resulting nodes to an octet stream with the [exclusive XML canonicalization method] (with comments, with empty InclusiveNamespaces PrefixList), then decoding the octet stream as UTF-8 into a Unicode string."

(with some non-normative explanations of the implications, and examples, etc, but no other conformance requirements).

"Hyperlink" -- <link rel=profile> sounds more like an "External Resource", since it augments the current document.

http://whatwg.org/html5#linkTypes defines the link-type table to be non-normative. Is the link type table extension in HTML+RDFa meant to be non-normative or normative? If the former, the Hyperlink/External Resource thing needs to be specified in normative text and not just the table.

"For documents conforming to this specification, attributes with names that have the case insensitive prefix "xmlns:" are conforming in both HTML5 and XHTML5." -- is it intentional that <div XMLNS:foo="..."/> in XHTML will be conforming? Surely that markup would break any RDFa processors, because they don't do case-insensitive attribute lookups in XHTML, so it should not be permitted.

Also, attribute names in HTML5 are always lowercase (ignoring script modifications etc), because the concept of "attribute name" refers to the name in the DOM (not the bytes in the text/html syntax), and the parser converts names to lowercase. So only lowercase attribute names need to be made conforming.

Also, according to this, attributes like xmlns:="..." and xmlns:0="..."
will be conforming in HTML5, but authors will be confused if they use such attributes (because they'll try to use the CURIE "0:foo" and it will be ignored because it's invalid), so they should be non-conforming to alert authors to their errors. Only attributes whose names match the PrefixedAttName production from XML Namespaces should be conforming.

And some minor issues about wording etc:

"RDF in XHTML: Syntax and Processing" -- s/RDF/RDFa/ (in Abstract, and again in History).

"The latest stable version of the editor's draft of this specification is always available on [the W3C CVS server]. The [latest editor's working copy] (which may contain unfinished text in the process of being prepared) is also available." -- the first link is to an old version (July) that looks much less stable or complete than this version; the second link is a 404.

"By design, the possibility of [...] was squarely in the realm of possibility." -- seems tautological; maybe remove the "the possibility of".

"heeding the minor changes in this section" -- s/section/document/ (or specification or something).

"Section 5.5: Sequence, of the [XHTML+RDFa] specification defines [...]" -- remove unnecessary comma.

"The HTML5 and XHTML5 DOM, or equivalent data structure, should be used as input to the RDFa processing rules." -- s/should/must/ (I don't see any reason why someone ought to be allowed to violate this requirement, and still claim to be a conforming HTML+RDFa processor).

"element nesting issues in HTML documents may be corrected" -- s/may/can/ (or something similar) (use of normative RFC2119 keyword "may" in a non-normative section seems undesirable).

"Any mechanism that generates a data structure equivalent to the HTML5 or XHTML5 DOM, such as the html5lib library" -- it seems weird for a specification to refer to a specific implementation. The reference doesn't even provide any information to the reader, unless they already know the details of html5lib tree builders.

"Any mechanism [...]
may be used" -- s/may/can/ (same reason as above).

"a XML mode document" -- s/a/an/

"In future versions of RDFa, the value of the profile may trigger different processing rules in RDFa Processors." -- s/may/might/ (I don't think that's meant to be a normative conformance requirement).

"While it is specified that HTML5 must preserve these attributes in the DOM" -- s/must/will/ (I don't think that's meant to be a normative conformance requirement here, and it's confusing to use RFC2119 keywords when referring to the consequences of requirements in other specs).

-- 
Philip Taylor
pjt47@cam.ac.uk
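
Illustrative note on the xmlns: comment above: the suggestion that only attribute names matching the PrefixedAttName production ('xmlns:' followed by an NCName) be conforming can be sketched as a small check. This is a minimal Python sketch and not part of the original message. The function name is_prefixed_att_name is hypothetical, and the character ranges are an assumption taken from the name-character ranges of XML 1.0 Fifth Edition with the colon removed; an implementation tied to an earlier edition of XML would use different ranges.

```python
import re

# Assumption: NCName start characters, following the XML 1.0 (Fifth Edition)
# NameStartChar ranges minus ":".
_NC_START = (
    "A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# Characters allowed after the first one: the start set plus "-", ".",
# digits, and a few combining/extender ranges.
_NC_REST = _NC_START + r"\-.0-9" + "\u00B7\u0300-\u036F\u203F-\u2040"

# PrefixedAttName ::= 'xmlns:' NCName  (case-sensitive, lowercase "xmlns:")
_PREFIXED_ATT_NAME = re.compile("xmlns:[" + _NC_START + "][" + _NC_REST + "]*")


def is_prefixed_att_name(attr_name: str) -> bool:
    """Return True if attr_name matches 'xmlns:' followed by an NCName."""
    return _PREFIXED_ATT_NAME.fullmatch(attr_name) is not None
```

Under these assumptions, xmlns:foo is accepted, while xmlns: (empty prefix), xmlns:0 (prefix starting with a digit) and XMLNS:foo (wrong case) are all rejected, which matches the cases the message argues should be non-conforming.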
Received on Thursday, 17 September 2009 16:18:01 UTC