- From: Philip Jägenstedt <philip@foolip.org>
- Date: Sat, 2 Jul 2011 23:03:08 +0200
- To: public-rdfa-wg@w3.org
I've been following RDFa and Microdata for a while now, and have toyed around a bit with things like <https://gitorious.org/microdatajs/microdatajs> and <http://foolip.org/microdatajs/live/>. As you might guess, I'm rather interested in DOM APIs, so I thought I'd take a look at <http://dev.w3.org/rdfa/specs/rdfa-dom-api.html> and provide some feedback. (Although I work for Opera Software, I'm not representing Opera in any way in this feedback.)

Since I'm not subscribed to public-rdfa-wg, please try to CC me in replies.

== HTML ==

The spec only references the 2008 RDFa in XHTML REC, not <http://dev.w3.org/html5/rdfa/>. Is this an oversight?

Note that just deferring to the two RDFa specs implies different processing requirements for XHTML and HTML. Different behavior of APIs depending on XHTML/HTML is not going to be well received by browser implementors, as it creates all kinds of problems. (Consider, for example, what happens when a DOM is created entirely by script, or when a subtree of a text/html document is moved by script to an application/xhtml+xml document. It just doesn't make sense to switch between different modes here.)

I'm going to assume in the following that the intention is for there to be a single API spec covering both serializations.

== getTriplesByType type? ==

Some underlying assumptions about the model appear to be unstated here. Specifically, is it type as in @datatype or as in @typeof? (I'd guess it's @typeof.) What if a <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> predicate is used explicitly, not using the @typeof shorthand?

== getTriplesByType order ==

Which order should triples be returned in? This must be well-defined in order for the API to be interoperably implementable.

== Merging of triples ==

If the same triple is expressed several times in the document, are they merged, or will two instances of the same triple be returned by getTriplesByType?

== Triple.language ==

Is the language normalized, and how?
If the language is given with lang="sv_FI" in markup, what does .language return? If there is any merging of triples, does that happen before or after language normalization?

== Dynamic changes ==

getTriplesByType returns a static array of Triples. In contrast, a lot of HTML APIs return a live NodeList, so that changes to the document are reflected in that NodeList. This is the case with e.g. getElementsByTagName and the Microdata getItems API. If returning a static array is intentional, can you make that more explicit by saying that it is the triples that are in the document at the time of invocation that are returned?

== RDF graph vs DOM disconnect ==

The API is read-only and seems to completely disconnect the RDF graph from the DOM from which it is parsed. This makes it impossible to use the API to, e.g., change the style of all elements that declare a subject with type <http://xmlns.com/foaf/0.1/Person>, which would seem to be one of the main use cases for having an API at all.

There's another serious issue here, best illustrated by an example:

1. getTriplesByType() is called. If this is the first time it is called, the entire document must be traversed to build an RDF graph.
2. Elements/attributes are added/removed by script.
3. getTriplesByType() is called again.

At step 3, does the entire document need to be traversed again? In other words, is it possible to efficiently cache the graph? Caching would amount to storing bindings between each element/attribute and the role it plays in the graph. Consider for example if the @lang attribute of some element is changed. To update the graph, it's necessary to know which triples have their language sourced from the element or any of its children. Adding/removing/updating xmlns attributes would be similarly messy. With many attributes influencing the graph, there's going to be a *lot* of bindings to keep track of, and in practice the graph is going to be extremely tightly coupled to the DOM.
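
To make the bookkeeping concrete, here is a rough sketch of the bindings a cache would need for @lang alone. It is only an illustration under stated assumptions: it uses plain objects instead of real DOM elements, it treats each node with a literal as producing exactly one triple, and the names makeNode, effectiveLang and GraphCache are invented here, not taken from the RDFa DOM API draft.

```javascript
// Hypothetical stand-in for a DOM element: lang attribute, an optional
// literal value, a parent pointer and a list of children.
function makeNode(props, children = []) {
  const node = { lang: null, literal: null, parent: null, ...props, children };
  for (const child of children) child.parent = node;
  return node;
}

// Language in effect for a node: the nearest lang on the node or an ancestor.
function effectiveLang(node) {
  for (let n = node; n; n = n.parent) {
    if (n.lang !== null) return n.lang;
  }
  return "";
}

class GraphCache {
  constructor(root) {
    this.byNode = new Map(); // binding: node -> the triple sourced from it
    this.visits = 0;         // nodes touched so far, to compare update costs
    this.refresh(root);
  }

  // (Re)compute the triples for `node` and all of its descendants.
  refresh(node) {
    this.visits++;
    if (node.literal !== null) {
      this.byNode.set(node, {
        object: node.literal,
        language: effectiveLang(node),
      });
    }
    for (const child of node.children) this.refresh(child);
  }

  // A lang change may affect every literal below the node, so the whole
  // subtree is redone; triples outside it are left untouched.
  setLang(node, lang) {
    node.lang = lang;
    this.refresh(node);
  }

  triples() {
    return [...this.byNode.values()];
  }
}

// Usage: changing lang on `section` revisits only section and its children.
const a = makeNode({ literal: "hej" });
const b = makeNode({ literal: "hello", lang: "en" });
const section = makeNode({}, [a, b]);
const other = makeNode({ literal: "moi" });
const root = makeNode({ lang: "sv" }, [section, other]);

const cache = new GraphCache(root);
cache.setLang(section, "fi");
console.log(cache.triples().map((t) => t.language));
```

Even this toy version has to keep a node-to-triple map and redo a whole subtree for a single attribute change. A real implementation would need comparable bindings for every attribute that influences the graph, which is the coupling described above.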
Note that browsers have some infrastructure for updating collections dynamically for things like getElementsByTagName, but it's usually a lot simpler, as only a single aspect of the element is considered and a collection of elements is maintained directly, not a separate structure (RDF graph) parsed from them.

IMO, a better approach would be an API returning a live NodeList where the criteria for inclusion/exclusion are much simpler.

== Triples.children ==

"children" seems a very strange thing to call all triples involving the same subject. Is this really intentional?

Regardless, my main question is related to the above. If getTriplesByType returns a static array, what does later inspecting triple.children return? Is it the "children" that triple had at the time getTriplesByType was called, or something else? If it is the former, then it implies that each call to getTriplesByType must find all "children" up-front, as it's not possible to wait until the children IDL attribute is actually read to find the "children". This seems extremely wasteful. If it is something else, then the result array isn't static at all. Either way, this must be defined.

== RDFa Profiles ==

Is it intended that the DOM API work with RDFa Profiles? Supporting it in browsers seems fairly problematic.

1. Consider this script:

    var e = document.querySelector("[profile]");
    e.profile = "http://example.com/previously-unseen-profile";
    document.getTriplesByType("http://examples.com/some-type");

This clearly will not work, since the browser won't synchronously download and parse the profile while the script is running. Given this, how is a script to know when the API is safe to use?

2. Should browsers preemptively fetch/parse all profiles in a document, even though 99% of documents won't use the getTriplesByType API? Should that delay the document load event? (Related to the above question.)

3. If a profile actually becomes widely used, aren't you worried about the DDoS that will result?
Compare to the problems with DTDs outlined in <http://lists.w3.org/Archives/Public/public-html/2008Jul/0269.html>.

4. Should browsers have Turtle and RDF/XML parsers to handle the case where the profile is using those syntaxes? MAY is a keyword for interoperability disaster, at least in the context of web browsers...

== The End ==

Thanks for reading all the way through! If I discover more issues, I'll follow up with more mail.

Finally, I'd like to invite everyone to provide technical feedback about issues with Microdata, if you haven't already.

--
Philip Jägenstedt
Received on Monday, 4 July 2011 12:21:07 UTC