Re: RDFa DOM API feedback from Philip Jägenstedt on 2011-07-04 (public-rdfa-wg@w3.org from July 2011)

From: Philip Jägenstedt <philip@foolip.org>
Date: Tue, 5 Jul 2011 00:02:24 +0200
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: public-rdfa-wg@w3.org
Message-ID: <CAKHWUkY23rtu=w0E9nFk6D8iyhLs+d0b45YODpZw1FDyPrortw@mail.gmail.com>
On Mon, Jul 4, 2011 at 15:41, Philip Jägenstedt <philip@foolip.org> wrote:

It appears I spoke too quickly. Some of the things that appeared to
have been removed have been moved to
<http://www.w3.org/2010/02/rdfa/sources/rdfa-api/> and the issues
quoted below remain:

>> == RDF graph vs DOM disconnect ==
>>
>> The API is read-only and seems to completely disconnect the RDF graph
>> from the DOM from which it is parsed. This makes it impossible to use
>> the API to, e.g., change the style of all elements that declare a
>> subject with type <http://xmlns.com/foaf/0.1/Person>, which would seem
>> to be one of the main use cases for having an API at all.
>>
>> There's another serious issue here, best illustrated by an example:
>>
>> 1. getTriplesByType() is called. If this is the first time it is
>> called, the entire document must be traversed to build an RDF graph.
>> 2. Elements/attributes are added/removed by script.
>> 3. getTriplesByType() is called again.
>>
>> At step 3, does the entire document need to be traversed again? In
>> other words, is it possible to efficiently cache the graph? Caching
>> would amount to storing bindings between each element/attribute and
>> the role it plays in the graph. Consider for example if the @lang
>> attribute of some element is changed. To update the graph, it's
>> necessary to know which triples have their language sourced from the
>> element or any of its children. Adding/removing/updating xmlns
>> attributes would be similarly messy. With many attributes influencing
>> the graph, there's going to be a *lot* of bindings to keep track of,
>> and in practice the graph is going to be extremely tightly coupled to
>> the DOM.
>>
>> Note that browsers have some infrastructure for updating collections
>> dynamically for things like getElementsByTagName, but it's usually a
>> lot simpler as it's only a single aspect of the element that is
>> considered and it maintains a collection of elements directly, not a
>> separate structure (RDF graph) parsed from them.
>>
>> IMO, a better approach would be an API returning a live NodeList where
>> the criteria for inclusion/exclusion are much simpler.
>
> This issue is no longer relevant.

I was mistaken; some of this is still problematic with the
DataDocument interface, which has getElementsByType,
getElementsBySubject and getElementsByProperty methods. These now
return NodeLists, but is it intentional that these collections are not
live?
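
To make the live-versus-static question concrete, here is a minimal
sketch in plain JavaScript. It does not use the real DOM or the RDFa
API; the element objects and the getStaticByType/getLiveByType helpers
are hypothetical and only model the difference in behaviour.

```javascript
// Plain-JavaScript model; no real DOM involved. All names are hypothetical.
const elements = [
  { id: "a", typeOf: "http://xmlns.com/foaf/0.1/Person" },
  { id: "b", typeOf: "http://xmlns.com/foaf/0.1/Document" },
];

// Static: the matches are computed once and never updated.
function getStaticByType(type) {
  return elements.filter((el) => el.typeOf === type);
}

// Live: every read re-evaluates the criterion against the current tree.
function getLiveByType(type) {
  const matches = () => elements.filter((el) => el.typeOf === type);
  return {
    get length() {
      return matches().length;
    },
    item(i) {
      return matches()[i] || null;
    },
  };
}

const person = "http://xmlns.com/foaf/0.1/Person";
const snapshot = getStaticByType(person);
const live = getLiveByType(person);

// A script changes the document after both collections were obtained.
elements.push({ id: "c", typeOf: person });

console.log(snapshot.length); // 1
console.log(live.length);     // 2
```

Whichever behaviour is intended, the spec would need to say so, since
scripts will observe the difference.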

For all three methods, the order must also be defined.

All of the mess I originally outlined also applies to
DocumentData.getSubjects and getValues. Unless the information can be
cached, implementation is not feasible.
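
As a rough illustration of the caching concern, the sketch below models
the three steps quoted above with an all-or-nothing cache. It is a toy,
not an RDFa processor: ToyDocument, its setAttribute method and the
traversal counter are all hypothetical. Without per-element bindings,
the only safe strategy is to discard the whole cache on any mutation.

```javascript
// Toy model, not an RDFa processor. Every name here is hypothetical.
class ToyDocument {
  constructor(elements) {
    this.elements = elements;   // stand-in for the DOM tree
    this.cachedSubjects = null; // stand-in for the parsed graph
    this.traversals = 0;        // counts full-document walks
  }

  // Any mutation invalidates the whole cache, because nothing records
  // which parts of the graph depend on which element or attribute.
  setAttribute(index, name, value) {
    this.elements[index][name] = value;
    this.cachedSubjects = null;
  }

  getSubjects() {
    if (this.cachedSubjects === null) {
      this.traversals += 1;
      this.cachedSubjects = this.elements
        .filter((el) => el.about !== undefined)
        .map((el) => el.about);
    }
    return this.cachedSubjects;
  }
}

const doc = new ToyDocument([
  { about: "#me", lang: "en" },
  { about: "#you", lang: "en" },
]);

doc.getSubjects();                 // step 1: first call, full traversal
doc.getSubjects();                 // repeated call, served from the cache
doc.setAttribute(0, "lang", "sv"); // step 2: script mutates the document
doc.getSubjects();                 // step 3: full traversal again

console.log(doc.traversals); // 2
```

Avoiding the second traversal would require exactly the fine-grained
bindings described in the quoted text.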

>> == getTriplesByType type? ==
>>
>> Some underlying assumptions about the model appear to be unstated
>> here. Specifically, is it type as in @datatype or as in @typeof? (I'd
>> guess it's @typeof.)
>>
>> What if a <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> predicate
>> is used explicitly, not using the @typeof shorthand?

This question still applies to getElementsByType.

>> == RDFa Profiles ==
>>
>> Is it intended that the DOM API work with RDFa Profiles? Supporting it
>> in browsers seems fairly problematic.
>>
>> 1. Consider this script:
>>
>> var e = document.querySelector("[profile]");
>> e.profile = "http://example.com/previously-unseen-profile";
>> document.getTriplesByType("http://examples.com/some-type");
>>
>> This clearly will not work, since the browser won't synchronously
>> download and parse the profile while the script is running. Given
>> this, how is a script to know when the API is safe to use?
>>
>> 2. Should browsers preemptively fetch/parse all profiles in a
>> document, even though 99% of documents won't use the getTriplesByType
>> API? Should that delay the document load event? (related to the above
>> question)
>
> These two sub-issues would appear to be solved by the async nature of
> DataParser.parse.

Unfortunately not. The problem is very much there, because of the
methods on DataDocument.
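
The timing problem can be sketched without any browser API. In the toy
model below, fetchProfile and resolveTerm are hypothetical helpers, and
setTimeout stands in for the network: a synchronous method called in the
same script has no way to wait for the profile to arrive.

```javascript
// Toy model of the timing problem. All names are hypothetical.
const loadedProfiles = new Map(); // profile URL -> term mappings

// Stand-in for a network fetch: the result only arrives in a later task.
function fetchProfile(url) {
  return new Promise((resolve) => {
    setTimeout(() => {
      loadedProfiles.set(url, { Person: "http://xmlns.com/foaf/0.1/Person" });
      resolve();
    }, 0);
  });
}

// Synchronous query: it can only use what is already loaded.
function resolveTerm(profileUrl, term) {
  const profile = loadedProfiles.get(profileUrl);
  return profile ? profile[term] : undefined;
}

const url = "http://example.com/previously-unseen-profile";
const pending = fetchProfile(url);

// Same script, same turn of the event loop: the profile is not there yet.
const during = resolveTerm(url, "Person");
console.log(during); // undefined

pending.then(() => {
  // Only after the fetch completes does the same call give an answer.
  console.log(resolveTerm(url, "Person"));
});
```

An asynchronous DataParser.parse can wait for the fetch; synchronous
methods on DataDocument cannot.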

Should I perhaps just file individual bugs? Discussing so many issues
in a single email thread is probably going to be messy...

-- 
Philip Jägenstedt
Received on Monday, 4 July 2011 22:03:22 UTC