Re: RDFa DOM API feedback from Philip Jägenstedt on 2011-07-04 (public-rdfa-wg@w3.org from July 2011)

From: Philip Jägenstedt <philip@foolip.org>
Date: Mon, 4 Jul 2011 15:41:41 +0200
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: public-rdfa-wg@w3.org
Message-ID: <CAKHWUkbWBH_wMm17PPx=WaAEnKOz1=QA2moD+k5vP+RMKkggKw@mail.gmail.com>
On Mon, Jul 4, 2011 at 14:59, Manu Sporny <msporny@digitalbazaar.com> wrote:
> On 07/02/11 17:03, Philip Jägenstedt wrote:
>> I thought I'd take a look at
>> <http://dev.w3.org/rdfa/specs/rdfa-dom-api.html> and provide some
>> feedback.
>
> Hi Philip, unfortunately, this isn't the latest document, and should
> probably be removed from source control. I had forgotten that it
> existed. Here is the latest set of specs on the RDFa DOM API:
>
> The RDFa API (Level 3):
> http://www.w3.org/2010/02/rdfa/sources/rdfa-api/
>
> The Structured Data API (Level 2):
> http://www.w3.org/2010/02/rdfa/sources/rdf-api/
>
> The RDF Interfaces (Level 1):
> http://www.w3.org/2010/02/rdfa/sources/rdf-interfaces/
>
> Keep in mind that we've separated each layer such that a
> developer/browser vendor could choose to just implement Level 3, or
> Level 3 and Level 2, or just Level 1. At the moment, the Level 3 and
> Level 2 stuff is a bit mixed together as we're in the middle of
> rearranging the document contents. We can chat more about this during
> our conversation tomorrow.

I'd prefer to discuss this in public email threads, so that anyone
that has something to add can do so. (I'm not looking for an official
response from the WG, as I doubt everyone in it could possibly agree
on all issues.)

<http://www.w3.org/2010/02/rdfa/sources/rdf-interfaces/> is the
document that is the most similar to
<http://dev.w3.org/rdfa/specs/rdfa-dom-api.html>, and most of the
issues can be mapped directly.

I'll look over the new documents and provide additional feedback, but
for now I'll annotate the original questions in the new context:

> == HTML ==
>
> The spec only references the 2008 RDFa in XHTML REC, not
> <http://dev.w3.org/html5/rdfa/>. Is this an oversight?
>
> Note that just deferring to the two RDFa specs implies different
> processing requirements depending on XHTML and HTML. Different
> behavior of APIs depending on XHTML/HTML is not going to be well
> received by browser implementors, as it creates all kinds of problems.
> (Consider, for example, what happens when a DOM is created entirely by
> script or when a subtree of a text/html document is moved by script to
> an application/xhtml+xml document. It just doesn't make sense to switch
> between different modes here.)
>
> I'm going to assume in the following that the intention is for there
> to be a single API spec covering both serializations.

AFAICT, the issue remains.

> == getTriplesByType type? ==
>
> Some underlying assumptions about the model appear to be unstated
> here. Specifically, is it type as in @datatype or as in @typeof? (I'd
> guess it's @typeof.)
>
> What if a <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> predicate
> is used explicitly, not using the @typeof shorthand?
>
> == getTriplesByType order ==
>
> Which order should triples be returned in? This must be well-defined
> in order for the API to be interoperably implementable.

The same issue now applies to Graph.toArray, although it now says
"Note: the order of the Triples within the returned sequence is
arbitrary, since a Graph is an unordered set."

As an anecdote, the ECMAScript spec has always said that when
enumerating properties of objects, the order is undefined. (Properties
are conceptually an unordered set.) In practice, implementations do
use a particular order (insertion order, more or less) and this has
required reverse-engineering between browsers because scripts rely on
that order. The same thing will happen with Graph.toArray if it
becomes widely deployed.
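
To make the ordering concern concrete, here is a minimal sketch of how a
script could avoid depending on whatever order an implementation happens
to return. The plain objects and the tripleKey/sortedTriples helpers are
assumptions made for illustration only; they are not the Triple interface
or anything defined in the RDF Interfaces draft.

```javascript
// Hypothetical stand-ins for the contents of a Graph.toArray() result.
// Plain objects are used here, not the real Triple interface.
function tripleKey(t) {
  // Serialize the three terms into one comparable string.
  return JSON.stringify([t.subject, t.predicate, t.object]);
}

// Return a sorted copy so callers never observe the implementation's
// internal (for example insertion) order.
function sortedTriples(triples) {
  return triples.slice().sort((a, b) => {
    const ka = tripleKey(a);
    const kb = tripleKey(b);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
}

// The same two triples, as two implementations might return them.
const orderA = [
  { subject: "_:b", predicate: "http://xmlns.com/foaf/0.1/name", object: "Bob" },
  { subject: "_:a", predicate: "http://xmlns.com/foaf/0.1/name", object: "Alice" },
];
const orderB = [orderA[1], orderA[0]];
```

Both sortedTriples(orderA) and sortedTriples(orderB) yield the same
sequence, but most scripts will not bother with this step, which is why an
unspecified order tends to become a de facto specified one.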

> == Merging of triples ==
>
> If the same triple is expressed several times in the document, are
> they merged, or will two instances of the same triple be returned by
> getTriplesByType?

Issue still applies to DataParser.parse.
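
For illustration, this is roughly what "merged" would mean if the parser
output is treated as a set. It is a sketch under assumptions: the
mergeDuplicates helper and the plain-object triples are hypothetical, and
the draft does not say that DataParser.parse behaves this way.

```javascript
// Hypothetical helper: collapse triples that have identical subject,
// predicate and object, keeping the first occurrence of each.
function mergeDuplicates(triples) {
  const seen = new Set();
  const result = [];
  for (const t of triples) {
    const key = JSON.stringify([t.subject, t.predicate, t.object]);
    if (!seen.has(key)) {
      seen.add(key);
      result.push(t);
    }
  }
  return result;
}

// The same statement expressed twice in a document, plus one other triple.
const parsed = [
  { subject: "http://example.com/#me", predicate: "http://xmlns.com/foaf/0.1/name", object: "Alice" },
  { subject: "http://example.com/#me", predicate: "http://xmlns.com/foaf/0.1/nick", object: "al" },
  { subject: "http://example.com/#me", predicate: "http://xmlns.com/foaf/0.1/name", object: "Alice" },
];
const merged = mergeDuplicates(parsed);
```

Without merging, a script sees three entries for this input; with merging
it sees two. Either answer is workable, but the spec needs to pick one.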

> == Triple.language ==
>
> Is the language normalized, and how? If the language is given with
> lang="sv_FI" in markup, what does .language return? If there is any
> merging of triples, does that happen before or after language
> normalization?

Triple.language is no more, it seems.

> == Dynamic changes ==
>
> getTriplesByType returns a static array of Triples. In contrast, a lot
> of HTML APIs return a live NodeList, so that changes to the document
> are reflected in that NodeList. This is the case with e.g.
> getElementsByTagName and the Microdata getItems API. If returning a
> static array is intentional, can you make that more explicit by saying
> that it is the triples that are in the document at the time of
> invocation that are returned?

It appears that DataParser.parse is used to generate a Graph from a
Document, which would "solve" the problem of dynamic changes. I don't
think that it is an acceptable solution because it is extremely expensive
to have to re-parse the entire document after any change, but it does
answer my original question.

> == RDF graph vs DOM disconnect ==
>
> The API is read-only and seems to completely disconnect the RDF graph
> from the DOM from which it is parsed. This makes it impossible to use
> the API to, e.g., change the style of all elements that declare a
> subject with type <http://xmlns.com/foaf/0.1/Person>, which would seem
> to be one of the main use cases for having an API at all.
>
> There's another serious issue here, best illustrated by an example:
>
> 1. getTriplesByType() is called. If this is the first time it is
> called, the entire document must be traversed to build an RDF graph.
> 2. Element/attributes are added/removed by script.
> 3. getTriplesByType() is called again.
>
> At step 3, does the entire document need to be traversed again? In
> other words, is it possible to efficiently cache the graph? Caching
> would amount to storing bindings between each element/attribute and
> the role it plays in the graph. Consider for example if the @lang
> attribute of some element is changed. To update the graph, it's
> necessary to know which triples have their language sourced from the
> element or any of its children. Adding/removing/updating xmlns
> attributes would be similarly messy. With many attributes influencing
> the graph, there's going to be a *lot* of bindings to keep track of,
> and in practice the graph is going to be extremely tightly coupled to
> the DOM.
>
> Note that browsers have some infrastructure for updating collections
> dynamically for things like getElementsByTagName, but it's usually a
> lot simpler as it's only a single aspect of the element that is
> considered and it maintains a collection of elements directly, not a
> separate structure (RDF graph) parsed from them.
>
> IMO, a better approach would be an API returning a live NodeList where
> the criteria for inclusion/exclusion are much simpler.

This issue is no longer relevant.

> == Triples.children ==
>
> children seems a very strange thing to call all triples involving the
> same subject, is this really intentional? Regardless, my main question
> is related to the above. If getTriplesByType returns a static array,
> what does later inspecting triple.children return? Is it the
> "children" that triple had at the time getTriplesByType was called, or
> something else? If it is the former, then it implies that each call to
> getTriplesByType must find all "children" up-front, as it's not
> possible to wait until the children IDL attribute is actually read to
> find the "children". This seems extremely wasteful. If it is something
> else, then the result array isn't static at all. Either way, this must
> be defined.

This issue is no longer relevant.

> == RDFa Profiles ==
>
> Is it intended that the DOM API work with RDFa Profiles? Supporting it
> in browsers seems fairly problematic.
>
> 1. Consider this script:
>
> var e = document.querySelector("[profile]");
> e.profile = "http://example.com/previously-unseen-profile";
> document.getTriplesByType("http://examples.com/some-type");
>
> This clearly will not work, since the browser won't synchronously
> download and parse the profile while the script is running. Given
> this, how is a script to know when the API is safe to use?
>
> 2. Should browsers preemptively fetch/parse all profiles in a
> document, even though 99% of documents won't use the getTriplesByType
> API? Should that delay the document load event? (related to the above
> question)

These two sub-issues would appear to be solved by the async nature of
DataParser.parse.

> 3. If a profile actually becomes widely used, aren't you worried about
> the DDoS that will result? Compare to the problems of DTD outlined in
> <http://lists.w3.org/Archives/Public/public-html/2008Jul/0269.html>.
>
> 4. Should browsers have Turtle and RDF/XML parsers to handle the case
> where the profile is using those syntaxes? MAY is a keyword for
> interoperability disaster, at least in the context of web browsers...

These two sub-issues are still relevant.

--
Philip Jägenstedt
Received on Monday, 4 July 2011 13:42:45 UTC