Re: RDFa API for browsers

Hi Toby,

On Wed, Oct 21, 2009 at 9:48 AM, Toby Inkster <tai@g5n.co.uk> wrote:
> On Tue, 2009-10-20 at 23:57 -0400, Manu Sporny wrote:
>> The conversation started when I pointed out that we might want to
>> start focusing on an RDFa API for Javascript running in browsers since
>> Mozilla seems to be open to implementing the Microdata API[3].
>
> Just some ideas...
>
> // Query the union of all data found on the page:
> var r = document.meta().query('SELECT ?foo WHERE ...');
>
> // Just query the data found in RDFa:
> var r = document.meta('rdfa').query('SELECT ?foo WHERE ...');

This is interesting...why would you prefer to keep the triples from
different formats separate?

I'm not saying you shouldn't. :)

It's just that I've always worked on the assumption that everyone
would want all the metadata to be bundled into one common, queryable
location.

There are two other partitioning mechanisms that I've been playing
with. The first is the idea of a store, which seems to be pretty
common to a number of RDF parsers.

If you did want to keep the output of the different syntaxes separate,
then the store could be the unit of partition, retaining 'meta' for
the general interface into this and all future metadata APIs:

  document.meta.store['rdfa'].query( ... );
  document.meta.store['microformats'].query( ... );

I.e., this is almost exactly the same as your syntax, but the
partitioning is pushed down one level. This is probably slightly
easier to implement, too, since the first thing a parser will do is to
create its own store:

  document.meta.store["rdfa"] = document.meta.createStore();


The second unit of partition I've been looking at is of course the named graph.

This is quite handy, because it means that you can manage groups of
triples separately, but still query across all of them if you want.

At the moment I've been playing with named graphs within a store, but
we might consider making a store a referenceable named graph, too:

  document.meta.store['http://www.w3.org/1999/xhtml/vocab#rdfa'].add( ... );
  document.meta.store['http://www.w3.org/1999/xhtml/vocab#microformats'].add( ... );
  document.meta.query( 'SELECT ?foo FROM <http://www.w3.org/1999/xhtml/vocab#rdfa> ...' );

Or alternatively, we could use the first technique:

  document.meta.store['rdfa'].add( ... );
  document.meta.store['microformats'].add( ... );

but then say that you can either query each individual store:

  document.meta.store['rdfa'].query( ... );
  document.meta.store['microformats'].query( ... );

or you can have your query passed to each store, and the result is the
union of the queries:

  document.meta.query( ... );

A general rule I think we should follow is that from a JS programmer's
perspective, the default behaviour is that there is one store,
containing all of the metadata derived from the page. The fact that it
came via RDFa or uF is not important. If it is important for them,
then they can query the corresponding store directly.
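
To make that concrete, here is a rough sketch of the shape I have in mind. It is only an illustration: createMeta(), createStore() and the filter-function "query" are made-up stand-ins (a real implementation would take SPARQL or similar), and the union here is a plain concatenation with no de-duplication.

```javascript
// Toy stand-in for document.meta; every name here is hypothetical.
function createStore() {
  const triples = [];
  return {
    add(s, p, o) {
      triples.push({ subject: s, predicate: p, object: o });
    },
    // A filter function stands in for a real query language.
    query(match) {
      return triples.filter(match);
    }
  };
}

function createMeta() {
  const meta = {
    store: {},
    createStore,
    // Default view: pass the query to every store and concatenate the
    // results (a real implementation would probably de-duplicate).
    query(match) {
      return Object.keys(meta.store)
        .flatMap((name) => meta.store[name].query(match));
    }
  };
  return meta;
}

const meta = createMeta();
meta.store["rdfa"] = meta.createStore();
meta.store["microformats"] = meta.createStore();
meta.store["rdfa"].add("#toby", "foaf:name", "Toby Inkster");
meta.store["microformats"].add("#manu", "foaf:name", "Manu Sporny");

const isName = (t) => t.predicate === "foaf:name";
const everything = meta.query(isName);               // all stores
const rdfaOnly = meta.store["rdfa"].query(isName);   // one store only
```

The point is that the programmer who doesn't care about the source syntax only ever touches meta.query(), while the one who does care goes one level down.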


> for (var i in r)
> {
>  // r[i].foo typeof 'RDFNode'.
>  if (r[i].foo.type == 'literal')
>    window.alert(r[i].foo.datatype);
> }

My preference here is for the default mode to be JSON objects. If you
look at it from the point of view of a JS programmer, then a query is
essentially a request to construct an array of JSON objects, that are
based on a certain template.

For example, a query for "?name" is really a request to create an
array of objects, each with the single property "name":

  [
    {
      name: "Toby Inkster"
    },
    {
      name: "Manu Sporny"
    }
  ]

There's no need for our JavaScript authors to be testing types, etc.
-- they should get back an immediately usable set of objects.
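
As a small sketch of the difference, suppose the engine internally produces RDFNode-style rows (the row shape and the toPlainObjects() helper below are my assumptions, purely for illustration). The API would flatten them before handing them over:

```javascript
// Hypothetical raw result rows, shaped like RDFNode-style bindings.
const rawRows = [
  { name: { type: "literal", value: "Toby Inkster" } },
  { name: { type: "literal", value: "Manu Sporny" } }
];

// Flatten each row so that every query variable becomes a plain property.
function toPlainObjects(rows) {
  return rows.map((row) => {
    const obj = {};
    for (const key of Object.keys(row)) {
      obj[key] = row[key].value;
    }
    return obj;
  });
}

const people = toPlainObjects(rawRows);
const names = people.map((p) => p.name); // no type checks needed
```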

I know people are going to say "what about the difference between URIs
and literals?", "what about data types?"...and so on.

Yes...of course, we'll need to take all of those things into account. :)

But my argument is that we don't want to begin with a design model
that creates an API that looks no different to one that we would
create in Java or C, or that simply mimics triples in a data structure.
I think we should be creating a set of interfaces that take advantage
of JavaScript as a language, and therefore one that feels natural to
JS programmers.

For example, in my library I have a backward-chaining feature, that
from a JavaScript perspective simply adds properties to an object, if
they are missing. It's for handling the usual "if engine = true and
wheels = 4 then type = car" kind of examples -- but it's a very
powerful technique for programmers, and works really nicely if you do
it at the JS level rather than merely at the triple level.
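
This is not the code from my library, just a minimal sketch of the idea under some assumptions: the rule format and the withRules() helper are invented for this mail, and it uses a Proxy so that a missing property is derived on demand by working backwards from the property requested to the rules that could supply it.

```javascript
// Hypothetical rule format: if every premise holds, the property is derived.
const rules = [
  { premises: { engine: true, wheels: 4 }, property: "type", value: "car" }
];

// Wrap an object so that a missing property is looked up in the rules.
// Premises are read through the proxy, so one rule can depend on another
// (there is no cycle detection in this sketch).
function withRules(target, rules) {
  const proxy = new Proxy(target, {
    get(obj, prop) {
      if (prop in obj) return obj[prop]; // existing values always win
      for (const rule of rules) {
        if (rule.property !== prop) continue;
        const holds = Object.keys(rule.premises)
          .every((key) => proxy[key] === rule.premises[key]);
        if (holds) return rule.value;
      }
      return undefined;
    }
  });
  return proxy;
}

const vehicle = withRules({ engine: true, wheels: 4 }, rules);
const bicycle = withRules({ engine: false, wheels: 2 }, rules);
```

Here vehicle.type yields "car" even though nobody stored it, bicycle.type stays undefined, and to the calling code both are just ordinary objects.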

Anyway, this part of the discussion -- the triple to/from JSON mapping
-- is worthy of a separate thread, I think.


> // Get the RDFa data as a RDF/JSON-like object:
> var data = document.meta('rdfa').data;
>
> // Get the RDFa data as an array of triples:
> var triples = document.meta('rdfa').triples;
> for (var i in triples)
> {
>  // each triple has subject, object, predicate and graph properties
>  var g = triples[i].graph; // named graph URI
>  var s = triples[i].subject;

Or we could just use query() as the interface mechanism, and specify
the output format we want:

  document.meta.store["rdfa"].query("SELECT ?foo ...", "raw");
  document.meta.store["rdfa"].query("SELECT ?foo ...", "JSON");
  etc.

It would be good to make the default values those that JS programmers
would expect, so that again, any idea of triples, etc., is hidden from
them. We could say that


>  // RDFNode.token returns a Turtle-like token
>  // (i.e. URIs in <>, literals in "", bnodes start _:).
>  if (s.type != 'bnode')
>    window.alert(s.token);
> }

This is how I've done things by default in my library. I think this is
better than having extra properties for types, since the most common
type for a JS programmer will be the plain string. However, the twist
in my library is that if the data is of a type that JS supports, then
it gets converted on output. For example:

  [
    {
      name: "Toby Inkster",
      foo: 17
    },
    {
      name: "Manu Sporny",
      bar: 93.25,
      action: function () {
        ...
      }
    }
  ]

As you can see, the principle is always to make the object feel
'natural' to a JS programmer, and in a sense to conceal its RDF
origins.
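
A minimal sketch of that conversion step, assuming literals arrive as { value, datatype } pairs and that only a handful of XSD datatypes are mapped (the toNative() helper and the exact mapping are illustrative, not what my library does line for line):

```javascript
const XSD = "http://www.w3.org/2001/XMLSchema#";

// Convert a hypothetical { value, datatype } literal to the closest JS type.
function toNative(literal) {
  switch (literal.datatype) {
    case XSD + "integer":
      return parseInt(literal.value, 10);
    case XSD + "decimal":
    case XSD + "double":
      return parseFloat(literal.value);
    case XSD + "boolean":
      return literal.value === "true" || literal.value === "1";
    default:
      return literal.value; // plain strings are the common case
  }
}

const row = {
  name: toNative({ value: "Manu Sporny" }),
  bar: toNative({ value: "93.25", datatype: XSD + "decimal" }),
  foo: toNative({ value: "17", datatype: XSD + "integer" })
};
```

Anything without a recognised datatype simply falls through as a string, which keeps the common case free of any type checks.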


> // Can also grab data from Microdata and GRDDL if the browser
> // supports those.
> var data = document.meta('grddl').data;
> var r = document.meta('items').query('SELECT ?foo WHERE ...');

Assuming we do decide that we should allow access to the data
separately (and you are probably right here, that we should), then as I
said above, I think an array of stores would be slightly easier to
manipulate:

  var data = document.meta.store['grddl'].data;
  var r = document.meta.store['items'].query('SELECT ?foo WHERE ...');

Great thoughts though, Toby! Hopefully we can keep this moving.

Regards,

Mark

--
Mark Birbeck, webBackplane

mark.birbeck@webBackplane.com

http://webBackplane.com/mark-birbeck

webBackplane is a trading name of Backplane Ltd. (company number
05972288, registered office: 2nd Floor, 69/85 Tabernacle Street,
London, EC2A 4RR)

Received on Wednesday, 21 October 2009 10:13:22 UTC