Re: UPDATED Telecon Agenda - 6th May 2010, 1400 UTC from Mark Birbeck on 2010-05-06 (public-rdfa-wg@w3.org from May 2010)

From: Mark Birbeck <mark.birbeck@webbackplane.com>
Date: Thu, 6 May 2010 12:18:07 +0100
To: Ivan Herman <ivan@w3.org>
Cc: Manu Sporny <msporny@digitalbazaar.com>, RDFa WG <public-rdfa-wg@w3.org>
Message-ID: <o2i640dd5061005060418j446c0131w1d1a3891bb316f2@mail.gmail.com>
Hi Ivan,

Great...thanks for the comments.

> a few notes on the document you posted. I guess more discussions this afternoon...
>
> 1. I think having an explicit concept for a store and different parsers on the store is a
> good idea. I must admit that the constructions you have seem to be a bit too convoluted
> for my taste. Being obviously influenced by RDFLib (that have a similar concept), I think
> something like
>
> data = document.data
> data.parse("rdfa")
> data.query(...)
>
> should be enough. Yes, there could be a separate set of 'registration' for parsers on the
> data object if one wants to register, say, a direct turtle parser, but going through all this
> dance of creation of a query, and a parser all the time for all users seems to be an
> unnecessary drag:-( I think there is room for simplification there.

Lack of time meant that I haven't explained this very well, but the
key thing is that we need to create the interfaces that are necessary
to control parsing, storing triples and then querying them, and then
assemble those interfaces in a useful way.

The reason I feel it's important to at least agree on what a store
will look like (from an API point of view), is because then people can
get on and do clever things with the store, without having to worry
about whether it will work with the parser interface.

For example, if I create a Store that is connected to the HTML5 Web
Storage facility, I can be confident that it will work with any
Parser, provided it exposes the add() method (and a few other bits and
bobs).
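To make that contract concrete, here is a rough sketch, assuming (hypothetically) that a Store only has to offer add(triple, origin) plus some way to read triples back. MemoryStore, triples() and parseSimple() are illustrative names, not an agreed API, and the "parser" is a toy that reads whitespace-separated lines rather than RDFa.

```javascript
// Hypothetical in-memory Store. The only method a Parser relies on is add().
class MemoryStore {
  constructor() {
    this.entries = [];
  }

  // Keep the triple together with an arbitrary origin object
  // (in a browser this would usually be the DOM node in scope).
  add(triple, origin) {
    this.entries.push({ triple, origin });
  }

  // Return the stored triples without their origins.
  triples() {
    return this.entries.map((entry) => entry.triple);
  }
}

// Hypothetical toy Parser for "subject predicate object" lines. It writes
// into any object that exposes add(), so it never needs to know what kind
// of Store it has been given.
function parseSimple(lines, store) {
  for (const line of lines) {
    const [subject, predicate, object] = line.trim().split(/\s+/);
    store.add({ subject, predicate, object }, line);
  }
  return store;
}
```

Swapping MemoryStore for a store backed by Web Storage would only require that store to provide the same add() method; the parser would not change.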

However, if we don't define what a Store looks like, and don't specify
Parser or Query interfaces either, then people can only extend this
API with more methods...and that will become a right mess!

Also, it's important to break things down in the way I've done so that
you can get a handle on the life-cycle of our objects; at some point,
people who know far more about browser technologies than we do are
going to come along and ask when *exactly* does parsing take place?
And when *exactly* are the triples available for the author to query?
And how do I get notified if a profile didn't load (Jeni's issue)?
Etc.

We are in a better position to answer these questions -- or add
functionality for the things we've missed -- if we have broken
everything down into its components and have a clear idea of the
lifecycle.
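As a hypothetical illustration of what "observable life-cycle" could mean, the sketch below exposes a parse start, a per-profile failure notification (Jeni's case) and a "ready" signal after which triples can be queried. The DataLifecycle class, the event names and the synchronous loadProfile callback are all assumptions made for illustration; a real browser implementation would almost certainly be asynchronous.

```javascript
// Hypothetical life-cycle object: each phase is an explicit, observable step.
class DataLifecycle {
  constructor() {
    this.state = "idle";
    this.listeners = Object.create(null); // event name -> [callbacks]
  }

  on(event, fn) {
    (this.listeners[event] = this.listeners[event] || []).push(fn);
  }

  emit(event, detail) {
    for (const fn of this.listeners[event] || []) fn(detail);
  }

  // Load each profile, report any failure, then signal that triples
  // are available for querying.
  run(profiles, loadProfile) {
    this.state = "parsing";
    this.emit("parsestart");
    for (const uri of profiles) {
      try {
        loadProfile(uri);
      } catch (err) {
        this.emit("profileerror", { uri, message: err.message });
      }
    }
    this.state = "ready";
    this.emit("ready");
  }
}
```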

Now, having said all of that, once you've got the Lego ready, there is
no reason not to offer the end-user one or two simple interfaces. For
example, as I say in my document, when implemented in a browser, the
author should only see this:

  var el = document.data.getElementsByType( foaf.Person );

As far as programmers are concerned all necessary initialisation has
been done for them, in just the same way that all initialisation has
been done for us when we call:

  var el = document.getElementById( "me" );

But as I keep stressing to Manu in our conversations, simplification
is easy once you have the architecture right. :) But the converse is
not the case -- start with simple methods, but don't define what is
going on under the hood, and it becomes very difficult to extend, and
worse, very ambiguous to implement.
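As a rough illustration of "simple on top, explicit underneath", here is a sketch in which a getElementsByType()-style call is just a thin wrapper over a Store. It assumes, hypothetically, that each triple is stored with the object it came from (add() takes a triple plus a pointer to an origin) and that types are matched via rdf:type. OriginStore, find() and makeDataFacade() are illustrative names only, and plain objects stand in for DOM nodes.

```javascript
const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

// Minimal store: each entry keeps the triple and the object it came from.
class OriginStore {
  constructor() {
    this.entries = [];
  }

  add(triple, origin) {
    this.entries.push({ triple, origin });
  }

  // Entries whose triple equals `pattern` on every field the pattern sets.
  find(pattern) {
    return this.entries.filter(({ triple }) =>
      Object.keys(pattern).every((key) => triple[key] === pattern[key])
    );
  }
}

// Hypothetical facade: authors only ever see getElementsByType(); the
// store, parsers and origins stay hidden underneath.
function makeDataFacade(store) {
  return {
    getElementsByType(type) {
      const origins = store
        .find({ predicate: RDF_TYPE, object: type })
        .map((entry) => entry.origin);
      return [...new Set(origins)]; // drop duplicates, keep first-seen order
    },
  };
}
```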


> 2. Obviously, the real issue is the specification of the query; that is the complex piece.

Yes...sort of. But it's only possible on top of a clearly defined architecture.


> First of all, from an RDF point of view, it is 'subject' oriented. Ie, I can get various triples for a
> specific subject, but you do not seem to allow for the search of subjects via patterns.

That was only to do with lack of time.

My solution to this is actually a describe() method, which will create
a property object for a particular identifier.

(Yes...SPARQL-inspired...more on that below.)
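A minimal sketch of what describe() could return, assuming triples are plain { subject, predicate, object } objects. Always using arrays for the values is a design assumption (it keeps repeated properties such as several foaf:knows), not something the WG has settled.

```javascript
// Hypothetical describe(): gather every triple whose subject is `id` into
// one plain object keyed by property URI. Values are always arrays so that
// repeated properties are not silently dropped.
function describe(triples, id) {
  const result = {};
  for (const { subject, predicate, object } of triples) {
    if (subject !== id) continue;
    if (!Object.prototype.hasOwnProperty.call(result, predicate)) {
      result[predicate] = [];
    }
    result[predicate].push(object);
  }
  return result;
}
```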


> That might
> cover a number of cases for RDFa, but if we really have a general concept of a store where I
> could also plug, say, a turtle parser, then this approach breaks down (because there is no
> reference to a DOM node any more...)

I don't quite follow that, but I'm sure it will come out in our discussions.

(I'm not sure if it's what you are talking about, but the DOM node
goes into the triple store; the add method used in Store takes a
triple plus a pointer to any object, and that will usually be the DOM
node that is in scope when the triple is generated.)


> That issue put aside, what you seem to have is a select with
>
> { property1 : object1, property2 : object2 }
>
> patterns. But there are a couple of questions:
>
> - can I have a variable for a property? Ie, can I search for a property, too?

Yes...definitely. You can put "?x" anywhere.
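Allowing a variable in the property position is just triple-pattern matching where any term may be a variable. A minimal sketch, assuming triples are plain { subject, predicate, object } objects and that a leading "?" marks a variable; matchPattern() is a hypothetical name.

```javascript
// A term is a variable when it starts with "?".
const isVariable = (term) => typeof term === "string" && term.startsWith("?");

// Match one triple pattern against a list of triples. Any of subject,
// predicate or object may be a variable, including the predicate.
function matchPattern(triples, pattern) {
  const results = [];
  for (const triple of triples) {
    const bindings = {};
    let ok = true;
    for (const pos of ["subject", "predicate", "object"]) {
      const term = pattern[pos];
      const value = triple[pos];
      if (isVariable(term)) {
        const name = term.slice(1);
        // The same variable used twice must bind to the same value.
        if (Object.prototype.hasOwnProperty.call(bindings, name) && bindings[name] !== value) {
          ok = false;
          break;
        }
        bindings[name] = value;
      } else if (term !== value) {
        ok = false;
        break;
      }
    }
    if (ok) results.push(bindings);
  }
  return results;
}
```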


> - for the object (or predicate), how do I differentiate among
>   - an object being a fixed literal
>   - an object being a fixed URI reference (ie, a Resource)
>   - an object being a fixed literal whose value happens to be a URI

I think we should debate this...and my guess is that we'll get a
heated debate. :)

But in my view (I can feel the flames licking already...) we gain
nothing by differentiating these things.

There may be subtle use-cases that I haven't thought of, but in a
browser environment, the benefits of treating everything as a string
-- to me, at least -- far outweigh the benefits of knowing that this
is a string "http://example.org" and not a resource (or the other way
around).

(Note also, that the OGP initiative has URIs as strings, so there is
going to be a lot of blurring going on, anyway.)

Note that I'm not suggesting that the parser doesn't know the difference;
underpinning the API is a triple-store, after all. All I'm saying is
that from the point of view of coming up with a query syntax that
works for the enormous number of web developers that there are out
there, what do we gain by making the distinction?


>   - an object being an unbound variable
>   - an object being an unbound variable for a literal with a fixed or variable language or datatype?
>   - (probably other issues)

Yes, there will be many of these kinds of things. My general
'approach' is to say that we make the simple query language simple,
but allow for more precision for those who need it.

Also, I haven't had time yet to add the datatype stuff, but I would
suggest that basic datatypes like numbers and dates are converted to
native types automatically.
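One way automatic conversion could work is to map a few common XSD datatypes to native JavaScript values and leave everything else as a string. This is only a sketch: toNative() is a hypothetical helper, the choice of datatypes is an assumption, and special lexical values such as "INF" for xsd:double are not handled.

```javascript
const XSD = "http://www.w3.org/2001/XMLSchema#";

// Hypothetical helper: turn a typed literal into a native value for a few
// common XSD datatypes; unknown or missing datatypes stay as strings.
function toNative(lexical, datatype) {
  switch (datatype) {
    case XSD + "integer":
    case XSD + "int":
      return parseInt(lexical, 10);
    case XSD + "decimal":
    case XSD + "double":
    case XSD + "float":
      return parseFloat(lexical);
    case XSD + "boolean":
      return lexical === "true" || lexical === "1";
    case XSD + "date":
    case XSD + "dateTime": {
      const d = new Date(lexical);
      // Fall back to the original string if the date cannot be parsed.
      return isNaN(d.getTime()) ? lexical : d;
    }
    default:
      return lexical;
  }
}
```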


> Note that you use "?summary" as a pattern in general but that is incorrect: what if I have a
> fixed _literal_ whose value is "?summary"?

Again, I'd argue for making the most common use-cases simple. So we
_could_ go the route that says strings and patterns must be distinct,
e.g.:

  { "?p": "?o" }

versus:

  { "?p": "'?o'" }

(Note the apostrophes.)

However, now every string must be escaped.

Alternatively, why not just say that on the odd occasion that your
string begins with a question-mark, use a preceding slash, or a
double-question mark, or some other escape character.

Alternatively, we can have a configuration option:

  document.data.setParam( { "queryPrefix": "$" } );

Anyway...there are lots of ways we can tackle this.
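Both the escape-character and the configurable-prefix options can be expressed in one small rule. A sketch, assuming (hypothetically) that a doubled prefix escapes a literal and that the prefix argument plays the role of the queryPrefix setting; classifyTerm() is an illustrative name, not an agreed API.

```javascript
// Hypothetical term classifier for the simple query syntax:
// - "<prefix>name" is a variable;
// - a doubled prefix ("??name") is an escaped literal ("?name");
// - everything else is a plain literal string.
function classifyTerm(term, prefix = "?") {
  if (term.startsWith(prefix + prefix)) {
    return { kind: "literal", value: term.slice(prefix.length) };
  }
  if (term.startsWith(prefix)) {
    return { kind: "variable", name: term.slice(prefix.length) };
  }
  return { kind: "literal", value: term };
}
```

With the prefix switched to "$", a string such as "?summary" needs no escaping at all, which is the trade-off the configuration option buys.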


> Answering all those questions leads, in my view, to an API for SPARQL. Ok, we may not
> define the OPTIONAL from a SPARQL pattern...

Why not? I hadn't thought of it before, but why not allow other
characters to occupy the '?' position? So we might have:

  {
    a: "http://rdf.data-vocabulary.org/#Event",
    "http://rdf.data-vocabulary.org/#summary": "?summary",
    "http://rdf.data-vocabulary.org/#startDate": "?start",
    "http://rdf.data-vocabulary.org/#endDate": "*end"
  }

And '*end' would be optional.

I'm not that keen on it...but my point is that we're not constrained
by anything in particular, other than coming up with an easy syntax.
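To show that the '*' idea is at least easy to evaluate, here is a sketch of an evaluator for the object-style query in which "?name" is required and "*name" is optional (bound to null when missing). It assumes "a" is shorthand for rdf:type and, to keep it short, takes only the first value of a multi-valued property rather than producing one result per combination as SPARQL would; selectWithOptional() is a hypothetical name.

```javascript
const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

// Hypothetical evaluator: query keys are properties, values are constants,
// "?name" (required variable) or "*name" (optional variable).
function selectWithOptional(triples, query) {
  // Group the triples by subject: subject -> property -> [values].
  const bySubject = new Map();
  for (const { subject, predicate, object } of triples) {
    if (!bySubject.has(subject)) bySubject.set(subject, Object.create(null));
    const props = bySubject.get(subject);
    (props[predicate] = props[predicate] || []).push(object);
  }

  const results = [];
  for (const props of bySubject.values()) {
    const row = {};
    let ok = true;
    for (const [key, term] of Object.entries(query)) {
      const values = props[key === "a" ? RDF_TYPE : key] || [];
      if (term.startsWith("?")) {
        if (values.length === 0) { ok = false; break; } // required but missing
        row[term.slice(1)] = values[0];
      } else if (term.startsWith("*")) {
        row[term.slice(1)] = values.length > 0 ? values[0] : null;
      } else if (!values.includes(term)) {
        ok = false; // constant does not match
        break;
      }
    }
    if (ok) results.push(row);
  }
  return results;
}
```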



> ... we may not have references to graphs, etc,...

I have implemented a prototype that uses named graphs and 'mappers'
which control what to do with certain patterns, and it makes
XMLHttpRequest programming *much* easier! So in the longer term I
don't see why we wouldn't support this, if we can come up with some
syntax.


> ...so it may be a simplified API, but it is certainly more complex, at least in my view, than
> what you outline. Which also means that it will require more work and the interface will be
> more complicated.

I strongly disagree.

First, on SPARQL: my take is that yes, I am trying to recreate SPARQL,
but only in the sense that it has a good model to follow, and the WG
behind it have thought through a lot of issues.

But not in the sense that I want to recreate all parts of the language.

For example, if we ignore the method used (i.e., Query.select()) my
queries look like this:

  {
    a: "http://rdf.data-vocabulary.org/#Event",
    "http://rdf.data-vocabulary.org/#summary": "?summary",
    "http://rdf.data-vocabulary.org/#startDate": "?start",
    "http://rdf.data-vocabulary.org/#endDate": "?end"
  }

Yes, there is a hat-tip to SPARQL, but note that in SPARQL terms this
is only the 'where' clause; I'm placing ?summary, ?start and ?end in
the result-set without the author having to express them. In SPARQL
terms I'm saying that this is my template:

  SELECT *
  WHERE {
    ... our stuff goes here ...
  }

rather than the author laboriously having to do this:

  SELECT ?summary ?start ?end
  WHERE {
    ... our stuff goes here ...
  }

(Of course, authors should be allowed to do this in the RDFa API if
they need to, but I see that as much less common than the scenario
where you use all variables, so I don't see the point in making that
the starting-point.)

But the key difference in the model I'm proposing is that the result
of a query is an *object*. In SPARQL we still think in terms of rows,
à la SQL. But the model that web programmers need is that each row is
actually an object, ready to be used.
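The difference between "rows" and "objects" can be shown with two small helpers: one derives the implicit SELECT * variable list from the query object, the other turns positional SQL/SPARQL-style rows into objects, with an optional explicit projection for authors who want SELECT ?summary ?start. Both names are hypothetical, and the rows in the example are hand-written rather than produced by a real query engine.

```javascript
// Collect the variable names used in an object-style query, in order.
// This plays the role of an implicit SPARQL "SELECT *".
function variablesOf(query) {
  return Object.values(query)
    .filter((term) => typeof term === "string" && term.startsWith("?"))
    .map((term) => term.slice(1));
}

// Turn positional rows (one column per variable) into plain objects,
// optionally projecting only a subset of the variables.
function rowsToObjects(vars, rows, projection = vars) {
  return rows.map((row) => {
    const obj = {};
    for (const name of projection) {
      const index = vars.indexOf(name);
      if (index !== -1) obj[name] = row[index];
    }
    return obj;
  });
}
```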


> (Actually... I have been there. Some years ago I did develop a core SPARQL engine for RDFLib
> but I was lazy to add a parser to it, so I did it by defining some sort of an API. There is an old
> description that I have just put on the Web[1]. Full disclosure: I made the same mistake at first
> by overloading the string with a "?xx" for a variable until TimBL rubbed my nose into this:-))

I don't believe it's a mistake, at least not in the sense of it being
something I hadn't thought about.

Of course, someone might argue that strings beginning with a question
mark are common enough that we need to differentiate between a pattern
and a string right from the start, but for now I'll wait for those
use-cases to emerge.


> 3. All that being said, such a SPARQL API may be good to have at some point. But... I fear it is
> way too complicated for those end users who do not really have a feel for RDF triples. So the
> question is where do we find the sweet spot of what is useful and still palatable? I am not sure
> I have the answer, to be sure. But I am afraid of the comparison between the microdata API
> (that is, on the surface, very simple, though some of the complications are hidden in the details)
> and what we are heading for...

Forgive me if I say that you are tilting at windmills here. :)

Your argument goes as follows:

* my current query syntax doesn't take into account lots of RDF-stuff,
like literals v. resources, etc.;
* in order to take those things into account, we need full-blown SPARQL;
* but full-blown SPARQL is too complicated for the average user;
* our API is therefore too complicated.

I would just suggest that we go back a few steps and establish the
foundation: whilst SPARQL has many ideas that we need, it is
essentially a general-purpose language that can be used in many
situations, whereas we have a constrained environment.

Also, our target audience has a particular mindset when it comes to
programming, so we should be aiming to recreate the features from
SPARQL that we need for this audience and environment, and only
provide those extra features for the more advanced programmer.


> I said 'may' be good to have, because an alternative is to go the PHP-SQL way, which is simply to
> be able to use a full SPARQL query string as an argument and that is it. The implementation may
> be more complicated, because you have to have a proper parser, but that may be cleaner and
> certainly with less headache.

I'm sure someone will write a plug-in for SPARQL that supports the
Query interface, and so works with the Parsers and Stores in our API.
But for our audience I think we can do a lot better than that.


> 'See' you this afternoon!

Yes, looking forward to it! And thanks again for the comments.

Regards,

Mark

--
Mark Birbeck, webBackplane

mark.birbeck@webBackplane.com

http://webBackplane.com/mark-birbeck

webBackplane is a trading name of Backplane Ltd. (company number
05972288, registered office: 2nd Floor, 69/85 Tabernacle Street,
London, EC2A 4RR)
Received on Thursday, 6 May 2010 11:18:46 UTC