- From: Mark Birbeck <mark.birbeck@webbackplane.com>
- Date: Thu, 6 May 2010 12:18:07 +0100
- To: Ivan Herman <ivan@w3.org>
- Cc: Manu Sporny <msporny@digitalbazaar.com>, RDFa WG <public-rdfa-wg@w3.org>
Hi Ivan,

Great...thanks for the comments.

> a few notes on the document you posted. I guess more discussions this afternoon...
>
> 1. I think having an explicit concept for a store and different parsers on the store is a
> good idea. I must admit that the constructions you have seem to be a bit too convoluted
> for my taste. Being obviously influenced by RDFLib (that have a similar concept), I think
> something like
>
>   data = document.data
>   data.parse("rdfa")
>   data.query(...)
>
> should be enough. Yes, there could be a separate set of 'registration' for parsers on the
> data object if one wants to register, say, a direct turtle parser, but going through all this
> dance of creation of a query, and a parser all the time for all users seems to be an
> unnecessary drag:-( I think there is room for simplification there.

Lack of time meant that I haven't explained this very well, but the key thing is that we need to create the interfaces that are necessary to control parsing, storing triples and then querying them, and then assemble those interfaces in a useful way.

The reason I feel it's important to at least agree on what a store will look like (from an API point of view), is because then people can get on and do clever things with the store, without having to worry about whether it will work with the parser interface. For example, if I create a Store that is connected to the HTML5 Web Storage facility, I can be confident that it will work with any Parser, provided it exposes the add() method (and a few other bits and bobs).

However, if we don't define what a Store looks like, and don't specify Parser or Query interfaces either, then people can only extend this API with more methods...and that will become a right mess!
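
To illustrate the Store point, here is a rough sketch only. The names MemoryStore, KeyedStore and parseInto are invented for this example and are not part of any proposal; the one thing taken from the text above is that a store exposes add(), and that add() takes a triple plus a pointer to some object. The stand-in "parser" just replays a fixed array, where a real RDFa parser would walk the DOM.

```javascript
// Hypothetical sketch: only the idea of a Store exposing add() comes
// from the discussion; every name below is made up for illustration.

// A minimal in-memory store. add() takes a triple plus an optional
// pointer to the object in scope (usually a DOM node).
function MemoryStore() {
  this.triples = [];
}
MemoryStore.prototype.add = function (subject, predicate, object, origin) {
  this.triples.push({ s: subject, p: predicate, o: object, origin: origin || null });
};

// A second store with a different backing: a plain object keyed by
// subject, standing in for something like Web Storage.
function KeyedStore() {
  this.bySubject = {};
}
KeyedStore.prototype.add = function (subject, predicate, object, origin) {
  var list = (this.bySubject[subject] = this.bySubject[subject] || []);
  list.push({ p: predicate, o: object, origin: origin || null });
};

// A stand-in parser that knows nothing about the store except add().
function parseInto(store, rawTriples) {
  rawTriples.forEach(function (t) {
    store.add(t[0], t[1], t[2], t[3]);
  });
  return store;
}

// Made-up sample data; the same "parser" fills both stores.
var input = [["#me", "http://xmlns.com/foaf/0.1/name", "Mark", null]];
var memory = parseInto(new MemoryStore(), input);
var keyed = parseInto(new KeyedStore(), input);
```

Because parseInto() depends only on add(), either store can be swapped in without touching the parser, which is the sort of independence an agreed Store interface would give us.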

Also, it's important to break things down in the way I've done so that you can get a handle on the lifecycle of our objects; at some point, people who know far more about browser technologies than we do are going to come along and ask when *exactly* does parsing take place? And when *exactly* are the triples available for the author to query? And how do I get notified if a profile didn't load (Jeni's issue)? Etc.

We are in a better position to answer these questions -- or add functionality for the things we've missed -- if we have broken everything down into its components and have a clear idea of the lifecycle.

Now, having said all of that, once you've got the Lego ready, there is no reason not to offer the end-user one or two simple interfaces. For example, as I say in my document, when implemented in a browser, the author should only see this:

  var el = document.data.getElementsByType( foaf.Person );

As far as programmers are concerned all necessary initialisation has been done for them, in just the same way that all initialisation has been done for us when we call:

  var el = document.getElementById( "me" );

But as I keep stressing to Manu in our conversations, simplification is easy once you have the architecture right. :) But the converse is not the case -- start with simple methods, but don't define what is going on under the hood, and it becomes very difficult to extend, and worse, very ambiguous to implement.

> 2. Obviously, the real issue is the specification of the query; that is the complex piece.

Yes...sort of. But it's only possible on top of a clearly defined architecture.

> First of all, from an RDF point of view, it is 'subject' oriented. Ie, I can get various triples for a
> specific subject, but you do not seem to allow for the search of subjects via patterns.

That was only to do with lack of time. My solution to this is actually a describe() method, which will create a property object for a particular identifier.
(Yes...SPARQL-inspired...more on that below.)

> That might
> cover a number of cases for RDFa, but if we really have a general concept of a store where I
> could also plug, say, a turtle parser, then this approach breaks down (because there is no
> reference to a DOM node any more...)

I don't quite follow that, but I'm sure it will come out in our discussions. (I'm not sure if it's what you are talking about, but the DOM node goes into the triple store; the add method used in Store takes a triple plus a pointer to any object, and that will usually be the DOM node that is in scope when the triple is generated.)

> That issue put aside, what you seem to have is a select with
>
>   { property1 : object1, property2 : object2 }
>
> patterns. But there are a couple of questions:
>
> - can I have a variable for a property? Ie, can I search for a property, too?

Yes...definitely. You can put "?x" anywhere.

> - for the object (or predicate), how do I differentiate among
>   - an object being a fixed literal
>   - an object being a fixed URI reference (ie, a Resource)
>   - an object being a fixed literal whose value happens to be a URI

I think we should debate this...and my guess is that we'll get a heated debate. :)

But in my view (I can feel the flames licking already...) we gain nothing by differentiating these things. There may be subtle use-cases that I haven't thought of, but in a browser environment, the benefits of treating everything as a string -- to me, at least -- far outweigh the benefits of knowing that this is a string "http://example.org" and not a resource (or the other way around). (Note also, that the OGP initiative has URIs as strings, so there is going to be a lot of blurring going on, anyway.)

Note that I'm not suggesting that the parser doesn't know the difference; underpinning the API is a triple-store, after all.
All I'm saying is that from the point of view of coming up with a query syntax that works for the enormous number of web developers that there are out there, what do we gain by making the distinction?

>   - an object being an unbound variable
>   - an object being an unbound variable for a literal with a fixed or variable language or datatype?
>   - (probably other issues)

Yes, there will be many of these kinds of things. My general 'approach' is to say that we make the simple query language simple, but allow for more precision for those who need it.

Also, I haven't had time yet to add the datatype stuff, but I would suggest that basic datatypes like numbers and dates are converted to native types automatically.

> Note that you use "?summary" as a pattern in general but that is incorrect: what if I have a
> fixed _literal_ whose value is "?summary"?

Again, I'd argue for making the most common use-cases simple. So we _could_ go the route that says strings and patterns must be distinct, e.g.:

  { "?p", "?o" }

versus:

  { "?p", "'?o'" }

(Note the apostrophes.) However, now every string must be escaped.

Alternatively, why not just say that on the odd occasion that your string begins with a question-mark, use a preceding slash, or a double-question mark, or some other escape character.

Alternatively, we can have a configuration option:

  document.data.setParam( { "queryPrefix": "$" } );

Anyway...there are lots of ways we can tackle this.

> Answering all those questions leads, in my view, to an API for SPARQL. Ok, we may not
> define the OPTIONAL from a SPARQL pattern...

Why not? I hadn't thought of it before, but why not allow other characters to occupy the '?' position? So we might have:

  {
    a: "http://rdf.data-vocabulary.org/#Event",
    "http://rdf.data-vocabulary.org/#summary": "?summary",
    "http://rdf.data-vocabulary.org/#startDate": "?start",
    "http://rdf.data-vocabulary.org/#endDate": "*end"
  }

And '*end' would be optional.
I'm not that keen on it...but my point is that we're not constrained by anything in particular, other than coming up with an easy syntax.

> ... we may not have references to graphs, etc,...

I have implemented a prototype that uses named graphs and 'mappers' which control what to do with certain patterns, and it makes XMLHttpRequest programming *much* easier! So in the longer term I don't see why we wouldn't support this, if we can come up with some syntax.

> ...so it may be a simplified API, but it is certainly more complex, at least in my view, than
> what you outline. Which also means that it will require more work and the interface will be
> more complicated.

I strongly disagree.

First, on SPARQL: my take is that yes, I am trying to recreate SPARQL, but only in the sense that it has a good model to follow, and the WG behind it have thought through a lot of issues. But not in the sense that I want to recreate all parts of the language.

For example, if we ignore the method used (i.e., Query.select()) my queries look like this:

  {
    a: "http://rdf.data-vocabulary.org/#Event",
    "http://rdf.data-vocabulary.org/#summary": "?summary",
    "http://rdf.data-vocabulary.org/#startDate": "?start",
    "http://rdf.data-vocabulary.org/#endDate": "?end"
  }

Yes, there is a hat-tip to SPARQL, but note that in SPARQL terms this is only the 'where' clause; I'm placing ?summary, ?start and ?end in the result-set without the author having to express them. In SPARQL terms I'm saying that this is my template:

  SELECT * WHERE {
    ... our stuff goes here ...
  }

rather than the author laboriously having to do this:

  SELECT ?summary ?start ?end WHERE {
    ... our stuff goes here ...
  }

(Of course, authors should be allowed to do this in the RDFa API if they need to, but I see that as much less common than the scenario where you use all variables, so I don't see the point in making that the starting-point.)

But the key difference in the model I'm proposing is that the result of a query is an *object*.
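
A rough sketch of that idea, under stated assumptions: the select() function, the flat array of { s, p, o } triples, and the sample event are all invented for illustration and are not the proposed Query interface. The sketch treats the pattern as the 'where' clause, treats any value starting with "?" as a variable, expands "a" to rdf:type, and (as a simplification) binds only the first value found for each property.

```javascript
// Hypothetical sketch: select(), the triple shape and the sample data
// are assumptions made for this example, not part of the proposal.
var RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

function select(triples, pattern) {
  // Group triples by subject: { subject: { predicate: [objects] } }
  var subjects = {};
  triples.forEach(function (t) {
    var props = (subjects[t.s] = subjects[t.s] || {});
    (props[t.p] = props[t.p] || []).push(t.o);
  });

  var results = [];
  Object.keys(subjects).forEach(function (s) {
    var props = subjects[s];
    var row = {};
    var matched = Object.keys(pattern).every(function (key) {
      var predicate = key === "a" ? RDF_TYPE : key;
      var wanted = pattern[key];
      var values = props[predicate];
      if (!values) { return false; }
      if (wanted.charAt(0) === "?") {
        row[wanted.slice(1)] = values[0]; // simplification: first value only
        return true;
      }
      return values.indexOf(wanted) !== -1; // fixed value must be present
    });
    if (matched) { results.push(row); }
  });
  return results;
}

// Made-up sample data.
var V = "http://rdf.data-vocabulary.org/#";
var triples = [
  { s: "#e1", p: RDF_TYPE, o: V + "Event" },
  { s: "#e1", p: V + "summary", o: "WG telecon" },
  { s: "#e1", p: V + "startDate", o: "2010-05-06" },
  { s: "#p1", p: RDF_TYPE, o: V + "Person" }
];

var pattern = { a: V + "Event" };
pattern[V + "summary"] = "?summary";
pattern[V + "startDate"] = "?start";

var events = select(triples, pattern);
```

Here events holds a single object with summary and start properties, so the author can write events[0].summary directly instead of unpacking a row.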
In SPARQL we still think in terms of rows, a la SQL. But the model that web programmers need is that each row is actually an object, ready to be used.

> (Actually... I have been there. Some years ago I did develop a core SPARQL engine for RDFLib
> but I was lazy to add a parser to it, so I did it by defining some sort of an API. There is an old
> description that I have just put on the Web[1]. Full disclosure: I made the same mistake at first
> by overloading the string with a "?xx" for a variable until TimBL rubbed my nose into this:-))

I don't believe it's a mistake, at least not in the sense of it being something I hadn't thought about.

Of course, everyone might argue that strings that begin with a question mark are so common that we need to differentiate between a pattern and a string, right from the start -- but for now I'll wait for those use-cases to emerge.

> 3. All that being said, such a SPARQL API may good to have at some point. But... I fear it is
> way too complicated for those end users who do not really have a feel for RDF triples. So the
> question is where do we find the sweet spot of what is useful and still palatable? I am not sure
> I have the answer, to be sure. But I am afraid of the comparison between the microdata API
> (that is, on the surface, very simple, though some of the complications are hidden in the details)
> and what we are heading for...

Forgive me if I say that you are tilting at windmills here. :)

Your argument goes as follows:

* my current query syntax doesn't take into account lots of RDF-stuff, like literals v. resources, etc.;
* in order to take those things into account, we need full-blown SPARQL;
* but full-blown SPARQL is too complicated for the average user;
* our API is therefore too complicated.

I would just suggest that we go back a few steps, and establish the foundation which is that whilst SPARQL has many ideas that we need, it is essentially a general-purpose language that can be used in many situations, and we have a constrained environment.

Also, our target audience has a particular mindset when it comes to programming, so we should be aiming to recreate the features from SPARQL that we need for this audience and environment, and only provide those extra features for the more advanced programmer.

> I said 'may' be good to have, because an alternative is to go the PHP-SQL way, which is simply to
> be able to use a full SPARQL query string as an argument and that is it. The implementation may
> be more complicated, because you have to have a proper parser, but that may be cleaner and
> certainly with less headache.

I'm sure someone will write a plug-in for SPARQL that supports the Query interface, and so works with the Parsers and Stores in our API. But for our audience I think we can do a lot better than that.

> 'See' you this afternoon!

Yes, looking forward to it! And thanks again for the comments.

Regards,

Mark

--
Mark Birbeck, webBackplane

mark.birbeck@webBackplane.com

http://webBackplane.com/mark-birbeck

webBackplane is a trading name of Backplane Ltd. (company number 05972288, registered office: 2nd Floor, 69/85 Tabernacle Street, London, EC2A 4RR)

Received on Thursday, 6 May 2010 11:18:46 UTC