Re: New Editors draft of RDFa API spec

Hey Manu,

again, sorry about the noise, I did not catch this mail...

I removed everything from the answer that does not require further comment and where your changes are fine with me!

On Sep 13, 2010, at 05:36 , Manu Sporny wrote:
[snip]

> 
>> B.t.w., There is a missing conformance requirement section. The
>> geolocation document seems to contain a good one, also referring to
>> the binding issues on webidl (ie, that the ecmascript binding to
>> webidl must be used).
>> 
>> That also affects a bit the last paragraph in section 2. The WebIDL
>> document includes a binding to Java, so it might be good to refer to
>> the fact that Java implementations should use that one. Nothing can
>> be said about the other languages, though...
> 
> Fixed. I added a conformance section to 4.1. This is fairly late in the
> document, but every section up to that point is non-normative and
> doesn't use MUST, SHOULD, etc. We can always move the conformance
> section to the top of the document if people don't like how this reads.
> I also added the Java clause as well as a clause stating that a best
> effort should be performed for languages other than ECMAScript and Java.

Fine with Java for now, but that may be part of a slightly different discussion on how to handle the RDFa-specific API parts versus the generic RDF API issue (Sandro's comments). One approach would be to really emphasize the ECMAScript binding only, so that we do not end up on a collision course with all the RDF toolkit implementations that already exist in Java (let alone other languages).

[snip]

> 
>> ---- 2.1. Goals: although there should be a general discussion on
>> this, it may be worth emphasizing that not only the API allows for
>> non-RDFa parsers to be used, but the interface offers some sort of a
>> generic API to RDF...
> 
> Fixed. I added a sub-section called "A Modular RDF API" to try and
> clarify this a bit more.

Actually... in view of the discussion we had last week on the call, maybe it is better not to have that there for now. It was not an original design goal in the first place, it just happened that way. It definitely was not part of the charter, for example... Sorry to have led you to this!

[snip]

> 
>> ---- 2.2 Concept diagram: I am not sure how, but it might be good to
>> have on the diagram and the accompanying text, references to some of
>> the 'sections' of the document. We use, for example, the term 'RDF
>> Interfaces' in the text; maybe using the same term on the diagram
>> would be good (if the diagram is in SVG, it should be a clickable
>> link to the relevant section...). Same for the others and the text
>> itself.
> 
> I agree with you in principle - things start to fall apart after that...
> 
> I tried SVG without all of the extra non-W3C shim code required to make
> SVG work cross-browser. I tried to make native SVG work for 4 hours
> straight one day... couldn't get it to work across all browsers - sizing
> issues. I gave up. The source document is in SVG if someone would like
> to give it a shot.

What I did in the past is to use <object> with a fallback to a PNG version. Didn't that work?

There might be something in the SVG file itself: many SVG generators do a very stupid thing by putting an explicit width and height on the root element instead of a viewBox. That shuts down the 'scaling' aspect of SVG; one has to remove those attributes to control the size...

(Whenever I generate SVG from Adobe Illustrator, I have to change that manually.)

[snip]

> 
>> I must admit I had to look up what this foaf:myersBriggs property
>> means. Can't we use a somewhat less esoteric example?
> 
> Could you provide a suggestion? It took me 2 hours to come up with and
> implement that example for an advanced query :). I don't want to
> implement something else unless we have some kind of general agreement
> that the example is not esoteric. That and my brain hurts right now...
> help? :)
> 

What about using the 'title' property, say because I want to hire people with a 'Dr' title? It would be a very similar query, just less esoteric...
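Something along these lines, perhaps (only a sketch; the triple accessors 'property' and 'object' and the shape of the filter callback are my assumptions about the draft, and foaf:title is simply the vocabulary term I have in mind):

  function isDrTitle(triple) {
    // keep only the triples saying somebody's foaf:title is "Dr"
    return triple.property == "http://xmlns.com/foaf/0.1/title" &&
           triple.object.value == "Dr";
  }
  var doctors = document.data.store.filter(isDrTitle);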


>> ---- PlainLiteral definition. Why does one need a stringifier for a
>> value and not for the language attribute? Aren't both strings in the
>> first place?
> 
> 'stringifier' tells the language which value or method should be used if
> the object is converted into a DOMString. In the PlainLiteral's case, if
> one were to convert the PlainLiteral into a string, the value attribute
> would be used to generate the string while the language attribute would
> be ignored.
> 

Ah! O.k., I get it. And...

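(Just to check my own understanding: suppose 'name' is a PlainLiteral with value "Ivan" and language "en"; this is a made-up example, nothing from the spec. Then

  var greeting = "Hello, " + name;   // the stringifier converts via 'value'

gives "Hello, Ivan", and the language tag is simply dropped; one has to read name.language explicitly to get "en".)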
[snip]
> 
>> (Or is it
>> defined in WebIDL in general and I just do not know it? Maybe worth
>> emphasizing for outsiders like me...)
> 
> Understanding WebIDL is a pre-requisite of reading the spec. I know this
> is somewhat esoteric and is deep WebIDL magic, so I've tried to
> elaborate the RDF Node interface's value attribute to make it more
> clear. Let me know if you think this is good enough.

... I see that one, too! Now it has become fool-proof:-)

[snip]
> 
>> ---- TypedLiteralConverter interface: I do not understand what the
>> targetType parameter is for. Either give a good (and convincing:-)
>> example, or drop it if it has only a very restricted use...
> 
> Added an example and another method to the DataContext to aid
> TypedLiteral conversion. We can't remove this because Mark has a plan for
> stating targetTypes for TypedLiteral converters. I don't fully
> understand his plan, so the interface is a bit shoddy as I don't know
> exactly how Mark wants to see it implemented. It's a bit clunky right
> now... perhaps Mark has some insight into how we could make it cleaner.


The example with 'caps' is actually not a 'targetType'; maybe the real issue is a misnomer here. In this context I would expect types such as xsd:integer, xsd:boolean, etc., whereas the targetType in the example is some sort of a modifier or an extra attribute.

I have the feeling we are pushing functionality into the RDFa API interface that is not really necessary and could be handled at the application level. The example simply produces an all-caps version of certain strings; that is hardly something I would expect at this level, to be honest!

At the minimum, an editorial note should be added to the interface saying that this is still to be discussed. I am not yet ready to agree to having it in the interface...
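To show what I mean by a type-level conversion, here is a sketch (the registration method name and where it lives are my invention, not the draft's):

  context.registerTypedLiteralConverter(
    "http://www.w3.org/2001/XMLSchema#integer",
    function (value) {
      return parseInt(value, 10);   // "42"^^xsd:integer -> the number 42
    }
  );

Something like the 'caps' behaviour, on the other hand, belongs in application code.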

[snip]

> 
>> I continue to be puzzled by the filter method, ie, by the fact that
>> it returns another DataStore, rather than just an array of
>> RDFTriple-s. I just do not get it... The PropertyGroup, for example,
>> returns a 'Sequence' argument, ie, it is possible to just return an
>> array. This should be discussed.
> 
> We do this to support filter chaining, so you can do stuff like:
> 
> var abcStore =
> document.data.store.filter(FILTER_A).filter(FILTER_B).filter(FILTER_C);
> 
> Keep in mind that you can only filter one subject, one property or one
> object at a time. You may have an RDFTripleFilter function for each
> FILTER_* that does things like "count but pass" triples. So FILTER_A and
> FILTER_B could analyze each stage of the DataStore, but then FILTER_C
> does the actual filtering based on data collected by FILTER_A and FILTER_B.
> 
> This is a /very/ advanced use case, but it allows very complex queries
> to be done in fairly compact code.
> 
> We could achieve the same result by returning Sequences from the
> DataStore.filter() method, but if we take that route, we have to make it
> easy to construct a new DataStore... and the code is much more
> verbose/bloated.

Sorry Manu, still not convinced. I do understand the argument but, as you say yourself, this is a /very/ advanced use case. Do we really want to define an API that makes very advanced use cases easier while making the simple use cases less natural and more complicated to implement, rather than the other way around? My impression is that the current approach chooses the former, and I wonder whether the right approach would not be the latter. Besides, we also have a query interface that can take care of many of these things!

What I mean is: I have not implemented a data store myself, but I have had a glimpse into what, say, RDFLib does for what it calls a graph. It _is_ more complicated than just an array of triples; because it has to be prepared to check for the presence of a triple, its internal storage is more than a simple array. I can imagine it keeps several dictionaries for the triples, depending on whether you look them up via predicates, subjects, etc. In other words, it is relatively heavy machinery. What we have here means that even for the simple cases the implementation has to use this heavy machinery when a plain array would do. I therefore think we are optimizing in the wrong place.

If an advanced user wants to use a datastore, he/she can create one. Maybe we could extend the DataStore interface with an 'addTriples' method that takes a whole array and adds all the triples in that array in one call. If we did that, the output of the filter method could be fed directly into a new datastore and the next filter applied if necessary (see the sketch below). Yes, it is a little more complex, but only for the advanced examples...
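Roughly like this (only a sketch; 'createStore' and 'addTriples' are hypothetical names, and I assume filter() would return a plain array of RDFTriples in this model):

  var triples = document.data.store.filter(FILTER_A);   // plain array of RDFTriples

  var stage = document.data.createStore();               // the advanced user builds a store,
  stage.addTriples(triples);                              // fills it in a single call,
  var result = stage.filter(FILTER_B);                    // and keeps chaining if needed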

I think I would like to see an editorial note saying that this is still under discussion (and maybe an explicit issue?)


> 
>> I am not fully convinced about the necessity of having the 'forEach'
>> method. Sure, I can see its utility, but its functionality can easily
>> be programmed by a cycle through the triples of the store and it
>> seems to add too much to the Data store interface. I would consider
>> removing it altogether, including the DataStoreIterator interface.
> 
> Sure, we could remove forEach... but given the two choices - procedural
> iteration through the DataStore, or a functional iteration through the
> DataStore, I would personally pick the functional one more times than
> the procedural one. Doing functional programming in Javascript happens
> more naturally than in Python or many of the other functional-supporting
> languages.
> 
> It's very difficult to explain this, but when I started out using
> Javascript, I tended towards using procedural programming and it was
> always very awkward. For some reason, programming in a more functional
> way in Javascript ends up not biting you as much as programming in it
> procedurally... and after a while, you start to enjoy using Javascript's
> more functional aspects more often than the procedural ones. Our entire
> engineering team went through this transition - hating it at first and
> now it's something that is integral to the way we develop Javascript code.
> 
> So, while it's good to simplify... I'd be bothered by removing it at
> this point in time... perhaps we should discuss this more as I don't
> necessarily think the explanation I give above is good enough to be used
> as the reason we have the forEach interface.

Ok, let us discuss that. Editorial note?
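For the record, the contrast as I understand it (a sketch only; 'store.triples' as a plain array is my own assumption for the procedural variant, and the callback signature is a guess):

  store.forEach(function (triple) {                   // functional style
    console.log(triple.subject.value);
  });

  for (var i = 0; i < store.triples.length; i++) {    // procedural style
    console.log(store.triples[i].subject.value);
  }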

> 
>> ---- Data Parser interface.
>> 
>> The current parser is defined for a DOM-like parser, eRDF, Microdata,
>> whatever. But I would like to be able to have a turtle parser that
>> takes a URI as an argument, rather than an Element. Would that be
>> possible to do in WebIDL? In any case, it would really be good to
>> have that extension point to any type of parsing...
> 
> This is ISSUE-44:
> 
> http://www.w3.org/2010/02/rdfa/track/issues/44
> 
> We have some ideas for making this happen... dangerous ideas that are
> bound to scare people. :)

Ok, let us leave that open for now.

> 
>> I am not sure what the role of the store is for a DataIterator. The
>> way I understand it:
>> 
>> - parse puts all the triples into the Store and then one uses the
>> DataStore interface
>> - iterator just gives you the triples one after the other. The user
>> 'may' decide to add them to a store, of course, but that is outside
>> the realm of the iterator, isn't it?
> 
> Yes, that's correct.
> 
>> If so, I actually wonder whether the DataParser and the DataIterators
>> are not two completely different interfaces, for different usages and
>> it may be better to separate them altogether.
> 
> Perhaps... the division is fairly awkward at present. The idea is that
> the DataParser has two modes of operation - read-and-store and
> stream-and-discard:
> 
> parse() -> process the document and store every triple (read-and-store)
> iterate() -> stream triples as they are found (stream-and-discard)
> 
> The first requires quite a bit of memory, the second is far more memory
> efficient. Think desktop vs. smartphone.

We had this issue before, and you convinced me of the necessity of having something like iterate; I do not dispute that. But having those 'two modes of operation' in one interface seems very convoluted to me.

As I see it, the iterate approach constitutes a completely separate mode of operation and programming. One has iterate, the basic RDF interfaces like triples, and... that is it? Nothing of the store and the related machinery, no property groups, nada. Right? Well then, let us make that explicit, maybe even as a completely separate top-level section. What is there right now is utterly confusing to me :-(
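Just to spell out the two modes as I understand them (the parameter lists are guesses; only the method names parse() and iterate() come from the draft):

  // read-and-store: everything ends up in a store, and the rest of the API
  // (queries, property groups, etc.) operates on that
  parser.parse(document.documentElement);

  // stream-and-discard: triples are handed to a callback and never stored
  parser.iterate(document.documentElement, function (triple) {
    // look at the triple, then let it go
  });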

[snip]

> 
>> ---- Property group interface.
>> 
>> Editorial issue: the property group template section comes a bit out
>> of the blue, because the query is defined later. I would expect this
>> section to be moved down to the definition of Data Query...
> 
> Unfortunately, if we move it down there, people may not understand that
> Property Groups are meant to be language-native containers for Linked
> Data. Perhaps we need a better introduction to that section so it
> doesn't come from out of the blue? Would that address your concern, Ivan?

I guess... I would have to see what you mean...

[snip]

> 
>> ---- That may be a stylistic issue: isn't it more logical to have the
>> getItemsBy*** methods defined on the DocumentData interface rather
>> than the RDFaDocument? 
>> After all, those can be considered as
>> shorthands for specific query methods. I may also move the
>> getElementsBy* methods there for symmetry, though they are closer to
>> the 'usual' DOM methods.
> 
> The reason those methods are on RDFaDocument is because RDFaDocument is
> a supplemental interface to DOM Document. In other words, we expect
> anybody that implements the RDFa API in a DOM environment to implement
> those interfaces on the DOM Document object... this is so people can do
> stuff like this:
> 
> document.getItemsByType(...);
> 
> which is supposed to parallel calls like this:
> 
> document.getElementById(...);
> 
> You could say it's stylistic... we could move all the document data
> calls to document.data, or we could get rid of those calls entirely.
> IIRC, Mark felt strongly about this and I tend to agree with him, but
> feel less strongly about it. I don't know how Benjamin feels about these
> interfaces being on Document vs. DocumentData.
> 

I did _not_ question the necessity of having those methods somewhere...
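The purely stylistic alternative I had in mind (illustrative only; foaf:Person is just an example argument):

  document.getItemsByType("http://xmlns.com/foaf/0.1/Person");        // current draft, on RDFaDocument
  document.data.getItemsByType("http://xmlns.com/foaf/0.1/Person");   // on DocumentData instead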

> I've left them for now until we get more feedback.

B.t.w., the tabulation of the RDFaDocument interface seems to have problems...

Ivan


> 
> Thanks for the thorough review, Ivan - it really helped a bunch :)
> 
> I'll publish a new Heartbeat-ready Working Draft in a few minutes.
> 
> -- manu
> 
> -- 
> Manu Sporny (skype: msporny, twitter: manusporny)
> President/CEO - Digital Bazaar, Inc.
> blog: Saving Journalism - The PaySwarm Developer API
> http://digitalbazaar.com/2010/09/12/payswarm-api/
> 


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
