Re: An RDFa API Diagram and some questions... from Ivan Herman on 2010-06-02 (public-rdfa-wg@w3.org from June 2010)

From: Ivan Herman <ivan@w3.org>
Date: Wed, 2 Jun 2010 09:36:32 +0200
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: W3C RDFa WG <public-rdfa-wg@w3.org>
Message-Id: <572EEB2D-5C8C-457F-AD81-2C22AE6A8DBE@w3.org>
On Jun 2, 2010, at 04:37 , Manu Sporny wrote:

> On 06/01/2010 09:54 AM, Ivan Herman wrote:
>> So, I spent some time today trying to get my head around the mutual
>> dependencies of the current interfaces. I am of a visual type, so I
>> created a diagram. I attach this diagram; I am _not_ saying that this
>> has to be put into the document, but it certainly helped me to grasp
>> the dependencies and maybe it is useful for others.
> 
> I think it would be useful if we put this diagram (or something very
> close to it) into the RDFa DOM API document. Whether we do this before
> FPWD or after isn't that important (to me, at least)... just that we put
> the diagram in there. It really does help show which interfaces are related.

You know, no good deed goes unpunished:-) I am happy to keep this diagram updated for the document now and in future. If it is useful for others, all the better!

> 
>> Looking through the structure I also discovered two issues/questions.
>> 
>> - The DataIterator interface is fairly 'isolated' on the diagram; at
>> the moment it is used only by the Parser (unless I missed something).
> 
> Nope. It's only used by the DataParser.
> 
>> First of all, I am not sure what the role of the 'iterator' method is
>> on the parser interface (I have asked that before).
> 
> Being pedantic - it's the .iterate() method, not the .iterat*or*()

Sorry:-)

> method. Benjamin does a good job of clarifying what this method is meant
> to do in a reply to your original e-mail.
> 
> Basically - it's a less memory-intensive way to extract triples from the
> page.

Right. But my unease is that it, sort of, bypasses the whole mechanism we have in place. 

The current model is based on the Document->Store chain: a parser parses the source into the store, and then we dig into the store for triples via various methods (query, etc). This is clean. But, if my understanding is correct, the iterate() method operates on the original source and gives back RDF triples. I.e., it completely bypasses the Document->Store chain. Put another way, we conceptually have _two_ completely orthogonal mechanisms within the same API to access the same triples! This bothers me.
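To make the contrast concrete, here is a minimal TypeScript sketch of the two access paths. Only the names DataStore, DataParser, DataIterator, iterate() and next() come from this thread; the Triple and SourceNode shapes, the one-triple-per-node source model, parse(), all() and every method body are hypothetical simplifications, not the draft's IDL.

```typescript
// Hypothetical, simplified shapes; NOT the RDFa DOM API draft's IDL.
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

// Stand-in for an annotated DOM node (assumption: one triple per node).
interface SourceNode {
  about: string;
  property: string;
  content: string;
}

function toTriple(n: SourceNode): Triple {
  return { subject: n.about, predicate: n.property, object: n.content };
}

class DataStore {
  private triples: Triple[] = [];
  add(t: Triple): void {
    this.triples.push(t);
  }
  size(): number {
    return this.triples.length;
  }
  // Hypothetical accessor returning a copy of everything in the store.
  all(): Triple[] {
    return this.triples.slice();
  }
}

interface DataIterator {
  next(): Triple | null;
}

class DataParser {
  private source: SourceNode[];
  constructor(source: SourceNode[]) {
    this.source = source;
  }

  // Path 1 (Document -> Store): every triple is materialized in a store,
  // and callers then query the store.
  parse(store: DataStore): void {
    for (const n of this.source) store.add(toTriple(n));
  }

  // Path 2 (iterate()): walks the source directly and hands back one
  // triple per next() call; no store is involved at all.
  iterate(): DataIterator {
    let i = 0;
    const src = this.source;
    return {
      next: () => (i < src.length ? toTriple(src[i++]) : null),
    };
  }
}
```

Both paths expose exactly the same triples; the only difference is whether they are all held in memory at once, which is the duplication being questioned here.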

The claim is that there might be implementation environments where some special mechanism is needed to make things faster. If the DataIterator is a way to hide this from the user, then (as I propose below) it may be helpful to use it as the return value of the filter operation, which could let implementations optimize their internals (I am not sure, actually, whether that works; I would have to think about it more).

All that being said: do we really have to optimize? In the vast majority of cases the RDF graph extracted from a file is small. Have we ever seen an RDFa file yielding more than ca. 100 triples? OK, maybe 200? Compare the computing/storage needs of this with, say, Flash content running on a phone! I.e., I wonder whether an optimization at that level is more than an academic issue (sorry Benjamin, no offense intended!)

> 
>> Also, looking at
>> other interfaces I realized that the DataStore.filter method returns
>> a DataStore. Wouldn't it be more logical if filter returned an
>> iterator instead (my understanding is that the DataIterator.next()
>> can be used to get to the next triple)?
> 
> Perhaps... the rationale was that .filter() can transform one DataStore
> into another (more specific) one. .iterate() just runs through a
> document - no dependency on having a data store of any kind.
> 

Exactly; and see my comment above.

> You could have .filter() return an iterator, but it would be an iterator
> over a datastore - no memory would be saved since all triples are coming
> from a pre-built data store.

True. See my remark above: do we need this?
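The trade-off Manu describes can be sketched in TypeScript as follows. The shapes are hypothetical, and filterIterator() is an invented name for the alternative, not something in the draft. The store-returning filter() copies the matching triples into a new store, while the iterator variant scans lazily; either way the full store is already in memory, so the only saving is the copy of the matches.

```typescript
// Hypothetical shapes for illustration only.
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

interface DataIterator {
  next(): Triple | null;
}

class DataStore {
  private triples: Triple[];
  constructor(triples: Triple[] = []) {
    this.triples = triples;
  }
  size(): number {
    return this.triples.length;
  }

  // Current draft behaviour: filter() yields another, more specific store
  // holding the matching triples.
  filter(match: (t: Triple) => boolean): DataStore {
    return new DataStore(this.triples.filter(match));
  }

  // Hypothetical alternative: a lazy iterator over the matches. No second
  // store is built, but this.triples is still fully in memory.
  filterIterator(match: (t: Triple) => boolean): DataIterator {
    let i = 0;
    const ts = this.triples;
    return {
      next: () => {
        while (i < ts.length) {
          const t = ts[i++];
          if (match(t)) return t;
        }
        return null;
      },
    };
  }
}
```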


> 
> That said, I can understand your viewpoint...
> 
>> - I wonder about the rdfa interface (section 5.3.4). While there is a
>> fairly logical relationship now with
>> 
>>    Document->DocumentData->{store,context, etc}
> 
> That section is marked for deletion... we don't need .getElements() at
> all. I'm not certain we need .containsRDFa() - but Benjamin had a few
> use cases for that particular item. I haven't thought about where to
> move it yet, but the general desire is that that "rdfa." interface
> should disappear completely. We just need to decide on whether or not we
> should remove .containsRDFa() completely, or keep it around. I'd like to
> remove it - you don't need to check to see if there is/isn't RDFa. You
> do your query and if there is no data, there is no RDFa and thus you do
> nothing in your JavaScript.

First of all, I think this is something we should decide before the FPWD. Also, if we decide to keep containsRDFa(), then in my view it can go to the Document interface, just as we have some other convenience methods there.

Whether we need it... At the moment there is no compulsory mechanism to signal the presence of RDFa in an XHTML file. Put another way, conceptually all XHTML files are also RDFa files, because the DTD, @version, etc. are all optional. And, actually... 99% of XHTML files _will_ have RDFa content in them, because as soon as they use a CSS style sheet, the corresponding <link> element will generate a triple with the default 'stylesheet' @rel term value.

All of which means that I agree with you: containsRDFa() is neither necessary nor useful. As I said, in 99% of the cases it will return 'true'.
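To illustrate why nearly every XHTML page ends up with at least one triple, here is a hedged TypeScript sketch of how an RDFa 1.0 processor treats @rel on a <link> in the <head>: reserved keywords such as 'stylesheet' are expanded against the XHTML vocabulary namespace. triplesFromLink() is a hypothetical helper; only a handful of the reserved keywords are listed, CURIEs are ignored, and the href is assumed to be already absolute.

```typescript
// Namespace used by RDFa 1.0 for the reserved XHTML @rel/@rev keywords.
const XHV = "http://www.w3.org/1999/xhtml/vocab#";

// A small subset of the reserved keywords, for illustration only.
const RESERVED_REL: string[] = ["alternate", "license", "next", "prev", "stylesheet"];

interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

// Hypothetical helper: triples produced by <link rel="..." href="..."> in the
// document <head>, whose subject is the document itself. Only bare reserved
// keywords are handled; CURIEs and other tokens are skipped here.
function triplesFromLink(documentUri: string, rel: string, href: string): Triple[] {
  const out: Triple[] = [];
  for (const token of rel.trim().split(/\s+/)) {
    if (RESERVED_REL.indexOf(token) !== -1) {
      out.push({ subject: documentUri, predicate: XHV + token, object: href });
    }
  }
  return out;
}
```

So a page whose only relevant markup is a style sheet link still yields a non-empty graph, which is why a presence check would almost always answer 'true'.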

Alternatively, it may be useful to have a size() method on a store, returning the number of triples. Then

containsRDFa() <=> size() > 0
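A minimal TypeScript sketch of the size()-based alternative, combined with Manu's query-first pattern. The DataStore shape, its size() and query() bodies, and the free function containsRDFa() are hypothetical simplifications; the Dublin Core title property is used only as an example query. Under this definition, a page containing just a style sheet link counts as "containing RDFa", while the query a script actually cares about still comes back empty.

```typescript
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}

// Hypothetical minimal store.
class DataStore {
  private triples: Triple[];
  constructor(triples: Triple[] = []) {
    this.triples = triples;
  }
  size(): number {
    return this.triples.length;
  }
  query(predicate: string): Triple[] {
    return this.triples.filter((t) => t.predicate === predicate);
  }
}

// containsRDFa() expressed purely through size(): true iff the store
// holds at least one triple.
function containsRDFa(store: DataStore): boolean {
  return store.size() > 0;
}

const DC_TITLE = "http://purl.org/dc/terms/title";

// Query-first pattern: no separate presence check; an empty result simply
// means there is nothing for the script to do.
function pageTitles(store: DataStore): string[] {
  return store.query(DC_TITLE).map((t) => t.object);
}
```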

Ivan


> 
> -- manu
> 
> -- 
> Manu Sporny (skype: msporny, twitter: manusporny)
> President/CEO - Digital Bazaar, Inc.
> blog: Bitmunk 3.2.2 - Good Relations and Ditching Apache+PHP
> http://blog.digitalbazaar.com/2010/05/06/bitmunk-3-2-2/2/
> 


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Wednesday, 2 June 2010 07:35:29 UTC