RDFa API proposed changes & IDL from Nathan on 2010-09-19 (public-rdfa-wg@w3.org from September 2010)

From: Nathan <nathan@webr3.org>
Date: Sun, 19 Sep 2010 23:33:36 +0100
To: Manu Sporny <msporny@digitalbazaar.com>
CC: RDFA Working Group <public-rdfa-wg@w3.org>
Message-ID: <4C968FC0.6070100@webr3.org>
Hi Manu,

As per earlier notes and voice conversations, here are the changes I'm 
proposing along with full notes and reasons for each.

interface DataContext {
     void setMapping (in DOMString prefix, in DOMString iri);
     TypedLiteralConverter registerTypeConversion (in DOMString iri, in 
TypedLiteralConverter converter);
     IRI  resolveCurie (in DOMString curie);
     any  convertType (in DOMString value, in optional DOMString inputType);
     DataParser registerParser( in DOMString name, in DataParser parser );
     DataParser getParser( in DOMString name );
};

[NoInterfaceObject, Callback]
interface DataParser {
     boolean parse (in any toparse, in DataStore store);
};

[NoInterfaceObject, Callback]
interface TypedLiteralConverter {
     any convert (in DOMString value, in optional IRI inputType);
};

notes:
removed "modifier" from convertType and TypedLiteralConverter.convert
changed return of registerTypeConversion from void to TypedLiteralConverter.
added `registerParser` method to DataContext
added `getParser` method to DataContext
removed DataParser.iterate
removed `domElement` argument from DataParser.parse
added `any toparse` argument to DataParser.parse
added `store` argument to DataParser.parse

TypedLiteralConverter usage:
function myStringIntConverter(value,inputType) {
   // implementation which returns a string representation
   // of an int rather than a native int
}
var context = document.data.context;
// get the native converter and register a custom one
var nativeConverter = context.registerTypeConversion('xsd:int', 
myStringIntConverter);
// work with data
context.registerTypeConversion('xsd:int', nativeConverter); //return to native mode

The above should handle any use case which the modifier parameter could 
handle.
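
To make the swap-and-restore pattern concrete, here is a minimal sketch 
that runs outside a browser. SketchConverterContext is a hypothetical 
stand-in for DataContext (it is not part of the API) and it assumes that 
registerTypeConversion returns the converter it replaced, or null if 
there was none. The stand-in "native" converter is likewise only an 
assumption for illustration.

```javascript
// Hypothetical stand-in for DataContext; only the converter registry is modelled.
function SketchConverterContext() {
  this.converters = Object.create(null); // datatype IRI -> converter function
}

// Registers a converter and returns the one it replaced (null if none).
// Returning the previous converter is an assumption based on the usage above.
SketchConverterContext.prototype.registerTypeConversion = function (iri, converter) {
  var previous = (iri in this.converters) ? this.converters[iri] : null;
  this.converters[iri] = converter;
  return previous;
};

// Converts a lexical value with the converter registered for inputType,
// falling back to the unconverted string when none is registered.
SketchConverterContext.prototype.convertType = function (value, inputType) {
  var converter = this.converters[inputType];
  return converter ? converter(value, inputType) : value;
};

var XSD_INT = 'http://www.w3.org/2001/XMLSchema#int';
var context = new SketchConverterContext();

// stand-in for a natively implemented xsd:int converter
context.registerTypeConversion(XSD_INT, function (value) {
  return parseInt(value, 10);
});

// custom converter which returns a string representation of the int
function myStringIntConverter(value, inputType) {
  return String(parseInt(value, 10));
}

// get the native converter back while registering the custom one
var nativeConverter = context.registerTypeConversion(XSD_INT, myStringIntConverter);
var asString = context.convertType('042', XSD_INT);

// return to native mode
context.registerTypeConversion(XSD_INT, nativeConverter);
var asNumber = context.convertType('042', XSD_INT);
```

With this shape the caller never needs a modifier argument: whatever the 
modifier would have selected can be expressed as a different converter.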

Parser changes:
Formerly there was no way to register new or custom parsers in the API, 
meaning usage was limited to only those parsers natively implemented.
The iterate method made no sense on DataParser and was of no concern to 
the parser.
DataParser's single parameter was incompatible with almost everything 
because it was typed as Element (Document extends Node, not Element), so 
it could not be used to parse a document. It also could not be used to 
implement parsers for any other type of (non-DOM) document in the 
future. If the DOM constraint is needed, it could be changed to accept 
`Node node`, which would be compatible with both Element and Document.
DataParser usage required instantiation from DocumentData with no clear 
way to select which DataStore you wanted to parse into. Removing the 
`store` attribute and adding a `store` parameter to the `parse` method 
allows clear, easy reuse of a parser with user-friendly store selection.
see DocumentData for additional proxy method.
Optionally the Callback extended attribute could be removed if this 
caused problems in different languages.
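
As a rough illustration of how a custom parser could be registered and 
reused, here is a small sketch. SketchParserContext, lineParser and the 
one-triple-per-line input format are all hypothetical; the store is a 
bare object with an `add` method rather than a real DataStore, and the 
return value of registerParser (the parser that was replaced, or null) 
is an assumption made by analogy with registerTypeConversion.

```javascript
// Hypothetical stand-in for DataContext; only the parser registry is modelled.
function SketchParserContext() {
  this.parsers = Object.create(null); // parser name -> DataParser-like object
}

// Registers a parser under a name; returns the replaced parser or null (assumed).
SketchParserContext.prototype.registerParser = function (name, parser) {
  var previous = (name in this.parsers) ? this.parsers[name] : null;
  this.parsers[name] = parser;
  return previous;
};

// Returns the parser registered under the name, or null if there is none.
SketchParserContext.prototype.getParser = function (name) {
  return (name in this.parsers) ? this.parsers[name] : null;
};

// Hypothetical custom parser for a made-up format:
// one "subject property object" triple per line, separated by whitespace.
var lineParser = {
  parse: function (toparse, store) {
    if (typeof toparse !== 'string') { return false; }
    var lines = toparse.split('\n');
    for (var i = 0; i < lines.length; i++) {
      var parts = lines[i].trim().split(/\s+/);
      if (parts.length !== 3) { return false; } // malformed line: report failure
      store.add({ subject: parts[0], property: parts[1], object: parts[2] });
    }
    return true;
  }
};

// Bare-bones store exposing only `add`, enough for the parser above.
var targetStore = {
  triples: [],
  add: function (triple) { this.triples.push(triple); }
};

var parserContext = new SketchParserContext();
parserContext.registerParser('lines', lineParser);

// The caller chooses both the parser and the store to parse into.
var parsedOk = parserContext.getParser('lines')
  .parse('ex:a ex:knows ex:b\nex:b ex:knows ex:c', targetStore);
```

Because `toparse` is typed `any`, the same registry works for a string 
as here, or for a Document when a DOM is available.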


[NoInterfaceObject]
interface DataStore {
     readonly attribute unsigned long length;
     RDFTriple        get (in unsigned long index);
     void             add (in RDFTriple triple);
     DataStore        filter (in RDFTripleCallback filter);
     void             forEach (in RDFTripleCallback callback);
     void             import(in DataStore store);
     boolean          some(in RDFTripleCallback callback);
     boolean          every(in RDFTripleCallback callback);
     DataIterator     iterator();
};

[NoInterfaceObject, Callback, Null=Null]
interface RDFTripleCallback {
     boolean match (in RDFTriple triple, in unsigned long index, in 
DataStore store);
};

Notes:
changed `size` to `length` to align with ECMAScript and most languages 
(array.length)
removed `create**` methods (they don't belong here see DocumentData 
changes for reasons)
changed return type of `add` to void as it is impossible to ever return 
boolean False. (should add return unsigned long, the index?)
removed `getter` property of get method to remove indexed sequence 
functionality (see notes further down)
removed `pattern` from filter(); all functionality this could implement 
can be handled by DataQuery or by RDFTripleFilter. `pattern` allows for 
non-standardized, implementation-specific functionality to be introduced 
(in fact it forces this to happen), which is best left to libraries if 
they want to provide it (even though it isn't needed).
removed redundant `element` from filter - no need/use.
made `RDFTripleFilter filter` non optional
removed `clear` method, it isn't needed
changed `merge` method to `import` to make it clearer
changed return type of `import` (formerly `merge`) to void as it was 
impossible to ever return boolean False.
added quantification methods .some and .every
added `iterator` method; it could be named `iterate`, however I feel it 
is best to align with common naming conventions, given the meaning of 
iterator: "one which iterates".
removed RDFTripleFilter and replaced with RDFTripleCallback, 
functionality is the same and is used in .filter, .forEach .some and 
.every (for all callbacks), the method takes 3 params, the RDFTriple, 
the index and the store which holds the triple, also aligned to match 
ECMAScript v5.
removed DataStoreIterator and replaced with universal RDFTripleCallback, 
formerly DataStoreIterator accepted 4 params, index, subject, property, 
object however this meant that the triple could not be accessed (for 
instance to use the .toString() method) and if using any other method in 
the api such as DataStore.add() a new RDFTriple would have had to be 
created. Changing the method to accept RDFTriple as param one exposes 
more functionality, saves the user code, and aligns with the rest of the 
API.
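
To show how the single callback shape serves .filter, .forEach, .some 
and .every, here is a minimal in-memory sketch. SketchStore is a 
hypothetical stand-in for DataStore, triples are plain objects rather 
than real RDFTriple instances, callbacks are passed as plain functions, 
and no duplicate checking is attempted.

```javascript
// Hypothetical in-memory stand-in for DataStore.
function SketchStore() {
  this.triples = [];
  this.length = 0;
}

// Returns the triple at the index, or null when out of range.
SketchStore.prototype.get = function (index) {
  return (index >= 0 && index < this.triples.length) ? this.triples[index] : null;
};

SketchStore.prototype.add = function (triple) {
  this.triples.push(triple);
  this.length = this.triples.length;
};

// Every callback receives (triple, index, store), as in ECMAScript 5 arrays.
SketchStore.prototype.forEach = function (callback) {
  for (var i = 0; i < this.length; i++) {
    callback(this.triples[i], i, this);
  }
};

// Returns a new store holding only the triples the callback accepts.
SketchStore.prototype.filter = function (callback) {
  var result = new SketchStore();
  this.forEach(function (triple, index, store) {
    if (callback(triple, index, store)) { result.add(triple); }
  });
  return result;
};

// True if the callback accepts at least one triple.
SketchStore.prototype.some = function (callback) {
  for (var i = 0; i < this.length; i++) {
    if (callback(this.triples[i], i, this)) { return true; }
  }
  return false;
};

// True if the callback accepts every triple (vacuously true when empty).
SketchStore.prototype.every = function (callback) {
  for (var i = 0; i < this.length; i++) {
    if (!callback(this.triples[i], i, this)) { return false; }
  }
  return true;
};

var people = new SketchStore();
people.add({ subject: 'ex:a', property: 'foaf:name', object: 'Alice' });
people.add({ subject: 'ex:a', property: 'foaf:knows', object: 'ex:b' });
people.add({ subject: 'ex:b', property: 'foaf:name', object: 'Bob' });

// One callback, reused unchanged across three methods.
function isName(triple, index, store) {
  return triple.property === 'foaf:name';
}

var names = people.filter(isName);
var anyName = people.some(isName);
var allNames = people.every(isName);

var seen = [];
people.forEach(function (triple, index) {
  seen.push(index + ':' + triple.subject);
});
```

Since the callback gets the whole triple, the result of a filter can be 
fed straight back into `add` without constructing a new triple.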


Why move create** methods:
Simply, they don't belong here, it is of no concern to a DataStore how 
IRI, RDFTriple, PlainLiteral etc are created. More technically having 
the create** methods on DataStore decoupled them from the default 
context which leads to unexpected behaviour, cross cutting concerns and 
a difficult implementation. Whereas if they are located on the 
DocumentData interface they are always coupled to a specific context, 
behaviour is expected, dependency is clear and cross cutting concerns 
are removed. (DocumentData can "see" .context, DataStore cannot see 
DataContext in any way).

Why remove indexed behaviour:
Indexed behaviour and array accessors [] raised an important issue over 
expected functionality, namely setting by index `store[23] = triple;`, 
in one case this meant editing the contents of the graph (something 
which couldn't be implemented with user expected functionality - i.e. 
changing the document source) and in another case it would make it 
almost impossible to prevent duplicates being added. From an 
implementation perspective it would be impossible in many languages, 
including ECMAScript to implement. Instead the DataStore should be 
treated like a typed int hash IntHash<RDFTriple> which nicely wraps the 
array/collection hiding any inadvisable methods; this can easily be 
implemented in any language. Iteration is still possible in two common 
forms:
   for(i=0;i<store.length;i++) {
     triple = store.get(i);
   }
and
   iterator = store.iterator();
   while((triple = iterator.next()) !== null) {
     // work with triple
   }
and of course there's the callback interface .forEach too.

Optional:
One could also add a `sequence<RDFTriple> toArray()` method to DataStore 
but this may be unneeded - I'm easy, it's an easy hit if it has benefits 
in other languages.


interface DocumentData {
     attribute DataStore   store;
     attribute DataContext context;
     attribute DataQuery   query;
     IRI           createIRI (in DOMString iri);
     PlainLiteral  createPlainLiteral (in DOMString value, in optional 
DOMString? language);
     TypedLiteral  createTypedLiteral (in DOMString value, in DOMString 
type);
     BlankNode     createBlankNode ();
     RDFTriple     createTriple (in RDFResource subject, in IRI 
property, in RDFNode object);
     DataContext   createContext ();
     DataStore     createStore ();
     DataQuery     createQuery (in DOMString type, in DataStore store);
     boolean       parse(in any toParse, in optional DataStore store, in 
optional DataParser parser);
};

Notes:
Added the create** methods from DataStore (as per notes above)
removed the `node` param from createIRI
removed `type` param from createStore
removed `parser` property
removed `createParser` method
removed `name` parameter from `createBlankNode`
added `parse` method

Notes:
`node` param on createIRI was a legacy/editorial error and needed removing
`type` param on createStore was unused (if this is wrong, add it back in)
`name` param on createBlankNode was unneeded, introduced a possible 
exception to BlankNode construction and encouraged people to think of 
blanknode names as dependable in some way, the unique per context "name" 
of a BlankNode can be retrieved by calling BlankNode.toString, or the 
blank node itself can be passed around when referring to it multiple 
times (such as using the same blank node as the object of one triple, 
and the subject of another).

More on Parser changes:
see earlier in the document, these changes make the `parser` attribute 
redundant in addition to the createParser method (which made it 
impossible to use custom parsers).
a new method has been added `parse` which only requires a single 
argument, namely what to parse.
An optional `store` parameter has been added which allows one to specify 
which store to parse in to without modifying the default value of 
DocumentData.store, similarly a `parser` parameter has been added which 
allows one to execute custom parsers easily.
This adds the constraint that all RDFa API implementations must support 
the default parser "rdfa1.1" which is used when 
document.data.parse(document) is called and no custom parser is handed in.
Optionally the `toParse` argument on this method could be changed to 
`Node node` to require usage within a DOM environment, whilst still 
supporting non-DOM environments via DataContext.getParser.
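
The defaulting behaviour of the proposed `parse` method can be sketched 
as follows. Everything here is a hypothetical stand-in: `data` plays the 
role of document.data, makeStore produces a bare object with an `add` 
method, and defaultParser does not implement RDFa at all; it only 
records what it was asked to parse so the routing can be observed.

```javascript
// Bare-bones store factory; a real DataStore would offer much more.
function makeStore() {
  return {
    triples: [],
    add: function (triple) { this.triples.push(triple); }
  };
}

// Stand-in for the mandatory default "rdfa1.1" parser (NOT a real RDFa parser).
var defaultParser = {
  parse: function (toParse, store) {
    store.add({ parsedBy: 'rdfa1.1', source: toParse });
    return true;
  }
};

// Hypothetical custom parser handed in by the caller.
var customParser = {
  parse: function (toParse, store) {
    store.add({ parsedBy: 'custom', source: toParse });
    return true;
  }
};

// Stand-in for DocumentData; only store, createStore and parse are modelled.
var data = {
  store: makeStore(),
  createStore: makeStore,
  parse: function (toParse, store, parser) {
    var target = store || this.store;     // fall back to the default store
    var chosen = parser || defaultParser; // fall back to the default parser
    return chosen.parse(toParse, target);
  }
};

// one argument: default parser, default store
var first = data.parse('<html></html>');

// two arguments: default parser, caller-selected store
var scratch = data.createStore();
var second = data.parse('<html></html>', scratch);

// three arguments: custom parser, caller-selected store
var third = data.parse('ex:a ex:b ex:c', scratch, customParser);
```

Note that data.store is left untouched by the second and third calls, 
which is the point of passing the store as a parameter.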


[NoInterfaceObject]
interface DataIterator {
     readonly attribute DataStore store;
     boolean hasNext();
     RDFTriple next();
};

changed `store` attribute to readonly (changing the store at runtime 
will produce unexpected results)
removed attribute `root`, unused
removed attribute `filter`, no need, constrains DataIterator to only be 
used with DataStore.filter method
removed attribute `triplePattern`, redundant/unused
added hasNext() method to make iteration easier and prevent users making 
double calls to `next` by accident / getting confused.
usage:
   while(iterator.hasNext()) {
     triple = iterator.next();
   }
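basic ECMAScript DataStore.iterator implementation:
   DataStore.prototype.iterator = function() {
     return {
       cur : 0,
       store : this,
       // cur and store are properties, so they must be reached via `this`
       hasNext: function() { return this.cur < this.store.length; },
       // returns null once exhausted, so `(t = it.next()) !== null` terminates
       next: function() {
         return this.hasNext() ? this.store.get(this.cur++) : null;
       }
     };
   }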

Final change is simply to change RDFTriple.size to RDFTriple.length to 
align with most languages and common usage.

Best,

Nathan
Received on Sunday, 19 September 2010 22:34:17 UTC