Skippr "The RDF Navigation Server"; Call for Contributors, Subject experts and Comments

Skippr: "The RDF Navigation Server"
http://code.google.com/p/skippr/

----
DISCLAIMER: This is a "projected" project ( to be born in a couple of days ).
The idea behind this initiative is to "fill a gap" that becomes
evident as soon as you start pushing RDF into a user agent.
The reason for publishing this early brainstorming is to get feedback
and identify possible contributors for the key areas.
Please, read along.
----

Just like a skipper commands a vessel navigating across the seas,
Skippr commands a user agent's navigation through RDF datasets and the
World. It sits between the user agent and the Giant Global Graph,
providing cross cutting services that are necessary for a
comprehensive browsing experience.

By user agent I mean not only HTML browsers, but also mobile and even
multimodal agents ( e.g. voice ).

As a "Navigation Server", Skippr will provide the following basic services:

- Sessions
- Delegated Authentication
- Pluggable Trust Framework
- User Interface Engine ( Fresnel )
- Schema Inference and IFP Smushing
- Guided Navigation Engine ( Facets )
- Linked Data ( On Demand Scuttering )
- Free Text Search Federation ( Sindice )
- Provenance
- SPARQL Endpoint Management and Federation

A Skippr server can be deployed in an open web scenario, a closed
intranet scenario, or even locally to be consumed by any platform that
presents RDF data to human users. In the future, for example, a
browser like Mozilla Firefox could be equipped with an integrated
Skippr engine and provide a XUL UI on top of it.
A cellphone with a small Java ME application showing nearby
restaurants from the GGG could go through a Skippr server running
somewhere in the cloud ( possibly a trusted provider configured by the
user ). Skippr will take care of free text searching, smushing and
facet extraction to provide the user with a smooth experience,
regardless of the device's limitations.

The important thing here is not the deployment model but the
consolidation of a package that can be used to quickly RDF-enable
different user agents in a consistent manner.

While the amount of functionality to implement may seem prohibitive at
first glance ( yikes! ) these are all problems that *currently* stand
in the way between RDF and user agents. If we solve them in a
collaborative manner, we will bring the following immediate benefits
to the community:

- Alignment between different working groups
- Increased uptake by lowering the entry barrier and providing a
higher-level framework that "makes sense" and "works"
- Standardization of vocabularies for basic concepts like agent,
session, facet, scutter, free text index, etc.
- Standardization or at least general consensus about possible
solutions to contentious use cases, like the use of inference.
- Somewhere to plug ACLs and Trust when the time comes.
- Avoiding an explosion of different browser behaviours, UI engines,
navigation engines, etc.

Bottom line: Skippr is an umbrella project that seeks to integrate
existing efforts and avoid replication across different RDF user agent
teams ( which I presume are growing exponentially ). I hereby make an
open invitation to all members of the community that are currently
undertaking such projects to join the discussions on how the various
topics should be addressed.

BTW, I know what some of you algebrains are thinking:

"SPARQL Endpoint Management and Federation"??!!!  is this guy nuts?.

Well, not really. The idea here is not to create a full-blown
distributed SPARQL engine, but rather a small 80/20 solution that
allows small queries to be distributed over a set of endpoints and a
linked data subset of the GGG. Remember that this is a "navigation
server" aimed at serving "directed navigational user agents". This
means that queries can be restricted in size and complexity. The
capacity of a session could be restricted as well.
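
To make the 80/20 idea concrete, here is a minimal sketch, written in
Python only for brevity ( the project itself targets Java ). Endpoints
are modelled as plain callables that return result rows, so no real
SPARQL library is involved. The names "federate", "MAX_QUERY_CHARS" and
"max_rows" are hypothetical, and the size limits are placeholders.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Tuple[str, ...]
Endpoint = Callable[[str], Iterable[Row]]

MAX_QUERY_CHARS = 500  # assumed cap on query size; the real limit is undecided


def federate(query: str, endpoints: Dict[str, Endpoint], max_rows: int = 100) -> List[Row]:
    """Send one small query to every endpoint and merge the distinct rows."""
    if len(query) > MAX_QUERY_CHARS:
        raise ValueError("query too large for a navigation server")
    merged: List[Row] = []
    seen = set()
    for name in sorted(endpoints):          # deterministic endpoint order
        for row in endpoints[name](query):
            if row in seen:
                continue                    # same row already returned elsewhere
            seen.add(row)
            merged.append(row)
            if len(merged) >= max_rows:     # cap the size of the answer
                return merged
    return merged
```

Note that this only unions the answers of each endpoint. It does not
join data across endpoints, which is exactly the hard part a full
distributed engine would have to solve.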

Therefore, *scale* is an important simplifying factor to keep in mind
when designing Skippr:

"Skippr is intended for serving HUMAN navigation only. It cares only
about a small portion of the GGG at a time".

I created a project page at Google Code. No code uploaded yet, as I
would like suggestions on package naming because I feel this project
is sufficiently important to be community driven right from the
beginning.
( Is there any community domain that I could use? org.semanticweb.skippr? )

The first steps will be:
- Define a simple data model ( one SPARQL endpoint for now )
- Integrate a Fresnel Engine that operates on the data and publishes
services as RESTful RDF
- Work on a generic faceting engine that operates atop a SPARQL
endpoint and provides "guided navigation" services.

I have no particular predilection for the aforementioned topics, but I
happen to need them rather soon. Linked Data would be the next on my
list. But you can start working on the other topics if you wish.

You will find notes for the Fresnel and faceting services on the wiki ( @gcode ).
Other topics have been seeded as well.

Seed code will be up sometime NEXT WEEK. Don't expect much, I am not a
full time coder... I like to have a life every now and then ;)
I will provide a skeleton with the basics and hope to integrate a
Fresnel engine and lay out the code to begin working on facets.

Technicalities:
- Java 6
- Sesame 2 Final
- Restlet
- RESTful RDF ( have you heard of RDF?  funny looking stuff... it
stands for Really Difficult Format )




I copy the texts below as they stand today in the wiki:


Sessions

While some user agents may have unlimited memory to store data as they
browse, others ( like cell phones and PDAs ) may only be able to store
a few MBs of data. A browsing session on any subset of the GGG will
most probably require significant memory as schema and instance data
are downloaded and accumulated.
To solve this issue, Skippr should provide the agent with a transient
"Session" data space that keeps and smushes data during the span of a
browsing session.
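
As a rough illustration of such a session space, here is a small
Python sketch ( Python for brevity, the real thing would be Java on
top of a triple store ). The class name "Session", its methods and the
"max_triples" limit are assumptions made for this example. Smushing is
left out here.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


class Session:
    """Transient, size-bounded triple space for one browsing session."""

    def __init__(self, max_triples: int = 10000) -> None:
        self.max_triples = max_triples  # assumed capacity limit
        self.triples: Set[Triple] = set()

    def add(self, triple: Triple) -> bool:
        """Store a triple; return False when the session is full."""
        if triple in self.triples:
            return True                 # already known, nothing to do
        if len(self.triples) >= self.max_triples:
            return False                # capacity reached, triple rejected
        self.triples.add(triple)
        return True

    def describe(self, subject: str) -> Set[Triple]:
        """Return every stored triple about one subject."""
        return {t for t in self.triples if t[0] == subject}

    def close(self) -> None:
        """Discard all data when the browsing session ends."""
        self.triples.clear()
```

The point of the design is that the agent only ever asks for small
slices ( "describe" ), while the accumulated data stays on the server.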



Delegated Authentication

Some SPARQL endpoints will most probably be secured. It makes sense to
consider adding an authentication ( SSO style ) feature to Skippr. It
makes even more sense to take a look at something like OpenID and see
if there is any overlap. I haven't given this much thought, as for
now I am either accessing public endpoints or private endpoints behind
a firewall.


Pluggable Trust Framework

While trust on a web-wide level is far from being solved, there will
most probably be more than one approach to solve the problem ( trust
services, corporate whitelists, etc ). Skippr will therefore only
define the contract for, and provide a hook for, a trust framework.
A generic whitelist/blacklist policy framework should be provided.
Of course it will be RDF based... just like everything else in Skippr.
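
A possible shape for the contract plus a whitelist/blacklist policy,
sketched in Python for brevity. Everything here is hypothetical: the
names "TrustPolicy" and "HostListPolicy", the decision to match on
host names, and the rule that the blacklist wins. The lists are plain
Python sets rather than RDF, to keep the example short.

```python
from abc import ABC, abstractmethod
from typing import Iterable
from urllib.parse import urlparse


class TrustPolicy(ABC):
    """The contract: decide whether data from a source may be used."""

    @abstractmethod
    def is_trusted(self, source_uri: str) -> bool:
        ...


class HostListPolicy(TrustPolicy):
    """Whitelist/blacklist by host name; the blacklist always wins."""

    def __init__(self, whitelist: Iterable[str] = (), blacklist: Iterable[str] = ()) -> None:
        self.whitelist = set(whitelist)
        self.blacklist = set(blacklist)

    def is_trusted(self, source_uri: str) -> bool:
        host = urlparse(source_uri).hostname or ""
        if host in self.blacklist:
            return False
        # An empty whitelist means "trust anything that is not blacklisted".
        return not self.whitelist or host in self.whitelist
```

Any other approach ( a remote trust service, for instance ) would just
be another implementation of the same one-method contract.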


User Interface Engine ( Fresnel )

The available Fresnel engines ( SIMILE's and JFresnel ) provide only
Java APIs and require developers to "glue" them into their browsers.
Skippr will expose generic RESTful services that provide clients with
different formats ( XML and RDF ) of the different outputs generated
by the Fresnel pipeline running on top of their live session data.

A user agent should be able to ask the query
"How am I supposed to present this resource?"
And get an RDF or XML response from the server without much thought.


Schema Inference and IFP Smushing

Skippr should provide different built-in configurations for RDFS and
OWL inference.  While reasoning is definitely a complex topic, some
minor inferences are not only necessary for smushing ( which is
critical when handling linked data ) but also for Facet generation and
correct UI selection ( especially class-subclass and
property-subproperty closures ).

Perhaps a backward-chainer would be enough for computing these
closures when evaluating FSL expressions or computing hierarchical
facets.
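
For readers new to smushing, here is a minimal Python sketch of the
idea: two subjects that share a value for an inverse functional
property ( IFP ) are treated as the same resource. It uses a small
union-find over plain string triples. The function name "smush" and the
choice of the lexicographically smallest identifier as the canonical
one are assumptions of this example, not a design decision.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def smush(triples: Iterable[Triple], ifps: Set[str]) -> Set[Triple]:
    """Merge subjects that share a value for an inverse functional property."""
    triples = list(triples)
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    owner: Dict[Tuple[str, str], str] = {}  # (ifp, value) -> first subject seen
    for s, p, o in triples:
        if p in ifps:
            first = owner.setdefault((p, o), s)
            a, b = sorted((find(first), find(s)))
            parent[b] = a  # smallest identifier becomes the canonical one

    # Rewrite every subject and object to its canonical identifier.
    return {(find(s), p, find(o)) for s, p, o in triples}
```

For example, if "ex:p1" and "ex:p2" both have the same foaf:mbox and
foaf:mbox is listed as an IFP, all statements about "ex:p2" end up
attached to "ex:p1".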


Guided Navigation Engine ( Facets )

Faceted browsing has proved extremely effective for dealing with mixed
and unknown RDF data. Unfortunately it is only available as part of
specific browsers operating on closed RDF datasets.
The goal of the Skippr/Faceter subproject is to provide a generic
Faceter engine operating over the user agent's current session. This
should include harvested Linked Data as well as a set of SPARQL
endpoints.
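
To show what such an engine computes at its simplest, here is a Python
sketch over an in-memory list of triples. A real Faceter would run
against the session or a SPARQL endpoint instead. The function names
"compute_facets" and "restrict" are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def compute_facets(triples: Iterable[Triple]) -> Dict[str, Counter]:
    """For each property, count how many distinct subjects carry each value."""
    seen = set()
    facets: Dict[str, Counter] = defaultdict(Counter)
    for triple in triples:
        if triple in seen:
            continue                    # ignore duplicate statements
        seen.add(triple)
        _, prop, value = triple
        facets[prop][value] += 1
    return dict(facets)


def restrict(triples: Iterable[Triple], prop: str, value: str) -> List[Triple]:
    """Keep only triples whose subject has the selected facet value."""
    triples = list(triples)
    keep = {s for s, p, o in triples if p == prop and o == value}
    return [t for t in triples if t[0] in keep]
```

Guided navigation is then a loop: show the facets, let the user pick a
value, call "restrict", and recompute the facets on what is left.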

For now we will limit the engine to a single SPARQL endpoint. The
computation of Facets over multiple datasets is something that has to
be studied. In fact, it is not only about "adding up" the facets
computed on each dataset. It will definitely touch upon distributed
querying.

( or not? )



Linked Data ( On Demand )

The "Linking Open Data" initiative protocols are being widely deployed
and they hold great promise for the immediate uptake of the Semantic
Web. Skippr should provide a built-in and configurable scutter
capable of automatically harvesting RDF data as directed navigation
happens.
Scuttering policies should be configurable BUT we should emphasize
some generic convention.
Again, delegated auth seems necessary here.
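
A configurable scuttering policy could be as simple as a depth limit
and a fetch budget. The Python sketch below assumes exactly that. The
"fetch" argument is injected, so nothing here touches the network, and
the names "scutter", "max_depth" and "max_fetches" are hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Fetch = Callable[[str], Iterable[Triple]]


def scutter(start: str, fetch: Fetch, max_depth: int = 1, max_fetches: int = 20) -> Set[Triple]:
    """Breadth-first harvest of RDF around `start`, bounded by a simple policy."""
    harvested: Set[Triple] = set()
    visited = {start}
    queue = deque([(start, 0)])
    fetches = 0
    while queue and fetches < max_fetches:
        uri, depth = queue.popleft()
        fetches += 1
        for triple in fetch(uri):
            harvested.add(triple)
            obj = triple[2]
            # Follow only HTTP URIs, and only within the depth limit.
            if depth < max_depth and obj.startswith("http") and obj not in visited:
                visited.add(obj)
                queue.append((obj, depth + 1))
    return harvested
```

With max_depth=0 only the resource the user is looking at gets
dereferenced, which is probably a sane default for small devices.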



Search Federation ( Sindice et al. )

RDF Indexes should provide a generic, discoverable interface in a
standardized vocabulary.  If this happens, Skippr will understand them
and provide integrated free text search to clients.


Provenance

Skippr should manage and provide provenance data for client
introspection.  Ideally, it should hide "quads" and only present
triples to clients while still allowing them to perform queries like
"Who states that foo:bar costs US$5??" as well as retract complete
sources.
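
One way to picture this, as a Python sketch with hypothetical names
( "ProvenanceStore", "who_states", "retract" ): the store keeps quads
internally, where the fourth element is the source, but clients only
ever see triples.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]  # subject, predicate, object, source


class ProvenanceStore:
    """Keeps quads internally but shows clients plain triples."""

    def __init__(self) -> None:
        self._quads: Set[Quad] = set()

    def add(self, triple: Triple, source: str) -> None:
        """Record that `source` states `triple`."""
        self._quads.add(triple + (source,))

    def triples(self) -> Set[Triple]:
        """The client view: triples only, sources hidden."""
        return {q[:3] for q in self._quads}

    def who_states(self, triple: Triple) -> Set[str]:
        """All sources that assert the given triple."""
        return {q[3] for q in self._quads if q[:3] == triple}

    def retract(self, source: str) -> None:
        """Drop everything that came from one source."""
        self._quads = {q for q in self._quads if q[3] != source}
```

A triple stated by two sources survives the retraction of one of them,
which is the behaviour you would want when a user stops trusting a
single site.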

The mythical "Oh Yeah!" button should be somewhere down this road.


SPARQL Endpoint Management and Federation

Duh, I just implemented a federated SPARQL engine on a lightweight
platform and... it was not fun!
Somebody else please.



Have a nice weekend,
Al

Received on Saturday, 12 January 2008 09:51:57 UTC