Navigating Data (was Re: Take2: 15 Ways to Think About Data Quality (Just for a Start) )

Hi,

Changed subject line to match topic:

On 15 April 2011 14:47, glenn mcdonald <glenn@furia.com> wrote:
> This reminds me to come back to the point about what I initially
> called Directionality, and Dave improved to Modeling Consistency.

> ...
> - But even in RDF, directionality poses a significant discovery
> problem. In a minimal graph (let's say "minimal graph" means that each
> relationship is asserted in only one direction, so there's no
> relationship redundancy), you can't actually explore the data
> navigationally. You can't go to a single known point of interest, like
> a given president, and explore to find out everything the data holds
> and how it connects...

Doesn't this really depend on how the navigational interface is constructed?

If we're looking purely at Linked Data views created using a Concise
Bounded Description, then yes I agree, if there are no "back links" in
the data, then navigation is problematic.

But if we use different algorithms to describe the views, or
supplement it with SPARQL queries, then those navigational links can
be presented, e.g. "other resources that refer to this resource".
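
A minimal sketch of that "what refers to this resource" lookup, in Python over an in-memory list of triples. The ex: names, the toy data and the helper functions are all hypothetical and only there for illustration; a real view would send something like the SPARQL string to an endpoint instead.

```python
# Hypothetical toy data: (subject, predicate, object) triples, each
# relationship asserted in one direction only (a "minimal graph").
TRIPLES = [
    ("ex:Lincoln", "ex:memberOf", "ex:RepublicanParty"),
    ("ex:Lincoln", "ex:bornIn", "ex:Kentucky"),
    ("ex:Johnson", "ex:succeeded", "ex:Lincoln"),
    ("ex:Booth", "ex:assassinated", "ex:Lincoln"),
]

# Roughly the same question as a SPARQL query. The prefix and the
# resource URI are placeholders, not taken from any real dataset.
BACKLINKS_QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?s ?p WHERE { ?s ?p ex:Lincoln }
"""


def outward(triples, node):
    """Links a description of `node` already contains: node -> object."""
    return [(p, o) for s, p, o in triples if s == node]


def inward(triples, node):
    """Back links: statements elsewhere whose object is `node`."""
    return [(s, p) for s, p, o in triples if o == node]


if __name__ == "__main__":
    print("outward:", outward(TRIPLES, "ex:Lincoln"))
    print("inward: ", inward(TRIPLES, "ex:Lincoln"))
```

A view that shows both lists gives the user the back links even though the underlying data only holds each relationship once.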

I think, as you noted elsewhere, inverse links could also be inferred
based on the schema. This simplifies the navigation UI, as the links
are then part of the data.
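
As a rough illustration of schema-driven inference, the Python sketch below takes a table of inverse property pairs (standing in for owl:inverseOf statements in a schema) and adds the missing direction for each triple. The property names and the with_inverses helper are made up for this example, not part of any particular store or reasoner.

```python
# Hypothetical schema fragment: property -> its declared inverse,
# standing in for owl:inverseOf statements.
INVERSE_OF = {
    "ex:succeeded": "ex:precededBy",
    "ex:assassinated": "ex:assassinatedBy",
}

# Hypothetical minimal graph: each relationship stated once.
TRIPLES = [
    ("ex:Johnson", "ex:succeeded", "ex:Lincoln"),
    ("ex:Booth", "ex:assassinated", "ex:Lincoln"),
    ("ex:Lincoln", "ex:bornIn", "ex:Kentucky"),
]


def with_inverses(triples, inverse_of):
    """Return the triples plus one inferred inverse per matching triple."""
    # An inverse declaration works in both directions, so index it both ways.
    both = dict(inverse_of)
    both.update({inv: prop for prop, inv in inverse_of.items()})

    result = set(triples)
    for s, p, o in triples:
        if p in both:
            result.add((o, both[p], s))
    return result


if __name__ == "__main__":
    for triple in sorted(with_inverses(TRIPLES, INVERSE_OF)):
        print(triple)
```

Properties with no declared inverse (ex:bornIn here) stay one-directional, which is why inference alone does not give a back link for everything.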

> ...You can explore the *outward* relationships from
> any given point, but to find out about the *inward* relationships you
> have to keep doing new queries over the entire dataset.

Yes.

> ...The same basic
> issue applies to an XML representation of the data as a tree: you can
> squirrel your way down, but only in the direction the original modeler
> decided was "down". If you need a different direction, you have to
> hire a hypersquirrel.

Well, an XML node typically has a reference to its parent (it does in
the DOM anyway), so moving back up the tree is easy.
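
For instance, with Python's standard xml.dom.minidom, every node exposes a parentNode attribute. The document below is an invented example; the loop just walks from a leaf element back to the root.

```python
from xml.dom.minidom import parseString

# Invented sample document, for illustration only.
doc = parseString(
    "<presidents><president><name>Lincoln</name></president></presidents>"
)

# Start at a leaf element somewhere down the tree.
name = doc.getElementsByTagName("name")[0]

# Walk upward via parentNode until we leave the element tree
# (the root element's parent is the Document node itself).
path = []
node = name
while node is not None and node.nodeType == node.ELEMENT_NODE:
    path.append(node.tagName)
    node = node.parentNode

print(" -> ".join(path))
```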

> - Of course, most RDF-presenting systems recognize this as a usability
> problem, and address it by turning the minimal graph into a redundant
> graph for UI purposes. Thus in a data-browser UI you usually see, for
> a given node, lists of both outward and inward relationships. This is
> better, but if this abstraction is done at the UI layer, you still
> lose it once you drop down into the SPARQL realm. This makes the
> SPARQL queries harder to write, because you can't write them the way
> you logically think about the question, you have to write them the way
> the data thinks about the question. And this skew from real logic to
> directional logic can make them *much* harder to understand or
> maintain, because the directionality obscures the purpose and reduces
> the self-documenting nature of the query.

Assuming you don't materialize the inferences directly in the data,
then isn't the answer to have both the SPARQL endpoint and the
navigational UI use the same set of inferred data?

> All of this is *much* better, in usability terms, if the data is
> redundantly, bi-directionally connected all the way down to the level
> of abstraction at which you're working. Now you can explore to figure
> out what's there, and you can write your queries in the way that makes
> the most human sense. The artificial skew between the logical
> structure and the representational structure has been removed. This is
> perfectly possible in an RDF-based system, of course, if the software
> either generates or infers the missing inverses. We incur extra
> machine overhead to reduce the human cognitive burden. I contend this
> should be considered a nearly-mandatory best-practice for linked data,
> and that propagating inverses around the LOD cloud ought to be one of
> the things that makes the LOD cloud *a thing*, rather than just a
> collection of logical silos.

The same problem exists on the document web: it can be useful to know
what links to a specific page. There are various techniques to help
address that, e.g. centralized indexes that can expose more of the
graph (Google) or point-to-point mechanisms for notifying a site of
new links to it (e.g. Pingback).

With RDF systems we may be able to infer some extra links, but with
Linked Data we can't infer all of them, so we have the same issue and
can deploy very similar infrastructure to solve the problem.

Currently we have SameAs.org, which is specialized for one type of
linking, but it'd be nice to see others [1]. And there have been
experiments with various pingback/notification services for Linked
Data. Are any of the latter being widely deployed/used?

Cheers,

L.

[1]. http://www.ldodds.com/blog/2010/03/predicate-based-services/

-- 
Leigh Dodds
Programme Manager, Talis Platform
Mobile: 07850 928381
http://kasabi.com
http://talis.com

Talis Systems Ltd
43 Temple Row
Birmingham
B2 5LS

Received on Thursday, 28 April 2011 11:04:26 UTC