- From: Bob Ferris <zazi@elbklang.net>
- Date: Fri, 15 Apr 2011 16:54:03 +0200
- To: public-lod@w3.org
Hi Glenn,

thanks a lot for your insightful thoughts. I think I can fully agree with them. This topic reminds me a bit of a question I asked some time ago on SemanticOverflow (now answers.semanticweb.com): "When should I use explicit/anonymous defined inverse properties?" [1] (btw, this question is still not marked as "answered" ;) )

Cheers,

Bob

[1] http://answers.semanticweb.com/questions/1126/when-should-i-use-explicitanonymous-defined-inverse-properties

On 4/15/2011 3:47 PM, glenn mcdonald wrote:
> This reminds me to come back to the point about what I initially
> called Directionality, and Dave improved to Modeling Consistency.
>
> Dave is right, I think, that in terms of data quality, it is
> consistency that matters, not directionality. That is, as long as we
> know that a president was involved in a presidency, it doesn't matter
> whether we know that because the president linked to the presidency,
> or the presidency linked to the president. In fact, in a relational
> database the president and the presidency and the link might even be
> in three separate tables. From a data-mathematical perspective, it
> doesn't matter. All of these are ways of expressing the same logical
> construct. We just want it to be done the same way for all
> presidents/presidencies/links.
>
> But although directionality is immaterial for data *quality*, it
> matters quite a bit for the usability of the system in which the data
> reaches people. We know, for example, that in the real world
> presidents have presidencies, and vice versa. But think about what it
> takes to find out whether this information is represented in a given
> dataset:
>
> - In a classic SQL-style relational database we probably have to just
> know the schema, as there's usually no exploratory way to find this
> kind of thing out. The RDBMS formalism doesn't usually represent the
> relationships between tables.
> You not only have to know it from
> external sources, but you have to restate it in each SQL join-query.
> This may be acceptable in a database with only a few tables, where the
> field-headings are kept consistent by convention, but it's extremely
> problematic when you're trying to combine formerly-separate datasets
> into large ones with multiple dimensions and purposes. If the LOD
> cloud were in relational tables, it would be awful. Arguably the main
> point of the cloud is to get the data out of relational tables (where
> most of it probably originates) into a graph where the connections are
> actually represented instead of implied.
>
> - But even in RDF, directionality poses a significant discovery
> problem. In a minimal graph (let's say "minimal graph" means that each
> relationship is asserted in only one direction, so there's no
> relationship redundancy), you can't actually explore the data
> navigationally. You can't go to a single known point of interest, like
> a given president, and explore to find out everything the data holds
> and how it connects. You can explore the *outward* relationships from
> any given point, but to find out about the *inward* relationships you
> have to keep doing new queries over the entire dataset. The same basic
> issue applies to an XML representation of the data as a tree: you can
> squirrel your way down, but only in the direction the original modeler
> decided was "down". If you need a different direction, you have to
> hire a hypersquirrel.
>
> - Of course, most RDF-presenting systems recognize this as a usability
> problem, and address it by turning the minimal graph into a redundant
> graph for UI purposes. Thus in a data-browser UI you usually see, for
> a given node, lists of both outward and inward relationships. This is
> better, but if this abstraction is done at the UI layer, you still
> lose it once you drop down into the SPARQL realm.
> This makes the
> SPARQL queries harder to write, because you can't write them the way
> you logically think about the question, you have to write them the way
> the data thinks about the question. And this skew from real logic to
> directional logic can make them *much* harder to understand or
> maintain, because the directionality obscures the purpose and reduces
> the self-documenting nature of the query.
>
>
> All of this is *much* better, in usability terms, if the data is
> redundantly, bi-directionally connected all the way down to the level
> of abstraction at which you're working. Now you can explore to figure
> out what's there, and you can write your queries in the way that makes
> the most human sense. The artificial skew between the logical
> structure and the representational structure has been removed. This is
> perfectly possible in an RDF-based system, of course, if the software
> either generates or infers the missing inverses. We incur extra
> machine overhead to reduce the human cognitive burden. I contend this
> should be considered a nearly-mandatory best-practice for linked data,
> and that propagating inverses around the LOD cloud ought to be one of
> the things that makes the LOD cloud *a thing*, rather than just a
> collection of logical silos.
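
To make the "minimal graph" versus "redundant graph" distinction concrete, here is a rough sketch. It assumes a toy in-memory list of triples rather than a real RDF store, and every name in it (the resource and predicate names, the INVERSES table, the two helper functions) is hypothetical and chosen only for illustration. It shows how generating the missing inverses lets you explore from a node using outward relationships alone.

```python
# Toy "minimal graph": each relationship is asserted in one direction only.
# All names here are hypothetical; they come from no real vocabulary.
MINIMAL = [
    ("Lincoln", "hasPresidency", "Presidency16"),
    ("Presidency16", "precededBy", "Presidency15"),
]

# Assumed declaration of which predicates are inverses of each other.
INVERSES = {
    "hasPresidency": "presidentOf",
    "precededBy": "followedBy",
}


def materialize_inverses(triples, inverses):
    """Return the triples plus an inverse triple for each declared inverse."""
    # Make the lookup symmetric so either direction finds its partner.
    both = dict(inverses)
    both.update({inv: pred for pred, inv in inverses.items()})
    result = set(triples)
    for s, p, o in triples:
        if p in both:
            result.add((o, both[p], s))
    return result


def outward(triples, node):
    """Relationships found by starting at `node` as the subject."""
    return sorted((p, o) for s, p, o in triples if s == node)


redundant = materialize_inverses(MINIMAL, INVERSES)

# In the minimal graph the presidency does not "know" its president.
print(outward(MINIMAL, "Presidency16"))
# In the redundant graph the same outward lookup also reaches Lincoln.
print(outward(redundant, "Presidency16"))
```

With the inverses materialized, starting from Presidency16 reaches Lincoln without a separate scan for triples in which the presidency appears as the object. The price is the extra stored triples, which is the machine overhead traded for the reduced human burden described above.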
Received on Friday, 15 April 2011 14:54:33 UTC