Re: implied datasets

* [2011-05-23 11:34:56 -0400] glenn mcdonald <glenn@furia.com> wrote:

] It seems to me that this is another demonstration of confusion that wouldn't
] happen if we all understood RDF IDs to be pure identifiers that belong to
] the graph representation of a dataset and nothing else. ISSN numbers are not
] graph-node IDs, they are real-world conceptual identifiers like social
] security numbers or SKUs or country codes. Many different data structures
] might reference them in very different ways, so it should be fairly clear
] that they cannot uniquely identify anything but themselves, and thus they
] should themselves be represented in RDF as nodes. So the above should be
] more like:

Hi Glenn,

That may be so, but it misses the point. The point is that there is a
field, be it a URI or a literal, however modelled, that can be used to
join two datasets. This join field is "hidden" in that there exists no
(known) dataset that contains all possible values it can take on.

So you have a situation where you are trying to describe datasets and
you can say that DS1 and DS2 are indirectly linked, and you want to
make that link explicit so that you can put it on diagrams and such.

Saying,

  DS1 indirectlyLinkedTo DS2

is no good, because then you get O(n^2) such statements, which makes
your visualisation messy. Furthermore, without examining the datasets
you don't know whether they share any values of the join field, so
they may not actually be linked except in a degenerate sense.

Inventing a dataset that contains only the join field lets you say
something useful and coherent about the relationship between DS1 and
DS2.
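
To make the counting concrete, here is a minimal Python sketch. It is
only an illustration under assumptions of mine: the dataset names beyond
DS1 and DS2, the placeholder join values, the "ISSN" hub name and the
"linksTo" predicate are all made up for the example and are not part of
any vocabulary.

```python
from itertools import combinations

# Hypothetical datasets, each reduced to the set of join-field values
# it uses. The values are placeholders, not real ISSNs.
datasets = {
    "DS1": {"issn-a", "issn-b"},
    "DS2": {"issn-b", "issn-c"},
    "DS3": {"issn-d"},
    "DS4": {"issn-a", "issn-d"},
}


def pairwise_links(datasets):
    """One 'indirectlyLinkedTo' statement per pair: n*(n-1)/2 in total."""
    names = sorted(datasets)
    return [(a, "indirectlyLinkedTo", b) for a, b in combinations(names, 2)]


def hub_links(datasets, hub="ISSN"):
    """One statement per dataset, pointing at the implied join-field
    dataset ('linksTo' and 'ISSN' are assumed names)."""
    return [(name, "linksTo", hub) for name in sorted(datasets)]


def shared_values(datasets, a, b):
    """Join-field values that two datasets actually have in common."""
    return datasets[a] & datasets[b]


print(len(pairwise_links(datasets)))  # grows quadratically with n
print(len(hub_links(datasets)))       # grows linearly with n
print(shared_values(datasets, "DS1", "DS3"))  # empty: a degenerate "link"
```

With four datasets the pairwise form already needs six statements
against four for the hub form, and the pairwise statement between DS1
and DS3 tells you nothing, since they share no values.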

There is nothing in this that requires the datasets themselves to be
RDF. See my other post to ckan-discuss on the same topic expressed in
terms of the relationships between CSV files.

Cheers,
-w
-- 
William Waites                <mailto:ww@styx.org>
http://river.styx.org/ww/        <sip:ww@styx.org>
F4B3 39BF E775 CF42 0BAB  3DF0 BE40 A6DF B06F FD45

Received on Monday, 23 May 2011 16:58:51 UTC