data vs. information (Was Re: implied datasets)

* [2011-05-23 18:19:49 +0000] Hugh Glaser <hg@ecs.soton.ac.uk> wrote:

] I won't go into whether the April Fool's joke of the integers might
] actually be useful (note that dbpedia has quite a lot of URIs for numbers),
] but there will be many other "standard" URIs for things that we take for granted.
] The recent colour ones might seem like a joke as well, but perhaps not?

I had this a little bit in mind when I wrote the original mail, and it
leads nicely to some related thoughts about quality.

The thing with the linked open numbers is that it makes the point
pretty neatly, I think, that it is silly to try to materialise
everything that can be stated in RDF. A small computer program that
describes numbers might have the same information content as all of
those numbers made manifest. It would take up a lot less disk space
and be much faster to query, and you could still use it to refer to
the numbers when you needed to.
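
To make that concrete, here is a minimal sketch of such a program. The
example.org namespace and the successor/predecessor properties are
made up for illustration; only rdf:value and xsd:integer are real
terms. It is not how the linked open numbers site actually works, just
the general idea of describing a number on demand.

```python
BASE = "http://example.org/number/"    # hypothetical namespace
VOCAB = "http://example.org/vocab#"    # hypothetical vocabulary
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def describe(n):
    """Return N-Triples lines describing the integer n, computed on demand."""
    subject = f"<{BASE}{n}>"
    return [
        f'{subject} <{RDF_VALUE}> "{n}"^^<{XSD_INTEGER}> .',
        f"{subject} <{VOCAB}successor> <{BASE}{n + 1}> .",
        f"{subject} <{VOCAB}predecessor> <{BASE}{n - 1}> .",
    ]


def describe_uri(uri):
    """Resolve a number URI to its description without storing anything."""
    if not uri.startswith(BASE):
        raise ValueError(f"not a number URI: {uri}")
    return describe(int(uri[len(BASE):]))
```

Nothing is stored anywhere: the three triples for any integer exist
only for as long as somebody asks for them.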

Is this always the case? It seems to me the tradeoff is speed
vs. space. For some aspects of numbers computing on demand makes sense
(e.g. their representation in Roman numerals) but what about
computationally expensive things like their prime factors? These
quickly become too expensive to calculate on the fly, so a lookup
service could actually make a certain amount of sense...
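
A rough sketch of the two cases, under my own assumptions: Roman
numerals are cheap enough to recompute every time, while prime factors
(here by naive trial division) are the sort of thing you would want to
remember once computed. The in-memory cache is only a stand-in for a
real lookup service.

```python
from functools import lru_cache

ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(n):
    """Cheap: a handful of divisions, so just compute it on the fly."""
    if not 0 < n < 4000:
        raise ValueError("only 1..3999 have a standard Roman numeral")
    out = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


@lru_cache(maxsize=None)  # stand-in for a lookup service: remember past answers
def prime_factors(n):
    """Expensive for large n: trial division, so results are kept once computed."""
    if n < 2:
        raise ValueError("prime factors are defined here for n >= 2")
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return tuple(factors)
```

The second call to prime_factors for the same number costs a
dictionary lookup rather than a factorisation, which is the whole
argument for materialising that particular corner of the space.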

] My favourite at the moment is
] http://data.totl.net/chess/state/rnbqkbnr_pppppppp_8_8_8_8_PPPPPPPP_RNBQKBNR_w_KQkq_-_0_1
] A very large number of URIs that describe chess positions.
] And tells you things like the next legal move in RDF.
] 
] So if I had loads of games in RDF, I could reliably do some fun queries
] about games with move sequences, etc.
] 
] Seems to me it is very similar to William's requirements.

Oh, that's beautiful.

] However, it does it slightly differently, by having resolvable URIs for the
] positions, which can easily go to the more conventional representations.

And this works because the service has a compact representation of the
space of all possible positions and moves, a small computer
program. You can then materialise the small subspace that you're
interested in and run some analysis on it.
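
Judging only from the one URI quoted above, the path looks like a FEN
string with "/" and " " replaced by "_". That is my guess at the
convention, not something totl documents here, but if it holds then
going to the conventional representation and back is a few lines:

```python
BASE = "http://data.totl.net/chess/state/"


def position_uri_to_fen(uri):
    """Turn a position URI into FEN, assuming '_' replaces '/' and ' '."""
    if not uri.startswith(BASE):
        raise ValueError(f"not a chess position URI: {uri}")
    parts = uri[len(BASE):].split("_")
    if len(parts) != 13:  # 8 ranks + 5 trailing FEN fields
        raise ValueError("expected 8 ranks and 5 FEN fields")
    return "/".join(parts[:8]) + " " + " ".join(parts[8:])


def fen_to_position_uri(fen):
    """Inverse mapping under the same assumed convention."""
    return BASE + fen.replace("/", "_").replace(" ", "_")
```

With that, a program holding a set of FEN strings can name exactly the
positions it cares about and ask the service about those alone.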

It's nice that totl wants to run that program for me but I guess they
could just as easily give me the program and let me run it myself. Bit
for bit they would have given me far more information and far less
data. But then it might be more convenient for me to use their service
if I only have a relatively small number of positions/moves to
consider. Might the service be useful for a program to help study how
to play chess? Quite possibly. Would it make sense to build a
chess-playing computer on top of their service? It would be
interesting to see but I suspect network traffic and delays would be
prohibitive.

It's the same story with the trend of taking CSV files, a pretty
compact and easy-to-work-with representation of tabular data, and
expanding them into giant RDF datasets that take up a lot of disk
space and are cumbersome to query. A service to refer to a cell in a
spreadsheet, to give it a URI and return some small amount of data,
would be useful. Proactively materialising the whole thing (not
infinite but in some cases still very large) is probably not.
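
Something like the following is what I have in mind, as a sketch only.
The example.org URI layout (row number, then column name) is invented
for illustration, and using rdf:value for the cell contents is just
one possible choice.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/sheet/"  # hypothetical namespace for one CSV file
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"


def cell_uri(row, column):
    """Name a cell by 1-based data row and column header (assumed layout)."""
    return f"{BASE}row/{row}/col/{quote(column, safe='')}"


def describe_cell(csv_text, row, column):
    """Return one N-Triples line for a single cell, reading the CSV on demand."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, record in enumerate(reader, start=1):
        if index == row:
            value = record[column]
            # Escape backslashes and quotes for an N-Triples literal.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            return f'<{cell_uri(row, column)}> <{RDF_VALUE}> "{escaped}" .'
    raise KeyError(f"no row {row} in this file")
```

The CSV stays the only copy of the data; a triple appears only when
somebody dereferences a particular cell.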

] Is that not a better way of doing what you want, William?
] Bring up a simple site that actually has http://example.org/issn/1234-5678 
] or perhaps more appropriately something like http://totl.net/issn/1234-5678
] which actually resolves to some (generated) RDF snippet that is
] sensible.

Quite reasonable, and I believe, though I am not certain, that
crossref.org has already done exactly this for ISSNs (though that just
points to urn:issn:.... and linked periodicals). But for this to be
useful (according to my use case) I would have to convince
all/many/most dataset authors to refer to these URIs of mine. Maybe
crossref.org will become this service for ISSNs. If that happens I
will be the first to agree that this is far better than URN. At that
point it becomes an actual dataset instead of a supposed one. But in
the meantime...
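
For what it is worth, the generated snippet Hugh describes could be as
small as this. It is a sketch under assumptions: the example.org
namespace comes from his mail, linking to the URN with owl:sameAs is
my own choice, and the pattern only checks the shape of the ISSN, not
its check digit.

```python
import re

BASE = "http://example.org/issn/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DCTERMS_IDENTIFIER = "http://purl.org/dc/terms/identifier"
ISSN_PATTERN = re.compile(r"\d{4}-\d{3}[\dX]")  # shape only, no check digit


def describe_issn(uri):
    """Generate a small N-Triples description for an ISSN URI on request."""
    if not uri.startswith(BASE):
        raise ValueError(f"not an ISSN URI: {uri}")
    issn = uri[len(BASE):]
    if not ISSN_PATTERN.fullmatch(issn):
        raise ValueError(f"not shaped like an ISSN: {issn}")
    subject = f"<{uri}>"
    return [
        f"{subject} <{OWL_SAME_AS}> <urn:issn:{issn}> .",
        f'{subject} <{DCTERMS_IDENTIFIER}> "{issn}" .',
    ]
```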

Cheers,
-w
-- 
William Waites                <mailto:ww@styx.org>
http://river.styx.org/ww/        <sip:ww@styx.org>
F4B3 39BF E775 CF42 0BAB  3DF0 BE40 A6DF B06F FD45

Received on Monday, 23 May 2011 22:03:10 UTC