Re: data vs. information (Was Re: implied datasets)

Thanks for all your thoughts William - food for pondering.
A few comments, which I find hard to interleave - sorry.
The totl.net site doesn't have to be hit for the URIs to have value.
dbpedia doesn't even have to exist for the URIs to have value (<ducks />) - well at least not very often; it may have been down for the last month, for all I know, but I have been using the URIs.
They are just there as useful identifiers.
Ah yes, crossref.org. I realise all this is quite controversial in the publishing sphere. I think there is a site that does Linked Data for DOIs, but I can't remember which it is; I don't think it is crossref.org itself.

It is rather strange that we worry about having "authoritative" or at least agreed URIs for hard things like people, but don't manage to have them for less complex things (at least in terms of enumeration) such as Pantone colours or chemical elements, and yes, ISSNs.

dbpedia can sort of fill this role, where wikipedia has pages on them, but somehow it feels like clear-cut datasets such as these should be pulled out and published in their own right.
And of course, when we get to datasets of essentially arbitrary size (IPv6 URIs, anyone? Or even IPv4), we are in a different world of representation and service.
Best
Hugh

On 23 May 2011, at 23:02, William Waites wrote:

> * [2011-05-23 18:19:49 +0000] Hugh Glaser <hg@ecs.soton.ac.uk> wrote:
> 
> ] I won't go into whether the April Fool's joke of the integers might
> ] actually be useful (note that dbpedia has quite a lot of URIs for numbers),
> ] but there will be many other "standard" URIs for things that we take for granted.
> ] The recent colour ones might seem like a joke as well, but perhaps not?
> 
> I had this a little bit in mind when I wrote the original mail, and this 
> goes nicely to some related thoughts about quality.
> 
> The thing with the linked open numbers is that it makes the point
> pretty neatly I think that it is silly to try to materialise
> everything that can be stated in RDF. A small computer program that
> describes numbers might have the same information content as all of
> those numbers made manifest. And it would take up a lot less disk
> space and be much faster to query. But you could still use it to refer
> to the numbers when you needed to.
> 
> Is this always the case? It seems to me the tradeoff is speed
> vs. space. For some aspects of numbers this makes sense (e.g. their
> representation in roman numerals) but what about computationally
> expensive things like their prime factors? This quickly becomes too
> expensive to calculate on the fly but actually a lookup service could
> make a certain amount of sense...
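
A minimal sketch of that tradeoff in Python, under assumptions that are not taken from the linked open numbers service: the http://example.org/number/ namespace and the vocab#roman and vocab#primeFactor properties are made up for illustration. Roman numerals are recomputed on every request, while prime factors go through an in-memory cache that stands in for a lookup service.

```python
from functools import lru_cache

BASE = "http://example.org/number/"  # hypothetical namespace, not a real service

ROMAN_TABLE = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
               (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
               (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]


def to_roman(n: int) -> str:
    """Cheap enough to recompute per request (defined here for 1..3999)."""
    if not 1 <= n <= 3999:
        raise ValueError("this sketch only handles 1..3999")
    out = []
    for value, symbol in ROMAN_TABLE:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


@lru_cache(maxsize=100_000)
def prime_factors(n: int) -> tuple:
    """Trial division; the cache plays the role of the lookup service."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return tuple(factors)


def describe(n: int) -> list:
    """Generate N-Triples-style lines for one number, on demand."""
    subject = f"<{BASE}{n}>"
    lines = [f'{subject} <{BASE}vocab#roman> "{to_roman(n)}" .']
    for p in sorted(set(prime_factors(n))):
        lines.append(f"{subject} <{BASE}vocab#primeFactor> <{BASE}{p}> .")
    return lines
```

Calling describe(12) yields three lines (one roman numeral, two distinct prime factors) without anything having been stored in advance. Trial division is only a placeholder here; for large numbers the cached or precomputed route is the one that matters.
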
> 
> ] My favourite at the moment is
> ] http://data.totl.net/chess/state/rnbqkbnr_pppppppp_8_8_8_8_PPPPPPPP_RNBQKBNR_w_KQkq_-_0_1
> ] A very large number of URIs that describe chess positions.
> ] And tells you things like the next legal move in RDF.
> ] 
> ] So if I had loads of games in RDF, I could reliably do some fun queries
> ] about games with move sequences, etc.
> ] 
> ] Seems to me it is very similar to William's requirements.
> 
> Oh, that's beautiful.
> 
> ] However, it does it slightly differently, by having resolvable URIs for the
> ] positions, which can easily go to the more conventional representations.
> 
> And this works because the service has a compact representation of the
> space of all possible positions and moves, a small computer
> program. You can then materialise the small subspace that you're
> interested in and run some analysis on it.
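
The example URI quoted above looks like a FEN string with the slashes and spaces replaced by underscores. That reading is inferred from the single URI in this thread, not from any totl.net documentation, so the following is only a sketch under that assumption. It converts between the two forms locally, without touching the service.

```python
PREFIX = "http://data.totl.net/chess/state/"


def uri_to_fen(uri: str) -> str:
    """Turn a position URI into a FEN string (assumed encoding, see above)."""
    if not uri.startswith(PREFIX):
        raise ValueError("not a chess state URI")
    parts = uri[len(PREFIX):].split("_")
    # FEN has 8 ranks plus 5 further fields: side to move, castling,
    # en passant square, halfmove clock, fullmove number.
    if len(parts) != 13:
        raise ValueError("expected 13 underscore-separated fields")
    return "/".join(parts[:8]) + " " + " ".join(parts[8:])


def fen_to_uri(fen: str) -> str:
    """Inverse mapping: FEN string to position URI."""
    return PREFIX + fen.replace("/", "_").replace(" ", "_")
```

With something like this, games held as FEN strings could be mapped to position URIs (and back) before deciding which small subspace is worth dereferencing.
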
> 
> It's nice that totl wants to run that program for me but I guess they
> could just as easily give me the program and let me run it myself. Bit
> for bit they would have given me far more information and far less
> data. But then it might be more convenient for me to use their service
> if I only have a relatively small number of positions/moves to
> consider. Might the service be useful for a program to help study how
> to play chess? Quite possibly. Would it make sense to build a
> chess-playing computer on top of their service? It would be
> interesting to see but I suspect network traffic and delays would be
> prohibitive.
> 
> It's the same story with the trend of taking CSV files, a pretty
> compact and easy to work with representation of tabular data, and
> expanding them into giant RDF datasets that take up a lot of disk
> space and are cumbersome to query. A service to refer to a cell in a
> spreadsheet, to give it a URI and return some small amount of data
> would be useful. Proactively materialising the whole thing (not
> infinite but in some cases still very large) is probably not.
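
A rough sketch of such a cell-level service, reduced to two functions. The URI layout (base/row/N/col/NAME), the sample table, and the choice of rdf:value as the property are all assumptions for illustration; nothing here describes an existing service.

```python
import csv
import io

RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"

# Made-up sample data standing in for a CSV file on disk.
SAMPLE = "name,hex\nred,#ff0000\ngreen,#00ff00\n"


def cell(csv_text: str, row: int, column: str) -> str:
    """Return one cell; rows are numbered from 1, not counting the header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, record in enumerate(reader, start=1):
        if index == row:
            return record[column]
    raise IndexError(f"no row {row}")


def describe_cell(base: str, csv_text: str, row: int, column: str) -> str:
    """Build a single N-Triples-style line for the requested cell."""
    value = cell(csv_text, row, column)
    # Escape backslashes and quotes so the literal stays well-formed.
    literal = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{base}/row/{row}/col/{column}> <{RDF_VALUE}> "{literal}" .'
```

The CSV stays as it is; a triple is only produced for the cell somebody actually asks about.
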
> 
> ] Is that not a better way of doing what you want, William?
> ] Bring up a simple site that actually has http://example.org/issn/1234-5678 
> ] or perhaps more appropriately something like http://totl.net/issn/1234-5678
> ] which actually resolves to some (generated) RDF snippet that is
> ] sensible.
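
A minimal sketch of what "generated" could mean here, assuming a hypothetical http://example.org/issn/ namespace and assuming the snippet simply links the HTTP URI to the urn:issn: form with owl:sameAs. The check digit test follows the standard ISSN scheme (weights 8 down to 2, modulo 11, with X standing for 10); whether a real service should reject bad check digits is a design choice, not something settled in this thread.

```python
import re

ISSN_PATTERN = re.compile(r"^([0-9]{4})-([0-9]{3})([0-9X])$")
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def issn_is_valid(issn: str) -> bool:
    """Check the NNNN-NNNC shape and the modulo-11 check digit."""
    match = ISSN_PATTERN.match(issn)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return match.group(3) == expected


def issn_snippet(issn: str, base: str = "http://example.org/issn/") -> str:
    """Generate a one-line RDF snippet for an ISSN; nothing is stored."""
    if not issn_is_valid(issn):
        raise ValueError(f"not a valid ISSN: {issn}")
    return f"<{base}{issn}> <{OWL_SAME_AS}> <urn:issn:{issn}> ."
```

Note that the placeholder 1234-5678 used in the example URIs fails the check digit test (1234-5679 would pass), so a validating service would refuse it.
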
> 
> So quite reasonable, and I believe but am not certain that
> crossref.org has already done exactly this for ISSNs (but that just
> points to urn:issn:.... and linked periodicals) but then for this to
> be useful (according to my use case) I would have to convince
> all/many/most dataset authors to refer to these URIs of mine. Maybe
> crossref.org will become this service for ISSNs. If that happens I
> will be the first to agree that this is far better than URN. At that
> point it becomes an actual dataset instead of a supposed one. But in
> the meantime...
> 
> Cheers,
> -w
> -- 
> William Waites                <mailto:ww@styx.org>
> http://river.styx.org/ww/        <sip:ww@styx.org>
> F4B3 39BF E775 CF42 0BAB  3DF0 BE40 A6DF B06F FD45

-- 
Hugh Glaser,  
              Intelligence, Agents, Multimedia
              School of Electronics and Computer Science,
              University of Southampton,
              Southampton SO17 1BJ
Work: +44 23 8059 3670, Fax: +44 23 8059 3045
Mobile: +44 75 9533 4155 , Home: +44 23 8061 5652
http://www.ecs.soton.ac.uk/~hg/

Received on Monday, 23 May 2011 22:46:29 UTC