Re: Minting URIs: how to deal with unknown data structures from Giovanni Tummarello on 2011-04-16 (public-lod@w3.org from April 2011)

From: Giovanni Tummarello <giovanni.tummarello@deri.org>
Date: Sat, 16 Apr 2011 23:36:56 +0200
To: Frans Knibbe <frans.knibbe@geodan.nl>
Cc: public-lod <public-lod@w3.org>
Message-ID: <BANLkTikFVHLE5SxGQVy2j9YE5fhti8An4w@mail.gmail.com>

Hi Frans, my 2c from the Sindice.com point of view (as we struggle
to actually make use of all this, and to make it easy for others to use).

I wouldn't really worry too much.

Just give the machines what you'd give to humans. Technically, that
simply means making sure all the pages you display (and that talk about
your content) have RDFa on them.

So if you have pages for your employees, just add triples to them with
proper markup.
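
As a rough illustration of what "triples with proper markup" could look like on an employee page, here is a minimal Python sketch that renders an HTML fragment carrying FOAF triples as RDFa. It assumes RDFa 1.1 syntax (the `prefix` attribute) and the FOAF vocabulary; the function name `employee_rdfa` and the example URIs and names are hypothetical, not something Sindice prescribes.

```python
from html import escape

# FOAF namespace; the choice of FOAF is an assumption based on the
# original question, which mentions publishing FOAF data.
FOAF = "http://xmlns.com/foaf/0.1/"


def employee_rdfa(uri, name, homepage=None):
    """Return an HTML fragment describing a person, annotated with RDFa.

    `uri` identifies the person; `name` and the optional `homepage`
    become foaf:name and foaf:homepage triples about that URI.
    """
    parts = [
        f'<div prefix="foaf: {FOAF}" about="{escape(uri, quote=True)}" '
        f'typeof="foaf:Person">',
        f'  <span property="foaf:name">{escape(name)}</span>',
    ]
    if homepage:
        # rel + href yields a triple whose object is the linked URI.
        parts.append(
            f'  <a rel="foaf:homepage" '
            f'href="{escape(homepage, quote=True)}">homepage</a>'
        )
    parts.append("</div>")
    return "\n".join(parts)


# Hypothetical employee; the identifier "jane-doe" is made up.
print(employee_rdfa(
    "http://lod.mycompany.com/resource/jane-doe",
    "Jane Doe",
    "http://www.example.org/~jane",
))
```

The fragment can be dropped into an existing page template, so the same page serves both people and machines.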

Use http://inspector.sindice.net to see/inspect how you're doing.

My advice is: make sure the description is rich, as rich as possible,
to enable disambiguation.

--> Ideally you should "reuse other people's URIs, or add sameAs links".
In practice I think this sort of advice is just utopia: I mean, I find
it's asking people with perfectly good data to make a huge effort when
the benefits are really unclear and intangible at this point.
--> In practice I would aim at simply "describing your data very
well", that is, making sure your descriptions are rich and expressive
enough that one could (if needed) easily link them to other datasets.
We wrote a small position paper some time ago which I feel like
recommending [1].

Once this is done, make sure you have a sitemap.xml file to tell the
world what your exposed data is (e.g. your employees, your products,
whatever) and you're set.
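
For what it's worth, a sitemap.xml is easy to generate. The sketch below builds a minimal one with Python's standard library, assuming the plain sitemaps.org 0.9 format with only `<loc>` entries; the function name `build_sitemap` and the page URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(page_urls):
    """Return a sitemap document (as a string) listing the given page URLs."""
    # Serialize the sitemap namespace as the default namespace (no prefix).
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url in page_urls:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        loc = ET.SubElement(url, f"{{{SITEMAP_NS}}}loc")
        loc.text = page_url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages that carry RDFa about employees and products.
pages = [
    "http://lod.mycompany.com/page/jane-doe",
    "http://lod.mycompany.com/page/some-product",
]
print(build_sitemap(pages))
```

Write the result to sitemap.xml at the root of the site and regenerate it whenever pages are added or removed.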

If you change something, search engines (or agents) will simply index
your new structures and eventually make sense of them. Your data
won't be any more "strange" than that of other people, so either we're
smart enough to adapt, or the "Web of Data" beyond the current
"Google rich snippets" or Facebook Open Graph will never be.

Gio


[1] Publishing Data that Links Itself: A Conjecture
G Tummarello, R Delbru - 2010 AAAI Spring Symposium Series, 2010 - aaai.org
http://www.aaai.org/ocs/index.php/SSS/SSS10/paper/download/1189/1467

On Fri, Apr 15, 2011 at 2:48 PM, Frans Knibbe <frans.knibbe@geodan.nl> wrote:
> Hello,
>
> Some newbie questions here...
>
> I have recently come in contact with the concept of Linked Data and I have
> become enthusiastic. I would like to promote the idea within my company (we
> specialize in geographical data) and within my country. I have read the
> excellent Linked Data book (“Linked Data: Evolving the Web into a Global
> Data Space”) and I think I am almost ready to start publishing Linked Data.
> I understand that it is important to get the URIs right, and not have to
> change them later. That is what my questions are about.
>
> I have acquired the first part (authority) of my URIs, let's say it is
> lod.mycompany.com. Now I am faced with the question: How do I come up with a
> URI scheme that will stand the test of time? I think I will start with
> publishing some FOAF data of myself and co-workers. And then hopefully more
> and more data will follow. At this moment I cannot possibly imagine which
> types of data we will publish. They are likely to have some kind of
> geographical component, but that is true for a lot of data. I believe it is
> not possible to come up with any hierarchical structure that will
> accommodate all types of data that might ever be published.
>
> So I think it is best to leave out any indication of data organization in
> the path element of the URI (i.e. http://lod.mycompany.com/people is a bad
> idea). In my understanding, I could use base URIs like
> http://lod.mycompany.com/resource, http://lod.mycompany.com/page and
> http://lod.mycompany.com/data, and then use unique identifiers for all the
> things I want to publish something about. If I understand correctly, I don't
> need the URI to describe the hierarchy of my data because all Linked Data
> are self-describing. Nice.
>
> But then I am faced with the problem: What method do I use to mint my
> identifiers? Those identifiers need to be unique. Should I use a number
> sequence, or a hash function? In those cases the URIs would be uniform and
> give no indication of the type of data. But a number sequence seems unsafe,
> and in the case of a hash function I would still need to make some kind of
> structured choice of input values.
>
> I would welcome any advice on this topic from people who have had some more
> experience with publishing Linked Data.
>
> Regards,
> Frans Knibbe
>
Received on Saturday, 16 April 2011 21:37:25 UTC