Re: How do you explore a SPARQL Endpoint?

From: Pavel Klinov <pavel.klinov@uni-ulm.de>
Date: Sun, 25 Jan 2015 22:53:52 +0100
Message-ID: <CAG5JQxVrv16maxk0WzcvQorDnoqO7CGM7yZ4hTUx05w7VLj6gA@mail.gmail.com>
To: Bernard Vatant <bernard.vatant@mondeca.com>
Cc: Pavel Klinov <pavel.klinov@uni-ulm.de>, Juan Sequeda <juanfederico@gmail.com>, Semantic Web <semantic-web@w3.org>, public-lod public <public-lod@w3.org>

Hi Bernard,

On Fri, Jan 23, 2015 at 11:28 AM, Bernard Vatant
<bernard.vatant@mondeca.com> wrote:
> Hi Pavel
> Maybe what you are missing is that RDF data, by design, do not need a
> schema.

Right, I am aware of that. I think it's important to separate the
absence of the schema from the absence of an explicit representation
of the schema.

Most real-life datasets have some structure which reflects what the
data is all about. Knowing such structure is useful (and often
necessary) for writing meaningful queries, and that, I think, is what
the initial question was about. When such structure exists, I'd say
that the dataset has an *implicit* schema (or a conceptual model, if
you will). What is absent is an explicit representation of the schema,
or the conceptual model, in terms of RDFS, OWL, or SKOS axioms.
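
To make "implicit schema" a bit more concrete, here is a minimal
sketch in Python of recovering the vocabulary (the types and
predicates actually used) from plain data triples. The function name
implicit_vocabulary and the representation of triples as tuples of
prefixed-name strings are assumptions made for illustration only, not
part of any tool or spec.

```python
RDF_TYPE = "rdf:type"  # assumed shorthand for the rdf:type predicate


def implicit_vocabulary(triples):
    """Collect the classes and predicates used in an iterable of (s, p, o) triples.

    Hypothetical helper: objects of rdf:type count as classes, and every
    other predicate counts as part of the vocabulary.
    """
    classes = set()
    predicates = set()
    for _subject, predicate, obj in triples:
        if predicate == RDF_TYPE:
            classes.add(obj)
        else:
            predicates.add(predicate)
    return classes, predicates
```

Even with no RDFS or OWL axioms in the data, the result describes the
structure a query author needs to know about.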

Then, of course, you're right that the RDF spec doesn't mandate that
the schema be explicitly represented (in other words, that the
structure be explicitly modeled). Which is fine.

However, when the schema *is* represented explicitly, knowing it is a
huge help to users who otherwise know little about the data. It's
especially important for public data endpoints. Several projects aimed
at analyzing the structure of LOD, e.g. ProLOD++, would benefit from
being able to request the schema from those datasets which i)
represent it explicitly and ii) manage it separately from the data and
thus can service such requests efficiently. As I said earlier, ii) also
makes sense for other reasons.

What is missing is a simple protocol for asking "is your data's
structure modeled explicitly? If yes, please give me the schema
triples". Or at least the vocabulary used in the data. Instead,
everyone just comes up with their own exploratory SPARQL queries,
which would seem like unnecessary work if there were a simpler
question to ask.
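
For illustration, the kind of ad hoc exploratory queries I mean could
look roughly like the ones below. This is only a sketch in Python
which builds SPARQL query strings; the function name
exploratory_queries, the dictionary keys, and the default LIMIT are
assumptions of mine, and none of this is a standard protocol.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def exploratory_queries(limit=100):
    """Return a dict mapping a short name to a SPARQL query string.

    Hypothetical helper: the queries list the types in use, the
    predicates in use, and any explicitly stated RDFS schema triples.
    """
    return {
        # Which classes have instances?
        "classes": (
            f"SELECT DISTINCT ?class WHERE {{ ?s a ?class }} LIMIT {limit}"
        ),
        # Which predicates occur in the data?
        "predicates": (
            f"SELECT DISTINCT ?p WHERE {{ ?s ?p ?o }} LIMIT {limit}"
        ),
        # Are there explicit schema triples (subclass, domain, range, ...)?
        "schema_triples": (
            f"PREFIX rdfs: <{RDFS}> "
            f"CONSTRUCT {{ ?s ?p ?o }} WHERE {{ "
            f"VALUES ?p {{ rdfs:subClassOf rdfs:subPropertyOf "
            f"rdfs:domain rdfs:range }} "
            f"?s ?p ?o }} LIMIT {limit}"
        ),
    }
```

The first two queries may force the endpoint to scan the data, which
is exactly the work a dedicated schema request could avoid.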


PPS. It'd also be correct to claim that even when a structure exists,
realistic data can be messy and not fit into it entirely. We've seen
stuff like literals in the range of object properties, etc. It's a
separate issue having to do with validation, for which there's an
ongoing effort at W3C. However, it doesn't generally hinder writing
queries, which is what we're discussing here.
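
As a rough illustration of that kind of mess, the sketch below flags
triples where an object property has a literal value. It is Python
over in-memory tuples; the Literal wrapper, the function name
literal_range_violations, and the idea that the set of object
properties is already known (e.g. from an explicit schema) are all
assumptions for the example, not how any particular validator works.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """Hypothetical marker type for literal objects; IRIs stay plain strings."""
    value: str


def literal_range_violations(triples, object_properties):
    """Return the triples whose predicate is a known object property
    but whose object is a Literal rather than an IRI string."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if p in object_properties and isinstance(o, Literal)
    ]
```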

> Since the very notion of schema for RDF data has no meaning at all,
> and the absence of schema is a bit frightening, people tend to give it a lot
> of possible meanings, depending on your closed world or open world
> assumption, otherwise said if the "schema" will be used for some kind of
> inference or validation. The use of "Schema" in RDFS has done nothing to
> clarify this, and the use of "Ontology" in OWL added a layer of confusion. I
> tend to say "vocabulary" to name the set of types and predicates used by a
> dataset (like in Linked Open Vocabularies), which is a minimal commitment to
> how it is considered by the dataset owner, bearing in mind that this
> "vocabulary" is generally a mix of imported terms from SKOS, FOAF, Dublin
> Core ... and home-made ones. Which is completely OK with the spirit of RDF.
> The brand new LDOM [1] or whatever it ends up to be named at the end of the
> day might clarify the situation, or muddle those waters a bit more :)
> [1] http://spinrdf.org/ldomprimer.html
> 2015-01-23 10:37 GMT+01:00 Pavel Klinov <pavel.klinov@uni-ulm.de>:
>> Alright, so this isn't an answer and I might be saying something
>> totally silly (since I'm not a Linked Data person, really).
>> If I re-phrase this question as the following: "how do I extract a
>> schema from a SPARQL endpoint?", then it seems to pop up quite often
>> (see, e.g., [1]). I understand that the original question is a bit
>> more general but it's fair to say that knowing the schema is a huge
>> help for writing meaningful queries.
>> As an outsider, I'm quite surprised that there's still no commonly
>> accepted (i'm avoiding "standard" here) way of doing this. People
>> either hope that something like VoID or LOV vocabularies are being
>> used, or use 3-party tools, or write all sorts of ad hoc SPARQL
>> queries themselves, looking for types, object properties,
>> domains/ranges etc-etc. There are also papers written on this subject.
>> At the same time, the database engines which host datasets often (not
>> always) manage the schema separately from the data. There're good
>> reasons for that. One reason, for example, is to be able to support
>> basic reasoning over the data, or integrity validation. Just because
>> in RDF the schema language and the data language are the same, so
>> schema and data triples can be interleaved, it need not (and often
>> not) be managed that way.
>> Yet, there's no standard way of requesting the schema from the
>> endpoint, and I don't quite understand why. There's the SPARQL 1.1
>> Service Description, which could, in theory, cover it, but it doesn't.
>> Servicing such schema extraction requests doesn't have to be mandatory
>> so the endpoints which don't have their schemas right there don't have
>> to sift through the data. Also, schemas are typically quite small.
>> I guess there's some problem with this which I'm missing...
>> Thanks,
>> Pavel
>> [1]
>> http://answers.semanticweb.com/questions/25696/extract-ontology-schema-for-a-given-sparql-endpoint-data-set
>> On Thu, Jan 22, 2015 at 3:09 PM, Juan Sequeda <juanfederico@gmail.com>
>> wrote:
>> > Assume you are given a URL for a SPARQL endpoint. You have no idea what
>> > data
>> > is being exposed.
>> >
>> > What do you do to explore that endpoint? What queries do you write?
>> >
>> > Juan Sequeda
>> > +1-575-SEQ-UEDA
>> > www.juansequeda.com
Received on Sunday, 25 January 2015 21:54:28 UTC

