Re: How do you explore a SPARQL Endpoint? from Bernard Vatant on 2015-01-25 (public-lod@w3.org from January 2015)

From: Bernard Vatant <bernard.vatant@mondeca.com>
Date: Sun, 25 Jan 2015 23:44:32 +0100
To: Pavel Klinov <pavel.klinov@uni-ulm.de>
Cc: Juan Sequeda <juanfederico@gmail.com>, Semantic Web <semantic-web@w3.org>, public-lod public <public-lod@w3.org>
Message-ID: <CAK4ZFVHNOZD4WfQQr1+Xvc+ACsgDTftktBYfpRtPzb-Hcad90Q@mail.gmail.com>
Hi Pavel

Very interesting discussion, thanks for the follow-up. Some quick answers
below, but I'm currently writing a blog post which will go into more detail
on the notion of Data Patterns, a term I pushed last week on the
DC Architecture list, where it seems to have gained some traction.
See https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind1501&L=dc-architecture
for the discussion.

2015-01-25 22:53 GMT+01:00 Pavel Klinov <pavel.klinov@uni-ulm.de>:

>
> On Fri, Jan 23, 2015 at 11:28 AM, Bernard Vatant
> <bernard.vatant@mondeca.com> wrote:
> > Hi Pavel
> >
> > Maybe what you are missing is that RDF data, by design, do not need a
> > schema.
>
> Right, I am aware of that. I think it's important to separate the
> absence of the schema from the absence of an explicit representation
> of the schema.
>

Well, indeed, but there is no clear line of separation, since there is
neither a standard nor even a consensual definition of "schema" here :)

> Most real-life datasets have some structure...


Indeed, because they are generally transformed from structured data
(databases, XML, whatever). But structure does not mean schema. I would
rather say patterns than structure. Structure carries the implicit notion
of a global, consistent architecture. Pattern is more generic, and better
suited to denote regularities that can occur at various levels of
granularity, and not necessarily everywhere in the data (because of
heterogeneous sources, for example).
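One possible way to make "patterns that hold only in parts of the data" concrete, offered here as an illustrative sketch rather than as the Data Patterns approach described above, is to group subjects by the exact set of predicates they use and count how many subjects share each set. The function name and the toy triples are hypothetical; the toy data mixes two imagined sources that name people with different predicates.

```python
from collections import Counter, defaultdict

# A triple as plain strings: (subject, predicate, object).
Triple = tuple[str, str, str]


def predicate_patterns(triples: list[Triple]) -> Counter:
    """Count how many subjects share each exact set of predicates.

    Each distinct predicate set is treated as one candidate pattern.
    Patterns need not cover the whole dataset: a set used by only a few
    subjects is still reported, just with a small count.
    """
    preds_by_subject: dict[str, set[str]] = defaultdict(set)
    for s, p, _o in triples:
        preds_by_subject[s].add(p)
    return Counter(frozenset(ps) for ps in preds_by_subject.values())


# Hypothetical toy data: two sources with different naming conventions.
triples = [
    ("ex:a", "rdf:type", "foaf:Person"),
    ("ex:a", "foaf:name", '"Alice"'),
    ("ex:b", "rdf:type", "foaf:Person"),
    ("ex:b", "foaf:name", '"Bob"'),
    ("ex:c", "rdf:type", "foaf:Person"),
    ("ex:c", "schema:name", '"Carol"'),
]

for pattern, n in predicate_patterns(triples).most_common():
    print(n, sorted(pattern))
```

Both patterns are reported even though neither covers every Person, which is the distinction from a single global schema. On a real endpoint the same grouping would have to be pushed into SPARQL or sampled, since pulling all triples is rarely feasible.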


> ... which reflects what the
> data is all about. Knowing such structure is useful (and often
> necessary) to be able to write meaningful queries and that's, I think,
> what the initial question was.


Certainly, and I would rephrase this question: how do you discover data
patterns in a dataset?
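A usual first step, sketched below in Python under stated assumptions, is to ask the endpoint which types and predicates it actually uses, and how often. The request follows the SPARQL 1.1 Protocol (HTTP GET with a `query` parameter, results requested as SPARQL JSON); the helper names are hypothetical, and whether a given public endpoint accepts such aggregate queries without timing out is not guaranteed.

```python
import json
import urllib.parse
import urllib.request

# Which classes are used, and by how many instances.
CLASS_COUNTS = """
SELECT ?type (COUNT(?s) AS ?n)
WHERE { ?s a ?type }
GROUP BY ?type
ORDER BY DESC(?n)
LIMIT 100
"""

# Which predicates are used, and in how many triples.
PREDICATE_COUNTS = """
SELECT ?p (COUNT(*) AS ?n)
WHERE { ?s ?p ?o }
GROUP BY ?p
ORDER BY DESC(?n)
LIMIT 100
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request (query passed as a URL parameter)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_counts(results: dict, key: str) -> list[tuple[str, int]]:
    """Turn SPARQL JSON results into (IRI, count) pairs."""
    return [
        (b[key]["value"], int(b["n"]["value"]))
        for b in results["results"]["bindings"]
    ]


def run(endpoint: str, query: str, key: str) -> list[tuple[str, int]]:
    """Send the query and return (IRI, count) pairs. Requires network access."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=60) as resp:
        return parse_counts(json.load(resp), key)
```

For example, `run("https://example.org/sparql", CLASS_COUNTS, "type")` (the URL is a placeholder) would list the most frequent types first; `PREDICATE_COUNTS` with key `"p"` does the same for predicates. Together these give the "vocabulary" in the minimal sense used in this thread, not a schema.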


> When such structure exists, I'd say
> that the dataset has an *implicit* schema (or a conceptual model, if
> you will).


Well, that's where I don't follow. If data is gathered from heterogeneous
sources, as happens more and more, positing a conceptual model at all is
jumping to conclusions. In natural languages, patterns often precede the
grammar describing them, even if the patterns described in the grammar
at some point become prescriptive rules. Data should be looked at the
same way.


> What is absent is an explicit representation of the schema,
> or the conceptual model, in terms of RDFS, OWL, or SKOS axioms.
>

When a dataset gathers various sources and various vocabularies, such a
schema actually does not exist.


> Then, of course, you're right that the RDF spec doesn't mandate that
> the schema is explicitly represented (put in other words, that the
> structure is explicitly modeled). Which is fine.
>

Nobody can disagree on that :)


> However, when the schema *is* represented explicitly, knowing it is a
> huge help to users who otherwise know little about the data.


OK, but the question is: what is a good format for exposing this
structure? An RDFS/OWL ontology or vocabulary, Application Profiles, RDF
Shapes (or whatever it ends up being named), or ...?


> It's especially important for public data endpoints. Several projects
> aimed at analyzing the structure of LOD, e.g. ProLOD++, would benefit
> from being able to request the schema from those datasets which i)
> represent it explicitly and ii) manage it separately from the data, and
> thus can service such requests efficiently. As I said above, ii) also
> makes sense for other reasons.
>

Agreed

> What is missing is a simple protocol for asking "is your data's
> structure modeled explicitly? If yes, please give me the schema
> triples".


Assuming the "schema" is expressed in some idiom of RDF ...

> Or at least the vocabulary used in the data. Instead,
> everyone just comes up with their own exploratory SPARQL queries,
> which would seem like unnecessary work if there were a simpler
> question to ask.
>

Sure.
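In the absence of such a protocol, one partial check, sketched here under assumptions, is whether the dataset publishes a VoID self-description: either as triples in the endpoint itself or at the `/.well-known/void` location that the VoID specification suggests. VoID's `void:vocabulary` and `void:classPartition` give roughly "the vocabulary used in the data" plus per-class statistics. Nothing obliges an endpoint to provide either, and the helper name below is hypothetical.

```python
import urllib.parse

# Look for a VoID description inside the endpoint: which vocabularies the
# dataset declares (void:vocabulary) and which classes it partitions by
# (void:classPartition / void:class), with entity counts when present.
VOID_QUERY = """
PREFIX void: <http://rdfs.org/ns/void#>
SELECT ?dataset ?vocab ?class ?entities
WHERE {
  ?dataset a void:Dataset .
  OPTIONAL { ?dataset void:vocabulary ?vocab }
  OPTIONAL {
    ?dataset void:classPartition ?part .
    ?part void:class ?class .
    OPTIONAL { ?part void:entities ?entities }
  }
}
LIMIT 500
"""


def well_known_void(endpoint: str) -> str:
    """Return the well-known VoID URL on the endpoint's host."""
    # urljoin with an absolute path keeps scheme and host, replaces the path.
    return urllib.parse.urljoin(endpoint, "/.well-known/void")
```

An empty result from `VOID_QUERY` does not mean the data has no structure, only that none is described in VoID, which is where the ad hoc exploratory queries come back in.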


> Cheers,
> Pavel
>
> PPS. It'd also be correct to claim that even when a structure exists,
> realistic data can be messy and not fit into it entirely. We've seen
> stuff like literals in the range of object properties, etc. That's a
> separate issue having to do with validation, for which there's an
> ongoing effort at W3C. However, it doesn't generally hinder writing
> queries, which is what we're discussing here.
>

Well, I don't see it as a separate issue. The whole raging debate around
RDF Shapes is not (yet) about validation, but about the definition of what
a shape/structure/schema can be.



> > Since the very notion of schema for RDF data has no meaning at all,
> > and the absence of schema is a bit frightening, people tend to give it
> > a lot of possible meanings, depending on whether they make a
> > closed-world or open-world assumption, in other words whether the
> > "schema" will be used for some kind of inference or for validation.
> > The use of "Schema" in RDFS has done nothing to clarify this, and the
> > use of "Ontology" in OWL added a layer of confusion. I tend to say
> > "vocabulary" to name the set of types and predicates used by a dataset
> > (as in Linked Open Vocabularies), which is a minimal commitment to how
> > it is considered by the dataset owner, bearing in mind that this
> > "vocabulary" is generally a mix of terms imported from SKOS, FOAF,
> > Dublin Core ... and home-made ones. Which is completely in keeping
> > with the spirit of RDF.
> >
> > The brand new LDOM [1], or whatever it ends up being named at the end
> > of the day, might clarify the situation, or muddle those waters a bit
> > more :)
> >
> > [1] http://spinrdf.org/ldomprimer.html
> >
> > 2015-01-23 10:37 GMT+01:00 Pavel Klinov <pavel.klinov@uni-ulm.de>:
> >>
> >> Alright, so this isn't an answer and I might be saying something
> >> totally silly (since I'm not a Linked Data person, really).
> >>
> >> If I rephrase this question as follows: "how do I extract a
> >> schema from a SPARQL endpoint?", then it seems to pop up quite often
> >> (see, e.g., [1]). I understand that the original question is a bit
> >> more general, but it's fair to say that knowing the schema is a huge
> >> help for writing meaningful queries.
> >>
> >> As an outsider, I'm quite surprised that there's still no commonly
> >> accepted (I'm avoiding "standard" here) way of doing this. People
> >> either hope that something like VoID or LOV vocabularies are being
> >> used, or use third-party tools, or write all sorts of ad hoc SPARQL
> >> queries themselves, looking for types, object properties,
> >> domains/ranges, etc. There are also papers written on this subject.
> >>
> >> At the same time, the database engines which host datasets often (not
> >> always) manage the schema separately from the data. There are good
> >> reasons for that. One reason, for example, is to be able to support
> >> basic reasoning over the data, or integrity validation. Just because
> >> in RDF the schema language and the data language are the same, so
> >> that schema and data triples can be interleaved, they need not be (and
> >> often are not) managed that way.
> >>
> >> Yet, there's no standard way of requesting the schema from the
> >> endpoint, and I don't quite understand why. There's the SPARQL 1.1
> >> Service Description, which could, in theory, cover it, but it doesn't.
> >> Servicing such schema extraction requests needn't be mandatory, so
> >> endpoints which don't have their schemas readily at hand wouldn't have
> >> to sift through the data. Also, schemas are typically quite small.
> >>
> >> I guess there's some problem with this which I'm missing...
> >>
> >> Thanks,
> >> Pavel
> >>
> >> [1] http://answers.semanticweb.com/questions/25696/extract-ontology-schema-for-a-given-sparql-endpoint-data-set
> >>
> >> On Thu, Jan 22, 2015 at 3:09 PM, Juan Sequeda <juanfederico@gmail.com>
> >> wrote:
> >> > Assume you are given a URL for a SPARQL endpoint. You have no idea
> >> > what data is being exposed.
> >> >
> >> > What do you do to explore that endpoint? What queries do you write?
> >> >
> >> > Juan Sequeda
> >> > +1-575-SEQ-UEDA
> >> > www.juansequeda.com

-- 

*Bernard Vatant*
Vocabularies & Data Engineering
Tel :  + 33 (0)9 71 48 84 59
Skype : bernard.vatant
http://google.com/+BernardVatant
--------------------------------------------------------
*Mondeca*
35 boulevard de Strasbourg 75010 Paris
www.mondeca.com
Follow us on Twitter : @mondecanews <http://twitter.com/#%21/mondecanews>
----------------------------------------------------------
Received on Sunday, 25 January 2015 22:45:21 UTC