Re: How do you explore a SPARQL Endpoint?

This work [1] might be helpful to some people. It automatically learns a
"schema" from a given RDF dataset, including the most probable classes and
properties and the most probable relations/paths between given classes.
Next, it can automatically translate a casual user's intuitive graph query
or schema-free query into a formal SPARQL query using the learned schema and
statistical NLP techniques such as textual semantic similarity.
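
To give a rough feel for what "learning a schema" from instance data can
mean, here is a minimal sketch. It is not the method from [1]; it only
counts classes, properties, and class-to-class relations and turns the
counts into relative frequencies. The function names, the "ex:" terms, and
the toy triples are all made up for illustration, and literals are treated
as plain strings.

```python
from collections import Counter, defaultdict

RDF_TYPE = "rdf:type"

# Toy dataset (hypothetical): triples as (subject, predicate, object) strings.
TRIPLES = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:paper1", RDF_TYPE, "ex:Paper"),
    ("ex:paper2", RDF_TYPE, "ex:Paper"),
    ("ex:alice", "ex:wrote", "ex:paper1"),
    ("ex:bob", "ex:wrote", "ex:paper1"),
    ("ex:bob", "ex:wrote", "ex:paper2"),
    ("ex:alice", "ex:reviewed", "ex:paper2"),
    ("ex:alice", "ex:name", "Alice"),
]


def learn_schema(triples):
    """Count classes, properties, and (class, property, class) relations."""
    # First pass: record the declared types of every subject.
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    class_counts = Counter()
    for classes in types.values():
        class_counts.update(classes)

    # Second pass: count property usage and typed subject/object pairs.
    property_counts = Counter()
    relation_counts = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        property_counts[p] += 1
        for s_class in types.get(s, ()):
            for o_class in types.get(o, ()):
                relation_counts[(s_class, p, o_class)] += 1
    return class_counts, property_counts, relation_counts


def relation_probabilities(relation_counts, subject_class, object_class):
    """Relative frequency of each property linking the two given classes."""
    matching = {
        p: n
        for (s_class, p, o_class), n in relation_counts.items()
        if s_class == subject_class and o_class == object_class
    }
    total = sum(matching.values())
    return {p: n / total for p, n in matching.items()}


if __name__ == "__main__":
    classes, properties, relations = learn_schema(TRIPLES)
    print(classes.most_common())
    print(properties.most_common())
    print(relation_probabilities(relations, "ex:Person", "ex:Paper"))
```

Against a real endpoint the same counts would come from aggregate queries
rather than from an in-memory list, but the idea of ranking relations
between two classes by frequency is the same.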

[1]
http://ebiquity.umbc.edu/paper/html/id/658/Schema-Free-Querying-of-Semantic-Data


Cheers,

Lushan

On Sun, Jan 25, 2015 at 11:32 PM, Pavel Klinov <pavel.klinov@uni-ulm.de>
wrote:

> On Sun, Jan 25, 2015 at 11:44 PM, Bernard Vatant
> <bernard.vatant@mondeca.com> wrote:
> > Hi Pavel
> >
> > Very interesting discussion, thanks for the follow-up. Some quick answers
> > below, but I'm currently writing a blog post which will go into more
> > detail on the notion of Data Patterns, a term I've been pushing last
> > week on the DC Architecture list, where it seems to have gained some
> > traction. See
> > https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind1501&L=dc-architecture
> > for the discussion.
>
> OK, thanks for the link, will check it out. I agree that "patterns"
> is perhaps a better term than "schema" since by the latter people
> typically mean explicit specification. I guess it's my use of the term
> "schema" which created some confusion initially.
>
> >> ... which reflects what the
> >> data is all about. Knowing such structure is useful (and often
> >> necessary) to be able to write meaningful queries and that's, I think,
> >> what the initial question was.
> >
> >
> > Certainly, and I would rewrite this question: How do you find out data
> > patterns in a dataset?
>
> I think it's a more general and tough question having to do with data
> mining. Not sure that anyone would venture into finding out data
> patterns against a public endpoint just to be able to write queries
> for it.
>
> >
> >>
> >> When such structure exists, I'd say
> >> that the dataset has an *implicit* schema (or a conceptual model, if
> >> you will).
> >
> >
> > Well, that's where I don't follow. If data, as it happens more and more,
> > is gathered from heterogeneous sources, the very notion of a conceptual
> > model is jumping to conclusions.
>
> A merger of structures is still a structure. But anyway, I've already
> agreed to say patterns =)
>
> > In natural languages, patterns often precede the
> > grammar describing them, even if the patterns described in the grammar at
> > some point become prescriptive rules. Data should be looked at the same
> > way.
>
> Not sure. I won't immediately disagree since I don't have statistics
> regarding structured/unstructured datasets out there.
>
> >>
> >> What is absent is an explicit representation of the schema,
> >> or the conceptual model, in terms of RDFS, OWL, or SKOS axioms.
> >
> >
> > When the dataset gathers various sources and various vocabularies, such a
> > schema does not exist, actually.
>
> Not necessarily. Parts of it may exist. Take YAGO, for example. It's
> derived from a bunch of sources including Wikipedia and GeoNames and
> yet offers its schema for a separate download.
>
> >> However, when the schema *is* represented explicitly, knowing it is a
> >> huge help to users who otherwise know little about the data.
> >
> >
> > OK, but the question is: which is a good format for exposing this
> > structure?
> > RDFS/OWL ontology/vocabulary, Application Profiles, RDF Shapes / whatever
> > it will be named, or ... ?
>
> I think this question is a bit secondary. If the need were recognized,
> this could be, at least in theory, agreed on.
>
> >>
> >> PPS. It'd also be correct to claim that even when a structure exists,
> >> realistic data can be messy and not fit into it entirely. We've seen
> >> stuff like literals in the range of object properties, etc. It's a
> >> separate issue having to do with validation, for which there's an
> >> ongoing effort at W3C. However, it doesn't generally hinder writing
> >> queries which is what we're discussing here.
> >
> >
> > Well, I don't see it as a separate issue. All the raging debate around RDF
> > Shapes is not (yet) about validation, but about the definition of what a
> > shape/structure/schema can be.
>
> OK, won't disagree on this.
>
> Thanks,
> Pavel
>
> >
> >
> >>
> >> > Since the very notion of schema for RDF data has no meaning at all,
> >> > and the absence of schema is a bit frightening, people tend to give it
> >> > a lot of possible meanings, depending on your closed world or open
> >> > world assumption, in other words on whether the "schema" will be used
> >> > for some kind of inference or validation. The use of "Schema" in RDFS
> >> > has done nothing to clarify this, and the use of "Ontology" in OWL
> >> > added a layer of confusion. I tend to say "vocabulary" to name the set
> >> > of types and predicates used by a dataset (like in Linked Open
> >> > Vocabularies), which is a minimal commitment to how it is considered
> >> > by the dataset owner, bearing in mind that this "vocabulary" is
> >> > generally a mix of imported terms from SKOS, FOAF, Dublin Core ... and
> >> > home-made ones. Which is completely OK with the spirit of RDF.
> >> >
> >> > The brand new LDOM [1] or whatever it ends up being named at the end
> >> > of the day might clarify the situation, or muddle those waters a bit
> >> > more :)
> >> >
> >> > [1] http://spinrdf.org/ldomprimer.html
> >> >
> >> > 2015-01-23 10:37 GMT+01:00 Pavel Klinov <pavel.klinov@uni-ulm.de>:
> >> >>
> >> >> Alright, so this isn't an answer and I might be saying something
> >> >> totally silly (since I'm not a Linked Data person, really).
> >> >>
> >> >> If I re-phrase this question as the following: "how do I extract a
> >> >> schema from a SPARQL endpoint?", then it seems to pop up quite often
> >> >> (see, e.g., [1]). I understand that the original question is a bit
> >> >> more general but it's fair to say that knowing the schema is a huge
> >> >> help for writing meaningful queries.
> >> >>
> >> >> As an outsider, I'm quite surprised that there's still no commonly
> >> >> accepted (I'm avoiding "standard" here) way of doing this. People
> >> >> either hope that something like VoID or LOV vocabularies are being
> >> >> used, or use third-party tools, or write all sorts of ad hoc SPARQL
> >> >> queries themselves, looking for types, object properties,
> >> >> domains/ranges, etc. There are also papers written on this subject.
> >> >>
> >> >> At the same time, the database engines which host datasets often (not
> >> >> always) manage the schema separately from the data. There're good
> >> >> reasons for that. One reason, for example, is to be able to support
> >> >> basic reasoning over the data, or integrity validation. Just because
> >> >> in RDF the schema language and the data language are the same, so
> >> >> that schema and data triples can be interleaved, they need not be
> >> >> (and often are not) managed that way.
> >> >>
> >> >> Yet, there's no standard way of requesting the schema from the
> >> >> endpoint, and I don't quite understand why. There's the SPARQL 1.1
> >> >> Service Description, which could, in theory, cover it, but it
> >> >> doesn't. Servicing such schema extraction requests doesn't have to be
> >> >> mandatory, so the endpoints which don't have their schemas right there
> >> >> don't have to sift through the data. Also, schemas are typically
> >> >> quite small.
> >> >>
> >> >> I guess there's some problem with this which I'm missing...
> >> >>
> >> >> Thanks,
> >> >> Pavel
> >> >>
> >> >> [1]
> >> >> http://answers.semanticweb.com/questions/25696/extract-ontology-schema-for-a-given-sparql-endpoint-data-set
> >> >>
> >> >> On Thu, Jan 22, 2015 at 3:09 PM, Juan Sequeda <juanfederico@gmail.com>
> >> >> wrote:
> >> >> > Assume you are given a URL for a SPARQL endpoint. You have no idea
> >> >> > what data is being exposed.
> >> >> >
> >> >> > What do you do to explore that endpoint? What queries do you write?
> >> >> >
> >> >> > Juan Sequeda
> >> >> > +1-575-SEQ-UEDA
> >> >> > www.juansequeda.com
> >> >>
> >> >
> >> >
> >> >
> >> >
> >
> >
> > --
> > Bernard Vatant
> > Vocabularies & Data Engineering
> > Tel :  + 33 (0)9 71 48 84 59
> > Skype : bernard.vatant
> > http://google.com/+BernardVatant
> > --------------------------------------------------------
> > Mondeca
> > 35 boulevard de Strasbourg 75010 Paris
> > www.mondeca.com
> > Follow us on Twitter : @mondecanews
> > ----------------------------------------------------------
>
>

Received on Wednesday, 4 February 2015 19:39:23 UTC