Re: How do you explore a SPARQL Endpoint? from Michael F Uschold on 2015-02-04 (semantic-web@w3.org from February 2015)

From: Michael F Uschold <uschold@gmail.com>
Date: Wed, 4 Feb 2015 12:42:37 -0800
To: Lushan Han <lushan1@umbc.edu>
Cc: Pavel Klinov <pavel.klinov@uni-ulm.de>, Bernard Vatant <bernard.vatant@mondeca.com>, Juan Sequeda <juanfederico@gmail.com>, Semantic Web <semantic-web@w3.org>, public-lod public <public-lod@w3.org>
Message-ID: <CADfiEMORvyjgQWJFNCTz2sCq8dtiQ3VSagqxBn0WS_VNXH6eCQ@mail.gmail.com>
We had occasion to need this ability on an application we were building for
a client using a triple store (TS). Triples were being created by scripts
and loaded into the TS; we also had an application that allowed users to
enter information, which added more triples. All of this was backed by an
ontology that was evolving. It was pretty tricky knowing which parts of the
ontology were being exercised and which were not. So we wrote some SPARQL
queries that produced a table where each row said something like this:
There are 543 triples where the subject is of type Person, the predicate is
employedBy, and the object is of type Organization.
A row looked like this:

Subject
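
A minimal sketch of a usage report along these lines, under stated
assumptions. The actual queries are not included in this message, so the
SPARQL text in USAGE_QUERY below is only one plausible formulation using
standard SPARQL 1.1 aggregates. The names USAGE_QUERY, usage_table, describe,
and SAMPLE, as well as the ex: terms, are hypothetical. The Python function
computes the same counts over an in-memory list of triples so the logic can
be checked without an endpoint.

```python
from collections import Counter, defaultdict

RDF_TYPE = "rdf:type"

# One plausible SPARQL 1.1 formulation (an assumption, not the original query).
# A triple is counted once per (subject type, object type) combination, and
# triples whose object is a literal or an untyped resource are left out.
USAGE_QUERY = """
SELECT ?subjectType ?predicate ?objectType (COUNT(*) AS ?triples)
WHERE {
  ?s ?predicate ?o .
  ?s a ?subjectType .
  ?o a ?objectType .
}
GROUP BY ?subjectType ?predicate ?objectType
ORDER BY DESC(?triples)
"""


def usage_table(triples):
    """Count triples per (subject type, predicate, object type) pattern."""
    # First pass: collect the declared types of every typed resource.
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
    # Second pass: count each triple once per type combination,
    # mirroring the join performed by USAGE_QUERY.
    counts = Counter()
    for s, p, o in triples:
        for subject_type in types.get(s, ()):
            for object_type in types.get(o, ()):
                counts[(subject_type, p, object_type)] += 1
    return counts


def describe(counts):
    """Render each pattern as a sentence, most frequent pattern first."""
    return [
        f"There are {n} triples where the subject is of type {st}, "
        f"the predicate is {p}, and the object is of type {ot}."
        for (st, p, ot), n in counts.most_common()
    ]


# Hypothetical sample data, for illustration only.
SAMPLE = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:acme", RDF_TYPE, "ex:Organization"),
    ("ex:alice", "ex:employedBy", "ex:acme"),
    ("ex:bob", "ex:employedBy", "ex:acme"),
    ("ex:alice", "ex:name", "Alice"),
]

if __name__ == "__main__":
    for line in describe(usage_table(SAMPLE)):
        print(line)
```

Patterns that never show up in such a table point at parts of the ontology
that the data does not exercise. On a large store the query may be expensive,
so restricting it to one named graph or adding a LIMIT may be worth
considering.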

On Wed, Feb 4, 2015 at 11:35 AM, Lushan Han <lushan1@umbc.edu> wrote:

> This work [1] might be helpful to some people. It automatically learns a
> "schema" from a given RDF dataset, including the most probable classes and
> properties, the most probable relations/paths between given classes, etc.
> Next, it can automatically translate a casual user's intuitive graph query
> or schema-free query to a formal SPARQL query using the learned schema and
> statistical NLP techniques, like textual semantic similarity.
>
> [1] http://ebiquity.umbc.edu/paper/html/id/658/Schema-Free-Querying-of-Semantic-Data
>
>
> Cheers,
>
> Lushan
>
> On Sun, Jan 25, 2015 at 11:32 PM, Pavel Klinov <pavel.klinov@uni-ulm.de>
> wrote:
>
>> On Sun, Jan 25, 2015 at 11:44 PM, Bernard Vatant
>> <bernard.vatant@mondeca.com> wrote:
>> > Hi Pavel
>> >
>> > Very interesting discussion, thanks for the follow-up. Some quick answers
>> > below, but I'm currently writing a blog post which will go into more detail
>> > on the notion of Data Patterns, a term I've been pushing last week on the DC
>> > Architecture list, where it seems to have gained some traction.
>> > See https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind1501&L=dc-architecture
>> > for the discussion.
>>
>> OK, thanks for the link, will check it out. I agree that "patterns"
>> is perhaps a better term than "schema" since by the latter people
>> typically mean an explicit specification. I guess it's my use of the term
>> "schema" which created some confusion initially.
>>
>> >> ... which reflects what the
>> >> data is all about. Knowing such structure is useful (and often
>> >> necessary) to be able to write meaningful queries and that's, I think,
>> >> what the initial question was.
>> >
>> >
>> > Certainly, and I would rewrite this question : How do you find out data
>> > patterns in a dataset?
>>
>> I think it's a more general and tough question having to do with data
>> mining. Not sure that anyone would venture into finding out data
>> patterns against a public endpoint just to be able to write queries
>> for it.
>>
>> >
>> >>
>> >> When such structure exists, I'd say
>> >> that the dataset has an *implicit* schema (or a conceptual model, if
>> >> you will).
>> >
>> >
>> > Well, that's where I don't follow. If data, as it happens more and more, is
>> > gathered from heterogeneous sources, the very notion of a conceptual model
>> > is jumping to conclusions.
>>
>> A merger of structures is still a structure. But anyway, I've already
>> agreed to say patterns =)
>>
>> > In natural languages, patterns often precede the
>> > grammar describing them, even if the patterns described in the grammar at
>> > some point become prescriptive rules. Data should be looked at the same way.
>>
>> Not sure. I won't immediately disagree since I don't have statistics
>> regarding structured/unstructured datasets out there.
>>
>> >>
>> >> What is absent is an explicit representation of the schema,
>> >> or the conceptual model, in terms of RDFS, OWL, or SKOS axioms.
>> >
>> >
>> > When the dataset gathers various sources and various vocabularies, such a
>> > schema does not exist, actually.
>>
>> Not necessarily. Parts of it may exist. Take yago, for example. It's
>> derived from a bunch of sources including Wikipedia and GeoNames and
>> yet offers its schema for a separate download.
>>
>> >> However, when the schema *is* represented explicitly, knowing it is a
>> >> huge help to users who otherwise know little about the data.
>> >
>> >
>> > OK, but the question is : which is a good format for exposing this
>> > structure?
>> > RDFS/OWL ontology/vocabulary, Application Profiles, RDF Shapes / whatever it
>> > will be named, or ... ?
>>
>> I think this question is a bit secondary. If the need were recognized,
>> this could be, at least in theory, agreed on.
>>
>> >>
>> >> PPS. It'd also be correct to claim that even when a structure exists,
>> >> realistic data can be messy and not fit into it entirely. We've seen
>> >> stuff like literals in the range of object properties, etc. It's a
>> >> separate issue having to do with validation, for which there's an
>> >> ongoing effort at W3C. However, it doesn't generally hinder writing
>> >> queries which is what we're discussing here.
>> >
>> >
>> > Well I don't see it as a separate issue. All the raging debate around RDF
>> > Shapes is not (yet) about validation, but on the definition of what a
>> > shape/structure/schema can be.
>>
>> OK, won't disagree on this.
>>
>> Thanks,
>> Pavel
>>
>> >
>> >
>> >>
>> >> > Since the very notion of schema for RDF data has no meaning at all,
>> >> > and the absence of schema is a bit frightening, people tend to give it a
>> >> > lot of possible meanings, depending on your closed world or open world
>> >> > assumption, otherwise said if the "schema" will be used for some kind of
>> >> > inference or validation. The use of "Schema" in RDFS has done nothing to
>> >> > clarify this, and the use of "Ontology" in OWL added a layer of
>> >> > confusion. I tend to say "vocabulary" to name the set of types and
>> >> > predicates used by a dataset (like in Linked Open Vocabularies), which is
>> >> > a minimal commitment to how it is considered by the dataset owner, bearing
>> >> > in mind that this "vocabulary" is generally a mix of imported terms from
>> >> > SKOS, FOAF, Dublin Core ... and home-made ones. Which is completely OK
>> >> > with the spirit of RDF.
>> >> >
>> >> > The brand new LDOM [1] or whatever it ends up to be named at the end of
>> >> > the day might clarify the situation, or muddle those waters a bit more :)
>> >> >
>> >> > [1] http://spinrdf.org/ldomprimer.html
>> >> >
>> >> > 2015-01-23 10:37 GMT+01:00 Pavel Klinov <pavel.klinov@uni-ulm.de>:
>> >> >>
>> >> >> Alright, so this isn't an answer and I might be saying something
>> >> >> totally silly (since I'm not a Linked Data person, really).
>> >> >>
>> >> >> If I re-phrase this question as the following: "how do I extract a
>> >> >> schema from a SPARQL endpoint?", then it seems to pop up quite often
>> >> >> (see, e.g., [1]). I understand that the original question is a bit
>> >> >> more general but it's fair to say that knowing the schema is a huge
>> >> >> help for writing meaningful queries.
>> >> >>
>> >> >> As an outsider, I'm quite surprised that there's still no commonly
>> >> >> accepted (I'm avoiding "standard" here) way of doing this. People
>> >> >> either hope that something like VoID or LOV vocabularies are being
>> >> >> used, or use third-party tools, or write all sorts of ad hoc SPARQL
>> >> >> queries themselves, looking for types, object properties,
>> >> >> domains/ranges, etc. There are also papers written on this subject.
>> >> >>
>> >> >> At the same time, the database engines which host datasets often (not
>> >> >> always) manage the schema separately from the data. There are good
>> >> >> reasons for that. One reason, for example, is to be able to support
>> >> >> basic reasoning over the data, or integrity validation. Just because
>> >> >> in RDF the schema language and the data language are the same, so
>> >> >> schema and data triples can be interleaved, they need not be (and
>> >> >> often are not) managed that way.
>> >> >>
>> >> >> Yet, there's no standard way of requesting the schema from the
>> >> >> endpoint, and I don't quite understand why. There's the SPARQL 1.1
>> >> >> Service Description, which could, in theory, cover it, but it doesn't.
>> >> >> Servicing such schema extraction requests doesn't have to be mandatory
>> >> >> so the endpoints which don't have their schemas right there don't have
>> >> >> to sift through the data. Also, schemas are typically quite small.
>> >> >>
>> >> >> I guess there's some problem with this which I'm missing...
>> >> >>
>> >> >> Thanks,
>> >> >> Pavel
>> >> >>
>> >> >> [1] http://answers.semanticweb.com/questions/25696/extract-ontology-schema-for-a-given-sparql-endpoint-data-set
>> >> >>
>> >> >> On Thu, Jan 22, 2015 at 3:09 PM, Juan Sequeda <juanfederico@gmail.com>
>> >> >> wrote:
>> >> >> > Assume you are given a URL for a SPARQL endpoint. You have no idea
>> >> >> > what data is being exposed.
>> >> >> >
>> >> >> > What do you do to explore that endpoint? What queries do you write?
>> >> >> >
>> >> >> > Juan Sequeda
>> >> >> > +1-575-SEQ-UEDA
>> >> >> > www.juansequeda.com
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> >
>> >
>> >
>> > --
>> > Bernard Vatant
>> > Vocabularies & Data Engineering
>> > Tel :  + 33 (0)9 71 48 84 59
>> > Skype : bernard.vatant
>> > http://google.com/+BernardVatant
>> > --------------------------------------------------------
>> > Mondeca
>> > 35 boulevard de Strasbourg 75010 Paris
>> > www.mondeca.com
>> > Follow us on Twitter : @mondecanews
>> > ----------------------------------------------------------
>>
>>
>


-- 

Michael Uschold
   Senior Ontology Consultant, Semantic Arts
   http://www.semanticarts.com
   LinkedIn: http://tr.im/limfu
   Skype, Twitter: UscholdM
Received on Wednesday, 4 February 2015 20:43:11 UTC