Re: How do you explore a SPARQL Endpoint?

Sorry, ignore my prior email; it was sent prematurely.

We had occasion to need the ability to explore a triple store (TS) in an
application we were building for a client. Triples were being created
using scripts and loaded into the TS; we also had an application that
allowed users to enter information, which added more triples. All of this
was backed by an ontology that was evolving. It was pretty tricky knowing
what parts of the ontology were being exercised and which were not. So we
wrote some SPARQL queries that produced a table where each row said
something like this:
There are 543 triples where the subject is of type Person and the
predicate is employedBy and the object is of type Organization.
The table looked a bit like this:

Subject         Predicate      Object          Count
Person          hasEmployer    Organization     2344
Organization    locatedIn      GeoRegion         432

We found this extremely useful: it showed not only exactly what was
being used, and how much, but also what was NOT being used, the latter
being candidates for removal from the ontology. The SPARQL queries are
not simple to write, but they are not too bad either. Some of the other
responses spoke of similar things.
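
To give the flavor, here is a minimal sketch of one such query (one way
to do it, not our exact query; note that an instance with multiple types
gets counted once per type pair, and literal-valued objects, having no
rdf:type, are left out, which is part of why the real queries take care):

  SELECT ?subjType ?pred ?objType (COUNT(*) AS ?count)
  WHERE {
    ?s ?pred ?o .
    ?s a ?subjType .
    ?o a ?objType .
  }
  GROUP BY ?subjType ?pred ?objType
  ORDER BY DESC(?count)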

This is more specialized than the original question, which was to find out
what the ontology was. Here we were more concerned with which parts of
the ontology were being used.
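
For the discovery side of the original question, a simple starting point
(a sketch, assuming the ontology declares its classes in the usual
RDFS/OWL way) is:

  PREFIX owl:  <http://www.w3.org/2002/07/owl#>
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  SELECT DISTINCT ?class
  WHERE { { ?class a owl:Class } UNION { ?class a rdfs:Class } }

though, as the discussion below notes, many datasets carry no such
explicit declarations at all.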

Michael


On Wed, Feb 4, 2015 at 12:42 PM, Michael F Uschold <uschold@gmail.com>
wrote:

> We had occasion to need this ability on an application we were building
> for a client using a triple store (TS). Triples were being created using
> scripts and loaded into the TS; we also had an application that allowed
> users to enter information, which added more triples. All of this was
> backed by an ontology that was evolving. It was pretty tricky knowing
> what parts of the ontology were being exercised and which were not. So
> we wrote some SPARQL queries that produced a table where each row said
> something like this:
> There are 543 triples where the subject is of type Person and the
> predicate is employedBy and the object is of type Organization.
> A row looked like this:
>
> Subject
>
> On Wed, Feb 4, 2015 at 11:35 AM, Lushan Han <lushan1@umbc.edu> wrote:
>
>> This work [1] might be helpful to some people. It automatically learns a
>> "schema" from a given RDF dataset, including the most probable classes
>> and properties and the most probable relations/paths between given
>> classes, etc. It can then automatically translate a casual user's
>> intuitive graph query, or schema-free query, into a formal SPARQL query
>> using the learned schema and statistical NLP techniques such as textual
>> semantic similarity.
>>
>> [1]
>> http://ebiquity.umbc.edu/paper/html/id/658/Schema-Free-Querying-of-Semantic-Data
>>
>>
>> Cheers,
>>
>> Lushan
>>
>> On Sun, Jan 25, 2015 at 11:32 PM, Pavel Klinov <pavel.klinov@uni-ulm.de>
>> wrote:
>>
>>> On Sun, Jan 25, 2015 at 11:44 PM, Bernard Vatant
>>> <bernard.vatant@mondeca.com> wrote:
>>> > Hi Pavel
>>> >
>>> > Very interesting discussion, thanks for the follow-up. Some quick
>>> > answers below, but I'm currently writing a blog post which will go
>>> > into more detail on the notion of Data Patterns, a term I've been
>>> > pushing last week on the DC Architecture list, where it seems to have
>>> > gained some traction. See
>>> > https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind1501&L=dc-architecture
>>> > for the discussion.
>>>
>>> OK, thanks for the link, will check it out. I agree that "patterns"
>>> is perhaps a better term than "schema", since by the latter people
>>> typically mean an explicit specification. I guess it was my use of the
>>> term "schema" which created some confusion initially.
>>>
>>> >> ... which reflects what the
>>> >> data is all about. Knowing such structure is useful (and often
>>> >> necessary) to be able to write meaningful queries and that's, I think,
>>> >> what the initial question was.
>>> >
>>> >
>>> > Certainly, and I would rewrite this question: How do you find out
>>> > data patterns in a dataset?
>>>
>>> I think that's a more general and tougher question, having to do with
>>> data mining. I'm not sure that anyone would venture into finding out
>>> data patterns against a public endpoint just to be able to write
>>> queries for it.
>>>
>>> >
>>> >>
>>> >> When such structure exists, I'd say
>>> >> that the dataset has an *implicit* schema (or a conceptual model, if
>>> >> you will).
>>> >
>>> >
>>> > Well, that's where I don't follow. If data, as happens more and
>>> > more, is gathered from heterogeneous sources, the very notion of a
>>> > conceptual model is jumping to conclusions.
>>>
>>> A merger of structures is still a structure. But anyway, I've already
>>> agreed to say patterns =)
>>>
>>> > In natural languages, patterns often precede the grammar describing
>>> > them, even if the patterns described in the grammar at some point
>>> > become prescriptive rules. Data should be looked at the same way.
>>>
>>> Not sure. I won't immediately disagree since I don't have statistics
>>> regarding structured/unstructured datasets out there.
>>>
>>> >>
>>> >> What is absent is an explicit representation of the schema,
>>> >> or the conceptual model, in terms of RDFS, OWL, or SKOS axioms.
>>> >
>>> >
>>> > When the dataset gathers various sources and various vocabularies,
>>> > such a schema does not exist, actually.
>>>
>>> Not necessarily. Parts of it may exist. Take YAGO, for example. It's
>>> derived from a bunch of sources including Wikipedia and GeoNames and
>>> yet offers its schema for a separate download.
>>>
>>> >> However, when the schema *is* represented explicitly, knowing it is
>>> >> a huge help to users who otherwise know little about the data.
>>> >
>>> >
>>> > OK, but the question is: which is a good format for exposing this
>>> > structure? RDFS/OWL ontology/vocabulary, Application Profiles, RDF
>>> > Shapes / whatever it will be named, or ... ?
>>>
>>> I think this question is a bit secondary. If the need were recognized,
>>> this could be, at least in theory, agreed on.
>>>
>>> >>
>>> >> PPS. It'd also be correct to claim that even when a structure exists,
>>> >> realistic data can be messy and not fit into it entirely. We've seen
>>> >> stuff like literals in the range of object properties, etc. It's a
>>> >> separate issue having to do with validation, for which there's an
>>> >> ongoing effort at W3C. However, it doesn't generally hinder writing
>>> >> queries which is what we're discussing here.
>>> >
>>> >
>>> > Well, I don't see it as a separate issue. All the raging debate
>>> > around RDF Shapes is not (yet) about validation, but about the
>>> > definition of what a shape/structure/schema can be.
>>>
>>> OK, won't disagree on this.
>>>
>>> Thanks,
>>> Pavel
>>>
>>> >
>>> >
>>> >>
>>> >> > Since the very notion of schema for RDF data has no meaning at
>>> >> > all, and the absence of schema is a bit frightening, people tend
>>> >> > to give it a lot of possible meanings, depending on your closed
>>> >> > world or open world assumption; in other words, on whether the
>>> >> > "schema" will be used for some kind of inference or validation.
>>> >> > The use of "Schema" in RDFS has done nothing to clarify this, and
>>> >> > the use of "Ontology" in OWL added a layer of confusion. I tend
>>> >> > to say "vocabulary" to name the set of types and predicates used
>>> >> > by a dataset (as in Linked Open Vocabularies), which is a minimal
>>> >> > commitment to how it is considered by the dataset owner, bearing
>>> >> > in mind that this "vocabulary" is generally a mix of imported
>>> >> > terms from SKOS, FOAF, Dublin Core ... and home-made ones. Which
>>> >> > is completely OK with the spirit of RDF.
>>> >> >
>>> >> > The brand new LDOM [1] or whatever it ends up being named at the
>>> >> > end of the day might clarify the situation, or muddy those waters
>>> >> > a bit more :)
>>> >> >
>>> >> > [1] http://spinrdf.org/ldomprimer.html
>>> >> >
>>> >> > 2015-01-23 10:37 GMT+01:00 Pavel Klinov <pavel.klinov@uni-ulm.de>:
>>> >> >>
>>> >> >> Alright, so this isn't an answer and I might be saying something
>>> >> >> totally silly (since I'm not a Linked Data person, really).
>>> >> >>
>>> >> >> If I re-phrase this question as the following: "how do I
>>> >> >> extract a schema from a SPARQL endpoint?", then it seems to pop
>>> >> >> up quite often (see, e.g., [1]). I understand that the original
>>> >> >> question is a bit more general, but it's fair to say that
>>> >> >> knowing the schema is a huge help for writing meaningful
>>> >> >> queries.
>>> >> >>
>>> >> >> As an outsider, I'm quite surprised that there's still no
>>> >> >> commonly accepted (I'm avoiding "standard" here) way of doing
>>> >> >> this. People either hope that something like the VoID or LOV
>>> >> >> vocabularies are being used, or use third-party tools, or write
>>> >> >> all sorts of ad hoc SPARQL queries themselves, looking for
>>> >> >> types, object properties, domains/ranges, etc. There are also
>>> >> >> papers written on this subject.
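>>> >> >>
>>> >> >> (For illustration, a typical ad hoc query of this kind, just a
>>> >> >> sketch, lists the classes in use together with instance counts:
>>> >> >>
>>> >> >>   SELECT ?class (COUNT(?s) AS ?count)
>>> >> >>   WHERE { ?s a ?class }
>>> >> >>   GROUP BY ?class
>>> >> >>   ORDER BY DESC(?count)
>>> >> >>
>>> >> >> with an analogous SELECT DISTINCT ?p WHERE { ?s ?p ?o } for the
>>> >> >> predicates.)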
>>> >> >>
>>> >> >> At the same time, the database engines which host datasets
>>> >> >> often (not always) manage the schema separately from the data.
>>> >> >> There are good reasons for that. One reason, for example, is to
>>> >> >> be able to support basic reasoning over the data, or integrity
>>> >> >> validation. Even though in RDF the schema language and the data
>>> >> >> language are the same, so that schema and data triples can be
>>> >> >> interleaved, the data need not (and often is not) managed that
>>> >> >> way.
>>> >> >>
>>> >> >> Yet, there's no standard way of requesting the schema from the
>>> >> >> endpoint, and I don't quite understand why. There's the SPARQL
>>> >> >> 1.1 Service Description, which could, in theory, cover it, but
>>> >> >> it doesn't. Servicing such schema-extraction requests wouldn't
>>> >> >> have to be mandatory, so endpoints which don't have their
>>> >> >> schemas right at hand wouldn't have to sift through the data.
>>> >> >> Also, schemas are typically quite small.
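>>> >> >>
>>> >> >> (As a concrete sketch of what does exist: per the SPARQL 1.1
>>> >> >> Service Description spec, the description is obtained by
>>> >> >> dereferencing the endpoint URI itself, e.g.
>>> >> >>
>>> >> >>   curl -H "Accept: text/turtle" http://example.org/sparql
>>> >> >>
>>> >> >> where http://example.org/sparql is a placeholder endpoint; the
>>> >> >> result describes the service's capabilities, not the schema of
>>> >> >> the data it hosts.)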
>>> >> >>
>>> >> >> I guess there's some problem with this which I'm missing...
>>> >> >>
>>> >> >> Thanks,
>>> >> >> Pavel
>>> >> >>
>>> >> >> [1] http://answers.semanticweb.com/questions/25696/extract-ontology-schema-for-a-given-sparql-endpoint-data-set
>>> >> >>
>>> >> >> On Thu, Jan 22, 2015 at 3:09 PM, Juan Sequeda
>>> >> >> <juanfederico@gmail.com> wrote:
>>> >> >> > Assume you are given a URL for a SPARQL endpoint. You have no
>>> >> >> > idea what data is being exposed.
>>> >> >> >
>>> >> >> > What do you do to explore that endpoint? What queries do you
>>> >> >> > write?
>>> >> >> >
>>> >> >> > Juan Sequeda
>>> >> >> > +1-575-SEQ-UEDA
>>> >> >> > www.juansequeda.com
>>> >> >>
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >
>>> >
>>> > --
>>> > Bernard Vatant
>>> > Vocabularies & Data Engineering
>>> > Tel :  + 33 (0)9 71 48 84 59
>>> > Skype : bernard.vatant
>>> > http://google.com/+BernardVatant
>>> > --------------------------------------------------------
>>> > Mondeca
>>> > 35 boulevard de Strasbourg 75010 Paris
>>> > www.mondeca.com
>>> > Follow us on Twitter : @mondecanews
>>> > ----------------------------------------------------------
>>>
>>>
>>
>
>
> --
>
> Michael Uschold
>    Senior Ontology Consultant, Semantic Arts
>    http://www.semanticarts.com
>    LinkedIn: http://tr.im/limfu
>    Skype, Twitter: UscholdM
>
>
>


-- 

Michael Uschold
   Senior Ontology Consultant, Semantic Arts
   http://www.semanticarts.com
   LinkedIn: http://tr.im/limfu
   Skype, Twitter: UscholdM

Received on Wednesday, 4 February 2015 20:49:35 UTC