- From: Lushan Han <lushan1@umbc.edu>
- Date: Wed, 4 Feb 2015 11:35:46 -0800
- To: Pavel Klinov <pavel.klinov@uni-ulm.de>
- Cc: Bernard Vatant <bernard.vatant@mondeca.com>, Juan Sequeda <juanfederico@gmail.com>, Semantic Web <semantic-web@w3.org>, public-lod public <public-lod@w3.org>
- Message-ID: <CAOyMU3hYa0c+0mmtPv8qfNy=x+GAzv1C=rzTaQcKcKfyicc0wg@mail.gmail.com>
This work [1] might be helpful to some people. It automatically learns a
"schema" from a given RDF dataset, including the most probable classes and
properties and the most probable relations/paths between given classes. It
can then automatically translate a casual user's intuitive graph query, or
schema-free query, into a formal SPARQL query using the learned schema and
statistical NLP techniques such as textual semantic similarity.

[1] http://ebiquity.umbc.edu/paper/html/id/658/Schema-Free-Querying-of-Semantic-Data

Cheers,
Lushan

On Sun, Jan 25, 2015 at 11:32 PM, Pavel Klinov <pavel.klinov@uni-ulm.de> wrote:
> On Sun, Jan 25, 2015 at 11:44 PM, Bernard Vatant
> <bernard.vatant@mondeca.com> wrote:
> > Hi Pavel
> >
> > Very interesting discussion, thanks for the follow-up. Some quick answers
> > below, but I'm currently writing a blog post which will go into more detail
> > on the notion of Data Patterns, a term I was pushing last week on the DC
> > Architecture list, where it seems to have gained some traction.
> > See https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind1501&L=dc-architecture
> > for the discussion.
>
> OK, thanks for the link, will check it out. I agree that "patterns"
> is perhaps a better term than "schema" since by the latter people
> typically mean an explicit specification. I guess it's my use of the term
> "schema" which created some confusion initially.
>
> >> ... which reflects what the
> >> data is all about. Knowing such structure is useful (and often
> >> necessary) to be able to write meaningful queries and that's, I think,
> >> what the initial question was.
> >
> > Certainly, and I would rewrite this question: how do you find out data
> > patterns in a dataset?
>
> I think it's a more general and tough question having to do with data
> mining. Not sure that anyone would venture into finding out data
> patterns against a public endpoint just to be able to write queries
> for it.
>
> >> When such structure exists, I'd say
> >> that the dataset has an *implicit* schema (or a conceptual model, if
> >> you will).
> >
> > Well, that's where I don't follow. If data, as happens more and more, is
> > gathered from heterogeneous sources, the very notion of a conceptual model
> > is jumping to conclusions.
>
> A merger of structures is still a structure. But anyway, I've already
> agreed to say patterns =)
>
> > In natural languages, patterns often precede the
> > grammar describing them, even if the patterns described in the grammar at
> > some point become prescriptive rules. Data should be looked at the same way.
>
> Not sure. I won't immediately disagree since I don't have statistics
> regarding structured/unstructured datasets out there.
>
> >> What is absent is an explicit representation of the schema,
> >> or the conceptual model, in terms of RDFS, OWL, or SKOS axioms.
> >
> > When the dataset gathers various sources and various vocabularies, such a
> > schema does not exist, actually.
>
> Not necessarily. Parts of it may exist. Take YAGO, for example. It's
> derived from a bunch of sources including Wikipedia and GeoNames and
> yet offers its schema as a separate download.
>
> >> However, when the schema *is* represented explicitly, knowing it is a
> >> huge help to users who otherwise know little about the data.
> >
> > OK, but the question is: which is a good format for exposing this
> > structure?
> > RDFS/OWL ontology/vocabulary, Application Profiles, RDF Shapes / whatever it
> > will be named, or ... ?
>
> I think this question is a bit secondary. If the need were recognized,
> this could be, at least in theory, agreed on.
>
> >> PPS. It'd also be correct to claim that even when a structure exists,
> >> realistic data can be messy and not fit into it entirely. We've seen
> >> stuff like literals in the range of object properties, etc.
> >> It's a separate issue having to do with validation, for which there's an
> >> ongoing effort at W3C. However, it doesn't generally hinder writing
> >> queries, which is what we're discussing here.
> >
> > Well, I don't see it as a separate issue. All the raging debate around RDF
> > Shapes is not (yet) about validation, but about the definition of what a
> > shape/structure/schema can be.
>
> OK, won't disagree on this.
>
> Thanks,
> Pavel
>
> >>
> >> > Since the very notion of schema for RDF data has no meaning at all,
> >> > and the absence of schema is a bit frightening, people tend to give it a lot
> >> > of possible meanings, depending on your closed world or open world
> >> > assumption, in other words whether the "schema" will be used for some kind of
> >> > inference or validation. The use of "Schema" in RDFS has done nothing to
> >> > clarify this, and the use of "Ontology" in OWL added a layer of confusion. I
> >> > tend to say "vocabulary" to name the set of types and predicates used by a
> >> > dataset (like in Linked Open Vocabularies), which is a minimal commitment to
> >> > how it is considered by the dataset owner, bearing in mind that this
> >> > "vocabulary" is generally a mix of imported terms from SKOS, FOAF, Dublin
> >> > Core ... and home-made ones. Which is completely OK with the spirit of
> >> > RDF.
> >> >
> >> > The brand new LDOM [1] or whatever it ends up being named at the end of the
> >> > day might clarify the situation, or muddle those waters a bit more :)
> >> >
> >> > [1] http://spinrdf.org/ldomprimer.html
> >> >
> >> > 2015-01-23 10:37 GMT+01:00 Pavel Klinov <pavel.klinov@uni-ulm.de>:
> >> >>
> >> >> Alright, so this isn't an answer and I might be saying something
> >> >> totally silly (since I'm not a Linked Data person, really).
> >> >>
> >> >> If I rephrase this question as "how do I extract a
> >> >> schema from a SPARQL endpoint?", then it seems to pop up quite often
> >> >> (see, e.g., [1]). I understand that the original question is a bit
> >> >> more general, but it's fair to say that knowing the schema is a huge
> >> >> help for writing meaningful queries.
> >> >>
> >> >> As an outsider, I'm quite surprised that there's still no commonly
> >> >> accepted (I'm avoiding "standard" here) way of doing this. People
> >> >> either hope that something like VoID or LOV vocabularies are being
> >> >> used, or use third-party tools, or write all sorts of ad hoc SPARQL
> >> >> queries themselves, looking for types, object properties,
> >> >> domains/ranges, etc. There are also papers written on this subject.
> >> >>
> >> >> At the same time, the database engines which host datasets often (not
> >> >> always) manage the schema separately from the data. There are good
> >> >> reasons for that. One reason, for example, is to be able to support
> >> >> basic reasoning over the data, or integrity validation. Just because
> >> >> in RDF the schema language and the data language are the same, so that
> >> >> schema and data triples can be interleaved, it need not be (and often
> >> >> is not) managed that way.
> >> >>
> >> >> Yet, there's no standard way of requesting the schema from the
> >> >> endpoint, and I don't quite understand why. There's the SPARQL 1.1
> >> >> Service Description, which could, in theory, cover it, but it doesn't.
> >> >> Servicing such schema extraction requests doesn't have to be mandatory,
> >> >> so endpoints which don't have their schemas at hand don't have
> >> >> to sift through the data. Also, schemas are typically quite small.
> >> >>
> >> >> I guess there's some problem with this which I'm missing...
> >> >>
> >> >> Thanks,
> >> >> Pavel
> >> >>
> >> >> [1] http://answers.semanticweb.com/questions/25696/extract-ontology-schema-for-a-given-sparql-endpoint-data-set
> >> >>
> >> >> On Thu, Jan 22, 2015 at 3:09 PM, Juan Sequeda <juanfederico@gmail.com>
> >> >> wrote:
> >> >> > Assume you are given a URL for a SPARQL endpoint. You have no idea
> >> >> > what data is being exposed.
> >> >> >
> >> >> > What do you do to explore that endpoint? What queries do you write?
> >> >> >
> >> >> > Juan Sequeda
> >> >> > +1-575-SEQ-UEDA
> >> >> > www.juansequeda.com
> >
> > --
> > Bernard Vatant
> > Vocabularies & Data Engineering
> > Tel : + 33 (0)9 71 48 84 59
> > Skype : bernard.vatant
> > http://google.com/+BernardVatant
> > --------------------------------------------------------
> > Mondeca
> > 35 boulevard de Strasbourg 75010 Paris
> > www.mondeca.com
> > Follow us on Twitter : @mondecanews
> > ----------------------------------------------------------
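The ad hoc exploration discussed in this thread (finding the types in use, the properties in use, and which classes are linked by which properties) can be sketched in a few lines. This is a minimal illustration, not the learning approach from the UMBC paper Lushan links: it assumes the data is already available as a small in-memory collection of (subject, predicate, object) triples with every term as a plain string, and the function name `summarize` and the `http://example.org/` resources are hypothetical. Against a live endpoint, roughly the same counts would come from the SPARQL 1.1 aggregate queries quoted in the docstring.

```python
from collections import Counter, defaultdict

# Full IRI of rdf:type. Terms are plain strings here (an assumption made for
# this sketch; a real tool would use an RDF library's term types).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def summarize(triples):
    """Summarize (subject, predicate, object) triples into simple data patterns.

    Roughly mirrors these exploratory SPARQL 1.1 queries (rdf: prefix assumed):
      SELECT ?c (COUNT(?s) AS ?n) WHERE { ?s a ?c } GROUP BY ?c
      SELECT ?p (COUNT(*) AS ?n)
        WHERE { ?s ?p ?o FILTER(?p != rdf:type) } GROUP BY ?p
      SELECT ?c1 ?p ?c2 (COUNT(*) AS ?n)
        WHERE { ?s ?p ?o . ?s a ?c1 . ?o a ?c2 FILTER(?p != rdf:type) }
        GROUP BY ?c1 ?p ?c2
    """
    triples = set(triples)  # drop duplicates, matching RDF set semantics

    # Map each typed resource to the set of classes it is declared to be in.
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    # Instances per class, and uses per (non-type) property.
    class_counts = Counter(c for classes in types.values() for c in classes)
    property_counts = Counter(p for _, p, _ in triples if p != RDF_TYPE)

    # (subject class, property, object class) links. Objects with no known
    # type, such as literals, drop out, just as they do in the SPARQL join.
    patterns = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for c1 in types.get(s, ()):
            for c2 in types.get(o, ()):
                patterns[(c1, p, c2)] += 1

    return class_counts, property_counts, patterns


if __name__ == "__main__":
    EX = "http://example.org/"  # hypothetical example namespace
    data = [
        (EX + "alice", RDF_TYPE, EX + "Person"),
        (EX + "bob", RDF_TYPE, EX + "Person"),
        (EX + "umbc", RDF_TYPE, EX + "University"),
        (EX + "alice", EX + "worksFor", EX + "umbc"),
        (EX + "alice", EX + "knows", EX + "bob"),
        (EX + "bob", EX + "name", "Bob"),
    ]
    classes, props, links = summarize(data)
    print(classes.most_common())
    print(props.most_common())
    for (c1, p, c2), n in links.most_common():
        print(c1, p, c2, n)
```

Counting (subject class, property, object class) combinations gives only a crude picture of the "data patterns" Bernard describes; it says nothing about cardinalities or about data that does not fit the dominant pattern. On a large public endpoint the equivalent aggregate queries may also be expensive, which is part of Pavel's argument for letting endpoints expose a schema directly.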
Received on Wednesday, 4 February 2015 19:39:23 UTC