- From: Pavel Klinov <pavel.klinov@uni-ulm.de>
- Date: Mon, 26 Jan 2015 08:32:09 +0100
- To: Bernard Vatant <bernard.vatant@mondeca.com>
- Cc: Pavel Klinov <pavel.klinov@uni-ulm.de>, Juan Sequeda <juanfederico@gmail.com>, Semantic Web <semantic-web@w3.org>, public-lod public <public-lod@w3.org>
On Sun, Jan 25, 2015 at 11:44 PM, Bernard Vatant <bernard.vatant@mondeca.com> wrote:
> Hi Pavel
>
> Very interesting discussion, thanks for the follow-up. Some quick answers
> below, but I'm currently writing a blog post which will go in more details
> on the notion of Data Patterns, a term I've been pushing last week on the DC
> Architecture list, where it seems to have gained some traction.
> See https://www.jiscmail.ac.uk/cgi-bin/webadmin?A1=ind1501&L=dc-architecture
> for the discussion.

OK, thanks for the link, will check it out. I agree that "patterns"
is perhaps a better term than "schema" since by the latter people
typically mean an explicit specification. I guess it's my use of the
term "schema" which created some confusion initially.

>> ... which reflects what the
>> data is all about. Knowing such structure is useful (and often
>> necessary) to be able to write meaningful queries and that's, I think,
>> what the initial question was.
>
> Certainly, and I would rewrite this question : How do you find out data
> patterns in a dataset?

I think it's a more general and tough question having to do with data
mining. Not sure that anyone would venture into finding out data
patterns against a public endpoint just to be able to write queries
for it.

>> When such structure exists, I'd say
>> that the dataset has an *implicit* schema (or a conceptual model, if
>> you will).
>
> Well, that's where I don't follow. If data, as it happens more and more, is
> gathered from heterogeneous sources, the very notion of a conceptual model
> is jumping to conclusions.

A merger of structures is still a structure. But anyway, I've already
agreed to say patterns =)

> In natural languages, patterns often precede the
> grammar describing them, even if the patterns described in the grammar at
> some point become prescriptive rules. Data should be looked at the same way.

Not sure.
I won't immediately disagree since I don't have statistics
regarding structured/unstructured datasets out there.

>> What is absent is an explicit representation of the schema,
>> or the conceptual model, in terms of RDFS, OWL, or SKOS axioms.
>
> When the dataset gathers various sources and various vocabularies, such a
> schema does not exists, actually.

Not necessarily. Parts of it may exist. Take YAGO, for example. It's
derived from a bunch of sources including Wikipedia and GeoNames and
yet offers its schema for a separate download.

>> However, when the schema *is* represented explicitly, knowing it is a
>> huge help to users which otherwise know little about the data.
>
> OK, but the question is : which is a good format for exposing this
> structure?
> RDFS/OWL ontology/vocabulary, Application Profiles, RDF Shapes / whatever it
> will be named, or ... ?

I think this question is a bit secondary. If the need were recognized,
this could be, at least in theory, agreed on.

>> PPS. It'd also be correct to claim that even when a structure exists,
>> realistic data can be messy and not fit into it entirely. We've seen
>> stuff like literals in the range of object properties, etc. It's a
>> separate issue having to do with validation, for which there's an
>> ongoing effort at W3C. However, it doesn't generally hinder writing
>> queries which is what we're discussing here.
>
> Well I don't see it as a separate issue. All the raging debate around RDF
> Shapes is not (yet) about validation, but on the definition of what a
> shape/structure/schema can be.

OK, won't disagree on this.

Thanks,

Pavel

>> > Since the very notion of schema for RDF data has no meaning at all,
>> > and the absence of schema is a bit frightening, people tend to give it
>> > a lot of possible meanings, depending on your closed world or open
>> > world assumption, otherwise said if the "schema" will be used for some
>> > kind of inference or validation.
>> > The use of "Schema" in RDFS has done nothing to clarify this, and the
>> > use of "Ontology" in OWL added a layer of confusion. I tend to say
>> > "vocabulary" to name the set of types and predicates used by a dataset
>> > (like in Linked Open Vocabularies), which is a minimal commitment to
>> > how it is considered by the dataset owner, bearing in mind that this
>> > "vocabulary" is generally a mix of imported terms from SKOS, FOAF,
>> > Dublin Core ... and home-made ones. Which is completely OK with the
>> > spirit of RDF.
>> >
>> > The brand new LDOM [1] or whatever it ends up to be named at the end
>> > of the day might clarify the situation, or muddle those waters a bit
>> > more :)
>> >
>> > [1] http://spinrdf.org/ldomprimer.html
>> >
>> > 2015-01-23 10:37 GMT+01:00 Pavel Klinov <pavel.klinov@uni-ulm.de>:
>> >>
>> >> Alright, so this isn't an answer and I might be saying something
>> >> totally silly (since I'm not a Linked Data person, really).
>> >>
>> >> If I re-phrase this question as the following: "how do I extract a
>> >> schema from a SPARQL endpoint?", then it seems to pop up quite often
>> >> (see, e.g., [1]). I understand that the original question is a bit
>> >> more general but it's fair to say that knowing the schema is a huge
>> >> help for writing meaningful queries.
>> >>
>> >> As an outsider, I'm quite surprised that there's still no commonly
>> >> accepted (i'm avoiding "standard" here) way of doing this. People
>> >> either hope that something like VoID or LOV vocabularies are being
>> >> used, or use 3-party tools, or write all sorts of ad hoc SPARQL
>> >> queries themselves, looking for types, object properties,
>> >> domains/ranges etc-etc. There are also papers written on this subject.
>> >>
>> >> At the same time, the database engines which host datasets often (not
>> >> always) manage the schema separately from the data. There're good
>> >> reasons for that.
>> >> One reason, for example, is to be able to support
>> >> basic reasoning over the data, or integrity validation. Just because
>> >> in RDF the schema language and the data language are the same, so
>> >> schema and data triples can be interleaved, it need not (and often
>> >> not) be managed that way.
>> >>
>> >> Yet, there's no standard way of requesting the schema from the
>> >> endpoint, and I don't quite understand why. There's the SPARQL 1.1
>> >> Service Description, which could, in theory, cover it, but it doesn't.
>> >> Servicing such schema extraction requests doesn't have to be mandatory
>> >> so the endpoints which don't have their schemas right there don't have
>> >> to sift through the data. Also, schemas are typically quite small.
>> >>
>> >> I guess there's some problem with this which I'm missing...
>> >>
>> >> Thanks,
>> >> Pavel
>> >>
>> >> [1] http://answers.semanticweb.com/questions/25696/extract-ontology-schema-for-a-given-sparql-endpoint-data-set
>> >>
>> >> On Thu, Jan 22, 2015 at 3:09 PM, Juan Sequeda <juanfederico@gmail.com>
>> >> wrote:
>> >> > Assume you are given a URL for a SPARQL endpoint. You have no idea
>> >> > what data is being exposed.
>> >> >
>> >> > What do you do to explore that endpoint? What queries do you write?
>> >> >
>> >> > Juan Sequeda
>> >> > +1-575-SEQ-UEDA
>> >> > www.juansequeda.com
>> >>
>> >
>
> --
> Bernard Vatant
> Vocabularies & Data Engineering
> Tel : + 33 (0)9 71 48 84 59
> Skype : bernard.vatant
> http://google.com/+BernardVatant
> --------------------------------------------------------
> Mondeca
> 35 boulevard de Strasbourg 75010 Paris
> www.mondeca.com
> Follow us on Twitter : @mondecanews
> ----------------------------------------------------------
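
For reference, the ad hoc exploratory queries mentioned in the quoted
message of 23 January (looking for types, properties and declared
domains/ranges) might look roughly like the sketch below. It is only an
illustration, not a method anyone in this thread proposed: the function
names are made up, the queries are plain SPARQL 1.1, and nothing is sent
to an endpoint. On a large public endpoint such aggregate queries may
well time out, so the LIMIT values are guesses.

```python
# Sketch only: function names are hypothetical; queries are standard SPARQL 1.1.
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def _check_iri(iri: str) -> str:
    """Reject strings that cannot be safely wrapped in <...>."""
    bad = '<>"{}|^`\\'
    if not iri or any(ch in bad or ch.isspace() for ch in iri):
        raise ValueError(f"not a usable IRI: {iri!r}")
    return iri


def classes_by_instance_count(limit: int = 50) -> str:
    """Types actually used in the data, most populated first."""
    return (
        "SELECT ?type (COUNT(?s) AS ?instances)\n"
        "WHERE { ?s a ?type }\n"
        "GROUP BY ?type\n"
        "ORDER BY DESC(?instances)\n"
        f"LIMIT {limit}"
    )


def predicates_for_type(type_iri: str, limit: int = 50) -> str:
    """Predicates used on instances of one type, most frequent first."""
    return (
        "SELECT ?p (COUNT(*) AS ?uses)\n"
        f"WHERE {{ ?s a <{_check_iri(type_iri)}> ; ?p ?o }}\n"
        "GROUP BY ?p\n"
        "ORDER BY DESC(?uses)\n"
        f"LIMIT {limit}"
    )


def declared_domains_and_ranges(limit: int = 200) -> str:
    """Explicit rdfs:domain / rdfs:range statements, if the endpoint holds any."""
    return (
        f"PREFIX rdfs: <{RDFS}>\n"
        "SELECT ?p ?domain ?range\n"
        "WHERE {\n"
        "  ?p rdfs:domain ?domain .\n"
        "  OPTIONAL { ?p rdfs:range ?range }\n"
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    # Print the queries so they can be pasted into any SPARQL client.
    for query in (
        classes_by_instance_count(),
        predicates_for_type("http://xmlns.com/foaf/0.1/Person"),
        declared_domains_and_ranges(),
    ):
        print(query, end="\n\n")
```

The first two queries recover usage patterns from the data itself; the
third only returns something when schema triples are stored alongside
the data, which is exactly the case that cannot be assumed.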
Received on Monday, 26 January 2015 07:32:42 UTC