- From: Hugh Glaser <hugh@glasers.org>
- Date: Wed, 28 Nov 2018 21:58:16 +0000
- To: David Booth <david@dbooth.org>
- Cc: semantic-web@w3.org
Interesting. This may be slightly off the direct topic, but it is about how developers (me, in this case) do things, so possibly relevant.

I have moved to a slightly different way of doing things recently, in the context of building sites based on RDF data from a variety of sources.

I used to process the stuff coming in - from csv, PDO, RDF or whatever - into the RDF I wanted as I imported it. Now I am experimenting with doing it differently.

I simply, and of course quite quickly, convert the input into RDF using the most naive RDF structure possible. (At its simplest, for csv I would use a (constructed) URI for each row and a new predicate for each column, with the cell contents as object.) That is, the RDF is intended to capture all the source data, and only the source data. I stuff it into a Linked Data-enabled SPARQL endpoint, so that I have resolvable URIs for the source data records. If the data is already in RDF, I may be able to use the source servers themselves for this stage, provided they offer the right services and I am not breaking their terms of use by doing a fair chunk of querying.

Then I run a process I also call "lifting" - it differs in detail from yours, but performs a similar function. I lift the data in the primitive RDF into the RDF that I want, and put it into one of my nice, clean stores from which the sites will be built. (The actual organisation of store granularity depends on source data size and other things.)

This seems to be great. My clean store has links to the source store records, which gives great provenance. I can also put links the other way, if I am actually using the source store for other things, which happens. I always know exactly what data I have acquired, and there is nothing (or little) hidden in the acquisition process. Any transformations are gathered into the one place of the lifting spec. And I don't have to look at bloody csv files or HTML source or whatever so much to work out what real RDF I want: at that stage I look at my source RDF version using Linked Data (with a SPARQL endpoint as well) - what could be nicer than that? :-)

Separation of concerns: I do the acquisition, and that is pretty much done; I can then experiment with exactly what RDF I want, without revisiting the source transformation. I can also ignore any source data I don't want. So, for example, with WikiData I can lose the fancy predicates and just keep the Direct ones, and for many sources I can discard all the variants of label predicates.

Perhaps the biggest thing is: the transformations can use all the knowledge I have been given. Very often when importing, you get things as records, or pages, or whatever, about a single resource. But each record will have IDs for other things that are referenced elsewhere in the source dataset. If you have processed all the source data into a graph, you may have a lot more information about that resource and related resources with which to make better decisions about sameAsness, bNode identifiers, etc.

Basically, RDF is a great resource for doing data cleaning! :-) Get everything into RDF as soon as possible - then you can really think about it.

(Sorry if I reiterate stuff from your presentation, David, but I can't access it at the moment.)
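To make the two stages concrete, here is a rough sketch - all the URIs, prefixes and column names are invented for illustration. First, the kind of naive row-per-record RDF I mean (one csv row becomes one resource, one column becomes one predicate):

# One hypothetical row of people.csv, converted naively
@prefix src: <http://example.org/source/people.csv/row/> .
@prefix col: <http://example.org/source/people.csv/column/> .

src:42  col:name    "Hugh Glaser" ;
        col:org_id  "ORG-17" .

And then the lifting, which could be a single SPARQL CONSTRUCT against the source endpoint - the target vocabulary and URI scheme here are just choices made for the sketch, and the prov link back to the source record is the provenance I mentioned:

PREFIX col:  <http://example.org/source/people.csv/column/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX org:  <http://www.w3.org/ns/org#>
PREFIX prov: <http://www.w3.org/ns/prov#>

CONSTRUCT {
  ?person a foaf:Person ;
          foaf:name ?name ;
          org:memberOf ?orgUri ;
          prov:wasDerivedFrom ?row .     # resolvable link back to the source record
}
WHERE {
  ?row col:name ?name ;
       col:org_id ?orgId .
  BIND (IRI(CONCAT("http://example.org/id/person/", ENCODE_FOR_URI(?name)))  AS ?person)
  BIND (IRI(CONCAT("http://example.org/id/org/",    ENCODE_FOR_URI(?orgId))) AS ?orgUri)
}

The constructed triples go straight into the clean store; the naive triples stay where they are, behind their resolvable URIs.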
> On 28 Nov 2018, at 17:04, David Booth <david@dbooth.org> wrote:
>
> On 11/28/18 9:15 AM, Hugh Glaser wrote:
> > RDF -> RDF [translation] is hugely important for building
> > stuff, to remove stuff, or convert into preferred ontologies.
>
> Agreed. In my experience it's needed in almost every RDF
> application.
>
> > . . If there were good tools to do this (or even one :-),
> > or maybe there is), that integrated with what people use,
> > would that be useful?
>
> Yes! I have often used SPARQL to perform RDF-->RDF
> translation, though we also experimented with ShExMap and
> JavaScript in a previous project.
>
> > That would encourage a library of transformation specs,
> > such as dc->dct, xxx->skos etc.
>
> We also experimented with the idea of creating a mapping hub
> for sharing translation rules. It was agnostic about the
> "rules" language (including ShExMap and JavaScript), used
> github for storing/sharing the rules themselves, and provided
> a front-end for categorizing/finding existing translation
> rules. The idea is described on slide 53 (also attached):
> http://tinyurl.com/YosemiteRoadmap20150709slides
>
> We also built a rough POC (but don't expect it to be fully
> functional):
> https://mappinghub.github.io/
> I still think this mapping hub idea has a *lot* of merit.
>
> David Booth
> <slide53.pdf>
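P.S. For what it is worth, the kind of shareable spec mentioned above (dc -> dct) really can be tiny. A sketch, covering just a few terms, as one SPARQL CONSTRUCT:

PREFIX dc:  <http://purl.org/dc/elements/1.1/>
PREFIX dct: <http://purl.org/dc/terms/>

CONSTRUCT { ?s ?newP ?o }
WHERE {
  VALUES (?oldP ?newP) {
    (dc:title   dct:title)
    (dc:creator dct:creator)
    (dc:date    dct:date)
  }
  ?s ?oldP ?o .
}

Extending the VALUES table is all it takes to grow the mapping, which is part of why a shared library of such specs appeals to me.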
Received on Wednesday, 28 November 2018 21:59:27 UTC