Re: More on: Should information be merged from several RDF files?

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 7 Jul 2014 21:57:53 -0500
Cc: SW-forum Web <semantic-web@w3.org>
Message-Id: <9965B241-ED9B-497A-9403-8C20041B73FD@ihmc.us>
To: Victor Porton <porton@narod.ru>

On Jul 7, 2014, at 4:17 PM, Victor Porton <porton@narod.ru> wrote:

> I am writing a program.
> I read RDF files while executing my program.
> After each RDF file is loaded, my program does some actions (and probably terminates).
> It is not predictable which RDF file will be loaded next, because in the intervals between loading RDF files my program does some computations, and the next RDF file to load depends on these computations.
> As such, I cannot first load all RDF files and merge information in them. Instead of this I need to load RDF files one-by-one and update my program data structure after reading each RDF file.
> If I would read all RDF files at once I would be able just to merge data from all RDF files. But I cannot do that.
> Upon reading each RDF file, I update internal data structures of my program based on RDF triples loaded.

So far, nothing you have said tells us why you are using RDF for this application. RDF is intended for use in transmitting assertional information across the Web, analogously with how HTML is designed to transmit hypertext. Does your application have any relationship to this kind of use? 

> I cannot base building these internal data structures of my program on the result of set-theoretic union of all RDF triples loaded till the moment. The reason for this is that loading an additional RDF may render my data inconsistent

Two points in response. 

First, this notion of 'inconsistent' which you are using is not the RDF notion of consistency. You are therefore, apparently, using some kind of semantic extension of RDF. (See http://www.w3.org/TR/2014/REC-rdf11-mt-20140225/#semantic-extensions-and-entailment-regimes ) You might do well to try to describe this extension more precisely before proceeding. (The restriction you describe below is defined in the OWL semantic extension: it is the requirement that the predicate be a functional property.) 
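
As an illustration only, the restriction can be checked mechanically. The Python sketch below uses plain tuples rather than any RDF library. The predicate name `ex:onlyOneValue` and the function `functional_violations` are hypothetical names invented for this example, and treating two distinct values as a clash is the application's reading of the restriction, not something plain RDF provides.

```python
from collections import defaultdict

# Hypothetical predicate assumed to allow at most one value per subject.
FUNCTIONAL = {"ex:onlyOneValue"}

def functional_violations(triples, functional=FUNCTIONAL):
    """Return {(subject, predicate): set_of_objects} for every clash."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p in functional:
            values[(s, p)].add(o)
    # A clash is a functional predicate with more than one distinct object.
    return {key: objs for key, objs in values.items() if len(objs) > 1}

triples = [
    ("http://example.com", "ex:onlyOneValue", 1),  # as in file-1.rdf
    ("http://example.com", "ex:onlyOneValue", 2),  # as in file-2.rdf
]
print(functional_violations(triples))
```

An empty result means no functional predicate has more than one distinct value in the triples seen so far.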

Second, it is of the essence of RDF and RDF extensions that they can express inconsistencies. RDF users should be prepared to deal with clashes or inconsistencies between data items and have strategies for dealing with them. These might range from simply throwing an error, to a sophisticated truth-maintenance system which finds maximally consistent subsets of RDF triples. 
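
To make "maximally consistent subsets" concrete for the one-value-per-predicate restriction only, here is a brute-force Python sketch. It is not a truth-maintenance system, and its cost grows exponentially with the number of clashes. The function name `maximal_consistent_subsets` and the predicates `ex:onlyOneValue` and `ex:label` are hypothetical, chosen for this example.

```python
from collections import defaultdict
from itertools import product

def maximal_consistent_subsets(triples, functional):
    """Yield each largest subset holding one value per (subject, functional predicate)."""
    free = set()                # triples whose predicate is unrestricted
    groups = defaultdict(set)   # (s, p) -> competing objects
    for s, p, o in triples:
        if p in functional:
            groups[(s, p)].add(o)
        else:
            free.add((s, p, o))
    keys = sorted(groups)
    # Pick exactly one object for every restricted (subject, predicate) pair.
    options = [sorted(groups[k], key=repr) for k in keys]
    for choice in product(*options):
        picked = {(s, p, o) for (s, p), o in zip(keys, choice)}
        yield free | picked

data = [
    ("http://example.com", "ex:onlyOneValue", 1),
    ("http://example.com", "ex:onlyOneValue", 2),
    ("http://example.com", "ex:label", "Example"),
]
for subset in maximal_consistent_subsets(data, {"ex:onlyOneValue"}):
    print(sorted(subset, key=repr))
```

Each yielded set keeps every unrestricted triple plus one of the competing values, so adding any omitted triple would reintroduce a clash.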

> (if it has two or more different objects for a predicate which should have no more than one value, as in an example below). So this would require removal of some data from my program data structures, which would needlessly complicate the code. I want only to add new data structures, not remove them, to make my program easier.

With respect, this is rather like saying that I want to avoid doing arithmetic, so I want all my sums to be correct without having to add them up. RDF simply carries data to your code: if that data is faulty or more complicated than you would prefer it to be, don't blame RDF or seek to find an RDF magic bullet. 

> So the only remaining option is to load RDF one-by-one and construct new internal data structures of my program based only on the last loaded RDF file (not all loaded RDF files together).

You have decided to resolve contradictions by preferring the most-recently read data over 'older' data. This sounds like a possibly workable simplification, but I would not want to rely on it for anything important. 

> A question remains:
> # file-1.rdf
> <http://example.com> <#property-which-can-have-only-one-value> 1 .
> # file-2.rdf
> <http://example.com> <#property-which-can-have-only-one-value> 2 .
> Suppose we first load file-1.rdf and then later file-2.rdf. Should the triple from file-2.rdf be ignored? Or should I construct a new data structure from the data of both files, as if the subject URLs in these files were different?

All of these are possible strategies for resolving conflicts. Nothing in RDF prefers one over the other. The choice is yours. Only someone who knows what your data means, and how it is created, would be able to make an intelligent decision here. There is no magic bullet. 
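
Purely as a sketch of how such a choice might be made explicit in code, the following Python keeps the conflict policy as a parameter of an incremental loader. The names `Policy`, `Store`, and `ex:onlyOneValue` are hypothetical, the triples are plain tuples standing in for parsed RDF, and `KEEP_ALL` only approximates the "treat both files separately" option by recording every value.

```python
from enum import Enum

class Policy(Enum):
    KEEP_FIRST = "keep_first"    # ignore the triple from the later file
    KEEP_LATEST = "keep_latest"  # prefer the most recently read value
    KEEP_ALL = "keep_all"        # record every value and decide later
    FAIL = "fail"                # stop with an error on the first clash

class Store:
    def __init__(self, functional, policy):
        self.functional = set(functional)
        self.policy = policy
        self.values = {}  # (subject, predicate) -> list of objects

    def load(self, triples):
        """Merge one file's triples into the store, one triple at a time."""
        for s, p, o in triples:
            current = self.values.setdefault((s, p), [])
            if o in current:
                continue  # already known, nothing to do
            clash = p in self.functional and bool(current)
            if not clash or self.policy is Policy.KEEP_ALL:
                current.append(o)
            elif self.policy is Policy.KEEP_LATEST:
                current[:] = [o]
            elif self.policy is Policy.FAIL:
                raise ValueError(f"conflicting values for {s} {p}: {current[0]!r} vs {o!r}")
            # Policy.KEEP_FIRST: silently drop the new value.

file_1 = [("http://example.com", "ex:onlyOneValue", 1)]
file_2 = [("http://example.com", "ex:onlyOneValue", 2)]

for policy in (Policy.KEEP_FIRST, Policy.KEEP_LATEST, Policy.KEEP_ALL):
    store = Store({"ex:onlyOneValue"}, policy)
    store.load(file_1)
    store.load(file_2)
    print(policy.name, store.values)
```

Which policy is right depends on what the data means and how it was produced; the code only makes the decision visible rather than implicit.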

Pat Hayes

> Here is my project, by the way:
> http://freesoft.portonvictor.org/namespaces.xml
> --
> Victor Porton - http://portonvictor.org

IHMC                                     (850)434 8903 home
40 South Alcaniz St.            (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile (preferred)
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Tuesday, 8 July 2014 02:58:26 UTC
