Re: Is there real world RDF-S/OWL instance data? from Frank Manola on 2006-08-03 (semantic-web@w3.org from August 2006)

From: Frank Manola <fmanola@acm.org>
Date: Thu, 03 Aug 2006 17:44:17 -0400
To: bob@snee.com
CC: Sören Auer <auer@informatik.uni-leipzig.de>, semantic-web@w3.org
Message-ID: <44D26E31.4060302@acm.org>
Bob--

I guess I can imagine scenarios like the one you suggest, in which the
basic motivation for hiding the instance data is better control over the
use of a particular ontology, and this control then creates the
confidence to actually create the instance data.  But those weren't the
cases I had in mind.

I'm thinking in terms of organizations that view the Semantic Web as 
basically an advanced interoperability mechanism.  With whom, then, 
might the organization want better interoperability?

They might want better interoperability (both now and in the future) 
among the applications and data they use for their own purely internal 
operations.  In that case, both the schemas (or ontologies) and the 
instance data will likely be kept private (e.g., it's none of your 
business what kinds of information I store about my own employees, let 
alone the information itself).

They might also want better interoperability for operations in which 
they need to interchange data with partners.  In that case, the schemas 
will need to be shared among the partners, and instance data (at least 
that required for the operations in question) as well.  Note that even 
though this instance data is shared among the partners, it's not 
necessarily going to be available to the general public (e.g., by doing 
an ordinary Web search).

The CIM electric utility example I mentioned is an example of this 
latter situation.  In this case, it was felt appropriate to develop the 
schemas as international standards, so obviously the schemas are 
available to the general public (or at least to anyone who wants to pay 
for the standards; they may not be freely available online).  Given the
standard, anyone who wants to can develop instance data conforming to it 
(I suppose you could describe the electrical equipment in your house). 
But I can see why an electric utility might think it perfectly 
reasonable to share information on the specific pieces of equipment it 
has in its grid (described according to the standard schemas) with 
another utility it has a power-sharing agreement with, and at the same 
time think it is none of *my* business what that equipment is (I might
be a terrorist trying to plan blowing it up, for example).

Similarly, I can think of lots of other examples (the patient 
information example is one) where the *kinds* of things about which data 
is recorded (i.e., the information that would be contained in an 
ontology) are generally a matter of public knowledge, but the specific
instances need to be restricted to those who "need to know" according to 
one definition or another.

Finally, of course, an organization (or an individual) might want better 
interoperability with the general public.  The kinds of data that appear 
on the current Web fall into this category (or could fall into it), 
e.g., product catalogs, airline schedules, public geographic 
information, as well as extensions of it such as individual calendars
(e.g., for scheduling appointments), etc.  I'd certainly like to see
much more of that data available in RDF.  At the same time, though, that
data will probably be only the tip of the iceberg, with lots of
supporting data remaining hidden (although there may be some synergy 
between the "external" interoperability and the "internal" 
interoperability aspects for a given organization).

For example, the same airline that makes its flight schedules available 
to the public on the (public) Semantic Web will still probably keep its 
aircraft inventory and employee information private, even if it's 
represented in RDF.  Similarly, the same power company that restricts 
its grid configuration to its partners (even though it's in RDF) could 
make its billing information available to its customers on the Semantic 
Web (presumably restricting access on a per-customer basis).

Of course, an implicit part of this discussion is whether this sort of 
hidden stuff is really part of the Semantic Web.  I tend to think that
it is, albeit with really primitive access controls (in many cases,
physical disconnection), but I can see why others might disagree.
Needless to say, we need some more work on access controls (and related 
definitions of Semantic-Web-connectivity).

--Frank


Bob DuCharme wrote:
> On Wed, August 2, 2006 12:50 pm, Frank Manola wrote:
>> I appreciate that your request was for publicly-available instance data,
>> but as a possibly-minor footnote to this thread (especially given its
>> title), I'd note (again) that the *publicly-available* instance data
>> doesn't necessarily include all *real world* instance data.
> 
> Frank,
> 
> I find it perfectly plausible that there is more RDF behind firewalls than
> publicly available (although of course we can't be sure), but I was
> curious: do you have any ideas about why this is so? Could it be because a
> system limited to use within one enterprise makes it easier to impose more
> top-down control over the use of a particular ontology, and that this
> greater control gives people more incentive to follow through on a project
> involving the creation and use of large amounts of RDF data?
> 
> thanks,
> 
> Bob
>
Received on Thursday, 3 August 2006 21:37:07 UTC