Re: Is there real world RDF-S/OWL instance data? from Harry Halpin on 2006-08-03 (semantic-web@w3.org from August 2006)

From: Harry Halpin <hhalpin@ibiblio.org>
Date: Thu, 03 Aug 2006 23:32:59 +0100
To: semantic-web@w3.org
Message-ID: <44D2799B.3080608@ibiblio.org>
Everyone,
   I think the truth of the matter should be rather obvious: while the
Semantic Web is lifting off, it's hard to find *public* instance data
because there isn't much of it yet. There may well be large amounts of
private instance data in use behind firewalls in enterprises and
governments; that is plausible, but hard to verify in practice.
Moreover, corporations have a strong incentive not to make their data
publicly available as RDF, because it is often the data, and the
display of it alongside ads, that makes them money.

   However, I would say the future of public instance data for RDF looks
very bright. First, the W3C is taking a proactive stance: the new W3C
GRDDL Working Group in co-operation with the microformats community
should allow lots more instance data to be generated [1]. There's also
the RDF/A initiative, which with groups like Creative Commons on-board
should drastically increase the amount of RDF instance data [2]. Second,
if the Web 2.0 bubble does burst or slow down (which it will), then it
makes sense for much of that data to be released to the public. After
all, the overall trend towards data decentralization and universal
formats seems to be a good thing, and the Semantic Web is the logical
conclusion. In the meantime, mark up your homepage with RDF/A or
microformats+GRDDL - and think about how Semantic Web "2.0" applications
can make use of this sort of data.

             cheers,
                   harry

[1] http://www.w3.org/2001/sw/grddl-wg/
[2] http://www.w3.org/TR/xhtml-rdfa-primer/
                                          


Frank Manola wrote:
>
> Bob--
>
> I guess I can imagine scenarios like the one you suggest: in which the
> basic motivation for hiding the instance data is better control over
> the use of a particular ontology, and this then creates the confidence
> to actually create the instance data.  But those weren't the cases I
> had in mind.
>
> I'm thinking in terms of organizations that view the Semantic Web as
> basically an advanced interoperability mechanism.  With whom, then,
> might the organization want better interoperability?
>
> They might want better interoperability (both now and in the future)
> among the applications and data they use for their own purely internal
> operations.  In that case, both the schemas (or ontologies) and the
> instance data will likely be kept private (e.g., it's none of your
> business what kinds of information I store about my own employees, let
> alone the information itself).
>
> They might also want better interoperability for operations in which
> they need to interchange data with partners.  In that case, the
> schemas will need to be shared among the partners, and instance data
> (at least that required for the operations in question) as well.  Note
> that even though this instance data is shared among the partners, it's
> not necessarily going to be available to the general public (e.g., by
> doing an ordinary Web search).
>
> The CIM electric utility example I mentioned is an example of this
> latter situation.  In this case, it was felt appropriate to develop
> the schemas as international standards, so obviously the schemas are
> available to the general public (or at least to anyone who wants to
> pay for the standards;  they may not be online free somewhere).  Given
> the standard, anyone who wants to can develop instance data conforming
> to it (I suppose you could describe the electrical equipment in your
> house). But I can see why an electric utility might think it perfectly
> reasonable to share information on the specific pieces of equipment it
> has in its grid (described according to the standard schemas) with
> another utility it has a power-sharing agreement with, and at the same
> time think it was none of *my* business what that equipment is (I
> might be a terrorist trying to plan blowing it up, for example).
>
> Similarly, I can think of lots of other examples (the patient
> information example is one) where the *kinds* of things about which
> data is recorded (i.e., the information that would be contained in an
> ontology) are generally a matter of public knowledge, but the specific
> instances need to be restricted to those who "need to know" according
> to one definition or another.
>
> Finally, of course, an organization (or an individual) might want
> better interoperability with the general public.  The kinds of data
> that appear on the current Web fall into this category (or could fall
> into it), e.g., product catalogs, airline schedules, public geographic
> information, as well as extensions of it such as individual calendars
> (e.g., for scheduling appointments), etc.  I'd certainly like to see
> much more of that data available in RDF.  At the same time, though,
> that data will probably be only the tip of the iceberg, with lots of
> supporting data remaining hidden (although there may be some synergy
> between the "external" interoperability and the "internal"
> interoperability aspects for a given organization).
>
> For example, the same airline that makes its flight schedules
> available to the public on the (public) Semantic Web will still
> probably keep its aircraft inventory and employee information private,
> even if it's represented in RDF.  Similarly, the same power company
> that restricts its grid configuration to its partners (even though
> it's in RDF) could make its billing information available to its
> customers on the Semantic Web (presumably restricting access on a
> per-customer basis).
>
> Of course, an implicit part of this discussion is whether this sort of
> hidden stuff is really part of the Semantic Web.  I tend to think that
> it is, albeit with really primitive access controls (in many cases,
> physical disconnection), but I can see where others might disagree.
> Needless to say, we need some more work on access controls (and
> related definitions of Semantic-Web-connectivity).
>
> --Frank
>
>
> Bob DuCharme wrote:
>> On Wed, August 2, 2006 12:50 pm, Frank Manola wrote:
>>> I appreciate that your request was for publicly-available instance
>>> data,
>>> but as a possibly-minor footnote to this thread (especially given its
>>> title), I'd note (again) that the *publicly-available* instance data
>>> doesn't necessarily include all *real world* instance data.
>>
>> Frank,
>>
>> I find it perfectly plausible that there is more RDF behind firewalls
>> than
>> publicly available (although of course we can't be sure), but I was
>> curious: do you have any ideas about why this is so? Could it be
>> because a
>> system limited to use within one enterprise makes it easier to impose
>> more
>> top-down control over the use of a particular ontology, and that this
>> greater control gives people more incentive to follow through on a
>> project
>> involving the creation and use of large amounts of RDF data?
>>
>> thanks,
>>
>> Bob
>>
>


-- 
		-harry

Harry Halpin,  University of Edinburgh 
http://www.ibiblio.org/hhalpin 6B522426
Received on Thursday, 3 August 2006 22:33:33 UTC