Re: data smushing from Dan Brickley on 2001-01-03 (www-rdf-interest@w3.org from January 2001)

From: Dan Brickley <Daniel.Brickley@bristol.ac.uk>
Date: Wed, 3 Jan 2001 18:34:48 +0000 (GMT)
To: Seth Russell <seth@robustai.net>
cc: Bill dehOra <BdehOra@interx.com>, Dan Brickley <Daniel.Brickley@bristol.ac.uk>, David Megginson <david@megginson.com>, xml-dev <xml-dev@lists.xml.org>, www-rdf-interest <www-rdf-interest@w3.org>
Message-ID: <Pine.GSO.4.21.0101031825150.25071-100000@mail.ilrt.bris.ac.uk>

On Wed, 3 Jan 2001, Seth Russell wrote:

> > Isn't 'smushing' just unification hacking; am I missing something?
> 
> I'm not familiar with "unification hacking" but "smushing" is probably
> more commonly called "aggregation".  To my knowledge the term was
> first used by Dan Brickley as follows:
> 
> see:
> http://lists.w3.org/Archives/Public/www-rdf-interest/2000Dec/0191.html
> 
>   (2) 2nd pass node convergence ("data smushing")

It's just a cutesy term we've been using around here for databases that
"go one better than URIs" when aggregating diverse RDF information into
a common store.  Not worth over-analysing! I'm wary of calling the
collection of hacks and heuristics that I'm using "unification", though
maybe that's what's going on. 

dan

ps. I've copied below an example scenario I'm playing with (I haven't
checked the markup, but the drift should be clear...), excerpted from a
writeup in progress. Maybe examples of this kind will be useful to
clarify the behaviour of current RDF systems?

Example Scenario

a.rdf 

<Company>
 <corporateHomepage web:resource="http://megacorp.example.com/"/>
 <name>MegaCorp Inc.</name>
 <ticker>MEGA</ticker>
 <owner>
  <Person>
   <name>Mr Mega</name>
   <personalMailbox web:resource="mailto:mega@megacorp.example.com"/>
   <personalHomepage web:resource="http://megacorp.example.com/~mega"/>
   <age>50</age>
  </Person>
 </owner>
</Company>

b.rdf 

<User>
 <personalMailbox web:resource="mailto:mega@megacorp.example.com"/>
 <technologyInterest web:resource="http://www.w3.org/XML/"/>
 <technologyInterest web:resource="http://www.w3.org/RDF/"/>
 <technologyInterest web:resource="http://www.mozilla.org/"/>
</User>

c.rdf 

<Organisation>
 <corporateHomepage web:resource="http://megacorp.example.com/"/>
 <ethicalPolicy>
  <PolicyStatement web:about="http://dotherightthing.example.org/policy.xhtml">
   <title>Ethical Business Shared Guidelines 1.1</title>
  </PolicyStatement>
 </ethicalPolicy>
</Organisation>

notes...

We have three chunks of RDF/XML data, each providing fragments of
information that can usefully be combined. Our goal is to combine this
information successfully despite the lack of common URIs for the key
entities described in the three RDF files. Whether we do this at
storage time, at query time, or some combination of both remains to be
discussed.

I assume we have an interest in aggregating the statements encoded in
these three pieces of RDF/XML, so that we can answer questions such as
the following: 

       (Q1) What are the technology interests of persons who own
       companies that have an ethical policy commitment to the policy
       stated in the document
       http://dotherightthing.example.org/policy.xhtml
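To see why this is hard as things stand, here is a minimal Python sketch
that treats Q1 as a three-step join (policy, then company, then owner,
then interests) over the raw statements. It assumes the three files have
already been parsed into (subject, predicate, object) tuples; the
blank-node labels such as _:a1 and the bare property names such as
"owner" are hypothetical stand-ins, since the markup above declares no
namespaces.

```python
# Hypothetical triples for a.rdf, b.rdf and c.rdf. The blank-node labels
# (_:a1, _:b1, ...) and bare property names are illustrative assumptions.
POLICY = "http://dotherightthing.example.org/policy.xhtml"

TRIPLES = [
    # a.rdf
    ("_:a1", "type", "Company"),
    ("_:a1", "corporateHomepage", "http://megacorp.example.com/"),
    ("_:a1", "name", "MegaCorp Inc."),
    ("_:a1", "ticker", "MEGA"),
    ("_:a1", "owner", "_:a2"),
    ("_:a2", "type", "Person"),
    ("_:a2", "name", "Mr Mega"),
    ("_:a2", "personalMailbox", "mailto:mega@megacorp.example.com"),
    ("_:a2", "personalHomepage", "http://megacorp.example.com/~mega"),
    ("_:a2", "age", "50"),
    # b.rdf
    ("_:b1", "type", "User"),
    ("_:b1", "personalMailbox", "mailto:mega@megacorp.example.com"),
    ("_:b1", "technologyInterest", "http://www.w3.org/XML/"),
    ("_:b1", "technologyInterest", "http://www.w3.org/RDF/"),
    ("_:b1", "technologyInterest", "http://www.mozilla.org/"),
    # c.rdf
    ("_:c1", "type", "Organisation"),
    ("_:c1", "corporateHomepage", "http://megacorp.example.com/"),
    ("_:c1", "ethicalPolicy", POLICY),
    (POLICY, "type", "PolicyStatement"),
    (POLICY, "title", "Ethical Business Shared Guidelines 1.1"),
]


def q1(triples, policy):
    """Technology interests of the owners of companies committed to `policy`."""
    # Step 1: nodes that have an ethicalPolicy pointing at the policy document.
    companies = {s for s, p, o in triples if p == "ethicalPolicy" and o == policy}
    # Step 2: owners of those nodes.
    owners = {o for s, p, o in triples if p == "owner" and s in companies}
    # Step 3: technology interests of those owners.
    return {o for s, p, o in triples if p == "technologyInterest" and s in owners}


print(q1(TRIPLES, POLICY))  # → set()
```

Run against the unmerged statements, the join finds nothing: the
ethicalPolicy statement is about _:c1 while the owner statement is about
_:a1, and the owner _:a2 is a different node from _:b1, which carries
the interests. Nothing in the raw data links them.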

To answer such a question in the absence of URI names for the company
and person, we need to identify them using other information that we
have about those entities. In this scenario, we exploit additional
meta-information about some of the properties used in our descriptions.
Specifically, we assume that personalMailbox, personalHomepage and
corporateHomepage are uniquely identifying properties. By this we mean
that, for any given value of one of these properties, there exists
'at most one' resource with that characteristic. We do not concern
ourselves here with *how* we know this, except to note that augmented
RDF schemas might be used to represent such interesting
properties-of-properties. Given these assumptions, the Company in a.rdf
and the Organisation in c.rdf, which share the corporateHomepage
http://megacorp.example.com/, must be the same resource, as must the
Person in a.rdf and the User in b.rdf, which share the personalMailbox
mailto:mega@megacorp.example.com.
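One way to act on such properties (what OWL later standardised as
owl:InverseFunctionalProperty) is to merge any two nodes that share a
value for one of them. The Python sketch below does this with a small
union-find over (subject, predicate, object) tuples. It illustrates the
idea under stated assumptions and is not the collection of hacks and
heuristics mentioned earlier; the blank-node labels, the bare property
names and the IDENTIFYING set are hypothetical.

```python
# Properties assumed (hypothetically) to be uniquely identifying:
# at most one resource can have any given value for each of them.
IDENTIFYING = {"personalMailbox", "personalHomepage", "corporateHomepage"}

# Hypothetical triples for a.rdf, b.rdf and c.rdf; labels are illustrative.
POLICY = "http://dotherightthing.example.org/policy.xhtml"
TRIPLES = [
    ("_:a1", "type", "Company"),
    ("_:a1", "corporateHomepage", "http://megacorp.example.com/"),
    ("_:a1", "name", "MegaCorp Inc."),
    ("_:a1", "ticker", "MEGA"),
    ("_:a1", "owner", "_:a2"),
    ("_:a2", "type", "Person"),
    ("_:a2", "name", "Mr Mega"),
    ("_:a2", "personalMailbox", "mailto:mega@megacorp.example.com"),
    ("_:a2", "personalHomepage", "http://megacorp.example.com/~mega"),
    ("_:a2", "age", "50"),
    ("_:b1", "type", "User"),
    ("_:b1", "personalMailbox", "mailto:mega@megacorp.example.com"),
    ("_:b1", "technologyInterest", "http://www.w3.org/XML/"),
    ("_:b1", "technologyInterest", "http://www.w3.org/RDF/"),
    ("_:b1", "technologyInterest", "http://www.mozilla.org/"),
    ("_:c1", "type", "Organisation"),
    ("_:c1", "corporateHomepage", "http://megacorp.example.com/"),
    ("_:c1", "ethicalPolicy", POLICY),
    (POLICY, "type", "PolicyStatement"),
    (POLICY, "title", "Ethical Business Shared Guidelines 1.1"),
]


def find(parent, node):
    """Follow parent links to the canonical node (no path compression)."""
    while parent.get(node, node) != node:
        node = parent[node]
    return node


def union(parent, a, b):
    """Merge the groups of a and b, keeping the smaller label as canonical."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        keep, drop = sorted((ra, rb))
        parent[drop] = keep


def smush(triples, identifying):
    """Merge nodes that share a value for any identifying property."""
    parent = {}
    first_subject = {}  # (property, value) -> first subject seen with it
    for s, p, o in triples:
        if p in identifying:
            key = (p, o)
            if key in first_subject:
                union(parent, first_subject[key], s)
            else:
                first_subject[key] = s
    # Rewrite every node to its canonical label. Literals never enter
    # `parent`, so find() returns them unchanged; duplicate statements
    # collapse because the result is a set.
    return {(find(parent, s), p, find(parent, o)) for s, p, o in triples}


merged = smush(TRIPLES, IDENTIFYING)
print(sorted({s for s, _, _ in merged}))
# → ['_:a1', '_:a2', 'http://dotherightthing.example.org/policy.xhtml']
```

After merging, _:c1 is folded into _:a1 (shared corporateHomepage) and
_:b1 into _:a2 (shared personalMailbox), so one company node now carries
both the ethicalPolicy and the owner statements, and its owner carries
the technologyInterest statements. A join like Q1 can then reach
http://www.w3.org/XML/, http://www.w3.org/RDF/ and http://www.mozilla.org/.
Because union-find groups are transitive, chains of shared values
(A shares a mailbox with B, B shares a homepage with C) collapse into a
single node as well.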

Received on Wednesday, 3 January 2001 13:36:40 UTC