@code and @literal

The DC folks are having a discussion that parallels our distinction
between @code and @literal, e.g.:
 
  <subject literal="George Walker Bush" />
 
vs:
 
  <scheme alias="wiki" uri="http://en.wikipedia.org/wiki/" />
  [...]
  <subject code="wiki:43rd_President_of_the_United_States" />
 
Misha


-----Original Message-----
From: DCMI Architecture Group [mailto:DC-ARCHITECTURE@JISCMAIL.AC.UK] On Behalf Of Pete Johnston
Sent: 24 March 2006 13:06
To: DC-ARCHITECTURE@JISCMAIL.AC.UK
Subject: Re: FW: Domains and ranges of DC properties

Rachel Heery wrote:

> I think it would be useful to clarify to this list the underlying purpose
> of this exercise and the benefits. Or is this being done in a spirit of
> 'enabling' unknown future benefits??
>
> As I understand it the aim of the exercise is to make precise distinctions
> about the "value space" of a property in machine-processable definitions.
> And that this is being done to indicate what can be inferred when a
> particular property is used in a triple.
>
> Can someone perhaps give some use cases of how this would be beneficial??

Mikael beat me to a reply, but as I've been writing this for the last
hour or so, I'll send it anyway. I think I'm repeating some of what
Mikael said, but I've had a stab at a more concrete example of how the
rdfs:range data might be used.

It seems to me the bottom line is, as Mikael said, that this information
is _already_ part of the definition of the DCMI-owned properties, but it
is currently accessible only to a human reader of the term
description/definition (and even in the human-readable definitions
sometimes it isn't as clear/unambiguous as it should be - though the
DCMI Usage Board has gone to some pains to try to address that, I
think).

Including the RDFS assertions makes the information which is already
there _explicit_ and accessible to an _application_.

Why is that useful?

One attempt at a "Use Case":

Problem: to resolve clearly the ambiguity over whether the value of
dc:creator (etc) is the agent or the name of the agent.

The DCMI definition says the value is the agent. Some metadata providers
follow this definition; some metadata providers interpret the definition
as allowing them to use the name of the agent, a literal, as the value.
(And to make matters worse, DCMI has published specifications which
license both approaches.)

To an RDF application, those two data providers are "saying" two
different things. The data providers think they are "saying" the same
thing in two different ways.

Consider a service provider consuming metadata and working across both
those sets of data that they have harvested.

Case 1: The service provider may not even be aware that these two
patterns are in use. They set up a query to answer the question "Which
resources were created by the agent called 'John Smith'?" and that query
only returns half the results because they are catering only for one
pattern. Not very satisfactory for the user of the service. Cheap,
certainly, and maybe cheerful for the service provider in the short
term, but probably not for the user who only sees half the results.
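
To make Case 1 concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than anything DCMI publishes: the `Lit` wrapper standing in for an RDF literal, the `ex:` identifiers, and the use of `foaf:name` to attach a name to an agent. A query written only for the literal pattern silently misses the record that follows the agent pattern.

```python
from collections import namedtuple

# Hypothetical stand-in for an RDF literal; resources are plain strings.
Lit = namedtuple("Lit", "text")

# Harvested data from two providers that think they say the same thing.
TRIPLES = [
    ("ex:doc1", "dc:creator", Lit("John Smith")),   # value is the name
    ("ex:doc2", "dc:creator", "ex:agent7"),         # value is the agent
    ("ex:agent7", "foaf:name", Lit("John Smith")),  # the agent's name
]

def created_by_literal_only(triples, name):
    """Naive query: matches only literal dc:creator values."""
    return sorted(s for (s, p, o) in triples
                  if p == "dc:creator" and o == Lit(name))

naive_hits = created_by_literal_only(TRIPLES, "John Smith")
print(naive_hits)  # only ex:doc1 is found; ex:doc2 is missed
```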

Case 2: The service provider may be aware from analysing their harvested
data that both these patterns are in use, but not that DCMI and the data
providers expect them to be treated as equivalent. The service provider
might choose to reject/ignore one of the two cases. How do they decide?
Different service providers might make different choices. Again, not
very satisfactory for the user of those services, who gets contradictory
results. Still cheap-ish and moderately cheerful for the service
provider, but the user gets less cheerful as they get different results
from different services (and eventually vents their spleen on the
service providers, who then aren't cheerful either).

Case 3: The service provider may be aware that both these patterns are
in use, and also that DCMI and the data providers expect them to be
treated as equivalent. So the provider introduces some processing -
processing specific to this group of properties - which treats the two
cases as the same or maps one to the other. This introduces a cost to
the service provider. They can't apply the generic rule, the rule they
use for all the other RDF data they have harvested: they have to
introduce a "special case" rule for this specific property or set of
properties. And every new service provider that comes along and wants to
process the data has to (a) find out that these idiosyncrasies specific
to this group of properties exist (how do they do that?) and (b)
implement this special-case rule in their application. And the more
options left open to the data providers - not just two patterns, but
three, four, five etc - the more rules the service providers have to
find out about and handle in their application, and the more complex and
costly those applications become. OK, the user may now be fairly
cheerful, but it's neither cheap nor cheerful for our service providers,
who probably decide that working with DC metadata is just more hassle
than it's worth!
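
The kind of special-case rule described in Case 3 might look roughly like the following Python sketch. The `Lit` wrapper, the `ex:` identifiers, the `foaf:name` convention, and the contents of `SPECIAL_CASE` are all assumptions made for illustration. The point is only that the generic exact-match rule has to be supplemented by property-specific logic that each service provider must discover and maintain.

```python
from collections import namedtuple

# Hypothetical stand-in for an RDF literal; resources are plain strings.
Lit = namedtuple("Lit", "text")

TRIPLES = [
    ("ex:doc1", "dc:creator", Lit("John Smith")),   # value is the name
    ("ex:doc2", "dc:creator", "ex:agent7"),         # value is the agent
    ("ex:agent7", "foaf:name", Lit("John Smith")),  # the agent's name
]

# Properties known (by out-of-band knowledge) to mix both patterns.
SPECIAL_CASE = {"dc:creator", "dc:contributor", "dc:publisher"}

def subjects_matching(triples, prop, name):
    """Generic rule is exact match; special-case properties also
    accept any agent resource that carries the given name."""
    wanted = {Lit(name)}
    if prop in SPECIAL_CASE:
        wanted |= {s for (s, p, o) in triples
                   if p == "foaf:name" and o == Lit(name)}
    return sorted(s for (s, p, o) in triples if p == prop and o in wanted)

all_hits = subjects_matching(TRIPLES, "dc:creator", "John Smith")
print(all_hits)  # both documents are now found
```

Every additional pattern the data providers are allowed to use would add another branch to a function like this one.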

By introducing, say, an rdfs:range constraint for the dc:creator that
says the range is a class of some:Agent, which is disjoint -
"non-overlapping" - with the class rdfs:Literal, DCMI says unambiguously
to my two metadata providers and - perhaps more importantly - to their
two metadata creation applications that the value is the agent, not the
name of the agent. It allows those two metadata creation applications to
"say" the same thing when they create the metadata record.

And it says unambiguously to the service provider and to their metadata
processing application that the value is the agent not the literal name
of the agent. So when that application processes some harvested data, it
can detect contradictions between the data and DCMI's description of the
property.

In my case 2 above the best the application could do was say to the
human administrator of the service "Look, I've got two (three, four,
five) sorts of thing as values for dc:creator here" - the Swoogle case,
as Mikael pointed out - leaving the human administrator to go to read
about dc:creator and try to work out what to do.

With the rdfs:range information, the application can recognise "DCMI
says the range of dc:creator is some:Agent which is disjoint with
rdfs:Literal, so all these sorts of thing are also instances of
some:Agent, that's fine - but ooh, look, Admin-Person, this set of data
has literal values and contradicts that assertion". Our providing the
rdfs:range data provides the basis for more clarity and consistency. It
makes the work of service providers easier, and cheaper, and hopefully
the end result is more cheerfulness all round! ;-)
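
As a rough sketch of the check described above, assuming the range and disjointness declarations were available in machine-readable form: the `RANGE` and `DISJOINT_WITH_LITERAL` tables, the `Lit` wrapper, and the `ex:` data below are hypothetical, and `some:Agent` is the placeholder class name used in this message. Given those declarations, an application could flag contradicting data generically, without any rule written specifically for dc:creator.

```python
from collections import namedtuple

# Hypothetical stand-in for an RDF literal; resources are plain strings.
Lit = namedtuple("Lit", "text")

# Hypothetical declarations, as they might be read from a schema.
RANGE = {"dc:creator": "some:Agent"}
DISJOINT_WITH_LITERAL = {"some:Agent"}

TRIPLES = [
    ("ex:doc1", "dc:creator", Lit("John Smith")),   # literal value
    ("ex:doc2", "dc:creator", "ex:agent7"),         # agent value
    ("ex:agent7", "foaf:name", Lit("John Smith")),  # no range declared here
]

def range_violations(triples):
    """Return triples whose literal value contradicts a declared range
    that is disjoint with the class of literals."""
    return [(s, p, o) for (s, p, o) in triples
            if isinstance(o, Lit) and RANGE.get(p) in DISJOINT_WITH_LITERAL]

violations = range_violations(TRIPLES)
for s, p, o in violations:
    print(f"{s}: {p} has literal value {o.text!r}, expected {RANGE[p]}")
```

The function never mentions dc:creator by name; adding a range declaration for another property would extend the check without changing the code.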

Pete
--
Pete Johnston
Research Officer (Interoperability)
UKOLN, University of Bath, Bath BA2 7AY, UK
tel: +44 (0)1225 383619    fax: +44 (0)1225 386838
mailto:p.johnston@ukoln.ac.uk
http://www.ukoln.ac.uk/ukoln/staff/p.johnston/




Received on Friday, 24 March 2006 14:08:18 UTC