- From: Misha Wolf <Misha.Wolf@reuters.com>
- Date: Fri, 24 Mar 2006 14:01:51 +0000
- To: newsml-2@yahoogroups.com
- Cc: public-rdf-in-xhtml task force <public-rdf-in-xhtml-tf@w3.org>
- Message-id: <A29ADE959C70A1449470AA9A212F5D80016B99CE@LONSMSXM06.emea.ime.reuters.com>
The DC folks are having a discussion that parallels our distinction between @code and @literal, eg:

    <subject literal="George Walker Bush" />

vs:

    <scheme alias="wiki" uri="http://en.wikipedia.org/wiki/" />
    [...]
    <subject code="wiki:43rd_President_of_the_United_States" />

Misha

-----Original Message-----
From: DCMI Architecture Group [mailto:DC-ARCHITECTURE@JISCMAIL.AC.UK] On Behalf Of Pete Johnston
Sent: 24 March 2006 13:06
To: DC-ARCHITECTURE@JISCMAIL.AC.UK
Subject: Re: FW: Domains and ranges of DC properties

Rachel Heery wrote:

> I think it would be useful to clarify to this list the underlying purpose
> of this exercise and the benefits. Or is this being done in a spirit of
> 'enabling' unknown future benefits??
>
> As I understand it the aim of the exercise is to make precise distinctions
> about the "value space" of a property in machine-processable definitions.
> And that this is being done to indicate what can be inferred when a
> particular property is used in a triple.
>
> Can someone perhaps give some use cases of how this would be beneficial??

Mikael beat me to a reply, but as I've been writing this for the last hour or so, I'll send it anyway. I think I'm repeating some of what Mikael said, but I've had a stab at a more concrete example of how the rdfs:range data might be used.

It seems to me the bottom line is, as Mikael said, that this information is _already_ part of the definition of the DCMI-owned properties, but it is currently accessible only to a human reader of the term description/definition (and even in the human-readable definitions sometimes it isn't as clear/unambiguous as it should be - though the DCMI Usage Board has gone to some pains to try to address that, I think).

Including the RDFS assertions makes the information which is already there _explicit_ and accessible to an _application_.

Why is that useful?
One attempt at a "Use Case":

Problem: to resolve clearly the ambiguity over whether the value of dc:creator (etc) is the agent or the name of the agent.

The DCMI definition says the value is the agent. Some metadata providers follow this definition; some metadata providers interpret the definition as allowing them to use the name of the agent, a literal, as the value. (And to make matters worse, DCMI has published specifications which license both approaches.)

To an RDF application, those two data providers are "saying" two different things. The data providers think they are "saying" the same thing in two different ways.

Consider a service provider consuming metadata and working across both those sets of data that they have harvested.

Case 1: The service provider may not even be aware that these two patterns are in use. They set up a query to answer the question "Which resources were created by the agent called 'John Smith'?" and that query only returns half the results because they are catering only for one pattern. Not very satisfactory for the user of the service. Cheap, certainly, and maybe cheerful for the service provider in the short term, but probably not for the user who only sees half the results.

Case 2: The service provider may be aware from analysing their harvested data that both these patterns are in use, but not that DCMI and the data providers expect them to be treated as equivalent. The service provider might choose to reject/ignore one of the two cases. How do they decide? Different service providers might make different choices. Again, not very satisfactory for the user of those services, who gets contradictory results. Still cheap-ish and moderately cheerful for the service provider, but the user gets less cheerful as they get different results from different services (and eventually vents their spleen on the service providers, who then aren't cheerful either).
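Case 1 can be illustrated with a small sketch. The Python below is not from the original thread: the (subject, property, value) tuples, the Resource wrapper, the ex:name property and the example.org URIs are all assumptions made for illustration. It shows how a query written for one pattern misses the record written in the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A non-literal node, identified by a URI (hypothetical helper)."""
    uri: str


DC_CREATOR = "dc:creator"
EX_NAME = "ex:name"  # hypothetical property linking an agent to its name

# Hypothetical harvested statements; plain str values stand for literals.
triples = [
    # Provider A: the value of dc:creator is the agent, named separately.
    (Resource("http://example.org/doc/1"), DC_CREATOR,
     Resource("http://example.org/agent/js")),
    (Resource("http://example.org/agent/js"), EX_NAME, "John Smith"),
    # Provider B: the value of dc:creator is the literal name.
    (Resource("http://example.org/doc/2"), DC_CREATOR, "John Smith"),
]


def created_by_literal(data, name):
    """Query catering only for the 'literal name' pattern."""
    return {s.uri for s, p, o in data if p == DC_CREATOR and o == name}


def created_by_agent(data, name):
    """Query catering only for the 'agent resource' pattern."""
    agents = {s for s, p, o in data if p == EX_NAME and o == name}
    return {s.uri for s, p, o in data if p == DC_CREATOR and o in agents}
```

Under these assumptions each query on its own finds only one of the two documents; only the union of both answers the user's question.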
Case 3: The service provider may be aware that both these patterns are in use, and also that DCMI and the data providers expect them to be treated as equivalent. So the provider introduces some processing - processing specific to this group of properties - which treats the two cases as the same or maps one to the other.

This introduces a cost to the service provider. They can't apply the generic rule, the rule they use for all the other RDF data they have harvested: they have to introduce a "special case" rule for this specific property or set of properties. And every new service provider that comes along and wants to process the data has to (a) find out that these idiosyncrasies specific to this group of properties exist (how do they do that?) and (b) implement this special-case rule in their application. And the more options left open to the data providers - not just two patterns, but three, four, five etc - the more rules the service providers have to find out about and handle in their application, and the more complex and costly those applications become.

OK, the user may now be fairly cheerful, but it's neither cheap nor cheerful for our service providers, who probably decide that working with DC metadata is just more hassle than it's worth!

By introducing, say, an rdfs:range constraint for the dc:creator property that says the range is a class of some:Agent, which is disjoint - "non-overlapping" - with the class rdfs:Literal, DCMI says unambiguously to my two metadata providers and - perhaps more importantly - to their two metadata creation applications that the value is the agent, not the name of the agent. It allows those two metadata creation applications to "say" the same thing when they create the metadata record.

And it says unambiguously to the service provider and to their metadata processing application that the value is the agent, not the literal name of the agent.
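A rough sketch of how an application might use such a range declaration follows. It is an illustration only, not DCMI's or any RDF library's API: the RANGES and DISJOINT_WITH_LITERAL tables, the Resource wrapper and the sample data are assumptions, and "some:Agent" is the placeholder class name used above. Literals are modelled as plain Python strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A non-literal node, identified by a URI (hypothetical helper)."""
    uri: str


# Assumed machine-readable property descriptions: property -> range class.
RANGES = {"dc:creator": "some:Agent"}
# Assumed set of classes declared disjoint with rdfs:Literal.
DISJOINT_WITH_LITERAL = {"some:Agent"}


def find_range_conflicts(triples):
    """Return statements whose literal value contradicts the declared range."""
    conflicts = []
    for s, p, o in triples:
        range_class = RANGES.get(p)  # None when no range is declared
        if range_class in DISJOINT_WITH_LITERAL and isinstance(o, str):
            conflicts.append((s, p, o))
    return conflicts


# Hypothetical harvested data mixing both patterns.
harvested = [
    (Resource("http://example.org/doc/1"), "dc:creator",
     Resource("http://example.org/agent/js")),
    (Resource("http://example.org/doc/2"), "dc:creator", "John Smith"),
    (Resource("http://example.org/doc/2"), "dc:title", "A Report"),
]

conflicts = find_range_conflicts(harvested)
```

The check is generic: it consults the declared range rather than hard-coding a rule for dc:creator, so a property with no declared range (dc:title here) is left alone.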
So when that application processes some harvested data, it can detect contradictions between the data and DCMI's description of the property. In my case 2 above the best the application could do was say to the human administrator of the service "Look, I've got two (three, four, five) sorts of thing as values for dc:creator here" - the Swoogle case, as Mikael pointed out - leaving the human administrator to go to read about dc:creator and try to work out what to do.

With the rdfs:range information, the application can recognise "DCMI says the range of dc:creator is some:Agent which is disjoint with rdfs:Literal, so all these sorts of thing are also instances of some:Agent, that's fine - but ooh, look, Admin-Person, this set of data has literal values and contradicts that assertion".

Our providing the rdfs:range data provides the basis for more clarity and consistency. It makes the work of service providers easier, and cheaper, and hopefully the end result is more cheerfulness all round! ;-)

Pete
--
Pete Johnston
Research Officer (Interoperability)
UKOLN, University of Bath, Bath BA2 7AY, UK
tel: +44 (0)1225 383619 fax: +44 (0)1225 386838
mailto:p.johnston@ukoln.ac.uk
http://www.ukoln.ac.uk/ukoln/staff/p.johnston/

To find out more about Reuters visit www.about.reuters.com

Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of Reuters Ltd.
Received on Friday, 24 March 2006 14:08:18 UTC