Re: Recommendations: specificity from Jodi Schneider on 2011-03-29 (public-lld@w3.org from March 2011)

From: Jodi Schneider <jodi.schneider@deri.org>
Date: Tue, 29 Mar 2011 15:35:53 +0100
To: Diane I. Hillmann <metadata.maven@GMAIL.COM>
Cc: Karen Coyle <kcoyle@kcoyle.net>, public-lld <public-lld@w3.org>
Message-Id: <924EBFCD-7E0F-41E6-A596-D8C4C7CF8091@deri.org>

Useful thoughts, Diane!

On 29 Mar 2011, at 15:28, Diane I. Hillmann wrote:

> On 3/29/11 8:02 AM, Jodi Schneider wrote:
>> Two sharing issues--about audience and about deduplication--occurred to me as I was reading Richard Light's post. We need:
>>
>> (1) Mechanisms to record the audience of descriptions and to deliver the appropriate description. Customization of records is likely to be needed for a long time into the future. Audience considerations are always important: for example, the description may depend on whether the collection is a children's book library for teachers or a collection of children's novels for early readers, or on whether the collection is aimed at specialist astronomers or is part of a university general science collection. (Besides audience, this could also depend on other factors, e.g. on my mobile I'll want a brief description, yet if I have more screen space I might want as much info as will fit.)
>>
>> (2) Mechanisms for ensuring records are not overwritten or destroyed nefariously (when I think of "one catalog to rule them all" I worry about censorship becoming easier when there's only one hub for records, and how to ensure that "lots of copies keep stuff safe"). At the same time we need to avoid the many downsides which currently accompany multiplicity and duplication!
>>
>> -Jodi
> I think these are important issues, but perhaps we should turn these ideas around, and come at them not from the 'top' (e.g., the intention of the data creators), but from the provenance of the creators themselves, which should be part of the statements they provide. For instance, you should be able to determine intended audience from who provided the description: whether (in your example of children's books) the description comes from a publisher or from an academic department training teachers. We know from past experience that the data creator's notion of who they're aiming at in terms of audience is necessarily incomplete--they have little idea about the needs of anyone outside their limited context, so depending on them to define target audiences is probably an exercise in futility.

So then this becomes:
- need ways of determining the appropriate description (ideally without the viewer specifying it directly)
- need ways of mapping characteristics of describers to characteristics of the appropriate viewers


> As for (2), we should be talking about how unnecessary the whole idea of de-duplication becomes in a world where statement-level data is the norm.  

Ok, then this becomes an issue of detecting when two statements are the same.
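
A minimal sketch of what that detection might look like, assuming statements are simple (subject, predicate, object, source) tuples and that "the same" means equal after trivial normalisation of the object. The tuple layout, the identifiers, and the normalisation rule are all hypothetical; real matching would need to be smarter:

```python
from collections import defaultdict


def normalize(value):
    """Collapse whitespace and case so trivially different literals compare equal."""
    return " ".join(value.split()).casefold()


def group_statements(statements):
    """Group (subject, predicate, object, source) tuples by normalized content.

    Returns a dict mapping (subject, predicate, normalized object) to the set
    of sources that asserted it, so matching statements are merged without
    losing their provenance.
    """
    groups = defaultdict(set)
    for subject, predicate, obj, source in statements:
        groups[(subject, predicate, normalize(obj))].add(source)
    return dict(groups)


# Hypothetical example data: three sources describing the format of one item.
statements = [
    ("book:1", "format", "Hardcover", "publisher"),
    ("book:1", "format", "hardcover ", "library-a"),
    ("book:1", "format", "Paperback", "library-b"),
]
grouped = group_statements(statements)
```

Nothing is deleted here: the two "hardcover" statements collapse into one entry, but both sources stay attached to it.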

> The number and diversity of statements is important information when evaluating the usefulness of data, particularly in a machine environment.  If you have, for instance, 10 statements about the format of an item and 9 of them agree, is that not useful?

Sure, but also "the majority is always wrong" (i.e. we need more sophisticated ways to track authenticity, provenance, and likely correctness).

> The duplication here supports the validity of those 9 statements that agree.  And, particularly when we're talking about a world with numerous points of view, accepting all of the available statements as part of an overall description of a resource gives us far more to work with, and if we know where those statements came from, we can provide either a targeted description to a particular set of users or something broader for others. When we look ahead to a world where we are using machines to assist us in interpreting and improving data, surely we should be thinking that more is better?  Why, in the current environment where storage is cheap would we seek to delete or overwrite information?  Old habits die hard, certainly, but we should be challenging them where we can.

Then tracking what the "best" data is (and having algorithms for detecting it) will be important -- because "most recent" != "best"
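
To illustrate the difference, here is a rough sketch that scores each value by the summed trust of the sources asserting it, next to a naive "most recent wins" rule. The trust weights, source names, and dates are invented for the example, and summing trust is only one of many possible scoring rules:

```python
def best_value(statements, trust):
    """Pick the value whose sources have the highest total trust.

    statements: iterable of (value, source, iso_date) tuples.
    trust: dict mapping source name to a weight; unknown sources count as 0.
    """
    scores = {}
    for value, source, _date in statements:
        scores[value] = scores.get(value, 0.0) + trust.get(source, 0.0)
    return max(scores, key=scores.get)


def most_recent_value(statements):
    """Naive alternative: take the value of the latest statement.

    ISO 8601 date strings sort chronologically when compared as text.
    """
    return max(statements, key=lambda s: s[2])[0]


# Hypothetical data: the newest statement comes from the least trusted source.
statements = [
    ("hardcover", "publisher", "2009-05-01"),
    ("hardcover", "library-a", "2010-01-15"),
    ("paperback", "anonymous", "2011-03-01"),
]
trust = {"publisher": 0.9, "library-a": 0.7, "anonymous": 0.1}

by_trust = best_value(statements, trust)
by_date = most_recent_value(statements)
```

With this data the two rules disagree: the trust-weighted rule picks "hardcover" while the most recent statement says "paperback". Because the weights come from provenance rather than from counting heads, a single highly trusted source could also outvote a larger number of weak ones.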

-Jodi

> 
> Diane
Received on Tuesday, 29 March 2011 14:36:25 UTC