Re: Good grammar and proper footnotes for data

Hi Tom,

As much as I agree with the points you make, I am worried that using the 
language metaphor may cause confusion (yes, I am aware I also did it in 
my thesis presentation, but that was to explain RDF to my mother ;).

At whom is your explanation directed? Why would a more technical / dry 
description (e.g. "URIs are used to globally identify and access concept 
descriptions; the publisher is responsible for providing useful 
information at that location, e.g. by relating it to other concepts") not work?

For example, the word "grammar" evokes the idea of constraints on data, 
while this is an aspect RDF lacks (and which is becoming problematic now, 
as you know from the issues with DCAM and Application Profiles in RDF/OWL).

The word "footnotes" is confusing to me, as at least in Dutch this refers 
to small textual commentaries inserted at the bottom of pages, not to 
references to papers etc.

Depending on your audience this might (not) matter.

Best,
Mark.

On 18/10/2010 4:41, Thomas Baker wrote:
> Dear all,
>
> Some thoughts on my own motivation for pursuing the cause
> of linked data -- "unexamined assumptions" expressed here in
> strong terms to encourage discussion :-)
>
> I'm wondering how many of you agree that RDF is a language of
> data -- the only such language we have with any traction --
> and that URIs are the footnotes for data in the Web age?
>
> Science and scholarship are founded on footnotes, and in
> a sense, libraries were built to support the integrity and
> longevity of footnotes.  Good grammar and proper footnotes --
> what's not to like?  Can we agree on enough of the principles
> here to work them into the case for library linked data?
>
> Tom
>
>
>
> RDF is the grammar for a language of data.  URIs are the words
> of that language.  As in natural language, these words (i.e.,
> the URIs) belong to grammatical categories.  RDF properties
> (such as "isReferencedBy") function a bit like verbs, RDF
> classes like nouns.
>
> As in natural languages, where utterances are meaningful only
> if they follow a sentence grammar, RDF statements follow a
> simple and consistent three-part grammar of subject, predicate,
> and object.  Analogously to paragraphs, RDF statements are
> aggregated into RDF graphs.
>
> Aside from being words in the language of data, URIs double
> as footnotes.  As footnotes they indicate the maintenance
> responsibility for words by way of ownership of the domain
> names under which the URIs were coined, as recorded in the
> globally managed Domain Name System (DNS).  Inasmuch as the URIs
> of words lead to documentation of official definitions, the
> Web itself provides the language of data with its dictionary.
>
> The fifteen elements of Dublin Core have been likened to a
> "pidgin" -- a lexicon of generic predicates good enough for the
> sort of rudimentary but serviceable communication that occurs
> between speakers of different languages.  Just as pidgins
> are inadequate for more subtle or differentiated expression,
> a healthy ecosystem of RDF vocabularies needs to include
> more specialized vocabularies for use by social or scholarly
> communities of discourse among themselves.
>
> RDF is a language designed by humans for processing
> by machines.  The RDF language -- the grammar together
> with available RDF vocabularies -- does not itself solve
> the difficulties of human communication any more than
> the prevalence of English guarantees world understanding.
> However, RDF does support the process of connecting dots --
> of creating "knowledge" -- by providing a linguistic basis for
> expressing and linking data.
>
> Just as English as a second language provides a basis for
> communication among non-native English speakers, RDF provides
> a common second language into which local data formats can be
> translated and exposed.  Just as English is useful without
> being the best of all possible grammars, RDF happens to be
> what we currently have -- the only general-purpose language
> for data with any traction.  But just as English grammar
> follows deep linguistic structures determined by the human
> capacity for language, it is likely that RDF, if re-invented,
> would end up strongly resembling what we currently have.
>
> Aside from supporting data interchange in the here and now, RDF
> provides a response to the ongoing and inevitable obsolescence
> of computer applications and customized data formats by
> expressing knowledge using a well-understood grammar and citing
> publicly documented vocabularies and resource URIs.  In this
> sense, it supports data that does not require additional
> out-of-band information for its interpretation, i.e., data
> that "speaks for itself".  This assumes, of course, that
> our cultural memory institutions will deploy robust methods
> for preserving the parts of the Web where the underlying RDF
> vocabularies and resource identifiers are documented.
>
> We are in the midst of a rapid shift from a world in which
> information was predominantly print-based to one in which it is
> predominantly digital.  The scale and speed of transformation
> virtually guarantees that any computer applications and user
> interfaces we use today will at some point, probably soon,
> be superseded.  Data that cannot speak for itself will be more
> vulnerable to becoming irrelevant.
>
> Not only is data expected to be linkable in the present,
> but we hope it will remain intelligible in the future.
> In 2010, to put information into ad-hoc data formats in
> the absence of well-defined interpretations as RDF triples
> is like making statements without grammar.  Creating data
> without URIs is like writing without proper footnotes.
> This is okay for information with a short shelf life --
> i.e., most information -- but information of lasting cultural
> significance deserves better.  Cultural memory institutions
> live by the ethos of scholarship, by which things like good
> grammar and proper footnotes should really matter. The language
> of RDF represents the application of that ethos to data itself.
>
>

Received on Tuesday, 19 October 2010 12:53:04 UTC