Re: Good grammar and proper footnotes for data

Hi Tom,

I _really_ like this -- it's a highly useful way of looking at the RDF data
model. I'm no linguist or semanticist, but I would think the notion of RDF
as a "language of data" can be extended: RDF is analogous to one of the many
languages in the messy realm of human language. It could easily be seen,
and perhaps more fully understood, as the core 'language' that humans and
machines use to communicate their respective understanding of the 'world'.

"URIs as footnotes" and citations, "healthy ecosystem of RDF vocabularies",
RDF as a "common second language", "data that speaks for itself", RDF as an
application of "the ethos of scholarship".

Great stuff!

Jon

On Sun, Oct 17, 2010 at 10:41 PM, Thomas Baker <tbaker@tbaker.de> wrote:

> Dear all,
>
> Some thoughts on my own motivation for pursuing the cause
> of linked data -- "unexamined assumptions" expressed here in
> strong terms to encourage discussion :-)
>
> I'm wondering how many of you agree that RDF is a language of
> data -- the only such language we have with any traction --
> and that URIs are the footnotes for data in the Web age?
>
> Science and scholarship are founded on footnotes, and in
> a sense, libraries were built to support the integrity and
> longevity of footnotes.  Good grammar and proper footnotes --
> what's not to like?  Can we agree on enough of the principles
> here to work them into the case for library linked data?
>
> Tom
>
>
>
> RDF is the grammar for a language of data.  URIs are the words
> of that language.  As in natural language, these words (i.e.,
> the URIs) belong to grammatical categories.  RDF properties
> (such as "isReferencedBy") function a bit like verbs, RDF
> classes like nouns.
>
> As in natural languages, where utterances are meaningful only
> if they follow a sentence grammar, RDF statements follow a
> simple and consistent three-part grammar of subject, predicate,
> and object.  Analogously to paragraphs, RDF statements are
> aggregated into RDF graphs.
>
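
To make the three-part grammar above concrete, here is a minimal sketch in
plain Python, not a real RDF library. The example.org URIs and the helper
name `objects` are assumptions made up for illustration; the Dublin Core
and rdf:type URIs are the published ones.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical resource URIs, used only for this example.
article = "http://example.org/article/1"
book = "http://example.org/book/42"

# A graph is simply a set of statements, much as a paragraph is a
# collection of sentences.
graph = {
    # The property plays the role of the verb.
    Triple(article, DCT + "isReferencedBy", book),
    # The class plays the role of the noun.
    Triple(article, RDF_TYPE, DCT + "BibliographicResource"),
}


def objects(graph, subject, predicate):
    """Return every object stated for a given subject and predicate."""
    return {t.obj for t in graph
            if t.subject == subject and t.predicate == predicate}
```

Asking `objects(graph, article, DCT + "isReferencedBy")` returns the set
containing the book URI, which is the sentence "article isReferencedBy
book" read back out of the graph.
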
> Aside from being words in the language of data, URIs double
> as footnotes.  As footnotes they indicate the maintenance
> responsibility for words by way of ownership of the domain
> names under which the URIs were coined, as recorded in the
> globally managed Domain Name System (DNS).  Inasmuch as the URIs
> of words lead to documentation of official definitions, the
> Web itself provides the language of data with its dictionary.
>
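
The "footnote" role of a URI can be illustrated with a small sketch that
reads off the domain name under which a term was coined. The function name
`coining_domain` is hypothetical, and the sketch only parses the URI; it
does not query DNS or say who actually registered the domain.

```python
from urllib.parse import urlparse


def coining_domain(uri: str) -> str:
    """Return the host name under which a URI was coined.

    The host name hints at who is responsible for maintaining the
    term; an empty string is returned if the URI has no host part.
    """
    return urlparse(uri).hostname or ""


term = "http://purl.org/dc/terms/isReferencedBy"
domain = coining_domain(term)  # "purl.org"
```

Dereferencing the same URI is what would lead a reader to the documented
definition, in the way a footnote leads to its source.
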
> The fifteen elements of Dublin Core have been likened to a
> "pidgin" -- a lexicon of generic predicates good enough for the
> sort of rudimentary but serviceable communication that occurs
> between speakers of different languages.  Just as pidgins
> are inadequate for more subtle or differentiated expression,
> a healthy ecosystem of RDF vocabularies needs to include
> more specialized vocabularies for use by social or scholarly
> communities of discourse among themselves.
>
> RDF is a language designed by humans for processing
> by machines.  The RDF language -- the grammar together
> with available RDF vocabularies -- does not itself solve
> the difficulties of human communication any more than
> the prevalence of English guarantees world understanding.
> However, RDF does support the process of connecting dots --
> of creating "knowledge" -- by providing a linguistic basis for
> expressing and linking data.
>
> Just as English as a second language provides a basis for
> communication among non-native English speakers, RDF provides
> a common second language into which local data formats can be
> translated and exposed.  Just as English is useful without
> being the best of all possible grammars, RDF happens to be
> what we currently have -- the only general-purpose language
> for data with any traction.  But just as English grammar
> follows deep linguistic structures determined by the human
> capacity for language, it is likely that RDF, if re-invented,
> would end up strongly resembling what we currently have.
>
> Aside from supporting data interchange in the here and now, RDF
> provides a response to the ongoing and inevitable obsolescence
> of computer applications and customized data formats by
> expressing knowledge using a well-understood grammar and citing
> publicly documented vocabularies and resource URIs.  In this
> sense, it supports data that does not require additional
> out-of-band information for its interpretation, i.e., data
> that "speaks for itself".  This assumes, of course, that
> our cultural memory institutions will deploy robust methods
> for preserving the parts of the Web where the underlying RDF
> vocabularies and resource identifiers are documented.
>
> We are in the midst of a rapid shift from a world in which
> information was predominantly print-based to one in which it is
> predominantly digital.  The scale and speed of transformation
> virtually guarantee that any computer applications and user
> interfaces we use today will at some point, probably soon,
> be superseded.  Data that cannot speak for itself will be more
> vulnerable to becoming irrelevant.
>
> Not only is data expected to be linkable in the present,
> but we hope it will remain intelligible in the future.
> In 2010, to put information into ad-hoc data formats in
> the absence of well-defined interpretations as RDF triples
> is like making statements without grammar.  Creating data
> without URIs is like writing without proper footnotes.
> This is okay for information with a short shelf life --
> i.e., most information -- but information of lasting cultural
> significance deserves better.  Cultural memory institutions
> live by the ethos of scholarship, by which things like good
> grammar and proper footnotes should really matter. The language
> of RDF represents the application of that ethos to data itself.
>
>
> --
> Tom Baker <tbaker@tbaker.de>
>
>


-- 
Jon


Received on Monday, 18 October 2010 11:45:51 UTC