Re: Good grammar and proper footnotes for data from stu on 2010-10-19 (public-lld@w3.org from October 2010)

From: stu <stuart.weibel@gmail.com>
Date: Tue, 19 Oct 2010 09:01:32 -0700
To: Thomas Baker <tbaker@tbaker.de>
Cc: public-lld <public-lld@w3.org>
Message-ID: <AANLkTin1s-uOOb8jUtTxtW2qWL-kXp61TF-GZ3RS1geW@mail.gmail.com>
RDF isn't the language.  It's the syntax, the encoding of the grammar, no?

The language is the aggregate of the vocabularies, the grammar, and the
conventions and idioms of expression.  Many natural languages share a
similar grammar, but we don't call them the same language; we call them
English and German and French, etc.  The names of the languages are taken
from the common vocabularies, even though they carry all these components
as a whole.

DC is one of the vocabularies, RDF is one of the grammars available to us
for expressing concepts using vocabularies, and the conventions and idioms
are founded in the many communities (this diversity being a major impediment
to interoperability).

By the way, this analysis amplifies one of the underlying impediments that has
hammered us over and over... trying to interoperate among 'languages' that
have related vocabularies, but really different underlying data models
(including, but not limited to, grammar).  DC and LOM, for example.  A huge
train wreck that could have so easily been avoided.

And where do OWL and SKOS fit in?  Do we need them in the model?

An historical note of almost no interest, but relevant to your statement:
"it is likely that RDF, if re-invented, would end up strongly resembling
what we currently have."  One of the early suggestions about the metadata
encoding approach that became RDF was that it would be implemented in either
Lisp or Prolog (I can't recall which, and it's not so important, as both of
them have roots in the syllogistic deduction that characterizes RDF).  Thank
heaven THAT didn't happen ;-) !

I actually think it is worthwhile to explore these reductionist metaphors
for what we're trying to do... not sure you quite nailed it, Tom, but worth
polishing I think.

On Sun, Oct 17, 2010 at 7:41 PM, Thomas Baker <tbaker@tbaker.de> wrote:

> Dear all,
>
> Some thoughts on my own motivation for pursuing the cause
> of linked data -- "unexamined assumptions" expressed here in
> strong terms to encourage discussion :-)
>
> I'm wondering how many of you agree that RDF is a language of
> data -- the only such language we have with any traction --
> and that URIs are the footnotes for data in the Web age?
>
> Science and scholarship are founded on footnotes, and in
> a sense, libraries were built to support the integrity and
> longevity of footnotes.  Good grammar and proper footnotes --
> what's not to like?  Can we agree on enough of the principles
> here to work them into the case for library linked data?
>
> Tom
>
>
>
> RDF is the grammar for a language of data.  URIs are the words
> of that language.  As in natural language, these words (i.e.,
> the URIs) belong to grammatical categories.  RDF properties
> (such as "isReferencedBy") function a bit like verbs, RDF
> classes like nouns.
>
> As in natural languages, where utterances are meaningful only
> if they follow a sentence grammar, RDF statements follow a
> simple and consistent three-part grammar of subject, predicate,
> and object.  Analogously to paragraphs, RDF statements are
> aggregated into RDF graphs.
>
> Aside from being words in the language of data, URIs double
> as footnotes.  As footnotes they indicate the maintenance
> responsibility for words by way of ownership of the domain
> names under which the URIs were coined, as recorded in the
> globally managed Domain Name Service (DNS).  Inasmuch as the URIs
> of words lead to documentation of official definitions, the
> Web itself provides the language of data with its dictionary.
>
> The fifteen elements of Dublin Core have been likened to a
> "pidgin" -- a lexicon of generic predicates good enough for the
> sort of rudimentary but serviceable communication that occurs
> between speakers of different languages.  Just as pidgins
> are inadequate for more subtle or differentiated expression,
> a healthy ecosystem of RDF vocabularies needs to include
> more specialized vocabularies for use by social or scholarly
> communities of discourse among themselves.
>
> RDF is a language designed by humans for processing
> by machines.  The RDF language -- the grammar together
> with available RDF vocabularies -- does not itself solve
> the difficulties of human communication any more than
> the prevalence of English guarantees world understanding.
> However, RDF does support the process of connecting dots --
> of creating "knowledge" -- by providing a linguistic basis for
> expressing and linking data.
>
> Just as English as a second language provides a basis for
> communication among non-native English speakers, RDF provides
> a common second language into which local data formats can be
> translated and exposed.  Just as English is useful without
> being the best of all possible grammars, RDF happens to be
> what we currently have -- the only general-purpose language
> for data with any traction.  But just as English grammar
> follows deep linguistic structures determined by the human
> capacity for language, it is likely that RDF, if re-invented,
> would end up strongly resembling what we currently have.
>
> Aside from supporting data interchange in the here and now, RDF
> provides a response to the ongoing and inevitable obsolescence
> of computer applications and customized data formats by
> expressing knowledge using a well-understood grammar and citing
> publicly documented vocabularies and resource URIs.  In this
> sense, it supports data that does not require additional
> out-of-band information for its interpretation, i.e., data
> that "speaks for itself".  This assumes, of course, that
> our cultural memory institutions will deploy robust methods
> for preserving the parts of the Web where the underlying RDF
> vocabularies and resource identifiers are documented.
>
> We are in the midst of a rapid shift from a world in which
> information was predominantly print-based to one in which it is
> predominantly digital.  The scale and speed of transformation
> virtually guarantees that any computer applications and user
> interfaces we use today will at some point, probably soon,
> be superseded.  Data that cannot speak for itself will be more
> vulnerable to becoming irrelevant.
>
> Not only is data expected to be linkable in the present,
> but we hope it will remain intelligible in the future.
> In 2010, to put information into ad-hoc data formats in
> the absence of well-defined interpretations as RDF triples
> is like making statements without grammar.  Creating data
> without URIs is like writing without proper footnotes.
> This is okay for information with a short shelf life --
> i.e., most information -- but information of lasting cultural
> significance deserves better.  Cultural memory institutions
> live by the ethos of scholarship, by which things like good
> grammar and proper footnotes should really matter. The language
> of RDF represents the application of that ethos to data itself.
>
>
> --
> Tom Baker <tbaker@tbaker.de>
>
>
Received on Tuesday, 19 October 2010 17:37:40 UTC