destabilizing core technologies: was Re: An RDF wishlist from Patrick Durusau on 2010-07-01 (public-lod@w3.org from July 2010)

From: Patrick Durusau <patrick@durusau.net>
Date: Thu, 01 Jul 2010 05:39:32 -0400
To: Dan Brickley <danbri@danbri.org>
CC: Pat Hayes <phayes@ihmc.us>, Linked Data community <public-lod@w3.org>, Semantic Web <semantic-web@w3.org>
Message-ID: <4C2C6254.5030708@durusau.net>

Dan,

Just a quick response to only one of the interesting points you raise:

> It's clear that many workshop participants were aware of the risk of
> destabilizing the core technologies just as we are gaining some very
> promising real-world traction. That was a relief to read. For those
> who have invested time and money in helping us get this far, and who
> had the resources to participate, this concern was probably enough to
> motivate participation.

It might be helpful to recall that "destabilizing the core technologies" 
was exactly the approach that SGML took when its "....little annoyances 
[brought] friction and frustration to those working with [SGML]..."

There was "...promising real-world traction."

I don't know what else to call the US Department of Defense mandating 
the use of SGML for defense contracts. That is certainly "real-world" 
and it seems hard to step on an economic map of the US without stepping 
in defense contracts of one sort or another.

Clinging to decisions that seemed right at the time they were made is a 
real problem. It is only because we make decisions that we have the 
opportunity to look back and wish we had decided differently. That is 
called experience. If we don't learn from experience, well, there are 
other words to describe that.

Some of the audience for these postings will remember that the result of 
intransigence on the part of the SGML community was XML.

I am not advocating in favor of any specific changes. I am suggesting 
that clinging to prior decisions simply because they are prior decisions 
doesn't have a good track record. Learning from prior decisions, on the 
other hand, such as the reduced (in my opinion) feature set of XML, 
seems to have a better one. (Other examples left as an exercise for the 
reader.)

Hope you are having a great day!

Patrick



On 7/1/2010 4:46 AM, Dan Brickley wrote:
> (rejigged subject line)
>
> On Thu, Jul 1, 2010 at 4:35 AM, Pat Hayes <phayes@ihmc.us> wrote:
>    
>>> Pat, I wish you had been there.  ;)
>>>        
>> I have very mixed views on this, I have to say. Part of me wanted badly to
>> be present. But after reading the results of the straw poll, part of me
>> wants to completely forget about RDF,  never think about an ontology or a
>> logic ever again, and go off and do something completely different, like art
>> or philosophy.
>>      
> I have mixed feelings about missing the workshop too. Having been
> pushing this wheelbarrow uphill for far too long, it does seem a shame
> to have missed such an event. On the other hand, it is hard to know
> what to make of the workshop outcomes since the participants form an
> unusually specialist subset of humanity, and the problem of what W3C
> next does with its RDF standard is such a small part of the larger
> problem.
>
> It's clear that many workshop participants were aware of the risk of
> destabilizing the core technologies just as we are gaining some very
> promising real-world traction. That was a relief to read. For those
> who have invested time and money in helping us get this far, and who
> had the resources to participate, this concern was probably enough to
> motivate participation. It's clear also that participants were aware
> of many of the little annoyances that bring friction and frustration
> to those working with RDF. What I'm less sure of is how to represent
> the perspective of those who have explored RDF and walked away. Over
> the years, many bright people have investigated RDF enthusiastically,
> and left disappointed. Those folk didn't come to the workshop, they
> didn't write a position paper, and they probably don't particularly
> care about its outcomes. But they're just the kind of people who will
> need to enjoy using RDF if we are to succeed.
>
> Is RDF hard to work with? I think the answer remains 'yes', but we
> lack consensus on why. And it seems even somehow disloyal to admit it.
> If I had to list reasons, I'd leave nits like 'subjects as literals'
> pretty low down. Many of the reasons I think are unavoidable, and
> intrinsic to the kind of technology and problems we're dealing with.
> But there are also lots of areas for improvement. Most of these are
> nothing to do with fixups to W3C standards documentation. And finally,
> we can lessen the perception of pain by improving the other side:
> getting more decent linked data out there, so the suffering people go
> through is "worth it".
>
> Some reasons why RDF is annoying and hard (a mildly ordered list):
>
> * RDF data is gappy, chaotic, full of unexpected extensions and
> omissions - BY DESIGN
> * RDF toolkits each offer different items from a large menu (syntaxes,
> storage, inference facilities), so even when you're getting a lot, you
> probably don't appreciate what you're getting and we have no common
> checklist that helps non-guru developers understand this.
> * RDF toolkit / library immaturity; eg1. I wasted half a weekend
> recently trying to find a decent Javascript system. eg2. I work in
> Python using the popular rdflib library, whose half-finished SPARQL
> support was recently removed and put into an 'extras' package; nobody
> seems quite sure how well it works. The Ruby landscape remains messy
> although the public-rdf-ruby list have recently been collaborating
> actively to improve things. Broken old and abandoned code litters the
> Web; good stuff remains on the bleeding edge and unpackaged. Great
> ideas, code and algorithms remain trapped in a single implementation
> language rather than transliterated to other widely deployed
> languages. Almost every toolkit's SQL backend is represented
> differently. Only a few serializers bother to prettify RDF/XML nicely,
> despite there being opensource code out there that could easily be
> copied.
> * RDF is good for aggregation of externally managed data; managing
> data *as* RDF comes with certain complexities since edit/delete
> operations on a connected graph aren't as intuitive as on a closed
> tree structure. If I delete a certain node from the graph, which
> others should be cleaned up too? Named graphs help somewhat there but
> good habits aren't yet understood, much less documented.
> * As a community, we have some standards for documenting the atomic
> terms in our vocabularies (ie. RDFS/OWL) but we tend to stop there,
> and not to document the larger graph patterns that are needed to
> really communicate using these structures, or the underlying use cases
> that motivated them in the first place. We also don't do nearly enough
> analytics and stats over the actual data out there to make it easier
> to consume, and for publishers to gravitate towards existing idioms
> rather than make up similar-but-different graph patterns that'll
> confuse the landscape further.
> * Our small community (we are outnumbered by Visual Basic enthusiasts,
> let alone Javascripters) is fragmented and grumpy. OWL and Linked Data
> enthusiasts too often talk and think disparagingly about each other's
> work, or not-so-secretly wish the others would just go away and stop
> messing things up. And all this foolish posturing despite the fact
> that Linked Data is a massive deployment of OWL-documented
> vocabularies, and that the essential but annoying gappy chaotic nature
> of RDF can at least partly be patched up by techniques that help us
> figure out when two different RDF expressions are saying the same
> thing, aka inference.
> * Enthusiasm sometimes borders on a religious zeal that would be
> better spent on toolkit polish than on overloading mailing lists; or
> on prolonging petty wars ('x is not semantic enough', 'y isn't really
> Linked Data'...) with other folk who prefer for whatever reason to use
> different technologies to publish, share and link data.
>
> So, what do we do?
>
> A few years ago, Edd Dumbill turned the XML Europe conference into the
> XTech conference, transforming it from a nose-too-close-to-the-screen
> event for markup nerds, into an event that brought together browser
> people, XML markup experts, open data advocates (creative commons
> etc.), and forward-thinking creative technologists of every kind.
> XTech is no longer with us, although I expect Edd's work at OSCON
> (http://www.oscon.com/oscon2010 which I'd happily have attended over
> any RDF/SemWeb event) shows similar insight. XTech was important as it
> provided a meeting place for technologists with different technical
> favourites, while also tapping into the larger themes that motivate
> much of the passion in the first place. It helped people identify
> themselves with a larger effort, rather than with some specific
> technology tool.  I think we can learn a lot from XTech.
>
> RDF enthusiasts share 99.9% of their geek DNA with the microformats
> community, with XML experts, with OWL people, ... but time and again
> end up nitpicking on embarrassing details. Someone "isn't really"
> publishing Linked Data because their RDF doesn't have enough URIs in
> it, or they use unfashionable URI schemes. Or their Apache Web server
> isn't sending 303 redirects. Or they've used a plain XML language or
> other standard instead. This kind of partisan hectoring can shrink a
> community passionate about sharing data in the Web, just at a time
> when this effort should be growing more inclusive and taking a broader
> view of what we're trying to achieve.
>
> The formats and protocols are a detail. They'll evolve over time. If
> people do stuff that doesn't work, they'll find out and do other
> things instead. The thing that keeps me involved is the common passion
> for sharing information in the Web. If we keep that as an anchor point
> rather than some flavour of some version of RDF, I think a lot of the
> rest falls into place. I love
> http://www.w3.org/Illustrations/LetsShare.ai.gif "Let's Share What We
> Know" - an ancient slogan of the early Web project. If we take "Let's
> share what we know" as a central anchor, rather than triples, we can
> evaluate different technical strategies in terms of whether they help
> by making it easier to "share what we know" using the Web.
>
> Going back to my list, I think the reason to use RDF will simply be
> that others have also chosen to use it. Nothing more really, it's
> about the data, above all. Sure the reason we can all choose to use it
> and gain value from each other's parallel decision, is the emphasis on
> linking, sharing, mixing, decentralisation. But when choosing whether
> to bother with RDF, I think for future decision makers it'll all be
> about the data not the implementation techniques.
>
> The reason is *not* the tooling, the fabulous parsers, awe-inspiring
> inference engines, expressive query languages or cleverly designed
> syntaxes. Those are all means-to-an-end, which is sharing information
> about the world. Or getting hold of cheap/free and bulky background
> datasets, if you prefer to couch it in less idealistic terms.
>
> And why would anyone care to get all this semi-related, messy Web
> data? Because problems don't come nicely scoped and packaged into
> cleanly distinct domains. Whenever you try to solve one problem, it
> borders on a dozen others that are a higher priority for people
> elsewhere. You think you're working with 'events' data but find
> yourself with information describing musicians; you think you're
> describing musicians, but find yourself describing digital images; you
> think you're describing digital images, but find yourself describing
> geographic locations; you think you're building a database of
> geographic locations, and find yourself modeling the opening hours of
> the businesses based at those locations. To a poet or idealist, these
> interconnections might be beautiful or inspiring; to a project manager
> or product manager, they are as likely to be terrifying.
>
> Any practical project at some point needs to be able to say "Enough
> with all this interwingularity! this is our bit of the problem space,
> and forget the rest for now". In those terms, a linked Web of RDF data
> provides a kind of safety valve. By dropping in identifiers that link
> to a big pile of other people's data, we can hopefully make it easier
> to keep projects nicely scoped without needlessly restricting future
> functionality. An events database can remain an events database, but
> use identifiers for artists and performers, making it possible to
> filter events by properties of those participants. A database of
> places can be only a link or two away from records describing the
> opening hours or business offerings of the things at those places.
> Linked Data (and for that matter FOAF...) is fundamentally a story
> about information sharing, rather than about triples. Some information
> is in RDF triples; but lots more is in documents, videos,
> spreadsheets, custom formats, or [hence FOAF] in people's heads.
>
> Looked at in these terms, my RDF wishlist would be based on looking at
> things from the consumer side. Publishing RDF is fiddly, but do-able.
> And it only takes a few lines of [perl|java|ruby|prolog|xslt...] to
> expose massive amounts of information in the Web. The linked data
> scene in recent years has started to do just this on an impressive
> scale. But consuming RDF remains pretty annoying, a hurdle to be
> crossed to get at the good stuff, the data. Even while RDF remains our
> single best story for how such a Web of data can be broken down into a
> largely uncoordinated global division of labour, RDF itself remains
> ... annoying. So my RDF wishlist would be about making RDF less
> annoying or risky to consume. While a lot of that is about tool
> maturity, there is a lot around data licensing and dealing with a
> natural wariness of depending too much on others, and on making sure
> RDF and the 'linked data' idea is presented in a more inclusive manner
> that respects the fact that most of the world's information isn't
> going to gain much from being put into URI-based triples.
>
> The very nature of RDF makes it somewhat annoying to work with. RDF
> data is always going to be a kind of Frankenstein's data monster,
> patched together from bits and pieces that can just about be made to
> fit together. Fortunately, we have at our fingertips a world wide Web
> that lets us share an awful lot of these bits; the more we can get
> re-usable RDF datasets out there, the less people will worry about the
> pain of using it, and the more likely it'll be that there will be
> genuinely useful, relevant data on hand when someone goes looking for
> it.
>
> All the time we run around evangelizing RDF while not admitting that
> it is also kind of annoying, we raise expectations that will be dashed
> when people actually try using it. All the time we spend ages writing
> long emails when we could be fixing and improving RDF software or
> datasets, we're probably also prolonging the problem. For my part in
> that last one, and for this overlong mail, ... sorry :)
>
> cheers,
>
> Dan
>
>
>    

-- 
Patrick Durusau
patrick@durusau.net
Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)

Another Word For It (blog): http://tm.durusau.net
Homepage: http://www.durusau.net
Twitter: patrickDurusau
Received on Thursday, 1 July 2010 09:40:29 UTC