Re: Keeping the Faith

On 29 April 2017 at 00:11, Brent Shambaugh <brent.shambaugh@gmail.com>
wrote:
> General Question:
>
> How do you keep the faith or vision with respect to semantic web and
> linked data? I'm also in an area where there is not a lot of venture
> capital (well some) nor (many) people having a lot of understanding of
> the area. At least it does not score you a talk. Is the field of
> dreams mentality of "if you build it, he will come"?

First and foremost, this effort is not a religion. People do seem to care
about it, and the larger notion of a healthy standards-based,
vendor-neutral etc World Wide Web, with the passion that others do reserve
for religious matters. This may or may not be a bug! That passion can drive
creativity and collaboration but it can also foster stubbornness and tribal
thinking.

W3C's RDF work embodies a lot of good ideas, and has proved useful, but it
is just one tool in the toolkit. When we were working on the RDF specs
nearly 20 years ago, it felt sometimes like they were positioned in a
"David vs Goliath" struggle with the XML family of technologies. There were
competing visions for what data on the Web might amount to. In later years
RDF-based approaches have been contrasted with JSON or SQL/CSV or whatever, but
the debate often takes roughly the same form. Do we treat data as a graph
representation of factual claims, or do we focus on the concrete form that
such claims might take - as an HTML or XML DOM or a JSON tree or a simple
flat table? And the answer is generally the same --- that there is value in
both perspectives. When we neglect the concrete notation / file format
details, the usability of the concrete formats suffers (cf. RDF/XML); when
we neglect the abstract commonalities, information becomes needlessly
fragmented across different representations and publication systems.

This community has always tended a little towards blaming two things for
the (real or perceived) failure of its ideas to burst triumphantly into the
technology mainstream. We have blamed poor syntaxes, leading to a range of
specs and experiments endlessly pursuing a more usable notation --- from
RDF/XML through RDFa (and its hybrid cousin, Microdata), JSON-LD, Turtle,
N-Triples, as well as mapping-based systems like GRDDL and CSVW. And we
have also blamed failures in understanding. There is a persistent tone
around here that eventually the wider world will "get it" and see the
point, value, importance etc of the approach to data embodied in RDF,
Semantic Web and Linked Data. I think there is some truth to the claim that
RDF was the right idea at the wrong time, and that the success of graph
databases shows that there is a more mainstream technology audience waiting
for it. But there is also some self-deception here, and failure to face up
to a fairly boring truth. Dealing with RDF data is difficult, annoying,
frustrating and suchlike. Not because of any intrinsic failing in the W3C
specs, tools or practices, but because dealing with highly heterogeneous,
lumpy, quirky datasets with all kinds of bits missing, and all kinds of
unanticipated extensions or novel patterns arbitrarily appearing in it, is
just a really hard problem space to be working in. There is something of a
tragedy of the commons pattern here. Any individual project can generally
get by without needing RDF, and may make progress faster focussing on their
exact data format needs using any of XML, JSON, CSV or whatever. But when
we stand back and look at the wider Web, this creates a very fragmented
landscape. This kind of thinking motivated W3C's GRDDL work (using XSLT to
map XML files into RDF; e.g. see http://www.xml.com/pub/a/2000/08/09/rdfdb/
and https://www.w3.org/2000/08/w3c-synd/ etc.).

Some years ago, Murray Maloney (of SGML and XML fame) popped into a W3C
Semantic Web Interest Group meeting we held as part of the TPAC conference.
I forget his exact words but afterwards he made the point that it reminded
him of the (in Brent's terms) faith and vision that people in the SGML/XML
community also had, and that it might be that we were attaching those
things overly specifically to some particular technology. He was right.
Round about that time, Linked Data took off as a variation of the Semantic
Web idea, but with more of an emphasis on open data in the public Web, and
less emphasis on fancy rule systems. Two healthy consequences of that for
RDF were that it re-affirmed the link to the broader Web standards community
--- by focussing on putting data in the actual Web and using related
standards like HTTP well --- and also it tapped into the underlying
motivations Murray had noted. We had perhaps mis-identified our common
interest as being RDF, but for many of us it was more about data sharing /
knowledge sharing / large scale collaborative infrastructure, and RDF was
just a means to an end. RDF is a means to an end, not an end in itself.

If you look at the history there are also plenty of things to feel good
about. When the Web was young, RDF was always talked about in terms of its
rivalry with XML. But then if you look at the actual people involved over
the years, those individuals (I won't namecheck everyone) have had careers
that touch into XML, RDF, open data, JSON, CSV, whatever tool gets the job
done. The rivalries and "XYZ is the ABC killer" framing aren't the story
of how these technologies inter-relate in practice.

The RDF community has the endearing tendency to over-criticise itself for
not single-handedly saving the planet from its perceived data-sharing
failings. I think we should instead just take a bow and acknowledge that
we've done good here. We built some useful tools and technologies that are
finding a niche, and we've progressed the state of the art around
annoyingly heterogeneous data handling. Is it the last word in anything?
Absolutely not. Is RDF (or Perl or XML or ...) "dead"? Absolutely not.
Are factual triples the answer to 'fake news'? Not quite. Could our Web
technologies be improved, the representations made simultaneously more
usable, expressive and useful? Probably/maybe/dunno. People worry too
much. These are good tools in a growing Web standards toolkit and it is
worth continuing to work on them, but also worth reminding ourselves that
this isn't in opposition to the wider technology landscape. It is nothing
but healthy for "RDF people" to take a break from thinking just about these
technologies and to spend some time in related work, e.g. JavaScript, Web
components, machine learning, security ... rather than slipping into
thinking about our efforts here as a kind of religious struggle against the
unbelievers...

Thinking of particular practical areas I'd suggest as worth putting time
into: SHACL and ShEx for RDF validation may turn out to be very important.
Also for my part, I have worked mostly on Schema.org
<https://research.googleblog.com/2015/12/four-years-of-schemaorg-recent-progress.html>
these last years. It is very widely used across the entire Web, and is
broadly in the "RDF family", but currently tends to be published and
consumed on a page-by-page basis rather than site-by-site. I suspect the
latter is where we'll see more scope for integration with the tools and
techniques of this community (SPARQL etc) and hope to put some time into
that in the coming months.

verbosely,

Dan

(somewhat absentee SemWeb Interest Group chair)

Received on Sunday, 30 April 2017 10:25:16 UTC