Re: The status of Semantic Web community- perspective from Scopus and Web Of Science (WOS) from Dan Brickley on 2010-02-13 (semantic-web@w3.org from February 2010)

From: Dan Brickley <danbri@danbri.org>
Date: Sat, 13 Feb 2010 09:32:36 +0100
To: Ying Ding <dingying@indiana.edu>
Cc: Semantic Web <semantic-web@w3.org>, public-lod@w3.org
Message-ID: <eb19f3361002130032l32382458va0c9e4fff59f132f@mail.gmail.com>

On Fri, Feb 12, 2010 at 8:22 PM, Ying Ding <dingying@indiana.edu> wrote:
> Hi,
>
> If you are interested to know the Semantic Web: Who is who from the
> perspective of Scopus and Web Of Science, recently we conduct a bibliometric
> analysis in this field
> (http://info.slis.indiana.edu/~dingying/Publication/JIS-1098-v4.pdf), which
> might be interesting to you.

It's interesting to see what a traditional - i.e. essentially pre-Web -
citation analysis comes up with; however, I wouldn't leap so quickly to
claim that this results in 'identifying the most productive players'.

A lot of key SemWeb infrastructure came about through non-academic
collaboration; either industrial or what we might call collaborations
conducted online informally, 'Internet-style'. In fact I'd argue that
the needs of the academic publication process have often been a
retarding factor on this collaborative work. The
traditionally-published academic literature is of course a key part of
the story, but if you look at it alone you will end up with both a
misleading sense of how things got this way and, worse, misleading
intuitions about how to get more involved and help further the
project. This is why I bother to make a little fuss here.

The phrase 'Semantic Web' from ~2000 was essentially a rebranding of
the then-unfashionable RDF technology. Prior to calling it RDF, the
project was called PICS-NG. These days many call it 'Linked Data'
instead. From http://lists.w3.org/Archives/Public/sw99/ ->
http://www.w3.org/1999/11/SW/Overview.html (Member-only link) 'We
propose to continue the W3C Metadata Activity as a Semantic Web
Development Initiative'. But by this point, the base technology was
already out there, both as a W3C Recommendation and as something in
use: Netscape - the Google of its time - was using RDF already. For
example, back in late 1998
http://web.archive.org/web/19991002043750/www.mailbase.ac.uk/lists/rdf-dev/1998-11/0004.html
R.V. Guha, then at Netscape, wrote:

"I still see this as a big and important use of RDF. This server
answers over 2 million requests in RDF every day." ... "I do plan to
fix the RDF, but thats with the next version of the browser (I have
about 6M browsers out there which are depending on this older
format)."

Any narrative that puts the start of Semantic Web history in 2000/2001
will confuse people as to where it came from: we had major browser
buy-in 2-3 years previously, after all. And any narrative that omits
the role of MCF - simply because it didn't come through the academic
publication process - risks misleading 'emerging stars' about how to
make an impact on the world rather than just on the citation
databases. Netscape bought into RDF because RDF grew out of MCF, which
came to Netscape from Apple along with Guha. A reformulation of MCF to use an XML notation
was one of the key inputs into the RDF design; see
http://www.w3.org/TR/NOTE-MCF-XML/ and the earlier MCF White Paper
http://www.guha.com/mcf/wp.html

Now MCF had significant mind-share and presence in the tech world back
in 1996 - http://web.archive.org/web/20000815212707/http://www.xspace.net/hotsauce/
- and even grassroots adoption on sites that wanted to have a '3d fly
thru' using Apple's then-cool visualization plugin. MCF was a direct
ancestor of RSS (also originally an RDF-based Netscape product); it
was triples-based, written in XML, and quite recognisable as RDF's
precursor to anyone who reads the spec. The grassroots, information
linking style of MCF was one of the inspirations behind FOAF too.

However, it left no footprint in the academic literature. We
might ask why. Like much of the work around W3C and tech-industry
standards, it left behind artifacts that rarely show up in the
citation databases: a white paper here, a Web-based specification
there. Its influence cannot easily be measured through academic
citation patterns, despite the fact that without it, the vast majority
of papers mentioned in
http://info.slis.indiana.edu/~dingying/Publication/JIS-1098-v4.pdf
would never have existed.

In my experience, many of the discussions that shaped the early RDF
and Semantic Web efforts were conducted online, using email, often
also IRC chat, and as the years went by, increasingly in blogs and now
microblogs. And many of the people who got a lot done were not
employed in an academic setting where there was an institutionalised
pressure to publish in certain kinds of venues. This is not to belittle
the critically important contributions that came from those employed
in academia, just to note that the wave of interest and research
funding that followed 2000/1 served largely to polish and promote ideas
(and tools, specs) that had already reached prominence via
Internet/Web/industry means. Without that academic buy-in and
associated research funding, the Semantic Web project would surely be dead
by now. However, there is a continuing danger of confusing the real
project, a global collaboration to improve the Web's
information-linking facilities, with the activity of writing about
it. The two are not the same; we need both, and the lack of useful modern
impact metrics makes it easy to conflate the two.

It is not appropriate to entitle an academic citation analysis of the
SemWeb project "Who is who in the field", not because of the bruised
egos of those it omits, but because it risks misleading younger
developers about how to make an impact on the world, rather than just
on the literature. "Who cites whose paper?" might be a more accurate
characterisation.

This is not a problem unique to the Semantic Web scene. All kinds of
scientific collaborations (the Web's founding use case) can be
conducted with greater speed thanks to the Web. But impact analysis
lags behind, making it hard for those who work openly, rapidly and
collaboratively to show the merits of their approach. The same holds in
Web standards: any account of recent developments in HTML should pay a
lot of attention to Web browsers, to organizations like Mozilla,
Microsoft, Opera, Apple, KDE, WebKit and to fora like #whatwg (an IRC
channel), the whatwg- and W3C- mailing lists, and countless blogs
where the future of HTML is being passionately debated. If you scan
the academic literature concerning HTML5 it is a pale and much-delayed
echo of the real debates. It is hardly surprising that a technology
community - HTML5 - devoted to improving the Web are also using it to
conduct their discussions. I think you'll find, although perhaps to a
lesser degree, the same also to be true of the Semantic Web project...

cheers,

Dan

Received on Saturday, 13 February 2010 08:33:12 UTC