- From: Dan Brickley <danbri@danbri.org>
- Date: Mon, 28 Apr 2008 16:39:29 +0100
- To: Maciej Gawinecki <mgawinecki@gmail.com>
- Cc: semantic-web@w3.org
Maciej Gawinecki wrote:
> Thank you for your help,
A suggestion/question: how might your analysis differ if you took the
perspective that there is only one Web, ... the World Wide one; and that
'Semantic Web' is the name of a (world-wide) project to help improve it.
Just as the 'Mobile Web' initiative aims to progress the state of the
Web art relating to use from mobile devices. If you talk about the
Semantic Web as a new replacement Web, you're bound to be disappointed.
If you think of it as a collaborative project, hopefully you'll find a
way to get involved and help with it.
Noun phrases can mislead us sometimes. A phrase such as '[the] Semantic
Web' (or 'Mobile ...') can have the unfortunate side-effect that it
slips us into thinking that there are a countable number of "Webs". And
then we look around and see that these apparently-new "Webs" look like
peas in orbit around the Jupiter of the "classic Web".
If we focus instead on there being just one Web, we can
still ask why the proportion of it with an RDF representation is
relatively tiny. But we don't take the absence of a new all-replacing
'thing' as a measure of failure.
Thinking about RDF:
> - decentralization, no central database of content and links
RDF also has this characteristic. The Web itself is our distributed
database of schemas. We're all free to use shared schemas, or our own
application-specific schemas. And by using Web identifiers for our
descriptive terms, we set things up so that mappings (often lossy,
pragmatic mappings) can be created days, months or years later, either in
procedural code or using technologies like SPARQL, OWL, RIF. The
important thing is that these agreements and mappings can be documented
later, if at all. People can get on with their immediate business
without asking for permission or forgiveness. There is more
bottlenecking and centralisation in a traditional 'enterprise' SQL-based
environment than in the entire planet-wide Semantic Web.
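To make that concrete, here's a minimal sketch (in Python with the
rdflib library; the 'shopA' vocabulary and URIs are invented for
illustration) of a lossy, after-the-fact mapping expressed in SPARQL:

  from rdflib import Graph

  g = Graph()
  g.parse(data="""
  @prefix shopA: <http://shop-a.example/terms#> .
  <http://shop-a.example/item/42> shopA:productName "Blue Teapot" .
  """, format="turtle")

  # A lossy, pragmatic mapping onto Dublin Core, written long after
  # the shopA schema was deployed, with no coordination between the
  # two schema authors.
  mapped = g.query("""
      PREFIX shopA: <http://shop-a.example/terms#>
      PREFIX dc: <http://purl.org/dc/elements/1.1/>
      CONSTRUCT { ?item dc:title ?name }
      WHERE { ?item shopA:productName ?name }
  """)
  for triple in mapped:
      print(triple)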
> - one-way links, requiring no cooperation, approval from the link target
This corresponds to RDF's claim-based design, where anything that can be
read as RDF is free to encode claims about anything else. eg. (for
better or worse) I can talk about you in my FOAF file whether you like
it or not.
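A rough sketch of what that looks like in practice (Python/rdflib
again; both URIs are made up), publishing one-way claims about someone
else with no approval step anywhere:

  from rdflib import Graph, Literal, Namespace, URIRef

  FOAF = Namespace("http://xmlns.com/foaf/0.1/")
  g = Graph()

  me = URIRef("http://example.com/my-foaf.rdf#me")   # my file
  you = URIRef("http://example.org/someone#me")      # someone else

  # One-way claims; nothing below asks the other party's approval.
  g.add((me, FOAF.knows, you))
  g.add((you, FOAF.name, Literal("Maciej")))

  print(g.serialize(format="turtle"))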
> - a simple protocol (HTTP) and markup format (HTTP*) that anyone could
> adapt and copy
(I assume you meant HTML for the latter (*))
RDF/SemWeb uses HTTP heavily too (but doesn't require it; we can eg.
query SPARQL over the XMPP protocol). For formats, a system designed for
improved machine processing is by necessity going to be harder for
humans to create at the byte or character level. But there are various
efforts in play to ensure that we can get RDF views of as much data as
possible: by mapping from SQL (which humans have GUIs for, Web-based and
desktop); from well-formed or annotated HTML (GRDDL/RDFa); from wikis,
etc. Semantic Web people are pragmatists, and will pull data in from
wherever it can be found...
> - no established competitors serving the same need
Depending on the level of analysis, Gopher was an early competitor; however
the Web was a unifying abstraction that embraced gopher, ftp, telnet etc
as components of our information universe; it embraces RDF too.
> - significant commercial interest in selling more PCs, online
> services, net access, etc.
It's the single same Web... if RDF can drive traffic to commercial sites
[yes, a work in progress] then the same business benefits can kick in.
> - no critical mass required to make the Web interesting and useful
I don't see a fundamental difference here. RDF could be used on a single
site quite happily, eg. to provide faceted browse into a collection of
things. For example,
http://www.w3.org/2001/sw/Europe/showcase/sem-portal.html
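For a flavour of what faceted browse over a single site's RDF involves
(a toy sketch in Python/rdflib; the site data is invented), each
property can serve as a facet for narrowing the collection:

  from rdflib import Graph

  g = Graph()
  g.parse(data="""
  @prefix dc: <http://purl.org/dc/elements/1.1/> .
  <http://site.example/doc/1> dc:subject "travel" ; dc:creator "alice" .
  <http://site.example/doc/2> dc:subject "travel" ; dc:creator "bob" .
  <http://site.example/doc/3> dc:subject "food" ; dc:creator "alice" .
  """, format="turtle")

  # Each property doubles as a facet; combining facet values narrows
  # the collection, all within one site's data.
  q = """
      PREFIX dc: <http://purl.org/dc/elements/1.1/>
      SELECT ?doc WHERE { ?doc dc:subject "travel" ; dc:creator "alice" }
  """
  for row in g.query(q):
      print(row.doc)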
The Semantic Web isn't a new replacement Web; it's a project, part of
the wider Web project. You can poke around in its origins, eg see -
http://www.w3.org/Talks/WWW94Tim/ or
http://www.w3.org/1999/11/11-WWWProposal/thenandnow
But yes [see below], RDF is at its best when cross-domain data is being
merged; and more data makes this case more compelling than having only a
few files would. If the Web were only a single page, we'd probably search it
with 'grep' rather than Google's server farm, after all.
As to your critical points:
> - requires a measure of centralization in order to make sense of
> schemas, i.e. the semantics cannot be built in to every client as the
> semantics of HTML and HTTP were built in to browsers
RDF is an exercise in 'agreeing to disagree'. By buying into a common
data model (the nodes-and-arcs thing), decentralised parties can invent
whatever classes and properties and URIs they like, benefiting from
shared infrastructure (APIs, data stores, query languages) that are
utterly domain neutral. Furthermore (and by contrast to typical XML
usage) the descriptive vocabularies created by these different parties
can be freely combined *without* prior or centralised agreement by those
parties.
For example, look at
http://search.cpan.org/src/ASCOPE/Net-Flickr-Backup-2.6/README and the
list of namespaces used. I'll repeat them here:
<rdf:RDF
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
xmlns:a="http://www.w3.org/2000/10/annotation-ns"
xmlns:acl="http://www.w3.org/2001/02/acls#"
xmlns:exif="http://nwalsh.com/rdf/exif#"
xmlns:skos="http://www.w3.org/2004/02/skos/core#"
xmlns:cc="http://web.resource.org/cc/"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:exifi="http://nwalsh.com/rdf/exif-intrinsic#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:flickr="x-urn:flickr:"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:i="http://www.w3.org/2004/02/image-regions#">
Now, OK, some familiar faces show up when you look behind the scenes at
who created those schemas (well, we're a small but growing community!).
However, the development of these schemas did not *need* pairwise or
central coordination, and the author of the Perl Net::Flickr::Backup
module (Aaron Straup Cope) absolutely did not need anyone's permission to
recombine these schemas to create image descriptions which integrate
data by using them all.
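And the payoff: one query can reach across several of those
independently designed vocabularies at once. A small sketch
(Python/rdflib; the photo data is a made-up stand-in for real
Net::Flickr::Backup output):

  from rdflib import Graph

  g = Graph()
  g.parse(data="""
  @prefix dc:   <http://purl.org/dc/elements/1.1/> .
  @prefix geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#> .
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .

  <http://photos.example/99>
      dc:title "Sunset over the bay" ;
      geo:lat "37.81" ; geo:long "-122.48" ;
      foaf:maker [ foaf:name "Aaron" ] .
  """, format="turtle")

  # One query spanning three independently designed vocabularies,
  # with no prior agreement between their authors.
  q = """
      PREFIX dc:   <http://purl.org/dc/elements/1.1/>
      PREFIX geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#>
      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      SELECT ?title ?lat ?long ?name WHERE {
          ?photo dc:title ?title ; geo:lat ?lat ; geo:long ?long ;
                 foaf:maker ?who .
          ?who foaf:name ?name .
      }
  """
  for row in g.query(q):
      print(row.title, row.lat, row.long, row.name)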
I'll dwell on this point a bit longer, as it is a key one, and one at
risk of being lost in the social history of the Semantic Web. The push
for RDF came in good part from people who were sick of sitting in
metadata standardisation meetings, and of dealing with scoping
overlaps. The RDF
design is heavily decentralistic compared to some other approaches that
could have been taken.
In earlier RDFS drafts we made some of this heritage more explicit, see eg
http://www.w3.org/TR/2000/CR-rdf-schema-20000327/
[[
RDF and the RDF Schema language were also based on metadata research in
the Digital Library community. In particular, RDF adopts a modular
approach to metadata that can be considered an implementation of the
Warwick Framework [WF]. RDF represents an evolution of the Warwick
Framework model in that the Warwick Framework allowed each metadata
vocabulary to be represented in a different syntax. In RDF, all
vocabularies are expressed within a single well defined model. This
allows for a finer grained mixing of machine-processable vocabularies,
and addresses the need [EXTWEB] to create metadata in which statements
can draw upon multiple vocabularies that are managed in a decentralized
fashion by independent communities of expertise.
]]
The Warwick Framework was a conceptualisation of the metadata problem
space from the Dublin Core community in 1996; see
http://www.dlib.org/dlib/july96/lagoze/07lagoze.html ... it proposed
a way of breaking descriptive tasks down into scoped 'packages'.
Quoting from the 1996 dlib paper,
[[
The result of the Warwick Workshop is a container architecture, known
as the Warwick Framework. The framework is a mechanism for aggregating
logically, and perhaps physically, distinct packages of metadata. This
is a modularization of the metadata issue with a number of notable
characteristics.
* It allows the designers of individual metadata sets to focus on
their specific requirements, without concerns for generalization to
ultimately unbounded scope.
* It allows the syntax of metadata sets to vary in conformance with
semantic requirements, community practices, and functional (processing)
requirements for the kind of metadata in question.
* It separates management of and responsibility for specific
metadata sets among their respective "communities of expertise".
* It promotes interoperability by allowing tools and agents to
selectively access and manipulate individual packages and ignore others.
* It permits access to the different metadata sets that are related
to the same object to be separately controlled.
* It flexibly accommodates future metadata sets by not requiring
changes to existing sets or the programs that make use of them.
The separation of metadata sets into packages does not imply that
packages are completely semantically distinct. In fact, it is a feature
of the Warwick Framework that an individual container may hold packages,
each managed and maintained by distinct parties, which have complex
semantic overlap.
]]
In some ways RDF is a realisation of this abstract architecture. But with
RDF we really went further in exploring the issue of semantic overlap
amongst different metadata 'packages'. By imposing a common data model
across all metadata packages, we make it possible for apps to express
data and queries that combine, for example, rights metadata, geographic,
discovery, workflow, tagging or any other RDF-expressed characteristics.
In this conceptualisation, we are buying more decentralisability at the
expense of imposing a shared data model.
Regarding your point about decentralisation, I think RDF compares rather
well with XML. Anyone can invent an XML schema and deploy it; the
technology allows XML elements and attributes to be used in wildly
varying ways. In RDF's XML syntax(es), the notation is always an
encoding of a set of RDF claims about the world. We have a common set of
rules to help interpret this, making it easier rather than harder to
process and integrate data from unknown namespaces. If I see a new RDF
schema, I know that it defines classes, properties and not a lot else.
This lowers some costs (and raises some others, sure; nothing for free
here). RDF takes expressive power away from those that define schemas,
such that they all have a lot more in common. It's a shared pattern for
schema authors designed to allow them to get on with their job and not
have to fly to meetings with each other. By doing the data-modelling
agreement work once instead of pairwise, we save on a lot of airfares,
and a lot of teleconferences.
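A sketch of what that buys a consumer (Python/rdflib; the boats schema
is invented): completely domain-neutral code can enumerate whatever a
never-before-seen schema defines:

  from rdflib import RDF, RDFS, Graph

  # Stand-in for a never-before-seen schema fetched off the Web.
  g = Graph()
  g.parse(data="""
  @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix ex:   <http://example.org/boats#> .
  ex:Boat a rdfs:Class .
  ex:skipper a rdf:Property ; rdfs:comment "who sails it" .
  """, format="turtle")

  # Domain-neutral processing: whatever the subject matter, the schema
  # can only introduce classes and properties.
  for cls in g.subjects(RDF.type, RDFS.Class):
      print("class:", cls)
  for prop in g.subjects(RDF.type, RDF.Property):
      print("property:", prop, "-", g.value(prop, RDFS.comment))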
> - requires much more cooperation from data sources (e.g. link targets)
I suspect some confusion about 'link targets' here. In the classic Web, a
link target is the thing you're pointing to. In the Semantic Web
project, we can describe anything that the classic Web might link to;
and beyond that, we can use reference-by-description techniques to talk
about things indirectly, via their descriptions. No consent needed. I
can talk about 'the person whose homepage is http://john.example.com/'
for example.
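A small sketch of reference-by-description (Python/rdflib; the
homepage URI is from my example above): a blank node plus a
description, with no URI coined for the person at all:

  from rdflib import BNode, Graph, Literal, Namespace, URIRef

  FOAF = Namespace("http://xmlns.com/foaf/0.1/")
  g = Graph()

  # No URI for the person: just "whoever has this homepage",
  # described via a blank node.
  someone = BNode()
  g.add((someone, FOAF.homepage, URIRef("http://john.example.com/")))
  g.add((someone, FOAF.nick, Literal("john")))

  # Since the FOAF schema declares foaf:homepage inverse-functional,
  # any other description sharing this homepage can later be merged
  # ("smushed") with this one by whoever aggregates the data.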
> - is based on a complex markup (RDF) that's difficult for
> non-programmers to work with
Two flavours of complexity here:
1. Each RDF notation (RDF/XML, RDFa, ... custom GRDDL-ready formats) has
some (varying) difficulty associated with learning the encoding. And an
associated fragility risk: a single misunderstanding or error can mess
up an entire chunk of data. RDF notations have not
traditionally had any form of recovery from this, ie. nothing like the
'quirks mode' that Web browsers have, where bad HTML is still somehow
converted into a user-facing document.
2. Merely having a distinction between abstract data model vs markup(s)
is a level of indirection that can be confusing, especially without
fantastic tool support, tutorial materials etc.
These are real issues. But HTML itself is also difficult for
non-programmers to work with *well*. Which is why so many sites don't
give a reliable cross-browser experience (people code for IE; as a
MacOSX Firefox user I suffer often enough when visiting bad HTML sites).
Perhaps the difference here is that crappy HTML coding leads to
sometimes-crappy Web experience; crappy RDF coding leads to ... no data
at all from that document. The use of RDFa in an HTML5 context is where
this part of the discussion goes next: it should be possible to mix
semantic markups into environments where non-draconian error handling is
the rule. The microformats folk do this, for example. We're all still
figuring out exactly what the tradeoffs are here: how much mess to allow
before things become too scruffy for our poor machines to have any idea
what's happening?
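To illustrate the draconian side (Python/rdflib; a deliberately
truncated document): today's RDF toolkits give you all of the data or
none of it:

  from rdflib import Graph

  # Deliberately truncated RDF/XML: the analogue of sloppy HTML.
  bad = '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'

  g = Graph()
  try:
      g.parse(data=bad, format="xml")
  except Exception as err:
      print("parse failed:", err)

  print(len(g), "triples recovered")  # 0 -- no quirks mode here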
> - has to compete with its predecessor and many other technologies
I view this as a misunderstanding. It may be cleanest to think of the
"Semantic Web" simply as a project. When http://www.w3.org/2001/sw/ says
"The Semantic Web is a Web of data" it isn't talking about any other Web
but the one we know and love. Read it as 'The-Web-made-more-Semantic is
a Web of data', perhaps.
> - has very little commercial interest, unclear revenue model, etc.
There may be no 'make money fast' route akin to the crazy dot-com days,
but I see here more a 'chicken and egg' issue (which you allude to
above). While RDF can be used on a single site, unless there is a lot of
it around, nobody's going to bother building a planet-scale index of it.
And unless there's a planet-scale index and it's being used by major
search engines, people won't have an incentive to publish a lot of RDF
in the public Web. If things turn out well, publishing RDF should help
drive users to classic Web sites, where they'll be parted from their
money through various essentially timeless techniques. Some things
change; some stay the same.
Re chicken-and-egg: I think we've done a bit to break that cycle in
the FOAF scene. In recent months Google's Social Graph API has been
indexing much of the public FOAF data out there, and more recently still
has been using a real RDF parser. While this isn't currently integrated
into Google's main user-facing search, I am very encouraged by these
developments and by those at Yahoo around RDF/RDFa. It has taken a while
but we're getting there.
My other thought re "critical mass" has been that SemWeb adoption is
difficult because, as a fundamentally cross-domain technology, it only
really shows its strong benefits, ie. its key strengths, when
used in several overlapping fields simultaneously. And as a
representation system where data can always be missing, and always be
extended/augmented, it can take a lot of data before there is enough to
reliably index in certain ways. My answer to this (besides FOAF) is to
suggest that SemWeb may perhaps take off in a few large cities first.
Geographical proximity could allow critical mass of early-adopter data
even without things going RDF crazy planet-wide. Some of us put in an EU
project proposal on this a few years back, but the EU reviewers in their
infinite wisdom chose not to fund it. Ah well :)
> - requires a critical mass of participating sites to be interesting
> and useful
As I say above, having a mass of data isn't essential, although nice of
course. And the work can be distributed: while the Semantic MediaWiki
folk are showing how built-in RDF facilities could add value to
MediaWiki and Wikipedia, the DBPedia team are already showing an
externally generated RDF version of Wikipedia. Participation, too, is
helpful but not required; 3rd parties can write GRDDL transforms for
XML formats, or D2RQ etc adaptors for existing SQL datasets. There are a
lot of scraper/extractor/converter scripts around, and a few lines of
code can create a huge amount of data.
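For instance, a sketch of such a converter (Python/rdflib; the book
data is a stand-in for any real dataset), turning tabular records into
triples in a dozen lines:

  import csv, io
  from rdflib import Graph, Literal, Namespace, URIRef

  DC = Namespace("http://purl.org/dc/elements/1.1/")

  # Stand-in for any existing tabular dataset (SQL dump, spreadsheet).
  rows = io.StringIO("id,title,author\n1,Weaving the Web,Tim Berners-Lee\n")

  g = Graph()
  for row in csv.DictReader(rows):
      book = URIRef("http://books.example/id/" + row["id"])
      g.add((book, DC.title, Literal(row["title"])))
      g.add((book, DC.creator, Literal(row["author"])))

  print(g.serialize(format="xml"))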
These are good kinds of questions to ask, but I think they're all
somehow a bit skewed by thinking of SW as a replacement for the Web, or
as a rival to
existing search engines. It may be that some fancy new search engine
comes along that is fundamentally RDF-oriented, but it's also clear that
there are many folk at the existing search engines who are well aware
that the Web is slowly offering more by way of structured (meta)data.
cheers,
Dan
--
http://danbri.org/