- From: Dan Brickley <danbri@danbri.org>
- Date: Mon, 28 Apr 2008 16:39:29 +0100
- To: Maciej Gawinecki <mgawinecki@gmail.com>
- Cc: semantic-web@w3.org
Maciej Gawinecki wrote:
> Thank you for your help,

A suggestion/question: how might your analysis differ if you took the perspective that there is only one Web, ... the World Wide one; and that 'Semantic Web' is the name of a (world-wide) project to help improve it, just as the 'Mobile Web' initiative aims to progress the state of the Web art relating to use from mobile devices? If you talk about the Semantic Web as a new replacement Web, you're bound to be disappointed. If you think of it as a collaborative project, hopefully you'll find a way to get involved and help with it.

Noun phrases can mislead us sometimes. A phrase such as '[the] Semantic Web' (or 'Mobile ...') can have the unfortunate side-effect of slipping us into thinking that there are a countable number of "Webs". And then we look around and see that these apparently-new "Webs" look like peas in orbit around the Jupiter of the "classic Web". If we focus instead on the notion of there being just one Web, we can still ask why the proportion of it with an RDF representation is relatively tiny. But we don't take the absence of a new all-replacing 'thing' as a measure of failure.

Thinking about RDF:

> - decentralization, no central database of content and links

RDF also has this characteristic. The Web itself is our distributed database of schemas. We're all free to use shared schemas, or our own application-specific schemas. And by using Web identifiers for our descriptive terms, we set things up so that mappings (often lossy, pragmatic mappings) can be created days, months or years later, either in procedural code or using technologies like SPARQL, OWL or RIF. The important thing is that these agreements and mappings can be documented later, if at all. People can get on with their immediate business without asking for permission or forgiveness. There is more bottlenecking and centralisation in a traditional 'enterprise' SQL-based environment than in the entire planet-wide Semantic Web.
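To make that "mappings created later" idea concrete, here's a minimal sketch in Python with the rdflib library (the two shop vocabularies and item URIs are invented for illustration): two sites coin their own 'title' properties independently, and a third party bridges them afterwards with a SPARQL CONSTRUCT, no meeting required.

  from rdflib import Graph

  g = Graph()
  # Two datasets, created independently, each with its own ad-hoc
  # 'title' property (both vocabulary URIs invented for this sketch).
  g.parse(data="""
  @prefix shopA: <http://shop-a.example.org/terms/> .
  @prefix shopB: <http://shop-b.example.net/vocab#> .
  <http://shop-a.example.org/item/1> shopA:title "Blue Teapot" .
  <http://shop-b.example.net/item/9> shopB:productName "Red Kettle" .
  """, format="turtle")

  # The pragmatic mapping, documented after the fact by whoever
  # needs it: both properties are read as Dublin Core titles.
  merged = g.query("""
  PREFIX shopA: <http://shop-a.example.org/terms/>
  PREFIX shopB: <http://shop-b.example.net/vocab#>
  PREFIX dc:    <http://purl.org/dc/elements/1.1/>
  CONSTRUCT { ?item dc:title ?label }
  WHERE { { ?item shopA:title ?label } UNION { ?item shopB:productName ?label } }
  """)
  for s, p, o in merged:
      print(s, p, o)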
> - one-way links, requiring no cooperation, approval from the link target

This corresponds to RDF's claim-based design, where anything that can be read as RDF is free to encode claims about anything else. eg. (for better or worse) I can talk about you in my FOAF file whether you like it or not.

> - a simple protocol (HTTP) and markup format (HTTP*) that anyone could adapt and copy

(assume you meant HTML for the latter (*))

RDF/SemWeb uses HTTP heavily too (but doesn't require it; we can eg. query SPARQL over the XMPP protocol). For formats, a system designed for improved machine processing is by necessity going to be harder for humans to create at the byte or character level. But there are various efforts in play to ensure that we can get RDF views of as much data as possible: by mapping from SQL (which humans have GUIs for, Web-based and desktop); from wellformed or annotated HTML (GRDDL/RDFa); from wikis; etc. Semantic Web people are pragmatists, and will pull data in from wherever it can be found...

> - no established competitors serving the same need

Depending on the level of analysis, Gopher was an early competitor; however the Web was a unifying abstraction that embraced gopher, ftp, telnet etc. as components of our information universe; it embraces RDF too.

> - significant commercial interest in selling more PCs, online services, net access, etc.

It's the single same Web... if RDF can drive traffic to commercial sites [yes, a work in progress] then the same business benefits can kick in.

> - no critical mass required to make the Web interesting and useful

I don't see a fundamental difference here. RDF could be used on a single site quite happily, eg. to provide faceted browse into a collection of things. For example, http://www.w3.org/2001/sw/Europe/showcase/sem-portal.html

The Semantic Web isn't a new replacement Web; it's a project, part of the wider Web project. You can poke around in its origins, eg. see http://www.w3.org/Talks/WWW94Tim/ or http://www.w3.org/1999/11/11-WWWProposal/thenandnow

But yes [see below], RDF is at its best when cross-domain data is being merged; and more data makes this case more compelling than only having a few files. If the Web were a single page only, we'd probably search it with 'grep' rather than Google's server farm, after all.

As to your critical points:

> - requires a measure of centralization in order to make sense of schemas, i.e. the semantics cannot be built in to every client as the semantics of HTML and HTTP were built in to browsers

RDF is an exercise in 'agreeing to disagree'. By buying into a common data model (the nodes-and-arcs thing), decentralised parties can invent whatever classes, properties and URIs they like, benefiting from shared infrastructure (APIs, data stores, query languages) that is utterly domain-neutral. Furthermore (and by contrast to typical XML usage) the descriptive vocabularies created by these different parties can be freely combined *without* prior or centralised agreement by those parties.

For example, look at http://search.cpan.org/src/ASCOPE/Net-Flickr-Backup-2.6/README and the list of namespaces used. I'll repeat them here:

<rdf:RDF
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
 xmlns:a="http://www.w3.org/2000/10/annotation-ns"
 xmlns:acl="http://www.w3.org/2001/02/acls#"
 xmlns:exif="http://nwalsh.com/rdf/exif#"
 xmlns:skos="http://www.w3.org/2004/02/skos/core#"
 xmlns:cc="http://web.resource.org/cc/"
 xmlns:foaf="http://xmlns.com/foaf/0.1/"
 xmlns:exifi="http://nwalsh.com/rdf/exif-intrinsic#"
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
 xmlns:flickr="x-urn:flickr:"
 xmlns:dcterms="http://purl.org/dc/terms/"
 xmlns:i="http://www.w3.org/2004/02/image-regions#">

Now, OK, some familiar faces show up when you look behind the scenes at who created those schemas (well, we're a small but growing community!). However the development of these schemas did not *need* pairwise or central coordination, and the author of the Perl Net::Flickr::Backup module (Aaron Straup Cope) absolutely did not need anyone's permission to recombine these schemas to create image descriptions which integrate data by using them all.
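As a smaller-scale sketch of that same freedom (Python/rdflib again; the photo URI and the name are made up, but the namespaces are real ones from the list above): one description drawing on Dublin Core, FOAF and the W3C geo vocabulary at once, with no coordination among those three communities.

  from rdflib import Graph

  photo = Graph()
  # One photo description, mixing three independently-managed
  # vocabularies (photo URI and person name invented for this sketch).
  photo.parse(data="""
  @prefix dc:   <http://purl.org/dc/elements/1.1/> .
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
  @prefix geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#> .

  <http://photos.example.org/img/42>
      dc:title   "Sunset over the harbour" ;
      dc:creator [ foaf:name "Jane Example" ] ;
      geo:lat    "51.48" ;
      geo:long   "-2.62" .
  """, format="turtle")

  # A single query can range over all three vocabularies at once.
  for title, who, lat in photo.query("""
  PREFIX dc:   <http://purl.org/dc/elements/1.1/>
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>
  PREFIX geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#>
  SELECT ?title ?who ?lat WHERE {
      ?img dc:title ?title ;
           dc:creator [ foaf:name ?who ] ;
           geo:lat ?lat .
  }"""):
      print(title, who, lat)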
I'll dwell on this point a bit longer, as it is a key one, and at risk of being lost in the social history of the Semantic Web. The push for RDF came in good part from people who were sick of sitting in metadata standardisation meetings, and of dealing with scoping overlaps. The RDF design is heavily decentralistic compared to some other approaches that could have been taken. In earlier RDFS drafts we made some of this heritage more explicit; see eg. http://www.w3.org/TR/2000/CR-rdf-schema-20000327/

[[
RDF and the RDF Schema language were also based on metadata research in the Digital Library community. In particular, RDF adopts a modular approach to metadata that can be considered an implementation of the Warwick Framework [WF]. RDF represents an evolution of the Warwick Framework model in that the Warwick Framework allowed each metadata vocabulary to be represented in a different syntax. In RDF, all vocabularies are expressed within a single well defined model. This allows for a finer grained mixing of machine-processable vocabularies, and addresses the need [EXTWEB] to create metadata in which statements can draw upon multiple vocabularies that are managed in a decentralized fashion by independent communities of expertise.
]]

The Warwick Framework was a conceptualisation of the metadata problem space from the Dublin Core community in 1996; see http://www.dlib.org/dlib/july96/lagoze/07lagoze.html ... it proposed a way of breaking descriptive tasks down into scoped 'packages'. Quoting from the 1996 D-Lib paper:

[[
The result of the Warwick Workshop is a container architecture, known as the Warwick Framework. The framework is a mechanism for aggregating logically, and perhaps physically, distinct packages of metadata. This is a modularization of the metadata issue with a number of notable characteristics.

* It allows the designers of individual metadata sets to focus on their specific requirements, without concerns for generalization to ultimately unbounded scope.
* It allows the syntax of metadata sets to vary in conformance with semantic requirements, community practices, and functional (processing) requirements for the kind of metadata in question.
* It separates management of and responsibility for specific metadata sets among their respective "communities of expertise".
* It promotes interoperability by allowing tools and agents to selectively access and manipulate individual packages and ignore others.
* It permits access to the different metadata sets that are related to the same object to be separately controlled.
* It flexibly accommodates future metadata sets by not requiring changes to existing sets or the programs that make use of them.

The separation of metadata sets into packages does not imply that packages are completely semantically distinct. In fact, it is a feature of the Warwick Framework that an individual container may hold packages, each managed and maintained by distinct parties, which have complex semantic overlap.
]]

In some ways RDF is a realisation of this abstract architecture. But with RDF we really went further in exploring the issue of semantic overlap amongst different metadata 'packages'. By imposing a common data model across all metadata packages, we make it possible for apps to express data and queries that combine, for example, rights metadata, geographic, discovery, workflow, tagging or any other RDF-expressed characteristics. In this conceptualisation, we are buying more decentralisability at the expense of imposing a shared data model.

Regarding your point about decentralisation, I think RDF compares rather well with XML. Anyone can invent an XML schema and deploy it; the technology allows XML elements and attributes to be used in wildly varying ways. In RDF's XML syntax(es), the notation is always an encoding of a set of RDF claims about the world. We have a common set of rules to help interpret this, making it easier rather than harder to process and integrate data from unknown namespaces. If I see a new RDF schema, I know that it defines classes and properties, and not a lot else. This lowers some costs (and raises some others, sure; nothing for free here). RDF takes expressive power away from those who define schemas, such that they all have a lot more in common.
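A rough sketch of what that buys generic tooling (Python/rdflib; it fetches the FOAF schema over the network, so treat it as illustrative rather than guaranteed to run unchanged): code that has never seen a vocabulary before can still discover its classes and properties, because every RDFS schema is itself just RDF.

  from rdflib import Graph
  from rdflib.namespace import OWL, RDF, RDFS

  schema = Graph()
  # Fetch a schema we (pretend we) have never seen; FOAF is just an
  # example, and any published RDFS/OWL vocabulary would do.
  schema.parse("http://xmlns.com/foaf/0.1/", format="xml")

  classes = set(schema.subjects(RDF.type, RDFS.Class)) | \
            set(schema.subjects(RDF.type, OWL.Class))
  properties = set(schema.subjects(RDF.type, RDF.Property))

  # Classes and properties: essentially all an RDF schema declares.
  print(len(classes), "classes,", len(properties), "properties")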
It's a shared pattern for schema authors, designed to let them get on with their job and not have to fly to meetings with each other. By doing that data-modelling work once, instead of pairwise, we save on a lot of airfares, and a lot of teleconferences.

> - requires much more cooperation from data sources (e.g. link targets)

I suspect some confusion about 'link targets' here. In the classic Web, a link target is the thing you're pointing to. In the Semantic Web project, we can describe anything that the classic Web might link to; and beyond that, we can use reference-by-description techniques to talk about things indirectly, via their descriptions. No consent needed. I can talk about 'the person whose homepage is http://john.example.com/', for example.
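A minimal sketch of that reference-by-description idiom (Python/rdflib; John's homepage URL is the example from above, the rest is invented): a blank node stands in for the person, picked out purely by description.

  from rdflib import Graph

  claims = Graph()
  claims.parse(data="""
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .

  # No URI is minted for John: the blank node is identified purely
  # by description. foaf:homepage is declared inverse-functional in
  # FOAF, so later processing can merge this node with other
  # descriptions of the same person.
  [] a foaf:Person ;
     foaf:homepage <http://john.example.com/> ;
     foaf:knows [ a foaf:Person ; foaf:name "Dan" ] .
  """, format="turtle")

  print(claims.serialize(format="nt"))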
> - is based on a complex markup (RDF) that's difficult for non-programmers to work with

There are two flavours of complexity here:

1. Each RDF notation (RDF/XML, RDFa, ... custom GRDDL-ready formats) has some (varying) difficulty associated with learning the encoding. And an associated fragility risk: a single misunderstanding or error can mess up the entire chunk of data. RDF notations have not traditionally had any form of recovery from this, ie. nothing like the 'quirks mode' that Web browsers have, where bad HTML is still somehow converted into a user-facing document.

2. Merely having a distinction between an abstract data model and its markup(s) is a level of indirection that can be confusing, especially without fantastic tool support, tutorial materials etc.

These are real issues. But HTML itself is also difficult for non-programmers to work with *well*. Which is why so many sites don't give a reliable cross-browser experience (people code for IE; as a MacOSX Firefox user I suffer often enough when visiting bad HTML sites). Perhaps the difference here is that crappy HTML coding leads to a sometimes-crappy Web experience; crappy RDF coding leads to ... no data at all from that document. The use of RDFa in an HTML5 context is where this part of the discussion goes next: it should be possible to mix semantic markup into environments where non-draconian error handling is the rule. The microformats folk do this, for example. We're all still figuring out exactly what the tradeoffs are here: how much mess to allow before things become too scruffy for our poor machines to have any idea what's happening?

> - has to compete with its predecessor and many other technologies

I view this as a misunderstanding. It may be cleanest to think of the "Semantic Web" simply as a project. When http://www.w3.org/2001/sw/ says "The Semantic Web is a Web of data", it isn't talking about any other Web but the one we know and love. Read it as 'the-Web-made-more-semantic is a Web of data', perhaps.

> - has very little commercial interest, unclear revenue model, etc.

There may be no 'make money fast' route akin to the crazy dot-com days, but I see here more of a 'chicken and egg' issue (which you allude to above). While RDF can be used on a single site, unless there is a lot of it around, nobody's going to bother building a planet-scale index of it. And unless there's a planet-scale index and it's being used by major search engines, people won't have an incentive to publish a lot of RDF in the public Web. If things turn out well, publishing RDF should help drive users to classic Web sites, where they'll be parted from their money through various essentially timeless techniques. Some things change; some stay the same.

Re chicken-and-egg: I think we've done a bit to break that cycle in the FOAF scene. In recent months Google's Social Graph API has been indexing much of the public FOAF data out there, and more recently still it has been using a real RDF parser. While this isn't currently integrated into Google's main user-facing search, I am very encouraged by these developments, and by those at Yahoo around RDF/RDFa. It has taken a while, but we're getting there.

My other thought re "critical mass" has been that SemWeb adoption is difficult because, as a fundamentally cross-domain technology, we only really show strong benefits, ie. the technology's key strengths, when it is used in several overlapping fields simultaneously. And as a representation system where data can always be missing, and always be extended/augmented, it can take a lot of data before there is enough to reliably index in certain ways. My answer to this (besides FOAF) is to suggest that SemWeb may perhaps take off in a few large cities first. Geographical proximity could allow a critical mass of early-adopter data even without things going RDF-crazy planet-wide. Some of us put in an EU project proposal on this a few years back, but the EU reviewers in their infinite wisdom chose not to fund it. Ah well :)

> - requires a critical mass of participating sites to be interesting and useful

As I say above, having a mass of data isn't essential, although it is nice of course. And the work can be distributed: while the Semantic MediaWiki folk are showing how built-in RDF facilities could add value to MediaWiki and Wikipedia, the DBpedia team are already showing an externally generated RDF version of Wikipedia. Active participation by sites is nice but not required: 3rd parties can write GRDDL transforms for XML formats, or D2RQ etc. adaptors for existing SQL datasets. There are a lot of scraper/extractor/converter scripts around, and a few lines of code can create a huge amount of data.

These are good kinds of questions to ask, but I think they're all somewhat skewed by thinking of the SW as a replacement for the Web, or as a rival for existing search engines. It may be that some fancy new search engine comes along that is fundamentally RDF-oriented, but it's also clear that there are many folk at the existing search engines who are well aware that the Web is slowly offering more by way of structured (meta)data.

cheers,

Dan

--
http://danbri.org/