- From: Jeen Broekstra <jeen@aduna.biz>
- Date: Wed, 14 Sep 2005 10:12:01 +0200
- To: Mailing Lists <list@thirdstation.com>
- CC: semantic-web@w3.org
Mailing Lists wrote:
> Does anyone on the list have some real-world stories to share about
> using RDF and its tools as a backend technology? The company I work for
> maintains a database of metadata. I'd like to explore using RDF instead
> of our current schemas.

Aduna (http://aduna.biz/) uses the Sesame RDF store (http://www.openrdf.org/) as backend technology in its products: AutoFocus, Metadata Server and Spectacle. These are all tools for navigating/browsing/searching/visualizing large amounts of information (typically things like corporate intranets or databases).

> For example: I have a lot of data about books. I'd like to translate
> the data into RDF/XML and dump it into an RDF database. Then, taking a
> particular book, I'd like to query the database to extract related
> information like: other books by the same author, other books with the
> same subject code, etc.
>
> My concerns relate to:
> 1) Performance -- Right now we query the database using SQL. Sometimes
> it is _very_ slow. That's mainly because the data is distributed across
> tables and there are a lot of joins going on. It seems like using RDF
> would allow us to use simple queries.

Partly true. However, the performance you get with a store depends on its backend implementation, and different query patterns can greatly influence performance.

For example, Sesame has three different backends: in-memory (with file-dump for persistence), native on-disk storage, and RDBMS storage (MySQL/PostgreSQL/Oracle). Each has its own strong points and limitations.

The in-memory store is the fastest but of course has limits to its scalability (we observed that it uses roughly 170 bytes of RAM per triple on a 64-bit architecture). The native store is the next-fastest but is currently also limited in scalability, mainly because we have not yet had enough time to further develop the indexing strategy.
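As a rough back-of-the-envelope illustration of the in-memory figure above, the sketch below multiplies the observed 170 bytes per triple by the 10-20 million triples estimated in the question. The function name and the choice of decimal gigabytes are assumptions made for this example only; real memory use will vary with the data, the JVM and the Sesame version.

```python
# Rough RAM estimate for an in-memory triple store.
# 170 bytes/triple is the figure observed for Sesame's in-memory store on a
# 64-bit architecture; treat it as an approximation, not a guarantee.
BYTES_PER_TRIPLE = 170


def estimate_ram_bytes(triple_count: int,
                       bytes_per_triple: int = BYTES_PER_TRIPLE) -> int:
    """Return the approximate number of bytes needed for triple_count triples."""
    return triple_count * bytes_per_triple


def to_gigabytes(byte_count: int) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return byte_count / 10**9


if __name__ == "__main__":
    # The 10-20 million triples estimated in the original question.
    for count in (10_000_000, 20_000_000):
        gigabytes = to_gigabytes(estimate_ram_bytes(count))
        print(f"{count:>12,} triples: about {gigabytes:.1f} GB")
```

Under these assumptions the estimate comes to about 1.7 GB for 10 million triples and about 3.4 GB for 20 million, which is why the in-memory store has limits to its scalability.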
The native store's limited indexing also means that for some types of query patterns it is very fast, but for others (for which there is no index) it can be deadly slow. The RDBMS backend is typically the most scalable but also the slowest in both adding and querying: queries with long path expressions tend to be translated to multi-join SQL queries, which are quite expensive. Of course, a lot here depends on the configuration of your RDBMS.

> 2) Scalability -- Our triplestore would be HUGE. I'd estimate 10-20
> Million triples. Is that small or large in RDF circles?

It's a lot, but I'd say most serious triple stores can handle this. Sesame certainly can, and I expect Kowari can as well.

> 3) Productivity -- It's usually easier for me to envision creating RDF
> from our source data than massaging the data to fit into our database
> schema. The same goes for when I'm extracting data - it seems like it
> would be much easier to express my query as a triple using wildcards for
> the data I want.

I think you've hit on one of the main benefits of RDF, in fact one of the main reasons our company chose to use RDF as the backend technology for its tooling: flexibility. The open model allows you to adapt the representation of the data flexibly, and you can conceivably map any kind of source to an RDF graph relatively easily.

> Any information will be helpful. I'm interested in learning from other
> peoples' experiences.

It takes time to get this right. Most triple stores are new and therefore not yet robust enough for everything to work out of the box in your situation. But with some patience and a willingness to make an initial investment in setting up your architecture right, you can gain a lot in terms of flexibility later on, IMHO. What we have seen in our own products (we switched from an internal dedicated storage format to the more generic RDF framework) is a sharp initial drop in the tools' performance, followed by a gradual recovery as we understood the problems better and fixed them one by one.
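The idea of expressing a query as a triple with wildcards, applied to the book example from the question, might look roughly like the sketch below. It is plain Python over a small in-memory list and does not use Sesame's API or query language; the identifiers (`book:1`, `ex:author`, `ex:subject`) and the helper names are hypothetical, made up for illustration.

```python
from typing import Iterable, Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical sample data: all identifiers and predicates are invented.
TRIPLES = [
    ("book:1", "ex:author", "author:smith"),
    ("book:1", "ex:subject", "subject:FIC"),
    ("book:2", "ex:author", "author:smith"),
    ("book:2", "ex:subject", "subject:HIS"),
    ("book:3", "ex:author", "author:jones"),
    ("book:3", "ex:subject", "subject:FIC"),
]


def match(triples: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple matching the pattern; None acts as a wildcard.

    This is a linear scan with no index, so its cost grows with the size
    of the data regardless of the pattern.
    """
    for triple in triples:
        if ((s is None or triple[0] == s)
                and (p is None or triple[1] == p)
                and (o is None or triple[2] == o)):
            yield triple


def related_books(triples: Iterable[Triple], book: str, predicate: str) -> Set[str]:
    """Find other books sharing a value for predicate with the given book."""
    triples = list(triples)  # allow two passes over the data
    related = set()
    # Step 1: (book, predicate, ?) -> the values attached to this book.
    for _, _, value in match(triples, s=book, p=predicate):
        # Step 2: (?, predicate, value) -> every book with the same value.
        for other, _, _ in match(triples, p=predicate, o=value):
            if other != book:
                related.add(other)
    return related


if __name__ == "__main__":
    print(related_books(TRIPLES, "book:1", "ex:author"))
    print(related_books(TRIPLES, "book:1", "ex:subject"))
```

Each lookup here scans every triple. A real store answers such patterns from its indexes, which is why a pattern that happens to have no matching index can be so much slower than one that does.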
I'd say that if your primary concern is query performance, then perhaps a triple store is not the way to go. But if flexibility, ease of adaptation, ease of interoperation with other tools, etc. also play a significant role, it may well be worth the cost.

HTH,

Jeen
Received on Wednesday, 14 September 2005 08:12:46 UTC