- From: Frank Carvalho <dko4342@vip.cybercity.dk>
- Date: Wed, 22 Aug 2007 15:22:37 -0700 (PDT)
- To: semantic-web@w3.org
Hi, and thanks already for some very good and relevant answers.

Richard Newman wrote:

> I would very much suggest using a dedicated RDF store (any one would do),
> rather than storing the XML serialization of the RDF graph in an XML
> database. You will gain the ability to run queries against the graph,
> rather than just one of its possible tree serializations, and your
> scalability problem goes away (for a while, at least).

Well, I don't really understand whether there is any theoretical difference between querying the XML serialization and the graph itself, if the serialization is in fact a representation of the graph. What do you mean by "tree serialization", by the way? The only serialization I work with is a large set of triples. I do reckon, though, that a dedicated store may well be a lot more efficient than a general-purpose XML database.

> cwm is not really designed for large-scale storage.

No, I kind of suspected that from its behaviour. It's really a shame. It was described as a sort of RDF Swiss army knife, and on small graphs it seems able to merge graphs nicely. But when I started to load large graphs, it came up with odd errors.

> Take a look at this list of alternative systems on the ESW Wiki:
> <http://esw.w3.org/topic/SemanticWebTools#head-805c63479c854babe4657d5184de605910f6d3e2>
>
> If you're dealing with large graphs (>100M triples), you might find this
> list useful.
>
> <http://esw.w3.org/topic/LargeTripleStores>

Very helpful, thank you. I will take a look at those. Eventually I suspect we will be using very large graphs. The current ones are perhaps up to 20M triples, but given all the tasks we plan to use the graphs for, we are likely to increase that number significantly.

> If you need to do reasoning on large graphs, your choices are more
> limited, and the kind of reasoning you want to use might dictate your
> solution. (I won't reveal any biases on a public forum :D)

In fact we don't need reasoning so much yet.
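To make the "graph vs. serialization" distinction concrete, here is a minimal sketch (my own, not any particular store's API): an RDF graph modelled as a plain set of (subject, predicate, object) tuples. A pattern query against the set gives the same answer no matter which of the many possible serializations the triples came from. The URIs are made up for illustration.

```python
# A graph as a set of triples; querying it is independent of any
# tree-shaped serialization that may have produced it.
EX = "http://example.org/"  # hypothetical namespace

graph = {
    (EX + "OrderService",   EX + "dependsOn", EX + "CustomerDB"),
    (EX + "OrderService",   EX + "exposes",   EX + "placeOrder"),
    (EX + "BillingService", EX + "dependsOn", EX + "CustomerDB"),
}

def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {
        (ts, tp, to)
        for (ts, tp, to) in graph
        if (s is None or ts == s)
        and (p is None or tp == p)
        and (o is None or to == o)
    }

# Which subjects depend on CustomerDB?
dependents = {s for (s, _, _) in
              match(graph, p=EX + "dependsOn", o=EX + "CustomerDB")}
```

A query over an RDF/XML document, by contrast, would have to anticipate every nesting and abbreviation the serializer might have chosen.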
It is the "resource description" aspect that currently matters most to us. We need to be able to do a lot of forward and backward chaining, but if I am not mistaken, that is not really the same as reasoning. I do expect to assign proper OWL interpretations to the UML class diagrams - and probably to the contents of the entire modelling tool we are using - some day, but as I don't really see how I can explain any specific benefits of doing so to my organization, that idea has a low priority right now. (I drive this project by visible benefits.)

Brian McBride also wrote, and thank you too, Brian, for your inspiring answers. Comments to the comments:

>> Second we are facing a challenge of controlling our
>> suppliers, rather than being controlled by them.
>
> I'm wondering what you mean by control there. It is well known that if
> a customer invests heavily in implementing systems that depend on the
> characteristics of system components, e.g. using proprietary data
> formats or APIs, then this creates a barrier to changing suppliers. I
> was expecting you to write that because RDF is based on standards, it
> would be in the customer's interests to promote its use to give them the
> flexibility to change supplier. But that's not what you wrote ...

What I meant was that the cooperation between us as customers and our suppliers has traditionally been on the suppliers' terms. Our organization has a lot of business knowledge, but very little professional IT experience. So historically the suppliers have succeeded in convincing the organization to buy suboptimal solutions at too high a price. Our department exists to change that, and to professionalize us as customers. "Control" was perhaps a bad word; "in charge" would have been better. Technically, what we do is establish well-defined web services between the many systems we have and our SOA infrastructure.
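The forward chaining we need over such graphs can be sketched in a few lines (a toy of my own, not cwm or any product): repeatedly apply one rule - here "dependsOn is transitive" - until no new triples appear. Backward chaining would instead start from a goal triple and search for premises that could derive it. All identifiers are illustrative.

```python
# Naive forward chaining: compute the transitive closure of a single
# property over a set of triples by iterating the rule to a fixpoint.
DEP = "dependsOn"  # hypothetical property name

def forward_chain(triples):
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        # Rule: (a dependsOn b) and (b dependsOn c) => (a dependsOn c)
        new = {
            (a, DEP, c)
            for (a, p1, b) in inferred if p1 == DEP
            for (b2, p2, c) in inferred if p2 == DEP and b2 == b
        }
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred

facts = {("GUI", DEP, "OrderService"), ("OrderService", DEP, "CustomerDB")}
closure = forward_chain(facts)
# The derived triple ("GUI", "dependsOn", "CustomerDB") is now present.
```

Real stores do this far more efficiently (semi-naive evaluation, indexes), but the principle is the same.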
We have no intention of dictating the internal designs of the systems - some of them are old COBOL systems anyway. In a SOA, the systems are characterized entirely by their interfaces - as black boxes. So we only dictate the interfaces and leave the internal system design to the suppliers (roughly speaking). What we want instead is to collect the documentation of those (heterogeneous) solutions, connect the documentation to the main graphs (by generating more bits of RDF from the documentation), and thus enable impact analysis across the systems. The purpose is, of course, to be able to assess the extent and cost of changes by analysing the amount of derived change they may require.

> Ah right. I think there are a number of existing solutions that do this -
> though not using RDF - e.g. IBM's metadata server. Have you looked at
> that? Is there something missing from that solution that RDF would
> address?

We are frequently contacted by vendors of metadata systems, and I am not surprised that IBM also has such a product. We are using Telelogic System Architect here. Also, our experiences with suppliers of metadata repositories have been very bad so far. However, my main concern is to avoid vendor lock-in and proprietary internal formats. To do so I believe it is paramount to use open, portable standards to carry the meta-information. This is where RDF comes in. RDF is easier to migrate between platforms, using the same core graphs, and it will be much easier to integrate different sources of metadata without proprietary point-to-point system integrations. Currently we have a big issue here about carrying information between Systinet Information Manager and Telelogic System Architect. A direct integration may prove costly. But if both tools had an import/export facility for RDF, they could at least add useful information to the same pool of metadata. I am sure I could extract all essential information from both tools entirely into RDF and make it useful in other tools.
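The "shared pool of metadata" idea reduces to something very simple once both tools export triples: if they use the same URIs for the same things, a plain set union connects their graphs, and questions spanning both sources become answerable. Everything below is hypothetical - the tool exports and URIs are invented for the sketch.

```python
# Merging RDF exports from two tools is set union when node URIs agree.
EX = "http://example.org/"  # hypothetical shared namespace

# Hypothetical export from a service repository (operation inventory):
from_tool_a = {
    (EX + "placeOrder", EX + "providedBy", EX + "OrderService"),
}
# Hypothetical export from an architecture modelling tool:
from_tool_b = {
    (EX + "OrderService", EX + "runsOn", EX + "HostX"),
}

pool = from_tool_a | from_tool_b  # the merged metadata pool

# A question neither source could answer alone:
# which host ultimately serves the placeOrder operation?
hosts = {
    h
    for (op, p1, svc) in pool if p1 == EX + "providedBy"
    for (svc2, p2, h) in pool if p2 == EX + "runsOn" and svc2 == svc
}
```

The hard part in practice is of course agreeing on the shared URIs, not the union itself.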
It will not solve all problems, as that information still has to be interpreted to be useful, but even so, if two different tools share nodes, their graphs will be able to connect, and new information can be extracted. I think it is a big step in the right direction.

> It is important to bear in mind that it's best to think of RDF in terms
> of its abstract syntax, i.e. a graph of nodes, rather than the RDF/XML
> concrete syntax.

Well, this is also how I think of it.

> There are a number of systems around that will store significant numbers
> of RDF triples in a relational store. We do one, Jena
> (http://jena.sourceforge.net), and there are others - Sesame, Mulgara,
> Redland, etc. I'd strongly suggest you take a look at these, or, if you
> really feel an XML database is the way to go, I'd like to understand why.
>
> An issue with using XML is that the same RDF graph can be represented in
> many different ways in RDF/XML. This would make your queries dependent
> on the particular way that an RDF/XML document happened to represent a
> graph - and that's just - well - wrong - you would be programming to an
> inappropriate level of abstraction.

Yes, this is my experience too. It took me some time to understand the various weird RDF/XML notations I found in the W3C specification, until I started to see them as "syntactic sugar", which means that each block of RDF/XML can be broken down into a number of simple triples. After realising that, I started to ignore the more "user-friendly" syntaxes of the W3C spec and stick to the simplest form. In fact I always reduce the graphs to simple form before I load them into the database. I looked at cwm mainly to see if it could work as a tool to break the graphs down into triples. My first attempts to break down compound expressions into triples with XQuery were not successful, so currently I do it externally before I ever load the data into the store.
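The "simplest form" I mean is essentially one statement per line, as in the N-Triples notation: every nested or abbreviated RDF/XML construct reduces to a flat list of this shape, which is what makes it safe to store and query. A tiny sketch (handling only URI resources, no literals or blank nodes - a real serializer must deal with both):

```python
# Emit flat, one-statement-per-line triples in N-Triples style.
def to_ntriples_line(s, p, o):
    """Format one triple of URI resources as an N-Triples statement."""
    return "<%s> <%s> <%s> ." % (s, p, o)

triples = [
    ("http://example.org/OrderService",   # illustrative URIs
     "http://example.org/dependsOn",
     "http://example.org/CustomerDB"),
]
lines = [to_ntriples_line(*t) for t in triples]
```

Once everything is in this shape, the many-serializations problem Brian describes simply does not arise.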
The RDF I generate myself is always RDF/XML in its simplest form - as triples - and the way I use the XML database rests on the assumption that I deal with triples exclusively. This makes it much easier to build sensible database indexes in the XML database, where you index the node ids. Performance is not spectacular, but it is currently at an acceptable level. I use an XML database (eXist) mainly because I have a long history with XML, XSD, XSL, and now XQuery, so I can use knowledge I already have; also because of the portable nature of XML, XSL and XQuery and the numerous products supporting XML (the vendor lock-in issue again); and also because I like how the database integrates with web browsers and is easy to load and maintain. In any case, as long as the core data are triples, I think a move from RDF/XML to a dedicated RDF store can be made at any time, should it be necessary for performance reasons.

> I'd be very interested in talking with you; I'm happy to share our
> experience with you and am hoping to learn more about your applications
> and requirements to aid in our development efforts.

Well, I also hope we can continue this discussion. I have already gotten some useful links.

Best
Frank Carvalho

--
View this message in context: http://www.nabble.com/Introducing-myself---SOA-organised-with-RDF-tf4263503.html#a12283991
Sent from the w3.org - semantic-web mailing list archive at Nabble.com.
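What "indexing the node ids" buys you, in abstract terms, is cheap lookups in both directions. A sketch of the idea outside any database (my own illustration, not eXist's mechanism): with triples as the only shape, a subject index and an object index make both forward lookups ("what does X point to?") and backward lookups ("what points to X?") direct dictionary hits.

```python
# Two dictionaries keyed on node ids give O(1) access in both directions,
# which is what a node-id index in a triple-shaped XML store approximates.
from collections import defaultdict

triples = [
    ("GUI", "dependsOn", "OrderService"),        # illustrative data
    ("OrderService", "dependsOn", "CustomerDB"),
]

by_subject = defaultdict(list)  # forward index:  subject -> (pred, obj)
by_object = defaultdict(list)   # backward index: object  -> (subj, pred)
for s, p, o in triples:
    by_subject[s].append((p, o))
    by_object[o].append((s, p))

outgoing = by_subject["OrderService"]  # what OrderService points to
incoming = by_object["OrderService"]   # what points to OrderService
```

Dedicated stores typically keep several such permutation indexes (SPO, POS, OSP), which is where much of their performance advantage comes from.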
Received on Wednesday, 22 August 2007 22:22:41 UTC