- From: Charles McCathieNevile <charles@w3.org>
- Date: Wed, 12 Jun 2002 07:57:41 -0400 (EDT)
- To: <www-archive@w3.org>
An IRC log of a rambling interview of Dan Connolly by Dan Brickley and Charles McCN (chaals/eliza). This is an experiment, which seems to have shown that in order to get more detailed information about tools - something like documentation - the interviewers need to concentrate carefully on the topic. But here it is...

danb_lap: how about someone (eg me) interviews you in IRC about your various implementation work?
DanCon: interview: by all means; I have made that offer many times.
chaalsBRS: So, Mr Connolly. Have you been implementing stuff?
DanCon: sure, dude; check my recent-commits.
chaalsBRS: Ah. Your recent commits are available for the world to inspect?
DanCon: a scaffold: "Semantic Web Travel Tools" http://www.w3.org/2000/10/swap/pim/travel.html unfortunately, most of my commits aren't done in dev.w3.org; unfortunate consequence of various unsolved sysadmin issues.
chaalsBRS: Then you are interested in travelling as a framework for thinking about stuff. So you can match itineraries, or just so you can work out where you are now?
DanCon: no, I'm not really interested in travel at all. I loathe manually processing travel itineraries, most of all.
ElizaBRS: Why don't you just have an XML DTD to describe your itineraries - what is it that you think RDF would give you in addition? What kind of manual processing are you trying not to do any more? (Or do you just use an XML DTD for your itinerary?)
DanCon: I don't control the software that sends itineraries my way. but I did write http://www.w3.org/2001/07dc-bos/grokNavItin.pl to convert it to XML. RDF/XML, in particular. XML and XSLT is fun, but using RDF/XML allows me to merge my data with stuff using other vocabularies: dublin core and such. And we have this nifty rules engine that allows me to declaratively relate my data to other vocabularies. Like an RDF representation of icalendar.
DanCon: which, when you put it all together, allows me to get the itineraries from the travel admin folks into evolution without manual intervention. Voila!
danb_lap: DanCon, what practical problems have you encountered when merging data from multiple sources? To what extent does it 'just work'?
DanCon: well, the worst practical problem is that semantic web tools don't issue error messages when you give them bad data. They just say "you're a Preson? interesting. I wonder what that is." timbl coded up a tool to report such errors, but it's painfully slow.
ElizaBRS: So you have a [tool to convert travel itineraries in some format|http://www.w3.org/2001/07dc-bos/grokNavItin.pl] to RDF, which means you can merge it with any other kind of RDF data?
DanCon: yes.
ElizaBRS: Does evolution then automatically understand the RDF data you produce, or are you converting that again for it?
DanCon: I have a tool that converts ical data in RDF syntax to ical's own syntax. i.e. I'm converting again.
ElizaBRS: Do you have a way to describe how to connect these two formats together via the chained conversion?
DanCon: umm... yeah; a Makefile ;-)
danb_lap: DanCon, to what extent have XML-oriented tools and specs proved useful in this work? XSLT, XML Schema, Namespaces, XML Query, XPath etc? Has working in RDF made some of these tools inapplicable to your app?
DanCon: namespaces are indispensable. XPath and XSLT come in handy pretty often. I messed around with XML Schema but haven't used it much lately... ooh! except for datatypes; I'm using them, reflected into RDF. Haven't learned XQuery yet.
danb_lap: fwiw, that exactly matches my experience. I'd add XHTML into the mix, and CSS (for scraping/extracting). Also SOAP has some useful / intriguing parts (the Encoding syntax).
ElizaBRS: So that if you get data that is labelled as (SABRE?) output you can find tools to convert it to ical?
DanCon: sorta; these tools are programs, not products.
DanCon: (the difference between a program and a product is: a program works when invoked by the author. A product might work even if somebody else invokes it)
ElizaBRS: If you converted into some other intermediary XML syntax (rather than RDF) what would you lose?
DanCon: I'm not sure, I have made other uses of the itinerary-in-RDF; I've got tools to project them on a globe, now.
ElizaBRS: is RDF just an arbitrary intermediary syntax, or does it offer something useful that you wouldn't have with chaalsML?
DanCon: It's nice to have a format like RDF that allows me to just slap any old facts I want in the file, without worrying about syntactic clashes between namespaces. I dunno if it's essential, but it's nice. as I say, a downside is that if I slap a fact in with a misspelled term, it's not detected, usually... i.e. the tools for detecting that sort of problem are really raw.
ElizaBRS: Are the error-checking tools better for arbitraryML?
DanCon: well, DTD tools are certainly more mature. witness validator.w3.org and XML schema checking is coming along. RelaxNG is pretty straightforward. schematron is more flexible... hmm... schematron might be an interesting architecture for finding typos in RDF documents faster than timbl's validate.n3
danb_lap: Schematron is wonderful. Simple, useful, modest.
ElizaBRS: But isn't it possible to have an XML schema (small-s) for some RDF-encoded information? Which would enable you to leverage the more mature tools but still have RDF.
DanCon: yes, I could develop an XML schema for my vocabularies; but I usually change them so fast that it's hard to justify the cost.
ElizaBRS: I am assuming here that you can twist an RDF graph into a particular shape if you use it often enough to justify. snapshot and chump it...
DanCon: I suppose I could twist RDF into certain shapes... yes, in fact I do that as a pre-processing step before XSLT processing, more than occasionally.
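The misspelled-term problem raised above ("you're a Preson?") can be illustrated with a small vocabulary check. This is only a sketch in Python, not timbl's validate.n3 and not Schematron: triples are modelled as plain tuples, and the vocabulary set and the sample data are hypothetical, chosen for illustration.

```python
# Hypothetical vocabulary: the terms this data is expected to use.
KNOWN_TERMS = {"rdf:type", "foaf:Person", "foaf:name", "foaf:mbox"}


def unknown_terms(triples, known=KNOWN_TERMS):
    """Return predicates and rdf:type objects that are not in the vocabulary."""
    suspects = set()
    for _subject, predicate, obj in triples:
        if predicate not in known:
            suspects.add(predicate)
        # Class names show up as the object of rdf:type statements.
        if predicate == "rdf:type" and obj not in known:
            suspects.add(obj)
    return sorted(suspects)


# Illustrative data with one misspelled class and one misspelled property.
triples = [
    ("_:dan", "rdf:type", "foaf:Preson"),
    ("_:dan", "foaf:name", "Dan"),
    ("_:dan", "foaf:mobx", "mailto:dan@example.org"),
]

print(unknown_terms(triples))
```

A check like this only catches terms that are absent from a fixed list; it says nothing about whether a correctly spelled term is used sensibly.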
ElizaBRS: But you do things with the information in RDF where you haven't bothered to twist them around first?
DanCon: er... "do things"? I don't understand the question.
ElizaBRS: Sometimes you twist the information into the right XML form for XSLT. So I presume that sometimes you use RDF tools to merge and manipulate it as RDF (I think you said as much already but I am looking for an explicit confirmation)
DanCon: ah, yes, I often use RDF tools on the data... tools that take RDF in any shape. (modulo bugs)
ElizaBRS: Some examples? (This answers the question above about what you would lose by using a different XML intermediary representation - the ability to use the RDF tools)
DanCon: well, for example, I write down constraints on the itinerary... like "itineraries that get me home before 6pm on Tuesday are better than those that don't" -- stuff that my wife and I have agreed -- in an extended form of RDF, that includes rules. Then I use these rules to check the itinerary. i.e. i use them to select among multiple proposed itineraries.
ElizaBRS: "an extended form of RDF". Could you explain this a bit more?
DanCon: well, RDF 1.0 encodes simple sets of facts. "The cat is grey. The dog is brown. Something is blue." We're experimenting with what we think will be the next level of RDF... stuff like "If the cat is green, then it's sick."
ElizaBRS: Presumably tools that can only understand RDF 1.0 cannot understand the extended logic. But are you talking about new syntaxes, or just tools that have more pieces they understand?
DanCon: the experiment uses a new syntax. We haven't decided whether that's strictly necessary or not yet. I'm starting to think it will.
ElizaBRS: So presumably you have tools that can convert from one syntax to another? (or at least from the old to the new)
DanCon: yes, old (RDF/XML) to new (RDF/n3) works smoothly. as I say, going the other direction is still an area of research.
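The step from plain facts to rules ("If the cat is green, then it's sick") can be sketched as a tiny forward-chaining loop. This is an illustration in Python only; it is not N3 syntax and not how cwm works, and the tuple encoding of facts and rules is an assumption made for the example.

```python
def apply_rules(facts, rules):
    """Forward-chain: keep applying rules until no new facts appear.

    facts: set of (subject, predicate, object) tuples.
    rules: list of ((if_pred, if_obj), (then_pred, then_obj)) pairs, read as
           "if ?x if_pred if_obj, then ?x then_pred then_obj".
    """
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for (if_pred, if_obj), (then_pred, then_obj) in rules:
            for subj, pred, obj in list(facts):
                if (pred, obj) == (if_pred, if_obj):
                    new_fact = (subj, then_pred, then_obj)
                    if new_fact not in facts:
                        facts.add(new_fact)
                        changed = True
    return facts


# Plain facts, in the spirit of "The dog is brown."
facts = {("cat", "color", "green"), ("dog", "color", "brown")}
# One rule: anything green is sick.
rules = [(("color", "green"), ("health", "sick"))]

derived = apply_rules(facts, rules)
print(sorted(derived))
```

An itinerary constraint such as "home before 6pm on Tuesday" would need comparisons on values rather than exact matches, which this sketch deliberately leaves out.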
ElizaBRS: Are you using tools that work with each syntax, or converting everything to one syntax?
DanCon: interestingly, the new syntax is sometimes used as a quick-n-dirty authoring tool for RDF/xml. RDF/xml is kinda unmanageable in a text editor. But the analogous subset of RDF/n3 is fairly easy to read/write. my main tool, cwm, reads and writes both.
ElizaBRS: (And do you have a sense of whether what you are doing is typical among the user community?)
DanCon: I don't claim anything I'm doing is typical!
DanCon: hm... I know at least one other developer, Mike Dean, is doing a lot of similar work.
ElizaBRS: But you must have a gut feeling for whether it is unique, or something that a lot of people are doing. And maybe you know there is a large group of people doing things differently...
DanCon: "doing things"... again... I don't get the gist of the question.
ElizaBRS: Apparently Chaals thinks that RDF/xml is reasonable to deal with in a text editor, but mostly uses a graphic interface to edit RDF.
danb_lap: There are a few tools that consume and produce N3 / RDF-rules content. Cwm, cwmclone, Euler; Eep too? I don't know how good the interop is...
DanCon: I know of two people that edit RDF/xml by hand with a text editor. They're weird. ;-)
danb_lap: In RDF IG mailing list circles, I have noticed some concerns that doing N3 is too niche.
danb_lap wonders if he is one of the two (or a 3rd)
ElizaBRS: Do you know of a lot of people who are using RDF/xml who don't think there is a need for RDF/n3, or is RDF/xml only useful for people doing simplistic things?
danb_lap: My brain has a built in RDF parser now
DanCon: using graphical tools to edit RDF is something i'm jealous of. I hope the tools mature on platforms I use!
DanCon: ElizaBRS, we've come a long way from travel tools. I'm not really here to speak about such generalities.
ElizaBRS agrees. It must be late here...
danb_lap: OK, travel tools. A question about merging.
danb_lap: If you merge from 20 sources, and they each describe some of the same events (eg. conferences, meetings, flights) slightly differently, has that proved annoying in your practical experience?
DanCon: can't say. I haven't tried merging from 20 different sources. I have made use of data from, say 4 or 5 different sources. It took some effort, but a satisfyingly small amount of effort. sources like: census gazetteer to map zip codes to lat/long. had to scrape that one from HTML.
danb_lap: I've run into a few cases where the same entity got described just slightly differently from 2 or more trusted sources. Things get a bit heuristic at that point, but bearable.
DanCon: Mike Dean of daml.org made some airport lat/long stuff available in RDF. (I think he scrapes HTML to get it). That was easier to use.
ElizaBRS: How many scrapers or transformations do you use?
DanCon: I have used probably 5 to 15, mostly scrapers I developed, but not all.
ElizaBRS: One of which is the transform from a travel itinerary you mentioned earlier...
DanCon: oh; you include transformations? now we're up into the 50s or so.
ElizaBRS: How many sources of RDF are you using with that? Presuming that there is more than just the airport lat/long stuff that you are getting from someone else.
DanCon: independent sources? or should I count, say, the W3C tech report index as a separate source from the list of W3C working groups?
DanCon: oops; I think I missed the question... you're asking how many sources go into the travel tools...
ElizaBRS: Yes. Is all your RDF coming from your own scrapers, or are you merging things you scrape with things someone else provides as RDF (possibly scraped but not by you)
DanCon: I'd say 3 so far: (1) the travel agent, (2) the daml.org lat/long data, and (3) some stuff I maintain by hand.
DanCon: e.g. mappings from airport names used by the travel agent to international airport codes. (and mappings from airports to timezones; I should be able to scrape those, but I haven't bothered yet.)
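The three-source merge described here (travel agent data, airport lat/long data, and a hand-maintained name-to-code table) amounts to a pair of lookups. The Python sketch below uses plain dictionaries rather than RDF; the record shape, the flight number and the coordinates are illustrative assumptions, with the coordinates only approximate.

```python
# Source 3: hand-maintained mapping from the travel agent's airport names
# to IATA codes.
NAME_TO_IATA = {"CHICAGO OHARE": "ORD"}

# Source 2: airport coordinates keyed by IATA code (approximate, illustrative).
IATA_TO_LATLONG = {"ORD": (41.97, -87.91)}

# Source 1: one leg as it might come out of an itinerary scraper
# (hypothetical shape and flight number).
leg = {"from": "CHICAGO OHARE", "flight": "XX123"}


def locate(leg):
    """Attach an IATA code and coordinates to a leg; None where unknown."""
    code = NAME_TO_IATA.get(leg["from"])
    return {**leg, "iata": code, "latlong": IATA_TO_LATLONG.get(code)}


print(locate(leg))
```

Names missing from the hand-maintained table come back as None instead of raising an error, which mirrors the quiet failure mode complained about earlier in the log.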
ElizaBRS: You maintain a list of airport to code mappings? Presumably this is because you haven't found someone else that does?
DanCon: yeah; I haven't found any way, other than maintaining a table by hand, to correlate the names used by the travel agent in their ASCII format, e.g. "CHICAGO OHARE", with the IATA codes (ORD)
DanCon: hmm... is there a foaf: or contact: analog to mailbox: for "business that owns phone tel:..."?
ElizaBRS: And you are merging this with ical data that you are transforming to RDF, then transforming the information back to ical?
DanCon: the ical stuff is one way so far; just to ical, not back.
ElizaBRS: So you are just adding data to an ical collection?
DanCon: yup. my evolution calendar, in fact. I think timbl's working on getting data out, checking consistency, and a fairly general sync protocol. I have my doubts about the latter.
ElizaBRS: Why?
DanCon: it's a fairly long, technical story; I can't think of a short explanation.
ElizaBRS: Does it take a technical person to produce a transformation / scraper?
DanCon: well, yes... somebody technical enough to do perl/python/XSLT or Java hacking, generally.
DanCon: oops; I keep missing the word "transformation"
ElizaBRS: Or can you conceive (or point to ;-) tools that allow people to build them by simpler means than programming?
DanCon: scrapers go at the edge of the semantic web; they're messy. Leave that to the hackers, I suggest. as for transformations... there are graphical query tools that are approaching general usability. I don't see why transformation tools shouldn't go that way.
yy[Z]: ElizaBRS: sed, cut, tr, bash, and wget - all you need
ElizaBRS: For example, could you build a wizard that allowed someone who understood HTML-type markup to identify the things to collect and where to put them in a visually-represented labelled graph?
DanCon: by "scraper" I mean: something that, heuristically, converts from HTML/text/whatever to RDF.
DanCon: I'm using "transformation" to mean an RDF->RDF mapping, which is by nature more precise. wizard: I gather JimH's group at u. md are doing just that. see recent chump entries. hmm... so maybe scrapers can be developed by non-hackers after all. But it shouldn't be done lightly. You've got to be aware of the issues around formalizing stuff.
yy[Z]: would seem easier to port html and RDF into openoffice and add agents to gradually bridge the middle
ElizaBRS: So you would expect to see tools for making scrapers that require understanding of markup, and of the theory of programming, without requiring "actual programming" skills?
DanCon: not the theory of programming; perl hackers don't even need that ;-) I said the issues around formalizing stuff... people will readily conclude "Chicago, IL" and "Chicago" mean the same thing. But machines are too stupid for that. To successfully formalize something, you have to realize how stupid machines are.
ElizaBRS: There is at least one tool that allows the creation of queries by making a visual template.
DanCon: e.g. my church sends out a directory in a fairly formal looking spreadsheet... columns for last name, first name, street address, phone number. but they sometimes sneak more than one phone number into the "phone number" field; one for work, one for home. a straightforward spreadsheet-to-RDF scraper will probably not grok such shortcuts.
ElizaBRS: Can you see tools that will allow people to say "last name" in your church directory is the same as "family name" in my telephone directory?
DanCon: last/family: sure. that's pretty much the "hello world" of RDF/n3. er... the "hello world" of RDF schema, in fact.
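The "last name" / "family name" case is a transformation in the sense used above: a precise mapping from one vocabulary to another. In RDF Schema this kind of relation would typically be stated with rdfs:subPropertyOf; the Python sketch below only imitates the effect by rewriting predicates, and the church: and phone: property names are hypothetical.

```python
# Hypothetical property names for the two directories.
EQUIVALENT = {"church:lastName": "phone:familyName"}


def translate(triples, mapping=EQUIVALENT):
    """Rewrite mapped predicates into the target vocabulary; keep the rest."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]


# One row of an imagined church directory, as triples.
church_rows = [
    ("_:row1", "church:lastName", "Smith"),
    ("_:row1", "church:phone", "555-0100, 555-0101"),
]

for triple in translate(church_rows):
    print(triple)
```

The mapping handles the column names cleanly but leaves the two-numbers-in-one-field problem untouched; that is data below the level a vocabulary mapping can see.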
danbri: re schematron, http://www.ascc.net/xml/resource/schematron/wai.xml seems halfway between 'scraper' and 'schema' (http://www.ascc.net/xml/resource/schematron/WAI-example.html for more info)
ElizaBRS: or to say that the numbers after the first comma in the phone number field are "other phone", and the number before it are "primary phone"
DanCon: comma... now you're parsing below the XML element level... doable, but messy. perl is the tool of choice for that sort of thing, but that's a lot of rope to hang yourself with. XSLT is less rope, and slightly more tedium... we recently added a string:scrape feature to cwm for just such occasions. Very quick-n-dirty!
ElizaBRS: When I get travel information it is in the form 12JUN NCE-LHR EZ295 0945 1020 Y. I should be able to compare this to your information by understanding the formalism and using a tool to convert this to RDF using the vocabulary you have, no?
deltab: by the way, have you seen perl6's proposed regex syntax? you'd write something like this:
deltab: pri_phone := (<telno>) add_phone := (<[,;]> \s* <telno>)* /
DanCon: 12JUN... yes, I think so. (what's "EZ295"?)
ElizaBRS: EZ295 is the flight number (easyJet 295)
DanCon: ah. yes.
deltab: and the Y?
DanCon: fare code I think. I haven't messed much with those.
ElizaBRS: Y == fare code. I could build a table of fare codes by airline and find out whether your journey will be more comfortable than mine ;-) Or at least find out that I don't know...
ElizaBRS apologises for being fairly general here and not getting to the details of tools (which was I think the original idea) and goes to sleep.
ElizaBRS: next time we should concentrate on code. Thanks Dan and others.
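A scraper for the itinerary line quoted above (12JUN NCE-LHR EZ295 0945 1020 Y) could start from a single regular expression. This is a Python sketch, not grokNavItin.pl or cwm's string:scrape; the field names are assumptions, and the layout is inferred from that one example line only.

```python
import re

# Pattern inferred from the single example line in the log:
# date, origin-destination, flight, departure time, arrival time, fare code.
LEG = re.compile(
    r"(?P<day>\d{2})(?P<month>[A-Z]{3})\s+"
    r"(?P<origin>[A-Z]{3})-(?P<destination>[A-Z]{3})\s+"
    r"(?P<airline>[A-Z0-9]{2})(?P<number>\d+)\s+"
    r"(?P<departs>\d{4})\s+(?P<arrives>\d{4})\s+"
    r"(?P<fare_code>[A-Z])"
)


def parse_leg(line):
    """Return the fields of one itinerary line as a dict, or None if it does not fit."""
    match = LEG.fullmatch(line.strip())
    return match.groupdict() if match else None


print(parse_leg("12JUN NCE-LHR EZ295 0945 1020 Y"))
```

Returning None for lines that do not fit keeps the heuristic nature of the scraper visible to the caller instead of guessing at malformed input.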
Received on Wednesday, 12 June 2002 07:57:46 UTC