RDF and Travel tools (general and rambling) (fwd)

An IRC log of a rambling interview of Dan Connolly by Dan Brickley and
Charles McCN (chaals/eliza). This is an experiment, which seems to have shown
that in order to get more detailed information about tools - something like
documentation - the interviewers need to concentrate carefully on the topic.
But here it is...

danb_lap: how about someone (eg me) interviews you in IRC about your various
implementation work?

DanCon: interview: by all means; I have made that offer many times.

chaalsBRS: So, Mr Connolly. Have you been implementing stuff?

DanCon: sure, dude; check my recent-commits.

chaalsBRS: Ah. Your recent commits are available for the world to inspect?

DanCon: a scaffold: "Semantic Web Travel Tools"
(http://www.w3.org/2000/10/swap/pim/travel.html). Unfortunately, most of my
commits aren't done in dev.w3.org; that's an unfortunate consequence of various
unsolved sysadmin issues.

chaalsBRS: Then you are interested in travelling as a framework for thinking
about stuff? So you can match itineraries, or just so you can work out where
you are now?

DanCon: no, I'm not really interested in travel at all. I loathe manually
processing travel itineraries, most of all.

ElizaBRS: Why don't you just have an XML DTD to describe your itineraries -
what is it that you think RDF would give you in addition? What kind of manual
processing are you trying not to do any more? (Or do you just use an XML DTD
for your itinerary?)

DanCon: I don't control the software that sends itineraries my way, but I
did write http://www.w3.org/2001/07dc-bos/grokNavItin.pl to convert them to
XML; RDF/XML, in particular. XML and XSLT are fun, but using RDF/XML allows me
to merge my data with stuff using other vocabularies: Dublin Core and such.
And we have this nifty rules engine that allows me to declaratively relate my
data to other vocabularies, like an RDF representation of iCalendar. Which,
when you put it all together, allows me to get the itineraries from the
travel admin folks into Evolution without manual intervention. Voila!

danb_lap: DanCon, what practical problems have you encountered when merging
data from multiple sources? To what extent does it 'just work'?

DanCon: well, the worst practical problem is that semantic web tools don't
issue error messages when you give them bad data. They just say "you're a
Preson?  interesting. I wonder what that is." timbl coded up a tool
to report such errors, but it's painfully slow.
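One cheap way to catch a term like "Preson" is to compare every property, and every class used with `rdf:type`, against the terms you expect to see. The Python sketch below is not timbl's tool: it assumes triples are plain `(subject, predicate, object)` string tuples and that the list of known terms is supplied by hand, and it uses the standard library's `difflib` to suggest the nearest known spelling.

```python
import difflib

# Hypothetical list of terms the checker treats as declared.
KNOWN_TERMS = {"foaf:Person", "foaf:name", "foaf:mbox", "rdf:type"}


def check_terms(triples, known=KNOWN_TERMS):
    """Return (term, suggestions) for each undeclared predicate or rdf:type class."""
    problems = []
    for s, p, o in triples:
        candidates = [p]
        if p == "rdf:type":
            candidates.append(o)  # the object of rdf:type is a class name
        for term in candidates:
            if term not in known:
                hints = difflib.get_close_matches(term, sorted(known), n=1)
                problems.append((term, hints))
    return problems


data = [
    ("_:dan", "rdf:type", "foaf:Preson"),  # the misspelling from the log
    ("_:dan", "foaf:name", "Dan Connolly"),
]
print(check_terms(data))  # prints [('foaf:Preson', ['foaf:Person'])]
```

A plain RDF tool would accept `foaf:Preson` silently; the check only helps because it is given an explicit list of what is expected.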


ElizaBRS: So you have a tool to convert travel itineraries in some
format (http://www.w3.org/2001/07dc-bos/grokNavItin.pl) to RDF, which means
you can merge it with any other kind of RDF data?

DanCon: yes.

ElizaBRS Does Evolution then automatically understand the RDF data you
produce, or are you converting that again for it?

DanCon I have a tool that converts iCalendar data in RDF syntax to
iCalendar's own syntax; i.e. I'm converting again.
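The RDF-to-iCalendar step can be pictured roughly as follows. This is a minimal Python sketch, not Dan's converter: the `ical:`-prefixed property names, the sample event, and the `PRODID` value are hypothetical, and a real tool would read an RDF graph rather than a dict. It writes the CRLF-terminated `BEGIN:VCALENDAR`/`BEGIN:VEVENT` text that iCalendar files use and escapes the characters iCalendar reserves in text values.

```python
from datetime import datetime, timezone

# Hypothetical event properties, as they might come out of an RDF rendering
# of iCalendar (property names are illustrative, not a published schema).
event = {
    "ical:summary": "Flight BOS-ORD",
    "ical:dtstart": "20020618T094500",
    "ical:dtend": "20020618T120000",
    "ical:location": "Boston, MA",
}


def escape_text(value):
    """Escape backslash, semicolon, comma and newline in an iCalendar TEXT value."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def to_vcalendar(props, uid):
    """Serialize one event dict as a minimal VCALENDAR holding one VEVENT."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//rdf-to-ical sketch//EN",  # hypothetical product id
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{stamp}",
    ]
    for key in ("dtstart", "dtend"):  # date-time values are copied as-is
        if "ical:" + key in props:
            lines.append(f"{key.upper()}:{props['ical:' + key]}")
    for key in ("summary", "location"):  # text values need escaping
        if "ical:" + key in props:
            lines.append(f"{key.upper()}:{escape_text(props['ical:' + key])}")
    lines += ["END:VEVENT", "END:VCALENDAR"]
    return "\r\n".join(lines) + "\r\n"  # iCalendar content lines end in CRLF


print(to_vcalendar(event, "leg1@example.invalid"))
```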

ElizaBRS Do you have a way to describe how to connect these two formats
together via the chained conversion?

DanCon umm... yeah; a Makefile ;-)

danb_lap DanCon, to what extent have XML-oriented tools and specs proved
useful in this work? XSLT, XML Schema, Namespaces, XML Query, XPath etc? Has
working in RDF made some of these tools inapplicable to your app?

DanCon namespaces are indispensable. XPath and XSLT come in handy pretty
often. I messed around with XML Schema but haven't used it much lately...
ooh! except for datatypes; I'm using them, reflected into RDF. Haven't
learned XQuery yet.

danb_lap fwiw, that exactly matches my experience. I'd add XHTML into the mix,
and CSS (for scraping/extracting). Also SOAP has some useful / intriguing
parts (the Encoding syntax).

ElizaBRS So that if you get data that is labelled as (SABRE?) output you can
find tools to convert it to ical?

DanCon sorta; these tools are programs, not products. (The difference between
a program and a product is: a program works when invoked by the author. A
product might work even if somebody else invokes it.)

ElizaBRS If you converted into some other intermediary XML syntax (rather
than RDF) what would you lose?

DanCon I'm not sure. I have made other uses of the itinerary-in-RDF; I've got
tools to project them on a globe now.

ElizaBRS Is RDF just an arbitrary intermediary syntax, or does it offer
something useful that you wouldn't have with chaalsML?

DanCon It's nice to have a format like RDF that allows me to just slap any
old facts I want in the file, without worrying about syntactic clashes
between namespaces. I dunno if it's essential, but it's nice. as I say,
a downside is that if I slap a fact in with a misspelled term, it's not
detected, usually... i.e. the tools for detecting that sort of problem are
really raw.

ElizaBRS Are the error-checking tools better for arbitraryML?

DanCon well, DTD tools are certainly more mature; witness validator.w3.org.
And XML Schema checking is coming along. RelaxNG is pretty straightforward;
Schematron is more flexible... hmm... Schematron might be an interesting
architecture for finding typos in RDF documents faster than timbl's
validate.n3

danb_lap Schematron is wonderful. Simple, useful, modest.

ElizaBRS But isn't it possible to have an XML schema (small-s) for some
RDF-encoded information? Which would enable you to leverage the more mature
tools but still have RDF.

DanCon yes, I could develop an XML schema for my vocabularies; but I usually
change them so fast that it's hard to justify the cost.

ElizaBRS I am assuming here that you can twist an RDF graph into a particular
shape if you use it often enough to justify. snapshot and chump it...

DanCon I suppose I could twist RDF into certain shapes... yes, in fact I do
that as a pre-processing step before XSLT processing, more than occasionally.

ElizaBRS But you do things with the information in RDF where you haven't
bothered to twist them around first?

DanCon er... "do things"? I don't understand the question.

ElizaBRS Sometimes you twist the information into the right XML form for
XSLT. So I presume that sometimes you use RDF tools to merge and manipulate
it as RDF (I think you said as much already but I am looking for an explicit
confirmation)

DanCon ah, yes, I often use RDF tools on the data... tools that take RDF in
any shape. (modulo bugs)

ElizaBRS Some examples? (This answers the question above about what you would
lose by using a different XML intermediary representation - the ability to
use the RDF tools)

DanCon well, for example, I write down constraints on the itinerary... like
"itineraries that get me home before 6pm on Tuesday are better than those
that don't" -- stuff that my wife and I have agreed -- in an extended form of
RDF, that includes rules. Then I use these rules to check the
itinerary. i.e. i use them to select among multiple proposed
itineraries.
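Dan writes such constraints as rules in cwm's extended RDF; the Python below only illustrates the selection logic, ranking proposed itineraries so that those getting the traveller home before a deadline come first. The itinerary names, arrival times, and the choice of Tuesday 18 June 2002 as the deadline are hypothetical.

```python
from datetime import datetime

# Hypothetical proposals: itinerary name -> time the traveller arrives home.
proposals = {
    "late": datetime(2002, 6, 18, 21, 10),
    "early": datetime(2002, 6, 18, 17, 30),
    "monday": datetime(2002, 6, 17, 22, 0),
}

# Hypothetical deadline: 6pm on Tuesday 18 June 2002.
DEADLINE = datetime(2002, 6, 18, 18, 0)


def home_in_time(arrival, deadline=DEADLINE):
    """The agreed constraint: home strictly before the deadline."""
    return arrival < deadline


def rank(proposals, deadline=DEADLINE):
    """Itineraries meeting the constraint first (False sorts before True),
    then the rest; within each group, earlier arrival first."""
    return sorted(
        proposals,
        key=lambda name: (not home_in_time(proposals[name], deadline), proposals[name]),
    )


print(rank(proposals))  # prints ['monday', 'early', 'late']
```

Treating the constraint as a deadline rather than "arrives on a Tuesday" means an itinerary that gets home the day before also satisfies it.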

ElizaBRS "an extended form of RDF". Could you explain this a bit more?

DanCon well, RDF 1.0 encodes simple sets of facts. "The cat is grey. The dog
is brown. Something is blue." We're experimenting with what we think will be
the next level of RDF... stuff like "If the cat is green, then it's sick."
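As a rough illustration of what a rule adds over plain facts, here is a tiny forward-chaining loop in Python. It is not N3 or cwm syntax: facts are `(subject, predicate, object)` tuples, a rule is a list of `(predicate, object)` patterns that must all hold for the same subject plus one conclusion, and the loop keeps applying rules until no new facts appear. The names `tom`, `felix`, and `rex` are made up.

```python
def forward_chain(facts, rules):
    """Apply rules to (subject, predicate, object) facts until nothing new appears."""
    facts = set(facts)
    while True:
        subjects = {s for s, _, _ in facts}
        new = set()
        for premises, (pred, obj) in rules:
            for s in subjects:
                # Every premise must hold for this same subject.
                if all((s, p, o) in facts for p, o in premises):
                    new.add((s, pred, obj))
        new -= facts
        if not new:
            return facts
        facts |= new


facts = {
    ("tom", "a", "Cat"), ("tom", "color", "grey"),
    ("felix", "a", "Cat"), ("felix", "color", "green"),
    ("rex", "a", "Dog"), ("rex", "color", "green"),
}
# "If the cat is green, then it's sick", for any subject that is a Cat.
rules = [([("a", "Cat"), ("color", "green")], ("health", "sick"))]

result = forward_chain(facts, rules)
print(sorted(f for f in result if f[1] == "health"))  # prints [('felix', 'health', 'sick')]
```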

ElizaBRS Presumably tools that can only understand RDF 1.0 cannot understand
the extended logic. But are you talking about new syntaxes, or just tools
that have more pieces they understand?

DanCon the experiment uses a new syntax. We haven't decided yet whether that's
strictly necessary. I'm starting to think it will be.

ElizaBRS So presumably you have tools that can convert from one syntax to
another? (or at least from the old to the new)

DanCon yes, old (RDF/XML) to new (RDF/n3) works smoothly. as I say, going the
other direction is still an area of research.

ElizaBRS Are you using tools that work with each syntax, or converting
everything to one syntax?

DanCon interestingly, the new syntax is sometimes used as a quick-n-dirty
authoring tool for RDF/xml. RDF/xml is kinda unmanageable in a text editor,
but the analogous subset of RDF/n3 is fairly easy to read/write. My main
tool, cwm, reads and writes both.

ElizaBRS (And do you have a sense of whether what you are doing is typical
among the user community?)

DanCon I don't claim anything I'm doing is typical!

DanCon hm... I know at least one other developer, Mike Dean, is doing a lot
of similar work.

ElizaBRS But you must have a gut feeling for whether it is unique, or
something that a lot of people are doing. And maybe you know there is a large
group of people doing things differently...

DanCon "doing things"... again... I don't get the gist of the question.

ElizaBRS Apparently Chaals thinks that RDF/xml is reasonable to deal with in
a text editor, but mostly uses a graphic interface to edit RDF.

danb_lap There are a few tools that consume and produce N3 / RDF-rules
content. Cwm, cwmclone, Euler; Eep too?. I don't know how good the interop
is...

DanCon I know of two people that edit RDF/xml by hand with a text editor.
They're weird. ;-)

danb_lap In RDF IG mailing list circles, I have noticed some concerns that
doing N3 is too niche.

danb_lap wonders if he is one of the two (or a 3rd)

ElizaBRS Do you know of a lot of people who are using RDF/xml who don't think
there is a need for RDF/n3, or is RDF/xml only useful for people doing
simplistic things?

danb_lap My brain has a built-in RDF parser now

DanCon using graphical tools to edit RDF is something I'm jealous of. I hope
the tools mature on platforms I use!

DanCon ElizaBRS, we've come a long way from travel tools. I'm not really here
to speak about such generalities.

ElizaBRS agrees. It must be late here...

danb_lap OK, travel tools. A question about merging. If you merge from 20
sources, and they each describe some of the same events (eg. conferences,
meetings, flights) slightly differently, has that proved annoying in your
practical experience?

DanCon can't say. I haven't tried merging from 20 different sources. I have
made use of data from, say, 4 or 5 different sources. It took some effort, but
a satisfyingly small amount of effort. Sources like the census gazetteer, to
map zip codes to lat/long; I had to scrape that one from HTML.

danb_lap I've run into a few cases where the same entity got described just
slightly differently from 2 or more trusted sources. Things get a bit
heuristic at that point, but bearable.

DanCon Mike Dean of daml.org made some airport lat/long stuff available in
RDF. (I think he scrapes HTML to get it).  That was easier to use.

ElizaBRS How many scrapers or transformations do you use?

DanCon I have used probably 5 to 15, mostly scrapers I developed, but not
all.

ElizaBRS One of which is the transform from a travel itinerary you mentioned
earlier...

DanCon oh; you include transformations? now we're up into the 50s or so.

ElizaBRS How many sources of RDF are you using with that? Presuming
that there is more than just the airport lat/long stuff that you are getting
from someone else.

DanCon independent sources? or should I count, say, the W3C tech report index
as a separate source from the list of W3C working groups?

DanCon oops; I think I missed the question... you're asking how many sources
go into the travel tools...

ElizaBRS Yes. Is all your RDF coming from your own scrapers, or are you
merging things you scrape with things someone else provides as RDF (possibly
scraped, but not by you)?

DanCon I'd say 3 so far: (1) the travel agent, (2) the daml.org lat/long
data, and (3) some stuff I maintain by hand.

DanCon e.g. mappings from airport names used by the travel agent to
international airport codes. (and mappings from airports to timezones; I
should be able to scrape those, but I haven't bothered yet.)
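A hand-maintained mapping like this is little more than a lookup table that emits triples. The Python sketch below assumes the agent's airport names arrive as ASCII strings; only the "CHICAGO OHARE" to ORD pairing comes from this conversation (the other two entries are ordinary IATA codes added for illustration), and the `trv:iataCode` predicate is hypothetical. Normalization is deliberately limited to case and whitespace, so a variant such as "CHICAGO, IL" is reported as unmapped for a human to resolve rather than guessed at.

```python
# Hand-maintained table: travel agent's ASCII airport names -> IATA codes.
AIRPORT_CODES = {
    "CHICAGO OHARE": "ORD",
    "NICE": "NCE",
    "LONDON HEATHROW": "LHR",
}


def iata_code(agent_name):
    """Return the IATA code for an agent's airport name, or None if unmapped."""
    key = " ".join(agent_name.upper().split())  # fold case, collapse whitespace
    return AIRPORT_CODES.get(key)


def code_triples(names):
    """Emit (name, predicate, code) triples and collect unmapped names."""
    triples, unmapped = [], []
    for name in names:
        code = iata_code(name)
        if code is None:
            unmapped.append(name)
        else:
            triples.append((name, "trv:iataCode", code))  # hypothetical predicate
    return triples, unmapped


print(code_triples(["CHICAGO OHARE", "BOSTON LOGAN"]))
```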

ElizaBRS You maintain a list of airport to code mappings?  Presumably this is
because you haven't found someone else that does?

DanCon yeah; I haven't found any way, other than maintaining a table by hand,
to correlate the names used by the travel agent in their ASCII format, e.g.
"CHICAGO OHARE", with the IATA codes (ORD)

DanCon hmm... is there a foaf: or contact: analog to mailbox: for "business
that owns phone tel:..."?

ElizaBRS And you are merging this with ical data that you are transforming to
RDF, then transforming the information back to ical?

DanCon the ical stuff is one way so far; just to ical, not back.

ElizaBRS So you are just adding data to an ical collection?

DanCon yup. my evolution calendar, in fact. I think timbl's working
on getting data out, checking consistency, and a fairly general sync
protocol. I have my doubts about the latter.

ElizaBRS Why?

DanCon it's a fairly long, technical story; I can't think of a short
explanation.

ElizaBRS Does it take a technical person to produce a transformation /
scraper?

DanCon well, yes... somebody technical enough to do perl/python/XSLT or Java
hacking, generally.

DanCon oops; I keep missing the word "transformation"

ElizaBRS Or can you conceive (or point to ;-) tools that allow people to
build them by simpler means than programming?

DanCon scrapers go at the edge of the semantic web; they're messy.  Leave
that to the hackers, I suggest. as for transformations... there are graphical
query tools that are approaching general usability. I don't see why
transformation tools shouldn't go that way.

yy[Z] ElizaBRS: sed, cut, tr, bash, and wget - all you need

ElizaBRS For example, could you build a wizard that allowed someone who
understood HTML-type markup to identify the things to collect and where to
put them in a visually-represented labelled graph?

DanCon by "scraper" I mean: something that, heuristically, converts from
HTML/text/whatever to RDF. I'm using "transformation" to mean an RDF->RDF
mapping, which is by nature more precise. Wizard: I gather JimH's group at
U. Md. are doing just that; see recent chump entries. Hmm... so maybe scrapers
can be developed by non-hackers after all. But it shouldn't be done lightly.
You've got to be aware of the issues around formalizing stuff.

yy[Z] would seem easier to port html and RDF into openoffice and add agents
to gradually bridge the middle

ElizaBRS So you would expect to see tools for making scrapers that require
understanding of markup, and of the theory of programming, without requiring
"actual programming"  skills?

DanCon not the theory of programming; perl hackers don't even need that ;-) I
said the issues around formalizing stuff... people will readily conclude
"Chicago, IL" and "Chicago" mean the same thing. But machines are too stupid
for that. To successfully formalize something, you have to realize how stupid
machines are.

ElizaBRS There is at least one tool that allows the creation of queries by
making a visual template.

DanCon e.g. my church sends out a directory in a fairly formal-looking
spreadsheet... columns for last name, first name, street address, phone
number. But they sometimes sneak more than one phone number into the "phone
number" field; one for work, one for home. A straightforward
spreadsheet-to-RDF scraper will probably not grok such shortcuts.

ElizaBRS Can you see tools that will allow people to say "last name" in your
church directory is the same as "family name" in my telephone directory?

DanCon last/family: sure. that's pretty much the "hello world" of
RDF/n3. er... the "hello world" of RDF schema, in fact.
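In RDF Schema terms, the usual move is to declare one property an `rdfs:subPropertyOf` the other (in both directions, if they really mean the same thing) and let an RDFS reasoner copy values across. The Python below hand-rolls just that one entailment rule over string triples; it is not cwm, and the `church:lastName` and `phone:familyName` property names are hypothetical.

```python
SUBPROP = "rdfs:subPropertyOf"


def subproperty_closure(triples):
    """Apply 'if p subPropertyOf q and (s p o) then (s q o)' until nothing new appears."""
    triples = set(triples)
    while True:
        pairs = {(s, o) for s, p, o in triples if p == SUBPROP}
        derived = {(s, sup, o)
                   for s, p, o in triples
                   for sub, sup in pairs
                   if p == sub}
        new = derived - triples
        if not new:
            return triples
        triples |= new


data = {
    # Declaring each a sub-property of the other makes them interchangeable.
    ("church:lastName", SUBPROP, "phone:familyName"),
    ("phone:familyName", SUBPROP, "church:lastName"),
    ("_:member1", "church:lastName", "Connolly"),
    ("_:entry7", "phone:familyName", "Brickley"),
}
closed = subproperty_closure(data)
print(sorted(t for t in closed if t[0].startswith("_:")))
```

Because the loop repeats until nothing new is derived, chains such as p1 under p2 under p3 are also followed.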

danbri re Schematron,
http://www.ascc.net/xml/resource/schematron/wai.xml seems halfway between
'scraper' and 'schema'
(http://www.ascc.net/xml/resource/schematron/WAI-example.html for more info)

ElizaBRS Or to say that the numbers after the first comma in the phone number
field are "other phone", and the number before it is "primary phone"?

DanCon comma... now you're parsing below the XML element level... doable,
but messy. Perl is the tool of choice for that sort of thing, but that's a
lot of rope to hang yourself with. XSLT is less rope, and slightly more
tedium... we recently added a string:scrape feature to cwm for just
such occasions. Very quick-n-dirty!
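For comparison, here is the comma-splitting step in Python rather than perl, XSLT, or cwm's string:scrape (none of which is shown here). It is a sketch under stated assumptions: numbers in a cell are separated by a comma or semicolon, the first one is treated as primary, and the `dir:primaryPhone`/`dir:otherPhone` predicates are hypothetical.

```python
import re

# A comma or semicolon, with any surrounding spaces, separates numbers in a cell.
_SEPARATOR = re.compile(r"\s*[,;]\s*")


def split_phone_field(field):
    """Split a 'phone number' cell into a primary number and any others."""
    parts = [p for p in _SEPARATOR.split(field.strip()) if p]
    if not parts:
        return {}
    return {"primary": parts[0], "other": parts[1:]}


def phone_triples(person, field):
    """Turn one cell into triples using hypothetical dir: predicates."""
    parsed = split_phone_field(field)
    if not parsed:
        return []
    triples = [(person, "dir:primaryPhone", parsed["primary"])]
    triples += [(person, "dir:otherPhone", n) for n in parsed["other"]]
    return triples


print(phone_triples("_:member1", "555-0100, 555-0199"))
```

Note that the split only encodes "first number is primary"; it cannot tell which number is work and which is home, which is exactly the kind of shortcut a formal scraper misses.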

ElizaBRS When I get travel information is it in the form 12JUN NCE-LHR EZ295
0945 1020 Y. I should be able to compare this to your information by
understanding the formalism and using a tool to convert this to RDF using the
vocabulary you have, no?

deltab by the way, have you seen perl6's proposed regex syntax? you'd write
something like this:

deltab pri_phone := (<telno>) add_phone := (<[,;]> \s* <telno>)* /

DanCon 12JUN... yes, I think so. (what's "EZ295"?)

ElizaBRS EZ295 is the flight number (easyJet 295)

DanCon ah.  yes.

deltab and the Y?

DanCon fare code I think. I haven't messed much with those.

ElizaBRS Y == fare code. I could build a table of fare codes by airline
and find out whether your journey will be more comfortable than mine
;-) Or at least find out that I don't know...
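Using the field meanings settled above (date, route, flight number, departure and arrival times, fare code), a scraper for this compact format might look like the Python below. It is a sketch, not Dan's tool: the field layout is inferred from the single example line, the times are assumed to be local, the source line carries no year, and the `trv:` predicates are hypothetical.

```python
import re

# One leg: date, route, flight number, departure, arrival, fare code.
LEG = re.compile(
    r"(?P<day>\d{2})(?P<month>[A-Z]{3})\s+"
    r"(?P<origin>[A-Z]{3})-(?P<dest>[A-Z]{3})\s+"
    r"(?P<flight>[A-Z0-9]{2}\d{1,4})\s+"
    r"(?P<dep>\d{4})\s+(?P<arr>\d{4})\s+"
    r"(?P<fare>[A-Z])"
)


def leg_to_triples(line, node="_:leg1"):
    """Convert one itinerary line into (subject, predicate, object) triples."""
    m = LEG.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognised itinerary line: {line!r}")
    g = m.groupdict()
    return [
        (node, "trv:date", g["day"] + g["month"]),  # no year in the source line
        (node, "trv:from", g["origin"]),
        (node, "trv:to", g["dest"]),
        (node, "trv:flightNumber", g["flight"]),
        (node, "trv:departs", g["dep"]),  # hhmm, assumed local time
        (node, "trv:arrives", g["arr"]),
        (node, "trv:fareCode", g["fare"]),
    ]


for triple in leg_to_triples("12JUN NCE-LHR EZ295 0945 1020 Y"):
    print(triple)
```

Raising an error on anything that does not match keeps a malformed line from silently turning into odd-looking triples.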

ElizaBRS apologises for being fairly general here and not getting to the
details of tools (which was I think the original idea) and goes to sleep.

ElizaBRS next time we should concentrate on code.

Thanks Dan and others.

Received on Wednesday, 12 June 2002 07:57:46 UTC