- From: Melvin Carvalho <melvincarvalho@gmail.com>
- Date: Wed, 28 Mar 2012 10:20:42 +0200
- To: Dan Brickley <danbri@danbri.org>
- Cc: Jeni Tennison <jeni@jenitennison.com>, tom.heath@talis.com, public-lod community <public-lod@w3.org>
- Message-ID: <CAKaEYhKOG+Zb6wUprAKBfbEm1giy6BxupUq-Z0W3eFkdrqaErw@mail.gmail.com>
On 28 March 2012 03:30, Dan Brickley <danbri@danbri.org> wrote:
> On 27 March 2012 20:23, Melvin Carvalho <melvincarvalho@gmail.com> wrote:
>
> > I'm curious as to why this is difficult to explain. Especially since I also have difficulties explaining the benefits of linked data. However, normally the road block I hit is explaining why URIs are important.
>
> Alice: So, you want to share your in-house thesaurus in the Web as 'Linked Data' in SKOS?
>
> Bob: Yup, I saw [inspirational materials] online and a few blog posts, it looks easy enough. We've exported it as RDF/XML SKOS already. Here, take a look...
>
> [data stick changes hands]
>
> Alice: Cool! And .. yup it's wellformed XML, and here see I parsed it with a real RDF parser (made by Dave Beckett who worked on the last W3C spec for this stuff, beats me actually checking it myself) and it didn't complain. So looks fine! Ok so we'll need to chunk this up somehow so there's one little record per term from your thesaurus, and links between them... ...and it's generally good to make human facing pages as well as machine-oriented RDF ones too.
>
> Bob: Ok, so that'll be microformats no wait microdata ah yeah, RDFa, right? Which version?
>
> Alice: well RDFa yes, microdata is a kind of cousin, a mix of thinking from microdata and microformats communities. But I meant that you'd make a version of each page for computers to use (RDF/XML like your test export here), ... and you'd make some kind of HTML page for more human readers also. The stuff you mention is more about doing both within the same format...
>
> Bob: Great. Which one's the most standard? What should I use?
>
> Alice: Well I guess it depends what you mean by standard.
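A rough sketch of the "one little record per term" step, in Python with only the standard library. It assumes the SKOS export uses the typed-node form `<skos:Concept rdf:about="...">` with `skos:prefLabel`, `skos:broader` and `skos:narrower` children. It is a plain XML walk, not a substitute for a real RDF/XML parser, and the sample URIs and labels are illustrative.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Illustrative export: c1 ("allergy to pine nuts") is narrower than c2 ("pine nuts").
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                     xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.com/demothes/c1">
    <skos:prefLabel>allergy to pine nuts</skos:prefLabel>
    <skos:broader rdf:resource="http://example.com/demothes/c2"/>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.com/demothes/c2">
    <skos:prefLabel>pine nuts</skos:prefLabel>
    <skos:narrower rdf:resource="http://example.com/demothes/c1"/>
  </skos:Concept>
</rdf:RDF>"""


def q(ns, local):
    """ElementTree's "{namespace}local" form of a qualified name."""
    return "{" + ns + "}" + local


def concept_records(rdfxml):
    """Return {concept URI: {"prefLabel": ..., "broader": [...], "narrower": [...]}}."""
    root = ET.fromstring(rdfxml)  # raises ET.ParseError if the XML is not well-formed
    records = {}
    for concept in root.iter(q(SKOS, "Concept")):
        rec = {"prefLabel": None, "broader": [], "narrower": []}
        for child in concept:
            if child.tag == q(SKOS, "prefLabel"):
                rec["prefLabel"] = child.text  # keeps the last label seen; one language assumed
            elif child.tag in (q(SKOS, "broader"), q(SKOS, "narrower")):
                key = "broader" if child.tag == q(SKOS, "broader") else "narrower"
                rec[key].append(child.get(q(RDF, "resource")))
        records[concept.get(q(RDF, "about"))] = rec
    return records


for uri, rec in concept_records(SAMPLE).items():
    print(uri, rec)
```

Each record is then one candidate "page" worth of data, with its links to other concepts already collected.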
>
> [skips digression about whatwg and w3c etc notions of standards process]
> [skips digression about XHTML vs XML-ish polyglot HTML vs resolutely non-XML HTML5 flavours]
> [skips digression about qnames in HTML and RDFa 1.1 versus 1.0]
>
> ...you might care to look at using basic HTML5 document with say the Lite version of RDFa 1.1 (which is pretty much finished but not an official stable standard yet at W3C)
>
> Bob: [makes a note]. Ok, but that's just the human-facing page, anyway. We'd put up RDF/XML for machines too, right? Well maybe that's not necessary I guess. I was reading something about GRDDL and XSLT that automates the conversion, ... should we maybe generate the RDF/XML from the HTML+RDFa or vice versa? or just have some php hack generate both from MySQL since that's where the stuff ultimately lives right now anyway...?
>
> Alice: Um, well it's pretty much your choice. Do you need RDF/XML too? Well..... maybe, not sure... it depends. There are more RDF/XML parsers around, they're more mature, ... but increasingly tools will consume all kinds of data as RDF. So it might not matter. Depends why you're doing this, really.
>
> Bob: Er ok, maybe we ought to do both for now, ... belt-and-braces, ... maybe watch the stats and see what's being picked up? I'm doing this because of promise of interestingly unexpected re-use and so on, which makes details hard to predict by definition.
>
> Alice: Sounds like a plan. Ok, so each node in your RDF graph, ... we'll need to give it a URI. You know that's like the new word for URL, but that includes identifiers for real world things too.
>
> Bob: Sure sure, I read that. Makes sense. And I can have a URI, my homepage can have a URI, I'm not my home page blah-de-blah?
>
> Alice: You got it.
>
> Bob: Ok, so what URLs should I give the concepts in this thesaurus? They've got all kinds of strings attached, but we've also got nicely managed numeric IDs too.
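For comparison, a small Python sketch of the two URI patterns that come up next for the numeric IDs: slash URIs such as http://example.com/demothes/c1, or hash URIs such as http://example.com/demothes#c1. It also shows the URL an HTTP client actually requests for each. The base URI is the illustrative one from the thread. `urldefrag` is the standard-library function that splits off the fragment, and clients do not send that fragment to the server.

```python
from urllib.parse import urldefrag

BASE = "http://example.com/demothes"  # illustrative base from the thread


def slash_uri(concept_id):
    """One URI (and one HTTP request) per concept, e.g. .../demothes/c1."""
    return f"{BASE}/c{concept_id}"


def hash_uri(concept_id):
    """Concept named by a fragment of a single document, e.g. .../demothes#c1."""
    return f"{BASE}#c{concept_id}"


def requested_url(uri):
    """The URL a client actually fetches: the #fragment never reaches the server."""
    return urldefrag(uri).url


for cid in (1, 2, 9999):
    print(slash_uri(cid), "->", requested_url(slash_uri(cid)))
    print(hash_uri(cid), "->", requested_url(hash_uri(cid)))
```

Every hash URI collapses to the same request for /demothes, which is why the whole document comes back each time. Slash URIs give the server a distinct path it can act on per concept.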
>
> Alice: Right so maybe something short (URLs can never be too short...), ... so maybe if you host at your example.org server, http://example.com/demothes/c1 then same but /c2 /c3 etc.
>
> ... or well you could use #c1 or #c2 etc. That's pretty much up to you. There are pros-and-cons in both directions.
>
> Bob: whatever's easiest. It's a pretty plain apache2 setup, with php if we want it, or we can batch create files if that makes more sense; this data doesn't change much.
>
> Alice: Well how big is the thesaurus...?
>
> Bob: a couple thousand terms, each with a few relations and bits of text; maybe more if we dig out the translations (humm should we language negotiate those somehow?)
>
> Alice: Let's talk about that another day, maybe?
>
> Bob: And hmm the translations are versioned a bit differently? Should we put version numbers in somewhere so it's unambiguous which version of the translation we're using?
>
> Alice: Let's talk about that another day, too.
>
> Bob: OK, where were we? http://example.com/demothes/c1 ... sure, that sounds fine.
>
> ... we'd put some content negotiated apache thing there, and make c1 send HTML if there's a browser, or rdf/xml if they want that stuff instead? Default to the browser / HTML version maybe?
>
> Alice: something like that could work. There are some howtos around. Oh, but if c1 isn't an information resource, you'll need to redirect with a 303 HTTP code. It's like you said with people and homepages, to make clear which is which.
>
> Bob: Oh-kay... so in our SKOS graph, it's a mix of things, the bulk is a load of descriptions of skos:Concept and there's a bit of metadata in there about some docs, and the admin contact info, ... but yeah it's mostly the concepts (which seems to be the skos way to talk about thesaurus terms, sort of abstracted a bit to make translations easier, right?)
>
> Alice: Yup. Well, ... remember we're breaking up your graph into bits...
> like one chunk per page?
>
> Bob: Ah right, so is that one node in the graph per page? per ... erm, how do they call it? [counts on fingers] subject-predicate-object... er subject, right? Each object in my graph, er like OO object I mean, entity, thingy...
>
> Alice: -thingy is good-
>
> Bob: Each thing in the graph, goes in one page, more or less?
>
> Alice: more or less. It's up to you, I guess there are best practices, roughly the bulk of it, one page per concept, ... and then the metadata etc you might do differently
>
> Bob: Ok, so c1 is one concept, c2 is another, ... they'd have links to each other in the ... the RDF/XML files, right? And I guess the HTML too, sure
>
> Alice: Sure
>
> Bob: so the html rdfa stuff would be <a href='c2'>something and rel='broader' if c1 was broader than c2?
>
> Alice: er it might be broaderTerm, or broaderConcept, I forget... [searches]
>
> Bob: ah look, yeah skos:broader, ... ok so if c2 is more broad, er broader, more general, than c1, ... we put in the c1 HTML page a link over to c2, and add some RDFa too, to say what the link means in semantic rdf speak as well as clickable-link?
>
> Alice: [tips head on side], ... sorry I always get this stuff back to front. Ok, slowly. c2 is broader than c1, ... 'broader' points to the one that's broader, like you know more general, ... so let's say c1 is the specific, detailed one. In the c1 HTML page, we'd ...
>
> Bob: [interrupting] would that be c1.html? like concept ID dot h t m l, as a pattern?
>
> Alice: yes, you could call it that, ... it's up to you really but obviously it's sort of conventional. But then there's another convention of keeping the file types out of URLs
>
> Bob: So in the filesystem they might be a bunch of batch-generated HTML files called c1.html c2.html etc, but I'd keep that secret or obscure or hide it with apache config somehow?
>
> Alice: For example, yes.
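A minimal sketch of that hiding-and-redirecting logic, written in Python rather than httpd.conf. It assumes the layout discussed in this thread: concept URIs /demothes/c1, /demothes/c2, ... are answered with a 303 See Other that points at a description document. Browsers get /demothes/about_c1 (the name suggested in the thread), and RDF/XML clients get a hypothetical about_c1.rdf. This follows the "303 URIs forwarding to different documents" recipe from the W3C note "Cool URIs for the Semantic Web". The Accept parsing is deliberately simplified.

```python
import re

# Hypothetical URI layout: concept IDs are "c" followed by digits.
CONCEPT_PATH = re.compile(r"^/demothes/(c\d+)$")


def preferred_type(accept):
    """Choose text/html or application/rdf+xml from an Accept header by q-value.

    Simplified: wildcards such as */* are ignored, and ties (including a
    missing header) go to HTML, the browser-friendly default.
    """
    scores = {"text/html": 0.0, "application/rdf+xml": 0.0}
    for part in (accept or "").split(","):
        fields = [f.strip() for f in part.split(";")]
        media, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media in scores:
            scores[media] = max(scores[media], q)
    if scores["application/rdf+xml"] > scores["text/html"]:
        return "application/rdf+xml"
    return "text/html"


def respond(path, accept):
    """Return (status, Location) for a concept URI, or None for any other path.

    The concept is not an information resource, so it is never served
    directly: the client is sent (303) to a document that describes it.
    """
    m = CONCEPT_PATH.match(path)
    if m is None:
        return None  # not a concept URI: serve the file (e.g. about_c1) normally
    location = "/demothes/about_" + m.group(1)
    if preferred_type(accept) == "application/rdf+xml":
        location += ".rdf"  # hypothetical name for the RDF/XML description
    return 303, location


print(respond("/demothes/c1", "text/html,application/xhtml+xml,*/*;q=0.8"))  # (303, '/demothes/about_c1')
print(respond("/demothes/c1", "application/rdf+xml"))  # (303, '/demothes/about_c1.rdf')
```

In an http.server handler or a WSGI app, the returned pair becomes the status line and the Location header. The same rules could equally be written as Apache rewrite rules, along the lines of the httpd.conf examples mentioned later in the thread.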
> But ok, so c1.html would be like "blah blah blah, and then a paragraph describing concept c1 from your thesaurus, ... which is (we say) some pretty specific topic, like er, say "allergy to pine nuts'... and maybe c2 is just 'pine nuts'
>
> Bob: Well it's an engineering terminology thesaurus, but sure. I get the idea. So we'd do <a href='/demothes/c2' rel='broader' ...
>
> Alice: in rdfa 1.1 lite that's property='broader', erm property='skos:broader', ... but sure, something like that. you might put the relationship first, it reads better. I think it means the same formally.
>
> Bob: right right, ... and in c2 HTML page, we'd do the link back the other way? is there a skos word for the opposite of broader, skos:narrower? [searches] ... ok looks like it, ... so I'd use that? it's sort of redundant I guess if you crawl all the pages, ... but you have to find the pages and links somehow, ... what if I started with some linked data agent thing on c2.html, how would it find c1.html to find that c1.html says that c2.html is broader?
>
> Alice: Good point. We can work some of this out later. There are also sitemap files, so in page links aren't the only way to find stuff. It's all sort of emerging best practices territory. Lots of early adopters figuring things out, if we get this working, maybe you could write up a case study?
>
> Bob: Or you could just tell me what to do. Hey, whatever happened to rev= ... is that still in XHTML?
>
> Alice: Which version? I mean, ... can we talk about this later?
>
> Bob: Right right. But couldn't I put "rev='skos:broader'" in c2.html, ...
>
> Alice: [patiently] ... you could, yes. Or both... there's a lot of flexibility in this system. In many ways it's a huge strength...
>
> Bob: Oh hang on, I found http://www.w3schools.com/TAGS/att_link_rev.asp and it says rev isn't supported by browsers; is that a problem.
>
> Alice: We're getting off the point a bit, ...
> Anyway I think Hixie took it out of HTML5 because it wasn't being used and people found it confusing. Or last time I looked anyway, I think it was gone.
>
> Bob: Righto. I can see that. So anyway, we'll make a load of HTML pages that describe our concepts...
>
> Alice: Yup, and we'll redirect /demothes/c2 to a page about c2, ... so things don't mix up information resources with non information resources. Oh and I'm not sure w3schools is always the best reference on this stuff...
>
> Bob: things on the Web and things that aren't on the Web. Ok, if not w3schools, where should I check?
>
> Alice: [ignoring w3schools question] ... exactly. things that aren't _on_ the Web. Or _in_ if you prefer. Like your concepts are a kind of abstraction so they're not really on the Web, ... they're just _described_ in the Web.
>
> Bob: so we redirect to c1.html etc?
>
> Alice: Sure we could do that, or if you want to keep the suffix out of the URL, which is considered good hygiene by some, you might for example use /demothes/about_c1 ... that's quite clean
>
> Bob: And if we get a content negotiated request for rdf/xml ...? ... send that instead, ... no redirecting
>
> Alice: something like that, I'll check the docs for you later. It's a bit fiddly but there are some examples around we can copy from, httpd.conf etc
>
> Bob: Great. And if someone asks for the rdf/xml version of about_c1?
>
> Alice: Not sure, I'll have to think a bit, but ... well sending the rdf along sounds ok. It's not quite the same as asking for c1 but ... well sure. Why not?
>
> Bob: What was the other option? #c1 ? No messing around with redirects there? Easier to bookmark?
>
> Alice: Well yeah, ... and to link to, ... but your data isn't tiny, ... a few thousand concepts you said. Could be a big page fetch each time.
>
> Bob: Is that a problem? How big is too big? We can cache internally so it's not hitting the db, right?
> Will intelligent agents and so on be reading this a lot? Do they choke on big files?
>
> Alice: Well, maybe not so intelligent. But the way URIs and URLs work, when there's a # in them, ... that doesn't get sent to the server and so the server doesn't see the #c1 or #c2 or #c9999 bit, ... so it can only really send you the whole lot and the consuming code has to make sense of it by remembering what it asked for...
>
> Bob: ...well maybe this is still easier. And we can content negotiate still, right?
>
> Alice: sure. HTML+RDFa or RDF/XML or ... you heard of turtle and ntriples and there's this thing called json-ld ... but don't worry about that for now. Let's just think about RDF/XML and HTML+RDFa today, eh?
>
> Alice: [thinking...] well maybe just one of those would do, ... but it's not hard to generate both.
>
> Bob: Alright, so one big HTML+RDFa file with the thesaurus in it, in SKOS triples but prettied up a bit with CSS? Sounds ok...
>
> Alice: and a big RDF/XML doc too, if they ask for that instead
>
> Bob: got it. So ... hang on, back up a bit, ... if we're in one big HTML page, and I'm at the er what did you say, 'allergy to pine nuts' section, ... and I want to link to show that this concept has a ... a broader one which is just 'pine nuts', ... I put in '<a href='c1' property="skos:broader"> within the c2 bit?
>
> Alice: c1 was the broader one, I forget?
>
> Bob: er c2 was broader, general ... Pine Nuts only. So yup, within pine nuts section of this big HTML page at /demothes, we'd link up (or down, guess it doesn't matter the page order?) to the #c1 section. Remind me, I always mix up, is that <a name="c1"> or <a id="c2">?
>
> Alice: it's a little bit complicated [searches] but http://stackoverflow.com/questions/484719/html-anchors-with-name-or-id seems to cover it... ...er but look it's a bit fiddly this way, never mind the HTML attribute name for now we can look that up ...
> you don't want to call it c1 exactly, because that's the name of your concept
>
> Bob: And concepts aren't information resources?
>
> Alice: well obviously they sort of are _informational_ so that's why some people don't like that terminology, ... but that doesn't matter, the thing is they're not ... you know HTTP endpointy things, ... like data objects attached to a Web server, ... they're more abstract
>
> Bob: and so also they're not bits of an HTML page either? Right? So if I go linking with <a href="/demothes#c1" blabla, that's implying that c1 is a bit of a Web page... so that's an information resource, ... and really it's not because it's a thesaurus concept which is more a sort of social entity or conceptual or mental or something, ... not inside my server or page like a concrete information object?
>
> Alice: Y...es. Well you're mixing two things here a bit. Or three. Hang on. Two. Right:
> First. We slipped from talking about the target of HTML hyperlinks (the id/name attribute stuff) to the markup at the start end of the link. <a href="/demothes#c1" is fine, so long as you're not really pointing at a page that has a section with name (or id) of 'c1'. It's the name end, the target stuff, that you can't put the thingy's URI into. It's ok to point because ... you're sort of saying something. But if you write the target markup, you're saying that c2 is part of the page. Which it isn't.
>
> Bob: o...k. Seems oddly asymmetrical somehow. But the ' href="/demothes#c1" ' HTML ... it's pointing at a page, right? And if we go to it in a browser (unless it has a bunch of funny extensions, ... I got in a mess one time with Firefox addons I was trying, ...) ... we go to it in a browser we'll get an HTML page. And there'll be a bit of the page describing our concepts c1 and c2 ... and in theory links can jump you down the classic way, to where you want to read? That's nice to have in documentation.
>
> Alice: Yes yes, ...
> just we don't name the page target parts, erm anchors, with that same name. As the skos thing, concept, I mean.
>
> Bob: Because it's not a webby thing it's a real worldy thing, even if it's still sort of about information? Like a book in my hand also is?
>
> Alice: Exactly
>
> Bob: [beams]
>
> Alice: So, ... right, ... we don't name the in-page targets the same as the things those bits of page describe?
>
> Bob: Ok, so like with the other design, we could call the page bits #c2_bit_of_the_page or something less verbose, just not #c2 because we already used that ID for something 'off Web', the concept itself?
>
> Alice: Yes.
>
> Bob: Doesn't that screw up scrolling?
>
> Alice: Well you could use some jquery thing I found and that's quite nice actually because it scrolls smoothly and degrades gracefully and ... wait wait I'm talking nonsense, ... sorry. It's fine. You just put in two anchors and two links?
>
> Bob: They're called anchors at both ends of the link, right? Sort of nautical idea... ... in RDF links too?
>
> Alice: Er yeah yes. It's <a>. We don't really talk about anchors so much in the abstract rdf model. But it's a similar idea, hence the Linked Data thing?
>
> Bob: So the subject is an anchor, the predicate is like a kind of link (that's a 'rel' or whatever?) and the object in the triple is an anchor too?
>
> Alice: Well not exactly. You're ... well, sure. Yes. If that helps you think about it, ... but what I meant to say was forget about jquery, it'll still scroll and stuff in the browser, so long as you have anchors like <a name="c2_bit_of_page" and <a name="c2">. Then for each semantic link, you can link to the semantic target - like c2 - with that, ... and in the human facing link, ... link to c2_bit_of_page. Hmm no wait, you'd want to hide the human one because when you click it it won't go to the part of the page, and you shouldn't put in a name="c2". Sorry, I'm tired.
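One possible way to get both the semantic link and the scrolling in the single-page (#) design, sketched in Python that emits HTML+RDFa 1.1 Lite. Each concept keeps its fragment (#c1) purely as a `resource` value, while the HTML `id` uses a different, hypothetical name (c1_section), so no page anchor carries the concept's name. On a link element, RDFa 1.1 takes the object from `resource` in preference to `href`. That lets `href` point at the page section for browsers while `resource` names the concept. The skos:narrower backlinks are generated by inverting skos:broader, so both directions are spelled out in the markup. Data and labels are illustrative.

```python
from html import escape

# Illustrative data from the thread: c1 ("allergy to pine nuts") has broader c2.
# Assumes every broader target is also a key in CONCEPTS.
CONCEPTS = {
    "c1": {"label": "allergy to pine nuts", "broader": ["c2"]},
    "c2": {"label": "pine nuts", "broader": []},
}


def narrower_index(concepts):
    """Invert skos:broader so each page section can also state skos:narrower."""
    narrower = {cid: [] for cid in concepts}
    for cid, c in concepts.items():
        for parent in c["broader"]:
            narrower.setdefault(parent, []).append(cid)
    return narrower


def section(cid, concepts, narrower):
    """One HTML+RDFa section: the concept is #cid, the page anchor is cid_section."""
    c = concepts[cid]
    lines = [
        f'<section id="{cid}_section" resource="#{cid}" typeof="skos:Concept">',
        f'  <h2 property="skos:prefLabel">{escape(c["label"])}</h2>',
    ]
    for prop, targets in (("skos:broader", c["broader"]), ("skos:narrower", narrower[cid])):
        for t in targets:
            # resource= gives the RDF object (the concept); href= is where a browser scrolls.
            lines.append(
                f'  <a property="{prop}" resource="#{t}" href="#{t}_section">'
                f'{escape(concepts[t]["label"])}</a>'
            )
    lines.append("</section>")
    return "\n".join(lines)


def page(concepts):
    """The whole thesaurus as one page body, with the skos prefix declared."""
    narrower = narrower_index(concepts)
    body = "\n".join(section(cid, concepts, narrower) for cid in concepts)
    return f'<body prefix="skos: http://www.w3.org/2004/02/skos/core#">\n{body}\n</body>'


print(page(CONCEPTS))
```

Because every link carries both attributes, no JavaScript rewriting is needed for the scrolling, and the inverse links make the file somewhat bigger, as Bob expects.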
>
> Alice: Ok sure, look is scrolling to the bit of the page important for you? Maybe the jquery thing could work? Or you can do something in CSS. It can't be that hard. There are lots of ways to do things with RDFa. If you get stuck we can ask in IRC or Twitter, people are really helpful (though not everyone agrees about this hash stuff and 303s)
>
> Bob: So which is simplest, really? Really big files: bad with #, ... but # lets you bookmark, ... but makes linking down to the right bit of the page somehow confusing...?
>
> ...could I rewrite the links in Javascript maybe? Ok ok, ... how's this! How about we make the HTML+RDFa page all nice and semantic, but put in a javascript that when the page loads --- only in browsers --- it rewrites all the links to be #bit_of_page links, ... then when clicked ... boom you hop to the right bit of the page. And still there's a big RDF/XML file content negotiable for older tools that don't read the HTML+RDFa .... everyone's happy!
>
> Alice: You still need to update all the URIs in your text RDF/XML file to be this new pattern we agreed, ... and that javascript thing is half sick and half clever, ... maybe it'd be fine. There might be browser addons that get confused if javascript is messing with stuff. But we got distracted. I'm not sure of anything using this stuff much, you might check in tabulator at least to see what it does.
>
> But we were talking about <a href="/demothes#c1" blabla,
>
> ... and I said you'd slipped from talking about the 'target' end of the anchor link, to the 'source' end. And that the source in RDFa could mention things that weren't (and shouldn't) be actual HTML page targets. But also you were mixing up a bit, ... the idea of an information resource like a thing that's up there via some Web server and giving you content negotiated formats, ... with the idea of bits of a page being an information resource.
>
> ...
> but at least we agreed that your skos concepts aren't either of those; they're abstractions. So the Web pages sort of describe them, ... and the bits of a big rdfa html page if we go with the # option
>
> Bob: ... we didn't really get on to the RDF/XML version
>
> Alice: that's a bit simpler, in a way... because only machines look at it and it doesn't care about prettiness or usability or browser behaviour.
>
> Bob: couldn't we put in some xslt,.. and make it work for both? like a stylesheet to make it into html+rdfa, ... browsers do that now don't they?
>
> Alice: In theory maybe; in practice that's not really something people seem to do. But look, we're going with the 'put it all in one big page' # version, so we just make the rdf/xml use those as the URIs and ... well we're done.
>
> Bob: Easy! So I just change some URLs and upload the RDF/XML, ... then write some script and make an HTML page with the right kinds of links.
> ....
>
> Alice: And we'll figure out some way to make it jump to bits of the page? Maybe...
>
> Bob: Can we do backlinks inside the page?
>
> Alice: like broader concept links back to the narrower?
>
> Bob: I can find some RDFa parser and put that in dynamically? or better to spell it out in the actual markup so the triples are there?
>
> Alice: yes, maybe better
>
> Bob: but ... that'll make the doc even bigger, ... if every broader term triple has an inverse link too.... guess it doesn't matter, webservers zip stuff on the fly don't they?
>
> Alice: they can, yeah. but look you can always do the http 303 thing if you're worried about size of the file,... chunk it up. Do the slash thing. Are you expecting your thesaurus to grow at all?
>
> Bob: it can be nice having one page per thingy, after all
>
> Alice: they're both easier in different ways
>
> Bob: thanks, you've been a big help. Would it be alright to just upload the skos dump file for now, ...
> maybe I'll zip it
>
> Alice: we ought to fix those URIs sometime
>
> Bob: maybe tomorrow?
>
> Alice: maybe tomorrow...
>

Ah, the "torture of choice", as the Germans say :)

What about one page per word:

http://example.org/words/happy

Each page contains exactly one data point at #this, #word (maybe #?) etc.

Then mark it up in RDFa with the relations to the other words?
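To make the one-page-per-word idea concrete, here is a Python sketch. For a word such as "happy" it derives the page URL (http://example.org/words/happy), the URI of the word itself (the same URL plus #this), and a minimal HTML+RDFa body. The relation property (skos:related) and the neighbouring words are placeholders chosen for illustration. The word's URI differs from the page URL only by its fragment, so a client fetches the small page directly and no 303 is needed. In line with the anchor advice in the thread, nothing in the HTML uses id="this".

```python
from html import escape
from urllib.parse import quote, urldefrag

WORDS = "http://example.org/words/"  # base from the example above


def page_url(word):
    """Document URL for a word, e.g. http://example.org/words/happy."""
    return WORDS + quote(word)


def thing_uri(word):
    """URI for the word itself: the page URL plus the #this fragment."""
    return page_url(word) + "#this"


def word_page(word, related):
    """Minimal HTML+RDFa body; skos:related is a placeholder relation."""
    links = "\n".join(
        f'  <a property="skos:related" href="{escape(thing_uri(r))}">{escape(r)}</a>'
        for r in related
    )
    return (
        '<body prefix="skos: http://www.w3.org/2004/02/skos/core#">\n'
        '<div resource="#this" typeof="skos:Concept">\n'
        f'  <h1 property="skos:prefLabel">{escape(word)}</h1>\n'
        f"{links}\n"
        "</div>\n</body>"
    )


# Dereferencing the word's URI fetches exactly its own page (the fragment is dropped).
print(urldefrag(thing_uri("happy")).url)
print(word_page("happy", ["glad", "cheerful"]))
```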
Received on Wednesday, 28 March 2012 08:21:19 UTC