RE: See Other

"I'm doing this because of promise of interestingly unexpected re-use"

Dan - you are not convincing me of the cost-benefit trade off here...!

-----Original Message-----
From: Dan Brickley [mailto:danbri@danbri.org]
Sent: 28 March 2012 02:31
To: Melvin Carvalho
Cc: Jeni Tennison; tom.heath@talis.com; public-lod community
Subject: See Other

On 27 March 2012 20:23, Melvin Carvalho <melvincarvalho@gmail.com> wrote:

> I'm curious as to why this is difficult to explain.  Especially since
> I also have difficulties explaining the benefits of linked data.
> However, normally the road block I hit is explaining why URIs are important.



Alice: So, you want to share your in-house thesaurus in the Web as 'Linked Data' in SKOS?

Bob: Yup, I saw [inspirational materials] online and a few blog posts, it looks easy enough. We've exported it as RDF/XML SKOS already. Here, take a look...

[data stick changes hands]

Alice: Cool! And .. yup it's well-formed XML, and here see I parsed it with a real RDF parser (made by Dave Beckett who worked on the last W3C spec for this stuff, beats me actually checking it myself) and it didn't complain. So looks fine! Ok so we'll need to chunk this up somehow so there's one little record per term from your thesaurus, and links between them... ...and it's generally good to make human facing pages as well as machine-oriented RDF ones too.

Bob: Ok, so that'll be microformats no wait microdata ah yeah, RDFa, right? Which version?

Alice: well RDFa yes, microdata is a kind of cousin, a mix of thinking from microdata and microformats communities. But I meant that you'd make a version of each page for computers to use (RDF/XML like your test export here), ... and you'd make some kind of HTML page for more human readers also. The stuff you mention is more about doing both within the same format...

Bob: Great. Which one's the most standard?  What should I use?

Alice: Well I guess it depends what you mean by standard.
[skips digression about whatwg and w3c etc notions of standards process] [skips digression about XHTML vs XML-ish polyglot HTML vs resolutely non-XML HTML5 flavours] [skips digression about qnames in HTML and RDFa 1.1 versus 1.0]

...you might care to look at using basic HTML5 document with say the Lite version of RDFa 1.1 (which is pretty much finished but not an official stable standard yet at W3C)

Bob: [makes a note]. Ok, but that's just the human-facing page, anyway. We'd put up RDF/XML for machines too, right? Well maybe that's not necessary I guess. I was reading something about GRDDL and XSLT that automates the conversion, ... should we maybe generate the RDF/XML from the HTML+RDFa or vice versa? or just have some php hack generate both from MySQL since that's where the stuff ultimately lives right now anyway...?

Alice: Um, well it's pretty much your choice. Do you need RDF/XML too?
Well..... maybe, not sure... it depends. There are more RDF/XML parsers around, they're more mature, ... but increasingly tools will consume all kinds of data as RDF. So it might not matter. Depends why you're doing this, really.

Bob: Er ok, maybe we ought to do both for now, ... belt-and-braces, ... maybe watch the stats and see what's being picked up? I'm doing this because of promise of interestingly unexpected re-use and so on, which makes details hard to predict by definition.

Alice: Sounds like a plan. Ok, so each node in your RDF graph, ...
we'll need to give it a URI. You know that's like the new word for URL, but that includes identifiers for real world things too.

Bob: Sure sure, I read that. Makes sense. And I can have a URI, my homepage can have a URI, I'm not my home page blah-de-blah?

Alice: You got it.

Bob: Ok, so what URLs should I give the concepts in this thesaurus?
They've got all kinds of strings attached, but we've also got nicely managed numeric IDs too.

Alice: Right so maybe something short (URLs can never be too short...), ... so maybe if you host at your example.com server,
http://example.com/demothes/c1  then same but /c2 /c3 etc.

... or well you could use #c1 or #c2 etc. That's pretty much up to you. There are pros-and-cons in both directions.
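
A minimal sketch of the two naming options Alice mentions, assuming the base http://example.com/demothes from the dialogue and Bob's numeric IDs. The function names are hypothetical, made up for illustration.

```python
# Base URI taken from the dialogue; everything else here is illustrative.
BASE = "http://example.com/demothes"


def slash_uri(concept_id: int) -> str:
    """'Slash' style: each concept gets its own path under the base."""
    return f"{BASE}/c{concept_id}"


def hash_uri(concept_id: int) -> str:
    """'Hash' style: every concept is a fragment of one shared document."""
    return f"{BASE}#c{concept_id}"


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(slash_uri(n), hash_uri(n))
```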

Bob: whatever's easiest. It's a pretty plain apache2 setup, with php if we want it, or we can batch create files if that makes more sense; this data doesn't change much.

Alice: Well how big is the thesaurus...?

Bob: a couple thousand terms, each with a few relations and bits of text; maybe more if we dig out the translations (humm should we language negotiate those somehow?)

Alice: Let's talk about that another day, maybe?

Bob:  And hmm the translations are versioned a bit differently? Should we put version numbers in somewhere so it's unambiguous which version of the translation we're using?

Alice: Let's talk about that another day, too.

Bob: OK, where were we? http://example.com/demothes/c1 ... sure, that sounds fine.

... we'd put some content negotiated apache thing there, and make c1 send HTML if there's a browser, or rdf/xml if they want that stuff instead? Default to the browser / HTML version maybe?
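
What Bob describes here could be sketched roughly as below: read the Accept header, compare the q-values for text/html and application/rdf+xml, and fall back to HTML. This is a deliberately simplified illustration, not how Apache does it; the function name is hypothetical, and wildcards such as */* are simply ignored so that they fall through to the HTML default.

```python
def pick_format(accept: str) -> str:
    """Return 'rdf' or 'html' for an Accept header value; HTML is the default."""
    best = {"text/html": 0.0, "application/rdf+xml": 0.0}
    for part in accept.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        q = 1.0  # a media range with no q parameter counts as q=1
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not wanted"
        if media in best:
            best[media] = max(best[media], q)
    # Only switch away from HTML when RDF/XML is strictly preferred.
    return "rdf" if best["application/rdf+xml"] > best["text/html"] else "html"


if __name__ == "__main__":
    print(pick_format("text/html,application/xhtml+xml,*/*;q=0.8"))
    print(pick_format("application/rdf+xml"))
```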

Alice: something like that could work. There are some howtos around.
Oh, but if c1 isn't an information resource, you'll need to redirect with a 303 HTTP code. It's like you said with people and homepages, to make clear which is which.

Bob: Oh-kay... so in our SKOS graph, it's a mix of things, the bulk is a load of descriptions of skos:Concept and there's a bit of metadata in there about some docs, and the admin contact info, ...  but yeah it's mostly the concepts (which seems to be the skos way to talk about thesaurus terms, sort of abstracted a bit to make translations easier, right?)

Alice: Yup. Well, ... remember we're breaking up your graph into bits... like one chunk per page?

Bob: Ah right, so is that one node in the graph per page? per ... erm, how do they call it? [counts on fingers] subject-predicate-object...
er subject, right? Each object in my graph, er like OO object I mean, entity, thingy...

Alice: -thingy is good-

Bob: Each thing in the graph, goes in one page, more or less?

Alice: more or less. It's up to you, I guess there are best practices, roughly the bulk of it, one page per concept, ... and then the metadata etc you might do differently
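
One way to picture "one chunk per page" is to group the graph's triples by subject, so each subject's statements end up in one record. A rough sketch, assuming the triples are already available as plain (subject, predicate, object) tuples; the function name and the sample data are made up for illustration.

```python
from collections import defaultdict


def chunk_by_subject(triples):
    """Group (subject, predicate, object) triples into one record per subject."""
    pages = defaultdict(list)
    for subject, predicate, obj in triples:
        pages[subject].append((predicate, obj))
    return dict(pages)


if __name__ == "__main__":
    # Illustrative data only, using the c1/c2 example from the dialogue.
    sample = [
        ("/demothes/c1", "skos:prefLabel", "allergy to pine nuts"),
        ("/demothes/c1", "skos:broader", "/demothes/c2"),
        ("/demothes/c2", "skos:prefLabel", "pine nuts"),
    ]
    for subject, statements in chunk_by_subject(sample).items():
        print(subject, statements)
```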

Bob: Ok, so c1 is one concept, c2 is another, ... they'd have links to each other in the ... the RDF/XML files, right? And I guess the HTML too, sure

Alice: Sure

Bob: so the html rdfa stuff would be <a href='c2'>something and rel='broader' if c1 was broader than c2?

Alice: er it might be broaderTerm, or broaderConcept, I forget... [searches]

Bob: ah look, yeah skos:broader, ... ok so if c2 is more broad, er broader, more general, than c1, ... we put in the c1 HTML page a link over to c2, and add some RDFa too, to say what the link means in semantic rdf speak as well as clickable-link?

Alice: [tips head on side], ... sorry I always get this stuff back to front. Ok, slowly. c2 is broader than c1, ... 'broader' points to the one that's broader, like you know more general, ... so let's say c1 is the specific, detailed one. In the c1 HTML page, we'd ...

Bob: [interrupting] would that be c1.html? like concept ID dot h t m l, as a pattern?

Alice: yes, you could call it that, ... it's up to you really but obviously it's sort of conventional. But then there's another convention of keeping the file types out of URLs

Bob: So in the filesystem they might be a bunch of batch-generated HTML files called c1.html c2.html etc, but I'd keep that secret or obscure or hide it with apache config somehow?

Alice: For example, yes. But ok, so c1.html would be like "blah blah blah", and then a paragraph describing concept c1 from your thesaurus, ... which is (we say) some pretty specific topic, like er, say 'allergy to pine nuts'... and maybe c2 is just 'pine nuts'

Bob: Well it's an engineering terminology thesaurus, but sure. I get the idea. So we'd do <a href='/demothes/c2' rel='broader' ...

Alice: in rdfa 1.1 lite that's property='broader', erm property='skos:broader', ... but sure, something like that. you might put the relationship first, it reads better. I think it means the same formally.
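
The markup Alice and Bob are circling around might look something like what this sketch generates: a block about c1 whose link to c2 carries property='skos:broader'. It assumes RDFa 1.1 Lite attributes (prefix, resource, typeof, property) and the slash-style URLs; the function name and the labels are hypothetical.

```python
from html import escape

SKOS = "http://www.w3.org/2004/02/skos/core#"


def concept_section(uri: str, label: str, broader_uri: str, broader_label: str) -> str:
    """Build an HTML+RDFa fragment describing one concept and its broader concept."""
    return (
        f'<div prefix="skos: {SKOS}" resource="{escape(uri)}" typeof="skos:Concept">\n'
        f'  <span property="skos:prefLabel">{escape(label)}</span>\n'
        # The href is both the clickable link and the object of skos:broader.
        f'  <a property="skos:broader" href="{escape(broader_uri)}">{escape(broader_label)}</a>\n'
        f"</div>"
    )


if __name__ == "__main__":
    print(concept_section("/demothes/c1", "allergy to pine nuts",
                          "/demothes/c2", "pine nuts"))
```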

Bob: right right, ... and in c2 HTML page, we'd do the link back the other way? is there a skos word for the opposite of broader, skos:narrower? [searches] ... ok looks like it, ... so I'd use that?
it's sort of redundant I guess if you crawl all the pages, ... but you have to find the pages and links somehow, ... what if I started with some linked data agent thing on c2.html, how would it find c1.html to find that c1.html says that c2.html is broader?
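
Since skos:narrower is the inverse of skos:broader, the back-links Bob asks about could be derived mechanically rather than written by hand. A small sketch, assuming triples as plain tuples with prefixed names; the function name is hypothetical.

```python
def add_narrower(triples):
    """Return the triples plus a skos:narrower statement for every skos:broader one."""
    result = set(triples)
    for subject, predicate, obj in triples:
        if predicate == "skos:broader":
            # c1 broader c2  implies  c2 narrower c1
            result.add((obj, "skos:narrower", subject))
    return result


if __name__ == "__main__":
    graph = {("/demothes/c1", "skos:broader", "/demothes/c2")}
    for triple in sorted(add_narrower(graph)):
        print(triple)
```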

Alice: Good point. We can work some of this out later. There are also sitemap files, so in page links aren't the only way to find stuff.
It's all sort of emerging best practices territory. Lots of early adopters figuring things out, if we get this working, maybe you could write up a case study?

Bob: Or you could just tell me what to do. Hey, whatever happened to rev= ... is that still in XHTML?

Alice: Which version? I mean, ... can we talk about this later?

Bob: Right right. But couldn't I put "rev='skos:broader'" in c2.html, ...

Alice: [patiently] ... you could, yes. Or both... there's a lot of flexibility in this system. In many ways it's a huge strength...

Bob: Oh hang on, I found
http://www.w3schools.com/TAGS/att_link_rev.asp and it says rev isn't supported by browsers; is that a problem?

Alice: We're getting off the point a bit, ... Anyway I think Hixie took it out of HTML5 because it wasn't being used and people found it confusing. Or last time I looked anyway, I think it was gone.

Bob: Righto. I can see that. So anyway, we'll make a load of HTML pages that describe our concepts...

Alice: Yup, and we'll redirect /demothes/c2 to a page about c2, ... so things don't mix up information resources with non information resources. Oh and I'm not sure w3schools is always the best reference on this stuff...

Bob: things on the Web and things that aren't on the Web. Ok, if not w3schools, where should I check?

Alice: [ignoring w3schools question] ... exactly. things that aren't _on_ the Web. Or _in_ if you prefer. Like your concepts are a kind of abstraction so they're not really on the Web, ... they're just _described_ in the Web.

Bob: so we redirect to c1.html etc?

Alice: Sure we could do that, or if you want to keep the suffix out of the URL, which is considered good hygiene by some, you might for example use /demothes/about_c1 ... that's quite clean

Bob: And if we get a content negotiated request for rdf/xml ...? ...
send that instead, ... no redirecting

Alice: something like that, I'll check the docs for you later. It's a bit fiddly but there are some examples around we can copy from, httpd.conf etc
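
The redirect logic itself, independent of any httpd.conf details, could be sketched as below. This shows one common variant, in which the concept URI always answers 303 See Other and the Accept header only decides which document the Location points at. That differs from Bob's "no redirecting" idea, and it is an assumption here rather than something the dialogue settles. The about_c1 pattern comes from the dialogue; the .rdf document name and the function name are hypothetical.

```python
import re
from typing import Optional

# Concept URIs as discussed: /demothes/c1, /demothes/c2, ...
CONCEPT_PATH = re.compile(r"^/demothes/(c\d+)$")


def redirect_for(path: str, accept: str = "") -> Optional[str]:
    """Return the Location for a 303 See Other response, or None.

    None means the path is not a concept URI and should be served normally.
    """
    match = CONCEPT_PATH.match(path)
    if match is None:
        return None
    concept = match.group(1)
    if "application/rdf+xml" in accept:
        # Hypothetical name for the RDF/XML document about the concept.
        return f"/demothes/about_{concept}.rdf"
    # HTML document about the concept, using the pattern from the dialogue.
    return f"/demothes/about_{concept}"


if __name__ == "__main__":
    print(redirect_for("/demothes/c1", "text/html"))
    print(redirect_for("/demothes/c1", "application/rdf+xml"))
```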

Bob: Great. And if someone asks for the rdf/xml version of about_c1?

Alice: Not sure, I'll have to think a bit, but ... well sending the rdf along sounds ok. It's not quite the same as asking for c1 but ...
well sure. Why not?

Bob: What was the other option? #c1 ? No messing around with redirects there? Easier to bookmark?

Alice: Well yeah, ... and to link to, ... but your data isn't tiny, ... a few thousand concepts you said. Could be a big page fetch each time.

Bob: Is that a problem? How big is too big? We can cache internally so it's not hitting the db, right? Will intelligent agents and so on be reading this a lot? Do they choke on big files?

Alice: Well, maybe not so intelligent. But the way URIs and URLs work, when there's a # in them, ... that doesn't get sent to the server  and so the server doesn't see the #c1 or #c2 or #c9999 bit, ... so it can only really send you the whole lot and the consuming code has to make sense of it by remembering what it asked for...
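
Alice's point about the # can be illustrated with the standard library's urldefrag: the client splits the URI, requests only the part before the #, and keeps the fragment to itself. A short sketch; the function name is made up for illustration.

```python
from urllib.parse import urldefrag


def split_for_request(uri: str) -> tuple:
    """Split a URI into (what is requested from the server, what the client keeps)."""
    result = urldefrag(uri)
    return result.url, result.fragment


if __name__ == "__main__":
    for uri in ("http://example.com/demothes#c1",
                "http://example.com/demothes#c9999"):
        requested, kept = split_for_request(uri)
        print(f"request {requested}  (client remembers '{kept}')")
```

Both URIs lead to the same request, which is why the server can only send the whole document.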

Bob: ...well maybe this is still easier. And we can content negotiate still, right?

Alice: sure. HTML+RDFa or RDF/XML or ... you heard of turtle and ntriples and there's this thing called json-ld ... but don't worry about that for now. Let's just think about RDF/XML and HTML+RDFa today, eh?

Alice: [thinking...] well maybe just one of those would do, ... but it's not hard to generate both.

Bob: Alright, so one big HTML+RDFa file with the thesaurus in it, in SKOS triples but prettied up a bit with CSS? Sounds ok...

Alice: and a big RDF/XML doc too, if they ask for that instead

Bob: got it. So ... hang on, back up a bit, ... if we're in one big HTML page, and I'm at the er what did you say, 'allergy to pine nuts'
section, ... and I want to link to show that this concept has a ... a broader one which is just 'pine nuts', ... I put in '<a href='c1'
property="skos:broader"> within the c2 bit?

Alice: c1 was the broader one, I forget?

Bob: er c2 was broader, general ... Pine Nuts only. So yup, within pine nuts section of this big HTML page at /demothes, we'd link up (or down, guess it doesn't matter the page order?) to the #c1 section.
Remind me, I always mix up, is that <a name="c1"> or <a id="c2">?

Alice: it's a little bit complicated [searches] but http://stackoverflow.com/questions/484719/html-anchors-with-name-or-id

seems to cover it... ...er but look it's a bit fiddly this way, never mind the HTML attribute name for now we can look that up ... you don't want to call it c1 exactly, because that's the name of your concept

Bob: And concepts aren't information resources?

Alice: well obviously they sort of are _informational_ so that's why some people don't like that terminology, ... but that doesn't matter, the thing is they're not ... you know HTTP endpointy things, ... like data objects attached to a Web server, ... they're more abstract

Bob: and so also they're not bits of an HTML page either? Right? So if I go linking with <a href="/demothes#c1" blabla, that's implying that
c1 is a bit of a Web page... so that's an information resource, ...
and really it's not because it's a thesaurus concept which is more a sort of social entity or conceptual or mental or something, ... not inside my server or page like a concrete information object?

Alice: Y...es. Well you're mixing two things here a bit. Or three.
Hang on. Two. Right:
  First. We slipped from talking about the target of HTML hyperlinks (the id/name attribute stuff) to the markup at the start end of the link. <a href="/demothes#c1" is fine, so long as you're not really pointing at a page that has a section with name (or id) of 'c1'. It's the name end, the target stuff, that you can't put the thingy's URI into. It's ok to point because ... you're sort of saying something.
But if you write the target markup, you're saying that c2 is part of the page. Which it isn't.

Bob: o...k. Seems oddly asymmetrical somehow. But the '
href="/demothes#c1" ' HTML ... it's pointing at a page, right? And if we go to it in a browser (unless it has a bunch of funny extensions, ... I got in a mess one time with Firefox addons I was trying, ...) ... we go to it in a browser we'll get an HTML page. And there'll be a bit of the page describing our concepts c1 and c2 ... and in theory links can jump you down the classic way, to where you want to read?
That's nice to have in documentation.

Alice: Yes yes, ... just we don't name the page target parts, erm anchors, with that same name. As the skos thing, concept, I mean.

Bob: Because it's not a webby thing it's a real worldy thing, even if it's still sort of about information? Like a book in my hand also is?

Alice: Exactly

Bob: [beams]

Alice: So, ... right, ... we don't name the in-page targets the same as the things those bits of page describe?

Bob: Ok, so like with the other design, we could call the page bits #c2_bit_of_the_page or something less verbose, just not #c2 because we already used that ID for something 'off Web', the concept itself?

Alice: Yes.

Bob: Doesn't that screw up scrolling?

Alice: Well you could use some jquery thing  I found and that's quite nice actually because it scrolls smoothly and degrades gracefully and ... wait wait I'm talking nonsense, ... sorry. It's fine. You just put in two anchors and two links?

Bob: They're called anchors at both ends of the link, right? Sort of nautical idea... ... in RDF links too?

Alice: Er yeah yes. It's <a>. We don't really talk about anchors so much in the abstract rdf model. But it's a similar idea, hence the Linked Data thing?

Bob: So the subject is an anchor, the predicate is like a kind of link (that's a 'rel' or whatever?) and the object in the triple is an anchor too?

Alice: Well not exactly. You're ... well, sure. Yes. If that helps you think about it, ... but what I meant to say was forget about jquery, it'll still scroll and stuff in the browser, so long as you have anchors like <a name="c2_bit_of_page"> and <a name="c2">. Then for each semantic link, you can link to the semantic target - like c2 - with that, ... and in the human facing link, ... link to c2_bit_of_page.
Hmm no wait, you'd want to hide the human one because when you click it it won't go to the part of the page, and you shouldn't put in a name="c2". Sorry, I'm tired.

Alice: Ok sure, look is scrolling to the bit of the page important for you? Maybe the jquery thing could work? Or you can do something in CSS. It can't be that hard. There are lots of ways to do things with RDFa. If you get stuck we can ask in IRC or Twitter, people are really helpful (though not everyone agrees about this hash stuff and 303s)

Bob: So which is simplest, really? Really big files: bad with #, ...
but # lets you bookmark, ... but makes linking down to the right bit of the page somehow confusing...?

...could I rewrite the links in Javascript maybe? Ok ok, ... how's this! How about we make the HTML+RDFa page all nice and semantic, but put in a javascript that when the page loads --- only in browsers --- it rewrites all the links to be #bit_of_page links, ... then when clicked ... boom you hop to the right bit of the page. And still there's a big RDF/XML file content negotiable for older tools that don't read the HTML+RDFa .... everyone's happy!

Alice: You still need to update all the URIs in your text RDF/XML file to be this new pattern we agreed, ... and that javascript thing is half sick and half clever, ... maybe it'd be fine. There might be browser addons that get confused if javascript is messing with stuff. But we got distracted. I'm not sure of anything using this stuff much, you might check in tabulator at least to see what it does.

But we were talking about <a href="/demothes#c1" blabla,

... and I said you'd slipped from talking about the 'target' end of the anchor link, to the 'source' end. And that the source in RDFa could mention things that weren't (and shouldn't be) actual HTML page targets. But also you were mixing up a bit, ... the idea of an information resource like a thing that's up there via some Web server and giving you content negotiated formats, ... with the idea of bits of a page being an information resource.

 ... but at least we agreed that your skos concepts aren't either of those; they're abstractions. So the Web pages sort of describe them, ... and the bits of a big rdfa html page if we go with the # option

Bob: ... we didn't really get on to the RDF/XML version

Alice: that's a bit simpler, in a way... because only machines look at it and it doesn't care about prettiness or usability or browser behaviour.

Bob: couldn't we put in some xslt,.. and make it work for both? like a stylesheet to make it into html+rdfa, ... browsers do that now don't they?

Alice: In theory maybe; in practice that's not really something people seem to do. But look, we're going with the 'put it all in one big page' # version, so we just make the rdf/xml use those as the URIs and ... well we're done.

Bob: Easy! So I just change some URLs and upload the RDF/XML, ... then write some script and make an HTML page with the right kinds of links.
....

Alice: And we'll figure out some way to make it jump to bits of the page? Maybe...

Bob: Can we do backlinks inside the page?

Alice: like broader concept links back to the narrower?

Bob: I can find some RDFa parser and put that in dynamically? or better to spell it out in the actual markup so the triples are there?

Alice: yes, maybe better

Bob: but ... that'll make the doc even bigger, ... if every broader term triple has an inverse link too.... guess it doesn't matter, webservers zip stuff on the fly don't they?
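
Bob's hunch about zipping can be checked roughly with the standard gzip module. The sketch below builds a made-up page of repetitive broader/narrower links (not Bob's real data) and compares compressed to raw size; the function name and the generated markup are illustrative only.

```python
import gzip


def gzip_ratio(markup: str) -> float:
    """Return compressed size divided by raw size for a UTF-8 encoded page."""
    raw = markup.encode("utf-8")
    if not raw:
        return 1.0  # nothing to compress
    return len(gzip.compress(raw)) / len(raw)


if __name__ == "__main__":
    # Made-up page: two thousand concepts, each with a link and its inverse.
    lines = []
    for i in range(1, 2001):
        lines.append(f'<a property="skos:broader" href="/demothes#c{i + 1}">term {i + 1}</a>')
        lines.append(f'<a property="skos:narrower" href="/demothes#c{i}">term {i}</a>')
    page = "\n".join(lines)
    print(f"raw bytes: {len(page.encode('utf-8'))}, ratio: {gzip_ratio(page):.2f}")
```

Markup this repetitive tends to compress well, so the inverse links cost less on the wire than their raw size suggests.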

Alice: they can, yeah. but look you can always do the http 303 thing if you're worried about size of the file,... chunk it up. Do the slash thing. Are you expecting your thesaurus to grow at all?

Bob: it can be nice having one page per thingy, after all

Alice: they're both easier in different ways

Bob: thanks, you've been a big help. Would it be alright to just upload the skos dump file for now, ... maybe I'll zip it

Alice: we ought to fix those URIs sometime

Bob: maybe tomorrow?

Alice: maybe tomorrow...

Received on Wednesday, 28 March 2012 08:33:12 UTC