See Other

On 27 March 2012 20:23, Melvin Carvalho <melvincarvalho@gmail.com> wrote:

> I'm curious as to why this is difficult to explain.  Especially since I also
> have difficulties explaining the benefits of linked data.  However, normally
> the road block I hit is explaining why URIs are important.



Alice: So, you want to share your in-house thesaurus in the Web as
'Linked Data' in SKOS?

Bob: Yup, I saw [inspirational materials] online and a few blog posts,
it looks easy enough. We've exported it as RDF/XML SKOS already. Here,
take a look...

[data stick changes hands]

Alice: Cool! And .. yup it's well-formed XML, and here see I parsed it
with a real RDF parser (made by Dave Beckett who worked on the last
W3C spec for this stuff, beats me actually checking it myself) and it
didn't complain. So looks fine! Ok so we'll need to chunk this up
somehow so there's one little record per term from your thesaurus, and
links between them... ...and it's generally good to make human facing
pages as well as machine-oriented RDF ones too.

Bob: Ok, so that'll be microformats no wait microdata ah yeah, RDFa,
right? Which version?

Alice: well RDFa yes, microdata is a kind of cousin, a mix of thinking
from RDFa and microformats communities. But I meant that you'd
make a version of each page for computers to use (RDF/XML like your
test export here), ... and you'd make some kind of HTML page for more
human readers also. The stuff you mention is more about doing both
within the same format...

Bob: Great. Which one's the most standard?  What should I use?

Alice: Well I guess it depends what you mean by standard.
[skips digression about whatwg and w3c etc notions of standards process]
[skips digression about XHTML vs XML-ish polyglot HTML vs resolutely
non-XML HTML5 flavours]
[skips digression about qnames in HTML and RDFa 1.1 versus 1.0]

...you might care to look at using a basic HTML5 document with say the
Lite version of RDFa 1.1 (which is pretty much finished but not an
official stable standard yet at W3C)

Bob: [makes a note]. Ok, but that's just the human-facing page,
anyway. We'd put up RDF/XML for machines too, right? Well maybe that's
not necessary I guess. I was reading something about GRDDL and XSLT
that automates the conversion, ... should we maybe generate the
RDF/XML from the HTML+RDFa or vice versa? or just have some php hack
generate both from MySQL since that's where the stuff ultimately lives
right now anyway...?

Alice: Um, well it's pretty much your choice. Do you need RDF/XML too?
Well..... maybe, not sure... it depends. There are more RDF/XML
parsers around, they're more mature, ... but increasingly tools will
consume all kinds of data as RDF. So it might not matter. Depends why
you're doing this, really.

Bob: Er ok, maybe we ought to do both for now, ... belt-and-braces,
... maybe watch the stats and see what's being picked up? I'm doing
this because of the promise of interestingly unexpected re-use and so on,
which makes details hard to predict by definition.

Alice: Sounds like a plan. Ok, so each node in your RDF graph, ...
we'll need to give it a URI. You know that's like the new word for
URL, but that includes identifiers for real world things too.

Bob: Sure sure, I read that. Makes sense. And I can have a URI, my
homepage can have a URI, I'm not my home page blah-de-blah?

Alice: You got it.

Bob: Ok, so what URLs should I give the concepts in this thesaurus?
They've got all kinds of strings attached, but we've also got nicely
managed numeric IDs too.

Alice: Right so maybe something short (URLs can never be too
short...), ... so maybe if you host at your example.com server,
http://example.com/demothes/c1  then same but /c2 /c3 etc.

... or well you could use #c1 or #c2 etc. That's pretty much up to
you. There are pros-and-cons in both directions.

Bob: whatever's easiest. It's a pretty plain apache2 setup, with php
if we want it, or we can batch create files if that makes more sense;
this data doesn't change much.

Alice: Well how big is the thesaurus...?

Bob: a couple thousand terms, each with a few relations and bits of
text; maybe more if we dig out the translations (humm should we
language negotiate those somehow?)

Alice: Let's talk about that another day, maybe?

Bob:  And hmm the translations are versioned a bit differently? Should
we put version numbers in somewhere so it's unambiguous which
version of the translation we're using?

Alice: Let's talk about that another day, too.

Bob: OK, where were we? http://example.com/demothes/c1 ... sure, that
sounds fine.

... we'd put some content negotiated apache thing there, and make c1
send HTML if there's a browser, or rdf/xml if they want that stuff
instead? Default to the browser / HTML version maybe?

Alice: something like that could work. There are some howtos around.
Oh, but if c1 isn't an information resource, you'll need to redirect
with a 303 HTTP code. It's like you said with people and homepages, to
make clear which is which.
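
(A minimal sketch of the 303 idea Alice describes here, in Python rather than Apache config. It borrows the about_c1 naming that comes up later in the conversation; the respond function and the exact URL pattern are assumptions for illustration, not anything Bob's server actually runs.)

```python
import re

# Assumed URL layout: /demothes/c1 names the concept itself,
# /demothes/about_c1 is the document that describes it.
CONCEPT_PATH = re.compile(r"^/demothes/(c\d+)$")


def respond(path):
    """Return (status, headers) for a GET on the given path."""
    match = CONCEPT_PATH.match(path)
    if match:
        # The concept is not a document, so answer 303 See Other and
        # point at the page that describes it.
        return 303, {"Location": "/demothes/about_" + match.group(1)}
    if path.startswith("/demothes/about_c"):
        # The description page is an ordinary document: serve it directly.
        return 200, {"Content-Type": "text/html"}
    return 404, {}


print(respond("/demothes/c1"))
```

The point of the split is only that the URI of the concept and the URI of the page about it stay distinct, as with people and homepages.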

Bob: Oh-kay... so in our SKOS graph, it's a mix of things, the bulk is
a load of descriptions of skos:Concept and there's a bit of metadata
in there about some docs, and the admin contact info, ...  but yeah
it's mostly the concepts (which seems to be the skos way to talk about
thesaurus terms, sort of abstracted a bit to make translations easier, right?)

Alice: Yup. Well, ... remember we're breaking up your graph into
bits... like one chunk per page?

Bob: Ah right, so is that one node in the graph per page? per ... erm,
how do they call it? [counts on fingers] subject-predicate-object...
er subject, right? Each object in my graph, er like OO object I mean,
entity, thingy...

Alice: -thingy is good-

Bob: Each thing in the graph, goes in one page, more or less?

Alice: more or less. It's up to you, I guess there are best practices,
roughly the bulk of it, one page per concept, ... and then the
metadata etc you might do differently

Bob: Ok, so c1 is one concept, c2 is another, ... they'd have links to
each other in the ... the RDF/XML files, right? And I guess the HTML
too, sure

Alice: Sure

Bob: so the html rdfa stuff would be <a href='c2'>something and
rel='broader' if c1 was broader than c2?

Alice: er it might be broaderTerm, or broaderConcept, I forget... [searches]

Bob: ah look, yeah skos:broader, ... ok so if c2 is more broad, er
broader, more general, than c1, ... we put in the c1 HTML page a link
over to c2, and add some RDFa too, to say what the link means in
semantic rdf speak as well as clickable-link?

Alice: [tips head on side], ... sorry I always get this stuff back to
front. Ok, slowly. c2 is broader than c1, ... 'broader' points to the
one that's broader, like you know more general, ... so let's say c1 is
the specific, detailed one. In the c1 HTML page, we'd ...

Bob: [interrupting] would that be c1.html? like concept ID dot h t m
l, as a pattern?

Alice: yes, you could call it that, ... it's up to you really but
obviously it's sort of conventional. But then there's another
convention of keeping the file types out of URLs

Bob: So in the filesystem they might be a bunch of batch-generated
HTML files called c1.html c2.html etc, but I'd keep that secret or
obscure or hide it with apache config somehow?

Alice: For example, yes. But ok, so c1.html would be like "blah blah
blah", and then a paragraph describing concept c1 from your
thesaurus, ... which is (we say) some pretty specific topic, like er,
say 'allergy to pine nuts'... and maybe c2 is just 'pine nuts'

Bob: Well it's an engineering terminology thesaurus, but sure. I get
the idea. So we'd do <a href='/demothes/c2' rel='broader' ...

Alice: in rdfa 1.1 lite that's property='broader', erm
property='skos:broader', ... but sure, something like that. you might
put the relationship first, it reads better. I think it means the same
formally.
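
(A rough sketch of the markup Bob and Alice are circling around, generated from Python since the pages would be batch-created anyway. The concept_block helper and the labels are made up for illustration; the attributes used are the RDFa 1.1 Lite ones, with the skos prefix declared explicitly rather than assumed.)

```python
from html import escape

SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


def concept_block(concept_path, label, broader_path, broader_label):
    """Return an HTML+RDFa fragment for one concept and its broader concept."""
    template = (
        '<div prefix="skos: {ns}" resource="{c}" typeof="skos:Concept">\n'
        '  <span property="skos:prefLabel">{l}</span>\n'
        '  <a property="skos:broader" href="{b}">{bl}</a>\n'
        '</div>'
    )
    return template.format(
        ns=SKOS_NS,
        c=escape(concept_path, quote=True),  # subject of the triples
        l=escape(label),
        b=escape(broader_path, quote=True),  # object of skos:broader
        bl=escape(broader_label),
    )


print(concept_block("/demothes/c1", "allergy to pine nuts",
                    "/demothes/c2", "pine nuts"))
```

Here c1 is the specific concept and the link points at c2, the broader one, which is the direction skos:broader runs.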

Bob: right right, ... and in c2 HTML page, we'd do the link back the
other way? is there a skos word for the opposite of broader,
skos:narrower? [searches] ... ok looks like it, ... so I'd use that?
it's sort of redundant I guess if you crawl all the pages, ... but you
have to find the pages and links somehow, ... what if I started with
some linked data agent thing on c2.html, how would it find c1.html to
find that c1.html says that c2.html is broader?

Alice: Good point. We can work some of this out later. There are also
sitemap files, so in page links aren't the only way to find stuff.
It's all sort of emerging best practices territory. Lots of early
adopters figuring things out, if we get this working, maybe you could
write up a case study?

Bob: Or you could just tell me what to do. Hey, whatever happened to
rev= ... is that still in XHTML?

Alice: Which version? I mean, ... can we talk about this later?

Bob: Right right. But couldn't I put "rev='skos:broader'" in c2.html, ...

Alice: [patiently] ... you could, yes. Or both... there's a lot of
flexibility in this system. In many ways it's a huge strength...

Bob: Oh hang on, I found
http://www.w3schools.com/TAGS/att_link_rev.asp and it says rev isn't
supported by browsers; is that a problem?

Alice: We're getting off the point a bit, ... Anyway I think Hixie
took it out of HTML5 because it wasn't being used and people found it
confusing. Or last time I looked anyway, I think it was gone.

Bob: Righto. I can see that. So anyway, we'll make a load of HTML
pages that describe our concepts...

Alice: Yup, and we'll redirect /demothes/c2 to a page about c2, ... so
things don't mix up information resources with non information
resources. Oh and I'm not sure w3schools is always the best reference
on this stuff...

Bob: things on the Web and things that aren't on the Web. Ok, if not
w3schools, where should I check?

Alice: [ignoring w3schools question] ... exactly. things that aren't
_on_ the Web. Or _in_ if you prefer. Like your concepts are a kind of
abstraction so they're not really on the Web, ... they're just
_described_ in the Web.

Bob: so we redirect to c1.html etc?

Alice: Sure we could do that, or if you want to keep the suffix out of
the URL, which is considered good hygiene by some, you might
for example use /demothes/about_c1 ... that's quite clean

Bob: And if we get a content negotiated request for rdf/xml ...? ...
send that instead, ... no redirecting

Alice: something like that, I'll check the docs for you later. It's a
bit fiddly but there are some examples around we can copy from,
httpd.conf etc
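
(While Alice checks the docs: a simplified sketch of the content negotiation step in Python. It is not how Apache does it; pick_format is a hypothetical helper that only knows the two formats under discussion, honours q-values, ignores wildcards, and falls back to HTML as Bob suggested.)

```python
def pick_format(accept_header, default="text/html"):
    """Pick text/html or application/rdf+xml from an Accept header.

    Simplified: exact media types and q-values only. Wildcards such as
    */* are ignored, so they end up with the default.
    """
    offered = ("text/html", "application/rdf+xml")
    best, best_q = default, 0.0
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        q = 1.0  # a media type with no q parameter counts as q=1
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        # Strict comparison: on a tie the earlier entry wins.
        if media_type in offered and q > best_q:
            best, best_q = media_type, q
    return best


print(pick_format("application/rdf+xml"))
print(pick_format("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
```

A real deployment would more likely lean on the server's own negotiation support than on hand-rolled parsing like this.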

Bob: Great. And if someone asks for the rdf/xml version of about_c1?

Alice: Not sure, I'll have to think a bit, but ... well sending the
rdf along sounds ok. It's not quite the same as asking for c1 but ...
well sure. Why not?

Bob: What was the other option? #c1 ? No messing around with redirects
there? Easier to bookmark?

Alice: Well yeah, ... and to link to, ... but your data isn't tiny,
... a few thousand concepts you said. Could be a big page fetch each
time.

Bob: Is that a problem? How big is too big? We can cache internally so
it's not hitting the db, right? Will intelligent agents and so on be
reading this a lot? Do they choke on big files?

Alice: Well, maybe not so intelligent. But the way URIs and URLs work,
when there's a # in them, ... that doesn't get sent to the server
and so the server doesn't see the #c1 or #c2 or #c9999 bit, ... so it
can only really send you the whole lot and the consuming code has to
make sense of it by remembering what it asked for...
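
(To make the # point concrete, a small Python illustration using the standard library. The URI is the example one from this conversation; the split shown is what a client does before it sends the request.)

```python
from urllib.parse import urldefrag

uri = "http://example.com/demothes#c1"

# The client strips the fragment before making the HTTP request,
# then has to remember it to find the right bit of what comes back.
document_url, fragment = urldefrag(uri)

print(document_url)  # http://example.com/demothes
print(fragment)      # c1
```

So #c1, #c2 and #c9999 all result in the same request for /demothes, which is why the server can only send the whole lot.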

Bob: ...well maybe this is still easier. And we can content negotiate
still, right?

Alice: sure. HTML+RDFa or RDF/XML or ... you heard of turtle and
ntriples and there's this thing called json-ld ... but don't worry
about that for now. Let's just think about RDF/XML and HTML+RDFa
today, eh?

Alice: [thinking...] well maybe just one of those would do, ... but
it's not hard to generate both.

Bob: Alright, so one big HTML+RDFa file with the thesaurus in it, in
SKOS triples but prettied up a bit with CSS? Sounds ok...

Alice: and a big RDF/XML doc too, if they ask for that instead

Bob: got it. So ... hang on, back up a bit, ... if we're in one big
HTML page, and I'm at the er what did you say, 'allergy to pine nuts'
section, ... and I want to link to show that this concept has a ... a
broader one which is just 'pine nuts', ... I put in '<a href='c1'
property="skos:broader"> within the c2 bit?

Alice: c1 was the broader one, I forget?

Bob: er c2 was broader, general ... Pine Nuts only. So yup, within
pine nuts section of this big HTML page at /demothes, we'd link up (or
down, guess it doesn't matter the page order?) to the #c1 section.
Remind me, I always mix up, is that <a name="c1"> or <a id="c2">?

Alice: it's a little bit complicated [searches] but
http://stackoverflow.com/questions/484719/html-anchors-with-name-or-id
seems to cover it... ...er but look it's a bit fiddly this way, never
mind the HTML attribute name for now we can look that up ... you don't
want to call it c1 exactly, because that's the name of your concept

Bob: And concepts aren't information resources?

Alice: well obviously they sort of are _informational_ so that's why
some people don't like that terminology, ... but that doesn't matter,
the thing is they're not ... you know HTTP endpointy things, ... like
data objects attached to a Web server, ... they're more abstract

Bob: and so also they're not bits of an HTML page either? Right? So if
I go linking with <a href="/demothes#c1" blabla, that's implying that
c1 is a bit of a Web page... so that's an information resource, ...
and really it's not because it's a thesaurus concept which is more a
sort of social entity or conceptual or mental or something, ... not
inside my server or page like a concrete information object?

Alice: Y...es. Well you're mixing two things here a bit. Or three.
Hang on. Two. Right:
  First. We slipped from talking about the target of HTML hyperlinks
(the id/name attribute stuff) to the markup at the start end of the
link. <a href="/demothes#c1" is fine, so long as you're not really
pointing at a page that has a section with name (or id) of 'c1'. It's
the name end, the target stuff, that you can't put the thingy's URI
into. It's ok to point because ... you're sort of saying something.
But if you write the target markup, you're saying that c2 is part of
the page. Which it isn't.

Bob: o...k. Seems oddly asymmetrical somehow. But the '
href="/demothes#c1" ' HTML ... it's pointing at a page, right? And if
we go to it in a browser (unless it has a bunch of funny extensions,
... I got in a mess one time with Firefox addons I was trying, ...)
... we go to it in a browser we'll get an HTML page. And there'll be a
bit of the page describing our concepts c1 and c2 ... and in theory
links can jump you down the classic way, to where you want to read?
That's nice to have in documentation.

Alice: Yes yes, ... just we don't name the page target parts, erm
anchors, with that same name. As the skos thing, concept, I mean.

Bob: Because it's not a webby thing it's a real worldy thing, even if
it's still sort of about information? Like a book in my hand also is?

Alice: Exactly

Bob: [beams]

Alice: So, ... right, ... we don't name the in-page targets the same
as the things those bits of page describe?

Bob: Ok, so like with the other design, we could call the page bits
#c2_bit_of_the_page or something less verbose, just not #c2 because we
already used that ID for something 'off Web', the concept itself?

Alice: Yes.

Bob: Doesn't that screw up scrolling?

Alice: Well you could use some jquery thing I found and that's quite
nice actually because it scrolls smoothly and degrades gracefully and
... wait wait I'm talking nonsense, ... sorry. It's fine. You just put
in two anchors and two links?

Bob: They're called anchors at both ends of the link, right? Sort of
nautical idea... ... in RDF links too?

Alice: Er yeah yes. It's <a>. We don't really talk about anchors so
much in the abstract rdf model. But it's a similar idea, hence the
Linked Data thing?

Bob: So the subject is an anchor, the predicate is like a kind of link
(that's a 'rel' or whatever?) and the object in the triple is an
anchor too?

Alice: Well not exactly. You're ... well, sure. Yes. If that helps you
think about it, ...  but what I meant to say was forget about jquery,
it'll still scroll and stuff in the browser, so long as you have
anchors like <a name="c2_bit_of_page"> and <a name="c2">. Then for each
semantic link, you can link to the semantic target - like c2 - with
that, ... and in the human facing link, ... link to c2_bit_of_page.
Hmm no wait, you'd want to hide the human one because when you click
it it won't go to the part of the page, and you shouldn't put in a
name="c2". Sorry, I'm tired.

Alice: Ok sure, look is scrolling to the bit of the page important for
you? Maybe the jquery thing could work? Or you can do something in
CSS. It can't be that hard. There are lots of ways to do things with
RDFa. If you get stuck we can ask in IRC or Twitter, people are really
helpful (though not everyone agrees about this hash stuff and 303s)

Bob: So which is simplest, really? Really big files: bad with #, ...
but # lets you bookmark, ... but makes linking down to the right bit
of the page somehow confusing...?

...could I rewrite the links in Javascript maybe? Ok ok, ... how's
this! How about we make the HTML+RDFa page all nice and semantic, but
put in a javascript that when the page loads --- only in browsers ---
it rewrites all the links to be #bit_of_page links, ... then when
clicked ... boom you hop to the right bit of the page. And still
there's a big RDF/XML file content negotiable for older tools that
don't read the HTML+RDFa .... everyone's happy!

Alice: You still need to update all the URIs in your test RDF/XML file
to be this new pattern we agreed, ... and that javascript thing is
half sick and half clever, ... maybe it'd be fine. There might be browser
addons that get confused if javascript is messing with stuff. But we
got distracted. I'm not sure of anything using this stuff much, you
might check in tabulator at least to see what it does.

But we were talking about <a href="/demothes#c1" blabla,

... and I said you'd slipped from talking about the 'target' end of
the anchor link, to the 'source' end. And that the source in RDFa
could mention things that weren't (and shouldn't be) actual HTML page
targets. But also you were mixing up a bit, ... the idea of an
information resource like a thing that's up there via some Web server
and giving you content negotiated formats, ... with the idea of bits
of a page being an information resource.

... but at least we agreed that your skos concepts aren't either of
those; they're abstractions. So the Web pages sort of describe them,
... and the bits of a big rdfa html page if we go with the # option

Bob: ... we didn't really get on to the RDF/XML version

Alice: that's a bit simpler, in a way... because only machines look at
it and it doesn't care about prettiness or usability or browser
behaviour.

Bob: couldn't we put in some xslt,.. and make it work for both? like a
stylesheet to make it into html+rdfa, ... browsers do that now don't
they?

Alice: In theory maybe; in practice that's not really something people
seem to do. But look, we're going with the 'put it all in one big
page' # version, so we just make the rdf/xml use those as the URIs and
... well we're done.

Bob: Easy! So I just change some URLs and upload the RDF/XML, ... then
write some script and make an HTML page with the right kinds of links.
....

Alice: And we'll figure out some way to make it jump to bits of the
page? Maybe...

Bob: Can we do backlinks inside the page?

Alice: like broader concept links back to the narrower?

Bob: I can find some RDFa parser and put that in dynamically? or
better to spell it out in the actual markup so the triples are there?

Alice: yes, maybe better
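
(A sketch of how the backlinks could be worked out at page-generation time, so they end up spelled out in the markup. The dictionary shape and the narrower_from_broader name are assumptions; Bob's real data lives in MySQL and would need its own query.)

```python
def narrower_from_broader(broader):
    """Given {concept: [broader concepts]}, return {concept: [narrower concepts]}."""
    narrower = {}
    for concept, parents in broader.items():
        for parent in parents:
            # Each skos:broader link implies a skos:narrower link back.
            narrower.setdefault(parent, []).append(concept)
    return narrower


broader = {"c1": ["c2"], "c3": ["c2"]}
print(narrower_from_broader(broader))  # {'c2': ['c1', 'c3']}
```

Each entry in the result would become a skos:narrower link in the section (or page) for the broader concept.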

Bob: but ... that'll make the doc even bigger, ... if every broader
term triple has an inverse link too.... guess it doesn't matter,
webservers zip stuff on the fly don't they?
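
(On the size worry, a quick Python check of how repetitive link markup compresses. The page here is synthetic, two invented lines repeated 2000 times, so it will compress far better than a real thesaurus page would; it only shows the direction of the effect.)

```python
import gzip

link = '<a property="skos:broader" href="#c2">pine nuts</a>\n'
inverse = '<a property="skos:narrower" href="#c1">allergy to pine nuts</a>\n'

# Synthetic stand-in for one big page with both directions spelled out.
page = ((link + inverse) * 2000).encode("utf-8")
compressed = gzip.compress(page)

print(len(page), len(compressed))
```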

Alice: they can, yeah. but look you can always do the http 303 thing
if you're worried about size of the file,... chunk it up. Do the slash
thing. Are you expecting your thesaurus to grow at all?

Bob: it can be nice having one page per thingy, after all

Alice: they're both easier in different ways

Bob: thanks, you've been a big help. Would it be alright to just
upload the skos dump file for now, ... maybe I'll zip it

Alice: we ought to fix those URIs sometime

Bob: maybe tomorrow?

Alice: maybe tomorrow...

Received on Wednesday, 28 March 2012 01:31:25 UTC