Re: See Other from Melvin Carvalho on 2012-03-28 (public-lod@w3.org from March 2012)

From: Melvin Carvalho <melvincarvalho@gmail.com>
Date: Wed, 28 Mar 2012 10:20:42 +0200
To: Dan Brickley <danbri@danbri.org>
Cc: Jeni Tennison <jeni@jenitennison.com>, tom.heath@talis.com, public-lod community <public-lod@w3.org>
Message-ID: <CAKaEYhKOG+Zb6wUprAKBfbEm1giy6BxupUq-Z0W3eFkdrqaErw@mail.gmail.com>
On 28 March 2012 03:30, Dan Brickley <danbri@danbri.org> wrote:

> On 27 March 2012 20:23, Melvin Carvalho <melvincarvalho@gmail.com> wrote:
>
> > I'm curious as to why this is difficult to explain.  Especially since I
> > also have difficulties explaining the benefits of linked data.  However,
> > normally the road block I hit is explaining why URIs are important.
>
>
>
> Alice: So, you want to share your in-house thesaurus in the Web as
> 'Linked Data' in SKOS?
>
> Bob: Yup, I saw [inspirational materials] online and a few blog posts,
> it looks easy enough. We've exported it as RDF/XML SKOS already. Here,
> take a look...
>
> [data stick changes hands]
>
> Alice: Cool! And .. yup it's wellformed XML, and here see I parsed it
> with a real RDF parser (made by Dave Beckett who worked on the last
> W3C spec for this stuff, beats me actually checking it myself) and it
> didn't complain. So looks fine! Ok so we'll need to chunk this up
> somehow so there's one little record per term from your thesaurus, and
> links between them... ...and it's generally good to make human facing
> pages as well as machine-oriented RDF ones too.
>
> Bob: Ok, so that'll be microformats no wait microdata ah yeah, RDFa,
> right? Which version?
>
> Alice: well RDFa yes, microdata is a kind of cousin, a mix of thinking
> from microdata and microformats communities. But I meant that you'd
> make a version of each page for computers to use (RDF/XML like your
> test export here), ... and you'd make some kind of HTML page for more
> human readers also. The stuff you mention is more about doing both
> within the same format...
>
> Bob: Great. Which one's the most standard?  What should I use?
>
> Alice: Well I guess it depends what you mean by standard.
> [skips digression about whatwg and w3c etc notions of standards process]
> [skips digression about XHTML vs XML-ish polyglot HTML vs resolutely
> non-XML HTML5 flavours]
> [skips digression about qnames in HTML and RDFa 1.1 versus 1.0]
>
> ...you might care to look at using basic HTML5 document with say the
> Lite version of RDFa 1.1 (which is pretty much finished but not an
> official stable standard yet at W3C)
>
> Bob: [makes a note]. Ok, but that's just the human-facing page,
> anyway. We'd put up RDF/XML for machines too, right? Well maybe that's
> not necessary I guess. I was reading something about GRDDL and XSLT
> that automates the conversion, ... should we maybe generate the
> RDF/XML from the HTML+RDFa or vice versa? or just have some php hack
> generate both from MySQL since that's where the stuff ultimately lives
> right now anyway...?
>
> Alice: Um, well it's pretty much your choice. Do you need RDF/XML too?
> Well..... maybe, not sure... it depends. There are more RDF/XML
> parsers around, they're more mature, ... but increasingly tools will
> consume all kinds of data as RDF. So it might not matter. Depends why
> you're doing this, really.
>
> Bob: Er ok, maybe we ought to do both for now, ... belt-and-braces,
> ... maybe watch the stats and see what's being picked up? I'm doing
> this because of promise of interestingly unexpected re-use and so on,
> which makes details hard to predict by definition.
>
> Alice: Sounds like a plan. Ok, so each node in your RDF graph, ...
> we'll need to give it a URI. You know that's like the new word for
> URL, but that includes identifiers for real world things too.
>
> Bob: Sure sure, I read that. Makes sense. And I can have a URI, my
> homepage can have a URI, I'm not my home page blah-de-blah?
>
> Alice: You got it.
>
> Bob: Ok, so what URLs should I give the concepts in this thesaurus?
> They've got all kinds of strings attached, but we've also got nicely
> managed numeric IDs too.
>
> Alice: Right so maybe something short (URLs can never be too
> short...), ... so maybe if you host at your example.com server,
> http://example.com/demothes/c1  then same but /c2 /c3 etc.
>
> ... or well you could use #c1 or #c2 etc. That's pretty much up to
> you. There are pros-and-cons in both directions.
>
> Bob: whatever's easiest. It's a pretty plain apache2 setup, with php
> if we want it, or we can batch create files if that makes more sense;
> this data doesn't change much.
>
> Alice: Well how big is the thesaurus...?
>
> Bob: a couple thousand terms, each with a few relations and bits of
> text; maybe more if we dig out the translations (humm should we
> language negotiate those somehow?)
>
> Alice: Let's talk about that another day, maybe?
>
> Bob:  And hmm the translations are versioned a bit differently? Should
> we put version numbers in somewhere so it's unambiguous which
> version of the translation we're using?
>
> Alice: Let's talk about that another day, too.
>
> Bob: OK, where were we? http://example.com/demothes/c1 ... sure, that
> sounds fine.
>
> ... we'd put some content negotiated apache thing there, and make c1
> send HTML if there's a browser, or rdf/xml if they want that stuff
> instead? Default to the browser / HTML version maybe?
>
> Alice: something like that could work. There are some howtos around.
> Oh, but if c1 isn't an information resource, you'll need to redirect
> with a 303 HTTP code. It's like you said with people and homepages, to
> make clear which is which.
>
> Bob: Oh-kay... so in our SKOS graph, it's a mix of things, the bulk is
> a load of descriptions of skos:Concept and there's a bit of metadata
> in there about some docs, and the admin contact info, ...  but yeah
> it's mostly the concepts (which seems to be the skos way to talk about
> thesaurus terms, sort of abstracted a bit to make translations easier,
> right?)
>
> Alice: Yup. Well, ... remember we're breaking up your graph into
> bits... like one chunk per page?
>
> Bob: Ah right, so is that one node in the graph per page? per ... erm,
> how do they call it? [counts on fingers] subject-predicate-object...
> er subject, right? Each object in my graph, er like OO object I mean,
> entity, thingy...
>
> Alice: -thingy is good-
>
> Bob: Each thing in the graph, goes in one page, more or less?
>
> Alice: more or less. It's up to you, I guess there are best practices,
> roughly the bulk of it, one page per concept, ... and then the
> metadata etc you might do differently
>
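
For what it's worth, the "one chunk per concept" idea could be sketched roughly like this in Python. The triples and IDs below are made up for illustration, and grouping by subject is only one possible way to chunk a graph.

```python
from collections import defaultdict

def chunk_by_subject(triples):
    """Group (subject, predicate, object) triples into one record per subject."""
    pages = defaultdict(list)
    for subject, predicate, obj in triples:
        pages[subject].append((predicate, obj))
    return dict(pages)

# Hypothetical sample data, not taken from any real thesaurus.
sample = [
    ("c1", "skos:prefLabel", "allergy to pine nuts"),
    ("c1", "skos:broader", "c2"),
    ("c2", "skos:prefLabel", "pine nuts"),
]
records = chunk_by_subject(sample)
```

Each key in `records` would then become one page (or one RDF/XML document) on the server.
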
> Bob: Ok, so c1 is one concept, c2 is another, ... they'd have links to
> each other in the ... the RDF/XML files, right? And I guess the HTML
> too, sure
>
> Alice: Sure
>
> Bob: so the html rdfa stuff would be <a href='c2'>something and
> rel='broader' if c1 was broader than c2?
>
> Alice: er it might be broaderTerm, or broaderConcept, I forget...
> [searches]
>
> Bob: ah look, yeah skos:broader, ... ok so if c2 is more broad, er
> broader, more general, than c1, ... we put in the c1 HTML page a link
> over to c2, and add some RDFa too, to say what the link means in
> semantic rdf speak as well as clickable-link?
>
> Alice: [tips head on side], ... sorry I always get this stuff back to
> front. Ok, slowly. c2 is broader than c1, ... 'broader' points to the
> one that's broader, like you know more general, ... so let's say c1 is
> the specific, detailed one. In the c1 HTML page, we'd ...
>
> Bob: [interrupting] would that be c1.html? like concept ID dot h t m
> l, as a pattern?
>
> Alice: yes, you could call it that, ... it's up to you really but
> obviously it's sort of conventional. But then there's another
> convention of keeping the file types out of URLs
>
> Bob: So in the filesystem they might be a bunch of batch-generated
> HTML files called c1.html c2.html etc, but I'd keep that secret or
> obscure or hide it with apache config somehow?
>
> Alice: For example, yes. But ok, so c1.html would be like 'blah blah
> blah', and then a paragraph describing concept c1 from your
> thesaurus, ... which is (we say) some pretty specific topic, like er,
> say 'allergy to pine nuts'... and maybe c2 is just 'pine nuts'
>
> Bob: Well it's an engineering terminology thesaurus, but sure. I get
> the idea. So we'd do <a href='/demothes/c2' rel='broader' ...
>
> Alice: in rdfa 1.1 lite that's property='broader', erm
> property='skos:broader', ... but sure, something like that. you might
> put the relationship first, it reads better. I think it means the same
> formally.
>
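
A rough Python sketch of generating that kind of link, in case it helps. The helper name is hypothetical, and it assumes the enclosing markup already sets the subject to c1 and that the skos prefix is available to the RDFa processor.

```python
from html import escape

def broader_link(target_path, label):
    """Build an <a> element carrying an RDFa 1.1 Lite style skos:broader property."""
    return (
        f'<a property="skos:broader" href="{escape(target_path, quote=True)}">'
        f"{escape(label)}</a>"
    )

# On the c1 page ('allergy to pine nuts'), point at the broader concept c2.
snippet = broader_link("/demothes/c2", "pine nuts")
print(snippet)
```
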
> Bob: right right, ... and in c2 HTML page, we'd do the link back the
> other way? is there a skos word for the opposite of broader,
> skos:narrower? [searches] ... ok looks like it, ... so I'd use that?
> it's sort of redundant I guess if you crawl all the pages, ... but you
> have to find the pages and links somehow, ... what if I started with
> some linked data agent thing on c2.html, how would it find c1.html to
> find that c1.html says that c2.html is broader?
>
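
If the redundancy is wanted, the back-links could be derived mechanically rather than maintained by hand. A minimal Python sketch, assuming the graph is held as plain (subject, predicate, object) tuples; the function name is made up.

```python
BROADER = "skos:broader"
NARROWER = "skos:narrower"

def add_inverses(triples):
    """Return the triples plus the inverse of every broader/narrower statement."""
    out = set(triples)
    for subject, predicate, obj in triples:
        if predicate == BROADER:
            out.add((obj, NARROWER, subject))
        elif predicate == NARROWER:
            out.add((obj, BROADER, subject))
    return out

# c1 has broader concept c2, so c2 gains a narrower link back to c1.
expanded = add_inverses({("c1", BROADER, "c2")})
```
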
> Alice: Good point. We can work some of this out later. There are also
> sitemap files, so in page links aren't the only way to find stuff.
> It's all sort of emerging best practices territory. Lots of early
> adopters figuring things out, if we get this working, maybe you could
> write up a case study?
>
> Bob: Or you could just tell me what to do. Hey, whatever happened to
> rev= ... is that still in XHTML?
>
> Alice: Which version? I mean, ... can we talk about this later?
>
> Bob: Right right. But couldn't I put "rev='skos:broader'" in c2.html, ...
>
> Alice: [patiently] ... you could, yes. Or both... there's a lot of
> flexibility in this system. In many ways it's a huge strength...
>
> Bob: Oh hang on, I found
> http://www.w3schools.com/TAGS/att_link_rev.asp and it says rev isn't
> supported by browsers; is that a problem?
>
> Alice: We're getting off the point a bit, ... Anyway I think Hixie
> took it out of HTML5 because it wasn't being used and people found it
> confusing. Or last time I looked anyway, I think it was gone.
>
> Bob: Righto. I can see that. So anyway, we'll make a load of HTML
> pages that describe our concepts...
>
> Alice: Yup, and we'll redirect /demothes/c2 to a page about c2, ... so
> things don't mix up information resources with non information
> resources. Oh and I'm not sure w3schools is always the best reference
> on this stuff...
>
> Bob: things on the Web and things that aren't on the Web. Ok, if not
> w3schools, where should I check?
>
> Alice: [ignoring w3schools question] ... exactly. things that aren't
> _on_ the Web. Or _in_ if you prefer. Like your concepts are a kind of
> abstraction so they're not really on the Web, ... they're just
> _described_ in the Web.
>
> Bob: so we redirect to c1.html etc?
>
> Alice: Sure we could do that, or if you want to keep the suffix out of
> the URL, which is considered good hygiene by some, you might for
> example use /demothes/about_c1 ... that's quite clean
>
> Bob: And if we get a content negotiated request for rdf/xml ...? ...
> send that instead, ... no redirecting
>
> Alice: something like that, I'll check the docs for you later. It's a
> bit fiddly but there are some examples around we can copy from,
> httpd.conf etc
>
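
A very rough Python sketch of the redirect logic, not a real server config. The about_ naming comes from the dialogue; the data_ name for the RDF/XML document is my own assumption, and the Accept handling ignores q-values entirely. This variant redirects in both cases; Bob's "no redirecting" variant would answer the RDF request directly instead.

```python
CONCEPT_PREFIX = "/demothes/"

def respond(path, accept="text/html"):
    """Return (status, headers) for a request to a concept URI such as /demothes/c1."""
    concept_id = path[len(CONCEPT_PREFIX):]
    # Crude negotiation: only checks whether RDF/XML is mentioned at all.
    if "application/rdf+xml" in accept:
        target = f"{CONCEPT_PREFIX}data_{concept_id}"   # hypothetical RDF document name
    else:
        target = f"{CONCEPT_PREFIX}about_{concept_id}"  # human-facing page
    # 303 See Other: the concept itself is not a document, so point at one describing it.
    return 303, {"Location": target, "Vary": "Accept"}
```

So a browser asking for /demothes/c1 would be sent to /demothes/about_c1, and an RDF-aware client to the data document.
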
> Bob: Great. And if someone asks for the rdf/xml version of about_c1?
>
> Alice: Not sure, I'll have to think a bit, but ... well sending the
> rdf along sounds ok. It's not quite the same as asking for c1 but ...
> well sure. Why not?
>
> Bob: What was the other option? #c1 ? No messing around with redirects
> there? Easier to bookmark?
>
> Alice: Well yeah, ... and to link to, ... but your data isn't tiny,
> ... a few thousand concepts you said. Could be a big page fetch each
> time.
>
> Bob: Is that a problem? How big is too big? We can cache internally so
> it's not hitting the db, right? Will intelligent agents and so on be
> reading this a lot? Do they choke on big files?
>
> Alice: Well, maybe not so intelligent. But the way URIs and URLs work,
> when there's a # in them, ... that doesn't get sent to the server
> and so the server doesn't see the #c1 or #c2 or #c9999 bit, ... so it
> can only really send you the whole lot and the consuming code has to
> make sense of it by remembering what it asked for...
>
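
The fragment behaviour is easy to see with Python's standard library. A small sketch; the function name is made up, and the URI follows the example in the dialogue.

```python
from urllib.parse import urldefrag

def split_for_request(uri):
    """Split a URI into the part requested from the server and the client-side fragment."""
    base, fragment = urldefrag(uri)
    return base, fragment

# Only `base` is what the server gets asked for; the client keeps `fragment` to itself.
base, fragment = split_for_request("http://example.com/demothes#c9999")
```

Whichever concept is asked for, the server sees the same request for the whole /demothes document.
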
> Bob: ...well maybe this is still easier. And we can content negotiate
> still, right?
>
> Alice: sure. HTML+RDFa or RDF/XML or ... you heard of turtle and
> ntriples and there's this thing called json-ld ... but don't worry
> about that for now. Let's just think about RDF/XML and HTML+RDFa
> today, eh?
>
> Alice: [thinking...] well maybe just one of those would do, ... but
> it's not hard to generate both.
>
> Bob: Alright, so one big HTML+RDFa file with the thesaurus in it, in
> SKOS triples but prettied up a bit with CSS? Sounds ok...
>
> Alice: and a big RDF/XML doc too, if they ask for that instead
>
> Bob: got it. So ... hang on, back up a bit, ... if we're in one big
> HTML page, and I'm at the er what did you say, 'allergy to pine nuts'
> section, ... and I want to link to show that this concept has a ... a
> broader one which is just 'pine nuts', ... I put in '<a href='c1'
> property="skos:broader"> within the c2 bit?
>
> Alice: c1 was the broader one, I forget?
>
> Bob: er c2 was broader, general ... Pine Nuts only. So yup, within
> pine nuts section of this big HTML page at /demothes, we'd link up (or
> down, guess it doesn't matter the page order?) to the #c1 section.
> Remind me, I always mix up, is that <a name="c1"> or <a id="c2">?
>
> Alice: it's a little bit complicated [searches] but
> http://stackoverflow.com/questions/484719/html-anchors-with-name-or-id
> seems to cover it... ...er but look it's a bit fiddly this way, never
> mind the HTML attribute name for now we can look that up ... you don't
> want to call it c1 exactly, because that's the name of your concept
>
> Bob: And concepts aren't information resources?
>
> Alice: well obviously they sort of are _informational_ so that's why
> some people don't like that terminology, ... but that doesn't matter,
> the thing is they're not ... you know HTTP endpointy things, ... like
> data objects attached to a Web server, ... they're more abstract
>
> Bob: and so also they're not bits of an HTML page either? Right? So if
> I go linking with <a href="/demothes#c1" blabla, that's implying that
> c1 is a bit of a Web page... so that's an information resource, ...
> and really it's not because it's a thesaurus concept which is more a
> sort of social entity or conceptual or mental or something, ... not
> inside my server or page like a concrete information object?
>
> Alice: Y...es. Well you're mixing two things here a bit. Or three.
> Hang on. Two. Right:
>  First. We slipped from talking about the target of HTML hyperlinks
> (the id/name attribute stuff) to the markup at the start end of the
> link. <a href="/demothes#c1" is fine, so long as you're not really
> pointing at a page that has a section with name (or id) of 'c1'. It's
> the name end, the target stuff, that you can't put the thingy's URI
> into. It's ok to point because ... you're sort of saying something.
> But if you write the target markup, you're saying that c2 is part of
> the page. Which it isn't.
>
> Bob: o...k. Seems oddly asymmetrical somehow. But the '
> href="/demothes#c1" ' HTML ... it's pointing at a page, right? And if
> we go to it in a browser (unless it has a bunch of funny extensions,
> ... I got in a mess one time with Firefox addons I was trying, ...)
> ... we go to it in a browser we'll get an HTML page. And there'll be a
> bit of the page describing our concepts c1 and c2 ... and in theory
> links can jump you down the classic way, to where you want to read?
> That's nice to have in documentation.
>
> Alice: Yes yes, ... just we don't name the page target parts, erm
> anchors, with that same name. As the skos thing, concept, I mean.
>
> Bob: Because it's not a webby thing it's a real worldy thing, even if
> it's still sort of about information? Like a book in my hand also is?
>
> Alice: Exactly
>
> Bob: [beams]
>
> Alice: So, ... right, ... we don't name the in-page targets the same
> as the things those bits of page describe?
>
> Bob: Ok, so like with the other design, we could call the page bits
> #c2_bit_of_the_page or something less verbose, just not #c2 because we
> already used that ID for something 'off Web', the concept itself?
>
> Alice: Yes.
>
> Bob: Doesn't that screw up scrolling?
>
> Alice: Well you could use some jquery thing I found and that's quite
> nice actually because it scrolls smoothly and degrades gracefully and
> ... wait wait I'm talking nonsense, ... sorry. It's fine. You just put
> in two anchors and two links?
>
> Bob: They're called anchors at both ends of the link, right? Sort of
> nautical idea... ... in RDF links too?
>
> Alice: Er yeah yes. It's <a>. We don't really talk about anchors so
> much in the abstract rdf model. But it's a similar idea, hence the
> Linked Data thing?
>
> Bob: So the subject is an anchor, the predicate is like a kind of link
> (that's a 'rel' or whatever?) and the object in the triple is an
> anchor too?
>
> Alice: Well not exactly. You're ... well, sure. Yes. If that helps you
> think about it, ...  but what I meant to say was forget about jquery,
> it'll still scroll and stuff in the browser, so long as you have
> anchors like <a name="c2_bit_of_page" and <a name="c2">. Then for each
> semantic link, you can link to the semantic target - like c2 - with
> that, ... and in the human facing link, ... link to c2_bit_of_page.
> Hmm no wait, you'd want to hide the human one because when you click
> it it won't go to the part of the page, and you shouldn't put in a
> name="c2". Sorry, I'm tired.
>
> Alice: Ok sure, look is scrolling to the bit of the page important for
> you? Maybe the jquery thing could work? Or you can do something in
> CSS. It can't be that hard. There are lots of ways to do things with
> RDFa. If you get stuck we can ask in IRC or Twitter, people are really
> helpful (though not everyone agrees about this hash stuff and 303s)
>
> Bob: So which is simplest, really? Really big files: bad with #, ...
> but # lets you bookmark, ... but makes linking down to the right bit
> of the page somehow confusing...?
>
> ...could I rewrite the links in Javascript maybe? Ok ok, ... how's
> this! How about we make the HTML+RDFa page all nice and semantic, but
> put in a javascript that when the page loads --- only in browsers ---
> it rewrites all the links to be #bit_of_page links, ... then when
> clicked ... boom you hop to the right bit of the page. And still
> there's a big RDF/XML file content negotiable for older tools that
> don't read the HTML+RDFa .... everyone's happy!
>
> Alice: You still need to update all the URIs in your text RDF/XML file
> to be this new pattern we agreed, ... and that javascript thing is
> half sick and half clever, ... maybe it'd be fine. There might be browser
> addons that get confused if javascript is messing with stuff. But we
> got distracted. I'm not sure of anything using this stuff much, you
> might check in tabulator at least to see what it does.
>
> But we were talking about <a href="/demothes#c1" blabla,
>
> ... and I said you'd slipped from talking about the 'target' end of
> the anchor link,  to the 'source' end. And that the source in RDFa
> could mention things that weren't (and shouldn't) be actual HTML page
> targets. But also you were mixing up a bit, ... the idea of an
> information resource like a thing that's up there via some Web server
> and giving you content negotiated formats, ... with the idea of bits
> of a page being an information resource.
>
>  ... but at least we agreed that your skos concepts aren't either of
> those; they're abstractions. So the Web pages sort of describe them,
> ... and the bits of a big rdfa html page if we go with the # option
>
> Bob: ... we didn't really get on to the RDF/XML version
>
> Alice: that's a bit simpler, in a way... because only machines look at
> it and it doesn't care about prettiness or usability or browser
> behaviour.
>
> Bob: couldn't we put in some xslt,.. and make it work for both? like a
> stylesheet to make it into html+rdfa, ... browsers do that now don't
> they?
>
> Alice: In theory maybe; in practice that's not really something people
> seem to do. But look, we're going with the 'put it all in one big
> page' # version, so we just make the rdf/xml use those as the URIs and
> ... well we're done.
>
> Bob: Easy! So I just change some URLs and upload the RDF/XML, ... then
> write some script and make an HTML page with the right kinds of links.
> ....
>
> Alice: And we'll figure out some way to make it jump to bits of the
> page? Maybe...
>
> Bob: Can we do backlinks inside the page?
>
> Alice: like broader concept links back to the narrower?
>
> Bob: I can find some RDFa parser and put that in dynamically? or
> better to spell it out in the actual markup so the triples are there?
>
> Alice: yes, maybe better
>
> Bob: but ... that'll make the doc even bigger, ... if every broader
> term triple has an inverse link too.... guess it doesn't matter,
> webservers zip stuff on the fly don't they?
>
> Alice: they can, yeah. but look you can always do the http 303 thing
> if you're worried about size of the file,... chunk it up. Do the slash
> thing. Are you expecting your thesaurus to grow at all?
>
> Bob: it can be nice having one page per thingy, after all
>
> Alice: they're both easier in different ways
>
> Bob: thanks, you've been a big help. Would it be alright to just
> upload the skos dump file for now, ... maybe I'll zip it
>
> Alice: we ought to fix those URIs sometime
>
> Bob: maybe tomorrow?
>
> Alice: maybe tomorrow...
>

Ah, the "torture of choice", as the Germans say :)

What about one page per word:

http://example.org/words/happy

Each page contains exactly one data point at #this, #word (maybe #?) etc.

Then mark it up in RDFa with the relations to the other words?
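
A quick Python sketch of what I mean by one page per word, assuming #this as the fragment for the word itself. The helper name is just for illustration.

```python
from urllib.parse import quote

BASE = "http://example.org/words/"

def word_uris(word):
    """Return (page URI, URI of the word itself) for a one-page-per-word layout."""
    page = BASE + quote(word)   # the document that gets fetched
    return page, page + "#this" # the single data point described on that page

page, thing = word_uris("happy")
```
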
Received on Wednesday, 28 March 2012 08:21:19 UTC