- From: Nathan <nathan@webr3.org>
- Date: Fri, 25 Feb 2011 05:01:01 +0000
- To: AWWSW TF <public-awwsw@w3.org>
- CC: Jonathan Rees <jar@creativecommons.org>
Following on from a chain of emails on the RDF-WG, Sandro just wrote up the following. At the foot of the email I've added some notes on how I personally see this in relation to what we discuss on this list.

-------- Original Message --------
Subject: [GRAPHS] g-box, g-snap, and g-text
Resent-Date: Fri, 25 Feb 2011 03:27:09 +0000
Resent-From: public-rdf-wg@w3.org
Date: Thu, 24 Feb 2011 22:25:56 -0500
From: Sandro Hawke <sandro@w3.org>
Organisation: World Wide Web Consortium (W3C)
To: public-rdf-wg <public-rdf-wg@w3.org>
References: <5AE83D03-F381-446A-BE1B-F0C520B8A887@ihmc.us> <4D6630C0.70808@vu.nl> <3E151CCF-6378-403E-BA23-D25C2D610127@ihmc.us> <4D66EDB5.5090409@webr3.org>

I'm still having trouble following the discussion due to ambiguity of terms. But I don't want us to argue about terms at this stage. So I'd like to propose some temporary terms. They are intentionally a little quirky and not suitable for use in our final specs. Instead, they are meant to be short and unambiguous and relatively memorable. At the end of this email, I try to connect them to other people's terms.

Here they are:

1. A "g-box" is a container, like a "set" data structure in programming. It holds some RDF arcs, with their nodes. (Alternatively, it holds some RDF triples.) G-boxes can overlap, sharing some of the same nodes and arcs. Two g-boxes can happen to have the same contents (right now) while being distinct g-boxes. A g-box's contents can change: today a particular g-box might contain the triples { my:a my:b _:x. my:a my:c _:x }, and tomorrow it might instead contain { my:a my:b _:x. my:a my:c2 _:x }.

2. A "g-snap" is an idealized snapshot of a g-box; it's a mathematical set of RDF arcs, with their nodes. (Alternatively, a mathematical set of RDF triples.) Like g-boxes, g-snaps can overlap, sharing nodes and arcs. Unlike g-boxes, it makes no sense to talk about g-snaps changing: they are defined to be exactly the collection of their elements.
If a g-snap were to "change" it would simply be a different g-snap. If two g-snaps have the same nodes/arcs, they are really the same g-snap. The contents of a g-box at any point in time are a g-snap.

3. A "g-text" is a particular sequence of characters or bytes which conveys a particular g-snap in some language (eg turtle or rdf/xml). If you can parse a g-text, you know what is in the g-snap it conveys (except blank nodes, as discussed below). You can tell someone exactly what is in a particular g-box at some instant by sending them a g-text. (You send them the g-text which conveys the g-snap which is the current state/contents of that g-box.)

Are those terms and descriptions clear enough? Are there edge cases they are missing?

Now, about URIs:

* A g-box can exist without any name or persistent way of referring to it; it can exist as a data structure in a running program, or I suppose it can exist in someone's mind. Long-lived g-boxes probably SHOULD be given a preferred single working URL, but there might be times when you don't want to give it any, or when you want to give it several URLs.

* You can convey a g-snap with a g-text, but I don't think you usually want to name them with URIs. Sometimes you want to put a g-snap into a URI, but that's rare, since in many cases g-snaps are too long for most URI-handling software. For constrained applications, though, where overrun is unlikely or okay, you can embed a g-text somewhere in an http URI (eg, as a query parameter), or maybe use a "data:" URI.

And blank nodes? I think it works like this:

* Two g-snaps can contain the same blank node. A simple example of this is to take a g-snap containing at least one blank node, then construct another by adding the triple { my:a my:b my:c }. The original g-snap and the one resulting from the union both contain the same blank nodes.

* By a similar argument, I believe two g-boxes can also contain the same blank node, although not all software will support this.
Given a g-box A, I could construct A' to contain whatever A contains and also { my:a my:b my:c }. This happens sometimes in real programs; I'd be curious to know which RDF APIs disallow sharing blank nodes between their graph-storage instances; my experience is they allow it when it's not a problem (eg they are both in memory right now).

* In general, while g-texts do convey g-snaps, they do not identify the blank nodes in them. So, in fact, if you go

    g-snap A --> g-text --> g-snap A'

then A = A' only if A does not contain blank nodes, because parsing a g-text results in all-new blank nodes. We might define new RDF syntaxes which allow for several g-texts to be grouped in such a way that blank nodes can be shared between them. This is an issue for our work item, "Either [the turtle] syntax or a related syntax should also support multiple graphs and graph stores."

How's that sound? Make sense?

Okay, relating to other people's terms...

"Tokens", as I read today's email, seem to mostly be g-texts but sometimes be something that can change over time, and thus be a container for a g-text, something we might call a "g-text-box". I think this latter meaning conflates things in a way which will cause problems, eg for understanding content-negotiation.

"Graphs" in the RDF Semantics are g-snaps.

"Named Graphs", as in SPARQL 1.0, are g-boxes which happen to each be assigned a URI.

"Graph Literals", as suggested by N3 (and disagreeing with Nathan, sorry), are a feature of an RDF syntax that allows you to denote a g-snap by a special kind of term (a "graph literal"). In N3, it looks like:

    { _:x my:says { _:x foaf:name "Sandro Hawke" } }.
One can approximate this with every RDF syntax by using a suitably-defined URI scheme or datatype, such as:

    { _:x my:says "_:x <http://xmlns.com/foaf/0.1/name> \"Sandro Hawke\""^^my:turtleCode }

This isn't as convenient as the N3 approach, and doesn't allow blank nodes to be shared (in the second example, the _:x's are not connected), but it does work in existing RDF syntaxes.

I'd better stop now.

================

nathan's notes:

in linked data terms, I very much see an "information resource" as being a g-box, identified by a URL (absolute-IRI with an http scheme), which when poked (when you GET it) returns back a g-snap (rdf graph) which represents the current state of the g-box/ir, encoded in a lexical data format, a g-text (a representation).

the things described by the statements in a g-box can be given a name (fragment), which when concatenated to the g-box name (URL) creates an IRI with a fragment which can be used to refer to those things (this was what I was getting at with WebNames: the g-box/ir acts as a namespace). where I believe the "error" (if we can call it that) lies in web-arch is that the following may be wrong:

"the significance of the URI with the hash on is a function of the language of document you get when you dereference the thing before the hash." [1]

and rather that it could be:

"the hash is a local name scoped within the namespace referred to by the absolute-IRI before the hash; these local names can be used by applications operating within the scope of the namespace, or in representations retrieved from the namespace where the namespace corresponds to a dereferenceable resource".
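To make the "g-box as namespace" reading concrete, here is a minimal Python sketch. It is only an illustration: the helper names `split_web_name` and `web_name` and the `http://example.org/toucans` IRI are made up for this example, and the only library call used is `urllib.parse.urldefrag` from the standard library.

```python
from urllib.parse import urldefrag


def split_web_name(iri: str) -> tuple[str, str]:
    """Split an IRI into (g-box name, local name).

    The part before the hash is read as the name of the g-box (the
    namespace); the fragment is read as a local name scoped within it.
    """
    box_iri, local_name = urldefrag(iri)
    return box_iri, local_name


def web_name(box_iri: str, local_name: str) -> str:
    """Concatenate a g-box name and a local name into an IRI with a fragment."""
    return f"{box_iri}#{local_name}"


# Hypothetical example IRI: the g-box is the document, "#this" names
# a thing described by the statements inside it.
box, local = split_web_name("http://example.org/toucans#this")
print(box)    # the g-box (what you would GET)
print(local)  # the local name, never sent to the server
```

Under this reading only the part before the hash is ever dereferenced; the local name is resolved by whoever is working inside the namespace.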
anyhow, to focus more on what we cover and the current state of affairs, the toucan problem can be seen as using an absolute-IRI to refer to both the g-box and a thing primarily described by statements in the g-box. in other cases we can see people using the absolute-IRI to refer to the g-box (ir), a g-snap (its current state) and a g-text (representation of its current state); this could be what one could term a "fixed resource". and in the conneg world we can sometimes see a "resource" (can't say it's an IR) as being a mapping to several different g-boxes (for instance when you conneg over language), or as being mapped to a single g-snap at a specific point in time, which can be transferred using conneg'd g-texts (same value at time t, different representations of that value available).

I'd suggest that the ambiguity of IRs in the RDF world could be cleared up by saying that absolute-IRIs of the http variety always refer to g-boxes and never the things described by the statements within a g-box. That g-snaps are never identified by a URI by default (unless it's a data uri encoding a graph literal), and likewise that g-texts are always unnamed by default.

To scope out from sem-web land, the notion of a "box" ties in nicely with REST imho: a "rest:resource" is a "box" which has a set of values (states) over time; at a single instant the value of that box is a "snap" (snapshot of its state), and this is encoded in "text" (a representation) to be transferred over the wire. The box can be seen as a magic box, because it may hold a single thing for as long as the box exists (a static image for instance) or it may hold something that does things (a web application for instance) - so the absolute-IRI names the "box" and the fragments are for use by things in that box (like a web application with recomposable states, a document with named sections, a video with cues, or a description which refers to concepts).

To quote sandro, "I'd better stop now."!
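For anyone who finds code easier to follow than prose, the box/snap/text distinction can be sketched roughly as below. This is a toy under stated assumptions, not an RDF implementation: `BNode`, `GBox`, `to_gtext` and `parse_gtext` are hypothetical names, terms are plain strings without spaces, and the line-based "g-text" is only loosely modelled on N-Triples. It shows a mutable box, an immutable snap, and why a round trip through text loses blank-node identity.

```python
import itertools


class BNode:
    """A blank node: its identity is the object itself, not any label."""


class GBox:
    """A g-box: a mutable container of triples."""

    def __init__(self, triples=()):
        self._triples = set(triples)

    def add(self, triple):
        self._triples.add(triple)

    def discard(self, triple):
        self._triples.discard(triple)

    def snap(self):
        """Return the current contents as an immutable g-snap."""
        return frozenset(self._triples)


def to_gtext(snap):
    """Serialize a g-snap to a toy g-text, one triple per line.

    Blank nodes get throwaway labels (_:b0, _:b1, ...) that are only
    meaningful inside this one g-text.
    """
    labels = {}
    counter = itertools.count()

    def term(t):
        if isinstance(t, BNode):
            if t not in labels:
                labels[t] = f"_:b{next(counter)}"
            return labels[t]
        return t

    lines = [" ".join(term(t) for t in triple) + " ." for triple in snap]
    return "\n".join(sorted(lines))


def parse_gtext(text):
    """Parse a toy g-text back into a g-snap, minting fresh blank nodes."""
    fresh = {}

    def term(token):
        if token.startswith("_:"):
            if token not in fresh:
                fresh[token] = BNode()
            return fresh[token]
        return token

    triples = set()
    for line in text.splitlines():
        s, p, o = line.rstrip(" .").split()
        triples.add((term(s), term(p), term(o)))
    return frozenset(triples)


x = BNode()
box = GBox({("my:a", "my:b", x), ("my:a", "my:c", x)})
today = box.snap()
box.discard(("my:a", "my:c", x))
box.add(("my:a", "my:c2", x))
tomorrow = box.snap()
assert today != tomorrow  # the box changed; neither snap did

ground = frozenset({("my:a", "my:b", "my:c")})
assert parse_gtext(to_gtext(ground)) == ground      # no blank nodes: A = A'
assert parse_gtext(to_gtext(tomorrow)) != tomorrow  # blank nodes: A != A'
```

The two snaps taken from the same box share the blank node `x`, which matches Sandro's point that g-snaps (and g-boxes held in the same program) can share blank nodes, while anything that has been through a g-text cannot.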
is that making sense though?

[1] http://lists.w3.org/Archives/Public/www-tag/2002Mar/0150.html

Best,

Nathan
Received on Friday, 25 February 2011 06:02:12 UTC