- From: <Patrick.Stickler@nokia.com>
- Date: Mon, 12 Nov 2001 17:25:01 +0200
- To: w3c-rdfcore-wg@w3.org
Definition of X Proposal, with examples

This is the definition of my X Proposal, as it has been named, expressed (unfortunately) in non-mathematical terms, to the best of my ability.

Although I am currently digesting the present MT and the other proposals being offered, I have tried to avoid any direct discussion of those other proposals, as I felt it would lengthen an already long document and possibly blur the boundaries between the different proposals. Certainly, this proposal has many points of intersection with the others, and those hopefully will be obvious, but this document is expressed independently of the other proposals.

I have organized the content in the following (non-conventional) manner:

I first provide a glossary of terms, so that you know right off how I am using terms that are recognizable to you but may not exactly match your expected meaning. It also provides a brief introduction to new terms that I use in this description, as a preview of things to come.

I then provide a list of assumptions, assertions, and my summary of the problem and the solution offered by this proposal, so that they do not get lost in the subsequent discussion.

This is then immediately followed by a detailed discussion of the problem space and my proposed solution, including a discussion of where URVs fit into all this.

I provide examples throughout which hopefully clearly illustrate the concepts outlined in this proposal.

It may be the case that this proposal is too radical for the present scope of our charter, and that its adoption would correspond to a new version (either major or minor) of RDF. If this is so, then I assert that (a) the data typing problem cannot be properly solved within the constraints of our present charter, in line with common interpretation of the RDF and RDFS specs, and (b) we must commence the definition of such a new version of RDF as soon as possible.
I hope that this will become evident from the discussion below.

======================================================================

GLOSSARY OF TERMS

value space
    An abstract set of entities sharing common properties (very loose definition)

value
    A member of a value space

representation space
    A set of concrete representations mapping to values in a value space which facilitate automated operations in terms of those values -- e.g. the reification of a value space within a computer system

representation
    Within a representation space, a concrete representation of a value in the corresponding value space

canonical representation space
    A representation space where each value in the value space has only one possible representation in the representation space (the internal representation space of a computer system is a canonical representation space)

lexical space
    A set of concrete lexical representations (strings) which represent values in a specific value space, defined in terms of a lexical grammar

lexical form
    Within a lexical space, a concrete lexical representation (string) of a value in the corresponding value space, which is valid according to the defined lexical grammar

canonical lexical space
    A lexical space where each value in the value space has only one possible representation in the lexical space

data type
    An explicit lexical space whose members map to values in an explicit value space

(RDF) literal
    A string

typed (RDF) literal
    A lexical form

local type
    A data type associated directly with an occurrence of a value serving as the object of a statement

global type
    A data type associated globally with all occurrences of a value serving as the object of a statement having a particular predicate (i.e.
    via an rdfs:range definition)

descriptive range
    A range definition for a particular predicate defining a global type for all values of that predicate

prescriptive range
    A range constraint for a particular predicate defining a global type which all local types for all values must be equivalent to (either identical to, or a subclass of, the defined range class)

node
    The basic construct of an RDF graph, per this proposal

node facet
    A primitive property of a graph node serving as the label of an arc

arc
    A named relation between two nodes, from the perspective of one node (source node) towards the other (target node), corresponding to a facet

LNode
    A node representing a resource labeled by an RDF Literal

UNode
    A node representing a resource labeled by a URI Reference

SNode
    A node representing an RDF Statement

BNode
    A node representing an anonymous resource with no label

qualifying statement
    A statement where the subject is represented by an SNode

statement qualification
    A limitation on the applicability of a statement for certain processes, such as scope, source, authority, or authentication

literal match
    The binding of a statement to a query where the statement and query are expressed in the same vocabulary and in terms of the same data typing scheme

inferred match
    The binding of a statement to a query where the statement and query are not expressed in the same vocabulary and/or in terms of the same data typing scheme, but which are deemed equivalent according to rdfs:subClassOf or rdfs:subPropertyOf relations between those vocabularies

level 0 graph
    A maximal representation of an X Proposal graph where every node from every statement is distinct, with no compression whatsoever

level 1 merge
    A transformation on a level 0 graph such that all UNodes with identical uriref labels, and all SNodes whose subject, predicate, and object nodes are all UNodes with respectively identical uriref labels, are merged

level 1 graph
    A graph which is derived from a level 0 graph by
    means of a level 1 merge, either virtually or destructively

======================================================================

PROPOSAL IN A NUTSHELL

Assumptions and Assertions:

The representation and interpretation of data types should be:

    a. consistent
    b. explicitly defined by the RDF specification
    c. as neutral as possible with regards to data type scheme
    d. compatible with XML Schema data types

The solution adopted must:

    a. not deviate significantly from the present specification, either with regards to XML serialization or graph representation
    b. be sufficiently future proof to allow for extension to address known or future issues with minimal impact to existing systems

No interpretation of data types will be provided by RDF. Any interpretation of RDF encoded knowledge based on a defined correlation between an RDF node and a particular data type is application specific and beyond the scope of RDF. RDF will only concern itself with the specification of relationships between nodes and types, and the preservation of such information for interpretation in contexts outside the scope of RDF, not the interpretation itself.

Typed literals constitute lexical forms within a given lexical space which map to values in a given value space. The proper interpretation of a typed literal requires both the lexical form and the identity of the lexical and value space for which the lexical form is expressed. Separation of a lexical form from either the lexical space or value space for which it was originally expressed renders it uninterpretable in a reliable manner.

The rdfs:range property may function as either prescriptive or descriptive, depending on the presence or absence of a local type for the object of a statement.

In order for rdfs:range to function prescriptively, there must be both:

    a. a range value defined for the property of a statement
    b. a local type defined for the object of the statement

In the absence of a local type, and in the presence of a range definition for a given property, the type of the object of a statement is taken to be that defined as the range of the property.

Query processes, while not explicitly defined by the RDF specification, should be taken into account with regards to the representation and interpretation of RDF encoded knowledge.

Query processes which employ inference based on rdfs:subPropertyOf relations may bind objects to predicates which are superordinate to the predicate of the original statement.

Query processes which employ inference based on rdfs:subClassOf relations may bind literals to types which are superordinate to the type originally defined for the literals.

Query processes which bind a non-locally typed literal to a superordinate predicate different from that of the original statement, and which may have a range defined which differs from the range defined for the original predicate, effectively separate the lexical form embodied in that literal from the lexical space for which it was originally expressed, rendering it uninterpretable in a reliable manner.

Query processes which bind a locally typed literal to a superordinate type different from that originally defined for the literal effectively separate the lexical form embodied in that literal from the lexical space for which it was originally expressed, rendering it uninterpretable in a reliable manner.

----------------------------------------------------------------------

Conclusions:

In the absence of a local type, range may be descriptive.

In the absence of a local type, range cannot be prescriptive.

In the presence of a local type, range may be prescriptive.

We MUST impose the requirement that all data type classes define a value space that is a proper subset of the value space of all superordinate data type classes.

We CANNOT impose the requirement that all data type classes define a lexical space that is a proper subset of the lexical space of all superordinate data type classes.

The reliable interpretation of non-locally typed literals by rdfs:range definitions requires the absolute persistent preservation of the binding between predicate and object per the original statement.

The reliable interpretation of locally typed literals requires the absolute persistent preservation of the binding between object and type per the original statement.

----------------------------------------------------------------------

Proposed Solution:

The basis for the graph representation, and all operations and interpretations, should be the explicit reification of the statement. An RDF graph should represent the statements which constitute knowledge, and the present RDF graph model should be seen as a higher level resource-centric view or interpretation of that underlying statement-centric graph.

Thus, rather than the present graph representation:

    [urn:foo] --- urn:someProperty ---> "bar"

we should have instead, for every statement, a canonical underlying representation as follows:

    [ ]
     |
     ---- ID ----------> 1
     |
     ---- type --------> SNode
     |
     ---- subject -----> [ ]
     |                    |
     |                    ------ ID ------> 2
     |                    |
     |                    ------ type ----> UNode
     |                    |
     |                    ------ label ---> <urn:foo>
     |
     ---- predicate ---> [ ]
     |                    |
     |                    ------ ID ------> 3
     |                    |
     |                    ------ type ----> UNode
     |                    |
     |                    ------ label ---> <urn:someProperty>
     |
     ---- object ------> [ ]
                          |
                          ------ ID ------> 4
                          |
                          ------ type ----> LNode
                          |
                          ------ label ---> "bar"

which can be more concisely represented as:

    [1,S]
     |
     ---- subject -----> [2,U,urn:foo]
     |
     ---- predicate ---> [3,U,urn:someProperty]
     |
     ---- object ------> [4,L,bar]

or minimally represented as

    [1,S,2,3,4]
    [2,U,urn:foo]
    [3,U,urn:someProperty]
    [4,L,bar]

This model and its graph notation, along with two possible implementational representations (in Java and Relational Tables) are described in detail below.
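
As an illustration only, the reification of a single triple into the four nodes shown above, and its rendering in the minimal notation, might be sketched as follows. The class and method names (MinimalNotationSketch, reify, render) are hypothetical and not part of the proposal, and plain integers stand in for system IDs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: nodes of the statement-centric graph and their minimal notation.
final class MinimalNotationSketch {

    static final class Node {
        final int id;
        final char kind;      // 'S', 'U', 'L' or 'B'
        final String label;   // URI reference or literal; null for S and B nodes
        final int[] spo;      // subject, predicate, object node IDs; null unless kind == 'S'

        Node(int id, char kind, String label, int[] spo) {
            this.id = id;
            this.kind = kind;
            this.label = label;
            this.spo = spo;
        }

        // Renders the node as [ID,type,...] in the minimal notation.
        String toNotation() {
            StringBuilder sb = new StringBuilder("[").append(id).append(',').append(kind);
            if (kind == 'S') {
                sb.append(',').append(spo[0]).append(',').append(spo[1]).append(',').append(spo[2]);
            } else if (label != null) {
                sb.append(',').append(label);
            }
            return sb.append(']').toString();
        }
    }

    // Reifies one triple with a literal object as an SNode plus three distinct nodes.
    static List<Node> reify(int firstId, String subjectUri, String predicateUri, String literal) {
        List<Node> nodes = new ArrayList<>();
        nodes.add(new Node(firstId, 'S', null, new int[] {firstId + 1, firstId + 2, firstId + 3}));
        nodes.add(new Node(firstId + 1, 'U', subjectUri, null));
        nodes.add(new Node(firstId + 2, 'U', predicateUri, null));
        nodes.add(new Node(firstId + 3, 'L', literal, null));
        return nodes;
    }

    // Joins the notation of all nodes with single spaces.
    static String render(List<Node> nodes) {
        StringBuilder sb = new StringBuilder();
        for (Node n : nodes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(n.toNotation());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: [1,S,2,3,4] [2,U,urn:foo] [3,U,urn:someProperty] [4,L,bar]
        System.out.println(render(reify(1, "urn:foo", "urn:someProperty", "bar")));
    }
}
```

The point of the sketch is only that the SNode carries node IDs rather than labels, so the binding between predicate and object survives as a unit.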

Again, the current RDF graph representation is merely a resource-centric logical view or interpretation of the statement-centric representation, and that statement-centric representation is the key to the data type solution and is the heart of this proposal.

The statement-centric graph representation provides the key constructs necessary for preserving the relationships between predicate and non-locally typed value, and between local type and value, such that meaningful constraints on query operations and interpretation of query results can be defined. It also provides the key construct necessary for addressing the needs of statement qualification, such as source, authority, scope, and authentication, as well as for the differentiation between general statements, asserted statements, and inferred statements.

A query on an RDF graph always matches and returns complete statements, not object values or other partial knowledge. If a statement is matched by inference, then either the original statement is returned as-is (such that all original knowledge is preserved and available for reliable interpretation), or the query engine is responsible for deriving and returning an entirely new statement from the original statement, expressed in terms of the query ontology and data type scheme, taking into account all issues relating to mapping and conversion of literals to conform to the lexical and value space of the query ontology and data type scheme (and since it has the original statement to work with, it has all the information needed for reliable interpretation).

Thus, statements are not just first-class constructs in the graph, they are the *primary* constructs of the graph and the basis for interpretation and interaction with graph encoded knowledge.

All of this is discussed in detail below.

======================================================================

DISCUSSION

Descriptive vs. Prescriptive role of rdfs:range

Given the present RDF graph model, and "standard" behavior of inference derived binding based on triples with non-locally typed literal objects, the rdfs:range property may only be safely descriptive of a literal value's data type iff RDF requires that any data type that is an rdfs:subClassOf any other data type constitute a perfect subset of both the value space and lexical space of the superordinate data type, and that any property that is an rdfs:subPropertyOf another property have a range defined that is a data type which is either equivalent to or a descendant of the range type for the superordinate property.

This ensures that if a non-locally typed literal value is bound by inference to a property superordinate to the one for which it was originally defined, any application which determines the type of that literal via the defined range for the superordinate property will be able to interpret its lexical form reliably to obtain the properly corresponding value.

If the above constraints cannot be enforced, and we continue with the present graph model where inference may separate a value from the predicate of the original statement, then rdfs:range can only serve a prescriptive purpose, to ensure that locally typed literal values correspond to the specified data type. Furthermore, it means that non-locally typed literals may not have a reliable interpretation in all inference derived contexts, and therefore it should be strongly advised to always specify the type of literals when they are defined.

It should be noted that the XML Schema simple data types do *not* conform to the above constraints, even if RDF were to impose them. Furthermore, there are likely to be numerous data type schemes which also do not conform to such tight constraints, and thus it would be imprudent and impractical to propose the adoption of such constraints.

HOWEVER, the above is only true for the present graph model...

By basing the graph model on the reification of the statement, and defining the behavior of query processes such that original statements are returned in their original state, and defining inference processes such that inferred matches return the original statement unchanged (only the match being inferred) or generate new statements derived from the original statements (including dealing with all lexical issues for the interpretation of lexical forms embodied in literals), we can ensure that even if a given statement is matched by inference, the entire original statement is returned. This provides also the original predicate by which, via its range definition, the type and lexical form of a non-locally typed literal can be properly interpreted, either by the receiving client or by the query engine itself for the purpose of deriving an inferred statement accordingly.

Thus, both data type integrity (including the reliable interpretation of untyped literals by property range) *and* the qualification of statements for scope, source, authority, authentication, etc. are addressed by the following proposed graph model, which has as its foundation the reification of the statement itself.

======================================================================

PROPOSED CANONICAL GRAPH REPRESENTATION

The following is a graph representation which is based on the reified statement as its foundational construct. The current RDF graph model may be seen as a logical view or interpretation of this proposed model, and thereby, this model does not conflict with, nor replace, the current graph model, but rather serves as a new foundational layer below it, as a basis for the MT interpretation of RDF encoded knowledge.

Types (classes) of graph nodes:

    SNode    Statement Node
    UNode    URIRef Labeled Node
    LNode    Literal Labeled Node
    BNode    Blank Node

The distinction between the types of nodes is relevant both for the allowed/required facets as well as for merge operations performed on graphs of different levels of representation (explained below).

Facets (properties) of graph nodes:

    ID           SysID
    type         (SNode|UNode|LNode|BNode)
    label        for UNode, URI Reference
                 for LNode, RDF Literal
    subject      for SNode, SysID
    predicate    for SNode, SysID
    object       for SNode, SysID

NOTE: Although facets constitute "properties" of graph nodes, they are not represented by RDF Statements, but are primitives of the underlying graph representation.

A node is required to have one and only one facet value for the properties ID and type, and may have at most one facet value for the property label.

An SNode is required to have one and only one facet value for each of the properties subject, predicate, and object.

The value of a label for a UNode must be a URI Reference.

The value of a label for an LNode must be an RDF Literal.

Thus, an RDF Statement is reified by an SNode and that reification is the basis for this revised graph model and its interpretation.

----------------------------------------------------------------------

Graph notation:

A node is represented by a comma separated sequence of ID, type, and (if present) label, surrounded by square brackets. The type is represented by an uppercase character S, U, L, or B denoting an SNode, UNode, LNode, or BNode respectively. I.e.

    '[' ID ',' [SULB] ( ',' label )? ']'

E.g.

    [1,S]
    [3,U,urn:someProperty]
    [4,L,bar]
    [9,B]

Subject, predicate, and object facets may be represented by arcs, with the facet name serving as the arc label and the arc represented by an arrow terminating in the value of the facet. E.g.

    [1,S] ---- subject -----> [2,U,urn:foo]

In cases where the graph is too large to explicitly make the connection, the node ID value can be shown instead. I.e.

    [1,S] ---- subject -----> 2

An absolute minimal representation can be provided as a list of node definitions such that for SNodes, the values of the subject, predicate, and object facets are listed by node ID in that order, and the arcs are implicit. E.g.

    [1,S,2,3,4]
    [2,U,urn:foo]
    [3,U,urn:someProperty]
    [4,L,bar]

NOTE: If UUID values (or similar) are employed as system identifiers for the values of ID facets, then knowledge encoded in this graph representation would be fully portable without modification across disparate systems and applications.

----------------------------------------------------------------------

Asserted and Inferred Statements:

At this level of representation, an SNode does not necessarily represent an asserted statement nor an explicitly defined statement (e.g. loaded from some serialized instance). Assertion and nature of definition are qualifications of the statement (statements about the statement) and such qualifications are defined in terms of RDF Statements and not in terms of graph primitives (facets of SNodes).

A statement is just a statement. Its significance, status, role, purpose, relevance, etc. in a given context must be inferred from its qualifications. This is outlined in more detail immediately below.

----------------------------------------------------------------------

Qualification of Statements

Although the issues relating to the reification and general qualification of statements have been deferred to future working groups, this proposal includes as a component a treatment of these issues as the mechanism by which statements are differentiated for assertion and inference on the basis of this reification. This same treatment also serves to address other types of statement qualification such as scope, source, authority, and authentication.

NOTE: If this treatment of statement qualification and reification is deemed acceptable to the WG, we may choose to readdress some of the recently deferred issues and outline their solution in terms of this proposed treatment. It must be stressed that, according to this proposal, the key to solving the data type problem -- namely, the reified statement construct -- is also the key to solving the statement qualification problem, thus this proposal essentially kills both birds with one stone.

For a given process or operation, relevant statements can be identified by specifying qualifications as either inclusive (only statements matching those qualifications) or exclusive (no statements matching those qualifications), and of course a combination of inclusive and exclusive qualifications can be defined.

Several examples are provided below illustrating statement qualification based on the following treatment and the proposed statement-centric graph representation.

Ontology for Statement Qualification:

    rdfq:scope             domain = rdf:Statement, range = {URI Ref}
    rdfq:source            domain = rdf:Statement, range = {URI Ref}
    rdfq:authentication    domain = rdf:Statement, range = {URI Ref}
    rdfq:attributedTo      domain = rdf:Statement, range = {URI Ref}
    rdfq:assertedBy        domain = rdf:Statement, range = {URI Ref}

The latter four qualification properties are sub-properties of rdfq:scope, which constitutes a generic qualification property.

Note that the concept of an 'inferred statement' is defined in terms of the authority which asserts the statement or to which the statement is attributed, where that authority may be the system itself or an inference agent employed by the system.

Whether a given operation wishes to include only asserted statements from trusted authorities, or also include statements attributed to trusted authorities (hearsay), or include all statements, is up to the particular application.
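
A minimal sketch of inclusive and exclusive qualification, assuming an in-memory list of reified statements, might look like the following. The Stmt class, the select method, and the sample data are hypothetical. For brevity the sketch matches qualification properties literally and ignores their sub-property relation to rdfq:scope.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of filtering statements by their qualifications.
final class QualificationFilterSketch {

    // A reified statement. When it qualifies another statement, 'about' holds that
    // statement's ID and 'subject' is null; otherwise 'about' is -1.
    static final class Stmt {
        final int id;
        final int about;
        final String subject;
        final String predicate;
        final String object;

        Stmt(int id, int about, String subject, String predicate, String object) {
            this.id = id;
            this.about = about;
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    static Stmt plain(int id, String s, String p, String o) { return new Stmt(id, -1, s, p, o); }

    static Stmt qualifier(int id, int about, String p, String o) { return new Stmt(id, about, null, p, o); }

    // True when some statement in the graph qualifies statement 'id' with the given property and value.
    static boolean isQualified(List<Stmt> graph, int id, String property, String value) {
        for (Stmt q : graph) {
            if (q.about == id && q.predicate.equals(property) && q.object.equals(value)) return true;
        }
        return false;
    }

    // Returns the IDs of statements that carry the inclusive qualification and lack the
    // exclusive one. Each argument is a {property, value} pair, or null to skip that test.
    static List<Integer> select(List<Stmt> graph, String[] include, String[] exclude) {
        List<Integer> ids = new ArrayList<>();
        for (Stmt s : graph) {
            boolean in = include == null || isQualified(graph, s.id, include[0], include[1]);
            boolean out = exclude != null && isQualified(graph, s.id, exclude[0], exclude[1]);
            if (in && !out) ids.add(s.id);
        }
        return ids;
    }

    // Illustrative data only.
    static List<Stmt> sample() {
        List<Stmt> g = new ArrayList<>();
        g.add(plain(1, "#Sky", "#is", "#green"));
        g.add(qualifier(2, 1, "rdfq:assertedBy", "#John"));
        g.add(plain(3, "#Sky", "#is", "#blue"));
        g.add(qualifier(4, 3, "rdfq:assertedBy", "#Mary"));
        g.add(plain(5, "#Grass", "#is", "#green"));
        g.add(qualifier(6, 5, "rdfq:assertedBy", "#Mary"));
        g.add(qualifier(7, 5, "rdfq:scope", "urn:bas"));
        return g;
    }

    public static void main(String[] args) {
        String[] byMary = {"rdfq:assertedBy", "#Mary"};
        String[] scopedBas = {"rdfq:scope", "urn:bas"};
        System.out.println(select(sample(), byMary, null));       // [3, 5]
        System.out.println(select(sample(), byMary, scopedBas));  // [3]
    }
}
```

Because qualifications are themselves ordinary statements in the list, the same select call could be applied to the qualifying statements too.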

It should be stressed that because qualifications are statements, and statements are always reified in this graph model, the qualifications themselves may be qualified.

NOTE: In the examples that follow, for convenience I use qualified names having xsd:, rdf:, rdfs: and rdfq: prefixes for vocabulary terms corresponding to XML Schema data types, RDF, RDFS, and the above qualification ontology. Such qnames are enclosed in curly brackets. The curly brackets are not part of the notation (which does not understand namespaces). These qnames are not to be confused with URN or URV encodings such as urn:foo or xsd:lang:en, which are complete URIs. I trust this distinction will be clear in all examples. I also employ local URI refs without expansion (e.g. #green), though in practice all URI refs should have their fully expanded representation.

Example 1: "John says that Mary says that Bob says the sky is green":

          -----------> [1,S]
          |              |
          |              ---- subject ------> [2,U,#Sky]
          |              |
          |              ---- predicate ----> [3,U,#is]
          |              |
          |              ---- object -------> [4,U,#green]
          |
    -----------------> [5,S]
    |     |              |
    |     --- subject ----
    |                    |
    |                    ---- predicate ----> [6,U,{rdfq:attributedTo}]
    |                    |
    |                    ---- object -------> [7,U,#Bob]
    |
---------------------> [8,S]
|   |                    |
|   --------- subject ----
|                        |
|                        ---- predicate ----> [9,U,{rdfq:attributedTo}]
|                        |
|                        ---- object -------> [10,U,#Mary]
|
|                      [11,S]
|                        |
------------- subject ----
                         |
                         ---- predicate ----> [12,U,{rdfq:assertedBy}]
                         |
                         ---- object -------> [13,U,#John]

Or in maximally condensed form:

    [1,S,2,3,4] [2,U,#Sky] [3,U,#is] [4,U,#green]
    [5,S,1,6,7] [6,U,{rdfq:attributedTo}] [7,U,#Bob]
    [8,S,5,9,10] [9,U,{rdfq:attributedTo}] [10,U,#Mary]
    [11,S,8,12,13] [12,U,{rdfq:assertedBy}] [13,U,#John]

Example 2: Typed, Scoped Values

Here is an example of how this same treatment provides for general qualification of statements, including statements defining scoping and data type association (here's how this proposal addresses the core problem of data typing):

          -----------> [1,S]
          |              |
          |              ---- subject ------> [2,U,#status]
          |              |
          |              ---- predicate ----> [3,U,{rdf:label}]
          |              |
          |              ---- object -------> [4,L,Status]
          |
          |            [5,S]
          |              |
          --- subject ----
                         |
                         ---- predicate ----> [6,U,{rdfq:scope}]
                         |
                         ---- object -------> [7,L,en]
                                                ^ ^
          --------------------------------------| |
          |                                       |
          |            [8,S]                      |
          |              |                        |
          |              ---- subject -------------
          |              |
          |              ---- predicate ----> [9,U,{rdf:type}]
          |              |
          |              ---- object -------> [10,U,{xsd:lang}]
          |
    -----------------> [11,S]
    |     |              |
    |     --- subject ----
    |                    |
    |                    ---- predicate ----> [12,U,{rdf:label}]
    |                    |
    |                    ---- object -------> [13,L,English]
    |
    |                  [14,S]
    |                    |
    --------- subject ----
                         |
                         ---- predicate ----> [15,U,{rdfq:scope}]
                         |
                         ---- object -------> [16,L,en]
                                                ^ ^
                                        ... ----| |
                                                  |
           ... ------> [17,S]                     |
                         |                        |
                         ---- subject -------------
                         |
                         ---- predicate ----> [18,U,{rdf:type}]
                         |
                         ---- object -------> [19,U,{xsd:lang}]

or

    [1,S,2,3,4] [2,U,#status] [3,U,{rdf:label}] [4,L,Status]
    [5,S,1,6,7] [6,U,{rdfq:scope}] [7,L,en]
    [8,S,7,9,10] [9,U,{rdf:type}] [10,U,{xsd:lang}]
    [11,S,7,12,13] [12,U,{rdf:label}] [13,L,English]
    [14,S,11,15,16] [15,U,{rdfq:scope}] [16,L,en]
    [17,S,16,18,19] [18,U,{rdf:type}] [19,U,{xsd:lang}]
    ...

NOTE: Notice the infinite recursion required for labeling and typing of locally typed literals. Furthermore, this potentially infinite body of knowledge must be defined for *every* instance of such qualification, resulting in a gross proliferation of needlessly redundant statements. See the URV example below for a better way to encode such knowledge, one which alleviates this problem without recourse to rdfs:range definitions or application specific knowledge.

----------------------------------------------------------------------

Levels of Graph Compression/Distillation

The proposed graph model includes the definition of two levels of graph representation:

Level 0: Maximal Representation
    Every node from every statement is distinct. No compression whatsoever.

Level 1: URI Ref Equivalence
    UNodes with identical uriref labels, and SNodes whose subject, predicate, and object nodes are all UNodes with respectively identical uriref labels, are merged.

There may be additional levels of graph compression, based on inference or other criteria, but those are undefined by this graph model.

An API can provide access to statements at any of the defined levels (presuming all are maintained) and the upper levels can be simulated at run time as needed. A given system, however, may choose to maintain knowledge only at a higher level (e.g. level 1), performing merge operations on insertion of statements into the system for the sake of storage efficiency, as this can reduce a graph's size considerably and the utility of a level 0 representation is limited.

The direct benefits of level 1 compression are discussed further below (though they should be immediately evident).

----------------------------------------------------------------------

Constraints on Query and Inference Behavior

A query applied to an RDF graph, based on this proposed representation, must return only SNodes, not LNodes, UNodes, or BNodes.

Statements can be filtered as needed/desired either during or after execution of a query according to any specified qualifications, such as excluding non-asserted statements or statements not having a particular scope or trusted authority.

Statements which are returned by a query are returned as originally defined. Any statements which were matched by inference rather than literal match must be returned in their original form, or may be mapped by the query API to new statements using the query vocabulary and data type schemes of the query properties, without change to the original statements. Whether the API interns the new statements in the knowledge base, or only treats them as transient statements to be discarded after returning the query results, is up to the specific implementation or process.
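
The "return the original statement" constraint can be illustrated with a small sketch, assuming a single-parent rdfs:subPropertyOf map held in memory. The names InferredMatchSketch, reaches, and query, as well as the urn:x: property names, are hypothetical; a real vocabulary may give a property several super-properties.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a query that matches by rdfs:subPropertyOf inference
// but hands back the statement exactly as originally defined.
final class InferredMatchSketch {

    static final class Stmt {
        final String subject;
        final String predicate;
        final String object;

        Stmt(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // True when 'from' equals 'target' or reaches it by following the parent map upward.
    // The hop limit guards against accidental cycles in the map.
    static boolean reaches(Map<String, String> parentProperty, String from, String target) {
        String p = from;
        for (int hops = 0; p != null && hops <= parentProperty.size(); hops++) {
            if (p.equals(target)) return true;
            p = parentProperty.get(p);
        }
        return false;
    }

    // Returns whole statements (never bare objects); matched statements are not rewritten,
    // so the original predicate, and with it the original range, stays available.
    static List<Stmt> query(List<Stmt> graph, Map<String, String> parentProperty, String predicate) {
        List<Stmt> hits = new ArrayList<>();
        for (Stmt s : graph) {
            if (reaches(parentProperty, s.predicate, predicate)) hits.add(s);
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> parentProperty = new HashMap<>();
        parentProperty.put("urn:x:ageInYears", "urn:x:age");   // ageInYears subPropertyOf age

        List<Stmt> graph = new ArrayList<>();
        graph.add(new Stmt("urn:bob", "urn:x:ageInYears", "30"));
        graph.add(new Stmt("urn:bob", "urn:x:name", "Bob"));

        for (Stmt s : query(graph, parentProperty, "urn:x:age")) {
            // The match was inferred, yet the predicate printed is the original one.
            System.out.println(s.subject + " " + s.predicate + " " + s.object);
        }
    }
}
```

A client receiving the statement can then look up the range of urn:x:ageInYears, rather than that of urn:x:age, to interpret the literal "30".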

Queries may differentiate between non-asserted statements, asserted statements, and inferred statements as needed/desired, as this distinction is just like any other statement qualification.

======================================================================

POSSIBLE IMPLEMENTATIONS

Representation 1: Relational Table Model

    Table Schema 1: Node

        Field 1: ID(UUID)
        Field 2: Type('UNode'|'LNode'|'SNode'|'BNode')
        Field 3: Label(URIREF|LITERAL|'nil')
        Field 4: Subject(UUID|nil)
        Field 5: Predicate(UUID|nil)
        Field 6: Object(UUID|nil)

----------------------------------------------------------------------

Representation 2: Linked Object Model (skeletal)

    abstract public class Node
    {
        protected UUID id;
    }

    public class SNode extends Node
    {
        protected UUID subject;
        protected UUID predicate;
        protected UUID object;
    }

    public class UNode extends Node
    {
        protected URIREF label;
    }

    public class LNode extends Node
    {
        protected LITERAL label;
    }

    public class BNode extends Node
    {
    }

It is presumed that the above object model is combined with a dictionary, map, hash table or other similar mechanism by which individual nodes can be located by either label or node ID.

======================================================================

RELATION OF GRAPH MODEL TO URV ENCODING

The use of a URI Ref to identify a resource is an implicit agreement or contract with all others making statements that everyone using that URI Ref is talking about the same 'thing'. Thus, there is the expectation that all statements relating to such a 'thing' would combine upon syndication to provide for a consolidated body of knowledge about that 'thing'.

The benefit of a "destructive" (non-virtual) level 1 merge is to substantially reduce graph real-estate where all UNodes are combined. Thus, the more UNodes that are combined, the greater the compression. And in fact, it is conceivable that a level 1 merge would be applied on all input automatically.
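
A destructive level 1 merge might be sketched as follows, assuming integer node IDs and a well-formed level 0 graph held as a list. All names are hypothetical, and the label xsd:integer:30 is only an illustrative URV modeled on the xsd:lang:en form used in this message.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a level 1 merge over a level 0 node list.
final class LevelOneMergeSketch {

    static final class Node {
        final int id;
        final char kind;      // 'S', 'U', 'L' or 'B'
        final String label;   // URI reference or literal; null for S and B nodes
        final int subject;    // facet values, meaningful only when kind == 'S'
        final int predicate;
        final int object;

        Node(int id, char kind, String label, int subject, int predicate, int object) {
            this.id = id;
            this.kind = kind;
            this.label = label;
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    static Node uNode(int id, String uriRef) { return new Node(id, 'U', uriRef, 0, 0, 0); }

    static Node lNode(int id, String literal) { return new Node(id, 'L', literal, 0, 0, 0); }

    static Node sNode(int id, int s, int p, int o) { return new Node(id, 'S', null, s, p, o); }

    static List<Node> merge(List<Node> level0) {
        Map<Integer, Node> byId = new HashMap<>();
        for (Node n : level0) byId.put(n.id, n);

        Map<Integer, Integer> canon = new HashMap<>();      // node ID -> surviving node ID
        Map<String, Integer> firstByKey = new HashMap<>();  // merge key -> first node seen

        // Pass 1: UNodes with identical labels collapse onto the first one seen.
        for (Node n : level0) {
            Integer first = n.kind == 'U' ? firstByKey.putIfAbsent("U " + n.label, n.id) : null;
            canon.put(n.id, first == null ? n.id : first);
        }
        // Pass 2: SNodes whose three facets are all UNodes collapse when the merged facets agree.
        for (Node n : level0) {
            if (n.kind != 'S') continue;
            if (byId.get(n.subject).kind == 'U' && byId.get(n.predicate).kind == 'U'
                    && byId.get(n.object).kind == 'U') {
                String key = "S " + canon.get(n.subject) + " " + canon.get(n.predicate)
                        + " " + canon.get(n.object);
                Integer first = firstByKey.putIfAbsent(key, n.id);
                if (first != null) canon.put(n.id, first);
            }
        }
        // Pass 3: keep the surviving nodes, rewriting SNode facets to surviving IDs.
        List<Node> level1 = new ArrayList<>();
        for (Node n : level0) {
            if (canon.get(n.id) != n.id) continue;
            level1.add(n.kind == 'S'
                    ? sNode(n.id, canon.get(n.subject), canon.get(n.predicate), canon.get(n.object))
                    : n);
        }
        return level1;
    }

    // Two people with the same age; the age is either a URV-labeled UNode or a plain LNode.
    static List<Node> sample(boolean ageAsUrv) {
        List<Node> g = new ArrayList<>();
        g.add(sNode(1, 2, 3, 4));
        g.add(uNode(2, "urn:bob"));
        g.add(uNode(3, "urn:age"));
        g.add(ageAsUrv ? uNode(4, "xsd:integer:30") : lNode(4, "30"));
        g.add(sNode(5, 6, 7, 8));
        g.add(uNode(6, "urn:mary"));
        g.add(uNode(7, "urn:age"));
        g.add(ageAsUrv ? uNode(8, "xsd:integer:30") : lNode(8, "30"));
        return g;
    }

    public static void main(String[] args) {
        System.out.println(merge(sample(true)).size());   // 6: predicate and age nodes are shared
        System.out.println(merge(sample(false)).size());  // 7: the two literal nodes stay distinct
    }
}
```

In this sketch the only difference between the two runs is whether the age value is a UNode, which is what lets the merge share it.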
By encoding typed data literals in URVs, all such values are able to be merged in a level 1 merge, rather than remain as locally qualified LNodes, thus achieving substantial reduction in graph real-estate. E.g. in a large knowledge base about people where individuals' ages are defined as nonNegativeInteger values, rather than have one age value node for each person, along with a complete statement qualifying that node for type, a URV encoding allows for a single UNode to be shared for each equivalent age value, for all persons having the same age. Thus, in a context of millions of persons, one could achieve substantial compression in the graph with regards to knowledge about age, without any loss of information whatsoever. Furthermore, one can more efficiently locate persons of a particular age by simply extracting all age statements with that URV as the object, thus increasing search efficiency. A query API can hide the details of the URV encoding, if so desired, and resultant level 1 merge compression by always expanding the value out to a normalized LNode with associated rdf:type qualification statement. The verbose, potentially infinite example shown above can be redefined using URVs in a more concise, finite form as follows (showing a level 1 merge compression): -----------> [1,S] | | | ---- subject ------> [2,U,#status] | | | ---- predicate ----> [3,U,{rdf:label}] | | | ---- object -------> [4,L,Status] | | [5,S] | | --- subject ---- | ---- predicate ----> [6,U,{rdfq:scope}] | | | ---- object -------> [7,U,xsd:lang:en] <-- ^ ^ ^ | . . . . . . . . . . . . . . . . . . . . . . . |.| . . . . .|. . | . 
| | | | --------------------------------------| | | | | | | | | [8,S] | | | | | | | | | ---- subject ------------- | | | | | | | ---- predicate --> [9,U,{rdf:type}] | | | | | | | ---- object ------------------------- | | | -----------------> [10,S] | | | | | | --- subject ---- | | | | | ---- predicate ----> [11,U,{rdf:label}] | | | | | ---- object -------> [12,L,English] | | | | [13,S] | | | | --------- subject ---- | | | ---- predicate ----> [14,U,{rdfq:scope}] | | | ---- object ------------------------------

Note that the knowledge below the dotted line (. . . .) is defined
globally only once for the resource xsd:lang:en, even if that resource
is used millions of times to qualify a statement. Without a means such
as URV encoding to define first class resources (with URI identity),
this knowledge would have had to be duplicated those millions of times.

Hopefully the practical benefits of URV encoding, and of this proposed
graph representation and interpretation, are clear from this example.

======================================================================

SERIALIZATION AND MAPPING TO GRAPH REPRESENTATION

Statement qualification properties can be defined as attribute values
on certain RDF/XML elements, with interpretations as follows:

   rdf:RDF              Qualifications apply to all statements in
                        instance scope
   rdf:Description      Qualifications apply to all statements in
                        description scope
   (property element)   Qualifications apply only to specific statement

All qualification property attributes may take multiple
whitespace-separated values, which are expanded into individual
qualification statements.

Example 1: Instance Level

   <rdf:RDF rdfq:scope="urn:bas">
      ...
   </rdf:RDF>

Defines the following qualifying statement for all statements in the
RDF instance:

   [#A,S]
      |
      ---- subject ----> [...]
      |
      ---- predicate --> [#B,U,{rdfq:scope}]
      |
      ---- object -----> [#C,U,urn:bas]

where #X, #A, #B, and #C are instantiated to node IDs for each
qualifying statement and the subject ID, type and (if present) label
correspond to the qualified statement.

This level of definition is especially useful for defining
qualifications for source, authentication, and authority, which
typically are shared by all statements in a given instance.

Example 2: Description Level

   <rdf:Description rdf:about="urn:boo" rdfq:scope="urn:bas">
      <x:property1 rdf:resource="urn:foo"/>
      <x:property2 rdf:resource="urn:bar"/>
   </rdf:Description>

Defines the same qualifying statement as in example 1 above for both
statements, one each for x:property1 and x:property2.

Example 3: Property Level

   <rdf:Description rdf:about="urn:boo">
      <x:property1 rdf:resource="urn:foo" rdfq:scope="urn:bas"/>
      <x:property2 rdf:resource="urn:bar"/>
   </rdf:Description>

Defines the same qualifying statement as in example 1 above, but only
for the x:property1 statement.

Example 4: Equivalence between Description and Explicit Reification

The following two serializations have identical representation in the
graph, according to this proposal:

Serialization 1:

   <rdf:Description rdf:about="urn:boo">
      <x:property rdf:resource="urn:foo" rdfq:scope="urn:bas"/>
   </rdf:Description>

Serialization 2:

   <rdf:Statement rdf:ID="X">
      <rdf:subject rdf:resource="urn:boo"/>
      <rdf:predicate rdf:resource="{x:property}"/>
      <rdf:object rdf:resource="urn:foo"/>
   </rdf:Statement>

   <rdf:Description rdf:about="#X">
      <rdfq:scope rdf:resource="urn:bas"/>
   </rdf:Description>

Graph Representation:

   ----> [1,S]
   |       |
   |       ---- subject ----> [2,U,urn:boo]
   |       |
   |       ---- predicate --> [3,U,{x:property}]
   |       |
   |       ---- object -----> [4,U,urn:foo]
   |
   |     [5,S]
   |       |
   ---- subject ---
           |
           ---- predicate --> [6,U,{rdfq:scope}]
           |
           ---- object -----> [7,U,urn:bas]

---

That's all folks...
;-)

Patrick

--
Patrick Stickler              Phone:  +358 50 483 9453
Senior Research Scientist     Fax:    +358 7180 35409
Nokia Research Center         Email:  patrick.stickler@nokia.com
Received on Monday, 12 November 2001 10:25:12 UTC