Comments in bold green are stylistic and from bwm. Suggested replace text has no emphasis.
In general, there is a lot of redundancy in some areas of the text. I've taken a fairly ruthless green pen to it.
Section 2 needs a fair amount of work refining the text. I'd have preferred to suggest alternatives, but there is just too much to do in the time I have today.
Copyright ©2002 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
The Resource Description Framework (RDF) is a data format for representing metadata about Web resources, and other information.
Align with similar statements in other docs. See suggestion.
This document defines the abstract graph syntax on which RDF is based, and which serves to link its XML serialization to its formal semantics. It also describes some other technical aspects of RDF that do not fall under the topics of formal semantics, XML serialization syntax or RDF schema and vocabulary definitions (which are each covered by a separate document in this series). These include: discussion of design goals, meaning of RDF documents, key concepts, datatyping, character normalization and handling of URI references.
Suggest: This document defines the abstract syntax on which RDF is based, and which serves to link its various concrete syntaxes to its formal semantics. It also includes discussion of design goals, meaning of RDF documents, key concepts, datatyping, character normalization and handling of URI references.
This is a W3C RDF Core Working Group Working Draft produced as part of the W3C Semantic Web Activity (Activity Statement).
This document is being released for review by W3C Members and other interested parties to encourage feedback and comments, especially with regard to the sections on datatyping and how the changes affect existing implementations and content.
This is a public W3C Working Draft and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use W3C Working Drafts as reference material or to cite as other than "work in progress". A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.
In conformance with W3C policy requirements, known patent and IPR constraints associated with this Working Draft are detailed on the RDF Core Working Group Patent Disclosure page.
Comments on this document are invited and should be sent to the public mailing list www-rdf-comments@w3.org. An archive of comments is available at http://lists.w3.org/Archives/Public/www-rdf-comments/.
The Resource Description Framework (RDF) is a data format for representing metadata about Web resources, and other information.
Align with other docs as before.
This document defines the abstract graph syntax on which RDF is based, and which serves to link its XML serialization to its formal semantics. It also describes some other technical aspects of RDF that are not covered by separate normative documents in this series.
The normative documentation of RDF falls broadly into the following areas:
Suggest:
This document defines the abstract syntax on which RDF is based, and which serves to link its various concrete syntaxes to its formal semantics. This document discusses RDF's design goals, the meaning of RDF documents, its key concepts and various technical aspects such as character normalization and handling of URI references.
[[[NOTE: it is anticipated that some of the material in this document may be moved to other documents as part of the document review process.]]]
In section 2, some background to the design goals and rationale of RDF is presented. There is also some discussion of the intended implications of publishing an RDF document (section 2.3).
RDF is based on a graph syntax, which is typically serialized using XML (but which is quite distinct from XML's tree-based infoset [XML-INFOSET]). This graph syntax captures the fundamental structure of RDF, independently of any serialization syntax that may be used. The formal semantics of RDF are defined in terms of the graph syntax. The graph syntax is defined in section 3 of this document.
Suggest: RDF's abstract syntax is a graph which is typically serialized using XML (but which is quite distinct from XML's tree-based infoset [XML-INFOSET]). The abstract syntax captures the fundamental structure of RDF, independently of any concrete syntax used for serialization. The formal semantics of RDF are defined in terms of the abstract syntax. The abstract syntax is defined in section 4 of this document.
Note section number.
Section 4 presents some other technical issues that don't clearly fall into any of the more explicit areas noted above.
Suggest: Section 5 presents some other technical issues.
Note section number.
RDF uses well established ideas from various data and knowledge representation communities, with recognizable relationships to Conceptual Graphs, logic-based knowledge representation, frames, and relational databases [Sowa] [CG] [KIF] [Hayes] [Luger] [Gray].
Suggest delete para and references. I don't recognise such relationships. This feels a bit like name dropping, quoting references to august publications to gain gloss by association.
RDF is a member of the family of languages that use XML, which in turn provides a syntactic framework for representing documents and other information. It has a simple graph-based data model and formal semantics with a rigorously defined notion of entailment, which in turn provides a basis for well founded deductions in RDF data.
Suggest delete para. There is not much new information here. We've already said there is an XML serialization. We don't need to say what XML is for. We've already said it has a graph-based syntax. The term 'data model' is not defined and is rather woolly. All that's left is the formal semantics, and we can leave that to the model theory.
The real value of RDF comes not so much from any single application, but from the possibilities for sharing data between applications. The value of information thus increases as it becomes accessible to more and more applications across the entire Internet.
Suggest delete para. It's a bit lonely without the others and isn't saying much.
The development of RDF has been motivated by the following uses, among others:
Suggest 'models' rather than 'formats'
The design of RDF is intended to meet the following goals:
RDF has a simple data model that is easy for applications to process and manipulate. The data model is independent of any specific serialization syntax.
NOTE: the term "model" used here in "data model" has a completely different sense to its use in the term "model theory". See the RDF model theory specification [RDF-SEMANTICS] or a textbook on logical semantics (e.g., [HUNTER] [DAVIS]) for more information about what logicians call "model theory".
RDF has a formal semantics which provides a sound basis for reasoning about the meaning of an RDF expression. In particular, it supports rigorously defined notions of entailment which provide a basis for defining reliable rules of inference in RDF data.
The vocabulary is fully extensible, being based on URIs with optional fragment identifiers (URI references, or URIrefs). URIrefs are used for naming all kinds of things in RDF data. The only other kind of label that appears in RDF data is a literal string.
[[[Review this on resolution of datatypes issues]]]
RDF has a recommended XML serialization form [RDF-SYNTAX], which can be used to encode the data model for exchange of information between applications.
RDF can be used with XML schema datatypes [XML-SCHEMA2], thus assisting the exchange of information between RDF and other XML applications.
[[[Review this on resolution of datatypes issues]]]
To facilitate operation at Internet scale, RDF is an open-world framework that allows anyone to make simple assertions about anything. In general, it is not assumed that all information about any topic is available. A consequence of this is that RDF cannot prevent anyone from making assertions that are nonsensical or inconsistent with the world as people see it, and applications that build upon RDF must find ways to deal with incomplete and conflicting sources of information. (This is where RDF departs from the XML approach to data representation, which is generally quite prescriptive and aims to present an application with information that is well-formed and complete for the application's needs.)
But what constitutes a "simple fact"? Roughly, the kind of information that can be stored in a relational database, possibly about any nameable thing or concept.
The basic building block of RDF is a statement, which is a binary relational assertion. For example, the expression "floats(oil,water)" is a binary relational assertion expressing that oil floats on water. The term "floats" names a relationship that holds between "oil" and "water". An RDF statement can also contain a variable; e.g., as in "floats(?x,water)" expressing that there is something that floats on water, where "?x" stands for the something, without saying what it is.
Conjunction (logical-AND) of statements can be used to express more complex facts, such as "floats(oil,water) AND burns(oil,air)". Using the same variable in several different statements of a conjunction can say more than one might immediately expect, e.g. "type(?x,fluid) AND floats(?x,water)" says there is a fluid that floats on water.
Relationships involving more than two things can be expressed as a conjunction of binary relations, so "boilsAt(water,100C,1atm)" could be expressed as the existence of a boiling event, say "?b", such that "boils(?b,water) AND temp(?b,100C) AND press(?b,1atm)".
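The three paragraphs above (binary statements, variables, and conjunction) can be illustrated with a minimal Python sketch. The tuple layout `(relation, subject, object)`, the `?` prefix convention, and the function names are assumptions made for illustration only; they are not part of the draft. The sketch reads a conjunction containing variables as holding when some binding of the variables turns every statement into a known fact.

```python
# Hypothetical representation: a statement is a (relation, subject, object) tuple.
# A term that starts with "?" is treated as a variable, as in "floats(?x,water)".

def is_variable(term):
    return term.startswith("?")

# floats(oil,water) AND burns(oil,air) AND type(oil,fluid)
facts = {
    ("floats", "oil", "water"),
    ("burns", "oil", "air"),
    ("type", "oil", "fluid"),
}

# boilsAt(water,100C,1atm) rewritten as binary relations about a boiling event ?b
boiling_event = [
    ("boils", "?b", "water"),
    ("temp", "?b", "100C"),
    ("press", "?b", "1atm"),
]

def satisfiable(patterns, known, binding=None):
    """Return True if some variable binding makes every pattern a known fact."""
    binding = dict(binding or {})
    if not patterns:
        return True
    (rel, subj, obj), rest = patterns[0], patterns[1:]
    for (k_rel, k_subj, k_obj) in known:
        if k_rel != rel:
            continue
        trial = dict(binding)
        matched = True
        for term, value in ((subj, k_subj), (obj, k_obj)):
            if is_variable(term):
                # A variable must keep the same value across the whole conjunction.
                if trial.setdefault(term, value) != value:
                    matched = False
                    break
            elif term != value:
                matched = False
                break
        if matched and satisfiable(rest, known, trial):
            return True
    return False

# "type(?x,fluid) AND floats(?x,water)": there is a fluid that floats on water.
fluid_floats = [("type", "?x", "fluid"), ("floats", "?x", "water")]
```

Here `satisfiable(fluid_floats, facts)` holds because binding `?x` to `oil` satisfies both statements, while `boiling_event` is not satisfiable against `facts` because no boiling event is recorded there.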
The expressive power of RDF corresponds to the existential-conjunctive (EC) subset of first order logic [Sowa]. It does not provide means to express negation (NOT) or disjunction (OR). RDF is unusual, for a first order logic subset, in that it allows statements to be made about the relation terms themselves, e.g. "type(floats,physical-relationship) and floats(oil,water)". This kind of expression is more commonly associated with higher order logics, but the use allowed by RDF has first-order semantics.
Through its use of extensible URI-based vocabularies, RDF aims to provide for universal expression of facts about arbitrary subjects; i.e. assertions of named properties about specific named things. A URI can be constructed for any thing that can be named, so RDF facts can be about any such things. And, as noted above, RDF also provides for expression of assertions about unnamed things, which may be fully identifiable in terms of such assertions [TAP-RBD].
RDF itself does not provide the machinery of inference, but provides the raw data upon which such machinery can operate. The Web Ontology Language [OWL] allows more expressive expressions to be built on the basic capabilities of the RDF core language.
This whole section seems inappropriate. This section is supposed to be about design goals. The design goal is to have a simple expressive model for representing information.
Spelling: 'agreements'
RDF is intended to convey assertions that are meaningful to the extent that they may, in appropriate contexts, be used to express the terms of binding agreements.
This goal is explored further in section 2.3 below.
The RDF specification emphasizes the formal structure and meaning of RDF. But there is also a social dimension that is easily overlooked when dealing with such formal aspects.
RDF is a language designed to support the Semantic Web, in much the same way that HTML is the language that supports the original Web. The Semantic Web aims for data to be shared and processed by automated tools as well as by people. To serve this purpose, formal meanings of RDF statements must be defined in a precise manner; this is provided by the RDF Model Theory [RDF-SEMANTICS].
Model-theoretic semantics assumes that the expressions of a language refer to a world but may not describe it completely. Asserting a sentence in a language, on this view, imposes a constraint on the possible ways the world can be: it has at least to make the sentence true. As more knowledge is added - more assertions are made - the collection of possible worlds is reduced. The basic relationship between language and reality is that the more sentences are asserted, the more tightly reality is circumscribed, if none of those assertions are to come out false. The main purpose of the semantics is to make this intuition precise by giving a simple mathematical description of what counts as a 'possible way the world can be' - often simply called a 'possible world' - and of how such a possible world determines the truth-values of sentences. This combination of a world and a mapping between the language and the world is called an interpretation, so model theory might be better called 'interpretation theory'.
The chief utility of such a semantic theory is not to suggest any particular processing model, or to provide any deep analysis of the nature of the things being described by the language (in our case, the nature of resources), but rather to provide a technical tool to analyze the semantic properties of proposed operations on the language; in particular, to provide a way to determine when they preserve meaning.
The RDF model theory treats RDF as a simple assertional language, in which each triple makes a distinct assertion, and the meaning of any triple is not changed by adding other triples. Based on the semantics defined in the model theory, it is simple to translate an RDF graph into a logical expression with essentially the same meaning.
The RDF model theory is couched in the language of set theory simply because that is the normal language of mathematics - for example, the model theory assumes that names denote things in a set IR called the 'universe' - but the use of set-theoretic language is not supposed to imply that the things in the universe are set-theoretic in nature.
We don't need all this stuff about model theory here. This duplicates what is in the model theory document. Suggest delete most of this section and leave just enough to contrast with social meaning which is the main point we have to make, e.g.
There are two aspects to the meaning of an RDF document. There is the formal meaning of the document as determined by the RDF model theory [ref]. This determines, with mathematical precision, all the conclusions that can be legitimately drawn from an RDF document. There is also the social meaning of the document. It is the social meaning that affects what the document means to people and how it interacts with human social institutions such as our systems of law.
Also the above uses the term 'triple', which has not been introduced yet. We need some introduction to RDF concepts before we can start talking about their meaning, formal or otherwise. What we have is currently buried in a section on goals. Need a section on RDF concepts. Ah - I see it comes later. Maybe some reordering will fix this.
RDF/XML documents, i.e. encodings of RDF graphs, can be used to make representations of claims or assertions about the 'real' world. But not every RDF/XML document does so.
What is the word 'representations' doing here? Suggest:
RDF/XML documents can be used to make assertions about the 'real' world, but not all RDF/XML documents do so. Consider a document published on the web. It is making assertions. However, the same document, sent in an attachment to an email message in which the author of the message made it clear that he did not believe the contents of the attachment, is not making any assertions.
While the formal semantics of an RDF statement (triple) is that of a distinct assertion, individual RDF statements may convey meaning that is partly determined by the circumstances in which they are used. For example, in English, a statement "I don't believe that George is a clown" contains the words "George is a clown", which, considered in isolation, have the form of an assertion that George exhibits certain comic qualities. However, considering the whole sentence, no such assertion is considered to be made.
Similarly, a collection of RDF statements having an assertional form could be presented in a context that they are not understood to be stating a truth. Thus, there is a distinction between RDF expressions that are asserted, and those that are not.
When an RDF graph is asserted in the web, its publisher is saying something about their view of the world. Such an assertion should be understood to carry the same social import and responsibilities as an assertion in any other format. A combination of social (e.g. legal) and technical machinery (protocols, file formats, publication frameworks) provide the contexts that fix the intended meanings of the vocabulary of some piece of RDF, and which distinguish assertions from other uses (e.g. citations, denials or illustrations).
The technical machinery includes protocols for transferring information (e.g. HTTP, SMTP) and file formats for encapsulating and labelling information (e.g. MIME, XML). A media type, application/rdf+xml [RDF-MIME-TYPE] is being registered for indicating the use of RDF/XML as distinct from some other XML that happens to look like RDF. Issuing an HTTP GET request and obtaining data with a "200 OK" response code is a technical indication that the received data was published at the request URI; but data received with a "404 Not found" response cannot be considered to be similarly published information.
The social machinery includes the form of publication: publishing some unqualified statements on one's World Wide Web home page would generally be taken as an assertion of those statements. But publishing the same statements with a qualification, such as "here are some common myths", or as part of a rebuttal, would likely not be construed as an assertion of the truth of those statements. Similar considerations apply to the publication of assertions expressed in RDF.
Noting that there is no single human opinion about the truth of some statements, the graph may further contain commentary for human interpreters to indicate the realm of human interpretation that should be applied. This means a graph may contain "defining information" that is opaque to logical reasoners. This information may be used by human interpreters of RDF information, or programmers writing software to perform specialized forms of deduction in the Semantic Web.
When a user invokes an application that uses RDF, there is also a social and technical context of invocation that determines some set of RDF assertions that will be assumed to be true: the application itself, and any RDF files that are passed to it. Garbage-in, garbage-out applies: if the initial assumed facts are wrong or meaningless, the results will have little value. No specific mechanisms for deciding or evaluating the validity of any such assertions are defined here.
Note: I suggest the text of this section could be shorter and clearer. I realise that's not very helpful, but I don't have time to compose alternate text now. This is just a placeholder to come back to it.
Using RDF, 'received meaning' can be characterized as the social meaning of any logical consequences. If you publish a graph G and G logically entails G', and we interpret G' using the same social conventions that everyone agrees could be reasonably used to interpret G, then you are asserting that content of G' as well.
Human publishers of RDF content commit themselves to the mechanically-inferred social obligations. The machines doing the inferences aren't expected to know about all these social conventions and obligations.
Imagine three websites each publishing some RDF:
(A) http://insult.com/lexicon# asserts the following, and this is all that one can find on the website about that term:
A:Clown | rdf:type | rdfs:Class .
A:Clown | rdfs:comment | "A foolish person, whose pronouncements are probably ill-considered and not to be taken seriously" .
(B) http://AngloSaxon.org/lexicon# asserts:
B:Comic | rdfs:subClassOf | <http://insult.com/lexicon#Clown> .
(C) The third website asserts:
C:JohnSmith | rdf:type | <http://AngloSaxon.org/lexicon#Comic> .
Now, it follows by the formal RDF model theory that these three together entail:
C:JohnSmith | rdf:type | <http://insult.com/lexicon#Clown> . |
which the person identified as C:JohnSmith might reasonably consider an insult. Why? Not because of the RDF model theory, which merely says he is in some class about which nothing can be formally inferred. However, the rdfs:comment associated with that class name by the owner of that name provides the insulting content, in the social context of web publication, even though it cannot be formally inferred via the RDF inference rules.
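The entailment in this example can be reproduced with a small Python sketch. It applies only one rule (if x has type C and C is a subclass of D, then x has type D), which is an assumption made to keep the example short; it is not a full implementation of the RDF model theory. The tuple layout and the function name are likewise illustrative.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

CLOWN = "http://insult.com/lexicon#Clown"
COMIC = "http://AngloSaxon.org/lexicon#Comic"
JOHN = "C:JohnSmith"

# The union of what sites A, B and C publish, as (subject, predicate, object).
graph = {
    (CLOWN, RDF_TYPE, "rdfs:Class"),                                  # from A
    (CLOWN, "rdfs:comment", "A foolish person, whose pronouncements "
                            "are probably ill-considered and not to "
                            "be taken seriously"),                    # from A
    (COMIC, SUBCLASS_OF, CLOWN),                                      # from B
    (JOHN, RDF_TYPE, COMIC),                                          # from C
}

def type_closure(triples):
    """Apply (x type C), (C subClassOf D) => (x type D) until nothing new appears."""
    inferred = set(triples)
    while True:
        new = {
            (x, RDF_TYPE, d)
            for (x, p1, c) in inferred if p1 == RDF_TYPE
            for (c2, p2, d) in inferred if p2 == SUBCLASS_OF and c2 == c
        } - inferred
        if not new:
            return inferred
        inferred |= new

entailed = type_closure(graph)
```

The only triple added is the one that types `C:JohnSmith` as a Clown. The comment string is carried along unchanged and plays no part in the inference, which matches the point that the insulting content is social rather than formal.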
But who has insulted the identified person? A merely defined the term; B doesn't mention him in particular, so even A and B together do not constitute a personal insult. And C might argue that although he refers to the person, he only asserts that he is a comic, which is not in itself grounds for a libel suit. However, one could reasonably claim that C is to blame, since C uses not a generic term 'Comic', but a particular uriref which is defined by its owner (B) in a way which is clearly insulting, since B in turn explicitly refers to, and uses, the term defined by A. Thus, C's use of a B-defined term suggests a clear intent by C to convey a meaning defined by B, by virtue of a definition by A, which is insulting.
Note that this argument depends on another social convention of RDF, which is that URIs 'belong to' somebody who has authority and responsibility for defining their meanings. By using the specific name http://AngloSaxon.org/lexicon#Comic instead of some term defined in, say, a glossary of job descriptions, B has explicitly removed his use of the term 'Clown' from any formal connection with people who are entertainers. In order to succeed in his probable intent of making a generic slander against these people, B should have used a term that was defined by someone else, such as:
<http://www.entertainers.com/glossary#Comic> | rdfs:subClassOf | <http://insult.com/lexicon#Clown> .
and then if C had also used this first uriref, then in spite of a similar formal inference chain generating the insulting conclusion about C:JohnSmith, there would be nobody to sue, since now C would indeed have simply made a harmless observation about his occupation, and B's assertion, while indeed arguably offensive, makes no reference to him in particular.
The point of this example is to emphasize that publication of RDF, when considered as a social act, constitutes a publication of some content which is defined by whatever normal social conditions are used by the publishers of any terms in the RDF to define the meanings of those terms, even if those meanings and definitions are not accessible to the formal semantics of RDF; and, moreover, those meanings are preserved under any formally sanctioned inference processes. In a nutshell, the formal entailments of social meanings are themselves part of the social meaning.
[[[NOTE: we're not sure if this final bit is helpful. Jeremy thinks it may thwart some intended OWL inferences. Is this getting too far into murky legalistic territory?]]]
Note that we cannot use a single notion of 'meaning' to say this properly, since of course the formal entailments cannot themselves utilize the social aspects of meaning which are included in informal aspects of the publication, such as the fact that the use of 'Clown' is insulting, which is only mentioned in a comment which is opaque to any likely RDF inference engine or machine processor. Social meanings can be, as it were, transferred or carried by formal entailments, but they cannot be incorporated into the formal entailments. To emphasize this, suppose that B had failed to use rdfs:subClassOf and instead had tried to use his own term:
B:Comic | B:oneOfThem | <http://insult.com/lexicon#Clown> .
B:oneOfThem | rdfs:comment | "This means the same as rdfs:subClassOf" .
then in spite of the clear social meaning of the comment, there would be no formal inference path from this, taken with the A and C publications, to anything that could be found insulting; so even if C had intended to bad-mouth the person C:JohnSmith, B's stupidity would have thwarted him.
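A minimal sketch of why the home-made predicate blocks the inference: a checker that only follows rdf:type and rdfs:subClassOf never looks at B:oneOfThem, whatever its comment says. The function name and the tuple layout are assumptions for illustration, and the checker covers only this one kind of inference.

```python
CLOWN = "http://insult.com/lexicon#Clown"
COMIC = "http://AngloSaxon.org/lexicon#Comic"
JOHN = "C:JohnSmith"

def entails_type(triples, instance, cls):
    """True if instance reaches cls via rdf:type followed by rdfs:subClassOf links."""
    pending = [o for (s, p, o) in triples if s == instance and p == "rdf:type"]
    seen = set()
    while pending:
        current = pending.pop()
        if current == cls:
            return True
        if current in seen:
            continue
        seen.add(current)
        pending.extend(
            o for (s, p, o) in triples
            if s == current and p == "rdfs:subClassOf"
        )
    return False

# B uses his own term, explained only in a comment.
with_own_term = {
    (COMIC, "B:oneOfThem", CLOWN),
    ("B:oneOfThem", "rdfs:comment", "This means the same as rdfs:subClassOf"),
    (JOHN, "rdf:type", COMIC),
}

# B uses rdfs:subClassOf.
with_subclass = {
    (COMIC, "rdfs:subClassOf", CLOWN),
    (JOHN, "rdf:type", COMIC),
}
```

With `with_subclass` the checker finds the path from `C:JohnSmith` to Clown; with `with_own_term` it does not, because the comment on `B:oneOfThem` is just a string to the checker.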
Note: I'm worried about the volume of text here. If it's at all possible, this should be much shorter. This is a difficult area and the more we write the more we are likely to cause problems. Isn't it enough just to say that if you assert a graph, then you also assert all its entailments as defined by the model theory?
An RDF predicate is defined by an authoritative body implicit in its URI, and misuse by others should not be permitted to undermine that authority.
I don't buy 'implicit in its URI'. RDF assumes that for any URI some individual or organisation has the authority to define the meaning of that URI. But that is not implicit in the URI. We have social mechanisms to determine who is responsible.
Further, information about the meaning of a statement is primarily dependent on the predicate used. The subject and object contribute to the meaning of a statement according to the definition of the predicate, but cannot change the predicate's fundamental meaning. For example, suppose some authority defines the predicate "floats(x,y)" to mean that x floats on y, so that "floats(oil,water)" means that oil floats on water. It is not legitimate for some other party to assert that "floats(water,wood)" means that wood floats on water because they think the original definition doesn't make sense when wood is used as the second parameter.
The RDF core language provides a way to make simple formal assertions, with no way to formally express allowable inferences beyond those entailments that are defined by the RDF formal semantics [RDF-SEMANTICS].
Many inferences are performed by processes, embedded in software implementations, whose validity is not formally demonstrable, and must be assumed or trusted to be socially acceptable. Semantic web languages layered on RDF, such as OWL [OWL], give formal expression to allowable inferences, thus to enable provable deductions by generic software modules to replace some of the individual, ad-hoc implementations.
I have no idea what this is trying to say. It seems to say that any inference done by software MUST be assumed to be valid or trusted to be socially acceptable!
RDF uses the following key concepts:
The underlying structure of any RDF expression [bwm: where did this term come from?] is a directed labelled graph (or multigraph), which consists of nodes and labelled directed arcs that link pairs of nodes (these notions are defined more formally in section 3). The formal semantics for RDF is defined in terms of this graph syntax. [bwm: This is getting a bit repetitive.] An RDF expression is sometimes [bwm: sometimes?] called an RDF graph. The graph can conveniently be represented as a set of triples, where each triple contains two node labels and an arc label: [bwm: this brings up bnode labels. We don't want to go there. Suggest delete sentence.]
Maybe 'arc' is better than 'property arc'?
Each arc corresponds to a statement that asserts a relationship between the nodes that it links. [bwm: Well, they are not all asserted in all contexts. Suggest: Each arc represents a relationship between the nodes that it links.] All the arcs labelled with the same predicate thus form a single dyadic relationship, and the RDF graph can be informally viewed as a set of graphs, one for each relation, overlaid on top of one another with the labels showing which relation each arc has been derived from (cf. [Sowa2]). [bwm: Where did that come from and why do we need it? Let's not put unnecessary stuff in here. The more we write the more opportunity there is for confusion and error.] The meaning of an RDF graph is the conjunction (i.e. logical AND) of all the statements that it contains.
Need to introduce the notion of subject, predicate and object in the text.
Nodes in an RDF graph are labelled with URIs with optional fragment identifiers (URI references, or URIrefs), literal strings, or nothing at all. [bwm: Datatype values?] Arcs are labelled with URIrefs. (See [URIS], section 4, for a description of URI reference forms, noting that relative URIs are not used in an RDF graph. See also section 3.1.)
Why the big bold fonts for the new terms? Is this your choice or W3C guidance? I would have thought <em>...</em> would be enough.
The label on a node indicates what that node is meant to represent. The label on an arc names the relationship that is asserted to hold between the nodes connected by that arc. [bwm: Suggest: The label on a node identifies what that node represents. The label on an arc identifies the relationship between the nodes connected by the arc.] Some URIrefs may indicate web resources, and a node thus labelled denotes that resource. Other URIrefs may represent abstract ideas or values rather than a retrievable Web resource. [bwm: Implying that nodes so named don't denote those abstract ideas or values?] RDF thus leverages the universal naming space of URIs [URIS]. [bwm: Suggest delete last sentence. It adds nothing.]
Literals are used to indicate values in an RDF graph by means of a lexical representation. They are typically used to represent things like numbers, strings and other common kinds of value. Anything that can be represented by a literal could also be represented by a URI, but it is often more convenient and/or intuitive to use literals.
Suggest: Literal labels are used to identify values such as numbers and dates.
Literals may appear in the object position of RDF statements, and nowhere else. [bwm: Well, they can appear in this document! Suggest: A literal may label the object of an RDF statement but not the subject or the arc.]
Literals may be typed or untyped. The datatype, [bwm: What datatype? Nothing has been said about datatypes yet.] if any, provides a means of interpreting the lexical form, and hence of finding the value denoted by the literal. [bwm: Suggest: delete last sentence.]
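The labelling rules described so far (nodes carry a URIref, a literal, or no label; arcs carry a URIref; literals appear only as objects) can be written down as a small check. This is a sketch under assumptions: the class names `URIRef`, `Literal` and `Blank`, the `example.org` URIs, and the function `is_valid_triple` are all invented for illustration and are not defined by the draft.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class URIRef:
    value: str

@dataclass(frozen=True)
class Literal:
    text: str

@dataclass(frozen=True)
class Blank:
    """A node with no label; local_id is only a handle used inside this sketch."""
    local_id: int

def is_valid_triple(subject, predicate, obj):
    """Check the positions in which each kind of label is allowed."""
    return (
        isinstance(subject, (URIRef, Blank))             # never a literal
        and isinstance(predicate, URIRef)                # arcs are always URIrefs
        and isinstance(obj, (URIRef, Blank, Literal))    # literals allowed here only
    )

oil = URIRef("http://example.org/terms#oil")
floats = URIRef("http://example.org/terms#floats")
water = Literal("water")
```

With these definitions, `is_valid_triple(oil, floats, water)` is accepted, while swapping the literal into the subject or arc position is rejected.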
An untyped literal is a string combined with a (possibly empty) language identifier. This should be used for plain text in a natural language. As recommended in the RDF formal semantics [RDF-SEMANTICS], these untyped literals are self-denoting, i.e. their formal meaning is that they mean what they are.
A single datatype is predefined as part of this recommendation:
The notion of what a datatype is has not yet been introduced. Move this bit later.
rdfs:XMLLiteral
Other datatypes are provided by XML Schema datatypes [XML-SCHEMA2].
A typed literal additionally has an associated datatype URI that identifies a datatype.
Further information about literals may be found in section 4 below on graph syntax. [bwm: Suggest delete: unnecessary.]
Datatyping in RDF is the use of a datatype to associate a lexical form with a denoted value.
For example, the datatype xsd:integer might be used to associate the numeral string "10" with the integer value 10, or the datatype xsd:date might be used to associate the string "2002-09-30" with the 30th day of September in the year 2002 according to the Gregorian calendar.
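The two examples above can be mimicked in Python as a rough sketch. The lookup table and the function name `value_of` are assumptions, and Python's own parsers only approximate the XML Schema lexical spaces (for instance, `int` accepts some strings that xsd:integer does not), so this shows the idea of a lexical-to-value association rather than a conforming implementation.

```python
from datetime import date

# Hypothetical table from datatype name to a Python function that plays the
# role of the lexical-to-value association for that datatype.
LEXICAL_TO_VALUE = {
    "xsd:integer": int,
    "xsd:date": date.fromisoformat,
}

def value_of(datatype, lexical_form):
    """Return the value that the datatype associates with the lexical form."""
    return LEXICAL_TO_VALUE[datatype](lexical_form)

ten = value_of("xsd:integer", "10")
end_of_september = value_of("xsd:date", "2002-09-30")
```

The same string can denote different values under different datatypes, which is why the datatype has to travel with the lexical form.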
No, this isn't right. This is describing the means, when it should be describing the end. Suggest: Datatypes are used in RDF to represent values such as integers, floating point numbers and dates. RDF uses the datatype abstraction defined by XML Schema Part 2: Datatypes [Ref]. A datatype consists of a lexical space, a value space and a datatype mapping.
Then move the para below further down and run into "A datatype mapping ..."
RDF only predefines one datatype, rdfs:XMLLiteral, used for embedding XML in RDF. There is no built-in concept of numbers or dates or other common values. Rather, RDF defers to datatypes that are defined separately, and identified with URIs. In particular, the predefined XML Schema datatypes [XML-SCHEMA2] are expected to be widely used for this purpose.
A datatype mapping is a set of pairs whose first element belongs to the lexical space of the datatype, and the second element belongs to the value space of the datatype.
A datatype mapping satisfies the following properties:
The datatypes used in RDF have a lexical space consisting of a set of strings. Exceptionally, rdfs:XMLLiteral also accesses a language identifier in its datatype mapping.
For example, the datatype mapping for the XML Schema datatype xsd:boolean, where each member of the value space (represented here as 'T' and 'F') has two lexical representations, is as follows:
Value Space: {T, F}
Lexical Space: {"0", "1", "true", "false"}
Datatype Mapping: {<"true", T>, <"1", T>, <"0", F>, <"false", F>}
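The xsd:boolean mapping above can be sketched as a plain Python dictionary. This is only an illustration under stated assumptions: the names BOOLEAN_MAPPING, lexical_space and value_space are hypothetical, and Python's True and False stand in for the values written as T and F.

```python
# Illustrative sketch only: a datatype mapping held as <lexical form, value> pairs.
# Python's True/False stand in for the values written 'T' and 'F' in the text.
BOOLEAN_MAPPING = {"true": True, "1": True, "0": False, "false": False}

# First elements of the pairs belong to the lexical space,
# second elements belong to the value space.
lexical_space = set(BOOLEAN_MAPPING)
value_space = set(BOOLEAN_MAPPING.values())

# Each value has two lexical representations.
forms_for_true = sorted(k for k, v in BOOLEAN_MAPPING.items() if v is True)
```

Holding the mapping as a dictionary makes it impossible for one lexical form to be paired with two different values, which is the behaviour the example in the text exhibits.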
Datatypes are used in RDF for the explicit designation of the type of literals. No. They are used to represent datatype values. A typed literal contains a datatype URI and a member of the datatype's lexical space. This serves to unambiguously identify a member of the value space of the datatype.
In the abstract syntax below, literals include a language identifier. For the exceptional datatype, rdfs:XMLLiteral, the lexical space is a set of pairs, involving such language identifiers; and the value identified by the literal depends on the language identifier. For all other datatypes the lexical space is a set of strings, not involving such language identifiers. The value identified by the literal is independent of the language identifier, which is simply ignored.
NOTE: XML Schema Datatypes [XML-SCHEMA2] provide an extensibility mechanism suitable for defining new datatypes for use in RDF.
For example, the typed literals which can be defined for the XML Schema datatype xsd:boolean are as follows:
Typed Literal             Datatype Mapping    Value
<xsd:boolean, "true">     <"true", T>         T
<xsd:boolean, "1">        <"1", T>            T
<xsd:boolean, "false">    <"false", F>        F
<xsd:boolean, "0">        <"0", F>            F
RDF datatyping depends on externally defined datatypes, and does not of itself provide any mechanisms for defining new datatypes. The defining authority associated with a datatype URI is responsible for specifying the corresponding datatype mapping, and knowledge of this mapping is presumed No, it's not. An RDF processor with no knowledge of a datatype can do all but the datatype entailments specific to that datatype. It is wrong to suggest that knowledge of the datatype MUST be built in to applications. to be built-in to applications that process datatypes in RDF data. RDF applications that have no such knowledge can preserve information in the RDF graph by treating the literal structure as an opaque value.
Suggest replace para with: RDF provides no mechanism for defining new datatypes. The defining authority of a URI which identifies a datatype is responsible for specifying the datatype's lexical space, value space and datatype mapping.
RDF has a specific serialization syntax based on XML [RDF-SYNTAX].
Only the XML syntax is normatively specified and recommended for use to exchange information between Internet applications: other syntaxes for RDF graphs are possible, and may be widely used (e.g. [NOTATION3]), but are not covered by this recommendation.
RDF/XML is not covered by this recommendation either. This section isn't saying anything. Delete.
Can we leave this to the model theory? No, not if we are to talk about it in social meaning. However, we can make it simpler than this. Suggest a single para in the section on social meaning is sufficient.
The ideas on meaning and inference in RDF are underpinned by the concept of entailment. An RDF expression A is said to entail another RDF expression B if every possible arrangement of things in the world that makes A true also makes B true. On this basis, if we presume or demonstrate the truth of A then we can also infer the truth of B. This idea is discussed at greater length in section 2.3.1 on formal semantics, and in the RDF formal semantics document [RDF-SEMANTICS].
This idea of entailment sets RDF apart from many other network data formats. What obligations does an entailment place on a processor of RDF data? The answer is: none. RDF applications are not required to find all facts that can be inferred on the basis of allowed entailments. (Further, it may not be possible to find all such facts.) But useful applications MAY infer some such facts, and treat those facts as if they were part of the supplied input data.
The RDF test cases described by [RDF-TESTS] contain some entailment and non-entailment tests (positive entailment tests and negative entailment tests), covering both RDF-entailment and RDFS-entailment. A positive entailment test indicates that the indicated conclusion can be inferred from the given antecedents; RDF applications are allowed to perform such inferences, but not required to do so. A negative entailment test indicates that an RDF application is not entitled by the rules of RDF alone to infer the indicated conclusion from the corresponding antecedent. A non-entailment does not mean that the conclusion is necessarily false: it may be true for reasons unrelated to the antecedent facts.
So we have the situation that a positive entailment does not mean that an RDF application must infer the conclusion, and a negative entailment does not mean the conclusion is necessarily false. How are we to judge whether a given RDF application is truly playing by the rules of RDF? The answer lies in the existence of a proof. An application that validly infers a conclusion from some antecedent facts must do so in a series of steps that can be directly traced to allowable entailments, which series constitutes a proof. The allowable entailments and corresponding proof steps sanctioned by the RDF specification are set out in the RDF formal semantics document [RDF-SEMANTICS].
RDF uses URIs to label resources and properties. Certain URIs are reserved for use by RDF, and may not be used for any purpose not sanctioned by the RDF specifications. Specifically, URIs with the following leading substrings are reserved for RDF core vocabulary:
Used with the RDF/XML serialization, these URI prefix strings correspond to XML namespaces [XML-NS] associated with the RDF core vocabulary terms.
NOTE: these namespace URIs are the same as those used in earlier RDF documents [RDF-MS] [RDF-SCHEMA].
[[[NOTE FOR REVIEWERS: Some terms in these namespaces have been deprecated, some have been added, and some RDF schema terms have had their meaning changed. We invite community feedback regarding the relative costs of adopting these changes under the old namespace URIs vs creating new URIs for this revision of RDF.]]]
Vocabulary terms in the rdf: namespace are listed in section 3.4 [[[check this]]] of the RDF syntax specification [RDF-SYNTAX].
Vocabulary terms defined in the rdfs: namespace are defined [[[where?]]] in the RDF schema vocabulary specification [RDF-VOCABULARY].
RDF provides for XML content as a possible literal value. This typically originates from the use of rdf:parseType="Literal" in the RDF/XML Syntax [RDF-SYNTAX].
Such content is indicated in an RDF graph using a typed literal whose datatype is a special builtin datatype, rdfs:XMLLiteral.
As part of the definition of this datatype, we use an ancillary definition.
The XML document corresponding to a pair ( str, lang ) is formed as follows:
If lang is not the empty language identifier, then concatenate the five strings:
- "<rdf-wrapper xml:lang='"
- lang
- "'>"
- str
- "</rdf-wrapper>"
Encode the resulting Unicode string in UTF-8 to form the corresponding XML document.
If lang is the empty language identifier, then enclose str with "<rdf-wrapper>" and "</rdf-wrapper>" and encode the resulting string in UTF-8 to form the corresponding XML document.
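The construction of the XML document corresponding to a pair ( str, lang ) can be sketched as follows. The function names are hypothetical, and the well-formedness check is only an approximation: it relies on Python's standard-library parser, which rejects ill-formed documents and unbound namespace prefixes but is not a full conformance check against [XML] and [XML-NS].

```python
import xml.etree.ElementTree as ET

def xml_document_for(string: str, lang: str) -> bytes:
    """Build the XML document corresponding to the pair (string, lang)."""
    if lang:
        # Non-empty language identifier: concatenate the five strings.
        wrapped = "<rdf-wrapper xml:lang='" + lang + "'>" + string + "</rdf-wrapper>"
    else:
        # Empty language identifier: plain wrapper element.
        wrapped = "<rdf-wrapper>" + string + "</rdf-wrapper>"
    return wrapped.encode("utf-8")

def looks_well_formed(string: str, lang: str) -> bool:
    """Approximate test that the corresponding document parses as XML."""
    try:
        ET.fromstring(xml_document_for(string, lang))
    except ET.ParseError:
        return False
    return True
```

For instance, the pair ( "<b>hi</b>", "en" ) yields a document that parses, while ( "<b>hi", "" ) does not, because the wrapper closes before the b element does.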
Using this, the datatype rdfs:XMLLiteral is defined as follows.
http://www.w3.org/2000/01/rdf-schema#XMLLiteral
.<rdf-wrapper>
xml:lang
( string, lang ) where lang is any language identifier [RFC-3066] in lower case, and string is well-balanced, self-contained XML element content [XML], for which the XML document corresponding to the pair is a well-formed XML document [XML] that also conforms to XML Namespaces [XML-NS].
REMINDER: All other datatypes have a lexical space being a set of strings, and a mapping which maps strings to values.
NOTE: If compatibility with XML version 1.1 is desired, then the value space of this datatype must be restricted to those that are fully normalized according to [XML 1.1]; with a corresponding restriction on the lexical space.
Is there more than one abstract syntax for RDF? This defines the abstract syntax which happens, currently, to be a graph. Suggest section title: Abstract Syntax.
[[[This section, particularly how nodes and node labels are handled, is not completely in sync with the current Model Theory WD -- coordinate with MT editors, and review after next MT WD]]]
This section defines the RDF graph abstract syntax. The RDF graph is sometimes referred to as the (data) model of RDF (see the RDF Primer [RDF-PRIMER], and RDF Model & Syntax [RDF-MS]). In brief, the RDF graph is a directed graph with labelled edges and partially labelled nodes.
A goal of this section is the precise definition of equality between RDF graphs. This benefits interoperability (two conformant implementations are more likely to be practically interoperable if they have a precise conception of the way in which they are the same). It is required for the specification of the RDF Test Cases [RDF-TESTS], which depend on testing equality of RDF graphs for their execution. It is required by the RDF Model Theory [RDF-SEMANTICS] which assigns the same meaning to any pair of equal RDF graphs.
NOTE: Many RDF applications and frameworks do not need to implement RDF graph equality. They do need to respect equality when assigning meaning to RDF graphs.
The specification of the RDF graph commences with the labels used in the graph, which can be URI references or literals; equality is defined for each. It then proceeds to describing arcs (triples), a complete graph and graph equality.
[[[This text should be reviewed in light of the IRI section in the namespaces 1.1 last call WD and comments made on it; I had one attempt but it failed.]]]
A URI Reference Label within an RDF graph (an RDF URI reference) is a Unicode string [UNICODE] that:
The disallowed characters that must be %-escaped are all non-ASCII characters and the excluded characters listed in Section 2.4 of [URIS], except for the number sign (#) and percent sign (%) characters and the square bracket characters re-allowed in [RFC-2732].
Disallowed characters must be escaped as follows:
%HH, where HH is the hexadecimal notation of the byte value).
Two RDF URI references are equal if and only if they compare as equal, character by character, as Unicode strings. A URI reference label is not equal to a string literal label or an XML literal label.
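A minimal sketch of the %HH escaping and the equality test described above. Two things here are assumptions rather than statements of this draft: that each disallowed character is first converted to UTF-8 bytes (as the anyURI escaping rules do), and that DISALLOWED_ASCII is a fair reading of the excluded characters in Section 2.4 of [URIS] with #, % and the square brackets left out. The function name is hypothetical.

```python
# Assumed reading of the excluded ASCII characters: controls, space, DEL,
# delimiters and "unwise" characters, minus '#', '%', '[' and ']'.
DISALLOWED_ASCII = set('<>"{}|\\^`') | {chr(c) for c in range(0x21)} | {chr(0x7F)}

def escape_uri_reference(text: str) -> str:
    """%-escape every disallowed character, byte by byte (UTF-8 assumed)."""
    out = []
    for ch in text:
        if ord(ch) > 0x7F or ch in DISALLOWED_ASCII:
            # One %HH triplet per byte of the character's UTF-8 encoding.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)

def uri_references_equal(a: str, b: str) -> bool:
    """Equality is character-by-character comparison of the Unicode strings."""
    return a == b
```

Note that, under this comparison, an escaped and an unescaped spelling of the same address are different RDF URI references, since the strings differ.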
Note: RDF URI references are compatible with the anyURI datatype as defined by XML schema datatypes [XML-SCHEMA2], constrained to be an absolute rather than a relative URI reference, and constrained to be in Unicode Normal Form C [NFC] (for compatibility with [CHARMOD]).
Note: RDF URI references are compatible with International Resource Identifiers as defined by [XML Namespaces 1.1].
Note: The restriction to absolute URI references is found in this abstract syntax. Concrete syntaxes, such as RDF/XML, may permit relative URIs (when there is a well-defined base URI) as a shorthand for such absolute URI references.
See the following test cases, per [RDF-TESTS]:
A literal in an RDF graph is a triple:
An untyped literal is one in which the datatype URI is absent.
A typed literal is one in which the datatype URI is present.
Note: Literals in which the lexical form begins with a composing character (as defined by [CHARMOD]) are allowed; however, they may cause interoperability problems, particularly with XML version 1.1 [XML 1.1].
Note: When using the language identifier, care must be taken not to confuse language with locale. The language identifier only relates to human language text. Presentational issues, as to how to best represent typed data to the end-user, should be addressed in end-user applications.
Two literals are equal if and only if all of the following hold:
The datatype URI refers to a datatype. For XML Schema built-in datatypes, URIs such as <http://www.w3.org/2001/XMLSchema#int> are used. The URI of the datatype rdfs:XMLLiteral may be used. There may be other, implementation dependent, mechanisms by which URIs refer to datatypes.
The value associated with a typed literal is found by applying the datatype mapping associated with the datatype URI to the lexical form. This mapping fails if the lexical form is not in the lexical space of the datatype associated with the datatype URI. Exceptionally, if the datatype is rdfs:XMLLiteral then the datatype mapping is applied to the pair formed by the lexical form and the language identifier.
In this abstract syntax the failure of any such mapping is regarded as a semantic rather than a syntactic problem; and this subsection does not create any well-formedness conditions on either the datatype URI or the lexical form.
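The lookup of a typed literal's value might be sketched as below. Everything named here is hypothetical: the Literal field names, the DATATYPE_MAPPINGS registry and the choice of exceptions are assumptions made for illustration, and only xsd:boolean is registered. The rdfs:XMLLiteral exception, where the language identifier takes part in the mapping, is deliberately not covered.

```python
from typing import NamedTuple, Optional

XSD_BOOLEAN = "http://www.w3.org/2001/XMLSchema#boolean"

class Literal(NamedTuple):
    lexical_form: str
    language: str                 # may be the empty string
    datatype_uri: Optional[str]   # None for an untyped literal

# Hypothetical registry: datatype URI -> datatype mapping (lexical form -> value).
DATATYPE_MAPPINGS = {
    XSD_BOOLEAN: {"true": True, "1": True, "false": False, "0": False},
}

def typed_literal_value(literal: Literal):
    """Apply the mapping named by the datatype URI to the lexical form."""
    if literal.datatype_uri is None:
        raise TypeError("not a typed literal")
    mapping = DATATYPE_MAPPINGS.get(literal.datatype_uri)
    if mapping is None:
        raise LookupError("no mapping known for " + literal.datatype_uri)
    if literal.lexical_form not in mapping:
        # A semantic failure: the literal itself is still syntactically acceptable.
        raise ValueError("lexical form is outside the lexical space")
    # The language identifier plays no part for datatypes handled here.
    return mapping[literal.lexical_form]
```

Constructing Literal("yes", "", XSD_BOOLEAN) succeeds, which mirrors the point that no well-formedness condition is imposed; the failure only appears when the value is asked for.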
[[Should this subsection be deleted? It is redundant.]]
See the following test cases, per [RDF-TESTS]:
rdfs:XMLLiteral
An RDF graph is defined using a set of nodes. Each node may be blank (unlabelled) or labelled with an RDF literal or an RDF URI reference, i.e. there is a partial labelling function from the set of nodes to the union of the set of RDF literals and RDF URI references.
In an RDF graph, no two nodes have equal labels; there may be any number of distinct blank nodes.
Two nodes are equal if and only if they are the same node. In particular, two different blank nodes are not equal.
An RDF triple describes an arc in an RDF graph. It contains three components:
If the subject and object nodes of a triple are both labelled with equal labels then they are the same node.
The subject must not be labelled with an RDF literal.
Two RDF triples are equal if and only if their subjects are equal, their predicates are equal, and their objects are equal.
An RDF graph is a set of RDF triples.
The set of nodes of an RDF graph is the set of nodes that are either subject or object of some triple in the graph. No two nodes have equal labels.
[[[Suggestions of a standard graph theory text which treats digraphs as primary would be welcome.]]]
Note: The definition of an RDF graph diverges from the definition of a directed graph in a standard text such as [[[missing ref]]] in that: (a) all nodes must be in at least one arc; (b) all the arcs are labelled; (c) some of the nodes are labelled; (d) labels on nodes are required to be distinct; (e) some labels are shared between nodes and arcs; (f) there can be multiple arcs between the same end-points, as long as they are labelled differently.
Two RDF graphs are equal if and only if they are isomorphic. An RDF graph isomorphism is a directed graph isomorphism that respects the labels on both arcs and nodes.
An RDF Graph isomorphism I between two graphs G and G' is a bijection between the nodes of G and the nodes of G', such that:
for all nodes n, s, o in G and all RDF URI references p.
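Graph equality as isomorphism can be illustrated with a brute-force check. The representation is an assumption made for the sketch: labelled nodes and predicates are ordinary Python values (strings for URI references), blank nodes are instances of a hypothetical BNode class, and a graph is a set of (subject, predicate, object) tuples. The bijection is taken to be the identity on labelled nodes and an arbitrary one-to-one mapping on blank nodes, so that labels on arcs and nodes are respected.

```python
from itertools import permutations

class BNode:
    """A blank (unlabelled) node; equal only to itself."""

def nodes(graph):
    """The nodes of a graph: every subject or object of some triple."""
    return {s for s, _, _ in graph} | {o for _, _, o in graph}

def isomorphic(g1, g2):
    """Try every bijection between the blank nodes of g1 and g2."""
    blanks1 = [n for n in nodes(g1) if isinstance(n, BNode)]
    blanks2 = [n for n in nodes(g2) if isinstance(n, BNode)]
    if len(g1) != len(g2) or len(blanks1) != len(blanks2):
        return False
    for candidate in permutations(blanks2):
        mapping = dict(zip(blanks1, candidate))
        # Labelled nodes map to themselves; blank nodes follow the candidate.
        image = {(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in g1}
        if image == g2:
            return True
    return False
```

The search is factorial in the number of blank nodes, so this is suitable only for small graphs such as test cases; it is meant to show what the definition asks for, not how an implementation should compute it.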
[[[This subsection normatively depends on CHARMOD, currently a last call working draft. If CHARMOD has not reached the appropriate recommendation status as this document progresses down the recommendation track, this section will be deleted.]]]
[[[WG-REVIEW: Moreover, the CHARMOD editors seem minded to respond to (multiple) last call comments about their dependency on IRI-draft by waiting for IRI to move along the standards track. Thus, it is likely that this subsection will go.]]]
For the processing of character data that can be represented in different ways, RDF processors are required to conform to Early Uniform Normalization, as described by Character Model for the World Wide Web 1.0 [CHARMOD].
How should RDF treat a URI reference with a fragment identifier? Conventional web architecture holds that the meaning of a fragment identifier is dependent on the MIME type of a resource that is obtained by dereferencing the URI part. URIs without fragment identifiers are generally presumed to map to some resource for which a Web representation (or several) can be retrieved. But RDF has no concept of a fragment identifier separate from a URI: RDF treats a URI reference as an opaque identifier that denotes some resource [RDF-SEMANTICS]. Further, an RDF resource identifier may denote something that is not web-retrievable; e.g. a car, or a unicorn.
These apparently conflicting interpretations can be reconciled if:
This provides a handling of URI references and their denotation that is consistent with the RDF model theory and usage, and also with conventional web behaviour. This approach somewhat extends the idea of a "fragment" or "view" beyond the common idea (when handling web documents) that it is a physical part of a containing document.
In view of this, it is reasonable to consider that URIs without fragment identifiers are most helpfully used for indicating web-retrievable resources (when used in RDF), and URIs with fragment identifiers are used for abstract ideas that don't have a direct web representation. This is not a hard-and-fast distinction, as the line between resources having or not having a web-retrievable representation is sometimes hard to draw precisely.
The RDF/XML syntax uses QName syntax [XML-NS], section 3, to identify various resources, notably RDF properties. But the RDF graph syntax contains only URI references, and does not recognize QName forms.
Mostly, QNames are handled by the mapping between RDF/XML documents and RDF graph syntax. But there are some occasions where an RDF writer needs to know the correspondence between QNames and URI references (e.g. when using a typed node production). The mapping is described in [RDF-SYNTAX], sections 3.1.2 or 3.1.4.
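As an illustration of the correspondence between QNames and URI references, the sketch below assumes that the URI reference is formed by concatenating the namespace name bound to the prefix with the local name; [RDF-SYNTAX] remains the authority on the actual mapping. The function name and the use of the empty string as the key for a default namespace are hypothetical.

```python
def expand_qname(qname: str, namespaces: dict) -> str:
    """Expand prefix:local to a URI reference by concatenation (assumed rule)."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        # No colon: treat the whole name as local, in the default namespace.
        prefix, local = "", qname
    return namespaces[prefix] + local

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
type_uri = expand_qname("rdf:type", {"rdf": RDF_NS})
```

The reverse direction is not always possible, since a URI reference need not end in something usable as an XML local name, which is one reason a writer may need to know the correspondence.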
This document contains a significant contribution from Pat Hayes, Sergey Melnik and Patrick Stickler, under whose leadership was developed the framework described in the RDF family of specifications for representing datatyped values, such as integers and dates.
The editors acknowledge valuable contributions from the following:
Jeremy Carroll thanks Oreste Signore, his host at the W3C Office in Italy and Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo", part of the Consiglio Nazionale delle Ricerche, where Jeremy is a visiting researcher.
This document is a product of extended deliberations by the RDFcore working group, whose members have included:
This specification also draws upon an earlier RDF Model and Syntax document edited by Ora Lassila and Ralph Swick, and RDF Schema edited by Dan Brickley and R. V. Guha. RDF and RDF Schema Working group members who contributed to this earlier work are:
[[[For reviewers' reference. This appendix will be removed on final publication.]]]
$Log: rdf-concepts.html,v $
Revision 1.16 2002/10/25 09:46:12 graham Jeremy's updates to abstract syntax and datatyped literals. - *all* literals are typed - two new predefined types for what were previously untyped literals. - literal is a pair (datatypeURI, lexical form) - lexical form is also a pair (string, language-identifier) - datatype mappings (see section 2.4.4) can be from string=>value or from lexical form=>value Some reformatting of the HTML.
Revision 1.15 2002/10/22 17:44:29 graham Fix up editing errors. Relinquish lock to Jeremy.
Revision 1.14 2002/10/22 17:10:18 graham Clean up some text. Relinquish lock to Jeremy.
Revision 1.13 2002/10/22 16:11:25 graham Sect. 2.3.2 fix typos Sect. 2.3.2 clarify distinction between assertional form and making an assertion Sect. 2.4.6 new section about entailment
Revision 1.12 2002/10/17 17:19:32 graham Minor editorial changes. Added link and email for GK.
Revision 1.11 2002/10/17 16:22:29 graham Jeremy's changes - lock relinquished: - name to include middle initial, e-mail and links - added some of Frank's text to 2.4.1, with new ref to Sowa2 - updated 3.1 URI Ref to be very like XML Namespaces 1.1 section 7 IRI - updated 3.2 using majority text with little bits from minority text - added acknowledgement to host in Pisa - added Sowa2 ref
Revision 1.10 2002/10/14 14:00:32 graham Minor updates; hand lock to Jeremy
Revision 1.9 2002/10/10 14:55:15 graham Folded in co-editor review comments, added reference to OWL.
Revision 1.8 2002/10/07 15:16:59 graham Add 'latest' version of RDF concepts
Revision 1.7 2002/10/02 11:39:44 graham Incorporate revised wording on formal semantics into section 2.3.1
Revision 1.6 2002/09/30 17:22:17 graham Fold in some further comments from Pat Hayes
Revision 1.5 2002/09/30 12:11:05 graham Remove some superfluous comments and reference
Revision 1.4 2002/09/30 11:58:40 graham Update document links following move to new directory
Revision 1.3 2002/09/30 11:38:20 graham Incorporated material from datatyping draft, per issue 010-DatatypingConcepts
Revision 1.2 2002/09/30 10:09:07 graham Add new sections for literals and datatyping Address issue 008-InteractionUnclear Address issue 011-DatatypingAcknowledgement Address issue 012-AssertionConflictingUse Address remaining non-syntax items from issue 013-Various Regenerate table of contents
Revision 1.1 2002/09/30 09:32:01 graham Update document and move to RDF-Concepts directory
Revision 1.9 2002/09/28 17:30:42 graham Rework section 2.3 to take account of comments from Pat Hayes and Tim Berners-Lee, and incorporating a sanitized version of Pat's example of formal entailment of social meaning.
Revision 1.8 2002/09/26 16:55:46 graham Fix up previous document link
Revision 1.7 2002/09/26 16:30:14 graham Apply edits for issue 007-Meaning-machinery
Revision 1.6 2002/09/26 13:32:26 graham Apply edits for issue 003-ModelTheory.html
Revision 1.5 2002/09/26 13:15:15 graham Apply edits for issue 002-InconsistentAssertions
Revision 1.4 2002/09/26 12:36:27 graham Apply edits for issue 001-Editorial
Revision 1.3 2002/09/26 12:13:02 graham Update previous version links
Revision 1.2 2002/09/26 12:01:36 graham Re-import published WD as editors' working copy
---
Two alternative versions which could go in this subsection: http://www.w3.org/TR/rdf-syntax-grammar/#parseTypeLiteralPropertyElt or perhaps http://www.w3.org/TR/rdf-syntax-grammar/#literal.
The string value associated with the literal text l is implementation dependent. The literal node produced from the string value associated with this literal text is related by the rdfs:XMLLiteral datatype to a value, which is a canonical XML document.
[[[This next para is a mouthful...]]]
A constraint placed on the string value is that the exclusive canonicalization without comments [XC14N] of the literal text l (i.e. the entire element content of the enclosing element) is equal to the exclusive canonicalization without comments of the root element content of this related canonical XML document.
This condition leaves the treatment of comments and namespaces that are not visibly utilized (as defined by [XC14N]), as well as the treatment of aspects of XML that are not reflected in canonical XML, as implementation dependent.
The RDF Test Cases [RDF-TESTS] use the exclusive canonicalization without comments to compute the string value of the literal.
The string value associated with the literal text l is implementation dependent. The literal node produced from the string value associated with this literal text is related by the rdfs:XMLLiteral datatype to a value, which is a canonical XML document.
The string value must preserve the information found in the exclusive canonicalization without comments [XC14N] of the literal text l.
The string value may be the exclusive canonicalization with or without comments [XC14N] of the literal text l.
The treatment of namespaces that are not visibly utilized (as defined by [XC14N]), XML comments, and of aspects of an XML document that are not reflected in the canonical form (e.g. insignificant white space within element tags) is explicitly implementation dependent.
The RDF Test Cases [RDF-TESTS] use the exclusive canonicalization without comments, discarding namespaces that are not visibly utilized, in order to compute the string value of the literal.
Markup issues to be considered: <code> vs <p class=code>; note format; spurious <div class="head"> elements ; <blockquote> for indented text.