reification "test questions": first crack

I had undertaken to produce some "test cases" to try to clarify some of
the issues revolving around RDF reification (including places where
related concepts in RDF might be unclear), based on earlier discussions
with Dan Brickley and others.  In the meantime, the notion of "test
cases" has undergone a bit of an evolution from what we were originally
talking about.  It now means an explict set of RDF triples, together
with expected output.  This "evolved" definition more fits an intuitive
definition of "test cases" than what we were talking about, so I'll call
what I'm doing here "test questions" for now (with actual "test cases"
possibly to follow).  

This isn't as clean, thoroughly-thought-out, or complete as I'd like (I
got hauled off to do some other stuff in the meantime), but I thought
I'd get my initial ideas out to give the WG a whack at them.  In some
cases, I think the answers to the questions might need some serious
thinking.  In other cases, the answers may be straightforward, but we
may need to do some serious wordsmithing to clearly express that
intention in the M&S.   

Following each question (or collection of questions) is some
discussion.  The first question is Dan's original one ("RQ" stands
(here) for "reification question").

--Frank


RQ1:  Are members of the class rdf:Statement uniquely picked out by
their predicate/subject/object properties?

This seems fairly reasonable, since the Formal Model (M&S Section 5)
says that each element of Statements is a predicate/subject/object
triple, and there isn't anything else to identify the members.  On the
other hand, it's reasonable that the same predicate/subject/object
triple will appear in different "places" (e.g., several people record
the same metadata about a given Web resource).  If we consider URIs as
identifying *appearances" of these triples, we can imagine multiple URIs
identifying the same triple (distinguishing the multiple appearances, or
having different identities from the perspective of different
*identifying* authorities), not unlike the idea that multiple URIs might
identify the same real world thing (like a person).  Or do we consider
the predicate/subject/object triple as being the URI, and these other
URIs as identifying the appearances (or something else)?  

RQ2:  M&S section 5 says that there is a set called Statements (whose
elements are triples). What is the intended scope of this set?  That is,
is this intended to be a conceptual extension (for language
specification purposes only) of class Statements that includes all RDF
statements anywhere?  Is it intended to be possible to have subsets of
this set representing specific collections of RDF statements (e.g., a
collection of statements made to describe a given resource)?  

a. Section 5 also says "We can view *a* set of statements as a directed
labeled graph...", which seems to suggest that multiple sets of
statements are possible. On the other hand, we (equivalently) can ask
the question "how many graphs are there?  One (corresponding to all
statements in Statements)?  Many (which again suggests there are subsets
of Statements)?  Note that M&S also says "A statement and its
corresponding reified statement exist independently in *an* [not *the*]
RDF graph and either may be present without the other."  [Note that,
while it may be really obvious that there are going to be subsets of
Statements, the M&S doesn't explicitly talk about that very clearly. 
One thing that the M&S, or some related document, could use is some more
thoroughly-developed Use Cases that go beyond the current examples to
show how various collections of the kinds of descriptions used in the
examples are represented in the Web, are accessed when needed, are
reified and unreified if necessary, etc.]  

b.  "set" is not an RDF-defined collection;  "bag" is the closest. So we
cannot describe the formal model in RDF(?)  
c.  If "set" is taken literally, and "the class rdf:Statement" is taken
to refer to a single set of all RDF statements anywhere, it seems that
the answer to RQ1 must be "yes", because there is no other way to
uniquely identify the triples.   

---------------

RQ3:  M&S Section 4.1 says "If, instead, we write the sentence 'Ralph
Swick says that Ora Lassila is the creator of the resource
http://www.w3.org/Home/Lassila' we have said nothing about the resource
http://www.w3.org/Home/Lassila;  instead, we have expressed a fact about
a statement Ralph has made."   If we use reification to write the RDF
for the Ralph Swick example, have we in fact "expressed a fact about a
statement Ralph has made"?  Alternatively, what is the thing that we
have expressed a fact about?  Alteratively (again), in what sense is the
reified statement really a statement?

a.  M&S says "A statement and its corresponding reified statement 
exist independently in an RDF graph and either may be present without
the other.  The RDF graph is said to contain the fact given in the
statement if and only if the statement is present in the graph,
irrespective of whether the corresponding reified statement is present."
This suggests that the statement Ralph purportedly made may not be
there;  only its reification is.  

If we say Ralph Swick says X and X is only present in reified form (not
present as a fact) what is X?  How do we know what Ralph said?  Note
that we do not discuss conversion back and forth between reified and
unreified statements.  E.g., I might want to collect all the things
Ralph Swick said, convert them to statements, and determine if they were
consistent.

b.  The formal model says "facts (that is, statements) are triples that
are members of Statements".  This suggests that the thing Ralph said
*isn't* a statement (otherwise it would be a fact), so how can we say
we're expressing a fact about a *statement* Ralph made?

c.  The intended semantics seem to be something like "there exists a
statement (that I now create) that I want to attribute to Ralph Swick." 
This is consistent with the idea that both the statement and its
reification are in Statements.  However, if only the reification is in
Statements, in what sense is the original statement (the one I want to
attribute to Ralph Swick) a statement (since it's not in Statements).  

------------
RQ4:  What is the significance of an RDF graph "containing a fact"?  Is
someone asserting that something is true?  Assuming that there are
multiple graphs, what is the significance of apparently contradictory
"facts" in multiple graphs?  [We don't really say anything about this
stuff.  A model theory for RDF would help deal with this.]

------------

RQ5:  M&S section 4.1 says "Reification is also needed to represent
explicitly in the model the statement grouping implied by Description
elements."  Why (or under what circumstances) is it necessary to
explicitly represent this grouping?  And should this idea be extended to
other groupings of statements (e.g., a group of statements made to
describe a single resource, or intended to be consistent with respect to
some model)?  That is, is it always necessary to reify groups of
statements in order to indicate they constitute a group, or only
sometimes?  If the latter, which times?  Why?

The Description element is introduced in the RDF syntax as a shorthand
to allow multiple statements to be made about the same resource without
repeating the resource identifier.  E.g., the example 

<rdf:RDF>
    <rdf:Description about="http://www.w3.org/Home/Lassila">
      <s:Creator>Ora Lassila</s:Creator>
      <s:Title>Ora's Home Page</s:Title>
    </rdf:Description>
  </rdf:RDF>

results in two triples being generated.  However, if a bagID is
specified, the example

<rdf:RDF>
    <rdf:Description about="http://www.w3.org/Home/Lassila"
bagID="D_001">
      <s:Creator>Ora Lassila</s:Creator>
      <s:Title>Ora's Home Page</s:Title>
    </rdf:Description>
  </rdf:RDF> 

results in 13 triples being generated. 

(NB: there is an issue relating to the generation of these bags already
identified).

a.  One explanation for this is that this is intended to suggest a way
of recording syntactic context (in this case, that several statements
come from the same Description element) in RDF.  That is, you generate a
resource representing the context (a bag representing the Description
element in this case), reify each of the contained statements, and add
all the resulting triples (including the triples representing the
original statements) to Statements.  Presumably this approach could be
extended to other types of syntactic contexts as well (all the RDF
statements on a given Web page, for example).  However, this suggests
the need for some principle for specifying when to include just the
reifications, and when to include the original statements as well.
(Also, this seems an extreme way of representing contextual information,
since the number of statements balloons enormously).

b. As noted above, while RDF defines what the reified model of a triple
is, it at present contains no explicit mechanism (or operator) for
moving between a triple and its reification (in either direction).

c.  The PICS example (section 7.6) uses BagID, while the Dublin Core
example (section 7.4) doesn't.  Suppose we decide we want to attribute
the specified collection of Dublin Core statements to some individual. 
Must we reify the whole collection?  (Note that if the collection of
statements is a separate resource, it has a URI that could be used
without the need to reify them).  If not, why not (and why can't this
reason apply to other collections)?  This clearly relates to the
question below.

-----------------------
RQ6:  M&S Section 4.1 says "Statements are made about resources.  A
model of a statement is the resource we need in order to be able to make
new statements (higher-order statements) about the modeled statement." 
Is this "model of a statement" really needed in order to make statements
about statements?  

a.  One of the points noted in the rdf-logic discussions is that, in
logic, "higher-order statements" don't mean "statements about
statements". 

b.  The line of reasoning here is presumably that, if statements can
only be made about resources, the only way to make statements about
statements is to make the latter statements resources.  This means they
must have URIs.  However, why is this particular model needed in order
for URIs to be assigned to statements?  Moreover, why does the M&S have
to specify *any* mechanism for assigning URIs to statements?  RDF is
specified independently of how any other resources (which may be of
arbitrary complexity) are assigned URIs. Moreover, RDF statements might
be represented in many different concrete formats, each of which has a
particularly-suitable way of assigning URIs.  [There is, in fact, some
intuitive reason why there ought to be some way of modeling statements
about statements, e.g., attribution.  Conceptual graphs, for example,
has such a mechanism.  However, this involves more than a kind of "DOM"
model or infoset for the statement.]


-- 
Frank Manola                   The MITRE Corporation
202 Burlington Road, MS A345   Bedford, MA 01730-1420
mailto:fmanola@mitre.org       voice: 781-271-8147   FAX: 781-271-8752

Received on Thursday, 14 June 2001 15:57:26 UTC