RE: Subject literals

At 04:31 PM 11/5/01 +0200, Patrick.Stickler@nokia.com wrote:

> > There is a node labeled "fi", whose type is <urn:iso:3166_1>.  (That is,
> > the denotation of the node is a member of the class extension of the
> > resource identified by <urn:iso:3166_1>.)
> >
> > There is a node, which may or may not denote the same value as the first
> > node, whose type is <urn:iso:639>.
>
>OK. Fair enough. The fact that you are getting two nodes
>as the result of a query at least suggests that some degree
>of ambiguity exists, but are you saying that *every* instance
>of a literal value gets its own bNode onto which one can
>hang type and other properties?
>
>How does
>
>    <urn:foo> xyz:someProperty [ rdf:value "fi"; rdf:type <urn:iso:3166_1> ] .
>
>get to
>
>    <urn:foo> xyz:someProperty "fi" .
>    "fi" rdf:type <urn:iso:3166_1> .

I don't know... there are many details that have yet to be worked out.
I was responding originally to a specific point about literal labels
on subject nodes.

>And if
>
>    "fi" rdf:type <urn:iso:639> .
>
>also exists, how do I know *which* "fi" is the actual
>value of xyz:someProperty? OK, if range is defined
>for the property, then perhaps that could help
>disambiguate.

Ditto.

>I just don't see the utility of using the literal value
>as the label of a node. What is it *really* buying us
>here?

You use the term "the label" in a way that suggests a defining quality.
I don't think that's intended.  Graph nodes have their own identity that
stands irrespective of the labels they bear.  Did you follow the
bNode debate and resolution?  Using the *graph* as the primary abstract
syntax allowed scoping issues for non-global names to be avoided when
defining the model theory, even if some syntaxes (e.g. N-Triples) must
address them.

I think the utility here is that it reflects ways that RDF is actually 
being used.

One could define a framework that doesn't use literals as labels in this way,
but if it broke existing RDF users' practices and expectations, that would
severely reduce its usefulness to this community.

>Data typing of literals is a context-sensitive thing, because
>literals are ambiguous.

Yup.

>To treat literals as node labels is to introduce that
>ambiguity into the graph. Why? How is that any more
>flexible or useful than bNodes?

It only introduces ambiguity into the graph if you're using them
to identify bits of the graph.  But that's not the intent.
Graph nodes already have their own identity independent
of whatever labels they carry;  the "tidiness" requirement is not
needed to disambiguate the graph, but is believed to simplify some
of the model theoretic proofs.

In the case of a graph drawn on a piece of paper, the identity of
a node is bound to its position on the paper.
Different positions => different nodes, whatever label they bear.

> > >The fact is that "fi" is being used to represent distinct
> > >things, and as such, should be a URI in both cases
> > >and not a literal.
> >
> > I'm not sure exactly how to interpret your phrase 'is being used to
> > represent', but it seems to carry implications of uniqueness that are not
> > present.  I'd find 'is being used to label' a closer description.
>
>In essence, the local type is acting as a kind of namespace
>for the literal, to differentiate it from other equivalent
>literals belonging to other namespaces.
>
>I.e. what we really have is (urn:iso:639)fi and (urn:iso:3166_1)fi
>which are distinct "things" (abstract, yes, but distinct).

Yes.  The whole point here, as I see it, is to provide the theoretical
framework whereby the appropriate type can be applied without having
to be explicitly specified as part of the syntactic presentation of
the literal.

Allowing literals as subjects in the graph makes it easier to express
the graph closure that one gets by applying the type inference rules.
For example, knowing the range of a property allows us to know something
about the type of a literal used with that property.  Allowing a node labeled
with a literal in the graph allows us to express that knowledge simply:

     (someResource) ---ex:size---> "10"

     (ex:size) ---rdfs:range---> (xsd:integer)

allows us to infer that the particular node above labeled with "10"
has the type xsd:integer:

     "10" ---rdf:type---> (xsd:integer)

What I've done here is try to express a graph in linear form, and to do that
requires that we have a (locally) unique label for each distinct node.
In the graphical form, that's not a problem:

     (someResource) ---ex:size---> "10" ---rdf:type---> (xsd:integer)

the type arc just attaches to the same graph node.  End of story.
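
To make the inference step concrete, here is a minimal sketch in Python.
It is an illustration only, not part of any proposal: the Node class, the
apply_range_rule function, the second "10" node and the ex:comment property
are all hypothetical.  Nodes are compared by object identity rather than by
label, and I assume at most one rdfs:range per property to keep it short.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: nodes compare by identity, never by label
class Node:
    label: str


def apply_range_rule(arcs):
    """Return the rdf:type arcs implied by the rdfs:range arcs in `arcs`.

    `arcs` is a list of (subject Node, property name, object Node) triples.
    Properties are global names, so they are matched by their label.
    """
    ranges = {s.label: o for (s, p, o) in arcs if p == "rdfs:range"}
    inferred = []
    for (s, p, o) in arcs:
        if p in ranges:
            arc = (o, "rdf:type", ranges[p])  # attaches to this node only
            if arc not in arcs and arc not in inferred:
                inferred.append(arc)
    return inferred


some_resource = Node("someResource")
other_resource = Node("otherResource")
ten = Node("10")
other_ten = Node("10")  # same label, but a different graph node
xsd_integer = Node("xsd:integer")

arcs = [
    (some_resource, "ex:size", ten),
    (other_resource, "ex:comment", other_ten),
    (Node("ex:size"), "rdfs:range", xsd_integer),
]
inferred = apply_range_rule(arcs)
```

Only the node reached via ex:size acquires the type arc; the other node
labeled "10" is untouched, because nothing here ever looks a node up by
its label.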

Summary:  in the graph form, the label is not needed to identify a graph node,
so ambiguity isn't a problem.  In a serial form, a unique label is needed to
identify each node, and the ambiguity of literal labels means they're not
suitable for this purpose.
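
A rough sketch of what that means for a serializer, again in Python and
again hypothetical throughout: the linearize function, the "_:nN" local
identifiers, the "label" pseudo-property, ex:width and the output notation
are invented for illustration and are not N-Triples or any agreed syntax.
Each distinct node gets a generated, locally unique name, and the literal
is written out as information about the node rather than as its identifier.

```python
import itertools


class Node:
    """A graph node; default equality and hashing are by object identity."""

    def __init__(self, label):
        self.label = label


def linearize(arcs):
    """Write arcs one per line, naming each distinct node with a local id."""
    ids = {}
    counter = itertools.count(1)

    def local_id(node):
        # First time we meet a node, mint a fresh locally unique name.
        if node not in ids:
            ids[node] = f"_:n{next(counter)}"
        return ids[node]

    lines = [f"{local_id(s)} {p} {local_id(o)} ." for (s, p, o) in arcs]
    # The label is recorded about the node; it does not identify it.
    lines += [f'{i} label "{n.label}" .' for (n, i) in ids.items()]
    return lines, ids


res = Node("someResource")
ten = Node("10")
other_ten = Node("10")  # a second node bearing the same literal label
integer = Node("xsd:integer")

lines, ids = linearize([
    (res, "ex:size", ten),
    (ten, "rdf:type", integer),
    (res, "ex:width", other_ten),
])
```

The two nodes labeled "10" receive different local names, so the type arc
stays attached to the one it was inferred for.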

[much stuff snipped, because I think the fundamental point is addressed above.]

>In practice? I'd appreciate seeing examples. My guess is that if
>literals are being used in ways that would enable them to
>act as the subjects of statements that you have a pretty
>closed system where there exists no ambiguity between any
>literals (i.e. they are the equivalent of URIs for all
>practical purposes) or the degree of ambiguity is so small
>that it is easy to program around it.

The point, I think, is that literal strings are not being used to
identify the nodes. The ambiguity prevents that, as I think you are saying.

Rather, they tell us something (more or less) about the value that a node
denotes.  It's quite OK for the same literal used in different contexts
(i.e. having different additional information) to say different things
about the corresponding denoted value.  And in the absence of other
information, a literal says very little about the value denoted.

If you have two graph nodes labeled with the same literal string,
you definitely need more information about those nodes to conclude
that they denote the same value.  Under the scheme being proposed,
literal strings alone are emphatically not sufficient basis for
node identification or merging.
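
As a hedged illustration of that last point, a conservative merge test
might look like the Python sketch below.  The LiteralNode type, the node_id
field and the may_merge function are hypothetical, and the rule that "same
label plus a shared type" is enough evidence is my own simplifying
assumption (it presumes each type maps a given string to a single value);
the details are among the things still to be worked out.

```python
from typing import FrozenSet, NamedTuple


class LiteralNode(NamedTuple):
    node_id: int            # hypothetical graph-local identity of the node
    label: str              # the literal string the node bears
    types: FrozenSet[str]   # types known (asserted or inferred) for the node


def may_merge(a, b):
    """True only when there is positive evidence of a common denotation."""
    if a.node_id == b.node_id:
        return True  # literally the same graph node
    # A shared label alone is not enough; also require a shared known type.
    return a.label == b.label and bool(a.types & b.types)


country = LiteralNode(1, "fi", frozenset({"urn:iso:3166_1"}))
language = LiteralNode(2, "fi", frozenset({"urn:iso:639"}))
untyped = LiteralNode(3, "fi", frozenset())
country_again = LiteralNode(4, "fi", frozenset({"urn:iso:3166_1"}))
```

Here may_merge(country, language) and may_merge(country, untyped) are both
false even though all three nodes are labeled "fi"; only country and
country_again, which share a type as well as a label, are candidates.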

#g



------------------------------------------------------------
Graham Klyne                    MIMEsweeper Group
Strategic Research              <http://www.mimesweeper.com>
<Graham.Klyne@MIMEsweeper.com>
------------------------------------------------------------

Received on Monday, 5 November 2001 17:22:05 UTC