Semantic Web (RDF) Identifiers: Requirements and Features from Sandro Hawke on 2003-02-05 (www-archive@w3.org from February 2003)

From: Sandro Hawke <sandro@w3.org>
Date: Wed, 05 Feb 2003 14:35:06 -0500
To: dbooth@w3.org
Cc: www-archive@w3.org
Message-Id: <200302051935.h15JZ6919637@sarlacc.w3.org>
[ some notes ]


1.  Graph Serialization

    Be able to convey information about cyclic relationships, like: 

          Sam admires Betty
          Betty admires Geoff
          Geoff admires Sam

    Anything which does this is an "identifier".  In XML, ID/IDREF can
    do this.  In programming languages they are usually called
    identifiers.

    Without identifiers, at best you have a tree structure:
        <something> admires 
             <something> which admires 
                 <something> which admires ....

    In an abstract sense, any transmittable object from a very large
    space of objects (>=32 bits) should suffice.  Strings work well,
    as do structures which can be serialized as strings.  In another
    sense, identifiers are a class of lexemes (tokens) in the
    language.

    These are local-scope or document-scope identifiers.  In RDF they
    are NodeIDs.
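
    A minimal sketch of the point above, assuming statements are
    held as (subject, predicate, object) tuples.  The strings "sam",
    "betty" and "geoff" and the helper follow() are hypothetical
    names chosen for illustration; they play the role of
    document-scope identifiers.

```python
# Each statement is a (subject, predicate, object) triple. The strings
# "sam", "betty" and "geoff" are hypothetical document-scope identifiers,
# analogous to RDF NodeIDs.
triples = [
    ("sam", "admires", "betty"),
    ("betty", "admires", "geoff"),
    ("geoff", "admires", "sam"),
]


def follow(start, predicate, statements):
    """Follow `predicate` edges from `start` until a node repeats."""
    edges = {s: o for s, p, o in statements if p == predicate}
    path = [start]
    while path[-1] in edges:
        nxt = edges[path[-1]]
        path.append(nxt)
        if nxt in path[:-1]:
            break  # the cycle has closed
    return path


path = follow("sam", "admires", triples)
print(" -> ".join(path))
```

    Because the last statement reuses the identifier "sam", the walk
    returns to its starting node.  Without a reusable identifier the
    same data could only be written as an ever-deeper tree.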

2.  Graph Merging

    Be able to convey cyclic information from multiple sources:

    Doc1: Sam admires Betty
    Doc2: Betty admires Geoff
    Doc3: Geoff admires Sam
    
    and we know that the Geoff in Doc2 and Doc3 is the same individual.

    When you merge texts, you need the identifiers to be in the same
    namespace (have the same mapping from identifier-to-individual).
    It can be possible (with some sort of "imports" directive) to
    include-and-remap, but that hasn't been the tradition in RDF.
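
    A rough sketch of both options, assuming each document is a set
    of triples.  merge() and remap() are hypothetical helpers, and
    the local name "g1" is invented for the example.

```python
doc1 = {("Sam", "admires", "Betty")}
doc2 = {("Betty", "admires", "Geoff")}
doc3 = {("Geoff", "admires", "Sam")}


def merge(*graphs):
    """Merge graphs whose identifiers share one namespace: plain set union."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def remap(graph, mapping):
    """Include-and-remap: rename a document's identifiers before merging."""
    return {(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in graph}


# Case 1: all three documents already use the same identifiers.
merged = merge(doc1, doc2, doc3)

# Case 2: suppose Doc3 had used its own name "g1" for Geoff (hypothetical).
doc3_local = {("g1", "admires", "Sam")}
merged_remapped = merge(doc1, doc2, remap(doc3_local, {"g1": "Geoff"}))

print(merged == merged_remapped)
```

    With a shared namespace the merge is a plain union.  The remap
    variant needs an explicit mapping supplied by whoever does the
    import.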

    These all-in-one-big-namespace strings are global or universal
    identifiers.   We have two kinds of these:

    [2a] Mintable Identifiers.  When you need an identifier for
    something, and you don't already have one, you make up a new one
    using an algorithm that produces new identifiers.  In AI terms
    these are Skolem constants; the UUID and tann/taguri algorithms
    work in many cases, as do cryptographically random numbers.
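
    A sketch of three ways to mint such identifiers, using only the
    Python standard library.  The "rnd-" prefix, the authority
    "example.org", the date and the "nodeN" suffix are assumptions
    made for the example, not part of any scheme named above.

```python
import itertools
import secrets
import uuid


def mint_uuid():
    """Mint an identifier from a random (version 4) UUID."""
    return uuid.uuid4().urn  # e.g. "urn:uuid:..."


def mint_random(bits=128):
    """Mint an identifier from cryptographically random bytes."""
    return "rnd-" + secrets.token_hex(bits // 8)  # "rnd-" is hypothetical


_counter = itertools.count(1)


def mint_tag(authority="example.org", date="2003-02-05"):
    """Mint a tag-style identifier: authority and date plus a local counter."""
    return f"tag:{authority},{date}:node{next(_counter)}"


ids = [mint_uuid(), mint_random(), mint_tag(), mint_tag()]
for identifier in ids:
    print(identifier)
```

    The first two rely on randomness to avoid collisions.  The third
    relies on the minter controlling the authority name on the given
    date and never reusing the local part.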

    [2b] One-True-Name Identifiers.  Every reference to something is
    made using the same identifier.  If you don't know the
    one-true-name, go look it up somewhere.  If there is no name, then
    you can add one.  In AI terms, this corresponds to the
    unique-names assumption.  We know of no practical approach to
    providing this kind of identifier in the general case, so this is
    not actually a feature of any identifier plan.  It is listed here,
    however, because it would obviously be useful and this lets us
    be clear about schemes which might provide it in certain limited
    domains. 

3.  Document-Reference and Self-Reference

    Documents (web pages) should be able to refer to and convey
    information about themselves and each other.

    Doc1: Doc1 and Doc2 were last updated (by Sam) in 1997.

    Documents, in this sense, are maintainable, mutable things, not
    fixed artifacts.
    
    There is some interplay between Document-Reference and Graph
    Merging: the pair <DocumentId, NodeID> functions as a MintableID
    for graph merging.
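
    A sketch of that interplay, assuming triples as tuples.  The
    document URLs, the NodeIDs "n1" and "n2", and the helper
    globalize() are all hypothetical.

```python
def globalize(document_id, graph, local_ids):
    """Replace each document-scope NodeID with a (document_id, node_id) pair."""

    def qualify(term):
        return (document_id, term) if term in local_ids else term

    return {(qualify(s), p, qualify(o)) for s, p, o in graph}


# Two documents that happen to reuse the same local NodeIDs for
# different individuals.
doc1 = {("n1", "admires", "n2")}
doc2 = {("n1", "admires", "n2")}

naive = doc1 | doc2  # the local names collide
merged = (
    globalize("http://example.org/doc1", doc1, {"n1", "n2"})
    | globalize("http://example.org/doc2", doc2, {"n1", "n2"})
)

print(len(naive), len(merged))
```

    The naive union collapses the two statements into one, while the
    qualified merge keeps them apart.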

4.  Auxiliary Information Retrieval

    Given a document D which uses an identifier X, be able to find more
    information (IX) about X.  There are different qualities associated
    with how retrieval might work:

    - Deterministic: the mapping from (D,X) to IX is specified and
      well-understood (ie not Google)

    - Distributed: the mapping does not go through a central point,
      (ie no SemanticLookupService.net)

    - Maintainable: the information IX can be updated over time
      (ie D and X cannot include the secure hash of IX)

    - Secure: the mapping from (D,X) to IX can only be affected by
      authorized parties (eg D or X includes the secure hash of the
      public key used to sign IX)

    - Authoritative: the mapping is determined by the social entity
      who minted X and/or the one who wrote D.

    - Annotatable: additional mappings from X to IX2, IX3, ..., may
      be contributed by 3rd parties.   (lots of issues here; maybe
      this needs to be separated out.)

    - Clickable: identifier works as a web address in deployed
      browsers to get IX.   That is, D is not used; X->IX is the web
      dereference function.  Should conneg work between RDF and HTML? 
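
    A sketch of why Maintainable and Secure pull against each other
    when X embeds a hash of IX itself.  The "#sha256-" identifier
    layout and both helper names are assumptions made for this
    example; they are not taken from any deployed scheme.

```python
import hashlib


def mint_with_content_hash(base, ix):
    """Mint an identifier X that embeds the SHA-256 hash of IX (bytes)."""
    digest = hashlib.sha256(ix).hexdigest()
    return f"{base}#sha256-{digest}"  # hypothetical layout


def verify(identifier, ix):
    """Check that `ix` is the content the identifier was minted for."""
    expected = identifier.rsplit("#sha256-", 1)[1]
    return hashlib.sha256(ix).hexdigest() == expected


ix_v1 = b"Geoff admires Sam."
x = mint_with_content_hash("http://example.org/geoff", ix_v1)

ok_original = verify(x, ix_v1)
ok_updated = verify(x, b"Geoff admires Sam and Betty.")
print(ok_original, ok_updated)
```

    Any change to IX, authorized or not, fails verification, so IX
    can no longer be maintained.  Hashing the signing key instead, as
    suggested under Secure, leaves the content free to change.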

    The big split in approaches is:
      - X gives URL for IX (eg "hash")
      - D gives URL for IX (eg rdfs:isDefinedBy in instance data)
      - Someone else maps X to IX (eg rdfs:isDefinedBy in other doc)
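
    A sketch of the first two approaches, with no network access.
    The example URLs and both function names are hypothetical, and
    D is assumed to be available as a list of triples.

```python
from urllib.parse import urldefrag


def ix_url_from_identifier(x):
    """'Hash' approach: the part of X before '#' is taken as the URL of IX."""
    url, _fragment = urldefrag(x)
    return url


def ix_url_from_document(statements, x, predicate="rdfs:isDefinedBy"):
    """Look in a document's statements for one saying where X is defined."""
    for s, p, o in statements:
        if s == x and p == predicate:
            return o
    return None


x = "http://example.org/people#geoff"
d = [(x, "rdfs:isDefinedBy", "http://example.org/about-geoff")]

from_x = ix_url_from_identifier(x)
from_d = ix_url_from_document(d, x)
print(from_x)
print(from_d)
```

    The third approach uses the same lookup as the second, but over
    statements taken from a document other than D.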
Received on Wednesday, 5 February 2003 14:36:28 UTC