RE: Comments on the Stanford RDF API from McBride, Brian on 2000-05-08 (www-rdf-interest@w3.org from May 2000)

From: McBride, Brian <bwm@hplb.hpl.hp.com>
Date: Mon, 8 May 2000 12:18:40 +0100
To: "'Sergey Melnik'" <melnik@DB.Stanford.EDU>, "McBride, Brian" <bwm@hplb.hpl.hp.com>
Cc: "RDF Interest Group (E-mail)" <www-rdf-interest@w3.org>
Message-ID: <5E13A1874524D411A876006008CD059F23921C@0-mail-1.hpl.hp.com>
>
>Sounds encouraging! Are you planning to open source some of your work?
>

I have agreement in principle to do that.  I'll need to check with our 
legal folks about the process for doing this.  And the code will need 
to be ready.

>> o Issue: Vector.indexOf() does not work for a vector of RDFNode.
>> 
>>   Reason:  RDFNode.equals(RDFNode n) should be RDFNode.equals(Object o).
>....
>I'm not sure what causes the trouble for you...
>

My fault.  I found a need to add the equals method to the interface
RDFNode.  I added it a while back and then forgot I'd done it.  It was
me who got it wrong.

So this one becomes a request to add boolean equals(Object o) to RDFNode.
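
A minimal sketch of why the signature matters, assuming the lookup in
question is java.util.Vector.indexOf(). The two node classes below are
hypothetical stand-ins, not the Stanford API's RDFNode. Vector compares
elements through equals(Object), so a method declared as equals(RDFNode)
is only an overload and is never called by the collection:

```java
import java.util.Vector;

// Hypothetical node whose equals() takes the node type: an overload, not an override.
class OverloadedNode {
    final String label;
    OverloadedNode(String label) { this.label = label; }

    // Overloads Object.equals(Object); collections never call this one.
    public boolean equals(OverloadedNode other) {
        return other != null && label.equals(other.label);
    }
}

// Hypothetical node that overrides equals(Object), as requested for RDFNode.
class OverridingNode {
    final String label;
    OverridingNode(String label) { this.label = label; }

    @Override
    public boolean equals(Object o) {
        return o instanceof OverridingNode && label.equals(((OverridingNode) o).label);
    }

    @Override
    public int hashCode() { return label.hashCode(); }
}

class EqualsDemo {
    // Looks up a fresh, equal-valued node in a one-element vector.
    static int indexWithOverload() {
        Vector<OverloadedNode> v = new Vector<>();
        v.add(new OverloadedNode("http://example.org/a"));
        return v.indexOf(new OverloadedNode("http://example.org/a"));
    }

    static int indexWithOverride() {
        Vector<OverridingNode> v = new Vector<>();
        v.add(new OverridingNode("http://example.org/a"));
        return v.indexOf(new OverridingNode("http://example.org/a"));
    }

    public static void main(String[] args) {
        System.out.println(indexWithOverload());  // falls back to identity comparison
        System.out.println(indexWithOverride());  // compares by value
    }
}
```

With the overload, the lookup falls back to Object's identity comparison
and the element is not found; with the override it is found at index 0.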

>
>Good point. Plan: introduce ModelException. ImmutableModelException can
>be thrown if the model is not modifiable (subclass of ModelException).
>Are there more exception subtypes that can be generally useful?

Were you planning that ModelException would carry a value?  If so,
I'm not sure you need the subclass ImmutableModelException; that
can just be an instance of the ModelException class.  Most of my exceptions
come from the database, so I'd like a way to include a nested exception.
I can always subclass ModelException if you'd rather not do that in a
general way.
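
One possible shape for a ModelException that carries a nested exception
is sketched below. The constructors and the demo class are assumptions
for illustration, not part of the Stanford API. The sketch relies on the
cause-chaining constructor of java.lang.Exception, which only arrived in
later Java releases (1.4); on older JDKs the nested exception would have
to be kept in a field instead.

```java
import java.sql.SQLException;

// Hypothetical sketch: a ModelException that can wrap the underlying failure.
class ModelException extends Exception {
    ModelException(String message) { super(message); }
    ModelException(String message, Throwable nested) { super(message, nested); }
}

class NestedExceptionDemo {
    // Stand-in for a database-backed add(); always fails for illustration.
    static void addStatement() throws ModelException {
        try {
            throw new SQLException("connection lost");
        } catch (SQLException e) {
            // Keep the database exception so callers can inspect it.
            throw new ModelException("could not add statement", e);
        }
    }

    // Returns the message of the nested exception, or null if nothing was thrown.
    static String nestedMessage() {
        try {
            addStatement();
            return null;
        } catch (ModelException e) {
            return e.getCause().getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(nestedMessage());
    }
}
```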

>
>> o Issue:  Namespace names lost on import
>> 
>>   Reason: When I'm importing an RDF serialization into the database,
>>         I'm passed the full URI.  It is not possible in general to
>>         parse that URI and pick out the namespace component.  It is
>>         better to retain the namespace component which can then be
>>         used for better user presentation in an editor and for better
>>         serialization of the model.
>> 
>>   Remedy: Add methods Model.createResource(String nsName, String roName),
>>           String RDFResource.nsName() and String RDFResource.roName().
>
>Unlike UML, namespaces are not explicitly present in the RDF model. They
>are merely used to make resource identifiers unique. Namespace shortcuts
>in XML are "syntactic sugar". As long as the parsed models are
>equivalent, it does not matter which namespaces are used.

I would disagree with this for the following reasons:

    1) Syntactic sugar or no, there is benefit in retaining
       the original namespace/roname separation for presentation
       to a user.

    2) RDF Schema says the namespace tells an application which
       schemas are applicable.  From section 2.3.5 of the schema
       spec:

           Although XML namespace declarations will typically
           provide the URI where RDF vocabulary resources are
           defined, there are cases where additional
           information is required.

       And from 4.1.2 of the same spec:

           RDF uses the XML Namespace facility [XMLNS] to
           identify the schema in which the properties and
           classes are defined.

       An application needs to know the correct namespace.
       I'm not sure that I'm entirely comfortable with this
       aspect of RDF Schema, but it is what the spec
       currently says.

    3) In some applications there may be considerable space
       savings.  For example, in loading a wordnet model into
       a database.

    4) see next comment ...
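
A minimal sketch of the point about retaining the split. SplitResource
is a hypothetical class in the spirit of the proposed
createResource(nsName, roName), nsName() and roName(); the two namespace
names are the ones used in the serializations quoted in this thread.
Both splits concatenate to the same URI, so the URI alone cannot say
which split was originally written:

```java
// Hypothetical resource that keeps the namespace name and local name apart.
class SplitResource {
    private final String nsName;
    private final String roName;

    SplitResource(String nsName, String roName) {
        this.nsName = nsName;
        this.roName = roName;
    }

    String nsName() { return nsName; }
    String roName() { return roName; }

    // The full URI is the concatenation of the two parts.
    String getURI() { return nsName + roName; }
}

class NamespaceSplitDemo {
    static final SplitResource A = new SplitResource(
            "http://www.omg.org/uml/1.3/Behavioral_Elements.State_Machines.",
            "StateMachine");
    static final SplitResource B = new SplitResource(
            "http://www.omg.org/uml/1.3/",
            "Behavioral_Elements.State_Machines.StateMachine");

    public static void main(String[] args) {
        // Same URI, different splits.
        System.out.println(A.getURI().equals(B.getURI()));
        System.out.println(A.nsName().equals(B.nsName()));
    }
}
```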


>For example, the following two serializations are interchangeable:
>
>(1)
>
><rdf:RDF
>xmlns:s="http://www.omg.org/uml/1.3/Behavioral_Elements.State_Machines.">
>
>  <s:StateMachine>
>    <s:transition> ... </s:transition>
>  </s:StateMachine>
>
></rdf:RDF>
>
>(2)
>
><rdf:RDF xmlns:s="http://www.omg.org/uml/1.3/">
>
>  <s:Behavioral_Elements.State_Machines.StateMachine>
>    <s:Behavioral_Elements.State_Machines.transition> ... </s:Behavioral_Elements.State_Machines.transition>
>  </s:Behavioral_Elements.State_Machines.StateMachine>
>
></rdf:RDF>
>

Please can you identify the part of the namespace spec on which
you base this?  I suggest that sections 5.3, A.3 and
A.4 of the namespace spec refute this assertion.


>Sometimes it is nice to be able to "extract" namespaces from URIs for
>more compact/legible serialization. For that, any prefix can be used as
>long as it occurs reasonably often and the suffixes do not contain
>illegal characters.
>
>> o Issue:  Current API has no way to set a namespace prefix
>> 
>>   Reason: When displaying URIs, and when serialising, it would be good
>>         to display a namespace prefix that is meaningful to a human.
>> 
>>   Remedy: Add method Model.setNsPrefix(String nsName, String prefix)
>
>I think this is not needed...

Are you saying that I have the requirements for my application wrong,
or are you saying that there is another way to achieve the same effect?
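
For concreteness, the effect being asked for could look roughly like the
sketch below. PrefixTable and its display() method are hypothetical; only
the setNsPrefix(nsName, prefix) signature comes from the proposal above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical prefix registry, sketching what Model.setNsPrefix might maintain.
class PrefixTable {
    private final Map<String, String> prefixByNamespace = new LinkedHashMap<>();

    void setNsPrefix(String nsName, String prefix) {
        prefixByNamespace.put(nsName, prefix);
    }

    // Returns prefix:localName when the namespace has a registered prefix,
    // otherwise the full URI.
    String display(String nsName, String roName) {
        String prefix = prefixByNamespace.get(nsName);
        return prefix == null ? nsName + roName : prefix + ":" + roName;
    }
}

class PrefixDemo {
    static PrefixTable table() {
        PrefixTable t = new PrefixTable();
        t.setNsPrefix("http://www.w3.org/1999/02/22-rdf-syntax-ns#", "rdf");
        return t;
    }

    public static void main(String[] args) {
        PrefixTable t = table();
        System.out.println(t.display("http://www.w3.org/1999/02/22-rdf-syntax-ns#", "type"));
        System.out.println(t.display("http://example.org/ns#", "name"));
    }
}
```

A namespace with a registered prefix is shown in the short form; any
other namespace falls back to the full URI.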

> 
>> o Issue: New query methods.
>> 
>>   Reason: When generating an RDF serialization, it is convenient to be
>>         able to list all the namespaces used in a model so they can be
>>         output at the head of the serialization.  For my RDF editor, I
>>         want to be able to list all the unique subjects in the model,
>>         and I'd like to use a database query rather than troll through
>>         all the statements and pick them out myself.  See also the
>>         stylistic note below.
>> 
>>   Remedy:  Add methods:
>>                 
>>                 RDFEnum Model.namespaces();
>>                 RDFEnum Model.subjects();
>>                 RDFEnum Model.predicates();
>>                 RDFEnum Model.objects();
>
>Ad RDFEnum Model.namespaces():
>
>For very large datasets, even this approach may not be appropriate. If
>one has to serialize a billion statements from a database, namespace
>information may not fit into main memory. I'd suggest reading subsets of
>statements and generating partial serializations. That way, each time you
>can collect namespace prefixes from the given subset in memory before
>dumping it.

Yes, that would be sensible.

>As to subjects(), predicates(), objects():
>
>If adding something, I'd rather provide:
>
>	Enumeration Model.getResources();
>	Enumeration Model.getLiterals();
>
>Why do you need to distinguish between subjects, predicates and objects?
>Can you explain why any of the above methods might be needed in some
>more detail?

I think we agree namespaces are useful for serialization.

The editor/browser I'm prototyping has a user interface which
consists of three panels as follows:

==================================================================
|Subject                                                         |
|  Brian's home page                                             |
|  Brian's personal home page                                    |
|                                                                |
|----------------------------------------------------------------|
|Predicate                                                       |
|                                                                |
|                                                                |
|----------------------------------------------------------------|
|Object                                                          |
|                                                                |
|================================================================|

When a model is opened it displays all the subjects in the subject
panel.  When a subject is selected it displays all the predicates
in the model for that subject.  When a predicate is selected it
shows the object.

For this application, I want to list all the subjects.  This could
be done by enumerating the statements - all the information is
there - but I'd rather have the database do the work.

The inclusion of predicates() and objects() was just based on
symmetry.

I like your suggestion of resources() and literals() and I'd
suggest that you add these.
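
To make the trade-off concrete, here is a sketch of the client-side
fallback: scanning every statement and keeping each subject once. Triple
and SubjectListing are hypothetical types, and the predicate names in the
sample data are invented for illustration. A database-backed subjects()
could push the same work into the store, for example with a
SELECT DISTINCT query on the subject column.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, minimal statement type; not the Stanford API's Statement.
class Triple {
    final String subject, predicate, object;
    Triple(String s, String p, String o) { subject = s; predicate = p; object = o; }
}

class SubjectListing {
    // Scans all statements and returns each subject once, in first-seen order.
    static List<String> subjects(List<Triple> statements) {
        Set<String> seen = new LinkedHashSet<>();
        for (Triple t : statements) {
            seen.add(t.subject);
        }
        return new ArrayList<>(seen);
    }

    // Sample data; the predicates and objects are made up.
    static List<Triple> sample() {
        List<Triple> m = new ArrayList<>();
        m.add(new Triple("Brian's home page", "creator", "Brian"));
        m.add(new Triple("Brian's home page", "title", "Home"));
        m.add(new Triple("Brian's personal home page", "creator", "Brian"));
        return m;
    }

    public static void main(String[] args) {
        System.out.println(subjects(sample()));
    }
}
```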

> ...

>Using integers to manipulate resources/literals is a very valid
>approach. It may be the only feasible one if you have billions of
>statements stored persistently. I'm thinking of adding the following
>interface to support this:
>
>interface IntegerIdentifiable {
>
>   long getIntegerID();
>}
>
>If your database uses integers internally, you don't even need to load
>string URIs into memory until you have to serialize the model. Your
>custom Resource and Literal implementations could implement the above
>interface. Makes sense? Can you think of a better naming than the one
>above?

The naming is fine.  My view is that to handle anon resources as defined
in the spec there is a need to be able to generate a unique id
for anon nodes.  So I think that this is something that needs to be
built into RDFNode.  But the outcome of this will depend on a resolution
of how to handle anon resources.  
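
As a rough illustration only: one way an anonymous node could obtain a
unique id is from a shared counter. AnonNode is a hypothetical class; the
IntegerIdentifiable interface is the one proposed in the quoted message.
How anon resources should really be handled is still the open question
noted above.

```java
import java.util.concurrent.atomic.AtomicLong;

// Interface as proposed in the quoted message.
interface IntegerIdentifiable {
    long getIntegerID();
}

// Hypothetical anonymous node that draws its id from a shared counter.
class AnonNode implements IntegerIdentifiable {
    private static final AtomicLong NEXT_ID = new AtomicLong(1);
    private final long id = NEXT_ID.getAndIncrement();

    public long getIntegerID() { return id; }

    // Two anon nodes are equal only if they carry the same id.
    @Override
    public boolean equals(Object o) {
        return o instanceof AnonNode && ((AnonNode) o).id == id;
    }

    @Override
    public int hashCode() { return Long.hashCode(id); }
}

class AnonNodeDemo {
    public static void main(String[] args) {
        AnonNode a = new AnonNode();
        AnonNode b = new AnonNode();
        System.out.println(a.getIntegerID() != b.getIntegerID());
    }
}
```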

>
>> 
>> o Issue:  What to do with model.setSourceURI() and getSourceURI().
>...
>perspective these methods seem obsolete.
>

Cool

>
>> o Issue: Model.create() does not specify URI.
>> 
>>   Reason:  See my separate note, but I don't think we are far apart on
>>         this.  A model may have a URI, so I'd expect Model.create() to
>>         take a URI parameter, which may be null or empty if the model
>>         is anonymous.
>> 
>>   Remedy:  modify Model.create() to be Model.create(String URI).
>
>Currently, getURI on a model returns a digest-based URI of the model.
>That's the model's identity. This URI cannot be set or changed, similarly
>to URIs of Resources. Why do you need this? Maybe this is a use case for
>setSourceURI()?

I figured this might be a complicated discussion, so I sent out a
separate note on how I'm looking at this in 
http://lists.w3.org/Archives/Public/www-rdf-interest/2000May/0013.html.
Maybe we should have the architecture discussion before we come back
to the specific API issues.

>
>Right, currently, there is no provision for persistence in the API. I'm
>planning to add the following interface:
>
>interface PersistentModel {
>
>  /** return true if in-memory model is not in synch with the persistent store */
>  boolean isDirty();
>
>  /** synchronizes persistent store with in-memory model */
>  void checkpoint() throws PersistentModelException;
>
>  /** drops the changes to the model that are not yet in the persistent store */
>  void rollback() throws PersistentModelException;
>}
>
>Checkpoint allows the DB content to be brought into a
>transaction-consistent state.
>For example, if you write-through to the database on every add(), you
>may have the following problem. Consider adding two statements to the
>model:
>
>  (X, rdf:type, PersonWithSocialSecurityNumber)
>  (X, SSN, "123-45-6789")
>
>If your application crashes after the first add(), your database becomes
>inconsistent (from the viewpoint of the application).
>
>find() invoked on a "dirty" model may throw a PersistentModelException.
>
>Let me know whether such an interface fits well into your application
>architecture.
>

I'm glad you bring up the transaction issue.  It was on my list, but
for now I've been ignoring it.  All my updates are currently going
straight into the persistent store.

At first glance, I like your proposal, but I'll need to take a bit
more time and see what it would mean for my implementation.  Get back
to you again on that one.
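
For reference while evaluating it, the behaviour the proposed interface
describes can be sketched with a buffered model. BufferedModel and the
definition of PersistentModelException are assumptions; a plain list
stands in for the persistent store, so this shows only the call pattern
and gives no real crash safety. A database-backed version would
presumably map checkpoint() and rollback() onto a transaction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical definition of the exception type named in the quoted interface.
class PersistentModelException extends Exception {
    PersistentModelException(String message) { super(message); }
}

// Interface as proposed in the quoted message.
interface PersistentModel {
    boolean isDirty();
    void checkpoint() throws PersistentModelException;
    void rollback() throws PersistentModelException;
}

// Hypothetical implementation: a list stands in for the persistent store.
class BufferedModel implements PersistentModel {
    private final List<String> store = new ArrayList<>();    // "persistent" copy
    private final List<String> pending = new ArrayList<>();  // not yet checkpointed

    void add(String statement) { pending.add(statement); }

    public boolean isDirty() { return !pending.isEmpty(); }

    // Moves all pending statements into the store in one step.
    public void checkpoint() {
        store.addAll(pending);
        pending.clear();
    }

    // Discards everything added since the last checkpoint.
    public void rollback() { pending.clear(); }

    int storedCount() { return store.size(); }
}

class CheckpointDemo {
    public static void main(String[] args) {
        BufferedModel m = new BufferedModel();
        m.add("(X, rdf:type, PersonWithSocialSecurityNumber)");
        m.add("(X, SSN, \"123-45-6789\")");
        m.checkpoint();  // both statements reach the store together
        System.out.println(m.storedCount());
    }
}
```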


>
>> STYLISTIC
>> =========
>> 
>> o Suggestion:  I've added some public well known constants to the
>>         interfaces with things like the RDF and RDFS name spaces.
>
>What about interfaces (constant lists) in
>org.w3c.rdf.vocabulary.rdf_schema_19990303.RDFS and
>org.w3c.rdf.vocabulary.rdf_syntax_19990222.RDF? 
>

I hadn't noticed those before.  Thanks for pointing them out.
I do think they should be implementation agnostic i.e. an
app should not have to load your implementation classes.


>BTW, the next release will include an executable that generates
>"vocabulary" classes from a list of URLs of  RDF schemas.
>

Can you say some more about this?  I'm thinking about something
similar.

>> o Suggestion:  Not all models will be mutable.  Move those methods that
>>         modify the model into another interface, MutableModel.  These
>>         methods would include addStatement, createStatement,
>>         createResource, createLiteral.  Or need a not implemented
>>         exception.
>
>That's a good idea. One consideration: sometimes the same model is
>mutable, sometimes it is not. If it is not, it can throw
>ImmutableModelException. To find out, one might need a method like
>
>	boolean isMutable()   (better name?)
>
>PersistentModel should extend MutableModel, otherwise its methods do not
>make any sense.
>

Fine.
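
A sketch of how the pieces agreed here might fit together. The names
MutableModel, isMutable(), ModelException and ImmutableModelException
come from the thread; Model.size(), add(String), ListModel and freeze()
are hypothetical simplifications, not the Stanford API.

```java
import java.util.ArrayList;
import java.util.List;

class ModelException extends Exception {
    ModelException(String message) { super(message); }
}

class ImmutableModelException extends ModelException {
    ImmutableModelException() { super("model is not modifiable"); }
}

// Read-only view of a model (reduced to a single method for the sketch).
interface Model {
    int size();
}

// Methods that change the model live in a separate interface.
interface MutableModel extends Model {
    boolean isMutable();
    void add(String statement) throws ModelException;
}

// Hypothetical model that can be frozen at run time.
class ListModel implements MutableModel {
    private final List<String> statements = new ArrayList<>();
    private boolean mutable = true;

    void freeze() { mutable = false; }

    public boolean isMutable() { return mutable; }
    public int size() { return statements.size(); }

    public void add(String statement) throws ModelException {
        if (!mutable) {
            throw new ImmutableModelException();
        }
        statements.add(statement);
    }
}

class MutableModelDemo {
    // Returns true if add() on a frozen model is rejected.
    static boolean frozenModelRejectsAdd() {
        ListModel m = new ListModel();
        m.freeze();
        try {
            m.add("(s, p, o)");
            return false;
        } catch (ImmutableModelException e) {
            return true;
        } catch (ModelException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(frozenModelRejectsAdd());
    }
}
```

A caller can either test isMutable() first or simply catch
ImmutableModelException.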

>
>> o Suggestion:  Not all models should have to support a query interface,
>>          so move the query methods into a separate SimpleQuery interface.
>
>Hmm, I'm not sure about that. find() is such a fundamental method that
>I'd prefer to keep it in the Model interface and throw some
>NotImplemented exception instead if you really hate implementing it.
>
>However, sooner or later, we'll have multiple query languages for RDF.
>For that, interfaces like RQLQueryableModel with appropriate query
>methods are ok.

As a rule I won't argue over or comment further on stylistic suggestions
I've made.  It was exactly your point of anticipating that there will be
more query interfaces in the future that made me suggest that the
SimpleQuery interface should be a peer of them.  It's your API,
your call.

>Well, even "better" naming would be 
>
>	Model.getNumStatements()
>	Model.getStatements()
>
>Don't you think so? That's a crucial change, it affects almost all
>classes in the API, so once doing it, let's do it right. Are there other
>users of the API around who will be very unhappy about it?

I don't really mind.  It's just good if the naming is consistent.  If
it's a lot of work, it's not worth changing.

>
>The next release also includes
>
>	boolean Model.isEmpty()
>
>since sometimes getNumStatements() is not available. Is this a good
>name?
>

Fine.

Are there any other changes you are proposing?  It would be good to
have advance notice.

Brian
Received on Monday, 8 May 2000 07:18:56 UTC