Re: N-Triples: Naming anonnodes

>>>Aaron Swartz said:
> To be quite honest I'm getting a bit tired of arguing over the 
> N-Triples syntax, but I thought I'd throw this out there.

The N-Triples syntax discussion seems to have reached a conclusion
(which I'll address in other threads) after the usual kind of
technical discussion.  I and the other people who write N-Triples
parsers, readers and writers need this precision.

> Today I came up with the idea that we could define anonymous 
> nodes  as having a special representation in the abstract syntax, 
> but define the N-Triples conversion from RDF as having a 
> well-defined set of names for them. ...

Addressing issues with the model by making it depend more on the XML
syntax doesn't seem like a good idea (more below).

> ... IOW, an algorithm such as 
> the one used by SiRPAC or CARA to generate anonymous nodes in 
> order could be specified somehow (using XSLT, for example), and 
> all RDF/XML parsers which outputted N-Triples could follow this 
> specification. That way, comparison of N-Triples documents could 
> be reduced to a simple sort/diff operation, while still 
> retaining the semantics of anonymous nodes.
> 
> An example:
> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
> <rdf:Description>
>    <rdf:value>foo</rdf:value>
> </rdf:Description>
> </rdf:RDF>
> 
> could be unambiguously defined as:
> 
> _:g0 rdf:value "foo" .
> 
> as parsers would use anonymous node names in the form of _:gn 
> where n begins at zero and increments for each anonymous node 
> reached in the XML.
> 
> Obviously this wouldn't work for N-Triples output from a data in 
> which ordering information had been lost (I have some ideas on 
> that one too, but they are less formed) but it would work for 
> any RDF/XML -> N-Triples conversion.
> 
> Another option would be to use the number of the element from 
> which the anonymous node occurs. In this case the number would 
> be 2, as the <rdf:Description> is the second element in the 
> document.
> 
> This seems to give the best of both worlds.
> 
> Thoughts?


RDF/XML parsers can legitimately attack the RDF statement generation
in different orders (e.g. suck it all into a DOM and take top level
elements in reverse order), and they can thus generate identifiers
for anonymous nodes in different orders.
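
To make this concrete, here is a toy Python sketch (not taken from any
of the parsers mentioned in this thread). It assumes a document with two
top-level anonymous rdf:Description elements holding literal values, and
a hypothetical helper toy_triples that numbers the anonymous nodes either
in document order or in reverse order. Both runs describe the same graph,
yet the sorted N-Triples output differs, so a plain sort/diff would
report a difference.

```python
import xml.etree.ElementTree as ET

# Example input, extended from Aaron's example with a second node.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description>
   <rdf:value>foo</rdf:value>
</rdf:Description>
<rdf:Description>
   <rdf:value>bar</rdf:value>
</rdf:Description>
</rdf:RDF>"""


def toy_triples(xml_text, reverse=False):
    """Toy converter (hypothetical): handles only top-level anonymous
    rdf:Description elements whose children are literal-valued properties."""
    root = ET.fromstring(xml_text)
    nodes = list(root)
    if reverse:
        # Simulates a parser that walks the top-level elements backwards.
        nodes.reverse()
    lines = []
    for n, desc in enumerate(nodes):
        for prop in desc:
            # ElementTree reports tags as "{namespace}localname".
            uri = prop.tag[1:].replace("}", "")
            lines.append('_:g%d <%s> "%s" .' % (n, uri, prop.text))
    return sorted(lines)


forward = toy_triples(DOC)
backward = toy_triples(DOC, reverse=True)
print(forward == backward)  # the two label assignments do not line up
```

Here _:g0 carries "foo" in one run and "bar" in the other, which is
exactly the mismatch a fixed naming algorithm would have to rule out.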

This would only work if all the parsers followed pretty much the same
algorithm, that processing were written into the description of the
XML syntax and model, and the processing model were enforced.  That
would be hard to do and would force a lot of rewriting of existing
well-used parsers, e.g. W3C SiRPAC, Stanford RDF API, Jena's RDF
Filter.

As a parser writer, I feel this would be a major problem to retrofit.

More evidence: the author of Repat (the C version of RDF Filter) tried
to make his generated IDs match the genids made by (some of) the
SiRPAC/RDF Java API parsers, and he found it very hard, since it
basically meant changing the code structure to match that of the Java.

If in future we invented a new XML syntax with a slightly different
ordering, we would also have trouble generating the anon node names
in the same way.

The current alternative is better: use standard graph compare
algorithms that have already been implemented at least twice (Jeremy
@HP, Jan, Jos?) in rather a short time.
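
For illustration only, here is a naive Python sketch of what such a
comparison does. It is not the algorithm used by any of the
implementations mentioned above. It assumes each graph is a list of
N-Triples lines, that blank node names look like _:name, and that no
literal contains text resembling a blank node name. The function name
same_graph is made up for this example. It tries every renaming of the
blank nodes, so it is only usable on small inputs.

```python
import re
from itertools import permutations

# Assumed shape of a blank node name; good enough for this sketch.
BNODE = re.compile(r"_:[A-Za-z][A-Za-z0-9]*")


def bnodes(lines):
    """Return the sorted set of blank node names used in the lines."""
    return sorted({m for line in lines for m in BNODE.findall(line)})


def same_graph(a, b):
    """True if the two sets of N-Triples lines differ only in the names
    chosen for blank nodes. Brute force over all one-to-one renamings."""
    a, b = set(a), set(b)
    names_a, names_b = bnodes(a), bnodes(b)
    if len(a) != len(b) or len(names_a) != len(names_b):
        return False
    for perm in permutations(names_b):
        mapping = dict(zip(names_a, perm))
        # Rename every blank node in a at once and compare with b.
        renamed = {BNODE.sub(lambda m: mapping[m.group(0)], line) for line in a}
        if renamed == b:
            return True
    return False


g1 = ['_:g0 <p> "foo" .', '_:g1 <p> "bar" .']
g2 = ['_:g0 <p> "bar" .', '_:g1 <p> "foo" .']
print(same_graph(g1, g2))
```

The two inputs use the labels the other way round, which defeats
sort/diff but not a comparison that treats the names as arbitrary.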

So I don't think this idea is worth it just for the sake of making
node merging easier in N-Triples.

Dave

Received on Sunday, 22 July 2001 06:09:32 UTC