- From: Aaron Swartz <aswartz@upclink.com>
- Date: Sun, 21 Oct 2001 01:51:14 -0500
- To: "Sean B. Palmer" <sean@mysterylights.com>
- Cc: www-archive@w3.org
I hope you don't mind but I took the liberty of cleaning up ntriples.py.
I made some small changes:
- I renamed it "NTriples Tools: Parses and serializes N-Triples
documents."
- I moved the license down to a __license__ variable.
And a major one: I didn't see the reason it was a class with
lots of little functions. I really only wanted two things from
it: take an N-Triples string and give me back an RDF store
(parse) and take a store and give me back an N-Triples string
(serialize). So I linearized it into two plain old functions:
parse(document, store=rdf.Store()) and serialize(store).
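
One thing worth knowing about the `store=rdf.Store()` default: Python evaluates default argument values once, when the function is defined, so every call that omits `store` writes into the same object. The sketch below illustrates this in modern Python 3. The `Store` class, the whitespace-splitting `_load` helper, and the names `parse_shared` and `parse_fresh` are hypothetical stand-ins invented for the illustration; they are not the rdfapi classes or the real parser.

```python
class Store:
    """Hypothetical stand-in for rdf.Store: just a list of triples."""

    def __init__(self):
        self.triples = []

    def triple(self, s, p, o):
        self.triples.append((s, p, o))


def _load(document, store):
    # Toy "parser": take the first three whitespace-separated fields per line.
    for line in document.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            store.triple(fields[0], fields[1], fields[2])
    return store


def parse_shared(document, store=Store()):
    # The default Store is created once and reused by every call.
    return _load(document, store)


def parse_fresh(document, store=None):
    # A None sentinel gives each call its own Store unless one is passed in.
    if store is None:
        store = Store()
    return _load(document, store)


doc = "<a> <b> <c> ."
first = parse_shared(doc)
second = parse_shared(doc)
shared_count = len(second.triples)           # both calls filled one store
fresh_count = len(parse_fresh(doc).triples)  # a new store per call
```

If accumulating into one store across calls is the intent, the shared default does that; if not, the `None` sentinel is the usual way to avoid it.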
I took out the special NTriplesURLopener, since I figured
calling apps could deal with URIs on their own. I also took out
the specialized code for dealing with files, file names, etc.
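
With the URL opener and file handling gone, a calling app has to fetch the text itself and hand parse() a plain string. A minimal sketch of what that might look like in modern Python 3 follows. The `read_document` helper and the example file name are assumptions made up for the illustration; nothing like them is part of ntriples.py.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def read_document(source):
    """Return the text behind a URI or a local file name (hypothetical helper)."""
    if urlparse(source).scheme in ("http", "https", "file"):
        # Let urllib fetch anything that looks like a URI we know how to open.
        with urlopen(source) as response:
            return response.read().decode("utf-8")
    # Otherwise treat the argument as a path on the local file system.
    with open(source, encoding="utf-8") as handle:
        return handle.read()


# Demonstration with a temporary file, read once by path and once by file: URI.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.nt"
    path.write_text(
        "<http://example.org/s> <http://example.org/p> <http://example.org/o> .\n",
        encoding="utf-8",
    )
    from_path = read_document(str(path))
    from_uri = read_document(path.as_uri())
```

The string returned by such a helper is what would be passed to parse() as `document`.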
I also fixed a number of bugs along the way. Resulting code is
104 lines + command line interface. There still looks to be a lot
of room for tersification, but since I couldn't follow the
de-commented code very well, I didn't bother (and it's getting
late).
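
As one possible direction for that tersification: the three near-identical blank-node blocks in serialize() could share a single helper. The sketch below is written in modern Python 3 under stated assumptions. `URINode` and `BlankNode` are hypothetical stand-ins for the rdfapi node types (a blank node is simply an object without a `uri` attribute, which is the test the original uses), triples are plain tuples, and the `data:,` literal handling via rdf.URIToLiteral is left out.

```python
class URINode:
    """Hypothetical stand-in for an rdfapi node that carries a URI."""

    def __init__(self, uri):
        self.uri = uri


class BlankNode:
    """Hypothetical stand-in for an unlabelled node (no 'uri' attribute)."""


def serialize_terse(triples):
    labels = {}  # blank node -> generated label such as 'a1'

    def term(node):
        if not hasattr(node, "uri"):
            # First sighting assigns the next label; later ones reuse it.
            label = labels.setdefault(node, "a%d" % (len(labels) + 1))
            return "_:" + label
        return "<" + node.uri + ">"

    return "\n".join(
        "%s %s %s ." % tuple(term(node) for node in triple)
        for triple in triples
    )


blank = BlankNode()
subj = URINode("http://example.org/s")
pred = URINode("http://example.org/p")
result = serialize_terse([(subj, pred, blank), (blank, pred, subj)])
```

The labels come out in the same `a1`, `a2`, ... order as the original, because each new blank node gets the next number the first time it is seen.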
Let me know if you have questions,
- [ "Aaron Swartz" ; <mailto:me@aaronsw.com> ;
<http://www.aaronsw.com/> ]
#!/usr/bin/python
"""
NTriples Tools: Parses and serializes N-Triples documents.
http://infomesh.net/2001/10/ntriples/
Built on Aaron Swartz's RDF API: http://blogspace.com/rdf/rdfapi.txt
cf. http://www.w3.org/TR/2001/WD-rdf-testcases-20010912/#ntriples
"""
import sys, string, re, urllib
import rdfapi as rdf
__author__ = "Sean B. Palmer with Aaron Swartz"
__version__ = '1.1'
__license__ = """
Copyright (C) 2001 Sean B. Palmer.
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
02111-1307, USA.
"""
def parse(document, store=rdf.Store()):
    bNodes = {}
    CTriple = []
    # Incomprehensible regexps
    t = r'(<[^>]+>|_:[^\s]+|\"(?:\\\"|[^"])*\")'
    rt = re.compile(r'[ \t]*'+t+r'[ \t]+'+t+r'[ \t]+'+t+r'[ \t]*\.[ \t]*')
    rc = re.compile(r'(\#[^\n]*)')
    rw = re.compile(r'[ \t]+')
    # Normalize the new lines in document
    if len(document) == 0: raise 'Document has no content'
    else:
        document = string.replace(document, '\r\n', '\n')
        document = string.replace(document, '\r', '\n')
    # Parse document into tripleList
    lines = string.split(document, '\n')
    for line in lines:
        if len(line) == 0: continue # line has no content (a double '\n')
        elif rt.match(line):
            terms = rt.findall(line)[0]
            for term in terms:
                if term[0] == '<' and term[-1] == '>': # Term is a URI-view
                    CTriple.append(term[1:-1])
                elif term[:2] == '_:': # Term is an unlabelled node: bNode
                    bNode = term[2:]
                    if re.compile(r'[A-Za-z][A-Za-z0-9]*',
                                  re.S).match(bNode):
                        if not bNode in bNodes.keys():
                            bNodes[bNode] = rdf.node()
                        CTriple.append(bNodes[bNode])
                    else: raise 'bnode: "'+bNode+'" is not a valid bNode'
                elif term[0] == '"' and term[-1] == '"':
                    CTriple.append(unicode(term[1:-1]))
                else: raise 'Term '+str(term)+' is not a valid NTriples term.'
            store.triple(CTriple[0], CTriple[1], CTriple[2])
            CTriple = [] # Reset the current triple
        elif rc.match(line): continue # Line is a comment
        elif rw.match(line): continue # Line is just whitespace
        else:
            SyntaxError = "Line is invalid"
            raise SyntaxError, line # Validity error
    return store
def serialize(store):
    """Prints out as NTriples (Aaron wrote this function).
    Aaron notes: The code is really ugly and needs to be cleaned up."""
    nodeIdMap, nodeIdNum, output = {}, 0, []
    for t in store.tripleList:
        if (not hasattr(t.subject, 'uri')
            and t.subject not in nodeIdMap.keys()):
            nodeIdNum += 1
            nodeIdMap[t.subject] = 'a' + `nodeIdNum`
        if (not hasattr(t.predicate, 'uri')
            and t.predicate not in nodeIdMap.keys()):
            nodeIdNum += 1
            nodeIdMap[t.predicate] = 'a' + `nodeIdNum`
        if (not hasattr(t.object, 'uri')
            and t.object not in nodeIdMap.keys()):
            nodeIdNum += 1
            nodeIdMap[t.object] = 'a' + `nodeIdNum`
        if t.subject in nodeIdMap.keys(): sub = '_:' + nodeIdMap[t.subject]
        else: sub = '<'+t.subject.uri+'>'
        if t.predicate in nodeIdMap.keys(): prd = '_:' + nodeIdMap[t.predicate]
        else: prd = '<'+t.predicate.uri+'>'
        if t.object in nodeIdMap.keys(): obj = '_:' + nodeIdMap[t.object]
        else:
            if t.object.uri[:6] == "data:,":
                obj = '"'+ rdf.URIToLiteral(t.object.uri) +'"'
            else: obj = '<'+t.object.uri+'>'
        output.append('%s %s %s .' % (sub, prd, obj))
    return string.join(output, '\n')
def run():
    x = parse(open(sys.argv[1]).read())
    print serialize(x)
# Main program
if __name__ == "__main__":
    run()
# Phew
Received on Sunday, 21 October 2001 02:52:06 UTC