- From: Tim Finin <finin@cs.umbc.edu>
- Date: Fri, 11 Aug 2006 18:31:27 -0400
- To: semantic-web@w3.org
- CC: Li Ding <dingli1@gl.umbc.edu>
Swoogle [1] has a collection of over 1M error-free RDF documents collected from the Web and an additional ~700K documents that have embedded RDF, are malformed but appear to be RDF, or are no longer accessible. We've intentionally limited the number of simple RSS and FOAF documents in the current collection.

Only about 5% of these documents contain *any* triples that contribute to a definition. The rest consist entirely of data. We've determined that most of the 5% that contain definitional triples do so incorrectly and should be treated as all data. Of the remaining ones, many are duplicates and copies. We estimate that only about 1% of the documents in Swoogle's collection are proper 'ontologies' that are intended to (partially) define at least one named term.

That said, the vast majority of defined classes have *no* immediate instances and the vast majority of properties have *never* been used to assert a value. Most defined RDF terms have not been used.

[1] http://swoogle.umbc.edu/

--
Tim Finin, Computer Science & Electrical Engineering,
Univ of Maryland Baltimore County, 1000 Hilltop Cir, Baltimore MD 21250.
finin@umbc.edu http://umbc.edu/~finin 410-455-3522 fax:-3969
http://ebiquity.umbc.edu
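
To make the distinction above between definitional triples and plain data concrete, here is a minimal sketch of how one might compute the fraction of triples in a document that contribute to a definition. This is an illustration only, not Swoogle's actual method: the names `is_definitional` and `definitional_fraction`, the choice of which RDFS/OWL predicates and types count as "definitional", and the `example.org` URIs are all assumptions made for this example.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object) as URI strings

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Assumption: predicates that, by themselves, say something about a term's definition.
SCHEMA_PREDICATES = {
    RDFS + "subClassOf",
    RDFS + "subPropertyOf",
    RDFS + "domain",
    RDFS + "range",
}

# Assumption: rdf:type objects that declare the subject to be a class or property.
SCHEMA_TYPES = {
    RDFS + "Class",
    OWL + "Class",
    RDF + "Property",
    OWL + "ObjectProperty",
    OWL + "DatatypeProperty",
}


def is_definitional(triple: Triple) -> bool:
    """Return True if the triple contributes to defining a class or property."""
    _, predicate, obj = triple
    if predicate in SCHEMA_PREDICATES:
        return True
    return predicate == RDF + "type" and obj in SCHEMA_TYPES


def definitional_fraction(triples: Iterable[Triple]) -> float:
    """Fraction of triples that are definitional; 0.0 for an empty document."""
    triples = list(triples)
    if not triples:
        return 0.0
    return sum(is_definitional(t) for t in triples) / len(triples)


# Hypothetical document: one class declaration plus three data triples.
EX = "http://example.org/"
FOAF = "http://xmlns.com/foaf/0.1/"
doc = [
    (EX + "Person", RDF + "type", RDFS + "Class"),
    (EX + "alice", RDF + "type", EX + "Person"),
    (EX + "alice", FOAF + "name", "Alice"),
    (EX + "alice", FOAF + "knows", EX + "bob"),
]

fraction = definitional_fraction(doc)
```

Under these assumptions, a document with a fraction of zero would be "all data", while a nonzero fraction only indicates that definitional triples are present; deciding whether they are used correctly, or whether the document is a duplicate, needs further checks.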
Received on Friday, 11 August 2006 22:29:50 UTC