- From: Gannon Dick <gannon_dick@yahoo.com>
- Date: Thu, 22 Aug 2013 09:18:17 -0700 (PDT)
- To: Brand Niemann <bniemann@cox.net>, "public-egov-ig@w3.org" <public-egov-ig@w3.org>
- Cc: "Holm, Jeanne M (1760)" <jeanne.m.holm@jpl.nasa.gov>
- Message-ID: <1377188297.11450.YahooMailNeo@web122903.mail.ne1.yahoo.com>
First this (see message below):
Fw: Federated Knowledge Extraction (FOX) version 2.0.0

The Census applied statistical methods and came nowhere close to this certainty.
http://www.census.gov/population/www/documentation/twps0004.html

Overselling the Semantic Web, maybe? I can't be "confident" statistically.

re: http://semanticommunity.info/An_Open_Data_Policy/Project_Open_Data

I can say for sure that reading too much into the real-time profile of an Open Government Data user is problematic. My own solution is to throw out the identifying junk before you have to account for it, but that's just me.

1) For example, the keywords for the USDA data sets in data.gov often name organisms ("pests") with scientific names. This is not farming industry jargon, and one cannot infer a user level of expertise then go on to infer a service level "required" for the customer. FOAF (Friend of a Friend) breaks.

2) For example, if you make a government data source bi-lingual, that's a good thing. But to infer you have an English speaker or a Spanish speaker "on the line" is an over-reach. The Census keeps track:
http://www.census.gov/compendia/statab/cats/population/ancestry_language_spoken_at_home.html
I count 39 languages with lots of basket categories; I counted, I think, 108 (exploded) at one time. The Open Government Data Help Desk is not so helpful if they try too hard.

3) For example, m/data.gov(dot Country Code)?/ is a cyberspace domain set and (so far) one of the more trusted brands. The display language (terminology) might be chosen to encourage tourism or propagate FUD+Secrecy, but regardless, the "message" is monolithic within a given domain and cannot be hijacked by display language. The "message" may be bad smelling propaganda, but it can't be spammed, spoofed or deodorized by censorship. When the display languages are set up as federated identifiers, then the Open Government Data space members can be grouped by the display language or terminology they use.
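A minimal Python sketch of the idea in point 3, under stated assumptions: the shorthand m/data.gov(dot Country Code)?/ is read here as "data.gov, optionally followed by a two-letter country suffix", and bare data.gov is taken to mean the US. The function names, the ccTLD override table, and the sample (host, language) pairs are hypothetical and only illustrate the grouping; they are not a survey of real portals.

```python
import re
from collections import defaultdict

# Assumed reading of m/data.gov(dot Country Code)?/:
# "data.gov" optionally followed by a two-letter country suffix.
DATA_GOV = re.compile(r"^data\.gov(?:\.([a-z]{2}))?$", re.IGNORECASE)

# ccTLDs mostly match ISO 3166-1 alpha-2 codes; .uk (ISO code GB) is a
# well-known exception. This override table is illustrative, not complete.
CCTLD_TO_ISO = {"UK": "GB"}


def country_code(host):
    """Return an ISO 3166-1 alpha-2 code for a data.gov-style host, or None.

    Bare "data.gov" is assumed to be the US portal.
    """
    m = DATA_GOV.match(host.strip())
    if m is None:
        return None
    suffix = (m.group(1) or "us").upper()
    return CCTLD_TO_ISO.get(suffix, suffix)


def group_by_language(portals):
    """Group portal country codes by ISO 639-1 display language.

    `portals` is an iterable of (host, language_code) pairs. Hosts that do
    not match the data.gov pattern are skipped, so a spoofed host cannot
    join a group just by declaring a display language.
    """
    groups = defaultdict(set)
    for host, lang in portals:
        cc = country_code(host)
        if cc is not None:
            groups[lang.lower()].add(cc)
    return {lang: sorted(ccs) for lang, ccs in groups.items()}


# Illustrative input only: display languages here are made up for the demo.
sample = [
    ("data.gov", "en"),
    ("data.gov", "es"),
    ("data.gov.uk", "en"),
    ("spoof.example", "en"),
]
print(group_by_language(sample))
```

The grouping key is the display language, while membership is decided only by the domain pattern; no property of the individual user is inferred.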
I'll set up a little demo. Brand, it will be keyed by ISO 3166 Country Codes (same as yours) and ISO 639-1 Languages (transformed, not scraped, from http://id.loc.gov/vocabulary/iso639-1.rdf).

--Gannon

----- Forwarded Message -----
From: Axel Ngonga <ngonga@informatik.uni-leipzig.de>
To: public-lod@w3.org; "semantic-web@w3.org" <semantic-web@w3.org>
Sent: Thursday, August 22, 2013 8:22 AM
Subject: Federated Knowledge Extraction (FOX) version 2.0.0

Dear all,

We are delighted to announce a new version of the Federated Knowledge Extraction Framework FOX [1]. FOX provides an architecture that allows using supervised machine learning to combine the results of knowledge extraction frameworks. Since the last version, we extended FOX to:

- Be more time-efficient
- Achieve better accuracy (we have gone past the magical threshold of 90% F-measure on Named Entity Recognition and achieve up to 92% on locations and organizations as well as almost 98% on persons)
- Support more output formats (NIF, JSON, Annotea, etc.)
- Provide a light feature for users with almost real-time requirements
- Provide better entity disambiguation on DBpedia based on the AGDISTIS framework [2]

The new version has already been included into GeoLift [3], a framework for the automatic extension of RDF datasets with geo-spatial information created within the GeoKnow project [4].

Your feedback is more than welcome.

Best regards,
Rene and Axel on behalf of AKSW

[1] http://fox.aksw.org
[2] https://github.com/AKSW/AGDISTIS
[3] http://github.com/AKSW/GeoLift
[4] http://geoknow.eu

--
Axel Ngonga, Dr. rer. nat
Head of SIMBA/AKSW
Augustusplatz 10
Room P616
04109 Leipzig
Tel: +49 (0)341 9732341
Fax: +49 (0)341 9732239
Received on Thursday, 22 August 2013 16:18:46 UTC