- From: Jeff Mixter <jeffmixter@gmail.com>
- Date: Mon, 10 Aug 2015 15:32:12 -0400
- To: corey.harper@nyu.edu
- Cc: Dan Scott <denials@gmail.com>, "Young,Jeff (OR)" <jyoung@oclc.org>, "LeVan,Ralph" <levan@oclc.org>, Richard Wallis <richard.wallis@dataliberate.com>, "public-schemabibex@w3.org" <public-schemabibex@w3.org>
- Message-ID: <CAC=429AD4f9hv1HALhaXbEs77QH9iN3WGeovVzOioVOjS-4F-g@mail.gmail.com>
Having used open source reconciliation tools like OpenRefine to clean up and enhance DC data, I would like to echo Jeff Y's concern about just using schema:Thing. Although one could limit the number of possibilities by searching for schema:Thing with a predicate of schema:author, schema:creator, etc., this is not really a viable alternative for cleaning up data, since no tools make use of it.

Corey, with regard to those ~70 million: you would be surprised at how creative some of the MARC records are :) Also, a lot of those come from other parts of the MARC record. A good example is a 260 field that has the publisher information. One cannot assume that the publisher is always an Organization, and there is no subfield code to indicate whether the $b is a Person or an Organization. Alternatively, I would argue that it is possible to assume that the publisher is some sort of Agent.

Jeff Mixter

On Mon, Aug 10, 2015 at 3:10 PM, Corey A Harper <corey.harper@nyu.edu> wrote:

> Dear all,
>
> My $0.02: I also think that schema:Thing is the best option at this time,
> and don't think we should push too much on Agent given what I consider
> relatively limited usefulness. I understand Jeff's point about the dangers
> of "not sorting these out", but I also think that we can store and manage
> data with whatever specificity we want, and I'm not sure those dangers apply
> to data as published downstream to consumers on the Web.
>
> I'm also _very_ interested in knowing more about the 70 Million + "mystery
> agents" Richard and Jeff have been referencing. Are these just 1xx and 7xx
> data points that are type unknown because they haven't matched a known
> entity with a known type? Can't we at least infer more about their type by
> their MARC field? Can we see some example instance (bib) data where these
> show up?
>
> Best,
> -Corey
>
> On Mon, Aug 10, 2015 at 1:55 PM, Dan Scott <denials@gmail.com> wrote:
>
>> FWIW, the Bibliographic Ontology (bibo) also uses foaf:Agent.
>>
>> But I concur with the developing dissenting opinion on the github issue
>> that, if we have nothing specific to say about the nature of the entity
>> because we lack the information, it's better to simply avoid the compromise
>> of Agent. We might make ourselves feel a bit better about the dismal state
>> of our bibliographic data through an abstract class like Agent, but in the
>> end it doesn't really add any data to the data we're trying to express.
>>
>> Using schema:Thing seems like an acceptable fallback in the meantime,
>> and allows the data expressed by the target links to be refined to either
>> Person or Organization at some point in the future when the effort occurs.
>>
>> On Mon, 10 Aug 2015 at 11:29 Young,Jeff (OR) <jyoung@oclc.org> wrote:
>>
>>> I made an argument that the problem is broader than bib records:
>>>
>>> https://github.com/schemaorg/schemaorg/issues/700#issuecomment-129078302
>>>
>>> Limiting to our situation, though, Richard cites the count from WorldCat
>>> at 72 million “agents” (people and organizations excluded):
>>>
>>> https://github.com/schemaorg/schemaorg/issues/700#issuecomment-129227478
>>>
>>> These all have Linked Data identifiers, but they are only mechanized
>>> placeholders in need of exposure, reconciliation, and enrichment.
>>>
>>> The danger of not sorting these out is that naïve automated “entity
>>> matching” processes resort to string matching on name as an “else
>>> condition” and the resulting mix up manifests itself in the Linked Data.
>>>
>>> I suggested Google Custom Search as a possible tool to help with
>>> discovery and possibly lead to an interface where they could be reconciled:
>>>
>>> https://github.com/schemaorg/schemaorg/issues/700#issuecomment-129239474
>>>
>>> Jeff
>>>
>>> *From:* LeVan,Ralph
>>> *Sent:* Monday, August 10, 2015 10:33 AM
>>> *To:* Young,Jeff (OR); Richard Wallis; public-schemabibex@w3.org
>>> *Subject:* RE: The Agent proposal in bib.schema.org is controversial
>>>
>>> One of the arguments against Agent was that if you didn’t know what kind
>>> of object a thing was, then you just shouldn’t say. All the properties of
>>> Agent seem to come from Thing. I’d propose that we just use Thing.
>>>
>>> My guess is that the need for Agent comes mostly from our need to
>>> convert existing bib records into RDF and some of our crappy old bib
>>> records don’t reliably distinguish the type of agent involved. Rather than
>>> be caught out in a lie about whether the agent is a Person or Organization,
>>> we’d rather say less. This is a problem peculiar to our situation and not
>>> a broad problem of the internet community. It’s also a short-term
>>> problem. Selling ‘Agent’ to a community that doesn’t need it is going to
>>> be an uphill battle.
>>>
>>> What’s wrong with dropping all the way back to Thing when we don’t know
>>> the type of the agent?
>>>
>>> Ralph
>>>
>>> *From:* Young,Jeff (OR) [mailto:jyoung@oclc.org <jyoung@oclc.org>]
>>> *Sent:* Monday, August 10, 2015 10:04 AM
>>> *To:* Richard Wallis; public-schemabibex@w3.org
>>> *Subject:* RE: The Agent proposal in bib.schema.org is controversial
>>>
>>> One option would be for us to use foaf:Agent. Presumably search engines
>>> would ignore it, but that’s their prerogative.
>>>
>>> Another option would be to preserve http://bibliograph.net/Agent, with
>>> a comment that it wasn’t accepted by the broader community, but remains
>>> useful in our limited domain. (Terms that have been adopted should be
>>> deprecated.)
>>>
>>> Jeff
>>>
>>> *From:* Richard Wallis [mailto:richard.wallis@dataliberate.com <richard.wallis@dataliberate.com>]
>>> *Sent:* Monday, August 10, 2015 8:18 AM
>>> *To:* public-schemabibex@w3.org
>>> *Subject:* The Agent proposal in bib.schema.org is controversial
>>>
>>> You may have noticed if you followed the recent announcement of
>>> Schema.org v2.1
>>> <https://lists.w3.org/Archives/Public/public-schemabibex/2015Aug/0000.html>,
>>> which includes bib.schema.org, that one of our proposals did not make
>>> it in. That proposal being the Agent type that we proposed as a super-type
>>> for Person and Organization.
>>>
>>> Agent has been a theme of discussion in the community well before we
>>> approached the issue. You can follow the recent debate in the related
>>> schemaorg git issue comment trail:
>>> https://github.com/schemaorg/schemaorg/issues/700
>>>
>>> In the bibliographic world Agent is a well understood, some would say
>>> obvious, approach. When applied to the wider domains that Schema.org
>>> embraces however, it raises many concerns and issues. Especially because,
>>> as proposed, it would introduce a new direct sub-type of Thing with
>>> ramifications that could cascade across many areas of the vocabulary.
>>>
>>> In my personal opinion the gap between the two opposing views on this is
>>> significant and the best way forward would be to consider possible
>>> pragmatic approaches to how we represent our data in Schema.org without
>>> losing the ability to describe our resources effectively to the wider
>>> world.
>>>
>>> In simple terms, if we identify an author, creator, publisher, or even
>>> copyright holder as a Person or an Organization there is not a problem.
>>> The difficulty occurs when we know from the relationships in the data that
>>> they are either a Person or an Organization but cannot identify which.
>>>
>>> One suggested way forward for such a circumstance would be to define
>>> them as a schema:Thing. To me this feels a little too vague. A follow-on
>>> option was to suggest a 'personOrOrganization' boolean property to indicate
>>> this circumstance. This is a little more appealing, but I think it still
>>> needs some work.
>>>
>>> What are others' thoughts on this?
>>>
>>> Do we believe that the proposed Agent type is the *only* way forward?
>>> Are there potential pragmatic options like the one I describe above that we
>>> could shape, that would be acceptable? Is this requirement to specifically
>>> describe agents too detailed, and something we can pass over, and move on
>>> to other things?
>>>
>>> ~Richard.
>>>
>>> Richard Wallis
>>> Founder, Data Liberate
>>> http://dataliberate.com
>>> Linkedin: http://www.linkedin.com/in/richardwallis
>>> Twitter: @rjw
>>>
>>
>
> --
> Corey A Harper
> Metadata Services Librarian
> New York University Libraries
> 20 Cooper Square, 3rd Floor
> New York, NY 10003-7112
> 212.998.2479
> corey.harper@nyu.edu
>

--
Jeff Mixter
jeffmixter@gmail.com
440-773-9079
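
For illustration only, here is a minimal Python sketch of the schema:Thing fallback discussed in this thread, applied to a publisher taken from a MARC 260 $b. The helper names (publisher_node, book_jsonld) and the sample values are hypothetical and do not come from any tool mentioned above. The sketch assumes the caller passes a type only when the source record reliably supplies one.

```python
import json

# Types we are willing to assert; anything else falls back to Thing.
KNOWN_TYPES = {"Person", "Organization"}


def publisher_node(name, known_type=None):
    """Describe a publisher name (e.g. from MARC 260 $b) as a schema.org node.

    Hypothetical helper: uses the given type only if it is Person or
    Organization, and otherwise falls back to the generic Thing.
    """
    node_type = known_type if known_type in KNOWN_TYPES else "Thing"
    return {"@type": node_type, "name": name}


def book_jsonld(title, publisher_name, publisher_type=None):
    """Build a small JSON-LD description of a book and its publisher."""
    return {
        "@context": "http://schema.org",
        "@type": "Book",
        "name": title,
        "publisher": publisher_node(publisher_name, publisher_type),
    }


# Sample values are made up; no type is known, so the publisher is a Thing.
doc = book_jsonld("Example Title", "J. Smith")
print(json.dumps(doc, indent=2))
```

If the type is later reconciled, only the "@type" value of the publisher node needs to change, which is the refinement path Dan describes for schema:Thing.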
Received on Monday, 10 August 2015 19:32:41 UTC