Re: OpenJurist, semantic web/linked data from Brian Gryth on 2009-11-25 (public-egov-ig@w3.org from November 2009)

From: Brian Gryth <briangryth@gmail.com>
Date: Wed, 25 Nov 2009 10:08:44 -0700
To: Sam Deskin <sam@openjurist.org>
Cc: public-egov-ig@w3.org
Message-ID: <894ba28d0911250908ncff860aqe91d1ebc1d3d489f@mail.gmail.com>
Sam,

The following is the information I was telling you about.  You may be aware
of some of it already, but I thought I'd include more rather than less.
Please disregard anything you already know.

The ABA SCOTUS page is available at
http://www.abanet.org/publiced/preview/home.html.  If you look at the bottom
right of the page, the ABA provides links to briefs back to the 2003-2004
term.  (Not a long time, but it is something.)  Oyez is another source for
briefs in some cases.  If provided, they appear below the audio recordings
of the oral arguments.  FindLaw also has a Cases and Codes search at
http://www.findlaw.com/casecode/.

Out of curiosity, I ran a Google Scholar query for "privacy" against all
legal journals and opinions, which returned the following result:
http://scholar.google.com/scholar?as_q=Privacy&num=10&btnG=Search+Scholar&as_epq=&as_oq=&as_eq=&as_occt=any&as_sauthors=&as_publication=&as_ylo=&as_yhi=&as_sdt=2&as_sdts=5&hl=en
If you look at the fourth result, Roe v. Wade, it appears that Scholar
pulls back six versions of the decision.  Here is the URL to the
resource/versions page for Roe v. Wade:
http://scholar.google.com/scholar?cluster=12334123945835207673&hl=en&as_sdt=2002.
It appears that Google has in fact scanned and OCRed the case law
reporters.  The Roe v. Wade results also cite supreme.justia.com,
bulk.resource.org, and law.cornell.edu as other sources for the
opinion.  I also did a search for "privacy" in Colorado cases, and it
appears that Google has scanned and OCRed the reporters for state case law
as well.  (An impressive effort if true!)
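
For anyone who wants to see which search options were actually set in the
Scholar query above, here is a minimal sketch that splits the URL into its
non-empty parameters.  It uses only the Python standard library; the helper
name query_params is made up for illustration, and the sketch makes no claim
about what the individual as_* parameters mean to Google.

```python
from urllib.parse import urlsplit, parse_qs

# The Google Scholar query URL quoted in the message, split for readability.
SCHOLAR_URL = (
    "http://scholar.google.com/scholar?as_q=Privacy&num=10&btnG=Search+Scholar"
    "&as_epq=&as_oq=&as_eq=&as_occt=any&as_sauthors=&as_publication="
    "&as_ylo=&as_yhi=&as_sdt=2&as_sdts=5&hl=en"
)


def query_params(url):
    """Return the non-empty query parameters of a URL as a plain dict.

    query_params is a hypothetical helper, not part of any library.
    """
    # parse_qs drops parameters with blank values unless told otherwise.
    parsed = parse_qs(urlsplit(url).query)
    # Each key appears once in this URL, so keep only the first value.
    return {key: values[0] for key, values in parsed.items()}


params = query_params(SCHOLAR_URL)
for key, value in sorted(params.items()):
    print(f"{key} = {value}")
```

Only the parameters that carry a value survive, which makes it easier to
compare this query with the cluster URL for the Roe v. Wade versions page.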

Hope this information is helpful.  Please let me know if I can be of help in
the future.

Thanks,
Brian


On Wed, Nov 25, 2009 at 8:36 AM, Sam Deskin <sam@openjurist.org> wrote:

>  Thank you for your time on the call today.  This is the email that I sent
> out about OpenJurist and the semantic web/linked data.
>
>
>
> Your guidance is appreciated.
>
>
>
> Sam Deskin
>
> OpenJurist.org
>
>
>
>
>
> *From:* public-egov-ig-request@w3.org [mailto:
> public-egov-ig-request@w3.org] *On Behalf Of *Sam Deskin
> *Sent:* Thursday, November 05, 2009 10:31 AM
> *To:* public-egov-ig@w3.org
> *Cc:* 'Sam Deskin'
> *Subject:* OpenJurist, semantic web/linked data
>
>
> Hello Participants in the eGovernment Interest Group,
>
> Glad to be a (new) member of the eGovernment Interest Group. I was invited
> because of the project that I am working on. I could use some guidance to
> learn best practices and to benefit from your experience.
>
> The project I am working on is called OpenJurist.org<http://openjurist.org/>.
> It is a website with 647,000+ US Supreme Court and Appellate Court cases
> that we obtained from resource.org.  We currently offer the cases for public
> consumption like several other websites. Our website is a source of legal
> information useful to attorneys/laypeople looking to understand a legal
> issue.  We are starting with case law and are working on getting more
> information organized as time goes by.
>
>
>
> Our next major initiative is to use semantic tags / linked data to organize
> the cases, making them accessible in new and different ways than ever
> before. Right now it is quite crude.  But we are just beginning.  We have a
> lot of work ahead of us cleaning up the semantic data.
>
>
>
> We have spent the past several months doing automated tagging of each case.
> We just finished our first run at this process and currently have 14,628,730
> tags for these cases; 2M+ unique tags.  To give you a sense of the scale we
> are working at and the vastness of the data we are working with, within the
> cases we have identified discussion of:
>
> ·         900 different medical treatments for
>
> ·         3300 different medical conditions;
>
> ·         3000 different terms describing industries;
>
> ·         41,400 city names;
>
> ·         504,000 company names; and
>
> ·         1.3M individual people's names.
>
>
>
> Now, the data requires A LOT of scrubbing. We have made headway on the easy
> ones: Continents, Countries, Presidents, and a few more. But the big ones
> need work. This could be a work in progress for some time and will require
> the help of a devoted volunteer army or paid staff to make it happen. Or, maybe,
> as researchers want to determine certain correlations, they will need/want
> to scrub the data to be able to make it useful for them - in the process,
> making it more useful for others.
>
>
>
>
> In the near future, we plan on making it possible for people to use and
> organize the data in simple ways. For example, a site about the Presidents
> of the United States could list all cases that involve each president
> during/after his tenure (by date). Soon after, we plan on making it easy for
> people to link to the cases (en masse) on our site or take the data off our
> site and apply it as they would like on their own site using a widget or an
> API.
>
>
>
> In the slightly more distant future we would like people to be able to
> manipulate the data against itself and in relation to other data. For
> example, *one day* people will be able to determine the following on the
> fly:
>
> a.       In the 1960s,
>
> b.      The American Civil Liberties Union brought cases to
>
> c.       Secure the release of inmates in overcrowded prisons,
>
> d.      And won those cases
>
> e.      In certain states,
>
> f.        How did these cases affect crime in
>
>           i.      those states?
>
>           ii.     other states?
>
>
>
> We are pretty close to being able to allow people to organize the data, at
> least on our own site (the API will need to be built), but have LOTS of work
> to do before we can let people truly manipulate the data.  To give you an
> idea of where things stand: we are working on extracting the dates that
> cases were heard/decided right now; we have identified which cases the ACLU
> is *mentioned* in; we still need to determine which cases are about prison
> overcrowding and whether those cases resulted in the court ordering a
> release of prisoners; we know which states are *mentioned* in each case (but
> not yet whether the case is specifically about that state); and we have not
> yet incorporated any outside data. We have a lot of work to do to make this
> information truly useful in the way we envision and to make the data useful
> in ways we cannot imagine.
>
>
>
> Your guidance/ideas are appreciated.  Feel free to ask me questions and I
> will try my best to answer them.
>
>
>
> Sincerely,
>
>
>
> Sam Deskin
>
> OpenJurist.org
>
>
>
>
>
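
To make the first steps of the quoted a-f example concrete, here is a rough
sketch of filtering tagged cases by decade and by the names they mention.
Everything in it is an assumption for illustration: the Case record, the
find_cases helper, and the sample entries are invented and are not
OpenJurist's data model or API.  As the quoted message notes, a "mentions"
tag only says a name appears in the opinion, not that the case is about that
party or state, and it says nothing about who won.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    """Hypothetical tagged case: a name, a decision year, and mention tags."""
    name: str
    year: int
    mentions: set = field(default_factory=set)


def find_cases(cases, decade_start, required_mentions):
    """Return cases decided in the given decade that mention every required tag."""
    required = set(required_mentions)
    return [
        case
        for case in cases
        if decade_start <= case.year < decade_start + 10
        and required <= case.mentions  # subset test: all required tags present
    ]


# Invented sample data, only to exercise the filter.
SAMPLE = [
    Case("Hypothetical case A", 1964, {"ACLU", "Colorado"}),
    Case("Hypothetical case B", 1972, {"ACLU", "Colorado"}),
    Case("Hypothetical case C", 1967, {"Colorado"}),
]

matches = find_cases(SAMPLE, 1960, ["ACLU", "Colorado"])
print([case.name for case in matches])
```

Steps c through f of the example (subject matter, outcome, and the effect on
crime) would need tags and outside data that, per the message, do not exist
yet, so the sketch stops at the mention-level filter.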



-- 
Brian Peltola Gryth
715 Logan street
Denver, CO 80203
303-748-5447
twitter.com/briangryth
Received on Wednesday, 25 November 2009 17:09:20 UTC