The Mozilla Foundation released version 3.6 of Firefox today, and we’re proud to release the corresponding version of the RECAP extension, beta version 0.6. In addition to Firefox 3.6 compatibility, we’ve also thrown in a new feature suggested by our users: the option to save documents using filenames that we describe as “lawyer style” in contrast to the “Internet Archive style” we’ve traditionally used. For example, rather than saving a document as “gov.uscourts.cand.204881.46.0.pdf,” you can now configure the extension to store a document as “N.D.Cal._3-08-cv-03251_46_0.pdf.” Those who prefer the traditional filenames are free to continue using those as well.
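For readers curious how the two naming styles relate, here is a minimal sketch in Python. It is not the extension's actual code: the `DocumentId` class, its field names, and the assumption that the docket number starts out as "3:08-cv-03251" are all hypothetical, chosen only so that the output matches the two example filenames above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentId:
    """Hypothetical metadata for one document; field names are assumptions."""
    court_code: str         # short court identifier, e.g. "cand"
    court_citation: str     # citation-style court name, e.g. "N.D.Cal."
    case_id: str            # numeric case identifier, e.g. "204881"
    docket_number: str      # assumed human-readable form, e.g. "3:08-cv-03251"
    doc_number: int         # document number on the docket
    attachment_number: int  # 0 for the main document


def archive_style_name(doc: DocumentId) -> str:
    """Build the traditional "Internet Archive style" filename."""
    return (
        f"gov.uscourts.{doc.court_code}.{doc.case_id}."
        f"{doc.doc_number}.{doc.attachment_number}.pdf"
    )


def lawyer_style_name(doc: DocumentId) -> str:
    """Build the "lawyer style" filename."""
    # Colons are not valid in filenames on every platform, so swap them out.
    safe_docket = doc.docket_number.replace(":", "-")
    return (
        f"{doc.court_citation}_{safe_docket}_"
        f"{doc.doc_number}_{doc.attachment_number}.pdf"
    )


example = DocumentId("cand", "N.D.Cal.", "204881", "3:08-cv-03251", 46, 0)
print(archive_style_name(example))
print(lawyer_style_name(example))
```

The point of the sketch is that the lawyer-style name needs metadata (the court's citation form and the docket number) that the traditional name does not carry, so one cannot be derived from the other by string manipulation alone.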
We’ve also improved our docket-parsing code, allowing us to extract more metadata from court dockets. New fields we’re now scraping include “Assigned to”, “Referred to”, “Cause”, “Nature of Suit”, “Jury Demand”, “Jurisdiction”, and “Demand.” We also scrape information about parties, including names, contact information, and attorneys. You can see a good example here (to choose a case at random).
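To give a flavor of what extracting these fields involves, here is a rough Python sketch. It is an illustration only, not our parser: it assumes the docket header has already been reduced to plain text with one "Label: value" pair per line, and the `extract_docket_fields` function and the sample text are made up for the example.

```python
# The field labels come from the list above; everything else is hypothetical.
KNOWN_FIELDS = (
    "Assigned to",
    "Referred to",
    "Cause",
    "Nature of Suit",
    "Jury Demand",
    "Jurisdiction",
    "Demand",
)


def extract_docket_fields(header_text: str) -> dict:
    """Collect known "Label: value" pairs from plain-text docket header lines."""
    found = {}
    for line in header_text.splitlines():
        # Split on the first colon only, since values may contain colons too.
        label, separator, value = line.partition(":")
        label = label.strip()
        value = value.strip()
        if separator and label in KNOWN_FIELDS and value:
            found[label] = value
    return found


# Invented sample text, not taken from a real docket.
sample = """Assigned to: Hon. Jane Doe
Cause: 28:1331 Federal Question
Jury Demand: Plaintiff
Date Filed: 01/01/2009"""

print(extract_docket_fields(sample))
```

Real docket pages are HTML and far less tidy than this, so a production parser has to do considerably more work before it reaches a step like this one.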
If you’re an existing Firefox user, Firefox periodically checks for updates to extensions and should automatically fetch the new version of the RECAP extension. Or you can force it to check immediately by clicking Tools->Extensions->Find Updates (or, depending on your Firefox version, Tools->Add-Ons->Find Updates). As always, please let us know if you find any bugs.
The website of the Columbia Science and Technology Law Review has an excellent write-up of RECAP by Rajiv Batra. His conclusion:
As part of a trend toward opening access to American common law, RECAP’s place at the heart or the periphery of the movement remains to be seen. Like any crowdsourcing application, RECAP’s usefulness increases as more people use it. Yet PACER’s prime users are large, bill-paying law firms, which tend to be wary about adopting new technology and have little incentive to contribute documents they paid for to a free database.
“Success” for RECAP may not be mainstream adoption, however. Merely by creating the working plugin and calling attention to the problem of restricted access to court documents, CITP has advanced the cause of reforming and opening up access to PACER. That alone is “Turning PACER around.”
One point this misses is that using RECAP can directly reduce firms’ PACER fees. It’s true, of course, that most firms pass these costs along to their clients. However, in today’s economic climate, clients are increasingly pressing their law firms for cost savings. Adopting RECAP is a painless way for firms to demonstrate cost-consciousness. And the cost savings from RECAP adoption will only get bigger as RECAP’s user base continues to grow. So while we think judicial transparency is reason enough to use RECAP, installing RECAP is good for every firm’s bottom line.
In any event, Batra has written a great piece, and we encourage you to check it out.
We’re excited to see Google has unveiled a dramatic expansion of Google Scholar to include Supreme Court decisions going back to the 18th century, lower federal court decisions since the 1920s, and state Supreme Court and appellate decisions going back to the 1950s. They’ve done an impressive job with automated parsing of legal citations, transforming them into hyperlinks and allowing Google to do automated analysis of case similarity.
This type of project was precisely what we had in mind when some of us wrote “Government Data and the Invisible Hand” last year. The judiciary may be the foundation of a free society, but it’s not especially good at building websites or search engines. By making public records easily available for re-publication by third parties, the judiciary (and the other branches of government) can enable private parties to dramatically expand public access to public information.
In this case, the state and federal courts haven’t made it easy to download bulk data, so Google had to get the information from third parties. Google is a big company with significant resources at its disposal. But in an ideal world, it wouldn’t take the resources of a large company to get access to this kind of data. Of course, this is precisely the vision behind RECAP. We hope to build a free, public, and comprehensive repository of federal judicial records so that large companies like Google, small start-ups, and even non-profit organizations can get access to the data and build tools to make these records more accessible and useful.
RECAP’s database is more limited than Google’s in some ways; we only store federal district court cases going back about 10 years. But it’s much more extensive in other respects; we have much more than just the final opinion in a case. We’d love to have third parties such as Google incorporate the data in RECAP into a tool like Google Scholar. But we’d be even happier if the judiciary itself took the lead, by freeing access to PACER and enabling bulk downloads. Google’s impressive new legal search tools show just how much value private parties can add when they build on public data.
Last week, we got our first major media coverage from across the pond, as the Guardian gave us a generous write-up. They call RECAP “an ingenious twist on peer to peer networking” and write that “since the system launched in August, legal circles have been buzzing with support for the idea.”
Meanwhile, RECAP continues to generate interest from the legal profession. Earlier this month, RECAP’s own Tim Lee spoke to a group of New Jersey lawyers about how the software can save their clients money while expanding access to the public domain. And Arizona Attorney magazine has an in-depth article about RECAP and the debate over public access. They write that “there appears to be nothing illegal about the use of RECAP by those who are paying PACER users” (we agree). And they conclude that we’ve “carefully thought through the ethical implications and goals of the program.” We like to think so. The December issue of Virginia Lawyer magazine profiles RECAP, describing in detail the efforts so far to liberate PACER documents.
Word about RECAP continues to spread through the legal profession. The latest issue of Minnesota Lawyer covers the case of a Minneapolis lawyer who was sanctioned for inadvertently including the Social Security numbers and dates of birth of dozens of individuals in court documents, when the rules of civil procedure mandate that only the last four digits of a Social Security number and the year of birth be disclosed in documents filed with the court.
The article then mentions RECAP as one reason for attorneys to be careful about redaction when they’re filing court documents:
Friedemann said that concern over the publication of sensitive information has been elevated by recent Web programs like RECAP, which has made it easier to access public court filings.
RECAP automatically uploads all PACER documents a user is viewing onto an archive maintained by the non-profit group Internet Archive. When the next RECAP user attempts to view a PACER document that has already been archived, RECAP automatically uploads the copy to prevent that user from paying for those materials. The system allows users of PACER to slowly create a secondary archive of these public documents that can be accessed for free.
Friedemann explained that prior to programs like RECAP mistakes in documents published on PACER could be corrected. “Now they can’t be taken back,” she said.
We’d like to make a couple of important clarifications. First, RECAP does scan documents for Social Security numbers before uploading them, so it’s unlikely that the document in question would have appeared on the Internet Archive even if a RECAP user had downloaded it. Second, it is possible to “take back” documents that have been uploaded to the archive. If you spot a document in our archive that shouldn’t be there, please let us know so we can take care of the problem.
With that said, we agree with the general point of the article. We do our best to suppress documents with sensitive information in them, but we have limited manpower and can only do so much with automated methods. So attorneys are the first and most important line of defense for their clients’ privacy. We urge attorneys to take seriously their obligation to redact documents before submitting them to the courts. And we applaud the judiciary for stepping up enforcement of its redaction rules.
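As an illustration of what an automated check for Social Security numbers can look like, and why it can only do so much, here is a minimal Python sketch. It is not RECAP's actual scanner. The pattern and the `contains_possible_ssn` function are assumptions made for the example: the check looks only for the common 3-2-4 digit layout in text that has already been extracted from a document.

```python
import re

# Hypothetical pattern: nine digits written as 123-45-6789.
# It misses numbers written with spaces or no separators, and it
# cannot see text that exists only inside a scanned image.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def contains_possible_ssn(text: str) -> bool:
    """Return True if the text contains something shaped like a full SSN."""
    return SSN_PATTERN.search(text) is not None


print(contains_possible_ssn("Plaintiff's SSN is 123-45-6789."))
print(contains_possible_ssn("SSN: xxx-xx-6789"))
```

A properly redacted number, with only the last four digits showing, does not match this pattern. A full number written in any other layout also slips through, which is one reason careful redaction by the filing attorney matters more than any scanner.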
A group of academics has been convened by Public.Resource.Org to define recommendations for a proposed federal government site: law.gov. The group will study the feasibility of creating the equivalent of a data.gov for legal materials. The process will define a concrete path forward for the government. Specifically, it will deliver:
- Detailed technical specifications for markup, authentication, bulk access, and other aspects of a distributed registry.
- A bill of lading defining which materials should be made available on the system.
- A detailed business plan and budget for the organization in the government running the new system.
- Sample enabling legislation.
- An economic impact statement detailing the effect on federal spending and economic activity.
- Procedures for auditing materials on the system to ensure authenticity.
Ed Felten, Executive Director of Princeton’s Center for Information Technology Policy (which also produced RECAP), is one of the co-conveners.
Last week RECAP’s Steve Schultze and Harlan Yu visited Yale Law School to give a talk sponsored by Yale’s Information Society Project. Yale librarian Jason Eiseman produced a short interview with Steve that he describes as “a little Blair Witch.” Steve talks about the origins of RECAP, discusses some of the current challenges faced by RECAP, and talks briefly about RECAP’s newest sister project, FedThread.
Monday’s Los Angeles Times has a great article about the growing movement for government transparency. It focuses on three of our favorite transparency advocates: Ellen Miller, co-founder of the Sunlight Foundation; Josh Tauberer, a regular at CITP conferences; and Carl Malamud, whose non-profit, public.resource.org, is a key RECAP partner.
The article discusses RECAP in some detail, describing it as “a sort of digital Kumbaya.” We’re always happy to have news outlets help spread the word about RECAP, and we’re also glad that the article makes clear that RECAP is part of a broader movement for web-enabled government transparency. Folks like Carl, Josh, and Ellen have been pushing the envelope on these issues longer than we have.
One minor correction that’s worth noting: the article refers to “the courts’ PACER revenue of $10 million a year.” In reality, the expected revenue for 2009 is $87 million. This and many other details about PACER’s budget can be found in RECAP co-author Steve Schultze’s recently released paper on the subject.
RECAP has been a subject of discussion in other venues as well. Ars Technica discussed the courts’ reaction to RECAP in its story about the PACER service offering MP3s of court proceedings. And if you happen to be a subscriber to Massachusetts Lawyers Weekly or New Jersey Law Journal, you can see their write-ups of RECAP here and here, respectively.
RECAP co-author Steve Schultze is having a busy month. Last week, he released a new paper called “Electronic Public Access Fees and the United States Federal Courts’ Budget: An Overview.” It provides a comprehensive overview of PACER’s budget. It explains how the courts decide how much to charge for PACER and how the money is spent. It’s an invaluable roadmap for anyone interested in understanding the debate over PACER’s future.
Today, Steve is at the Gov 2.0 Expo giving a talk about RECAP. If you’re at the expo as well, we hope you’re planning to go to the talk, which starts at 10:50. If not, you can see a pre-recorded version of his talk here:
Finally, next week Steve will start his new job as associate director of the Center for Information Technology Policy at Princeton, which is the home of RECAP and its other co-authors. The rest of the RECAP team is excited that we’ll soon have Steve as a colleague as well as a co-author.
Last week we did a round-up of leading technology-focused sites that have covered RECAP. Now, it seems that news of RECAP is spreading beyond the “tech blogosphere,” as more mainstream publications have begun writing about our software. Foreign Policy’s Evgeny Morozov covered RECAP, calling it “smart and subversive.” On Wednesday NextGov, a National Journal publication widely read within the government IT community, ran a thorough write-up of RECAP by Aliya Sternstein. It included some good background on how RECAP fits into the larger debate about judicial transparency.
Finally, Katherine Mangu-Ward has penned a piece for the Wall Street Journal about RECAP. Katherine calls RECAP “a sleek little add-on” with “a stylish and subversive touch.” She writes:
With the possible exception of the ever-leaky CIA, no aspect of government remains more locked down than the secretive, hierarchical judicial branch. Digital records of court filings, briefs and transcripts sit behind paywalls like Lexis and Westlaw. Legal codes and judicial documents aren’t copyrighted, but governments often cut exclusive distribution deals, rendering other access methods a bit legally questionable. Supreme Court decisions are easy to get, but the briefs and decisions of lower courts can be hard to come by.
Last week, a team from Princeton’s Center for Information Technology Policy took a pot shot at legal secrecy, setting in motion a scheme to filch protected judicial records and make them available for free online. One of the developers, Harvard’s Stephen Schultze, says he went digging for some First Amendment precedent last fall and was shocked by the outdated technology he found. Knowing that “there’s a certain geek cache to openness projects these days,” Mr. Schultze and Princeton computer science grad students Tim Lee and Harlan Yu went straight to work.