Word about RECAP continues to spread through the legal profession. The latest issue of Minnesota Lawyer covers the case of a Minneapolis lawyer who was sanctioned for inadvertently including the Social Security numbers and dates of birth of dozens of individuals in court documents, when the rules of civil procedure mandate that only the last four digits of a Social Security number and the year of birth be disclosed in documents filed with the court.
The article then mentions RECAP as one reason for attorneys to be careful about redaction when they’re filing court documents:
Friedemann said that concern over the publication of sensitive information has been elevated by recent Web programs like RECAP, which has made it easier to access public court filings.
RECAP automatically uploads all PACER documents a user is viewing onto an archive maintained by the non-profit group Internet Archive. When the next RECAP user attempts to view a PACER document that has already been archived, RECAP automatically uploads the copy to prevent that user from paying for those materials. The system allows users of PACER to slowly create a secondary archive of these public documents that can be accessed for free.
Friedemann explained that prior to programs like RECAP mistakes in documents published on PACER could be corrected. “Now they can’t be taken back,” she said.
We’d like to make a couple of important clarifications. First, RECAP does scan documents for Social Security numbers before uploading them, so it’s unlikely that the document in question would have appeared on the Internet Archive even if a RECAP user had downloaded it. Second, it is possible to “take back” documents that have been uploaded to the archive. If you spot a document in our archive that shouldn’t be there, please let us know so we can take care of the problem.
With that said, we agree with the general point of the article. We do our best to suppress documents with sensitive information in them, but we have limited manpower and can only do so much with automated methods. So attorneys are the first and most important line of defense for their clients’ privacy. We urge attorneys to take seriously their obligation to redact documents before submitting them to the courts. And we applaud the judiciary for stepping up enforcement of its redaction rules.
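To give a sense of what an automated scan of this kind can and cannot catch, here is a minimal sketch in Python. It is not RECAP's actual code: the pattern, the function name, and the decision to match only the common 3-2-4 digit grouping are all assumptions made for illustration.

```python
import re

# Assumed pattern: nine digits grouped 3-2-4, separated by hyphens or spaces.
# The lookarounds keep the match from firing inside a longer run of digits.
SSN_PATTERN = re.compile(r"(?<!\d)\d{3}[- ]\d{2}[- ]\d{4}(?!\d)")


def contains_possible_ssn(text: str) -> bool:
    """Return True if the text contains something shaped like a full SSN."""
    return SSN_PATTERN.search(text) is not None


if __name__ == "__main__":
    print(contains_possible_ssn("Debtor SSN: 123-45-6789"))  # full number
    print(contains_possible_ssn("Debtor SSN: xxx-xx-6789"))  # redacted form
```

A check like this only sees text that is actually extractable from the document, and it will miss numbers written in unusual formats, which is one reason automated methods can only do so much.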
A group of academics has been convened by Public.Resource.Org in order to define recommendations for a proposed federal government site: law.gov. The group will study the feasibility of creating the equivalent of a data.gov for legal materials. The process will define a concrete path forward for the government. Specifically, it will deliver:
- Detailed technical specifications for markup, authentication, bulk access, and other aspects of a distributed registry.
- A bill of lading defining which materials should be made available on the system.
- A detailed business plan and budget for the organization in the government running the new system.
- Sample enabling legislation.
- An economic impact statement detailing the effect on federal spending and economic activity.
- Procedures for auditing materials on the system to ensure authenticity.
Ed Felten, Executive Director of Princeton’s Center for Information Technology Policy (which also produced RECAP), is one of the co-conveners.
Last week RECAP’s Steve Schultze and Harlan Yu visited Yale Law School to give a talk sponsored by Yale’s Information Society Project. Yale librarian Jason Eiseman produced a short interview with Steve that he describes as “a little Blair Witch.” Steve talks about the origins of RECAP, discusses some of the current challenges faced by RECAP, and talks briefly about RECAP’s newest sister project, FedThread.
Monday’s Los Angeles Times has a great article about the growing movement for government transparency. It focuses on three of our favorite transparency advocates: Ellen Miller, co-founder of the Sunlight Foundation; Josh Tauberer, a regular at CITP conferences; and Carl Malamud, whose non-profit, public.resource.org, is a key RECAP partner.
The article discusses RECAP in some detail, describing it as “a sort of digital Kumbaya.” We’re always happy to have news outlets help spread the word about RECAP, and we’re also glad that the article makes clear that RECAP is part of a broader movement for web-enabled government transparency. Folks like Carl, Josh, and Ellen have been pushing the envelope on these issues longer than we have.
One minor correction that’s worth noting: the article refers to “the courts’ PACER revenue of $10 million a year.” In reality, the expected revenue for 2009 is $87 million. This and many other details about PACER’s budget can be found in RECAP co-author Steve Schultze’s recently-released paper on the subject.
RECAP has been a subject of discussion in other venues as well. Ars Technica discussed the courts’ reaction to RECAP in its story about the PACER service offering MP3s of court proceedings. And if you happen to be a subscriber to Massachusetts Lawyers Weekly or New Jersey Law Journal, you can see their write-ups of RECAP here and here, respectively.
RECAP co-author Steve Schultze is having a busy month. Last week, he released a new paper called “Electronic Public Access Fees and the United States Federal Courts’ Budget: An Overview.” It provides a comprehensive overview of PACER’s budget. It explains how the courts decide how much to charge for PACER and how the money is spent. It’s an invaluable roadmap for anyone interested in understanding the debate over PACER’s future.
Today, Steve is at the Gov 2.0 Expo giving a talk about RECAP. If you’re at the expo as well, we hope you’re planning to go to the talk, which starts at 10:50. If not, you can see a pre-recorded version of his talk here:
Finally, next week Steve will start his new job as associate director of the Center for Information Technology Policy at Princeton, which is the home of RECAP and its other co-authors. The rest of the RECAP team is excited that we’ll soon have Steve as a colleague as well as a co-author.
Last week we did a round-up of leading technology-focused sites that have covered RECAP. Now, it seems that news of RECAP is spreading beyond the “tech blogosphere,” as more mainstream publications have begun writing about our software. Foreign Policy’s Evgeny Morozov covered RECAP, calling it “smart and subversive.” On Wednesday NextGov, a National Journal publication widely read within the government IT community, ran a thorough write-up of RECAP by Aliya Sternstein. It included some good background on how RECAP fits into the larger debate about judicial transparency.
Finally, Katherine Mangu-Ward has penned a piece for the Wall Street Journal about RECAP. Katherine calls RECAP “a sleek little add-on” with “a stylish and subversive touch.” She writes:
With the possible exception of the ever-leaky CIA, no aspect of government remains more locked down than the secretive, hierarchical judicial branch. Digital records of court filings, briefs and transcripts sit behind paywalls like Lexis and Westlaw. Legal codes and judicial documents aren’t copyrighted, but governments often cut exclusive distribution deals, rendering other access methods a bit legally questionable. Supreme Court decisions are easy to get, but the briefs and decisions of lower courts can be hard to come by.
Last week, a team from Princeton’s Center for Information Technology Policy took a pot shot at legal secrecy, setting in motion a scheme to filch protected judicial records and make them available for free online. One of the developers, Harvard’s Stephen Schultze, says he went digging for some First Amendment precedent last fall and was shocked by the outdated technology he found. Knowing that “there’s a certain geek cache to openness projects these days,” Mr. Schultze and Princeton computer science grad students Tim Lee and Harlan Yu went straight to work.
Now, an important correction. The statement raises the concern that RECAP could compromise sealed or private documents that attorneys access via CM/ECF, the system attorneys use for electronic filing and retrieval of documents in pending cases. Protecting privacy is our top priority, and we specifically designed RECAP to safeguard the privacy of CM/ECF documents. As we describe in our privacy FAQ, RECAP is carefully designed not to upload documents from the CM/ECF system. When a user logs into the CM/ECF system, a cookie is set on the user’s browser that’s different from the cookie that’s set when a user is logged into the public PACER system. RECAP monitors for this cookie and automatically deactivates itself whenever the user is logged into CM/ECF. We tested this thoroughly, with some CM/ECF users, before we released the public beta.
We’re confident that RECAP maintains the security model set up by the courts, and that it will never upload documents while a user is logged into CM/ECF. The code is open source, so anyone with concerns is welcome to inspect it for themselves. We’d like to work with the judiciary in the coming weeks to ensure they understand how RECAP protects privacy and security, and to incorporate any further enhancements they might suggest. In the meantime, users can continue using RECAP with the knowledge that it’s designed with privacy as our top priority.
Update: A final reason users should be comfortable with using RECAP is that the extension’s operation is extremely transparent. The little “R” icon in the lower-right-hand corner of every browser window turns blue when RECAP is enabled (which should only happen when you’re logged into PACER) and grey when it’s disabled (which should happen when you’re logged into CM/ECF). We don’t think you’ll ever see a blue icon when you’re browsing CM/ECF, but if you do, you should immediately disable RECAP and let us know about it so we can investigate the problem. In addition, RECAP notifies you about every document it uploads (unless you choose to turn this feature off). Again, you should never see an upload notification while you’re on a CM/ECF page, but if you do, you can contact us and we’ll delete that document from our database. So you don’t have to take our word for it when we say RECAP won’t upload CM/ECF documents; you can monitor what it’s doing and verify for yourself.
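For readers who would rather see the logic than read a description of it, the cookie check and the icon state can be sketched roughly as follows. This is an illustration in Python, not the extension’s own source: the cookie names `pacer_session` and `cmecf_session` are hypothetical placeholders, since the real names are not given here, and the function names are ours.

```python
from typing import Mapping

# Hypothetical cookie names, used only for illustration.
PACER_COOKIE = "pacer_session"
CMECF_COOKIE = "cmecf_session"


def recap_enabled(cookies: Mapping[str, str]) -> bool:
    """Decide whether uploads are allowed for the current browser session."""
    # A CM/ECF login always wins: if its cookie is present, stay disabled,
    # even when a PACER cookie is set at the same time.
    if CMECF_COOKIE in cookies:
        return False
    # Otherwise, enable only when the user is logged into public PACER.
    return PACER_COOKIE in cookies


def icon_color(cookies: Mapping[str, str]) -> str:
    """Blue means enabled, grey means disabled."""
    return "blue" if recap_enabled(cookies) else "grey"
```

The point of checking the CM/ECF cookie first is that the safe outcome (no upload) is the default whenever there is any sign of a CM/ECF session.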
One way to promote broader public access to the public record is to use RECAP to share documents with others. A complementary approach is to tell the U.S. Courts directly what should change. Recently, Stanford Law Librarian Erika Wayne launched a petition to “Improve PACER,” which suggested several changes:
- Provide document authentication
As the raw materials of adjudication become digitized and disseminated online, we must have some means of knowing that they are genuine. This is a dilemma that RECAP faces in helping users to trust the documents they download.
- Lower costs, improve interfaces
Our ultimate goal is to remove PACER’s paywall entirely and free the database up for third parties to build interfaces. But in the meantime, it would certainly benefit the public to gain less expensive access to the law through more useful interfaces. The petition recommends that the U.S. Courts reduce the transaction costs of access, and make that access more usable.
- Free access from Federal Depository Libraries
The Federal Depository Library Program has served the American public for decades by providing member libraries with collections of federal records at no cost. There are about 1,250 libraries nationwide. The U.S. Courts experimented with free access to PACER from 16 of these libraries recently, but in 2008 the program was discontinued. The petition calls for universal implementation of free access in these libraries.
Erika will deliver the petition to the Administrative Office of the Courts in the near future. If you support these goals, consider signing the petition.
Of all the questions we’ve received, probably the most common is whether it will be possible to access the documents in our archive without using PACER at all. The answer is yes, but at the moment we don’t offer any good browsing or searching tools.
The big reason has to do with privacy. One of our top priorities in developing RECAP was making sure we don’t inadvertently compromise the privacy of individuals who are the subject of court records. A lot of sensitive personal information is revealed in the course of federal court cases. A variety of private parties might be interested in using the information contained in these records for illicit purposes such as identity theft, stalking, and witness intimidation. We wanted to make sure we weren’t inadvertently facilitating those types of activities.
In theory, the courts have redaction rules designed to deal with these problems. Judges can order particularly sensitive documents to be sealed, and the rest of the documents are supposed to be redacted to prevent inadvertent disclosure of private information. Unfortunately, this process is far from perfect. Private information does sometimes wind up in the public version of court documents.
When court records were kept entirely on paper, the problem was mitigated by a kind of “security by obscurity”: documents might have officially been public, but accessing them was expensive and cumbersome, so in practice they were rarely accessed by the “bad guys.” PACER represented a dramatic reduction of the costs of accessing court documents. This facilitated many beneficial uses of these documents, but it also made some illegitimate uses easier. As we move toward a free public access model, both the benefits and the challenges will grow.
Some might see this as a reason not to make the documents free at all, but that is not our view. Remember that private data brokers are already harvesting PACER documents and building full-text search engines; if you’re willing to spend some money, you can already get whatever privacy-compromising information is in PACER. So better privacy protections are not a luxury; we need them whether or not we move to an open access regime. And we think open access comes with an important advantage: it opens the door to experiments in crowdsourced privacy auditing.
To minimize the risk that we would inadvertently compromise people’s privacy, we deliberately set modest goals for the initial version of RECAP. RECAP is built around the existing PACER interface, and is designed to be used by existing PACER users. We asked the Internet Archive to disable search engine indexing so that it wouldn’t be too easy to find whatever private information is available. We recognize that this leaves a lot of room for improvement, but we think it was necessary to protect privacy in the short run.
At the moment, there’s no officially-supported mechanism for browsing the RECAP repository, but you can directly link to individual documents and dockets. To see all the files available for a case, just strip the filename from the end of any document URL for that case, giving you a URL like http://www.archive.org/download/gov.uscourts.dcd.118919/ (dcd is PACER’s code for the DC District, 118919 is PACER’s number for the case). One of the available files will be a docket.html file. There will also be a docket.xml file, which might be more useful for automated parsing; stay tuned for details about our XML format.
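If you want to script this, the URL manipulation can be sketched in a few lines of Python. The function names are ours, and the document filename in the usage example (`example.pdf`) is a made-up placeholder; only the directory pattern itself comes from the example URL above.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

ARCHIVE_DOWNLOAD_BASE = "http://www.archive.org/download/"


def case_directory_url(court: str, case_number: int) -> str:
    """Build the per-case directory URL from PACER's court code and case number."""
    return f"{ARCHIVE_DOWNLOAD_BASE}gov.uscourts.{court}.{case_number}/"


def strip_filename(document_url: str) -> str:
    """Drop the trailing filename from a document URL, leaving the case directory."""
    parts = urlsplit(document_url)
    directory = posixpath.dirname(parts.path) + "/"
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


if __name__ == "__main__":
    print(case_directory_url("dcd", 118919))
    # "example.pdf" is a placeholder filename, not a real document name.
    print(strip_filename(
        "http://www.archive.org/download/gov.uscourts.dcd.118919/example.pdf"
    ))
```

Both calls in the usage example produce the same case directory URL, which is where the docket files for that case can be found.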
Obviously this is clumsy, and improving it is on our to-do list, but we’re a small team and it may be a while before we have time to do it. We’d love to hear from third parties interested in building better interfaces to our repository. As some of us have written before, one of the great advantages of open access is the fact that there can be more than one interface to the same data. If you’d like to take a crack at building a user-friendly but privacy-preserving interface to the repository, please get in touch.
We’ve been getting a ton of helpful feedback from users over the weekend. We’re grateful for all the supportive emails, comments, and tweets we’ve received. We’re also grateful for the bug reports and feature requests we’ve gotten. We need this kind of feedback to make RECAP better.
Most of the questions we’ve received are now answered by the Frequently Asked Questions section of our about page. Stay tuned for some upcoming blog posts where we’ll address some of these questions in more detail. But first, we wanted to highlight some more of the commentary that RECAP’s release has generated.
The great part about this is that because the Archive is providing the server space for free, every RECAP user is saving the court system work. Each time you download through RECAP, you avoid having to go through PACER’s servers at all. So yes, RECAP will mean a decrease in PACER’s revenues, but it also means a decrease in the things those revenues need to pay for. It’s an all-around good thing. It saves attorneys, researchers, and citizens money. It saves the government computer resources. And it makes the law just a little bit more free and accessible.
We couldn’t have put it better ourselves.
Ryan Singel of Wired calls RECAP “a pretty good hack,” and urges the judiciary to drop its paywall. The Lawyerist blog says that RECAP is a “brilliantly-conceived tool to liberate public records from PACER.”
RECAP seems to be especially popular among law librarians. Erika Wayne, a law librarian at Stanford University, writes of RECAP: “Be impressed. Very impressed.” We also got a favorable write-up from the University of Wisconsin law library and a mention from the Georgetown law library.
RECAP is also popular among DC-area think tanks. Heather West at the Center for Democracy and Technology calls RECAP “exactly the kind of project that we need” to promote judicial transparency. Jerry Brito of the Mercatus Center at George Mason University, an early advocate of online transparency, calls RECAP “ingenious.” And we got a mention from Jim Harper of the Cato Institute.
Finally, we were particularly happy to get coverage from the American Bar Association’s ABA Journal blog. Practicing lawyers are the heaviest users of PACER, so it’s extremely helpful to have RECAP covered by influential legal publications.