Accessing the RECAP Repository without PACER
Of all the questions we’ve received, probably the most common is whether it will be possible to access the documents in our archive without using PACER at all. The answer is yes, but at the moment we don’t offer any good browsing or searching tools.
The main reason for this limitation is privacy. One of our top priorities in developing RECAP was making sure we don’t inadvertently compromise the privacy of individuals who are the subject of court records. A lot of sensitive personal information is revealed in the course of federal court cases, and a variety of private parties might be interested in using that information for illicit purposes such as identity theft, stalking, and witness intimidation. We wanted to make sure we weren’t unwittingly facilitating those kinds of activities.
In theory, the courts have redaction rules designed to deal with these problems. Judges can order particularly sensitive documents to be sealed, and the rest of the documents are supposed to be redacted to prevent inadvertent disclosure of private information. Unfortunately, this process is far from perfect. Private information does sometimes wind up in the public version of court documents.
When court records were kept entirely on paper, the problem was mitigated by a kind of “security by obscurity”: documents might have officially been public, but accessing them was expensive and cumbersome, so in practice they were rarely accessed by the “bad guys.” PACER dramatically reduced the cost of accessing court documents. This facilitated many beneficial uses of these documents, but it also made some illegitimate uses easier. As we move toward a free public access model, both the benefits and the challenges will grow.
One might take this as an argument against making the documents free at all, but that is not our view. Remember that private data brokers are already harvesting PACER documents and building full-text search engines; if you’re willing to spend some money, you can already get whatever privacy-compromising information is in PACER. So better privacy protections are not a luxury; we need them whether or not we move to an open access regime. And we think open access comes with an important advantage: it opens the door to experiments in crowdsourced privacy auditing.
To minimize the risk that we would inadvertently compromise people’s privacy, we deliberately set modest goals for the initial version of RECAP. RECAP is built around the existing PACER interface, and is designed to be used by existing PACER users. We asked the Internet Archive to disable search engine indexing so that it wouldn’t be too easy to find whatever private information is available. We recognize that this leaves a lot of room for improvement, but we think it was necessary to protect privacy in the short run.
At the moment, there’s no officially supported mechanism for browsing the RECAP repository, but you can link directly to individual documents and dockets. To see all the files available for a case, just strip the filename from the end of any document URL for that case, giving you a URL like http://www.archive.org/download/gov.uscourts.dcd.118919/ (dcd is PACER’s code for the DC District, and 118919 is PACER’s number for the case). One of the available files will be a docket.html file. There will also be a docket.xml file, which might be more useful for automated parsing; stay tuned for details about our XML format.
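For anyone scripting against the repository, here is a minimal Python sketch of the URL manipulation described above. It assumes every case directory follows the pattern of the single example given (http://www.archive.org/download/gov.uscourts.<court>.<case>/); that generalization, the helper names, and the example document filename are hypothetical. The sketch only builds URLs. It does not fetch anything or parse docket.xml, whose format isn’t documented yet.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed base URL, taken from the single example directory URL in the text.
ARCHIVE_BASE = "http://www.archive.org/download"


def case_directory_url(document_url: str) -> str:
    """Strip the trailing filename from a document URL (hypothetical helper)."""
    parts = urlsplit(document_url)
    # Keep the path up to and including its last slash; drop query/fragment.
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return urlunsplit((parts.scheme, parts.netloc, directory, "", ""))


def case_directory_from_ids(court: str, case_number: int) -> str:
    """Build a directory URL from PACER's court code and case number.

    Assumes the gov.uscourts.<court>.<case> naming seen in the example.
    """
    return f"{ARCHIVE_BASE}/gov.uscourts.{court}.{case_number}/"


def docket_urls(directory_url: str) -> dict[str, str]:
    """Return the URLs of the docket.html and docket.xml files in a case directory."""
    return {
        "html": directory_url + "docket.html",
        "xml": directory_url + "docket.xml",
    }


if __name__ == "__main__":
    # "example.pdf" is a hypothetical document filename, used only for illustration.
    doc = "http://www.archive.org/download/gov.uscourts.dcd.118919/example.pdf"
    directory = case_directory_url(doc)
    print(directory)
    print(docket_urls(directory)["xml"])
```

Using `urlsplit` rather than plain string slicing means a stray query string or fragment on a copied link is dropped instead of being mistaken for part of the filename.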
Obviously this is clumsy, and improving it is on our to-do list, but we’re a small team and it may be a while before we have time to get to it. We’d love to hear from third parties interested in building better interfaces to our repository. As some of us have written before, one of the great advantages of open access is the fact that there can be more than one interface to the same data. If you’d like to take a crack at building a user-friendly but privacy-preserving interface to the repository, please