
Accessing the RECAP Repository without PACER

2009 August 18
by recapthelaw

Of all the questions we’ve received, probably the most common is whether it will be possible to access the documents in our archive without using PACER at all. The answer is yes, but at the moment we don’t offer any good browsing or searching tools.

The big reason has to do with privacy. One of our top priorities in developing RECAP was making sure we don’t inadvertently compromise the privacy of individuals who are the subject of court records. A lot of sensitive personal information is revealed in the course of federal court cases. A variety of private parties might be interested in using the information contained in these records for illicit purposes such as identity theft, stalking, and witness intimidation. We wanted to make sure we weren’t inadvertently facilitating those types of activities.

In theory, the courts have redaction rules designed to deal with these problems. Judges can order particularly sensitive documents to be sealed, and the rest of the documents are supposed to be redacted to prevent inadvertent disclosure of private information. Unfortunately, this process is far from perfect. Private information does sometimes wind up in the public version of court documents.

When court records were kept entirely on paper, the problem was mitigated by a kind of “security by obscurity”: documents might have officially been public, but accessing them was expensive and cumbersome, so in practice they were rarely accessed by the “bad guys.” PACER represented a dramatic reduction of the costs of accessing court documents. This facilitated many beneficial uses of these documents, but it also made some illegitimate uses easier. As we move toward a free public access model, both the benefits and the challenges will grow.

Some might take this as a reason not to make the documents free at all, but that is not our view. Remember that private data brokers are already harvesting PACER documents and building full-text search engines; if you’re willing to spend some money, you can already get whatever privacy-compromising information is in PACER. So better privacy protections are not a luxury; we need them whether or not we move to an open access regime. And we think open access comes with an important advantage: it opens the door to experiments in crowdsourced privacy auditing.

To minimize the risk that we would inadvertently compromise people’s privacy, we deliberately set modest goals for the initial version of RECAP. RECAP is built around the existing PACER interface, and is designed to be used by existing PACER users. We asked the Internet Archive to disable search engine indexing so that it wouldn’t be too easy to find whatever private information is available. We recognize that this leaves a lot of room for improvement, but we think it was necessary to protect privacy in the short run.

At the moment, there’s no officially supported mechanism for browsing the RECAP repository, but you can link directly to individual documents and dockets. To see all the files available for a case, just strip the filename from the end of any document URL for that case, giving you a URL like http://www.archive.org/download/gov.uscourts.dcd.118919/ (dcd is PACER’s code for the DC District, 118919 is PACER’s number for the case). One of the available files will be a docket.html file. There will also be a docket.xml file, which might be more useful for automated parsing; stay tuned for details about our XML format.
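For readers who want to script this, here is a minimal Python sketch of the URL trick described above. It is an illustration under stated assumptions, not an official client: the function names are hypothetical, the identifier pattern (gov.uscourts.<court>.<case number>) is inferred only from the single example URL, and the document filename in the usage note is made up. Because the exact names of the docket files may carry a prefix, the sketch matches on the docket.html and docket.xml endings rather than assuming the full filename.

```python
import posixpath
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: case directories are named gov.uscourts.<court>.<case number>,
# as in the one example URL given in the post.
CASE_ID_PATTERN = re.compile(r"^gov\.uscourts\.(?P<court>[a-z0-9]+)\.(?P<case>\d+)$")


def case_directory_url(document_url):
    """Strip the filename from a document URL, leaving the case directory URL."""
    parts = urlsplit(document_url)
    directory = posixpath.dirname(parts.path)  # drops the final path segment
    return urlunsplit((parts.scheme, parts.netloc, directory + "/", "", ""))


def parse_case_identifier(url):
    """Return (court code, case number) from a document or case directory URL."""
    directory = posixpath.dirname(urlsplit(url).path)
    match = CASE_ID_PATTERN.match(posixpath.basename(directory))
    if match is None:
        raise ValueError("URL does not look like a RECAP case URL: " + url)
    return match.group("court"), match.group("case")


def pick_docket_files(filenames):
    """From a list of filenames in a case directory, pick out the docket files."""
    found = {}
    for name in filenames:
        for suffix in ("docket.html", "docket.xml"):
            if name.endswith(suffix):
                found.setdefault(suffix, name)  # keep the first match per suffix
    return found
```

Given a hypothetical document URL such as http://www.archive.org/download/gov.uscourts.dcd.118919/example.pdf, case_directory_url returns the case directory URL shown above, and parse_case_identifier returns the court code dcd and the case number 118919. Fetching and listing the directory is deliberately left out, since the format of the listing page is not described here.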

Obviously this is clumsy, and improving it is on our to-do list, but we’re a small team and it may be a while before we have time to get to it. We’d love to hear from third parties interested in building better interfaces to our repository. As some of us have written before, one of the great advantages of open access is that there can be more than one interface to the same data. If you’d like to take a crack at building a user-friendly but privacy-preserving interface to the repository, please get in touch.

4 Responses
  1. Tom Sobran
    October 18, 2009

    I would suggest a written “How to use RECAP” page with instructions on installing the Firefox browser, adding the RECAP extension, and then searching the RECAP archive.

  2. March 18, 2010

    One thing that could easily be done is to OCR (if needed) the incoming documents, search for any patterns like SSNs (###-##-####), and block them out. It wouldn’t be hard, and I already have the tools necessary to do that. Let me know if you want to know more. I would think this would be a great step forward in automating the process and allowing the public to finally access everything they need.

  3. Steve Schultze
    March 18, 2010

    @Brian Wiles: We already do some of this, but if you have suggestions feel free to drop us a note at info@recapthelaw.org

  4. Alex
    August 14, 2010

    Now that you’ve made a search interface, this page is out of date.
