Saving a Protected PDF from a Webpage

This is mostly a note for my own use. I was recently sent a link to a special website containing documents related to the bankruptcy proceedings of a certain company. I thought “this document seems like something I should keep a copy of”, and of course logging in to said website is a pain. To prevent anyone “leaking” the documents, the creators of the website had used a heavily disabled variant of pdf.js (the PDF viewer library created by Mozilla). I wasn’t planning to leak said documents (and won’t), but the general ability to dump data from a running JavaScript application is useful.

In Chrome, hit F12 for the developer console.

Look for some indication that pdf.js is in use. This could take the form of a reference to pdf.js in the webpage itself, one of the resources loaded or something in the network console.
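That check can be partly scripted from the console. The following is a minimal sketch, not a definitive test: detectPdfJs is a hypothetical helper name, and it assumes the page still exposes the globals the stock pdf.js builds use (PDFViewerApplication, pdfjsLib) or loads scripts with "pdf" in their file names. A locked-down build may hide all of these, in which case the heap snapshot approach described next is the fallback.

```javascript
// Hypothetical helper: collects hints that a page is running pdf.js.
// `win` is the page's window object (passed in so the function can be
// exercised outside a browser too).
function detectPdfJs(win) {
  const hints = [];

  // Globals exposed by the stock pdf.js viewer and library.
  // Assumption: the site has not renamed or hidden them.
  if (win.PDFViewerApplication) hints.push('global: PDFViewerApplication');
  if (win.pdfjsLib) hints.push('global: pdfjsLib');

  // Script URLs that look like pdf.js files, e.g. pdf.js, pdf.mjs,
  // pdf.worker.min.js, or anything under a "pdfjs" path.
  const scripts = win.document ? Array.from(win.document.scripts || []) : [];
  for (const script of scripts) {
    if (/pdf(js|[.\w-]*\.m?js)/i.test(script.src || '')) {
      hints.push('script: ' + script.src);
    }
  }
  return hints;
}
```

In the console this would be called as detectPdfJs(window). An empty array doesn’t prove pdf.js is absent; it only means none of these particular hints were found.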

Go to the Memory tab and take a heap snapshot.

In the resulting profile, change the perspective to Containment.

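The Containment view provides a hierarchical view of the application’s memory, organised by which object references which. You’re looking for something big that isn’t “native”, or something with a PDF-related name; in this case that turned out to be an object called PDFViewerApplication.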

Chrome makes it easy to store this object as a global variable. Right-click PDFViewerApplication and select “Store as global variable”. It’ll be stored under a name like temp1.

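You can now explore the object in the console. There’s a handy method called getData(), which returns a promise. (In the stock pdf.js viewer it lives on the pdfDocument property, so the call is temp1.pdfDocument.getData().)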
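The promise can also be resolved in code rather than by clicking through the console. The sketch below makes a few assumptions: extractPdfBytes and looksLikePdf are hypothetical helper names, the stored object either has getData() directly or exposes it on pdfDocument as the stock viewer does, and the resolved value is an array of bytes. A modified build may differ on any of these points.

```javascript
// Hypothetical helper: pulls the raw PDF bytes out of a stored viewer object.
// `app` is whatever was stored as temp1. Assumption: it either has getData()
// itself or, as in the stock pdf.js viewer, exposes it on app.pdfDocument.
async function extractPdfBytes(app) {
  const source = typeof app.getData === 'function' ? app : app.pdfDocument;
  if (!source || typeof source.getData !== 'function') {
    throw new Error('No getData() method found on this object');
  }
  // In stock pdf.js this promise resolves to a Uint8Array.
  return source.getData();
}

// Sanity check: a PDF file starts with the ASCII bytes "%PDF-".
function looksLikePdf(bytes) {
  const magic = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  return bytes.length >= magic.length &&
    magic.every(function (b, i) { return bytes[i] === b; });
}
```

In the console, something like extractPdfBytes(temp1).then(function (bytes) { window.pdfBytes = bytes; console.log(looksLikePdf(bytes)); }); would keep the bytes in a global (pdfBytes is just an example name) and report whether they start with a PDF header.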

Call it and expand the returned Promise in the console: its [[PromiseResult]] holds the data of the PDF. Right-click that value and store it as another global variable. Then paste the following code into the console to create a function that saves the data as a file:

// Trigger a browser download of the given URL under the given file name.
function downloadURL(url, fileName) {
  const a = document.createElement('a');
  a.href = url;
  a.download = fileName;
  a.style.display = 'none';
  document.body.appendChild(a);
  a.click();
  a.remove();
}

// Wrap the raw data in a Blob and download it as a file.
function downloadBlob(data, fileName, mimeType) {
  const blob = new Blob([data], { type: mimeType });
  const url = window.URL.createObjectURL(blob);
  downloadURL(url, fileName);
  // Release the object URL once the download has had time to start.
  setTimeout(function() {
    window.URL.revokeObjectURL(url);
  }, 1000);
}

Finally, enter downloadBlob(temp2, 'some-file.pdf', 'application/octet-stream'); (this assumes the data you stored as a global variable was called temp2).

And that’s it, you’ve got that copy-protected PDF in its original glory.

aquarat's blog
