What is the safest way to deal with loads of incoming PDF files, some of which could potentially be malicious?

Question

As an investigative journalist, I receive dozens of messages each day, many of which contain PDF documents. I'm worried that blindly opening them could get my computer compromised. Before I started working in investigative journalism, I used virustotal.com to analyze every file (including PDFs) that arrived in my inbox, but that's not an option here: the files would be uploaded to VirusTotal, and they're meant to stay confidential before release. I've also heard that antivirus solutions are not 100% foolproof.

@CodesInChaos Oh found this article micahflee.com/2016/07/how-qubes-makes-handling-pdfs-way-safer – Tom the journalist, Feb 14 at 11:39
As always, it's a compromise between usability and security. Qubes OS is the next best thing after an air gap. It lets you work efficiently (allowing copy-paste) and very securely. In theory it's possible to escape VM isolation, in practice it's extremely hard to escape Qubes VMs: they are designed for isolation, unlike most other VMs. — A. Hersean, Feb 14 at 13:31

6 revs, 2 users 91% · Accepted Answer · 2017-02-15 22:01:36Z

I think the safest option for you would be to use Qubes OS with its built-in DisposableVMs functionality and its “Convert to Trusted PDF” tool.

What is Qubes OS?

Qubes is an operating system built entirely around virtual machines. You can think of it as having several isolated ‘computers’ inside your own. That lets you compartmentalize your digital life into different domains: one ‘computer’ where you only do work-related stuff, another that is offline and stores your password database and your PGP keys, and another dedicated to untrusted browsing... The possibilities are countless; the only real limit is your RAM, that is, how many ‘computers’ can be loaded at once.

To ensure that all these ‘computers’ are properly isolated from each other, and that they can't break out to your host (called ‘dom0’, for domain 0) and thereby take control of your whole machine, Qubes uses the Xen hypervisor. This is the same piece of software that many major hosting providers, such as Amazon EC2, IBM and Linode, rely on to isolate websites and services from each other.

Another cool thing is that each of your ‘computers’ has its own color, which is reflected in its windows' borders. So you can choose red for the untrusted ‘computer’ and blue for your work ‘computer’ (see for example the picture below). In practice this makes it really easy to see which domain you're working in.

So let's say some nasty malware gets into your untrusted virtual machine. It can't break out and infect other virtual machines that may contain sensitive information unless it has an exploit for a vulnerability in Xen that lets it break into dom0, which is very rare. That significantly raises the security bar (on a conventional system, an attacker only needs to get malware onto your machine to control everything), and it will protect you from all but the most well-resourced and sophisticated attackers.

What are DisposableVMs?

The other answer mentioned that you can use a burner laptop. A Disposable Virtual Machine is kind of the same except that you're not bound by physical constraints: you have infinitely many disposable VMs at your wish. All it takes to create one is a click, and after you're done the virtual machine is destroyed. Pretty cool, huh? Qubes comes with a Thunderbird extension that lets you open file attachments in DisposableVMs, so that can be pretty useful for your needs.

[Image not preserved] (Credits: Micah Lee)

What's that “Convert to Trusted PDF” you were talking about?

Let's say you found an interesting document, and that you have an offline virtual machine dedicated to storing and opening documents. Of course, you can send the document directly to that VM, but there is still a chance that it is malicious and may try, for instance, to delete all of your files (behavior you wouldn't notice in a short-lived DisposableVM). Instead, you can convert it into what's called a ‘Trusted PDF’: send the file to a different VM, open the file manager there, navigate to the file's directory, right-click and choose “Convert to Trusted PDF”, and then send the converted file to the VM where you collect your documents.

But what exactly does it do? The “Convert to Trusted PDF” tool creates a new DisposableVM, puts the file there, and transforms it with a parser (running inside the DisposableVM) that basically keeps the RGB value of each pixel and discards everything else. It's a bit like opening the PDF in an isolated environment and then ‘screenshotting’ it, if you will. The file obviously gets much bigger: if I recall correctly, when I tested it, a 10 MB PDF turned into a 400 MB one. You can find many more details in this blog post by security researcher and Qubes OS creator Joanna Rutkowska.
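
To make the "keep only the RGB values" idea concrete, here is a minimal Python sketch. It is not the actual Qubes converter: the function name `pixels_to_ppm`, the `MAX_SIDE` limit and the choice of PPM as the output format are all assumptions made for illustration. The point is that the trusted side accepts nothing but a width, a height and a flat run of colour bytes, and rejects anything whose size doesn't match.

```python
# Hypothetical upper bound on page size in pixels, chosen for this sketch only.
MAX_SIDE = 10_000


def pixels_to_ppm(width: int, height: int, rgb: bytes) -> bytes:
    """Build a binary PPM (P6) image from raw RGB bytes and nothing else."""
    if not (0 < width <= MAX_SIDE and 0 < height <= MAX_SIDE):
        raise ValueError("implausible image dimensions")
    expected = width * height * 3  # three bytes (R, G, B) per pixel
    if len(rgb) != expected:
        raise ValueError("pixel data does not match the declared size")
    # The header is generated locally; nothing from the source file is copied
    # except the colour values themselves.
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(rgb)


if __name__ == "__main__":
    # A 2x1 "page": one red pixel followed by one blue pixel.
    page = pixels_to_ppm(2, 1, bytes([255, 0, 0, 0, 0, 255]))
    print(page[:11])
```

This also hints at why the output grows so much: an A4 page rendered at 300 dpi is about 2480 × 3508 pixels, which is roughly 26 MB of raw RGB data before any compression. The resolution the real tool uses may well differ.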

Note that by default VMs can still access the internet; one of the uses of malicious PDFs is to expose the IP address of the reader. — Tgr, 20 hours ago
@Tgr I confirm. "By default, if a DisposableVM is created (by Open in DispVM or Run in DispVM) from within a VM that is not connected to the Tor gateway, the new DisposableVM may route its traffic over clearnet. This is because DisposableVMs inherit their NetVMs from the calling VM (or the calling VM's dispvm_netvm setting if different). The dispvm_netvm setting can be configured per VM by: dom0 → Qubes VM Manager → VM Settings → Advanced → NetVM for DispVM." This shouldn't be an issue by default if one configured their email VM on whonix-ws which routes all traffic through Tor. — user139336, 6 hours ago
@Tgr But again this is Security StackExchange not Qubes StackExchange so I wanted to make my answer more focused on the security side :) — user139336, 6 hours ago
E.g. for an investigative journalist collecting information in an authoritarian regime, that regime being able to track down their physical location by mailing them a specially crafted PDF is a pretty essential security vulnerability. I guess Tor protects you against that though. — Tgr, 5 hours ago
@Tgr Yeah, and as I said one can set the dispvm_netvm property so that the DisposableVM be offline with no network. This should be better than with Tor. — user139336, 4 hours ago

Matthew · Answer 2 · 2017-02-14 10:14:41Z

Safest would probably be a burner device. Grab a cheap laptop, and a mobile internet dongle, use it to download the documents, and manually copy across any contents to your main computer (literally retyping would be safest, if you're particularly worried). Since it's not on your network, it shouldn't be able to cause problems even if it got infected, and you'd be able to wipe it or just bin it if you have any particularly evil malware sent to you.

If you need actual contents from the files (e.g. embedded images), one option would be to install a PDF print driver on your burner device, and to print the incoming PDF files using it - this will generate PDF output, but, in theory, just the visual components. Printers don't tend to need script elements, hence they can be safely dropped. Bear in mind that some PDF printer drivers spot when you provide a PDF, and just pass it through unmodified - test before relying on it! Once you've got a clean PDF, email it back to yourself, and check with a virus scanner on your main machine before opening. Note that this doesn't completely eliminate the possibility of malware getting through, but should minimise the chances.
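
The "test before relying on it" step can be partly automated. The following Python sketch is only a rough heuristic, and the names `is_passthrough` and `find_suspicious_tokens` are made up for this example. It checks whether the "printed" PDF is byte-for-byte the same as the input, and whether the output still contains name tokens that are commonly associated with active content.

```python
import hashlib
from typing import List

# Name tokens often associated with scripts, auto-run actions and attachments.
# This list is an assumption for the sketch, not an exhaustive catalogue.
SUSPICIOUS_TOKENS = (
    b"/JavaScript",
    b"/JS",
    b"/OpenAction",
    b"/AA",
    b"/Launch",
    b"/EmbeddedFile",
)


def is_passthrough(original: bytes, printed: bytes) -> bool:
    """True if the print driver handed back exactly the bytes it was given."""
    return hashlib.sha256(original).digest() == hashlib.sha256(printed).digest()


def find_suspicious_tokens(pdf_bytes: bytes) -> List[str]:
    """Return the tokens from SUSPICIOUS_TOKENS that occur in the raw bytes."""
    return [t.decode("ascii") for t in SUSPICIOUS_TOKENS if t in pdf_bytes]


if __name__ == "__main__":
    sample = b"%PDF-1.4\n1 0 obj << /OpenAction 2 0 R >> endobj"
    print(is_passthrough(sample, sample), find_suspicious_tokens(sample))
```

An empty result proves nothing: names in a PDF can be written with hex escapes, and objects can sit inside compressed streams where a plain byte search won't see them. Treat a hit as a reason to distrust the output, not a miss as a reason to trust it.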

I'd also nuke the burner device regularly even if nothing evil has been noticed. If you're under targeted attack, you probably wouldn't notice it anyway. It also puts the user into the mindset that nothing personal should be stored there. – eis Feb 14 at 11:14

What about converting to an image format and then sending it to yourself? That seems a lot more practical than retyping it. – HopefullyHelpful Feb 14 at 12:59

Depends what you need to transfer - for a few quotes, typing is easy enough. For pages, yeah, OCR is the way to go – Matthew Feb 14 at 13:15

Rather than a PDF printer, you could use Ghostscript (for example) to write an image file. Unlike an arbitrary printer driver, gs will do as it's told. – Chris H Feb 14 at 13:15

@Matthew I've created pdfs that gs can't handle but the use of graphics was close to a pathological case. Adobe choked on that file as well. – Chris H Feb 14 at 13:41

Taegost · Answer 3 · 2017-02-14 21:33:51Z

So, I try to keep these concerns in the "land of reasonable". With every security issue there is a balance of secure vs. practical. For example, you could buy a laptop, read one PDF loaded from the webmail side of your email provider, retype any content you need on a "main computer", then destroy the laptop and start all over again with a new one. That would be pretty secure. Also costly, and a giant pain.

So back to a "reasonable" approach.

First, use Linux and an up-to-date PDF reader. By doing so you have really reduced your exposure. There are not as many viruses written for Linux as there are for Windows. That alone will protect you quite a bit. The viruses that do work on Linux are more complicated to implement, again reducing your exposure.

Next, use a virtual machine that supports snapshotting. The idea is that you set up your Linux OS inside a virtual machine host (like VirtualBox), get it all configured, and then "snapshot" the state.

You can then do all your "risky" work inside the virtual machine. Using isolation options, I don't know of any virus that can "escape" the virtual machine and get to the host machine (doesn't mean they're not out there, just means it's more rare, and more complicated for the attacker).

At the end of the day, or any time during the day when you think you have picked up a virus, you "revert" the machine to the previous snapshot. All the changes and data that "happened" after your snapshot are undone, including any work, viruses, etc.
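
As a rough illustration of the snapshot-and-revert routine with VirtualBox, here is a small Python sketch that builds the relevant `VBoxManage` command lines. The VM name `pdf-sandbox` and the snapshot name `clean` are placeholders, and the helper names are invented for this example. By default it only prints the commands (a dry run) so you can check them before running anything.

```python
import subprocess
from typing import List


def take_snapshot_cmd(vm: str, name: str) -> List[str]:
    """Command that records the current state of the VM as a named snapshot."""
    return ["VBoxManage", "snapshot", vm, "take", name]


def revert_cmds(vm: str, name: str) -> List[List[str]]:
    """Commands that power the VM off, restore the snapshot and start it again."""
    return [
        ["VBoxManage", "controlvm", vm, "poweroff"],
        ["VBoxManage", "snapshot", vm, "restore", name],
        ["VBoxManage", "startvm", vm],
    ]


def run_all(cmds: List[List[str]], dry_run: bool = True) -> None:
    """Print each command, or execute it when dry_run is False."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all([take_snapshot_cmd("pdf-sandbox", "clean")])
    run_all(revert_cmds("pdf-sandbox", "clean"))
```

Note that the power-off step will fail if the VM is not running, so a real script would need to handle that case.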

During the day, you can open a PDF, scan it with ClamAV (or the like), copy and paste what you need, or do whatever else you need to do with the PDF files, so long as your virtual machine exists in isolation. That means that you don't give the virtual machine access to the host machine. You use something like email to transfer the files. Maybe FTP between the host and the virtual machine. Something, but not direct integration. Not Dropbox either. Something where, if you're going to transfer a file, you only transfer that one file after you're pretty sure it's safe. If you're using a Linux host and a Linux guest, then scp is a great choice.

This gives you a "pretty secure", disposable environment to check your questionable PDFs in, with the ability to "undo" damage that may happen, without having to really change much in your workflow.

Virtual machine hosts and guests can be almost any OS including Windows. Keep in mind that if you have a Linux guest and a Windows host the Linux virtual machine may not even be susceptible to a virus that is in the PDF that a Windows machine will be susceptible to. Scanning with an anti-virus scanner is important, no matter the OS combo in use.

My only concern is that an investigative journalist needs to worry not just about run-of-the-mill viruses (for which the bar is "just be a less attractive target than your neighbour"), but may be actively targeted. In that case, if the attacker knows he is using Linux, he will be sent PDFs targeting Linux vulnerabilities. I think that not using Acrobat is still worthwhile, but it doesn't buy quite as much as you might hope. — Martin Bonner, 2 days ago
Which is why I say use the VM AND Linux AND up to date reader. Each one is only a part of the solution and not the entire solution. — coteyr, 2 days ago
It was: "By doing so you have really reduced your exposure" that I was commenting on. My point is that for an ordinary user trying to browse porn safely, that is probably true; for an investigative journalist, less so. — Martin Bonner, 2 days ago

rackandboneman · Answer 4 · 2017-02-14 13:43:59Z

Converting all the PDFs to some more "passive" format, maybe TIFF or PostScript, could be done in batch, in a restricted account either on the local machine or on some Linux box/VM. An exploit/malware being carried along into a different file format is very unlikely.

Files that are purely malicious will not even render that way; any exploit targeted at popular PDF viewers probably will not work with scripted conversion tools (which will mostly be based on the Ghostscript engine); and the restricted account will keep a successful exploit from doing much damage.
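
For a scripted conversion along these lines, a Ghostscript invocation that rasterises each page to an image might look like the sketch below. The helper name `gs_rasterize_cmd`, the file names and the 150 dpi default are assumptions for illustration; the switches themselves (`-dSAFER`, `-dBATCH`, `-dNOPAUSE`, `-sDEVICE`, `-r`, `-sOutputFile`) are standard Ghostscript options. The sketch uses PNG output; Ghostscript also has TIFF devices if you prefer that format.

```python
import subprocess
from typing import List


def gs_rasterize_cmd(pdf_path: str, out_pattern: str, dpi: int = 150) -> List[str]:
    """Build a Ghostscript command that renders every page to a 24-bit PNG."""
    if pdf_path.startswith("-"):
        # Avoid a file name being interpreted as a command-line switch.
        raise ValueError("input path must not start with '-'")
    return [
        "gs",
        "-dSAFER",      # restrict file access from within the interpreter
        "-dBATCH",      # exit when the input has been processed
        "-dNOPAUSE",    # do not wait for a key press between pages
        "-sDEVICE=png16m",
        f"-r{dpi}",
        f"-sOutputFile={out_pattern}",  # e.g. page-%03d.png, one file per page
        pdf_path,
    ]


if __name__ == "__main__":
    cmd = gs_rasterize_cmd("incoming.pdf", "page-%03d.png")
    print(" ".join(cmd))
    # To actually convert, run this inside the restricted account or VM:
    # subprocess.run(cmd, check=True)
```

The conversion itself still parses the untrusted file, so it belongs in the restricted account or VM rather than on your main machine.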

A normal user account on an up-to-date Linux machine is very difficult to "break out of". Do make sure that this machine doesn't have unregulated internet access, though, since network access is the hardest to control.

If disclosure of the contents of the valid PDFs would have dire consequences, make sure only one PDF at a time is accessible to the account running the interpreter (e.g. by copying the file into a staging location from yet another user account, running the interpreter via su/sudo (not sudo to root!), then taking the result file away). Rinse, repeat.
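
The one-file-at-a-time staging loop could be sketched like this in Python. It does not switch user accounts itself: `convert` is a hypothetical callback standing in for whatever runs the converter under the restricted account (for example via sudo to a non-root user), and the function name and the `.out` suffix are likewise assumptions.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, List


def convert_one_at_a_time(
    inbox: Path, outbox: Path, convert: Callable[[Path, Path], None]
) -> List[Path]:
    """Stage each PDF alone, convert it, and move the result out again."""
    outbox.mkdir(parents=True, exist_ok=True)
    results = []
    for pdf in sorted(inbox.glob("*.pdf")):
        # A fresh staging directory per file, so the converter never sees
        # more than one document at once.
        with tempfile.TemporaryDirectory() as staging:
            staged = Path(staging) / "input.pdf"
            shutil.copyfile(pdf, staged)
            produced = Path(staging) / "output"
            convert(staged, produced)
            final = outbox / (pdf.stem + ".out")
            shutil.move(str(produced), str(final))
            results.append(final)
        # The staging directory and the staged copy are deleted here.
    return results
```

In a real setup the staging directory would need permissions that let the restricted account read it while keeping the inbox itself out of its reach.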

Oh, and: Keep the original files away from any (especially Windows) PCs that are set up to do previews of files in Explorer, in email clients or similar frontends!

Postscript isn't passive. – Ben Voigt Feb 14 at 15:30

If it was a PostScript document directly received from an outside source - yes. Something written by your locally installed converter that rendered a PDF and wrote the result as a PS file is very unlikely to retain active malicious content. So, more passive as I said, not absolutely passive (I am well aware that PostScript is actually a Turing-complete programming language). – rackandboneman Feb 14 at 17:56

Because arbitrary postscript can be embedded in a PDF, it seems unwise to rely on a PDF->PS converter outputting only "clean" newly created postscript, as it may include some of the postscript code present at the input. Even if it passes through a "Postscript Creator" virtual printer. – Ben Voigt Feb 14 at 19:18

To extend the comment by @Ben, if you're heading down this route, make sure you write (or thoroughly inspect) the converter before using it in this way. You might want to choose SVG instead (but beware that some SVG processors have JavaScript interpreters, so you might not be much better off). – Toby Speight 2 days ago

dotancohen · Answer 5 · 2017-02-16 10:50:19Z

Depending on your threat model, even the "burner device" or virtual machine approach might not be sufficient. If an attacker is looking to identify your location, or even if a spammer wants to validate that your email address is active, then having the PDF phone home after being opened will expose you. Crafty PDFs might even contain worms to infect other machines, though I've never seen that in the wild.

Thus, after downloading the PDF, you may need to disconnect the device from the network before opening it.

Henry's Cat · Answer 6 · 2017-02-16 16:52:35Z

Just get a Chromebook. Problem solved.

I believe this to be the most pleasant solution as you get a new, cheap and affordable toy that works better than your other computer.

Chrome OS considers every program to be 'hostile' so everything is totally sandboxed.

You can run Android apps on recent Chromebooks too, plus you get the PDF viewer built in so there is nothing to install.

You can cut and paste from your PDF files and into some Google docs thing to break stuff out to make it accessible on your 'proper computer'. (The one that can get virus and malware problems).

It is all about using the right tool for the job. Chromebooks are built on a modern OS and do not have to support legacy cruft from 1999 (as other operating systems do).

"Chromebooks are built on a modern OS" - Chromebooks are built on Linux. – Nathan Osman 20 hours ago

This answer looks like an advertisement. It is unacceptable, as the OS is locked down and the user wants confidentiality. You can't be sure whether the OS is sending the PDFs somewhere. Qubes OS is probably the best solution. – Artyom 6 hours ago

HopefullyHelpful · Answer 7 · 2017-02-15 21:22:13Z

I would recommend an "isolated" device used only for downloading and opening the PDFs, i.e. one not connected to the rest of your network.

Then print it (paper can't transmit malware).

After that you could scan it, and then you have a copy in image format. In this case the printer should also be isolated and connected only to the contaminated device. The scanner can be connected to the rest of your network.

If you want a faster workflow, you could just convert the PDFs to images and then send them to yourself if you need to view them somewhere else, though you have to guarantee that the mail is not infected in some way: no links/images/JavaScript injected, and the file format isn't executable or PDF.

This means that the receiving end needs to view only plain text, not HTML or JavaScript.

I'm still worried about the possibility of the file contaminating my computer in the download -> copy to USB process... – Tom the journalist Feb 14 at 13:35

@Tomthejournalist: Where in this answer does "USB" even appear?! – user21820 Feb 14 at 13:39

It's implicitly there when you assume that the device used to open the PDFs is isolated, because the PDF needs to get there first. Either that or the image file output of the process. – HopefullyHelpful Feb 14 at 13:40

@user21820 how do you move a file to a device isolated from the rest of your network? I suppose you could use a floppy disk if you really wanted to. – Captain Man Feb 14 at 15:54

@CaptainMan: It clearly says "not connected to the rest of your network", not "not connected to the internet"... – user21820 2 days ago

AntivirusExpert · Answer 8 · 2017-02-14 13:55:11Z

Open the files in a sandbox.

If you're infected, the infection should be contained and you can reset this virtual environment.

Edit:

A sandbox is a virtual environment which you can create and remove at will, like a virtual desktop where you can do anything and then reset it back to the initial stage.

Infected? Simply reset.

Avast antivirus (paid version) for example has a sandbox solution which allows you to right click on the file and open it in a sandbox.

There are tons of solutions like this; another example is Sandboxie.

Edit 2:

Removed the MD5 suggestion, it's inferior to the sandbox solution.

I mentioned virustotal in the post and why it's not convenient for this type of work. Your second proposal is interesting, can you please add more details to it? – Tom the journalist Feb 14 at 10:24

Regarding VT - when you convert the file to a hash, it's a one way conversion and VT doesn't receive your files. It's not bulletproof from malware (because not all hashes are known to VT), but it's bulletproof from leaking. Regarding the sandbox solution: a sandbox is a virtual environment which you can create and remove at will, like a virtual desktop where you can do anything and then reset it back to the initial stage. Infected? Just reset. Avast antivirus (paid version) for example has a sandbox solution which allows you to right click on the file and open it in a sandbox. – AntivirusExpert Feb 14 at 10:29

But the PDF would be unique; it most probably won't be in the VirusTotal database. Thanks for your clarifications regarding the sandbox, I will look into that. – Tom the journalist Feb 14 at 10:32

That's why I say it's not bulletproof, but then again - if someone is attempting to infect you with malware they might not go through the effort of creating a unique file for you. Changing the file name won't result in a different MD5 hash, only changing the file's content. – AntivirusExpert Feb 14 at 10:35

@AntivirusExpert but pdfs are perfectly capable of having content change randomly on the fly to defeat this sort of scanning (as has been done for years or even decades in .exe etc. malware). I've just tested and you can replace header data using something as trivial as sed; this would change the MD5sum. You would have to assume that a malware creator could and would do that. Not to mention that the source might make it unique in this threat model – Chris H Feb 14 at 13:13
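
The hash discussion in this thread can be checked locally with a few lines of Python. This is only a sketch, and the helper name `file_digests` is made up for the example. It shows that the digest depends solely on the file's bytes, so a lookup by hash never sends the document anywhere, but also that changing a single byte yields a completely different digest, which is why a unique or deliberately altered PDF will not match any known sample.

```python
import hashlib
from typing import Dict


def file_digests(data: bytes) -> Dict[str, str]:
    """Return the MD5 and SHA-256 hex digests of the given bytes."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


if __name__ == "__main__":
    original = b"%PDF-1.4 example content"
    tweaked = original + b" "  # one extra byte, e.g. padding added by an attacker

    # The file name plays no part: only the content is hashed.
    print(file_digests(original)["md5"])
    print(file_digests(tweaked)["md5"])
    print(file_digests(original) == file_digests(tweaked))
```

In practice you would read the bytes with something like `Path(name).read_bytes()` and search for the digest only, keeping the file itself on your machine.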
