Creating a sandbox—a safe area in which to run untrusted code—is a difficult problem. The successful sandbox implementations tend to come with completely new languages (e.g. Java) that are specifically designed to support that functionality. Trying to sandbox C code is a much more difficult task, but one that the Google Chrome web browser team has been working on.
The basic idea is to restrict the WebKit-based renderer—along with the various image and other format libraries that are linked to it—so that browser-based vulnerabilities are unable to affect the system as a whole. A successful sandbox for the browser would eliminate a whole class of problems that plague Firefox and other browsers that require frequent, critical security updates. Essentially, the browser would protect users from bugs in the rendering of maliciously-crafted web pages, so that they could not lead to system or user data compromise.
The Chrome browser, and its free software counterpart, Chromium, are designed around the idea of separate processes for each tab, both for robustness and security. A misbehaving web page can only affect the process controlling that particular tab, so it won't bring the entire browser down if it causes the process to crash. In addition, these processes are considered to be "untrusted", in that they could have been compromised by some web page exploiting a bug in the renderer. The sandbox scheme works by severely restricting the actions that untrusted processes can take directly.
At some level, Linux already has a boundary that isolates programs from the underlying system: system calls. A program that does no system calls should not be able to affect anything else, at least permanently. But it is a trivial program indeed that does not need to call on some system services. A largely unknown kernel feature, seccomp, allows processes to call a very small subset of system calls—just read(), write(), sigreturn(), and exit()—aborting a process that attempts to call any other. That is the starting point for the Chromium sandbox.
But, there are other system calls that the browser might need to make. For one thing, memory allocation might require the brk() system call. Also, the renderer needs to be able to share memory with the X server for drawing. And so on. Any additional system calls, beyond the four that seccomp allows, have to be handled differently.
A proposed change to seccomp that would allow finer-grained control over which system calls were allowed didn't get very far. In any case, that wasn't a near-term solution, so Markus Gutschke of the Chrome team went in another direction. By splitting the renderer process into trusted and untrusted threads, some system calls could be allowed for the untrusted thread by making the equivalent of a remote procedure call (RPC) to the trusted thread. The trusted thread could then verify that the system call, and its arguments, were reasonable and, if so, perform the requested action.
Chrome team member Adam Langley describes it this way:
The trusted thread can receive requests to make system calls from the untrusted thread over a socket pair, validate the system call number and perform them on its behalf. We can stop the untrusted thread from breaking out by only using CPU registers and by refusing to let the untrusted code manipulate the VM in unsafe ways with mmap, mprotect etc.
There are still problems with that approach, however. For one thing, the renderer code is large, with many different system calls scattered throughout. Turning each of those into an RPC is possible, but the resulting changes would then have to be maintained by the Chromium developers going forward. The upstream projects (WebKit, et al.) would not be terribly interested in those changes, so each new revision from upstream would need to be patched and then checked for new system calls.
Another approach might be to use LD_PRELOAD trickery to intercept the calls in glibc. That has its own set of problems as Langley points out: "we could try and intercept at dynamic linking time, assuming that all the system calls are via glibc. Even if that were true, glibc's functions make system calls directly, so we would have to patch at the level of functions like printf rather than write."
So, a method of finding and patching the system calls at runtime was devised. It uses a disassembler on the executable code, finds each system call, and turns it into an RPC to the trusted thread. Correctly parsing x86 machine code is notoriously difficult, but it doesn't have to be perfect. Because the untrusted thread runs in seccomp mode, any system call that is missed will not lead to a security breach; the kernel will abort the thread if it attempts any but the trusted four system calls.
The last piece of the puzzle is handling time-of-check-to-time-of-use race conditions. System call arguments that are passed in memory, via pointers or for system calls with too many arguments to fit in registers, can be changed by the, presumably subverted, untrusted thread between the time they are checked for validity and when they are used. To handle that, a trusted process, which is shared between all of the renderers, is created to check system calls that cannot be verified within the address space of the untrusted renderer.
The trusted process shares a few pages of memory with each trusted thread, which are read-only to the trusted thread, and read-write for the trusted process. System calls that cannot be handled by the trusted thread, either because some of the arguments live in memory, or because the verification process is too complex to be reasonably done in assembly code, are handed off to the trusted process. The arguments are copied by the trusted process into its address space, so they are immune to changes from the untrusted code.
While the current implementation is for x86 and x86-64—though there are still a few issues to be worked out with the V8 JavaScript engine on x86-64—there is a clear path to other architectures. Adapting or writing a disassembler and writing the assembly language trusted thread are the two pieces needed to support each additional architecture.
There are some potential pitfalls in this sandbox mechanism. Bugs in the implementation of the trusted pieces—either coding errors or mistakes made in determining which system calls and arguments are "safe"—could certainly lead to problems. Currently, deciding which calls to allow is done on an ad hoc basis: running the renderer, seeing which calls it makes, and deciding which are reasonable. The outcome of those decisions is then codified in syscall_table.c.
One additional, important area that is not covered by the sandbox is plugins like Flash. Restricting what plugins can do does not fit well with what users expect, which makes plugins a major vector for attack. Langley said that plugin support on Linux is relatively new, but "our experience on Windows is that, in order for Flash to do all the things that various sites expect it to be able to do, the sandbox has to be so full of holes that it's rather useless". He is currently looking at SELinux as a way to potentially restrict plugins but, for now, they are wide open.
This is a rather—some would say overly—complex scheme. It is still in the experimental stage, so changes are likely, but it does show one way to protect browser users from bugs in the HTML renderer that might lead to system or data compromise. It certainly doesn't solve all of the web's security problems, but could, over time, largely eliminate a whole class of attacks. It is definitely a project worth keeping an eye on.
[ Many thanks to Adam Langley, whose document was used as a basis for this article, and who patiently answered questions from the author. ]
Google's Chromium sandbox
Posted Aug 19, 2009 15:37 UTC (Wed) by johill (subscriber, #25196) [Link]
Also -- I first wondered why they weren't using processes to start with to get the secure/insecure boundary more defined, but once you think about it more it doesn't seem like you could then do the disasm stuff ... might be worth mentioning that :)
Either way, interesting method, and nice article!
Google's Chromium sandbox
Posted Aug 19, 2009 16:23 UTC (Wed) by jake (editor, #205) [Link]
I don't think, but don't know for sure, that it is required to have a thread to do the disassembling. I believe that is done by the untrusted thread before it handles any user input, and before it enters seccomp mode.
jake
Google's Chromium sandbox
Posted Aug 20, 2009 0:43 UTC (Thu) by cventers (subscriber, #31465) [Link]
I should have been more clear about why a thread is needed. Certain operations, memory allocation for example, cannot be done in one process on behalf of another because they don't share address space.
On the contrary, I experimented with a technique to do just that. This may not be the perfect solution for Chrome's needs, but I played around with the idea of open()ing a shared memory segment on the vfs, using ftruncate() to resize it, and then sending the fd via a UNIX-domain socket to the untrusted process and allowing it to mmap() the pages.
Now, in my case, I was using this technique to allow dynamically-grown, runtime-allocated shared memory segments between untrusted processes. There are still complications (such as the need to install a SIGBUS handler, since the untrusted process might ftruncate() the mmap()ed fd to 0, causing the trusted process to fault when it tries to access its own mapping), and perhaps the requirements for this kind of an implementation are not easy to satisfy for desktop applications. But it's Linux, and there's more than one way to do it. My implementation had the advantage of being architecture-agnostic, as well-behaved user-space code should be.
Google's Chromium sandbox
Posted Aug 20, 2009 0:58 UTC (Thu) by agl (guest, #4541) [Link]
Google's Chromium sandbox
Posted Aug 20, 2009 8:59 UTC (Thu) by mingo (subscriber, #31122) [Link]
Btw., (and I raised this on lkml too in the past - at that time the code I referred to was not upstream yet) there's a way you could further increase the restrictions (and hence, the security) of the untrusted seccomp thread: by using the C-expression filter engine that is included in the upstream kernel (right now used by ftrace; it will also be used by perfcounters).
The engine accepts an ASCII C-ish expression at runtime, such as:
"fd <= 2 && addr == 0x1234000 && len == 4096"
... and parses that into a cached list of safe predicates that the kernel will execute atomically on syscall arguments. Once parsed (by the kernel), the execution of the filter expression is very fast.
Despite it being used for tracing currently, the filter engine is generic and can be reused not just to limit trace entries of syscalls, but also to restrict execution on syscalls.
This is real, working code very close to what you need. With latest -tip you can use the filter engine on a per syscall basis, and the kernel knows about the parameter names of system calls. So on a testbox i can do this:
# cd /debug/tracing/events/syscalls/sys_enter_read
# echo "fd <= 2 && buf == 0x120000 && count == 1024" > filter
# cat filter
fd <= 2 && buf == 0x120000 && count == 1024
... and from that point on the kernel can execute that filter expression to limit trace entries that match the expression.
All you need is a small extension to seccomp to allow the installation of such expressions from user-space, by passing in the ASCII string. The filter engine can be used by unprivileged user-space as well. (but obviously the untrusted sandboxed thread should not be allowed to modify it.)
The filter engine has no deep dependence on tracing (other than being used by it currently) - it is a safe parser and atomic script execution engine that can be utilized by unprivileged tasks too and so it could be reused in seccomp and could be reused by other Linux security frameworks as well, such as selinux or netfilter.
Google's Chromium sandbox
Posted Aug 20, 2009 14:41 UTC (Thu) by paragw (guest, #45306) [Link]
How would one deal with which process can specify which other process or thread can do what syscalls with what arguments, and is the change permanent and localized w.r.t. the target thread? How does one go about safely modifying the restrictions dynamically - say the restricted thread needs to open a FD with user permission that wasn't in the originally specified restrictions list?

From what you described there seem to be some significant usability problems (need to have tracing enabled, the debug file system mounted, user-space access to the filtering mechanism, per-PID operation, etc.) that need to be addressed before it can become generally usable?
Google's Chromium sandbox
Posted Aug 20, 2009 19:33 UTC (Thu) by mingo (subscriber, #31122) [Link]
Does this approach work on a per process basis? I.e. do the restrictions apply to a particular process/thread while others are not impacted?
It's an engine - and as such it takes ASCII strings, turns them into a 'filter object' in essence which you can then attach to anything and pass in values to evaluate.
Note that there's nothing 'tracing' about that concept.
Right now we attach such filters to tracepoints - such as syscall tracepoints.
It could be attached via seccomp and to an untrusted process as well, with minimal amount of code, if there's interest to share this facility for such purposes.
Google's Chromium sandbox
Posted Aug 19, 2009 15:58 UTC (Wed) by johill (subscriber, #25196) [Link]
Why, for example, can an untrusted process look into my filesystem using getdents() without any checking?
I think that file should come with comments as to why each call is allowed, etc., because otherwise it's just a collection of arbitrary things; with that information it would at least be verifiable why/that each is needed.
Google's Chromium sandbox
Posted Aug 19, 2009 16:32 UTC (Wed) by foom (subscriber, #14868) [Link]
Why, for example, can an untrusted process look into my filesystem using getdents() without any checking?

Presumably because getdents() takes an already-open fd, and open() is sandboxed.
Qemu user space emulation
Posted Aug 19, 2009 16:07 UTC (Wed) by leonb (guest, #3054) [Link]
- L.
Qemu user space emulation
Posted Aug 19, 2009 16:19 UTC (Wed) by johill (subscriber, #25196) [Link]
VEX
Posted Aug 19, 2009 17:55 UTC (Wed) by abacus (subscriber, #49001) [Link]
VEX
Posted Aug 19, 2009 19:04 UTC (Wed) by agl (guest, #4541) [Link]
But also, we wouldn't want to transform all the code back and forth. By patching the code rather than transforming it we can reuse nearly all the .text pages and save memory.
Google's Chromium sandbox
Posted Aug 19, 2009 20:54 UTC (Wed) by kjp (subscriber, #39639) [Link]
Was there consideration of using x86 ring 1 or 2 for this purpose? Is that too architecture dependent?
Anyway... still an interesting idea. The syscall table looks refreshingly small. I noticed things like socket, connect aren't in there... I take it the network IO is still running in the trusted/main process?
Google's Chromium sandbox
Posted Aug 19, 2009 22:03 UTC (Wed) by agl (guest, #4541) [Link]
Also, you're correct that all network IO runs in the main browser process. This is actually a little unfortunate: it would be best to have a separate, sandboxed process for that but, alas, that's only a wishlist item for now.
Google's Chromium sandbox
Posted Aug 19, 2009 22:22 UTC (Wed) by ikm (subscriber, #493) [Link]
Google's Chromium sandbox
Posted Aug 19, 2009 23:36 UTC (Wed) by ncm (subscriber, #165) [Link]
Google's Chromium sandbox
Posted Aug 20, 2009 1:33 UTC (Thu) by njs (guest, #40338) [Link]
Google's Chromium sandbox
Posted Aug 20, 2009 2:40 UTC (Thu) by ncm (subscriber, #165) [Link]
Google's Chromium sandbox
Posted Oct 15, 2009 21:57 UTC (Thu) by SEJeff (subscriber, #51588) [Link]
Sandboxing made easy
Posted Aug 20, 2009 0:14 UTC (Thu) by man_ls (guest, #15091) [Link]
This is probably a stupid question, but I have to ask. Why not use read() and write() to make the untrusted part communicate with the trusted part, via a pipe? The untrusted part (a process) could decipher the HTML, and then send the result in an intermediate form to the trusted part (another process) for it to display that on the screen. Any compromise would have to generate an intermediate "poisoned" form that did something bad to the trusted part, but sending the malicious payload would be really difficult.

It does look quite complex, but the sandboxing is not trivial either.
Sandboxing made easy
Posted Aug 20, 2009 0:33 UTC (Thu) by Simetrical (guest, #53439) [Link]
Sandboxing made easy
Posted Aug 20, 2009 18:13 UTC (Thu) by man_ls (guest, #15091) [Link]
Ah, but of course -- sounds obvious once it is pointed out. Stupid dangers of memory management!
Sandboxing made easy
Posted Aug 20, 2009 16:25 UTC (Thu) by martine (guest, #59979) [Link]
This article describes the architecture used to make the HTML-decoding process both sandboxed and still powerful enough to convert HTML into images (which are then sent back to the trusted process).
Generic sandbox needed
Posted Aug 22, 2009 12:34 UTC (Sat) by Wout (guest, #8750) [Link]
If the kernel provided a flexible mechanism for an application to limit what it can do, the threat of hostile data could be reduced. A combination of user-level chroot ("This application doesn't need anything outside this directory.") and an allowed system call mask ("This application will only use these system calls, it doesn't need the rest.") should severely limit what an attacker can do.
Generic sandbox needed
Posted Sep 4, 2009 20:18 UTC (Fri) by cmccabe (guest, #60281) [Link]
I thought that this was what selinux was all about.
The basic idea behind selinux is that rather than using identity-based security, you use capability-based security.
Identity-based security works like this: I am a process started by bob, therefore I can do everything bob can do. Capability-based security works like this: bob starts a process and gives it only the capabilities it needs to do the work it's supposed to do.
So if bob runs a spell-checker program (aspell or whatever), it shouldn't have the capability to open network sockets and send messages to evilhackers.com. It's the difference between giving the application a few keys, to open the doors it needs, and giving it the whole keyring, which is what we do with traditional uid / gid based security.
It seems like what the google people are trying to do here is to reinvent the selinux concept with seccomp. I'm curious as to why. I guess selinux is difficult to set up and configure, and a lot of distributions have been slow to adopt it. Perhaps they are also trying to be cross-platform?
I'm also curious why Google is using threads rather than processes here. If you don't want to share your memory with the untrusted guy, processes are the obvious solution. As others have noted, you can always use posix shared memory if you feel the need to directly access the memory of the untrusted guy. As a bonus, you could run the untrusted processes as "nobody," and prevent them from doing a lot of nasty things -- even on a system like openBSD, where seccomp and selinux are unheard-of.
P.S.
I seem to remember that the openBSD ssh daemon was written in a similar way. There was a trusted part which ran as root, and an untrusted part which ran as a regular user.
Google's Chromium sandbox
Posted Aug 23, 2009 8:47 UTC (Sun) by oak (guest, #2786) [Link]
And btw, one can easily do a DoS with memory allocations. Just allocate a large enough amount of memory (but not so large that it would trigger the OOM killer) and then constantly write over it. The device is frozen, swapping, until the process is killed.
As to LD_PRELOAD and ptrace(), the former doesn't catch syscalls done directly in ASM, and AFAIK ptrace is racy (if I remember correctly, this was mentioned in the discussions about utrace).
Regarding things like Flash: until that can be secured, this doesn't really make the browser any safer for normal users. Most of the content on the web that non-technical people use and are interested in uses Flash in some way, especially for media delivery. What's the point of securing a mouse hole if the barn doors are wide open?
Google's Chromium sandbox
Posted Aug 23, 2009 14:49 UTC (Sun) by i3839 (guest, #31386) [Link]
For its design see http://www.cs.vu.nl/~guido/publications/ps/secrypt07.pdf
The rewritten version does some things differently and doesn't yet support all features of the original one. The code isn't released yet, but we plan to release it under a BSD-like license. If interested email Guido or me ([email protected]).
Google's Chromium sandbox
Posted Aug 29, 2009 5:20 UTC (Sat) by gmatht (subscriber, #58961) [Link]
However, I am "interested" in packaging this for Ubuntu. I really don't have time now, but I may drop you an email in a few months. Having an easy-to-use sandbox tool would be very nice.
Google's Chromium sandbox
Posted Oct 12, 2009 21:01 UTC (Mon) by cwitty (subscriber, #4600) [Link]
"Forbidden
You don't have permission to access /~guido/publications/ps/secrypt07.pdf on this server."
Google's Chromium sandbox
Posted Oct 21, 2009 10:36 UTC (Wed) by i3839 (guest, #31386) [Link]
Copyright © 2009, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds