13.2 Restricted Execution

Python code executed dynamically normally suffers no special restrictions. Python's general philosophy is to give the programmer tools and mechanisms that make it easy to write good, safe code, and trust the programmer to use them appropriately. Sometimes, however, trust might not be warranted. When code to execute dynamically comes from an untrusted source, the code itself is untrusted. In such cases it's important to selectively restrict the execution environment so that such code cannot accidentally or maliciously inflict damage. If you never need to execute untrusted code, you can skip this section. However, Python makes it easy to impose appropriate restrictions on untrusted code if you ever do need to execute it.

When the __builtins__ item in the global namespace isn't the standard __builtin__ module (or that module's dictionary), Python knows the code being run is restricted. Restricted code executes in a sandbox environment prepared beforehand by the trusted code that requests its execution. Standard modules rexec and Bastion help you prepare an appropriate sandbox. To ensure that restricted code cannot escape the sandbox, a few crucial internals (e.g., the __dict__ attributes of modules, classes, and instances) are not directly available to restricted code.
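The rexec and Bastion modules no longer exist in Python 3, and Python 3 has no restricted-execution mode. However, the __builtins__ entry this paragraph describes still decides which built-in names executed code can see. The following minimal sketch assumes a hypothetical whitelist, SAFE_NAMES, and a hypothetical helper, run_with_builtins; neither comes from the standard library, and the sketch only hides names.

```python
import builtins

# Hypothetical whitelist: only these built-in names are visible to the executed code.
SAFE_NAMES = ("len", "range", "sum", "print")
SAFE_BUILTINS = {name: getattr(builtins, name) for name in SAFE_NAMES}

def run_with_builtins(source, allowed=SAFE_BUILTINS):
    """Execute source with a replacement __builtins__ mapping; return its globals."""
    env = {"__builtins__": dict(allowed)}  # a copy, so the code cannot alter the whitelist
    exec(source, env)
    return env

env = run_with_builtins("total = sum(range(len('abcd')))")
print(env["total"])  # 6

try:
    run_with_builtins("f = open('/etc/passwd')")
except NameError as exc:
    print("blocked:", exc)  # open is simply not defined for the executed code
```

Code run this way can still reach powerful objects through attribute chains such as `().__class__`, so treat this as an illustration of the mechanism, not as a sandbox.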

There is no special protection against restricted code raising exceptions. On the contrary, Python diagnoses any attempt by restricted code to violate the sandbox restrictions by raising an exception. Therefore, you should generally run restricted code in the try clause of a try/except statement, as covered in Chapter 6. Make sure you catch all exceptions and handle them appropriately if your program needs to keep running in such cases.
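A minimal Python 3 sketch of that pattern, assuming a hypothetical helper, run_untrusted, that reports failures instead of propagating them. It catches Exception, which covers sandbox violations and ordinary bugs alike; widening it to BaseException would also trap SystemExit and KeyboardInterrupt, which the host program usually wants to see.

```python
def run_untrusted(source, env=None):
    """Run untrusted source; return (True, namespace) or (False, exception)."""
    # Hypothetical helper: an empty __builtins__ mapping hides every built-in name.
    namespace = {"__builtins__": {}} if env is None else env
    try:
        exec(source, namespace)
    except Exception as exc:  # violations, bugs, and syntax errors all arrive here
        return False, exc
    return True, namespace

ok, result = run_untrusted("x = 1 / 0")
print(ok, type(result).__name__)  # False ZeroDivisionError
ok, result = run_untrusted("y = 2 + 3")
print(ok, result["y"])            # True 5
```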

There is no built-in protection against untrusted code attempting to inflict damage by consuming large amounts of memory or time (so-called denial-of-service attacks). If you need to guard against such attacks, you can run untrusted code in a separate process. The separate process uses the mechanisms described in this section to restrict the untrusted code's execution, while the main process monitors the separate one and terminates it if and when resource consumption becomes excessive. Processes are covered in Chapter 14. Resource monitoring is currently supported by the standard Python library only on Unix-like platforms (through the platform-specific module resource), and this book covers only cross-platform Python.
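As a sketch of the monitoring half of that design, the hypothetical helper run_in_child below starts a second Python interpreter with the standard subprocess module and relies on the timeout argument of subprocess.run, which kills the child once the limit expires. It limits elapsed time only: it applies no restrictions inside the child and does not cap memory.

```python
import subprocess
import sys

def run_in_child(source, timeout=2.0):
    """Run source in a separate Python process; return its stdout, or None on timeout."""
    try:
        completed = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed if it runs longer than this
        )
    except subprocess.TimeoutExpired:
        return None
    return completed.stdout

print(run_in_child("print(6 * 7)").strip())  # 42
print(run_in_child("while True: pass", 1.0))  # None
```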

As a final note, be aware that the restricted-execution mechanisms have known, exploitable security weaknesses, even in the most recent versions of Python. Although restricted execution is better than nothing, at the time of this writing there is no known way to execute untrusted code that is suitable for security-critical situations.

13.2.1 The rexec Module

The rexec module supplies the RExec class, which you can instantiate to prepare a typical restricted-execution sandbox environment in which to run untrusted code.

RExec

class RExec(hooks=None,verbose=False)

Returns an instance of the RExec class, which corresponds to a new restricted-execution environment, also known as a sandbox. hooks, if not None, lets you exert fine-grained control over import statements executed in the sandbox. This is advanced and rarely used functionality, and I do not cover it further in this book. verbose, if true, causes additional debugging output to be sent to standard output for many kinds of operations in the sandbox.

13.2.1.1 Methods

An instance r of RExec provides the following methods. Most of them also have a variant whose name starts with s_ rather than r_, as the headings below indicate. An r_ method and its s_ variant are equivalent, except that the s_ variant also ensures that untrusted code can call only safe methods on the standard file objects sys.stdin, sys.stdout, and sys.stderr. This is needed only in the unusual case in which you have replaced the standard file objects with file-like objects that also expose additional, unsafe methods or attributes.

r_add_module

r.r_add_module(modname)

Adds and returns a new empty module if no module yet corresponds to name modname in the sandbox. If the sandbox already contains a module object that corresponds to name modname, r_add_module returns that module object.

r_eval, s_eval

r.r_eval(expr)
r.s_eval(expr)

r_eval evaluates expr, which must be a string containing an expression or a code object, in the restricted environment and returns the expression's result.

r_exec, s_exec

r.r_exec(code)
r.s_exec(code)

r_exec executes code, which must be a string containing Python statements or a code object, in the restricted environment.
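RExec is gone from Python 3, but its r_eval/r_exec pair can be approximated to show how both methods share one namespace and accept either source strings or code objects. MiniSandbox is a hypothetical class written for this illustration, and it restricts only the visible built-in names.

```python
import builtins

class MiniSandbox:
    """Hypothetical stand-in for RExec's r_eval/r_exec; NOT a security boundary."""

    def __init__(self, allowed=("len", "abs", "min", "max")):
        safe = {name: getattr(builtins, name) for name in allowed}
        # One persistent namespace, playing the role of the sandbox's __main__ module.
        self.namespace = {"__builtins__": safe}

    def r_eval(self, expr):
        # expr may be a source string or a code object compiled in 'eval' mode.
        return eval(expr, self.namespace)

    def r_exec(self, code):
        # code may be a source string or a code object compiled in 'exec' mode.
        exec(code, self.namespace)

box = MiniSandbox()
box.r_exec("items = [3, -7, 5]")
print(box.r_eval("max(abs(x) for x in items)"))  # 7
compiled = compile("len(items)", "<untrusted>", "eval")
print(box.r_eval(compiled))                       # 3
```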

r_execfile, s_execfile

r.r_execfile(filename)
r.s_execfile(filename)

r_execfile executes the file identified by filename, which must contain Python code, in the restricted environment.

r_import, s_import

r.r_import(modname[,globals[,locals[,fromlist]]])
r.s_import(modname[,globals[,locals[,fromlist]]])

Imports the module modname into the restricted environment. The parameters are the same as those of built-in function __import__, covered in Chapter 7. r_import raises ImportError if the module is considered unsafe. A subclass of RExec may override r_import to change the set of modules available to import statements in untrusted code and/or to otherwise change import functionality for the sandbox.
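A whitelisting import can be approximated in Python 3 by placing a replacement __import__ in the executed code's built-ins, since the import statement looks that name up there. OK_MODULES and guarded_import are hypothetical names chosen for this sketch.

```python
import builtins

OK_MODULES = {"math", "string"}  # hypothetical whitelist, in the spirit of ok_builtin_modules

def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Replacement __import__ that refuses relative imports and unlisted modules."""
    if level != 0 or name.split(".")[0] not in OK_MODULES:
        raise ImportError("import of %r is not allowed in the sandbox" % name)
    return builtins.__import__(name, globals, locals, fromlist, level)

env = {"__builtins__": {"__import__": guarded_import}}
exec("import math\nroot = math.sqrt(16.0)", env)
print(env["root"])  # 4.0
try:
    exec("import os", {"__builtins__": {"__import__": guarded_import}})
except ImportError as exc:
    print(exc)      # import of 'os' is not allowed in the sandbox
```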

r_open

r.r_open(filename[,mode[,bufsize]])

RExec calls r_open when restricted code calls built-in open. The parameters are the same as those of built-in open, covered in Chapter 10. The version of r_open in class RExec opens any file for reading, but none for writing or appending. A subclass may ease or tighten these restrictions.
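A read-only replacement for open in the spirit of RExec's r_open can be sketched in Python 3 as follows. read_only_open is a hypothetical function; it refuses any mode containing w, a, x, or +, and raises PermissionError (an OSError subclass) rather than reproducing rexec's exact exception.

```python
import builtins
import os
import tempfile

def read_only_open(file, mode="r", *args, **kwargs):
    """Stand-in for a sandbox's open: reading is allowed, writing is refused."""
    if any(flag in mode for flag in "wax+"):
        raise PermissionError("can't open %r with mode %r in the sandbox" % (file, mode))
    return builtins.open(file, mode, *args, **kwargs)

# Trusted code prepares a file; the sandboxed code sees only read_only_open as open.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w") as f:
    f.write("hello")
env = {"__builtins__": {"open": read_only_open}, "path": path}
exec("with open(path) as f: text = f.read()", env)
print(env["text"])  # hello
```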

r_reload, s_reload

r.r_reload(module)
r.s_reload(module)

Reloads the module object module in the restricted-execution environment, similarly to built-in function reload, covered in Chapter 7.

r_unload, s_unload

r.r_unload(module)
r.s_unload(module)

Unloads the module object module from the restricted-execution environment (i.e., removes it from the dictionary sys.modules as seen by untrusted code executing in the sandbox).

13.2.1.2 Attributes

When RExec's defaults don't fully correspond to your application's specific needs, you can easily customize the restricted-execution sandbox. Class RExec has several attributes that are tuples of strings. The items of these tuples are names of functions, modules, or directories to be specifically allowed or disallowed, as follows:

nok_builtin_names: Built-in functions not to be supplied in the sandbox
ok_builtin_modules: Built-in modules that the sandbox can import
ok_path: Used as sys.path for the sandbox's import statements
ok_posix_names: Attributes of os that the sandbox may import
ok_sys_names: Attributes of sys that the sandbox may import

When you instantiate RExec, the new instance uses class attributes to prepare the sandbox. If you need to customize the sandbox, subclass RExec and instantiate the subclass. Your subclass can override RExec's attributes, typically by copying the value that each attribute has in RExec and selectively adding or removing specific items.
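The subclass-and-copy idiom can be illustrated with a hypothetical Python 3 class, Sandbox, that, like RExec, reads its policy from a class attribute named nok_builtin_names. The class is invented for this sketch and is not part of any library.

```python
import builtins

class Sandbox:
    """Hypothetical sandbox that reads its policy from class attributes, as RExec does."""
    # Built-in names withheld from untrusted code (cf. RExec.nok_builtin_names).
    nok_builtin_names = ("open", "exec", "eval", "compile", "__import__")

    def __init__(self):
        # Built per instance from whatever the (sub)class attribute says.
        self.builtins = {
            name: value
            for name, value in vars(builtins).items()
            if name not in self.nok_builtin_names
        }

class NoInputSandbox(Sandbox):
    # Copy the parent's tuple and add one more forbidden name.
    nok_builtin_names = Sandbox.nok_builtin_names + ("input",)

print("input" in Sandbox().builtins)         # True
print("input" in NoInputSandbox().builtins)  # False
```

Removing an item works the same way, for example `tuple(n for n in Sandbox.nok_builtin_names if n != "eval")`.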

13.2.1.3 Using rexec

In the simplest case, you can instantiate RExec and call the instance's r_exec and r_eval methods instead of using statement exec and built-in function eval. For example, here's a somewhat safer variant of built-in function input:

import rexec
rex = rexec.RExec()
def rexinput(prompt):
    expr = raw_input(prompt)
    return rex.r_eval(expr)

Function rexinput in this example is roughly equivalent to built-in function input, covered in Chapter 8. However, rexinput wards against some of the abuses that are possible if you don't trust the user who's supplying input. For example, with the normal, unrestricted eval, an expression such as __import__('os').system('xx') lets the interactive user run any external program xx. Built-in function input implicitly uses normal, unrestricted eval on the user's input. Function rexinput uses restricted execution instead, so that the same expression fails and raises AttributeError, claiming that module os has no attribute named system. This example does not use a try/except around the r_eval call, but of course your application code that calls rexinput should use try/except if you need your program to keep executing when the user makes mistakes or unsuccessful attempts to break security. Mistakes and attempts to break security both get diagnosed through exceptions.
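Python 3 has neither rexec nor Python 2's evaluating input (Python 3's input behaves like Python 2's raw_input), so the closest analogue is an eval with an empty built-ins mapping. In this sketch, restricted_eval and rexinput3 are hypothetical names. The attack expression above now fails with NameError, because __import__ is simply not visible; this blocks only the most obvious abuse and is far weaker than rexec's sandbox.

```python
def restricted_eval(expr):
    """Evaluate a user-typed expression with no built-in names available."""
    return eval(expr, {"__builtins__": {}}, {})

def rexinput3(prompt):
    # Hypothetical Python 3 counterpart of rexinput.
    return restricted_eval(input(prompt))

print(restricted_eval("2 ** 10"))  # 1024
try:
    restricted_eval("__import__('os').system('xx')")
except NameError as exc:
    print("refused:", exc)         # refused: name '__import__' is not defined
```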

This example's usefulness comes from the fact that a restricted-execution sandbox can hide some functionality from untrusted code, so that untrusted code cannot take advantage of that functionality to wreak havoc. Function os.system is a prime example of functionality that should always be prohibited to untrusted code, so class RExec forbids it by default.

After creating a new restricted-execution environment r with r=rexec.RExec(), you can optionally complete r's initialization by inserting modules into r's sandbox with add_module, then inserting attributes in those modules with built-in function setattr. Simple assignment statements also work just fine if the attributes have names that you know at the time you're writing your sandbox-preparation code. Here's how to enrich the previous example to let the user-entered expressions use all functions from module math (covered in Chapter 15) as if they were built-ins, since you know that none of the functions presents any security risk:

import rexec, math
rex = rexec.RExec()
burex = rex.add_module('__builtins__')
for function in dir(math):
    if function[0] != '_':
        setattr(burex, function, getattr(math, function))
def rich_input(prompt):
    expr = raw_input(prompt)
    return rex.r_eval(expr)

Function rich_input in this example is now both richer and safer than the built-in input. It's richer because the user can now also input expressions such as sin(1.0). It's safer, just like rexinput in the previous example, because it uses restricted execution to limit untrusted code.
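A Python 3 sketch of the same enrichment builds the built-ins mapping from math's public names. MATH_BUILTINS and rich_eval are hypothetical; each call gets a fresh copy of the mapping so that evaluated code cannot modify the shared one.

```python
import math

# Every public name in math, offered as a "built-in" to the evaluated expressions.
MATH_BUILTINS = {name: getattr(math, name) for name in dir(math) if not name.startswith("_")}

def rich_eval(expr):
    """Evaluate expr seeing only math's public names as built-ins."""
    return eval(expr, {"__builtins__": dict(MATH_BUILTINS)})

print(rich_eval("floor(pi * 100)"))  # 314
print(rich_eval("sin(1.0)"))
```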

Normally, you use add_module, and then add attributes, only for the modules named '__main__' and '__builtins__'. If the untrusted code needs other modules that it is allowed to import (based on the ok_builtin_modules and ok_path attributes of the RExec subclass you instantiated), the untrusted code can import those other modules normally, usually with an import statement or a call to built-in function __import__. However, you can also choose to use add_module for other module names in order to synthesize, restrict, or otherwise modify modules that later get imported by the untrusted code.
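Synthesizing a module can be sketched in Python 3 with types.ModuleType: trusted code builds a trimmed-down module, and a replacement __import__ serves it under the expected name. fake_math, SYNTHESIZED, and synthesizing_import are hypothetical names for this example.

```python
import math
import types

# Hypothetical: a synthesized, trimmed-down "math" module for untrusted code.
fake_math = types.ModuleType("math")
fake_math.sqrt = math.sqrt
fake_math.pi = math.pi

SYNTHESIZED = {"math": fake_math}

def synthesizing_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Serve only modules that trusted code prepared in SYNTHESIZED."""
    try:
        return SYNTHESIZED[name]
    except KeyError:
        raise ImportError("no module named %r in the sandbox" % name) from None

env = {"__builtins__": {"__import__": synthesizing_import, "hasattr": hasattr}}
exec("import math\nr = math.sqrt(9.0)\nhas_sin = hasattr(math, 'sin')", env)
print(env["r"], env["has_sin"])  # 3.0 False
```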

Once you have populated the sandbox, untrusted code can call the functions and other callables that you added to the sandbox. When called, such functions and other callables execute in the normal (non-sandbox) environment, without constraints. You should therefore ensure that untrusted code cannot cause damage by misusing such callables. Module Bastion, covered in the next section, deals with the specific task of selectively exposing object methods.
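The following sketch shows why exposed callables need their own checks. The untrusted code below has no open, yet the hypothetical function read_note that trusted code exposes calls the real open freely, so it validates its argument before touching the file system. NOTES_DIR and read_note are invented for this illustration.

```python
import os
import tempfile

NOTES_DIR = tempfile.mkdtemp()  # hypothetical directory that untrusted code may read from

def read_note(name):
    """Exposed to untrusted code. It runs unrestricted (it can call the real open),
    so it must itself reject names that would escape NOTES_DIR."""
    if not isinstance(name, str) or os.path.basename(name) != name or name in ("", ".", ".."):
        raise ValueError("bad note name: %r" % (name,))
    with open(os.path.join(NOTES_DIR, name)) as f:
        return f.read()

with open(os.path.join(NOTES_DIR, "todo.txt"), "w") as f:
    f.write("buy milk")

env = {"__builtins__": {"read_note": read_note}}  # the untrusted code gets no open()
exec("note = read_note('todo.txt')", env)
print(env["note"])  # buy milk
```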

13.2.2 The Bastion Module

The Bastion module supplies a class, each of whose instances wraps an object and selectively exposes some of the wrapped object's methods, but no other attributes.

Bastion

class Bastion(obj,filter=lambda n: n[:1]!='_',name=None)

A Bastion instance b wrapping object obj exposes only those methods of obj for whose name filter returns true. An access b.attr works like:

if filter('attr'): return obj.attr
else: raise AttributeError, 'attr'

plus a check that b.attr is a method, not an attribute of any other type.

The default filter accepts all method names that do not start with an underscore (_) (i.e., all methods that are neither private nor special methods). When name is not None, repr(b) is the string '<Bastion for name>'. When name is None, repr(b) is '<Bastion for %s>' % repr(obj).
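Bastion cannot be imported in Python 3, but the access rules just described can be approximated. make_bastion is a hypothetical function, not the Bastion module's API: it returns a wrapper whose __getattr__ applies the filter and the is-a-method check, and whose __repr__ follows the format above. The wrapped object lives in a closure rather than in an instance attribute, but it remains reachable (for example through the __self__ attribute of any method the wrapper returns), so this is not a security boundary.

```python
import types

def make_bastion(obj, filter=lambda n: n[:1] != "_", name=None):
    """Return a wrapper exposing only those methods of obj whose names pass filter."""
    label = name if name is not None else repr(obj)

    class BastionWrapper:
        __slots__ = ()  # no instance dictionary, so nothing is stored on the wrapper

        def __getattr__(self, attr):
            # Runs only for names the wrapper class does not define itself.
            if filter(attr):
                value = getattr(obj, attr)  # AttributeError propagates if obj lacks attr
                if isinstance(value, types.MethodType):
                    return value
            raise AttributeError(attr)

        def __repr__(self):
            return "<Bastion for %s>" % label

    return BastionWrapper()

class Account:
    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount
        return self.balance

    def _audit(self):
        return "internal"

acct = make_bastion(Account(10), name="acct")
print(acct.deposit(5))  # 15
print(repr(acct))       # <Bastion for acct>
```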

Suppose, for example, that your application supplies a class MyClass whose public methods are all safe, while private and special methods, as well as attributes that are not methods, should be hidden from untrusted code. In the sandbox, you can provide a factory function that supplies safely wrapped instances of MyClass to untrusted code as follows:

import rexec, Bastion
rex = rexec.RExec()
burex = rex.add_module('__builtins__')
def SafeMyClassFactory(*args, **kwds):
    return Bastion.Bastion(MyClass(*args, **kwds))
burex.MyClass = SafeMyClassFactory

Now, untrusted code that you run with rex.r_exec can instantiate and use safely wrapped instances of MyClass:

m = MyClass(1,2,3)
m.somemethod(4,5)

However, any attempt by the untrusted code to access private or special methods, even indirectly (e.g., m[6]=7 indirectly tries to use special method __setitem__), raises AttributeError, whether the real MyClass supplies such methods or not. Suppose you want a slightly less tight wrapping, allowing untrusted code to use special method __getitem__, as well as normal public methods, but no other. You just need to provide a custom filter function when you instantiate Bastion:

import rexec, Bastion
rex = rexec.RExec()
burex = rex.add_module('__builtins__')
def SafeMyClassFactory(*args, **kwds):
    # Allow special method __getitem__ plus all public methods, nothing else
    def is_safe(n): return n == '__getitem__' or n[0] != '_'
    return Bastion.Bastion(MyClass(*args, **kwds), is_safe)
burex.MyClass = SafeMyClassFactory

Now, untrusted code that is run in sandbox rex can get, but not set, items of the instances of MyClass it builds with the factory function (assuming, of course, that your class MyClass does supply method __getitem__).
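In Python 3, implicit operations such as m[6] look up special methods on the wrapper's type and never go through __getattr__, so a Python 3 approximation of this less tight wrapping must define __getitem__ on the wrapper class itself. make_item_bastion and the Grid class are hypothetical names for this sketch. Item assignment then fails with TypeError rather than AttributeError, because the wrapper defines no __setitem__.

```python
import types

def make_item_bastion(obj, filter=lambda n: n == "__getitem__" or n[:1] != "_"):
    """Wrap obj, exposing its public methods plus item reading, and nothing else."""

    class ItemBastion:
        __slots__ = ()

        def __getattr__(self, attr):
            # Explicit attribute access: public methods only, as with Bastion.
            if filter(attr):
                value = getattr(obj, attr)
                if isinstance(value, types.MethodType):
                    return value
            raise AttributeError(attr)

        def __getitem__(self, key):
            # m[key] finds this method on the class, bypassing __getattr__,
            # so the filter has to be consulted here explicitly.
            if not filter("__getitem__"):
                raise TypeError("item access is not allowed")
            return obj[key]

    return ItemBastion()

class Grid:
    def __init__(self):
        self._cells = {}

    def __getitem__(self, key):
        return self._cells.get(key, 0)

    def __setitem__(self, key, value):
        self._cells[key] = value

    def size(self):
        return len(self._cells)

g = Grid()
g[6] = 7  # trusted code fills the grid directly
m = make_item_bastion(g)
print(m[6], m.size())  # 7 1
try:
    m[6] = 8
except TypeError as exc:
    print("refused:", exc)
```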