123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245 |
- 1. Intel(R) MPX Overview
- ========================
- Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new capability
- introduced into Intel Architecture. Intel MPX provides hardware features
- that can be used in conjunction with compiler changes to check memory
- references, for those references whose compile-time normal intentions are
- usurped at runtime due to buffer overflow or underflow.
- You can tell if your CPU supports MPX by looking in /proc/cpuinfo:
- cat /proc/cpuinfo | grep ' mpx '
- For more information, please refer to Intel(R) Architecture Instruction
- Set Extensions Programming Reference, Chapter 9: Intel(R) Memory Protection
- Extensions.
- Note: As of December 2014, no hardware with MPX is available but it is
- possible to use SDE (Intel(R) Software Development Emulator) instead, which
- can be downloaded from
- http://software.intel.com/en-us/articles/intel-software-development-emulator
- 2. How to get the advantage of MPX
- ==================================
- For MPX to work, changes are required in the kernel, binutils and compiler.
- No source changes are required for applications, just a recompile.
- There are a lot of moving parts of this to all work right. The following
- is how we expect the compiler, application and kernel to work together.
- 1) Application developer compiles with -fmpx. The compiler will add the
- instrumentation as well as some setup code called early after the app
- starts. New instruction prefixes are noops for old CPUs.
- 2) That setup code allocates (virtual) space for the "bounds directory",
- points the "bndcfgu" register to the directory (must also set the valid
- bit) and notifies the kernel (via the new prctl(PR_MPX_ENABLE_MANAGEMENT))
- that the app will be using MPX. The app must be careful not to access
- the bounds tables between the time when it populates "bndcfgu" and
- when it calls the prctl(). This might be hard to guarantee if the app
- is compiled with MPX. You can add "__attribute__((bnd_legacy))" to
- the function to disable MPX instrumentation to help guarantee this.
- Also be careful not to call out to any other code which might be
- MPX-instrumented.
- 3) The kernel detects that the CPU has MPX, allows the new prctl() to
- succeed, and notes the location of the bounds directory. Userspace is
- expected to keep the bounds directory at that location. We note it
- instead of reading it each time because the 'xsave' operation needed
- to access the bounds directory register is an expensive operation.
- 4) If the application needs to spill bounds out of the 4 registers, it
- issues a bndstx instruction. Since the bounds directory is empty at
- this point, a bounds fault (#BR) is raised, the kernel allocates a
- bounds table (in the user address space) and makes the relevant entry
- in the bounds directory point to the new table.
- 5) If the application violates the bounds specified in the bounds registers,
- a separate kind of #BR is raised which will deliver a signal with
- information about the violation in the 'struct siginfo'.
- 6) Whenever memory is freed, we know that it can no longer contain valid
- pointers, and we attempt to free the associated space in the bounds
- tables. If an entire table becomes unused, we will attempt to free
- the table and remove the entry in the directory.
- To summarize, there are essentially three things interacting here:
- GCC with -fmpx:
- * enables annotation of code with MPX instructions and prefixes
- * inserts code early in the application to call in to the "gcc runtime"
- GCC MPX Runtime:
- * Checks for hardware MPX support in cpuid leaf
- * allocates virtual space for the bounds directory (malloc() essentially)
- * points the hardware BNDCFGU register at the directory
- * calls a new prctl(PR_MPX_ENABLE_MANAGEMENT) to notify the kernel to
- start managing the bounds directories
- Kernel MPX Code:
- * Checks for hardware MPX support in cpuid leaf
- * Handles #BR exceptions and sends SIGSEGV to the app when it violates
- bounds, like during a buffer overflow.
- * When bounds are spilled in to an unallocated bounds table, the kernel
- notices in the #BR exception, allocates the virtual space, then
- updates the bounds directory to point to the new table. It keeps
- special track of the memory with a VM_MPX flag.
- * Frees unused bounds tables at the time that the memory they described
- is unmapped.
- 3. How does MPX kernel code work
- ================================
- Handling #BR faults caused by MPX
- ---------------------------------
- When MPX is enabled, there are 2 new situations that can generate
- #BR faults.
- * new bounds tables (BT) need to be allocated to save bounds.
- * bounds violation caused by MPX instructions.
- We hook #BR handler to handle these two new situations.
- On-demand kernel allocation of bounds tables
- --------------------------------------------
- MPX only has 4 hardware registers for storing bounds information. If
- MPX-enabled code needs more than these 4 registers, it needs to spill
- them somewhere. It has two special instructions for this which allow
- the bounds to be moved between the bounds registers and some new "bounds
- tables".
- #BR exceptions are a new class of exceptions just for MPX. They are
- similar conceptually to a page fault and will be raised by the MPX
- hardware during both bounds violations or when the tables are not
- present. The kernel handles those #BR exceptions for not-present tables
- by carving the space out of the normal processes address space and then
- pointing the bounds-directory over to it.
- The tables need to be accessed and controlled by userspace because
- the instructions for moving bounds in and out of them are extremely
- frequent. They potentially happen every time a register points to
- memory. Any direct kernel involvement (like a syscall) to access the
- tables would obviously destroy performance.
- Why not do this in userspace? MPX does not strictly require anything in
- the kernel. It can theoretically be done completely from userspace. Here
- are a few ways this could be done. We don't think any of them are practical
- in the real-world, but here they are.
- Q: Can virtual space simply be reserved for the bounds tables so that we
- never have to allocate them?
- A: MPX-enabled application will possibly create a lot of bounds tables in
- process address space to save bounds information. These tables can take
- up huge swaths of memory (as much as 80% of the memory on the system)
- even if we clean them up aggressively. In the worst-case scenario, the
- tables can be 4x the size of the data structure being tracked. IOW, a
- 1-page structure can require 4 bounds-table pages. An X-GB virtual
- area needs 4*X GB of virtual space, plus 2GB for the bounds directory.
- If we were to preallocate them for the 128TB of user virtual address
- space, we would need to reserve 512TB+2GB, which is larger than the
- entire virtual address space today. This means they can not be reserved
- ahead of time. Also, a single process's pre-populated bounds directory
- consumes 2GB of virtual *AND* physical memory. IOW, it's completely
- infeasible to prepopulate bounds directories.
- Q: Can we preallocate bounds table space at the same time memory is
- allocated which might contain pointers that might eventually need
- bounds tables?
- A: This would work if we could hook the site of each and every memory
- allocation syscall. This can be done for small, constrained applications.
- But, it isn't practical at a larger scale since a given app has no
- way of controlling how all the parts of the app might allocate memory
- (think libraries). The kernel is really the only place to intercept
- these calls.
- Q: Could a bounds fault be handed to userspace and the tables allocated
- there in a signal handler instead of in the kernel?
- A: mmap() is not on the list of safe async handler functions and even
- if mmap() would work it still requires locking or nasty tricks to
- keep track of the allocation state there.
- Having ruled out all of the userspace-only approaches for managing
- bounds tables that we could think of, we create them on demand in
- the kernel.
- Decoding MPX instructions
- -------------------------
- If a #BR is generated due to a bounds violation caused by MPX.
- We need to decode MPX instructions to get violation address and
- set this address into extended struct siginfo.
- The _sigfault field of struct siginfo is extended as follow:
- 87 /* SIGILL, SIGFPE, SIGSEGV, SIGBUS */
- 88 struct {
- 89 void __user *_addr; /* faulting insn/memory ref. */
- 90 #ifdef __ARCH_SI_TRAPNO
- 91 int _trapno; /* TRAP # which caused the signal */
- 92 #endif
- 93 short _addr_lsb; /* LSB of the reported address */
- 94 struct {
- 95 void __user *_lower;
- 96 void __user *_upper;
- 97 } _addr_bnd;
- 98 } _sigfault;
- The '_addr' field refers to violation address, and new '_addr_and'
- field refers to the upper/lower bounds when a #BR is caused.
- Glibc will be also updated to support this new siginfo. So user
- can get violation address and bounds when bounds violations occur.
- Cleanup unused bounds tables
- ----------------------------
- When a BNDSTX instruction attempts to save bounds to a bounds directory
- entry marked as invalid, a #BR is generated. This is an indication that
- no bounds table exists for this entry. In this case the fault handler
- will allocate a new bounds table on demand.
- Since the kernel allocated those tables on-demand without userspace
- knowledge, it is also responsible for freeing them when the associated
- mappings go away.
- Here, the solution for this issue is to hook do_munmap() to check
- whether one process is MPX enabled. If yes, those bounds tables covered
- in the virtual address region which is being unmapped will be freed also.
- Adding new prctl commands
- -------------------------
- Two new prctl commands are added to enable and disable MPX bounds tables
- management in kernel.
- 155 #define PR_MPX_ENABLE_MANAGEMENT 43
- 156 #define PR_MPX_DISABLE_MANAGEMENT 44
- Runtime library in userspace is responsible for allocation of bounds
- directory. So kernel have to use XSAVE instruction to get the base
- of bounds directory from BNDCFG register.
- But XSAVE is expected to be very expensive. In order to do performance
- optimization, we have to get the base of bounds directory and save it
- into struct mm_struct to be used in future during PR_MPX_ENABLE_MANAGEMENT
- command execution.
- 4. Special rules
- ================
- 1) If userspace is requesting help from the kernel to do the management
- of bounds tables, it may not create or modify entries in the bounds directory.
- Certainly users can allocate bounds tables and forcibly point the bounds
- directory at them through XSAVE instruction, and then set valid bit
- of bounds entry to have this entry valid. But, the kernel will decline
- to assist in managing these tables.
- 2) Userspace may not take multiple bounds directory entries and point
- them at the same bounds table.
- This is allowed architecturally. See more information "Intel(R) Architecture
- Instruction Set Extensions Programming Reference" (9.3.4).
- However, if users did this, the kernel might be fooled in to unmapping an
- in-use bounds table since it does not recognize sharing.
|