123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222 |
- MEMORY ATTRIBUTE ALIASING ON IA-64
- Bjorn Helgaas
- <bjorn.helgaas@hp.com>
- May 4, 2006
- MEMORY ATTRIBUTES
- Itanium supports several attributes for virtual memory references.
- The attribute is part of the virtual translation, i.e., it is
- contained in the TLB entry. The ones of most interest to the Linux
- kernel are:
- WB Write-back (cacheable)
- UC Uncacheable
- WC Write-coalescing
- System memory typically uses the WB attribute. The UC attribute is
- used for memory-mapped I/O devices. The WC attribute is uncacheable
- like UC is, but writes may be delayed and combined to increase
- performance for things like frame buffers.
- The Itanium architecture requires that we avoid accessing the same
- page with both a cacheable mapping and an uncacheable mapping[1].
- The design of the chipset determines which attributes are supported
- on which regions of the address space. For example, some chipsets
- support either WB or UC access to main memory, while others support
- only WB access.
- MEMORY MAP
- Platform firmware describes the physical memory map and the
- supported attributes for each region. At boot-time, the kernel uses
- the EFI GetMemoryMap() interface. ACPI can also describe memory
- devices and the attributes they support, but Linux/ia64 currently
- doesn't use this information.
- The kernel uses the efi_memmap table returned from GetMemoryMap() to
- learn the attributes supported by each region of physical address
- space. Unfortunately, this table does not completely describe the
- address space because some machines omit some or all of the MMIO
- regions from the map.
- The kernel maintains another table, kern_memmap, which describes the
- memory Linux is actually using and the attribute for each region.
- This contains only system memory; it does not contain MMIO space.
- The kern_memmap table typically contains only a subset of the system
- memory described by the efi_memmap. Linux/ia64 can't use all memory
- in the system because of constraints imposed by the identity mapping
- scheme.
- The efi_memmap table is preserved unmodified because the original
- boot-time information is required for kexec.
- KERNEL IDENTITY MAPPINGS
- Linux/ia64 identity mappings are done with large pages, currently
- either 16MB or 64MB, referred to as "granules." Cacheable mappings
- are speculative[2], so the processor can read any location in the
- page at any time, independent of the programmer's intentions. This
- means that to avoid attribute aliasing, Linux can create a cacheable
- identity mapping only when the entire granule supports cacheable
- access.
- Therefore, kern_memmap contains only full granule-sized regions that
- can referenced safely by an identity mapping.
- Uncacheable mappings are not speculative, so the processor will
- generate UC accesses only to locations explicitly referenced by
- software. This allows UC identity mappings to cover granules that
- are only partially populated, or populated with a combination of UC
- and WB regions.
- USER MAPPINGS
- User mappings are typically done with 16K or 64K pages. The smaller
- page size allows more flexibility because only 16K or 64K has to be
- homogeneous with respect to memory attributes.
- POTENTIAL ATTRIBUTE ALIASING CASES
- There are several ways the kernel creates new mappings:
- mmap of /dev/mem
- This uses remap_pfn_range(), which creates user mappings. These
- mappings may be either WB or UC. If the region being mapped
- happens to be in kern_memmap, meaning that it may also be mapped
- by a kernel identity mapping, the user mapping must use the same
- attribute as the kernel mapping.
- If the region is not in kern_memmap, the user mapping should use
- an attribute reported as being supported in the EFI memory map.
- Since the EFI memory map does not describe MMIO on some
- machines, this should use an uncacheable mapping as a fallback.
- mmap of /sys/class/pci_bus/.../legacy_mem
- This is very similar to mmap of /dev/mem, except that legacy_mem
- only allows mmap of the one megabyte "legacy MMIO" area for a
- specific PCI bus. Typically this is the first megabyte of
- physical address space, but it may be different on machines with
- several VGA devices.
- "X" uses this to access VGA frame buffers. Using legacy_mem
- rather than /dev/mem allows multiple instances of X to talk to
- different VGA cards.
- The /dev/mem mmap constraints apply.
- mmap of /proc/bus/pci/.../??.?
- This is an MMIO mmap of PCI functions, which additionally may or
- may not be requested as using the WC attribute.
- If WC is requested, and the region in kern_memmap is either WC
- or UC, and the EFI memory map designates the region as WC, then
- the WC mapping is allowed.
- Otherwise, the user mapping must use the same attribute as the
- kernel mapping.
- read/write of /dev/mem
- This uses copy_from_user(), which implicitly uses a kernel
- identity mapping. This is obviously safe for things in
- kern_memmap.
- There may be corner cases of things that are not in kern_memmap,
- but could be accessed this way. For example, registers in MMIO
- space are not in kern_memmap, but could be accessed with a UC
- mapping. This would not cause attribute aliasing. But
- registers typically can be accessed only with four-byte or
- eight-byte accesses, and the copy_from_user() path doesn't allow
- any control over the access size, so this would be dangerous.
- ioremap()
- This returns a mapping for use inside the kernel.
- If the region is in kern_memmap, we should use the attribute
- specified there.
- If the EFI memory map reports that the entire granule supports
- WB, we should use that (granules that are partially reserved
- or occupied by firmware do not appear in kern_memmap).
- If the granule contains non-WB memory, but we can cover the
- region safely with kernel page table mappings, we can use
- ioremap_page_range() as most other architectures do.
- Failing all of the above, we have to fall back to a UC mapping.
- PAST PROBLEM CASES
- mmap of various MMIO regions from /dev/mem by "X" on Intel platforms
- The EFI memory map may not report these MMIO regions.
- These must be allowed so that X will work. This means that
- when the EFI memory map is incomplete, every /dev/mem mmap must
- succeed. It may create either WB or UC user mappings, depending
- on whether the region is in kern_memmap or the EFI memory map.
- mmap of 0x0-0x9FFFF /dev/mem by "hwinfo" on HP sx1000 with VGA enabled
- The EFI memory map reports the following attributes:
- 0x00000-0x9FFFF WB only
- 0xA0000-0xBFFFF UC only (VGA frame buffer)
- 0xC0000-0xFFFFF WB only
- This mmap is done with user pages, not kernel identity mappings,
- so it is safe to use WB mappings.
- The kernel VGA driver may ioremap the VGA frame buffer at 0xA0000,
- which uses a granule-sized UC mapping. This granule will cover some
- WB-only memory, but since UC is non-speculative, the processor will
- never generate an uncacheable reference to the WB-only areas unless
- the driver explicitly touches them.
- mmap of 0x0-0xFFFFF legacy_mem by "X"
- If the EFI memory map reports that the entire range supports the
- same attributes, we can allow the mmap (and we will prefer WB if
- supported, as is the case with HP sx[12]000 machines with VGA
- disabled).
- If EFI reports the range as partly WB and partly UC (as on sx[12]000
- machines with VGA enabled), we must fail the mmap because there's no
- safe attribute to use.
- If EFI reports some of the range but not all (as on Intel firmware
- that doesn't report the VGA frame buffer at all), we should fail the
- mmap and force the user to map just the specific region of interest.
- mmap of 0xA0000-0xBFFFF legacy_mem by "X" on HP sx1000 with VGA disabled
- The EFI memory map reports the following attributes:
- 0x00000-0xFFFFF WB only (no VGA MMIO hole)
- This is a special case of the previous case, and the mmap should
- fail for the same reason as above.
- read of /sys/devices/.../rom
- For VGA devices, this may cause an ioremap() of 0xC0000. This
- used to be done with a UC mapping, because the VGA frame buffer
- at 0xA0000 prevents use of a WB granule. The UC mapping causes
- an MCA on HP sx[12]000 chipsets.
- We should use WB page table mappings to avoid covering the VGA
- frame buffer.
- NOTES
- [1] SDM rev 2.2, vol 2, sec 4.4.1.
- [2] SDM rev 2.2, vol 2, sec 4.4.6.
|