                MEMORY ATTRIBUTE ALIASING ON IA-64

                          Bjorn Helgaas
                     <bjorn.helgaas@hp.com>
                           May 4, 2006

MEMORY ATTRIBUTES

    Itanium supports several attributes for virtual memory references.
    The attribute is part of the virtual translation, i.e., it is
    contained in the TLB entry.  The ones of most interest to the Linux
    kernel are:

        WB      Write-back (cacheable)
        UC      Uncacheable
        WC      Write-coalescing

    System memory typically uses the WB attribute.  The UC attribute is
    used for memory-mapped I/O devices.  The WC attribute is uncacheable
    like UC is, but writes may be delayed and combined to increase
    performance for things like frame buffers.

    The Itanium architecture requires that we avoid accessing the same
    page with both a cacheable mapping and an uncacheable mapping[1].

    The design of the chipset determines which attributes are supported
    on which regions of the address space.  For example, some chipsets
    support either WB or UC access to main memory, while others support
    only WB access.
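
    The aliasing rule can be modelled in a few lines.  The following
    Python sketch is illustrative only: the names Attr, is_cacheable()
    and has_attribute_alias() are hypothetical, and it checks only the
    cacheable-versus-uncacheable rule as stated here, treating WB as
    the sole cacheable attribute.

```python
from enum import Enum


class Attr(Enum):
    WB = "write-back"
    UC = "uncacheable"
    WC = "write-coalescing"


def is_cacheable(attr):
    # Per the text, WB is cacheable; UC and WC are both uncacheable.
    return attr is Attr.WB


def has_attribute_alias(mappings):
    """True if one page has both a cacheable and an uncacheable mapping."""
    kinds = {is_cacheable(a) for a in mappings}
    return len(kinds) == 2


print(has_attribute_alias([Attr.WB, Attr.UC]))  # True
print(has_attribute_alias([Attr.WB, Attr.WB]))  # False
```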

MEMORY MAP

    Platform firmware describes the physical memory map and the
    supported attributes for each region.  At boot-time, the kernel uses
    the EFI GetMemoryMap() interface.  ACPI can also describe memory
    devices and the attributes they support, but Linux/ia64 currently
    doesn't use this information.

    The kernel uses the efi_memmap table returned from GetMemoryMap() to
    learn the attributes supported by each region of physical address
    space.  Unfortunately, this table does not completely describe the
    address space because some machines omit some or all of the MMIO
    regions from the map.

    The kernel maintains another table, kern_memmap, which describes the
    memory Linux is actually using and the attribute for each region.
    This contains only system memory; it does not contain MMIO space.

    The kern_memmap table typically contains only a subset of the system
    memory described by the efi_memmap.  Linux/ia64 can't use all memory
    in the system because of constraints imposed by the identity mapping
    scheme.

    The efi_memmap table is preserved unmodified because the original
    boot-time information is required for kexec.

KERNEL IDENTITY MAPPINGS

    Linux/ia64 identity mappings are done with large pages, currently
    either 16MB or 64MB, referred to as "granules."  Cacheable mappings
    are speculative[2], so the processor can read any location in the
    page at any time, independent of the programmer's intentions.  This
    means that to avoid attribute aliasing, Linux can create a cacheable
    identity mapping only when the entire granule supports cacheable
    access.

    Therefore, kern_memmap contains only full granule-sized regions that
    can be referenced safely by an identity mapping.

    Uncacheable mappings are not speculative, so the processor will
    generate UC accesses only to locations explicitly referenced by
    software.  This allows UC identity mappings to cover granules that
    are only partially populated, or populated with a combination of UC
    and WB regions.
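
    As a rough illustration of the granule rule, the Python sketch
    below derives the granules that could take a cacheable identity
    mapping from a list of EFI-style regions.  The function name
    wb_granules(), the tuple layout, and the sample regions are
    assumptions made for the example; it is not the kernel's code,
    which must also exclude granules that are partially reserved or
    occupied by firmware.

```python
GRANULE = 16 << 20  # 16MB; the text says granules are 16MB or 64MB


def wb_granules(efi_regions, granule=GRANULE):
    """Return start addresses of granules lying entirely in WB-capable space.

    efi_regions: list of (start, end, attrs), end exclusive, attrs a set.
    """
    # Coalesce adjacent or overlapping WB-capable regions.
    wb = sorted((s, e) for s, e, attrs in efi_regions if "WB" in attrs)
    merged = []
    for s, e in wb:
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])

    # Keep only whole granules inside each coalesced interval.
    result = []
    for s, e in merged:
        first = -(-s // granule) * granule  # round start up
        last = (e // granule) * granule     # round end down
        result.extend(range(first, last, granule))
    return result


# Hypothetical map: a UC-only VGA hole at 0xA0000 inside the first granule.
regions = [
    (0x00000, 0xA0000, {"WB"}),
    (0xA0000, 0xC0000, {"UC"}),
    (0xC0000, 0x4000000, {"WB", "UC"}),
]
print([hex(g) for g in wb_granules(regions)])
# ['0x1000000', '0x2000000', '0x3000000']
```

    The first granule is dropped because the UC-only hole means it is
    not WB-capable throughout, even though most of it is ordinary
    memory.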

USER MAPPINGS

    User mappings are typically done with 16K or 64K pages.  The smaller
    page size allows more flexibility because only 16K or 64K has to be
    homogeneous with respect to memory attributes.

POTENTIAL ATTRIBUTE ALIASING CASES

    There are several ways the kernel creates new mappings:

    mmap of /dev/mem

        This uses remap_pfn_range(), which creates user mappings.  These
        mappings may be either WB or UC.  If the region being mapped
        happens to be in kern_memmap, meaning that it may also be mapped
        by a kernel identity mapping, the user mapping must use the same
        attribute as the kernel mapping.

        If the region is not in kern_memmap, the user mapping should use
        an attribute reported as being supported in the EFI memory map.
        Since the EFI memory map does not describe MMIO on some
        machines, this should use an uncacheable mapping as a fallback.
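
        The selection order just described can be sketched in Python as
        follows.  This is a simplified model, not the kernel's
        implementation: lookup() and devmem_mmap_attr() are
        hypothetical names, a range must fit inside a single table
        entry to match, and preferring WB over UC when EFI reports
        both is an assumption.

```python
def lookup(table, start, end):
    """Return the entry value if [start, end) lies within one entry, else None."""
    for s, e, value in table:
        if s <= start and end <= e:
            return value
    return None


def devmem_mmap_attr(start, end, kern_memmap, efi_memmap):
    # 1. Region is in kern_memmap: must match the kernel identity mapping.
    kern_attr = lookup(kern_memmap, start, end)
    if kern_attr is not None:
        return kern_attr
    # 2. Otherwise use an attribute the EFI memory map reports as supported.
    supported = lookup(efi_memmap, start, end)
    if supported:
        if "WB" in supported:  # assumption: prefer WB when available
            return "WB"
        if "UC" in supported:
            return "UC"
    # 3. Region not described (e.g. unreported MMIO): fall back to UC.
    return "UC"


# Hypothetical tables: (start, end, value), end exclusive.
kern = [(0x1000000, 0x4000000, "WB")]
efi = [
    (0x00000, 0xA0000, {"WB"}),
    (0xA0000, 0xC0000, {"UC"}),
    (0x1000000, 0x4000000, {"WB", "UC"}),
]
print(devmem_mmap_attr(0x1000000, 0x1004000, kern, efi))    # WB
print(devmem_mmap_attr(0xA0000, 0xC0000, kern, efi))        # UC
print(devmem_mmap_attr(0x80000000, 0x80004000, kern, efi))  # UC
```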

    mmap of /sys/class/pci_bus/.../legacy_mem

        This is very similar to mmap of /dev/mem, except that legacy_mem
        only allows mmap of the one megabyte "legacy MMIO" area for a
        specific PCI bus.  Typically this is the first megabyte of
        physical address space, but it may be different on machines with
        several VGA devices.

        "X" uses this to access VGA frame buffers.  Using legacy_mem
        rather than /dev/mem allows multiple instances of X to talk to
        different VGA cards.

        The /dev/mem mmap constraints apply.

    mmap of /proc/bus/pci/.../??.?

        This is an MMIO mmap of PCI functions, which may optionally
        request the WC attribute.

        If WC is requested, and the region in kern_memmap is either WC
        or UC, and the EFI memory map designates the region as WC, then
        the WC mapping is allowed.

        Otherwise, the user mapping must use the same attribute as the
        kernel mapping.
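
        Expressed as a Python sketch, the WC rule is a single
        three-part condition.  The function pci_mmap_attr() and its
        parameters are hypothetical names chosen for illustration.

```python
def pci_mmap_attr(wc_requested, kern_attr, efi_attrs):
    """Pick the user-mapping attribute for a PCI function mmap.

    wc_requested -- caller asked for write-coalescing
    kern_attr    -- attribute of the region in kern_memmap
    efi_attrs    -- set of attributes EFI reports for the region
    """
    if wc_requested and kern_attr in ("WC", "UC") and "WC" in efi_attrs:
        return "WC"
    # Otherwise the user mapping must match the kernel mapping.
    return kern_attr


print(pci_mmap_attr(True, "UC", {"UC", "WC"}))  # WC
print(pci_mmap_attr(True, "UC", {"UC"}))        # UC
print(pci_mmap_attr(True, "WB", {"WB", "WC"}))  # WB
```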

    read/write of /dev/mem

        This uses copy_from_user(), which implicitly uses a kernel
        identity mapping.  This is obviously safe for things in
        kern_memmap.

        There may be corner cases of things that are not in kern_memmap,
        but could be accessed this way.  For example, registers in MMIO
        space are not in kern_memmap, but could be accessed with a UC
        mapping.  This would not cause attribute aliasing.  But
        registers typically can be accessed only with four-byte or
        eight-byte accesses, and the copy_from_user() path doesn't allow
        any control over the access size, so this would be dangerous.

    ioremap()

        This returns a mapping for use inside the kernel.

        If the region is in kern_memmap, we should use the attribute
        specified there.

        If the EFI memory map reports that the entire granule supports
        WB, we should use that (granules that are partially reserved
        or occupied by firmware do not appear in kern_memmap).

        If the granule contains non-WB memory, but we can cover the
        region safely with kernel page table mappings, we can use
        ioremap_page_range() as most other architectures do.

        Failing all of the above, we have to fall back to a UC mapping.
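
        The cascade above can be summarised in a short Python sketch.
        The function ioremap_strategy(), its boolean inputs, and the
        "identity"/"page_table" labels are hypothetical.  Returning WB
        for the page-table step is an assumption based on the
        /sys/devices/.../rom case under PAST PROBLEM CASES.

```python
def ioremap_strategy(kern_attr, granule_all_wb, page_tables_safe):
    """Return (mechanism, attribute) following the order given in the text.

    kern_attr        -- attribute from kern_memmap, or None if not listed
    granule_all_wb   -- EFI reports the whole granule as WB-capable
    page_tables_safe -- region can be covered safely by page table mappings
    """
    if kern_attr is not None:
        return ("identity", kern_attr)
    if granule_all_wb:
        return ("identity", "WB")
    if page_tables_safe:
        # Assumption: WB page table mappings, as in the ROM case.
        return ("page_table", "WB")
    # Nothing else is safe: fall back to an uncacheable mapping.
    return ("identity", "UC")


print(ioremap_strategy("WB", False, False))  # ('identity', 'WB')
print(ioremap_strategy(None, False, True))   # ('page_table', 'WB')
print(ioremap_strategy(None, False, False))  # ('identity', 'UC')
```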

PAST PROBLEM CASES

    mmap of various MMIO regions from /dev/mem by "X" on Intel platforms

        The EFI memory map may not report these MMIO regions.

        These must be allowed so that X will work.  This means that
        when the EFI memory map is incomplete, every /dev/mem mmap must
        succeed.  It may create either WB or UC user mappings, depending
        on whether the region is in kern_memmap or the EFI memory map.

    mmap of 0x0-0x9FFFF /dev/mem by "hwinfo" on HP sx1000 with VGA enabled

        The EFI memory map reports the following attributes:

            0x00000-0x9FFFF WB only
            0xA0000-0xBFFFF UC only (VGA frame buffer)
            0xC0000-0xFFFFF WB only

        This mmap is done with user pages, not kernel identity mappings,
        so it is safe to use WB mappings.

        The kernel VGA driver may ioremap the VGA frame buffer at 0xA0000,
        which uses a granule-sized UC mapping.  This granule will cover some
        WB-only memory, but since UC is non-speculative, the processor will
        never generate an uncacheable reference to the WB-only areas unless
        the driver explicitly touches them.
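
        The address arithmetic behind this case can be checked with a
        small Python sketch.  The helper names span() and overlaps()
        are hypothetical, and the 16K page and 16MB granule sizes are
        one of the combinations mentioned earlier in this document.

```python
PAGE_16K = 16 << 10
GRANULE_16M = 16 << 20
VGA_START, VGA_END = 0xA0000, 0xC0000  # UC-only frame buffer, end exclusive


def span(start, end, unit):
    """Smallest unit-aligned range [lo, hi) covering [start, end)."""
    lo = start // unit * unit
    hi = -(-end // unit) * unit
    return lo, hi


def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]


user = span(0x0, 0xA0000, PAGE_16K)              # hwinfo's 0x0-0x9FFFF
kernel = span(VGA_START, VGA_END, GRANULE_16M)   # VGA driver's UC granule

print([hex(x) for x in user])                # ['0x0', '0xa0000']
print([hex(x) for x in kernel])              # ['0x0', '0x1000000']
print(overlaps(user, (VGA_START, VGA_END)))  # False
```

        Because 0xA0000 is a multiple of 16K, the WB user pages stop
        exactly at the frame buffer, whereas the UC granule necessarily
        spans the whole first 16MB.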

    mmap of 0x0-0xFFFFF legacy_mem by "X"

        If the EFI memory map reports that the entire range supports the
        same attributes, we can allow the mmap (and we will prefer WB if
        supported, as is the case with HP sx[12]000 machines with VGA
        disabled).

        If EFI reports the range as partly WB and partly UC (as on sx[12]000
        machines with VGA enabled), we must fail the mmap because there's no
        safe attribute to use.

        If EFI reports some of the range but not all (as on Intel firmware
        that doesn't report the VGA frame buffer at all), we should fail the
        mmap and force the user to map just the specific region of interest.
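
        The three outcomes for the full 0x0-0xFFFFF range can be
        modelled with the Python sketch below.  It is an illustration
        under stated assumptions, not the kernel's code:
        legacy_mem_attr() is a hypothetical name, regions are
        (start, end, attrs) tuples with an exclusive end, and None
        stands for a failed mmap.

```python
def legacy_mem_attr(start, end, efi_memmap):
    """Return 'WB' or 'UC' if the mmap may proceed, None if it must fail."""
    common = None  # attributes supported by every region seen so far
    pos = start    # first address not yet accounted for
    for s, e, attrs in sorted(efi_memmap, key=lambda r: r[0]):
        if e <= pos or s >= end:
            continue              # region lies outside the remaining range
        if s > pos:
            return None           # gap: EFI does not report part of the range
        common = set(attrs) if common is None else common & set(attrs)
        pos = e
        if pos >= end:
            break
    if pos < end or not common:
        return None               # incomplete coverage or no shared attribute
    if "WB" in common:
        return "WB"               # prefer WB if supported
    return "UC" if "UC" in common else None


vga_disabled = [(0x00000, 0x100000, {"WB"})]
vga_enabled = [
    (0x00000, 0xA0000, {"WB"}),
    (0xA0000, 0xC0000, {"UC"}),
    (0xC0000, 0x100000, {"WB"}),
]
unreported_vga = [(0x00000, 0xA0000, {"WB"}), (0xC0000, 0x100000, {"WB"})]

print(legacy_mem_attr(0x0, 0x100000, vga_disabled))    # WB
print(legacy_mem_attr(0x0, 0x100000, vga_enabled))     # None
print(legacy_mem_attr(0x0, 0x100000, unreported_vga))  # None
```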

    mmap of 0xA0000-0xBFFFF legacy_mem by "X" on HP sx1000 with VGA disabled

        The EFI memory map reports the following attributes:

            0x00000-0xFFFFF WB only (no VGA MMIO hole)

        This is a special case of the previous case, and the mmap should
        fail for the same reason as above.

    read of /sys/devices/.../rom

        For VGA devices, this may cause an ioremap() of 0xC0000.  This
        used to be done with a UC mapping, because the VGA frame buffer
        at 0xA0000 prevents use of a WB granule.  The UC mapping causes
        an MCA on HP sx[12]000 chipsets.

        We should use WB page table mappings to avoid covering the VGA
        frame buffer.

NOTES

    [1] SDM rev 2.2, vol 2, sec 4.4.1.
    [2] SDM rev 2.2, vol 2, sec 4.4.6.