.\" Copyright (c) 2017 Rick Macklem
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd December 20, 2019
.Dt PNFS 4
.Os
.Sh NAME
.Nm pNFS
.Nd NFS Version 4.1 and 4.2 Parallel NFS Protocol
.Sh DESCRIPTION
The NFSv4.1 and NFSv4.2 client and server provide support for the
.Tn pNFS
specification; see
.%T "Network File System (NFS) Version 4 Minor Version 1 Protocol RFC 5661" ,
.%T "Network File System (NFS) Version 4 Minor Version 2 Protocol RFC 7862" and
.%T "Parallel NFS (pNFS) Flexible File Layout RFC 8435" .
A pNFS service separates Read/Write operations from all other NFSv4.1 and
NFSv4.2 operations, which are referred to as Metadata operations.
The Read/Write operations are performed directly on the Data Server (DS)
where the file's data resides, bypassing the NFS server.
All other file operations are performed on the NFS server, which is referred to
as a Metadata Server (MDS).
NFS clients that do not support
.Tn pNFS
perform Read/Write operations on the MDS, which acts as a proxy for the
appropriate DS(s).
.Pp
The NFSv4.1 and NFSv4.2 protocols provide two pieces of information to pNFS
aware clients that allow them to perform Read/Write operations directly on
the DS.
.Pp
The first is DeviceInfo, which is static information defining the DS
server.
The critical piece of information in DeviceInfo for the layout types
supported by
.Fx
is the IP address that is used to perform RPCs on the DS.
It also indicates which version of NFS the DS supports, the I/O size and other
layout specific information.
In the DeviceInfo, there is a DeviceID which, for the
.Fx
server,
is unique to the DS configuration
and changes whenever the
.Xr nfsd 8
daemon is restarted or the server is rebooted.
.Pp
The second is the layout, which is per file and references the DeviceInfo
to use via the DeviceID.
It is for a byte range of a file and is either Read or Read/Write.
For the
.Fx
server, a layout covers all bytes of a file.
A layout may be recalled by the MDS using a LayoutRecall callback.
When a client returns a layout via the LayoutReturn operation, it can
indicate that error(s) were encountered while doing I/O on the DS,
at least for certain layout types such as the Flexible File Layout.
.Pp
The
.Fx
client and server support two layout types.
.Pp
The File Layout is described in RFC 5661 and uses the NFSv4.1 or NFSv4.2 protocol
to perform I/O on the DS.
It does not support client aware DS mirroring and, as such,
the
.Fx
server only provides File Layout support for non-mirrored
configurations.
.Pp
The Flexible File Layout allows the use of the NFSv3, NFSv4.0, NFSv4.1 or
NFSv4.2 protocol to perform I/O on the DS and does support client aware
mirroring.
As such, the
.Fx
server uses Flexible File Layout layouts for the
mirrored DS configurations.
The
.Fx
server supports the
.Dq tightly coupled
variant and all DSs allow use of the
NFSv4.2 or NFSv4.1 protocol for I/O operations.
Clients that support the Flexible File Layout will do writes and commits
to all DS mirrors in the mirror set.
.Pp
A
.Fx
pNFS service consists of a single MDS server plus one or more
DS servers, all of which are
.Fx
systems.
For a non-mirrored configuration, the
.Fx
server will issue File Layout
layouts by default.
However, that default can be changed to the Flexible File Layout by setting the
.Xr sysctl 8
variable
.Dq vfs.nfsd.default_flexfile
to one.
Mirrored server configurations will only issue Flexible File Layouts.
.Tn pNFS
clients mount the MDS as they would a single NFS server.
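.Pp
As a sketch of the setting described above, the default layout type could be
switched to the Flexible File Layout on the MDS as follows.
The use of
.Pa /etc/sysctl.conf
to make the setting persist across reboots is ordinary
.Fx
practice rather than anything specific to pNFS:
.Bd -literal -offset indent
# On the MDS, change the default for the running system.
sysctl vfs.nfsd.default_flexfile=1

# Optional line in /etc/sysctl.conf so the setting survives a reboot.
vfs.nfsd.default_flexfile=1
.Ed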
.Pp
A
.Fx
.Tn pNFS
client must be running the
.Xr nfscbd 8
daemon and use the mount options
.Dq nfsv4,minorversion=2,pnfs
or
.Dq nfsv4,minorversion=1,pnfs .
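.Pp
For illustration, a client setup might look like the following.
The server name
.Dq nfsv4-server
and the mount point
.Pa /mnt
are placeholders, and enabling
.Xr nfscbd 8
through
.Xr rc.conf 5
is one common way to start it, not the only one:
.Bd -literal -offset indent
# In /etc/rc.conf on the client, so that nfscbd(8) is started at boot.
nfscbd_enable="YES"

# Mount the MDS with pNFS enabled (hypothetical server name and path).
mount -t nfs -o nfsv4,minorversion=2,pnfs nfsv4-server:/ /mnt
.Ed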
.Pp
When files are created, the MDS creates a file tree identical to what a
single NFS server creates, except that all the regular (VREG) files will
be empty.
As such, if you look at the exported tree directly on the MDS
(not via an NFS mount), the files will all be of size zero.
Each of these files will also have two extended attributes in the system
attribute name space:
.Bd -literal -offset indent
pnfsd.dsfile - This extended attribute stores the information that the
    MDS needs to find the data file on the DS(s) for this file.
pnfsd.dsattr - This extended attribute stores the Size, AccessTime,
    ModifyTime, Change and SpaceUsed attributes for the file.
.Ed
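.Pp
One way to check for these attributes, assuming a hypothetical file
.Pa /export/dir/file
in a file system exported by the MDS, is to run
.Xr lsextattr 8
as root directly on the MDS:
.Bd -literal -offset indent
# List the attribute names in the system namespace for the file.
lsextattr system /export/dir/file
.Ed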
.Pp
For each regular (VREG) file, the MDS creates a data file on one of the DSs
(or on N of them for the mirrored case, where N is the mirror_level),
where the file's data will be stored.
The name of this file is
the file handle of the file on the MDS, in hexadecimal, at the time of file
creation.
The data file will have the same file ownership, mode and NFSv4 ACL
(if ACLs are enabled for the file system) as the file on the MDS, so that
permission checking can be done on the DS.
This is referred to as
.Dq tightly coupled
for the Flexible File Layout.
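.Pp
The
.Xr pnfsdsfile 8
utility can be run on the MDS to report where the data file(s) for a given
file are stored.
A minimal invocation, with a hypothetical path, might be:
.Bd -literal -offset indent
# Run as root on the MDS; the path is a placeholder.
pnfsdsfile /export/dir/file
.Ed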
.Pp
For
.Tn pNFS
aware clients, the service generates File Layout
or Flexible File Layout
layouts and associated DeviceInfo.
For non-pNFS aware NFS clients, the pNFS service appears just like a normal
NFS service.
For the non-pNFS aware client, the MDS will perform I/O operations on the
appropriate DS(s), acting as
a proxy for the non-pNFS aware client.
This is also true for NFSv3 and NFSv4.0 mounts, since these are always non-pNFS
aware.
.Pp
It is possible to assign a DS to an MDS exported file system so that it will
store data for files on the MDS exported file system.
If a DS is not assigned to an MDS exported file system, it will store data
for files on all exported file systems on the MDS.
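.Pp
The assignment is made through the
.Fl p
option of
.Xr nfsd 8 .
The strings below are only a sketch with hypothetical host names and paths;
see
.Xr nfsd 8
and
.Xr pnfsserver 4
for the authoritative syntax:
.Bd -literal -offset indent
# Each DS stores data for all file systems exported by the MDS.
nfsv4-data0:/data0,nfsv4-data1:/data1

# Each DS is assigned to one MDS exported file system.
nfsv4-data0:/data0#/export1,nfsv4-data1:/data1#/export2
.Ed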
.Pp
If mirroring is enabled, the pNFS service will continue to function when
DS(s) have failed, so long as there is at least one DS still operational
that stores data for files on all of the MDS exported file systems.
After a disabled mirrored DS is repaired, it is possible to recover the DS
as a mirror while the pNFS service continues to function.
.Pp
See
.Xr pnfsserver 4
for information on how to set up a
.Fx
pNFS service.
.Sh SEE ALSO
.Xr nfsv4 4 ,
.Xr pnfsserver 4 ,
.Xr exports 5 ,
.Xr fstab 5 ,
.Xr rc.conf 5 ,
.Xr nfscbd 8 ,
.Xr nfsd 8 ,
.Xr nfsuserd 8 ,
.Xr pnfsdscopymr 8 ,
.Xr pnfsdsfile 8 ,
.Xr pnfsdskill 8
.Sh BUGS
Linux kernel versions prior to 4.12 only support NFSv3 DSs in their client
and will do all I/O through the MDS.
For Linux 4.12 kernels, support for NFSv4.1 DSs was added, but I have seen
Linux client crashes when testing this client.
For Linux 4.17-rc2 kernels, I have not seen client crashes during testing,
but the client only supports the
.Dq loosely coupled
variant.
To make it work correctly when mounting the
.Fx
server, you must
set the sysctl
.Dq vfs.nfsd.flexlinuxhack
to one so that the server works around
the Linux client driver's limitations.
Without this sysctl being set, there will be access errors, since the Linux
client will use the authenticator in the layout (uid=999, gid=999) and not
the authenticator specified in the RPC header.
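.Pp
A sketch of enabling the workaround on the
.Fx
server:
.Bd -literal -offset indent
# Only needed for Linux clients limited to the loosely coupled variant.
sysctl vfs.nfsd.flexlinuxhack=1
.Ed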
.Pp
Linux 5.n kernels appear to be patched so that the client uses the authenticator
in the RPC header and, as such, the above sysctl should not need to be set.
.Pp
Since the MDS cannot be mirrored, it is a single point of failure just
as a non
.Tn pNFS
server is.