Ceph Distributed File System
============================

Ceph is a distributed network file system designed to provide good
performance, reliability, and scalability.

Basic features include:

 * POSIX semantics
 * Seamless scaling from 1 to many thousands of nodes
 * High availability and reliability. No single point of failure.
 * N-way replication of data across storage nodes
 * Fast recovery from node failures
 * Automatic rebalancing of data on node addition/removal
 * Easy deployment: most FS components are userspace daemons

Also,
 * Flexible snapshots (on any directory)
 * Recursive accounting (nested files, directories, bytes)

In contrast to cluster filesystems like GFS, OCFS2, and GPFS that rely
on symmetric access by all clients to shared block devices, Ceph
separates data and metadata management into independent server
clusters, similar to Lustre. Unlike Lustre, however, metadata and
storage nodes run entirely as userspace daemons. Storage nodes
use btrfs to store data objects, leveraging its advanced features
(checksumming, metadata replication, etc.). File data is striped
across storage nodes in large chunks to distribute workload and
facilitate high throughput. When storage nodes fail, data is
re-replicated in a distributed fashion by the storage nodes themselves
(with some minimal coordination from a cluster monitor), making the
system extremely efficient and scalable.

Metadata servers effectively form a large, consistent, distributed
in-memory cache above the file namespace that is extremely scalable,
dynamically redistributes metadata in response to workload changes,
and can tolerate arbitrary (well, non-Byzantine) node failures. The
metadata server takes a somewhat unconventional approach to metadata
storage to significantly improve performance for common workloads. In
particular, inodes with only a single link are embedded in
directories, allowing entire directories of dentries and inodes to be
loaded into its cache with a single I/O operation. The contents of
extremely large directories can be fragmented and managed by
independent metadata servers, allowing scalable concurrent access.

The system offers automatic data rebalancing/migration when scaling
from a small cluster of just a few nodes to many hundreds, without
requiring an administrator to carve the data set into static volumes or
go through the tedious process of migrating data between servers.
When the file system approaches full, new nodes can be easily added
and things will "just work."

Ceph includes a flexible snapshot mechanism that allows a user to create
a snapshot on any subdirectory (and its nested contents) in the
system. Snapshot creation and deletion are as simple as 'mkdir
.snap/foo' and 'rmdir .snap/foo'.

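As a minimal sketch of the snapshot commands above, the following shell
helpers wrap the mkdir/rmdir calls. The function names snap_create and
snap_remove are made up for this illustration and are not part of Ceph;
the sketch assumes the directory lives on a mounted Ceph file system,
where the .snap directory is available.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Ceph) wrapping the snapshot commands.
# Both assume DIR is on a mounted Ceph file system, where DIR/.snap is
# provided by the file system itself.

# snap_create DIR NAME: snapshot DIR (and its nested contents) as NAME.
snap_create() {
	mkdir "$1/.snap/$2"
}

# snap_remove DIR NAME: delete the snapshot NAME of DIR.
snap_remove() {
	rmdir "$1/.snap/$2"
}
```

For example, 'snap_create /mnt/ceph/projects monday' would be the
equivalent of running 'mkdir .snap/monday' inside /mnt/ceph/projects
(the path and snapshot name here are placeholders).
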
Ceph also provides some recursive accounting on directories for nested
files and bytes. That is, a 'getfattr -d foo' on any directory in the
system will reveal the total number of nested regular files and
subdirectories, and a summation of all nested file sizes. This makes
the identification of large disk space consumers relatively quick, as
no 'du' or similar recursive scan of the file system is required.

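To use these totals in a script, the 'getfattr -d' output can be
filtered for a single attribute. The helper below is a rough sketch
only: xattr_value is a hypothetical name, and it assumes the usual
getfattr dump format of one name="value" pair per line. The attribute
names themselves are not listed here, so check what 'getfattr -d foo'
prints on your system before relying on any particular name.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one attribute from
# 'getfattr -d' style output read on stdin.
# Assumes lines of the form name="value"; values that themselves
# contain '=' are not handled by this sketch.
# Usage: getfattr -d DIR | xattr_value NAME
xattr_value() {
	awk -v name="$1" -F= '$1 == name { v = $2; gsub(/^"|"$/, "", v); print v }'
}
```

Lines that do not match the requested name, including the '# file:'
header that getfattr prints, are ignored.
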
Mount Syntax
============

The basic mount syntax is:

 # mount -t ceph monip[:port][,monip2[:port]...]:/[subdir] mnt

You only need to specify a single monitor, as the client will get the
full list when it connects. (However, if the monitor you specify
happens to be down, the mount won't succeed.) The port can be left
off if the monitor is using the default. So if the monitor is at
1.2.3.4,

 # mount -t ceph 1.2.3.4:/ /mnt/ceph

is sufficient. If /sbin/mount.ceph is installed, a hostname can be
used instead of an IP address.

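The device string in the syntax above can be assembled from a list of
monitors. The following is a small sketch, not part of Ceph: the
function name ceph_mount_source and its argument order are assumptions
made for this example.

```shell
#!/bin/sh
# Hypothetical helper: build the device string
#   monip[:port][,monip2[:port]...]:/[subdir]
# from a subdirectory and one or more monitor addresses.
# Usage: ceph_mount_source SUBDIR MON [MON...]
ceph_mount_source() (
	# Drop a leading slash from SUBDIR; it is added back below.
	subdir=${1#/}
	shift
	mons=
	# Join the monitor addresses with commas.
	for m in "$@"; do
		mons=${mons:+$mons,}$m
	done
	printf '%s:/%s\n' "$mons" "$subdir"
)
```

With a single monitor and the root directory, 'ceph_mount_source /
1.2.3.4' prints '1.2.3.4:/', which matches the example above and could
be passed to mount as "$(ceph_mount_source / 1.2.3.4)".
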
Mount Options
=============

  ip=A.B.C.D[:N]
        Specify the IP and/or port the client should bind to locally.
        There is normally not much reason to do this. If the IP is not
        specified, the client's IP address is determined by looking at the
        address its connection to the monitor originates from.

  wsize=X
        Specify the maximum write size in bytes. By default there is no
        maximum. Ceph will normally size writes based on the file stripe
        size.

  rsize=X
        Specify the maximum read size in bytes. By default there is no
        maximum.

  rasize=X
        Specify the maximum readahead.

  mount_timeout=X
        Specify the timeout value for mount (in seconds), in the case
        of a non-responsive Ceph file system. The default is 30
        seconds.

  rbytes
        When stat() is called on a directory, set st_size to 'rbytes',
        the summation of file sizes over all files nested beneath that
        directory. This is the default.

  norbytes
        When stat() is called on a directory, set st_size to the
        number of entries in that directory.

  nocrc
        Disable CRC32C calculation for data writes. If set, the storage node
        must rely on TCP's checksum to detect data corruption
        in the data payload.

  dcache
        Use the dcache contents to perform negative lookups and
        readdir when the client has the entire directory contents in
        its cache. (This does not change correctness; the client uses
        cached metadata only when a lease or capability ensures it is
        valid.)

  nodcache
        Do not use the dcache as above. This avoids a significant amount of
        complex code, sacrificing performance without affecting correctness,
        and is useful for tracking down bugs.

  noasyncreaddir
        Do not use the dcache as above for readdir.

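Several of the options above can be combined in one mount by passing
them as a comma-separated list to mount's -o flag. The helper below is
only a sketch; join_opts is a hypothetical name, and the option values
shown in the usage note are arbitrary examples.

```shell
#!/bin/sh
# Hypothetical helper: join mount options into the comma-separated
# string expected by 'mount -o'.
# Usage: join_opts OPT [OPT...]
join_opts() (
	# "$*" joins the arguments with the first character of IFS.
	# The function body runs in a subshell, so the caller's IFS is
	# left untouched.
	IFS=,
	printf '%s\n' "$*"
)
```

For instance, to report per-directory entry counts in st_size and wait
up to 60 seconds for an unresponsive file system:

 # mount -t ceph 1.2.3.4:/ /mnt/ceph -o "$(join_opts norbytes mount_timeout=60)"
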
More Information
================

For more information on Ceph, see the home page at
        http://ceph.newdream.net/

The Linux kernel client source tree is available at
        git://ceph.newdream.net/git/ceph-client.git
        git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git

and the source for the full system is at
        git://ceph.newdream.net/git/ceph.git