- Making Filesystems Exportable
- =============================
- Overview
- --------
- All filesystem operations require a dentry (or two) as a starting
- point. Local applications have a reference-counted hold on suitable
- dentries via open file descriptors or cwd/root. However, remote
- applications that access a filesystem via a remote filesystem protocol
- such as NFS may not be able to hold such a reference, and so need a
- different way to refer to a particular dentry. As the alternative
- form of reference needs to be stable across renames, truncates, and
- server-reboot (among other things, though these tend to be the most
- problematic), there is no simple answer like 'filename'.
- The mechanism discussed here allows each filesystem implementation to
- specify how to generate an opaque (outside of the filesystem) byte
- string for any dentry, and how to find an appropriate dentry for any
- given opaque byte string.
- This byte string will be called a "filehandle fragment" as it
- corresponds to part of an NFS filehandle.
- A filesystem which supports the mapping between filehandle fragments
- and dentries will be termed "exportable".
- Dcache Issues
- -------------
- The dcache normally contains a proper prefix of any given filesystem
- tree. This means that if any filesystem object is in the dcache, then
- all of the ancestors of that filesystem object are also in the dcache.
- As normal access is by filename, this prefix is created naturally and
- maintained easily (by each object maintaining a reference count on
- its parent).
- However, when objects are included into the dcache by interpreting a
- filehandle fragment, there is no automatic creation of a path prefix
- for the object. This leads to two related but distinct features of
- the dcache that are not needed for normal filesystem access.
- 1/ The dcache must sometimes contain objects that are not part of the
- proper prefix, i.e. objects that are not connected to the root.
- 2/ The dcache must be prepared for a newly found (via ->lookup) directory
- to already have a (non-connected) dentry, and must be able to move
- that dentry into place (based on the parent and name in the
- ->lookup). This is particularly needed for directories as
- it is a dcache invariant that directories only have one dentry.
- To implement these features, the dcache has:
- a/ A dentry flag DCACHE_DISCONNECTED which is set on
- any dentry that might not be part of the proper prefix.
- This is set when anonymous dentries are created, and cleared when a
- dentry is noticed to be a child of a dentry which is in the proper
- prefix.
- b/ A per-superblock list "s_anon" of dentries which are the roots of
- subtrees that are not in the proper prefix. These dentries, as
- well as the proper prefix, need to be released at unmount time. As
- these dentries will not be hashed, they are linked together on the
- d_hash list_head.
- c/ Helper routines to allocate anonymous dentries, and to help attach
- loose directory dentries at lookup time. They are:
- d_obtain_alias(inode) will return a dentry for the given inode.
- If the inode already has a dentry, one of those is returned.
- If it doesn't, a new anonymous (IS_ROOT and
- DCACHE_DISCONNECTED) dentry is allocated and attached.
- In the case of a directory, care is taken that only one dentry
- can ever be attached.
- d_splice_alias(inode, dentry) will introduce a new dentry into the tree;
- either the passed-in dentry or a preexisting alias for the given inode
- (such as an anonymous one created by d_obtain_alias), if appropriate.
- It returns NULL when the passed-in dentry is used, following the calling
- convention of ->lookup.
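-
- The calling convention of these two helpers can be illustrated with a
- small user-space model. The sketch below is not kernel code: the toy_*
- names and structures are hypothetical, each inode is limited to a
- single alias, only directories reuse an existing alias, and locking,
- hashing, error pointers and the s_anon list are left out. It assumes
- only the behaviour described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_inode;

/* Hypothetical stand-in for a dentry; not the kernel's struct dentry. */
struct toy_dentry {
    struct toy_inode *inode;    /* NULL for a negative dentry */
    struct toy_dentry *parent;  /* points to itself when IS_ROOT */
    const char *name;
    bool disconnected;          /* models DCACHE_DISCONNECTED */
};

/* Hypothetical stand-in for an inode, limited to one alias. */
struct toy_inode {
    bool is_dir;
    struct toy_dentry *alias;
};

/* Model of d_obtain_alias: reuse the alias or create an anonymous one. */
struct toy_dentry *toy_obtain_alias(struct toy_inode *inode)
{
    struct toy_dentry *d;

    assert(inode != NULL);
    if (inode->alias != NULL)
        return inode->alias;

    d = calloc(1, sizeof(*d));
    if (d == NULL)
        return NULL;
    d->inode = inode;
    d->parent = d;              /* IS_ROOT: its own parent */
    d->name = "";
    d->disconnected = true;
    inode->alias = d;
    return d;
}

/*
 * Model of d_splice_alias: returns NULL when the passed-in dentry is
 * used, or the preexisting alias when that one is moved into place.
 */
struct toy_dentry *toy_splice_alias(struct toy_inode *inode,
                                    struct toy_dentry *dentry)
{
    struct toy_dentry *alias;

    assert(dentry != NULL && dentry->parent != NULL);
    if (inode == NULL)
        return NULL;            /* dentry stays negative, like d_add */

    alias = inode->alias;
    if (inode->is_dir && alias != NULL && alias != dentry) {
        /* A directory may only have one dentry: move it into place. */
        alias->parent = dentry->parent;
        alias->name = dentry->name;
        /* Connected only if the new parent is itself connected. */
        alias->disconnected = dentry->parent->disconnected;
        return alias;
    }

    dentry->inode = inode;
    if (alias == NULL)
        inode->alias = dentry;
    return NULL;
}
```

- In this model, a directory first reached through toy_obtain_alias is
- later moved under its real parent by toy_splice_alias, and the caller
- must use the returned dentry instead of the one it passed in.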
-
- Filesystem Issues
- -----------------
- For a filesystem to be exportable it must:
-
- 1/ provide the filehandle fragment routines described below.
- 2/ make sure that d_splice_alias is used rather than d_add
- when ->lookup finds an inode for a given parent and name.
- If inode is NULL, d_splice_alias(inode, dentry) is equivalent to calling
- d_add(dentry, inode) and then returning NULL.
- Similarly, d_splice_alias(ERR_PTR(err), dentry) returns ERR_PTR(err).
- Typically the ->lookup routine will simply end with a:
- return d_splice_alias(inode, dentry);
- }
- A file system implementation declares that instances of the filesystem
- are exportable by setting the s_export_op field in the struct
- super_block. This field must point to a "struct export_operations",
- which has the following members:
- encode_fh (optional)
- Takes a dentry and creates a filehandle fragment which can later be used
- to find or create a dentry for the same object. The default
- implementation creates a filehandle fragment that encodes a 32-bit inode
- number and generation number for the inode being encoded and, if
- necessary, the same information for the parent.
- fh_to_dentry (mandatory)
- Given a filehandle fragment, this should find the implied object and
- create a dentry for it (possibly with d_obtain_alias).
- fh_to_parent (optional but strongly recommended)
- Given a filehandle fragment, this should find the parent of the
- implied object and create a dentry for it (possibly with
- d_obtain_alias). May fail if the filehandle fragment is too small.
- get_parent (optional but strongly recommended)
- When given a dentry for a directory, this should return a dentry for
- the parent. Quite possibly the parent dentry will have been allocated
- by d_obtain_alias. The default get_parent function just returns an error
- so any filehandle lookup that requires finding a parent will fail.
- ->lookup("..") is *not* used as a default as it can leave ".." entries
- in the dcache which are too messy to work with.
- get_name (optional)
- When given a parent dentry and a child dentry, this should find a name
- in the directory identified by the parent dentry, which leads to the
- object identified by the child dentry. If no get_name function is
- supplied, a default implementation is provided which uses vfs_readdir
- to find potential names, and matches inode numbers to find the correct
- match.
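-
- The strategy described for the default get_name (read the directory
- and match on inode number) has a simple user-space analogue. The
- sketch below is only that analogue, built on POSIX opendir/readdir
- and lstat; find_name_by_ino is a hypothetical name, and the in-kernel
- default works on dentries rather than on paths.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Scan directory `dirpath` for an entry whose inode number is `ino`.
 * On success, copy the entry name into `name` (capacity `namesz`)
 * and return 0. Return -1 if no entry matches, if the name does not
 * fit, or if the directory cannot be opened.
 */
int find_name_by_ino(const char *dirpath, ino_t ino,
                     char *name, size_t namesz)
{
    DIR *dir;
    struct dirent *de;
    int ret = -1;

    assert(dirpath != NULL && name != NULL && namesz > 0);
    dir = opendir(dirpath);
    if (dir == NULL)
        return -1;

    while ((de = readdir(dir)) != NULL) {
        char path[PATH_MAX];
        struct stat st;
        int n;

        /* "." and ".." never name a child of this directory. */
        if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
            continue;

        n = snprintf(path, sizeof(path), "%s/%s", dirpath, de->d_name);
        if (n < 0 || (size_t)n >= sizeof(path))
            continue;
        if (lstat(path, &st) != 0)
            continue;

        if (st.st_ino == ino) {
            if (strlen(de->d_name) < namesz) {
                strcpy(name, de->d_name);
                ret = 0;
            }
            break;              /* first match wins */
        }
    }
    closedir(dir);
    return ret;
}
```

- A linear scan like this costs time proportional to the directory
- size, which is one reason a filesystem that can find the name more
- directly may prefer to supply its own get_name.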
- A filehandle fragment consists of an array of one or more 4-byte words,
- together with a one-byte "type".
- The fh_to_dentry and fh_to_parent routines should not depend on the
- stated size that is passed to them. This size may be larger than the
- original filehandle generated by encode_fh, in which case it will have
- been padded with nuls. Rather, the encode_fh routine should choose a
- "type" which tells the decoding routines how much of the filehandle is
- valid, and how it should be interpreted.
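-
- The role of the "type" byte can be sketched in plain C. The code
- below is a user-space illustration only: the frag_* names, the
- object_id structure and the type values are hypothetical choices for
- this sketch, loosely following the inode/generation layout described
- for the default encode_fh. The decoders derive the number of valid
- words from the type and treat the passed-in length only as a lower
- bound, so nul padding is harmless.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type codes; a real filesystem chooses its own. */
enum frag_type {
    FRAG_INVALID        = 0,
    FRAG_INO_GEN        = 1,  /* words: ino, gen */
    FRAG_INO_GEN_PARENT = 2   /* words: ino, gen, parent ino, parent gen */
};

struct object_id {
    uint32_t ino;
    uint32_t gen;
};

/*
 * Encode `self` (and `parent`, if non-NULL) into `words`.
 * On entry *len is the capacity in 4-byte words; on success it is set
 * to the number of words written. Returns the chosen type, or
 * FRAG_INVALID if the buffer is too small.
 */
int frag_encode(uint32_t *words, size_t *len,
                const struct object_id *self,
                const struct object_id *parent)
{
    size_t need = parent ? 4 : 2;

    assert(words != NULL && len != NULL && self != NULL);
    if (*len < need)
        return FRAG_INVALID;

    words[0] = self->ino;
    words[1] = self->gen;
    if (parent) {
        words[2] = parent->ino;
        words[3] = parent->gen;
    }
    *len = need;
    return parent ? FRAG_INO_GEN_PARENT : FRAG_INO_GEN;
}

/* Number of valid words implied by a type, or 0 if unknown. */
size_t frag_valid_words(int type)
{
    switch (type) {
    case FRAG_INO_GEN:        return 2;
    case FRAG_INO_GEN_PARENT: return 4;
    default:                  return 0;
    }
}

/* Decode the object itself; `len` may include trailing padding. */
int frag_decode_self(const uint32_t *words, size_t len, int type,
                     struct object_id *out)
{
    size_t valid = frag_valid_words(type);

    if (valid == 0 || len < valid)
        return -1;
    out->ino = words[0];
    out->gen = words[1];
    return 0;
}

/* Decode the parent; fails if the fragment carries no parent data. */
int frag_decode_parent(const uint32_t *words, size_t len, int type,
                       struct object_id *out)
{
    if (type != FRAG_INO_GEN_PARENT || len < 4)
        return -1;
    out->ino = words[2];
    out->gen = words[3];
    return 0;
}
```

- With this layout, a fragment encoded without parent information still
- decodes correctly when handed back in a larger, padded buffer, while
- frag_decode_parent reports failure for it, mirroring the note that
- fh_to_parent may fail if the filehandle fragment is too small.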