This is an NFS client for Linux that supports async RPC calls for read-ahead (and hopefully soon, write-back) on regular files. The implementation uses a straightforward nfsiod scheme. After trying out a number of different concepts, I finally got back to this concept, because everything else either didn't work or gave me headaches. It's not flashy, but it works without hacking into any other regions of the kernel. HOW TO USE This stuff compiles as a loadable module (I developed it on 1.3.77). Simply type mkmodule, and insmod nfs.o. This will start four nfsiod's at the same time (which will show up under the pseudonym of insmod in ps-style listings). Alternatively, you can put it right into the kernel: remove everything from fs/nfs, move the Makefile and all *.c to this directory, and copy all *.h files to include/linux. After mounting, you should be able to watch (with tcpdump) several RPC READ calls being placed simultaneously. HOW IT WORKS When a process reads from a file on an NFS volume, the following happens: * nfs_file_read sets file->f_reada if more than 1K is read at once. It then calls generic_file_read. * generic_file_read requests one ore more pages via nfs_readpage. * nfs_readpage allocates a request slot with an nfsiod daemon, fills in the READ request, sends out the RPC call, kicks the daemon, and returns. If there's no free biod, nfs_readpage places the call directly, waiting for the reply (sync readpage). * nfsiod calls nfs_rpc_doio to collect the reply. If the call was successful, it sets page->uptodate and wakes up all processes waiting on page->wait; This is the rough outline only. There are a few things to note: * Async RPC will not be tried when server->rsize < PAGE_SIZE. * When an error occurs, nfsiod has no way of returning the error code to the user process. Therefore, it flags page->error and wakes up all processes waiting on that page (they usually do so from within generic_readpage). generic_readpage finds that the page is still not uptodate, and calls nfs_readpage again. This time around, nfs_readpage notices that page->error is set and unconditionally does a synchronous RPC call. This area needs a lot of improvement, since read errors are not that uncommon (e.g. we have to retransmit calls if the fsuid is different from the ruid in order to cope with root squashing and stuff like this). Retransmits with fsuid/ruid change should be handled by nfsiod, but this doesn't come easily (a more general nfs_call routine that does all this may be useful...) * To save some time on readaheads, we save one data copy by frobbing the page into the iovec passed to the RPC code so that the networking layer copies the data into the page directly. This needs to be adjustable (different authentication flavors; AUTH_NULL versus AUTH_SHORT verifiers). * Currently, a fixed number of nfsiod's is spawned from within init_nfs_fs. This is problematic when running as a loadable module, because this will keep insmod's memory allocated. As a side-effect, you will see the nfsiod processes listed as several insmod's when doing a `ps.' * This NFS client implements server congestion control via Van Jacobson slow start as implemented in 44BSD. I haven't checked how well this behaves, but since Rick Macklem did it this way, it should be okay :-) WISH LIST After giving this thing some testing, I'd like to add some more features: * Some sort of async write handling. True write-back doesn't work with the current kernel (I think), because invalidate_pages kills all pages, regardless of whether they're dirty or not. Besides, this may require special bdflush treatment because write caching on clients is really hairy. Alternatively, a write-through scheme might be useful where the client enqueues the request, but leaves collecting the results to nfsiod. Again, we need a way to pass RPC errors back to the application. * Support for different authentication flavors. * /proc/net/nfsclnt (for nfsstat, etc.). March 29, 1996 Olaf Kirch <okir@monad.swb.de>