
    This is an NFS client for Linux that supports async RPC calls for
    read-ahead (and hopefully soon, write-back) on regular files. 

    The implementation uses a straightforward nfsiod scheme.  After
    trying out a number of different concepts, I finally got back to
    this concept, because everything else either didn't work or gave me
    headaches. It's not flashy, but it works without hacking into any
    other regions of the kernel.


    HOW TO USE

    This stuff compiles as a loadable module (I developed it on 1.3.77).
    Simply type mkmodule, and insmod nfs.o. This will start four nfsiod's
    at the same time (which will show up under the pseudonym of insmod in
    ps-style listings).

    Alternatively, you can put it right into the kernel: remove everything
    from fs/nfs, move the Makefile and all *.c files into fs/nfs, and
    copy all *.h files to include/linux.

    After mounting, you should be able to watch (with tcpdump) several
    RPC READ calls being placed simultaneously.


    HOW IT WORKS

    When a process reads from a file on an NFS volume, the following
    happens:

     *  nfs_file_read sets file->f_reada if more than 1K is
        read at once. It then calls generic_file_read.

     *  generic_file_read requests one or more pages via
        nfs_readpage.

     *  nfs_readpage allocates a request slot with an nfsiod
        daemon, fills in the READ request, sends out the
        RPC call, kicks the daemon, and returns.
        If there's no free nfsiod, nfs_readpage places the
        call directly, waiting for the reply (sync readpage).

     *  nfsiod calls nfs_rpc_doio to collect the reply. If the
        call was successful, it sets page->uptodate and
        wakes up all processes waiting on page->wait.
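
    The steps above can be modelled in user space. The following is a
    minimal sketch, not the client's actual code: the names page_model,
    req_slot, slot_alloc, readpage_model and iod_complete_model, the
    slot count of four, and the status convention (0 means success) are
    all assumptions made for illustration. It shows only the dispatch
    decision: take a free slot and return at once, or fall back to a
    synchronous read when every slot is busy.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SLOTS 4              /* assumption: one request slot per nfsiod */

struct page_model {
    int uptodate;               /* set once the READ reply is in the page */
    int error;                  /* set when an async READ failed */
};

struct req_slot {
    int busy;                   /* slot is owned by an outstanding request */
    struct page_model *page;    /* page the reply will be copied into */
};

static struct req_slot slots[NR_SLOTS];

enum read_path { READ_ASYNC, READ_SYNC };

/* Return a free slot and mark it busy, or NULL if every daemon is busy. */
static struct req_slot *slot_alloc(void)
{
    for (size_t i = 0; i < NR_SLOTS; i++) {
        if (!slots[i].busy) {
            slots[i].busy = 1;
            return &slots[i];
        }
    }
    return NULL;
}

/* Model of the readpage step: queue the request if a slot is free. */
static enum read_path readpage_model(struct page_model *page)
{
    struct req_slot *slot;

    assert(page != NULL);
    slot = slot_alloc();
    if (slot == NULL) {
        /* No free daemon: the caller waits for the reply itself.
         * The model treats the synchronous read as an immediate success. */
        page->uptodate = 1;
        return READ_SYNC;
    }
    slot->page = page;          /* the daemon completes this later */
    return READ_ASYNC;
}

/* Model of the daemon collecting a reply; status 0 means success. */
static void iod_complete_model(struct req_slot *slot, int status)
{
    assert(slot != NULL && slot->busy && slot->page != NULL);
    if (status == 0)
        slot->page->uptodate = 1;
    else
        slot->page->error = 1;
    slot->page = NULL;
    slot->busy = 0;             /* slot can be reused by the next readpage */
}
```

    In this model a caller invokes readpage_model once per page; the
    page is already up to date on return only when the result is
    READ_SYNC. Waking the processes that sleep on the page is left out.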

    This is the rough outline only. There are a few things to note:

     *  Async RPC will not be tried when server->rsize < PAGE_SIZE.

     *  When an error occurs, nfsiod has no way of returning
        the error code to the user process. Therefore, it flags
        page->error and wakes up all processes waiting on that
        page (they usually do so from within generic_readpage).

        generic_readpage finds that the page is still not
        uptodate, and calls nfs_readpage again. This time around,
        nfs_readpage notices that page->error is set and
        unconditionally does a synchronous RPC call.

        This area needs a lot of improvement, since read errors
        are not that uncommon (e.g. we have to retransmit calls
        if the fsuid is different from the ruid in order to
        cope with root squashing and stuff like this).

        Retransmits with fsuid/ruid change should be handled by
        nfsiod, but this doesn't come easily (a more general nfs_call
        routine that does all this may be useful...)

     *  To save some time on readaheads, we save one data copy
        by frobbing the page into the iovec passed to the
        RPC code so that the networking layer copies the
        data into the page directly.

        This needs to be adjustable, because the length of the reply
        header that precedes the data depends on the authentication
        flavor (AUTH_NULL versus AUTH_SHORT verifiers).

     *  Currently, a fixed number of nfsiod's is spawned from
        within init_nfs_fs. This is problematic when running
        as a loadable module, because this will keep insmod's
        memory allocated. As a side effect, you will see the
        nfsiod processes listed as several insmod's when doing
        a `ps'.

     *  This NFS client implements server congestion control via
        Van Jacobson slow start as implemented in 4.4BSD. I haven't
        checked how well this behaves, but since Rick Macklem did
        it this way, it should be okay :-)
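
    The error path from the notes above (nfsiod flags page->error, the
    waiter calls nfs_readpage again, and the second attempt is
    synchronous) can be sketched as a small user-space model. This is
    an illustration under stated assumptions, not the client's code:
    page_model, readpage_model, iod_fail_model and the two sync_read_*
    stubs are hypothetical names, and clearing the error flag before
    the retry is an assumption about how the flag is reset.

```c
#include <assert.h>
#include <stddef.h>

struct page_model {
    int uptodate;               /* page holds valid data */
    int error;                  /* an earlier async READ failed */
};

/* Hypothetical stand-in for a blocking RPC READ; returns 0 on success. */
typedef int (*sync_read_fn)(struct page_model *page);

static int sync_read_ok(struct page_model *page)
{
    (void)page;
    return 0;
}

static int sync_read_fail(struct page_model *page)
{
    (void)page;
    return -1;
}

enum read_path { READ_ASYNC_QUEUED, READ_SYNC_DONE, READ_SYNC_FAILED };

/* Daemon side: it cannot return a status to the reader, so it only
 * flags the page; the reader finds the page still not up to date. */
static void iod_fail_model(struct page_model *page)
{
    assert(page != NULL);
    page->error = 1;
}

/* Reader side: a page with the error flag set is never queued again.
 * It is read synchronously so the status reaches the calling process. */
static enum read_path readpage_model(struct page_model *page,
                                     sync_read_fn sync_read)
{
    assert(page != NULL && sync_read != NULL);
    if (!page->error)
        return READ_ASYNC_QUEUED;   /* normal case: hand to a daemon */

    page->error = 0;                /* assumption: flag is cleared on retry */
    if (sync_read(page) != 0)
        return READ_SYNC_FAILED;    /* caller can now see the failure */
    page->uptodate = 1;
    return READ_SYNC_DONE;
}
```

    The point of the design is that the second call runs in the
    context of the reading process, which is the only place where an
    error code can be handed back to the application.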


    WISH LIST

    After giving this thing some testing, I'd like to add some more
    features:

     *  Some sort of async write handling. True write-back doesn't
        work with the current kernel (I think), because invalidate_pages
        kills all pages, regardless of whether they're dirty or not.
        Besides, this may require special bdflush treatment because
        write caching on clients is really hairy.

        Alternatively, a write-through scheme might be useful where
        the client enqueues the request, but leaves collecting the
        results to nfsiod. Again, we need a way to pass RPC errors
        back to the application.

     *  Support for different authentication flavors.

     *  /proc/net/nfsclnt (for nfsstat, etc.).

March 29, 1996
Olaf Kirch <okir@monad.swb.de>
