1 General

The General section of the AFS Frequently Asked Questions.

1.01 What is AFS?

AFS is a distributed filesystem that enables co-operating hosts (clients and servers) to efficiently share filesystem resources across both local area and wide area networks.

AFS is based on a distributed file system originally developed at the Information Technology Center at Carnegie-Mellon University that was called the "Andrew File System".

"Andrew" was the name of the research project at CMU - honouring the founders of the University. A spin-off company, Transarc Corporation (now part of IBM), started producing and marketing a commercial version of AFS in 1989.

In November 2000, IBM open-sourced AFS, creating OpenAFS. OpenAFS is under active development; as of this writing, the 1.6.4 release is being prepared for Unix-like platforms, and the 1.7 series for Windows is updated regularly.

1.02 Who supplies AFS?

OpenAFS is available from the OpenAFS website. An independent open source project, Arla, supports some clients that OpenAFS does not, but has not been updated since release 0.90 in early 2007; the release of OpenAFS largely removed the reasons for the development of Arla and its sister project Milko (an open source AFS server).

There is also an incomplete kernel-based AFS client (only) for Linux, maintained by Red Hat Software.

IBM no longer markets AFS and has declared an end-of-life for support.

OpenAFS WWW: http://www.openafs.org/
Arla WWW: http://www.stacken.kth.se/projekt/arla/
kAFS: see Documentation/filesystems/afs.txt in the Linux kernel source tree

1.03 What is /afs?

The root of the AFS filetree is /afs. If you execute ls /afs, you will see directories that correspond to AFS cells (see below). These cells may be local (on the same LAN) or remote (e.g. halfway around the world).

With AFS you can access all the filesystem space under /afs with commands you already use (e.g. cd, cp, rm, and so on) provided you have been granted permission (see AFS ACL below).
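As a rough illustration of this layout, the sketch below splits a path under /afs into its cell directory and the remainder of the path. The function name `split_afs_path` is hypothetical and is not part of any AFS tool; it assumes only the /afs/<cell>/... layout described here.

```python
from pathlib import PurePosixPath

AFS_ROOT = PurePosixPath("/afs")


def split_afs_path(path):
    """Return (cell, path-within-cell) for a path under /afs.

    Raises ValueError if the path is not under /afs or names no cell.
    """
    p = PurePosixPath(path)
    try:
        rel = p.relative_to(AFS_ROOT)
    except ValueError:
        raise ValueError(f"{path!r} is not under {AFS_ROOT}") from None
    if not rel.parts:
        raise ValueError(f"{path!r} names the AFS root, not a cell")
    cell = rel.parts[0]
    # Everything after the cell directory is the path inside that cell.
    inside = PurePosixPath(*rel.parts[1:]) if len(rel.parts) > 1 else PurePosixPath(".")
    return cell, str(inside)
```

For instance, a path such as /afs/ny.acme.com/user/elmer splits into the cell directory ny.acme.com and the cell-relative path user/elmer.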

1.04 What is an AFS cell?

An AFS cell is a collection of servers grouped together administratively and presenting a single, cohesive filesystem. Typically, an AFS cell is a set of hosts that use the same Internet domain name.

Normally, a variation of the domain name is used as the AFS cell name.

Users log into AFS client workstations which request information and files from the cell's servers on behalf of the users.

1.05 What are the benefits of using AFS?

The main strengths of AFS are its:

  • caching facility
  • security features
  • simplicity of addressing
  • scalability
  • communications protocol

Here are some of the advantages of using AFS in more detail:

1.05.a Cache Manager

AFS client machines run a Cache Manager process. The Cache Manager maintains information about the identities of the users logged into the machine, finds and requests data on their behalf, and keeps chunks of retrieved files on local disk.

As a result, as soon as a remote file is accessed, a chunk of that file is copied to local disk, so subsequent accesses (warm reads) are almost as fast as reads from local disk and considerably faster than a cold read (across the network).

Local caching also significantly reduces the amount of network traffic, improving performance when a cold read is necessary.

Many modern NFS implementations provide metadata caching, but this caching is limited and the protocol support for it is somewhat weak, with the result that a cached NFS client can get out of sync with the server. NFS does not support file data caching at all, although some operating systems can be configured to use a separate cache filesystem module, which must be configured separately from NFS on each client workstation and for each NFS mountpoint.

Because this cache is separate, it can only avoid getting out of sync with the remote filesystem at the price of extra validation and additional network traffic to detect updates every time a file is accessed (or, if a delay is set to minimize this traffic, it will "miss" remote changes made within that window).

OpenAFS's integrated cache, by comparison, is notified of changes on the fileserver (by means of "callback breaking") and only needs to check explicitly for remote updates if the callback has expired (roughly 2 hours). In addition, for files on a read-only volume, it is sufficient to check for a volume update to revalidate all locally cached files on that volume.
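The callback idea can be sketched as a toy model. This is only an illustration under simplifying assumptions, not OpenAFS code: the class and method names are hypothetical, whole files stand in for chunks, and callback expiry is left out.

```python
class FileServer:
    """Toy fileserver that remembers which clients hold a callback promise."""

    def __init__(self):
        self.files = {}       # path -> contents
        self.callbacks = {}   # path -> set of clients holding a callback
        self.fetches = 0      # number of fetches served over the "network"

    def fetch(self, client, path):
        # Hand out the data and register a callback promise for this client.
        self.fetches += 1
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, path, data):
        # On update, break every outstanding callback for this file.
        self.files[path] = data
        for client in self.callbacks.pop(path, set()):
            client.break_callback(path)


class CacheManager:
    """Toy client cache: cached data stays valid until its callback is broken."""

    def __init__(self, server):
        self.server = server
        self.cache = {}  # path -> contents whose callback is still intact

    def read(self, path):
        if path not in self.cache:
            # Cold read: fetch from the server and keep a local copy.
            self.cache[path] = self.server.fetch(self, path)
        # Warm read: served locally, with no network traffic.
        return self.cache[path]

    def break_callback(self, path):
        # The server reported a change, so discard the stale copy.
        self.cache.pop(path, None)
```

In this model, repeated reads of an unchanged file cost a single fetch, and the client goes back to the server only after a store has broken its callback.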

1.05.b Location independence

Unlike NFS, which makes use of a per-client mount table (such as /etc/filesystems on AIX or /etc/fstab on Linux) to map (mount) between a local directory name and a remote filesystem, AFS does its mapping (filename to location) at the server. This has the tremendous advantage of making the served filespace location independent.

Location independence means that a user does not need to know which fileserver holds the file; the user only needs to know the pathname of the file. Of course, the user does need to know the name of the AFS cell to which the file belongs. Use of the AFS cellname as the second part of the pathname (e.g. /afs/$AFSCELL/somefile) is helpful to distinguish between file namespaces of local and non-local AFS cells.

To understand why such location independence is useful, consider having 20 clients and 2 servers. Let's say you had to move a filesystem /home from server a to server b.

Using NFS, you would have to change the /etc/fstab file on 20 clients and take /home off-line while you moved it between servers.

With AFS, you simply move the AFS volume(s) which constitute /home between the servers. You do this "on-line" while users are actively using files in /home with no disruption to their work.

(Actually, the AFS equivalent of /home would be /afs/$AFSCELL/home, where $AFSCELL is the AFS cellname.)
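The /home example can be modelled in a few lines. The sketch below is a simplified stand-in for the server-side lookup, not the real AFS data structures; the `Cell` class, its methods, and the volume name "home" are all hypothetical.

```python
class Cell:
    """Toy model of a cell: a volume location table plus mount points."""

    def __init__(self):
        self.volume_location = {}  # volume name -> fileserver holding it
        self.mounts = {}           # path in the cell -> volume name

    def create_volume(self, volume, server, mount_path):
        self.volume_location[volume] = server
        self.mounts[mount_path] = volume

    def move_volume(self, volume, new_server):
        # Only the server-side location record changes; paths stay the same.
        self.volume_location[volume] = new_server

    def locate(self, path):
        """Return the fileserver for the longest mount point that prefixes path."""
        candidates = [m for m in self.mounts
                      if path == m or path.startswith(m.rstrip("/") + "/")]
        if not candidates:
            raise KeyError(path)
        volume = self.mounts[max(candidates, key=len)]
        return self.volume_location[volume]
```

Moving the volume changes one record held by the cell. No client holds a mount table of its own, so the same pathname keeps working before and after the move.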

1.05.c Scalability

With location independence comes scalability. An architectural goal of the AFS designers was client/server ratios of 200:1; some sites exceed this ratio. Exactly what ratio your cell can use depends on many factors including:

  • number of AFS files
  • size of AFS files
  • rate at which changes are made
  • rate at which files are being accessed
  • speed of server's processor(s)
  • I/O rates
  • network bandwidth

AFS cells can range from the small (1 server/client) to the massive (with hundreds of servers and thousands of clients). Cells can be dynamic: it is simple to add new fileservers or clients and grow the computing resources to meet new user requirements.

1.05.d Improved security

Firstly, AFS makes use of Kerberos to authenticate users. This improves security for several reasons:

  • passwords do not pass across the network in plaintext

  • encrypted passwords no longer need to be visible

    • You don't have to use NIS, aka yellow pages, to distribute /etc/passwd - thus "ypcat passwd" can be eliminated.
    • If you do choose to use NIS, you can replace the password field with "X" so the encrypted password is not visible. (These issues are discussed in detail in [AdminGuide]).
  • AFS uses mutual authentication - both the service provider and service requester prove their identities

Secondly, AFS uses access control lists (ACLs) to enable users to restrict access to their own directories.

Some (not all) implementations of NFS version 3 can use Kerberos authentication; NFS version 4 adds ACLs, but not all NFS4 implementations interoperate well. Additionally, configuring NFS to use Kerberos, even when it is supported, is often painful and can lead to interoperability problems.

1.05.e Single systems image (SSI)

Establishing the same view of filestore from each client and server in a network of systems (that comprise an AFS cell) is an order of magnitude simpler with AFS than it is with, say, NFS.

This is useful to do because it enables users to move from workstation to workstation and still have the same view of filestore. It also simplifies part of the systems management workload.

In addition, because AFS works well over wide area networks, the SSI is also accessible remotely.

As an example, consider a company with two widespread divisions (and two AFS cells): ny.acme.com and sf.acme.com. Mr. Fudd, based in the New York office, is visiting the San Francisco office.

Mr. Fudd can then use any AFS client workstation in the San Francisco office that he can log into (an unprivileged guest account would suffice). He could authenticate himself to the ny.acme.com cell and securely access his New York filespace, and he doesn't need to remember a different path even though he's working from a remote cell.

For example, the following shows a guest in the sf.acme.com AFS cell:

  1. adding the AFS executables directory to PATH
  2. obtaining a PAG with the pagsh command (see 2.06)
  3. using kinit and aklog to authenticate into the ny.acme.com AFS cell
  4. making a HOME away from home
  5. invoking a homely .profile

    guest@toontown.sf.acme.com $ PATH=/usr/afsws/bin:$PATH; export PATH    # {1}
    guest@toontown.sf.acme.com $ pagsh                                     # {2}
    $ kinit elmer@NY.ACME.COM                                              # {3}
    Password for elmer@NY.ACME.COM:
    $ aklog -cell ny.acme.com
    $ HOME=/afs/ny.acme.com/user/elmer; export HOME                        # {4}
    $ cd
    $ . .profile                                                           # {5}
    you have new mail
    guest@toontown $ _
    

It is not necessary for the San Francisco system administrator to give Mr. Fudd an AFS account in the sf.acme.com cell. Mr. Fudd only needs to be able to log into an AFS client where:

  1. the client is on the same network as his cell, and
  2. his ny.acme.com cell is mounted in the sf.acme.com cell (as would certainly be the case in a company with two cells).

1.05.f Replicated AFS volumes

AFS files are stored in structures called volumes. These volumes reside on the disks of the AFS file server machines. Volumes containing frequently accessed data can be read-only replicated on several servers.

Cache managers (on users' client workstations) will make use of replicated volumes to balance load. If a client is accessing data from one replica and that copy becomes unavailable due to server or network problems, AFS will automatically start accessing the same data from a different replica.

An AFS client workstation will access the closest volume copy. By placing replicated volumes on servers closer to clients (e.g. on the same physical LAN), access to those resources is improved and network traffic reduced.
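The failover behaviour can be sketched as a loop over replicas, closest first. This is an illustrative model only: the function `read_from_replicas` and the way a server is represented (a name plus a fetch callable) are assumptions made for the example, not how the Cache Manager is implemented.

```python
def read_from_replicas(replicas, path):
    """Try each read-only replica in order and return the first successful read.

    `replicas` is a list of (server_name, fetch) pairs, closest first, where
    fetch(path) returns data or raises ConnectionError if the server is down.
    """
    errors = []
    for name, fetch in replicas:
        try:
            return name, fetch(path)
        except ConnectionError as exc:
            # Remember why this copy failed and move on to the next one.
            errors.append(f"{name}: {exc}")
    raise ConnectionError("no replica reachable: " + "; ".join(errors))
```

The caller never names a server. It asks for a path, and the data comes from whichever copy answers first in preference order.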

1.05.g Improved robustness to server crash

The Cache Manager maintains local copies of remotely accessed files. This is accomplished in the cache by breaking files into chunks of up to 64k (default chunk size). So, for a large file, there may be several chunks in the cache but a small file will occupy a single chunk (which will be only as big as is needed).
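The chunk arithmetic can be made concrete with a short sketch. It assumes that "64k" means 64 * 1024 bytes and that only the final chunk is shorter than the chunk size; the function name `chunk_sizes` is hypothetical.

```python
CHUNK_SIZE = 64 * 1024  # default chunk size quoted above, assumed to be in bytes


def chunk_sizes(file_size, chunk_size=CHUNK_SIZE):
    """Return the sizes of the cache chunks needed to hold file_size bytes."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    full, rest = divmod(file_size, chunk_size)
    # Full-size chunks first, then one smaller chunk for any remainder.
    return [chunk_size] * full + ([rest] if rest else [])
```

Under these assumptions, a 10,000-byte file occupies one 10,000-byte chunk, while a 200,000-byte file occupies three full chunks plus a final chunk of 3,392 bytes.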

A "working set" of files that have been accessed on the client is established locally in the client's cache (copied from fileserver(s)).

If a fileserver crashes, the client's locally cached file copies remain readable but updates to cached files fail while the server is down.

Also, if the AFS configuration has included replicated read-only volumes then alternate fileservers can satisfy requests for files from those volumes.

1.05.h "Easy to use" networking

Accessing remote file resources via the network becomes much simpler when using AFS. Users have much less to worry about: want to move a file from a remote site? Just copy it to a different part of /afs.

Once you have wide-area AFS in place, you don't have to keep local copies of files. Let AFS fetch and cache those files when you need them.

1.05.i Communications protocol

The AFS communications protocol is optimized for Wide Area Networks. It retransmits only the single bad packet in a batch of packets and allows a higher number of unacknowledged packets than other protocols (see [Johnson90]).
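The benefit of retransmitting only the bad packet can be shown with a small comparison. This is a sketch of the general idea and not the AFS wire protocol; both function names are hypothetical, and packets are represented simply by sequence numbers.

```python
def selective_resend(batch, received):
    """Resend only the packets in the batch that the receiver did not get."""
    got = set(received)
    return [seq for seq in batch if seq not in got]


def resend_from_first_loss(batch, received):
    """Contrasting strategy: resend everything from the first lost packet onward."""
    got = set(received)
    for i, seq in enumerate(batch):
        if seq not in got:
            return list(batch[i:])
    return []
```

If packet 4 is lost from a batch of 8, the selective strategy resends one packet, while the contrasting strategy resends five. On a high-latency wide-area link that difference adds up quickly.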

1.05.j Improved system management capability

Systems administrators are able to make configuration changes from any client in the AFS cell (it is not necessary to log in to a fileserver).

With AFS it is simple to effect changes without having to take systems off-line.

Example:

A department (with its own AFS cell) was relocated to another office. The cell had several fileservers and many clients. How could they move their systems without causing disruption?

First, the network infrastructure was established to the new location. The AFS volumes on one fileserver were migrated to the other fileservers. The "freed up" fileserver was moved to the new office and connected to the network.

A second fileserver was "freed up" by moving its AFS volumes across the network to the first fileserver at the new office. The second fileserver was then moved.

This process was repeated until all the fileservers were moved.

All this happened with users on client workstations continuing to use the cell's filespace. Unless a user saw a fileserver being physically moved (s)he would have no way to tell the change had taken place.

Finally, the AFS clients were moved - this was noticed!

1.06 Which systems is AFS available for?

OpenAFS, as of the 1.6.2 release for Unix and 1.7.24 for Windows, is currently available in binary releases for:

  • IBM AIX 5.3, 6.1
  • Fedora Core 15, 16, 17, 18 (Intel)
  • FreeBSD 8.2, 8.3, 9.0, 9.1 (Intel)
  • MacOS X 10.6-10.8 (Intel) (Snow Leopard, Lion, Mountain Lion)
  • RHEL 5, 6 (Intel)
  • SuSE Enterprise 10, 11 (Intel)
  • Solaris 10, 11 (Sparc and Intel)
  • Windows 2000/XP/2003/Vista/7

These are only the platforms for which official binary releases are prepared; OpenAFS can be built from source for a large number of additional platforms including HP-UX, SGI, OpenBSD, NetBSD, and older releases and other CPU architectures of supported platforms.

1.07 What does ls /afs display in the Internet AFS filetree?

Essentially this displays the AFS cells that co-operate in the Internet AFS filetree.

Note that the output of this will depend on the cell you do it from; a given cell may not have all the publicly advertised cells available, and it may have some cells that aren't advertised outside of the given site.

The definitive source for this information is /afs/grand.central.org/service/CellServDB.

Note that it is also possible to use AFS "behind the firewall" within the confines of your organization's network - you don't have to participate in the Internet AFS filetree.

Indeed, there are lots of benefits of using AFS on a local area network without using the WAN capabilities.

1.08 Why does AFS use Kerberos authentication?

It improves security.

Kerberos uses the idea of a trusted third party to prove identification. This is a bit like using a letter of introduction or quoting a referee who will vouch for you.

When a user authenticates using the klog command (s)he is prompted for a password. If the password is accepted the Kerberos server provides the user with an encrypted token (containing a "ticket granting ticket").

From that point on, it is the encrypted token that is used to prove the user's identity. These tokens have a limited lifetime (typically a day) and are useless when expired.

In AFS, it is possible to authenticate into multiple AFS cells. A summary of the current set of tokens held can be displayed by using the "tokens" command.

For example:

    elmer@toontown $ tokens

    Tokens held by the Cache Manager:

    User's (AFS ID 9997) tokens for afs@ny.acme.com [Expires Sep 15 06:50]
    User's (AFS ID 5391) tokens for afs@sf.acme.com [Expires Sep 15 06:48]
       --End of list--
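If you need this information in a script, the listing can be picked apart with a regular expression. The sketch below assumes the output has exactly the line format shown in the example above; the function name `parse_tokens` is hypothetical, and other versions of the tokens command may print different text.

```python
import re

# Matches lines such as:
#   User's (AFS ID 9997) tokens for afs@ny.acme.com [Expires Sep 15 06:50]
TOKEN_LINE = re.compile(
    r"\(AFS ID (?P<afs_id>\d+)\) tokens for afs@(?P<cell>\S+) "
    r"\[Expires (?P<expires>[^\]]+)\]"
)


def parse_tokens(output):
    """Return a list of (cell, afs_id, expires) tuples found in tokens output."""
    found = []
    for line in output.splitlines():
        m = TOKEN_LINE.search(line)
        if m:
            found.append((m["cell"], int(m["afs_id"]), m["expires"]))
    return found
```

Applied to the example listing, this yields one entry per cell, which makes it easy to check whether a token for a given cell is present.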

Kerberos improves security because a user's password need only be entered once (at kinit time).

AFS uses Kerberos to do complex mutual authentication which means that both the service requester and the service provider have to prove their identities before a service is granted.

Originally AFS shipped with its own version of Kerberos 4, called kaserver. kaserver still ships at this time (1.6.2 release), but is deprecated in favor of using a true Kerberos 5 implementation. OpenAFS does not currently ship with a Kerberos 5 implementation; it is up to the administrator(s) to choose a version (MIT krb5, Heimdal, Active Directory, etc) and install it. OpenAFS will happily work with any KDC.

For more detail on this and other Kerberos issues see the Kerberos FAQ (posted to news.answers and comp.protocols.kerberos) [Jaspan]. (Also, see [Miller87], [Bryant88], [Bellovin90], [Steiner88].)

1.09 Does AFS work over protocols other than UDP?

No. AFS was designed to work over UDP, and does not use TCP.

There is some work being done to allow AFS to make use of other network transports, including TCP, but this is still experimental and undergoing development.

1.10 How can I access AFS from my PC?

You can use the OpenAFS for Windows (OAfW) client. In the past year it has become very stable and robust. OAfW works with Kerberos for Windows in much the same way the Unix clients work with Kerberos.

There is also Samba, an SMB server for UNIX. There are several ways to integrate AFS with Samba. See SMBtoAFS.

Mac OS X and Linux users might find sshfs useful in some circumstances.

1.11 How does AFS compare with NFS?

| | AFS | NFS |
|---|---|---|
| File Access | Common name space from all workstations | Different file names from different workstations |
| File Location Tracking | Automatic tracking by file system processes and databases | Mountpoints to files set by administrators and users |
| Performance | Client caching to reduce network load; callbacks to maintain cache consistency | No local disk caching without local configuration of `cachefs`; limited cache consistency |
| Andrew Benchmark (5 phases, 8 clients) | Average time of 210 seconds/client | Average time of 280 seconds/client |
| Scaling capabilities | Maintains performance in small and very large installations | Best in small to mid-size installations |
| | Excellent performance on wide-area configuration | Best in local-area configurations |
| Security | Kerberos mutual authentication | Security based on unencrypted user ID's |
| | Access control lists on directories for user and group access | No access control lists |
| Availability | Replicates read-mostly data and AFS system information | No replication |
| Backup Operation | No system downtime with specially developed AFS Backup System | Standard UNIX backup system |
| Reconfiguration | By volumes (groups of files) | Per-file movement |
| | No user impact; files remain accessible during moves, and file names do not change | Users lose access to files and filenames change (mountpoints need to be reset) |
| System Management | Most tasks performed from any workstation | Frequently involves telnet to other workstations |
| Autonomous Architecture | Autonomous administrative units called cells, in addition to file servers and clients | File servers and clients |
| | No trust required between cells | No security distinctions between sites |

[ source: ftp://ftp.transarc.com/pub/afsps/doc/afs-nfs.comparison ]

Other points:

  • Some vendors offer more secure versions of NFS but implementations vary. Many NFS ports have no extra security features (such as Kerberos).

  • The AFS Cache Manager can be configured to work with a RAM (memory) based cache. This offers significant performance benefits over a disk based cache. NFS has no such feature. Imagine how much faster it is to access files cached into RAM!

  • The Andrew benchmark demonstrates that AFS has better performance than NFS as the number of clients increases. A graph of this (taken from Andrew benchmark report) is available in:

    • 20050131_graph_afs_nfs.jpg