3 AFS administration

The Administration Section of the AFS Frequently Asked Questions.

3.01 Is there a version of a particular program available with AFS authentication?

In general, not specifically. Modern systems use authentication frameworks, so a plugin (for example, one that obtains AFS tokens) can be added to the framework, and every program that uses the framework gains that ability without modification. On many systems the authentication framework is PAM (Pluggable Authentication Modules). Several PAM modules can acquire AFS tokens, including Russ Allbery's pam-afs-session and Red Hat's pam_krb5afs.

3.02 What is /afs/@cell?

It is a commonly created symbolic link pointing at /afs/$your_cell_name. @cell is not something that is provided by AFS. You may decide it is useful in your cell and wish to create it yourself.

/afs/@cell is useful because:

  • If you look after more than one AFS cell, you could create the link in each cell and then set your $PATH as:

    PATH=$PATH:/afs/@cell/@sys/local/bin
    
  • For most cells, it shortens the path names to be typed in, thus reducing typos and saving time.

A disadvantage of this convention is that when you cd into /afs/@cell and then type pwd, you see /afs/@cell instead of the full name of your cell. This can cause confusion when a user tells someone in another cell the pathname to a file.

You could create your own /afs/@cell with the following script (usable in ksh or any POSIX shell):

#!/bin/ksh -
# author: mpb
[ -L /afs/@cell ] && echo "We already have @cell!" && exit
cell=$(cat /usr/vice/etc/ThisCell)
cd /afs/.${cell} && fs mkm temp root.afs || exit 1
cd temp || exit 1
ln -s /afs/${cell} @cell
ln -s /afs/.${cell} .@cell            # .@cell for RW path
cd /afs/.${cell} && fs rmm temp
vos release root.afs; fs checkv

http://www-archive.stanford.edu/lists/info-afs/hyper95/0298.html

3.03 Given that AFS data is location independent, how does an AFS client determine which server houses the data its user is attempting to access?

The Volume Location Database (VLDB) is stored on AFS Database Servers and is ideally replicated across 3 or more Database Server machines. Replication of the database ensures high availability and balances the load of requests for the data. The VLDB maintains information about the current physical location of all volume data (files and directories) in the cell, including the IP address of the fileserver and the name of the disk partition the data is stored on.

A list of a cell's Database Servers is stored on the local disk of each AFS client machine as /usr/vice/etc/CellServDB.
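To see which Database Servers a client will ask for a given cell, you can read them straight out of CellServDB. The function below is a minimal sketch; it assumes the usual layout, where a line starting with ">" names a cell and each following "ADDRESS #hostname" line lists one of its database servers. The function name cell_db_servers is hypothetical.

```shell
# cell_db_servers CELL [FILE]
# Print "ADDRESS HOSTNAME" for each database server listed for CELL in a
# CellServDB-format file (default: /usr/vice/etc/CellServDB).
# Assumed format: ">cellname  #comment" starts a cell entry; each following
# line is "ADDRESS  #hostname" until the next ">" line.
cell_db_servers() (
    cell=$1
    file=${2:-/usr/vice/etc/CellServDB}
    awk -v cell="$cell" '
        /^>/ {                       # a new cell entry begins
            incell = (substr($1, 2) == cell)
            next
        }
        incell && NF > 0 {           # a server line inside the wanted cell
            host = ""
            i = index($0, "#")
            if (i > 0) {             # hostname follows the "#"
                host = substr($0, i + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", host)
            }
            print (host == "" ? $1 : $1 " " host)
        }
    ' "$file"
)
```

For example, cell_db_servers grand.central.org prints the address and name of each GCO database server known to this client.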

The Database Servers also house the Protection Database (user UID and protection group information) and the Backup Database (used by System Administrators to back up AFS file data to tape), and, at older sites, the Kerberos Authentication Database (encrypted user and server passwords).

3.04 How does AFS maintain consistency on read-write files?

AFS uses a mechanism called "callbacks".

A callback is a promise from the fileserver that the cached version of a file or directory is up to date. The fileserver establishes it when the client caches the file.

When a file is modified, the fileserver breaks the callback. When the user accesses the file again, the Cache Manager fetches a new copy if the callback has been broken or has expired (after 2 hours by default).
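The freshness rule in the previous paragraph can be written down as a tiny predicate. This is only a toy model of the rule (the real check lives in the in-kernel Cache Manager); the function name and its arguments are hypothetical, and the two-hour default lifetime is taken from the text above.

```shell
# cached_copy_usable BROKEN GRANTED NOW [LIFETIME]
# Toy model of the rule above: a cached file may be used only while its
# callback is intact and has not expired.
#   BROKEN    1 if the fileserver has broken the callback, 0 otherwise
#   GRANTED   time (epoch seconds) the callback was established
#   NOW       current time (epoch seconds)
#   LIFETIME  callback lifetime in seconds; assumed default 7200 (2 hours)
# Exit status 0 means "use the cached copy"; 1 means "fetch a new copy".
cached_copy_usable() (
    broken=$1 granted=$2 now=$3 lifetime=${4:-7200}
    [ "$broken" -eq 0 ] && [ $((now - granted)) -lt "$lifetime" ]
)
```

For instance, a callback granted at time 1000 and never broken is still usable at 8199 but not at 8200, when the two hours have elapsed.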

The following paragraphs describe the AFS callback mechanism in more detail:

Suppose I open() fileA and start reading, and you then open() fileA, write() a change **and close() or fsync()** the file to get your changes back to the server. At the time the server accepts your changes and writes them to the appropriate location on its disk, it also breaks callbacks to all clients to which it issued a copy of fileA.

So my client receives a message to break the callback on fileA, which it dutifully does. But my application (editor, spreadsheet, whatever I'm using to read fileA) is still running, and doesn't really care that the callback has been broken.

When something causes the application to read() more of the file, the read() system call executes AFS cache manager code via the VFS switch, which does check the callback and therefore gets new copies of the data.

Of course, the application may not re-read data that it has already read, but that would also be the case if you were both using the same host. So, for both AFS and local files, I may not see your changes.

Now if I exit the application and start it again, or if the application does another open() on the file, then I will see the changes you've made.

This information tends to cause tremendous heartache and discontent - but unnecessarily so. People imagine rampant synchronization problems. In practice this rarely happens and in those rare instances, the data in question is typically not critical enough to cause real problems or crashing and burning of applications. Since 1985, we've found that the synchronization algorithm has been more than adequate in practice - but people still like to worry!

The source of worry is that, if I make changes to a file from my workstation, your workstation is not guaranteed to be notified until I close or fsync the file, at which point AFS guarantees that your workstation will be notified. This is a significant departure from NFS, in which no guarantees are provided.

3.05 Which protocols does AFS use?

AFS may be thought of as a collection of protocols and software processes, nested one on top of the other. The constant interaction between and within these levels makes AFS a very sophisticated software system.

At the lowest level is the UDP protocol, which carries AFS traffic over the IP network. The next protocol level is the remote procedure call (RPC). In general, RPCs allow the developer to build applications using the client/server model, hiding the underlying networking mechanisms. AFS uses Rx, an RPC protocol developed specifically for AFS during its development phase at Carnegie Mellon University.

Above the RPC layer is a series of server processes and interfaces that all use Rx for communication between machines. The fileserver, volserver, upserver, upclient, and bosserver are server processes that export RPC interfaces so that their user interface commands can request actions and get information. For example, the bos status command queries the bosserver process on the indicated file server machine.

Database servers use ubik, a replicated database mechanism implemented using RPC. Ubik guarantees that the copies of AFS databases on multiple server machines remain consistent. It provides an application programming interface (API) for database reads and writes, and uses RPCs to keep the database synchronized. The database server processes, vlserver and ptserver, reside above ubik. These processes export an RPC interface which allows user commands to control their operation. For instance, the pts command is used to communicate with the ptserver, while the vos command uses the vlserver's RPC interface.

Some application programs are quite complex, and draw on RPC interfaces for communication with an assortment of processes. Scout utilizes the RPC interface to file server processes to display and monitor the status of file servers. The uss command interfaces with ptserver, volserver, and vlserver to create new user accounts.

The Cache Manager also exports an RPC interface. This interface is used principally by file server machines to break callbacks. It can also be used to obtain Cache Manager status information. The program cmdebug shows the status of a Cache Manager using this interface.

Section 1.5 of the AFS System Administrator's Guide and the April 1990 Cache Update contain more information on ubik. Udebug information and short descriptions of all debugging tools were included in the January 1991 Cache Update. Future issues will discuss other debugging tools in more detail.

[source: ftp://ftp.transarc.com/pub/afsug/newsletter/apr91] [Copyright 1991 Transarc Corporation]

3.06 Which TCP/IP ports and protocols do I need to enable in order to operate AFS through my Internet firewall?

Outbound destination ports for a client:

kerberos 88/udp 88/tcp
ntp 123/udp
afs3-fileserver 7000/udp
afs3-ptserver 7002/udp
afs3-vlserver 7003/udp
afs3-volserver 7005/udp

If you also plan to control AFS servers from a client, you will need:

afs3-bosserver 7007/udp

You will also need to allow an inbound port

afs3-callback 7001/udp

or if you are using Arla

cachemanager 4711/udp

(Note: if you are using NAT, you should try to arrange for the UDP NAT timeout on port 7001 to be at least two hours. Recent OpenAFS server and client versions will try to send keepalives to keep the callback NAT entry open, but some consumer router/WiFi/NAT devices may have a timeout that is too short even for this keepalive. If the NAT entry expires, your cache manager will not be notified of file changes on the server, and you will only find out about them after approximately two hours, when the callback expires.)

You will also need to allow various ephemeral UDP source ports for outbound connections, but you will need to do this for DNS and NTP anyway.
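As an illustration, the port list above can be turned into firewall rules. The sketch below only prints iptables commands so you can review them before running anything; it assumes a Linux client filtering with iptables in the default filter table, and the function name and its openafs/arla and admin arguments are hypothetical. Adapt chains and default policies to your own firewall.

```shell
# afs_client_fw_rules [openafs|arla] [admin]
# Print (not run) iptables commands that open the ports listed above for an
# AFS client. Assumes a Linux host using iptables' default filter table.
afs_client_fw_rules() (
    cm=${1:-openafs}
    ports="88/udp 88/tcp 123/udp 7000/udp 7002/udp 7003/udp 7005/udp"
    # afs3-bosserver is only needed to control servers from this client.
    [ "${2:-}" = admin ] && ports="$ports 7007/udp"
    for p in $ports; do
        # "7003/udp" -> protocol "udp", destination port "7003"
        printf 'iptables -A OUTPUT -p %s --dport %s -j ACCEPT\n' "${p#*/}" "${p%/*}"
    done
    # Replies to our outbound requests (sent from ephemeral source ports).
    printf 'iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT\n'
    # Inbound callback port: 7001 for OpenAFS, 4711 for Arla.
    cb=7001
    [ "$cm" = arla ] && cb=4711
    printf 'iptables -A INPUT -p udp --dport %s -j ACCEPT\n' "$cb"
)
```

Running afs_client_fw_rules arla admin would include the bosserver port and open 4711 instead of 7001 for callbacks.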

3.07 Are setuid programs executable across AFS cell boundaries?

By default, the setuid bit is ignored but the program may be run (without setuid privilege). It would be bad to allow arbitrary setuid programs in remote cells to run; consider that someone could put a setuid copy of bash in a personal cell, arrange for that to be visible via DNS SRV records, and then fs mkmount a reference to it in their AFS space on e.g. a school machine.

It is possible to configure an AFS client to honor the setuid bit. This is achieved by root (only) running:

   root@toontown # fs setcell -cell $cellname -suid

where $cellname is the name of the foreign cell. Use with care!

Note that making a program setuid (or setgid) in AFS does not mean that the program will get AFS permissions of a user or group. To become AFS authenticated, you have to aklog. If you are not authenticated to AFS, AFS treats you as system:anyuser. setuid only affects local Unix permissions (and is meaningless on Windows clients).

3.08 How can I run daemons with tokens that do not expire?

It is not a good idea to run with tokens that do not expire because this would weaken one of the security features of Kerberos. A (slightly) better approach is to re-authenticate just before the token expires. (Even more preferable would be to get a token for a particular operation, preferably from the user performing some operation, but this is not always possible, especially with services that are not aware of Kerberos.)

The most common way to achieve this these days is to generate a keytab containing the credentials you want the daemon to have, and use a program like k5start to run the daemon with those credentials.
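A minimal sketch of that approach follows. It assumes kstart's k5start with -f (keytab), -U (use the first principal in the keytab), -t (run aklog after each renewal) and -K (wake up every N minutes to renew); the keytab path, the 10-minute interval and the wrapper name are illustrative assumptions, so check k5start(1) for your installed version. K5START can be overridden for a dry run.

```shell
# run_with_afs_tokens KEYTAB COMMAND [ARGS...]
# Run COMMAND under k5start so that Kerberos tickets and (via -t) AFS tokens
# are renewed before they expire.
#   -f KEYTAB  authenticate from this keytab
#   -U         use the first principal found in the keytab
#   -t         run aklog after obtaining tickets
#   -K 10      wake every 10 minutes and renew if needed (assumed interval)
# K5START may be overridden (e.g. K5START=echo for a dry run).
run_with_afs_tokens() {
    keytab=$1
    shift
    "${K5START:-k5start}" -f "$keytab" -U -t -K 10 -- "$@"
}
```

For example, run_with_afs_tokens /etc/mydaemon.keytab mydaemon --foreground, where both the keytab and the daemon name are hypothetical.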

3.09 Can I check my users' passwords for security purposes?

The major Kerberos implementations (MIT Kerberos and Heimdal) both include ways to check password strength when a user chooses a password. There are not currently any (public!) utilities to check the keys in a KDC (which are generated from passwords) against dictionaries, and you cannot (generally) generate an unencrypted KDC dump to check them. The KDC keys are double-encrypted: not only are they stored as encrypted keys instead of the original plaintext passwords, but the entire record is also encrypted with the KDC's own master key.

3.10 Is there a way to automatically balance disk usage across fileservers?

Yes. There is a tool, balance, which does exactly this. It can be retrieved via anonymous ftp from ftp://ftp.andrew.cmu.edu/pub/AFS-Tools/balance-1.2-beta.tar.gz. (It does not appear to have been updated since late 2003).

In fact, arbitrary balancing algorithms can be written for this tool. The default set of "agents" provided with the current version of balance balances by usage, by number of volumes, and by activity per week; the last of these currently requires a source patch to the AFS volserver. Balance is highly configurable.

Author: Dan Lovinger Contact: Derrick Brashear <shadow+@andrew.cmu.edu>
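To illustrate what a usage-based balancing decision looks like, here is a hypothetical toy "agent". It is not balance's real agent interface; it simply reads "SERVER USED_KB CAPACITY_KB" lines (which you might assemble from vos partinfo output) and names the most and least utilized servers as the source and target of the next volume move.

```shell
# suggest_move_by_usage
# Hypothetical toy agent (not balance's API): reads "SERVER USED_KB CAPACITY_KB"
# lines on stdin and prints "FROM TO", the most and least utilized servers.
# Prints nothing if there is only one server or all are equally full.
suggest_move_by_usage() {
    awk 'NF == 3 && $3 > 0 {
        pct = $2 / $3                         # fraction of capacity in use
        if (max == "" || pct > maxpct) { max = $1; maxpct = pct }
        if (min == "" || pct < minpct) { min = $1; minpct = pct }
    }
    END { if (max != min) print max, min }'
}
```

Given fs1 at 90%, fs2 at 20% and fs3 at 50%, it suggests moving a volume from fs1 to fs2; a real agent would also choose which volume to move.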

3.11 Can I shutdown an AFS fileserver without affecting users?

Yes, this is an example of the flexibility you have in managing AFS.

Before shutting down an AFS fileserver, you must arrange for any services it provides to be moved to another AFS fileserver:

  1. Move all AFS volumes to another fileserver. (Check you have the space!) This can be done "live" while users are actively using files in those volumes with no detrimental effects.

  2. Make sure that critical services have been replicated on one (or more) other fileserver(s). Such services include:

    • vlserver - Volume Location server
    • ptserver - Protection server
    • buserver - Backup server
    • fileserver
    • Kerberos KDCs, or kaserver (the old Kerberos Authentication server) on older installations

It is simple to test this before the real shutdown by issuing:

bos shutdown $server $service

where $server is the name of the server to be shut down and $service is the instance to stop (omit it to stop every instance on that server). Note that an instance name may not be the same as the service name; use bos status $server to check. (One common configuration uses short names like pts or even pt for the ptserver instance, for example.)
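Since instance names vary, it can help to extract them from bos status before shutting anything down. This sketch assumes bos status reports each instance on a line of the form "Instance NAME, currently running normally."; the function name is hypothetical.

```shell
# bos_instances
# Read "bos status" output on stdin and print one instance name per line.
# Assumes each instance appears on a line like:
#   Instance NAME, currently running normally.
bos_instances() {
    awk '/^Instance / { name = $2; sub(/,$/, "", name); print name }'
}
```

Usage: bos status fs1.example.com | bos_instances (the server name is illustrative).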

Kerberos services are usually not managed via bos; check the OS's services manager for krb5kdc, kadmind, kpasswdd, and similar. (Different Kerberos implementations will have different service daemons.)

Other points to bear in mind:

  • vos remove any RO volumes on the server to be shut down. Create corresponding RO volumes on the second fileserver after moving the RW. There are two reasons for this:
    1. An RO on the same partition as its RW (a "cheap replica") requires less space than a full-copy RO.
    2. Because AFS always accesses RO volumes in preference to RW, traffic will be directed to the RO, which reduces the load on the fileserver to be shut down.
  • If you are still using kaserver and the system to be shut down has the lowest IP address, there may be a brief delay in authenticating, caused by the timeout before a second kaserver is contacted.
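The per-volume steps above can be sketched as one function using vos remove, vos move, vos addsite and vos release. It assumes the volume currently has an RO site on the source server and partition, and it stops at the first failing step. The function name is hypothetical; set VOS_CMD=echo for a dry run that only prints the vos subcommands.

```shell
# evacuate_volume VOLUME FROM_SERVER FROM_PART TO_SERVER TO_PART
# Remove the RO clone on the old server, move the RW, add a "cheap" RO site
# next to the RW on the new server, then release.
# Assumes VOLUME has an RO site on FROM_SERVER/FROM_PART.
# VOS_CMD=echo gives a dry run that prints the vos subcommands.
evacuate_volume() (
    vol=$1 from=$2 fpart=$3 to=$4 tpart=$5
    vos=${VOS_CMD:-vos}
    $vos remove -server "$from" -partition "$fpart" -id "$vol.readonly" &&
    $vos move -id "$vol" -fromserver "$from" -frompartition "$fpart" \
        -toserver "$to" -topartition "$tpart" &&
    $vos addsite -server "$to" -partition "$tpart" -id "$vol" &&
    $vos release -id "$vol"
)
```

For example, VOS_CMD=echo evacuate_volume user.fred fs1 a fs2 b shows the four commands for user.fred without running them.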

3.12 How can I set up mail delivery to users with $HOMEs in AFS?

Preferably, don't. This has been found to scale poorly because of high load on read-write servers; mail clients check for new mail every few minutes (or even seconds) and this will cause problems for any file server. Additionally, as the mail server cannot authenticate to AFS as the receiving user, you need to carefully manage permissions on the receiving directory tree to avoid mail being lost or the directory being used as a general dropbox with potential security implications.

See this message for more information about the scalability of mail delivery onto a shared fileserver. (Something to think about: very similar considerations are why the recommendation for Exchange mail servers is to only have a small number of users on each mailbox server.)

If you absolutely must do this for some reason, here's one way to do it.

First, you must have your mail delivery daemon AFS authenticated (probably as "postman" or similar). kstart can be used to do this. (Note that the mail delivery agent cannot authenticate as the actual user! To do so, it would need access to keytabs for each possible recipient; and it is almost certainly a bad idea to give it access to such keytabs.)

Second, you need to set up the ACLs so that "postman" has lookup rights down to the user's $HOME and "lik" on the destination directory (for this example, we'll use $HOME/Mail).
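The ACL step can be sketched as follows. It grants "l" on every directory from just below a top directory down to the user's $HOME, then "lik" on $HOME/Mail, using fs setacl (which adds or replaces only the named principal's entry). The function name, the "postman" default and the assumption that the top directory is already traversable are illustrative; set FS_CMD=echo for a dry run.

```shell
# grant_postman_acls HOME TOP [PRINCIPAL]
# Grant PRINCIPAL (default "postman") lookup ("l") on every directory from
# just below TOP down to HOME, then "lik" on HOME/Mail.
# TOP is assumed to be traversable already (e.g. system:anyuser has "l").
# FS_CMD=echo gives a dry run that prints the fs subcommands.
grant_postman_acls() (
    home=${1%/} top=${2%/} who=${3:-postman}
    fs=${FS_CMD:-fs}
    case $home in
        "$top"/*) ;;
        *) echo "grant_postman_acls: $home is not below $top" >&2; exit 1 ;;
    esac
    set -f                        # no globbing while splitting the path
    IFS=/
    set -- ${home#"$top"/}        # e.g. "user/f/fred" -> user f fred
    unset IFS                     # restore default word splitting
    dir=$top
    for part; do
        dir=$dir/$part
        $fs setacl -dir "$dir" -acl "$who" l || exit 1
    done
    $fs setacl -dir "$home/Mail" -acl "$who" lik
)
```

For example, grant_postman_acls /afs/example.com/user/f/fred /afs/example.com (paths illustrative); $HOME/Mail must already exist.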

3.13 Should I replicate a ReadOnly volume on the same partition and server as the ReadWrite volume?

Yes, absolutely! It improves the robustness of your served volumes.

If ReadOnly volumes exist (not just "are available"), Cache Managers will not use the ReadWrite version of the volume except via an explicit ReadWrite mountpoint. This means that if all RO copies are on dead servers, are offline, are behind a network partition, etc., clients will not be able to get the data, even if the RW version of the volume is healthy, on a healthy server, and on a healthy network.

You are therefore very strongly encouraged to keep one RO copy of a volume on the same server and partition as the RW, for two reasons:

  1. The RO on the same server and partition as the RW is a clone (just a copy of the header, not a full copy of each file). It is therefore very small, but provides access to the same set of files as all the other (full-copy) ReadOnly volumes. Transarc trainers referred to this as the "cheap replica"; some admins call it a "shadow", but this is not the same as a shadow volume.
  2. It prevents the frustration of having all your ROs unavailable while a perfectly healthy RW is accessible but not used.

If you keep a "cheap replica", then by definition, if the RW is available, one of the ROs is also available, and clients will utilize that site.
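To audit whether a volume has its cheap replica, you can compare the RW site with the RO sites listed by vos examine (or vos listvldb). This sketch assumes site lines of the form "server NAME partition /vicepX RW Site" and "... RO Site"; the function name is hypothetical.

```shell
# has_cheap_replica
# Read "vos examine VOLUME" output on stdin; exit 0 if an RO site exists on
# the same server and partition as the RW site, 1 otherwise.
# Assumes site lines like: server fs1.example.com partition /vicepa RW Site
has_cheap_replica() {
    awk '$1 == "server" && $3 == "partition" {
        site = $2 " " $4                     # "server partition"
        if ($5 == "RW") rw = site
        else if ($5 == "RO") ro[site] = 1
    }
    END { exit !(rw != "" && (rw in ro)) }'
}
```

If vos examine user.fred | has_cheap_replica fails, a vos addsite at the RW's server and partition followed by vos release user.fred would add the missing clone.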

3.14 Will AFS run on a multi-homed fileserver?

(multi-homed = host has more than one network interface.)

Yes, it will. Older AFS assumed one address per host, but modern OpenAFS identifies servers and clients by UUIDs (universally unique identifiers), so a fileserver is recognized at any of its registered addresses.

See the documentation for the NetInfo and NetRestrict files. The UUID for a fileserver is generated when the sysid file is created.

If you have multiple addresses and must use only one of them (say, multiple addresses on the same subnet), you may need to use the -rxbind option to the network server processes bosserver, kaserver, ptserver, vlserver, volserver, fileserver as appropriate. (Note that some of these do not currently document -rxbind, notably kaserver because it is not being maintained. Again, the preferred solution here is to migrate off of kaserver, but the rxbind option will work if needed.)

Database servers cannot safely operate multihomed; the Ubik replication protocol assumes a 1-to-1 mapping between addresses and servers. Use the NetInfo and NetRestrict files to associate database servers with a single address.

3.15 Can I replicate my user's home directory AFS volumes?

No.

Users with $HOMEs in /afs normally have an AFS ReadWrite volume mounted as their home directory. You can replicate a RW volume, but only as a ReadOnly volume; there can only be one instance of a ReadWrite volume.

In theory you could have RO copies of a user's RW volume on a second server, but in practice this won't work for the following reasons:

a) AFS has a bias to always access the RO copy of a RW volume if one exists, so the user would have a ReadOnly $HOME, which is not very useful. (You could use an RW mountpoint to avoid this.)
b) ReadOnly volumes are not automatically updated; you would need to update each user volume manually (e.g. vos release user.fred; fs checkv).

The bottom line is: you cannot usefully replicate $HOMEs across servers.

(That said, there is one potentially useful case: during an extended fileserver outage, you can use vos convertROtoRW to promote a ReadOnly volume to ReadWrite. Only do this if the alternative is restoring the entire contents of the downed fileserver from backup; should the fileserver return to service, its attempt to re-register the additional ReadWrite volume instances will fail. As such, if you make sure to use a ReadWrite mountpoint for user volumes, replicating a user's $HOME may prove useful as an online backup.)

3.16 How can I list which clients have cached files from a server?

Use the following ksh script. Check the path to rxdebug, and make sure nslookup is installed. (The ${n##*([ ])} pattern is a ksh extended glob: bash needs shopt -s extglob to interpret it, and in a strictly POSIX shell it will not work as intended.)

#!/bin/ksh -
#
# NAME          afsclients
# AUTHOR        Rainer Toebbicke  <rtb@dxcern.cern.ch>
# DATE          June 1994
# PURPOSE       Display AFS clients which have grabbed files from a server

if [ $# = 0 ]; then
    echo "Usage: $0 <afs_server 1> ... <afsserver n>"
    exit 1
fi
for n; do
    /usr/afsws/etc/rxdebug -servers $n -allconn
done |
grep '^Connection' |
while read x y z ipaddr rest; do
    echo $ipaddr
done |
sort -u |
while read ipaddr; do
    ipaddr=${ipaddr%%,}
    n="`nslookup $ipaddr`"
    n="${n##*Name: }"
    n="${n%%Address:*}"
    n="${n##*([ ])}"
    n="${n%?}"
    echo "$n ($ipaddr)"
done

An alternative in Perl (still requires rxdebug but not nslookup):

#! /usr/bin/perl -w
use strict;
use warnings;
use Socket;

my %client;
for my $fs (@ARGV) {
    open my $rx, '-|', "rxdebug -server \Q$fs\E -allconn" or die "rxdebug: $!";
    while (<$rx>) {
        /^Connection from host (\S+),/ and $client{$1} = 1;
    }
}
for my $ip (keys %client) {
    my ($ia, $host);
    $ia = inet_aton($ip);
    if (defined ($host = gethostbyaddr($ia, AF_INET))) {
        $client{$ip} = "$host ($ip)";
    } else {
        $client{$ip} = $ip;
    }
}
for my $host (sort values %client) {
    print $host, "\n";
}

3.17 Do Backup volumes require as much space as ReadWrite volumes?

Occasionally, but usually not. A backup volume consists of copy-on-write clones of the files in the original volume; if the file in the original is then modified, it will be copied first, leaving the backup volume pointing at the original version. The BK volume is re-synchronised with the RW next time a vos backup or vos backupsys is run.

The space needed for the BK volume is directly related to the size of all files changed in the RW between runs of vos backupsys.
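You can get a rough feel for this number by summing the sizes of files changed since the last backup run. This is an estimate only, under stated assumptions: the stamp file is hypothetical (for example, one you touch whenever vos backupsys runs), you should point it at the RW path (e.g. under /afs/.cellname), and files deleted since the last backup are not counted even though the BK volume still holds them.

```shell
# changed_bytes_since DIR STAMP
# Rough estimate of the extra space a BK volume will pin: total size in bytes
# of regular files under DIR modified after STAMP's mtime.
# Not counted: files deleted since the last backup (the BK volume still
# holds them) and per-file metadata overhead.
changed_bytes_since() {
    find "$1" -type f -newer "$2" -exec wc -c {} + |
        awk '$2 != "total" { sum += $1 } END { print sum + 0 }'   # skip wc totals
}
```

Walking a large volume through AFS can be slow, so treat this as an occasional sanity check rather than a monitoring tool.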

3.18 Should I run ntpd on my AFS client?

Yes. You should not rely on older time services such as timed or programs such as ntpdate, and should not use the legacy settime functionality of the AFS client. You should also avoid using automatic time synchronization provided by virtual machine hypervisors (indeed, VMware specifically recommends disabling its time synchronization on Linux and using ntpd).

The AFS servers use NTP [NTP] to synchronise time with each other and, typically, with one or more external NTP servers. By default, clients synchronize their time with one of the servers in the local cell. Thus all the machines participating in the AFS cell have an accurate view of the time.

For further details on NTP, see http://www.ntp.org/. At the time of writing, the latest version was 4.2.6, dated December 2011, which is much more recent than the version packaged with Transarc AFS. OpenAFS no longer ships with timed, since it is assumed that all sites use NTP.

A list of NTP servers is available from http://support.ntp.org/servers/. Note that you should prefer to have one or more master local servers sync to one of the "pool" servers for your continent, and other local clients sync to the master local server(s).

The default time setting behavior of the AFS client can be disabled by specifying the -nosettime argument to afsd. It is strongly recommended that you run NTP and use -nosettime on all machines (clients and servers).

3.19 Why and how should I keep /usr/vice/etc/CellServDB current?

On AFS clients, /usr/vice/etc/CellServDB defines the cells (and their db servers) that can be accessed via /afs. Over time, site details change: servers are added, removed, or moved onto new network addresses, and new cells appear. While some of this can be handled by means of DNS AFSDB or SRV records, you must know about a cell to even be able to ask about it; the CellServDB acts as a central directory of cells. (Of course, it is sometimes a good idea not to advertise some internal cells; but that also means not putting them in public-facing DNS, so you will likely want a local CellServDB.)

In order to keep up-to-date with such changes, the CellServDB file on each AFS client should be kept consistent with some master copy (at your site).

As well as updating CellServDB, your AFS administrator should ensure that new cells are mounted in your cell's root.afs volume. If a cell is added to CellServDB, either the client must be restarted or you must use fs newcell to register the new cell information with the running client.

The official public master CellServDB is maintained at grand.central.org, from http://grand.central.org/dl/cellservdb/CellServDB or . You can send updates for this to cellservdb@central.org.

The client CellServDB file must not reside under /afs (since it needs to exist before the client starts!) and is best located in local filespace.

After obtaining an updated CellServDB and distributing to clients, you will want to run a script similar to this Perl script. (It could be written in shell, but not comprehensibly. Feel free to reimplement in your preferred language.)

#! /usr/bin/perl
use strict;
use warnings;

#
# Given a CellServDB file (may be local, may be master), issue "fs newcell" for each listed
# cell.  We don't bother checking for changes, as that's much more expensive than making an
# unnecessary change.
#
# Expected usage via puppet:  rather than making it the restart/reconfigure action for the
# client (or server) we depend on an exec stanza which does so.  No point in running it if
# what changed is something else that requires a full restart.
#

my $cell;
my @srv = ();

while (<>) {
  chomp;
  next unless /\S/;   # skip blank lines instead of warning about them
  if (/^>(\S+)(?:\s|\Z)/) {
    if (defined $cell and @srv) {
      system 'fs', 'newcell', $cell, @srv and die "fs newcell failed";
    }
    $cell = $1;
    @srv = ();
  }
  # same rules as afsd: if the name doesn't resolve, use the IP
  elsif (defined $cell and /(^\d+\.\d+\.\d+\.\d+)\s*#(\S+)\s*$/) {
    if (defined gethostbyname($2)) {
      push @srv, $2;
    } else {
      push @srv, $1;
    }
  }
  else {
    warn "line $.: can't parse \"$_\"\n";
  }
}
# last entry
if (defined $cell and @srv) {
  system 'fs', 'newcell', $cell, @srv and die "fs newcell failed";
} else {
  warn "line $.: no valid cells found\n";
}

3.20 How can I compile a list of AFS fileservers?

Here is a Bourne shell command to do it (it will work in GNU bash and the Korn shell, too, and even csh):

stimpy@nick $ vos listvldb -cell `cat /usr/vice/etc/ThisCell` | awk '/server/ {print $2}' | sort -u

3.21 How can I set up anonymous FTP login to access /afs?

The easiest way on a primarily "normal" machine (where you don't want to have everything in AFS) is to actually mount root.cell under ~ftp, and then symlink /afs to ~ftp/afs or whatever. It's as simple as changing the mountpoint in /usr/vice/etc/cacheinfo and restarting afsd.

Note that when you do this, anon ftp users can go anywhere system:anyuser can (or worse, if you're using IP-based ACLs and the ftp host is listed in any PTS groups). The only "polite" solution I've arrived at is to have the ftp host machine run a minimal CellServDB and police my ACLs tightly.

Alternatively, you can make ~ftp an AFS volume and just mount whatever you need under that - this works well if you can keep everything in AFS, and you don't have the same problems with anonymous "escapes" into /afs. (Note that you can often use host tmpfs mounts onto AFS directories to hide things or provide host-specific paths.)

Note that similar considerations apply to web access; it used to be not uncommon for accidental misconfigurations of MIT's web hosts to result in people's home directories showing up in Google searches. This will annoy people who do not think of their OpenAFS home directory as being world-visible! (even though they should realize it and set their ACL appropriately)

3.22 Is the data sent over the network encrypted in AFS?

OpenAFS has an fs subcommand to turn on encryption of regular file data sent and received by a client. This is a per client setting that persists until reboot. No server actions are needed to support this change. The syntax is:

fs setcrypt on
fs setcrypt off
fs getcrypt

Note that this only encrypts network traffic between the client and server. The data on the server's disk is not encrypted, nor is the data in the client's disk cache. The encryption algorithm used is fcrypt, which is a DES variant. Additionally, data read/written without a token is not encrypted over the wire. This (and the use of DES variants, both here and in general) is a shortcoming of AFS's security protocols and is being addressed by the development of a new rxgk protocol.

Enabling encryption by default:

  • Red Hat Linux: change the last line of /etc/sysconfig/afs to AFS_POST_INIT="/usr/bin/fs setcrypt on"
  • Windows: set the registry value named SecurityLevel under HKLM\SYSTEM\CurrentControlSet\Services\TransarcAFSDaemon\Parameters to 2.

3.23 What underlying filesystems can I use for AFS?

See also SupportedConfigurations.

Which filesystems can be used for fileserver partitions depends on the configure switches used when compiling from source. To stay on the safe side, use the --enable-namei-fileserver configure flag; this gives you a fileserver binary that can act on any /vicep* partition regardless of its filesystem type. With the namei fileserver, you can use basically any filesystem you want. The namei fileserver does nothing fancy behind the scenes; it only accesses normal files (though their names are a bit strange).

Older versions of AFS also provided an inode fileserver. On older Solaris it once gave a 10% speedup over the namei fileserver, but with modern operating systems and disks the performance difference is negligible. The inode fileserver cannot run on every filesystem, as it abuses the filesystem internals to store AFS metadata and opens files directly by inode number instead of going through the normal filesystem access mechanisms. The fsck distributed with the operating system will consider these inode-accessed files to be "dangling" and either link them into lost+found or delete them entirely; it will also often corrupt the AFS metadata, which it doesn't know about. As of OpenAFS 1.6, inode fileservers are no longer supported; you can still build from source with inode support, but it has bugs and should only be used in a read-only configuration to copy volumes to a namei fileserver host.

On the client side, the cache partition requires a filesystem supporting the inode abstraction for the cache (usually /var/vice/cache) since the cache manager references files by their inode. Fortunately, it does not store metadata in "unused" parts of the filesystem, and cache creation always provides proper names for the cache files so they won't be damaged by fsck.

The following file systems have been reported not to work for the AFS client cache:

  • ReiserFS
  • vxfs (HP-UX)
  • advfs (Tru64), it initially works but eventually corrupts the cache
  • efs (SGI) - Transarc AFS supported efs, but OpenAFS doesn't have a license to use the efs code
  • zfs (Solaris, FreeBSD, other ports) - you can however use a zvolume with a ufs or other supported filesystem

On certain OSes, the OpenAFS cache manager has some checks for unsupported filesystem types and will refuse to start, but these checks are not 100% reliable.

The following file systems have been reported to work for the AFS client cache:

  • ext2
  • ext3
  • hfs (HP-UX)
  • xfs (at least on IRIX 6.5)
  • ufs (Solaris, Tru64Unix)
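The two lists above can be folded into a quick pre-flight check for a cache partition. This is a sketch that encodes only the reports listed here, so anything else (ext4, for instance) comes back "unknown"; note that hfs means HP-UX hfs and xfs was reported on IRIX 6.5. The function name is hypothetical.

```shell
# cache_fs_status FSTYPE
# Classify a filesystem type name for the client cache partition using only
# the reports listed above: prints "ok", "bad" or "unknown".
cache_fs_status() {
    case $1 in
        ext2|ext3|hfs|xfs|ufs)       echo ok ;;      # reported to work
        reiserfs|vxfs|advfs|efs|zfs) echo bad ;;     # reported not to work
        *)                           echo unknown ;; # no report either way
    esac
}
```

On Linux with util-linux, cache_fs_status "$(findmnt -n -o FSTYPE -T /var/vice/cache)" checks the filesystem actually holding the cache directory.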

3.24 Compiling OpenAFS from source

(Modern OpenAFS supports proper packaging for various systems; these notes are still somewhat applicable but mostly relevant for 1.2.x.)

The kernel component of OpenAFS must be compiled with the same compiler used to compile the kernel; e.g. on Solaris you must use the cc from SUNWspro and not gcc. Tru64Unix doesn't support modules, so you have to edit the kernel config files and link AFS statically into the kernel. Dynamically loaded kernel modules work on Linux, Solaris, Irix, ...

./configure --enable-transarc-paths=/usr/etc --with-afs-sysname=i386_linux24
make dest
cd dest/i386_linux24

... and continue with the installation process described in the IBM AFS documentation. If you run "make install" instead, some files end up under /usr/local and others do not, regardless of the --enable-transarc-paths option; the result is messy.

3.25 Upgrading OpenAFS

(Modern OpenAFS supports proper packaging for various systems; these notes are still somewhat applicable but mostly relevant for 1.2.x.)

These instructions assume a "dest" tree (the output of make dest, and the contents of the official binary distribution tarballs). It is generally preferable to use native packages when they exist; the packaging will handle most of the details of upgrading for you.

Upgrade of AFS on Linux

/etc/rc.d/init.d/afs stop
# Run from the top of the dest/<sysname> tree
cd root.client/usr/vice/etc
tar cvf - . | (cd /usr/vice/etc; tar xfp -)
cp -p afs.rc /etc/rc.d/init.d/afs
cp /usr/vice/etc/afs.conf /etc/sysconfig/afs   # copy the default settings first,
vim /etc/sysconfig/afs                         # then adjust them
cp ../../../../lib/pam_afs.krb.so.1  /lib/security
cd ../../../../root.server/usr/afs
tar cvf - . | (cd /usr/afs; tar xfp -)
# Optional: enable AFS authentication at login (uncomment both lines together)
# echo "auth sufficient /lib/security/pam_afs.so try_first_pass \
# ignore_root" >> /etc/pam.d/login
cd /lib/security
ln -s pam_afs.krb.so.1 pam_afs.so
cd /etc/rc3.d
ln -s ../init.d/afs S99afs
cd ../rc0.d
ln -s ../init.d/afs K01afs
/etc/rc.d/init.d/afs start

Upgrade of AFS on Solaris 2.6

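# Disable AFS at boot and reboot, so the old AFS is not running while its files are replaced
cd /etc/rc3.d/
mv S20afs aS20afs
init 6
# After the reboot, continue from the top of the dest/<sysname> tree
cd root.server/usr/afs
tar cvf - ./bin | (cd /usr/afs; tar xfp -)
cd ../../..
cp root.client/usr/vice/etc/modload/libafs.nonfs.o /kernel/fs/afs
cp root.server/etc/vfsck /usr/lib/fs/afs/fsck
cd root.client/usr/vice
tar cvf - ./etc | (cd /usr/vice; tar xf -)
cd ../../..
cp lib/pam_afs.krb.so.1 /usr/lib/security
cp lib/pam_afs.so.1 /usr/lib/security
# Re-enable AFS at boot and reboot into the new version
cd /etc/rc3.d
mv aS20afs S20afs
init 6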

Upgrade of AFS on Irix 6.5

/etc/chkconfig -f afsserver off
/etc/chkconfig -f afsclient off
/etc/chkconfig -f afsml off
/etc/chkconfig -f afsxnfs off
/etc/reboot
# After the reboot, continue from the top of the dest/<sysname> tree
cd root.server/usr/afs
tar cvf - ./bin | (cd /usr/afs; tar xfp -)
cd ../../..
cp root.client/usr/vice/etc/sgiload/libafs.IP22.nonfs.o /usr/vice/etc/sgiload
# The following lines compile AFS statically into the kernel;
# otherwise skip them and use "chkconfig afsml on"
cp root.client/bin/afs.sm  /var/sysgen/system
cp root.client/bin/afs /var/sysgen/master.d
# The next file comes from openafs-*/src/libafs/STATIC.*
cp root.client/bin/libafs.IP22.nonfs.a /var/sysgen/boot/afs.a
cp /unix /unix_orig
/etc/autoconfig
# End of static kernel modifications
cd root.client/usr/vice/etc
# If you need space, delete any modload/ files which don't fit your platform;
# these files originate from openafs-*/src/libafs/MODLOAD.*
tar cvf - . | (cd /usr/vice/etc; tar xf -)
/etc/chkconfig -f afsserver on
/etc/chkconfig -f afsclient on
# /etc/chkconfig -f afsml on - afs is compiled statically into kernel, so leave afsml off
/etc/chkconfig -f afsml off
/etc/chkconfig -f afsxnfs off
/etc/reboot

Upgrade of AFS on Tru64 Unix

cd root.server/usr/afs/
tar cvf - ./bin | (cd /usr/afs; tar xfp -)
cd ../../../root.client/bin
cp ./libafs.nonfs.o /usr/sys/BINARY/afs.mod
ls -la /usr/sys/BINARY/afs.mod
doconfig -c FOO    # FOO is the name of your kernel configuration
cd ../../root.client/usr/vice
cp etc/afsd /usr/vice/etc/afsd
cp etc/C/afszcm.cat /usr/vice/etc/C/afszcm.cat

3.26 Notes on debugging OpenAFS

If you need to run only the fileserver process (for example, to debug it), run the LWP fileserver instead of the pthreads fileserver (src/viced/fileserver instead of src/tviced/fileserver, if you have a build tree handy):

cp src/viced/fileserver /usr/afs/bin    # or wherever your server binaries live
bos restart localhost fs -local

then attach with gdb. (This may be less necessary with recent gdb; its pthreads support used to be quite abysmal. [ed.])

To check whether the cache manager (afsd) on a client is talking to the servers listed in CellServDB, run:

tcpdump -vv -s 1500 port 7001

The full list of AFS ports is:

afs3-fileserver 7000/udp    # file server itself
afs3-callback   7001/udp    # callbacks to cache managers
afs3-prserver   7002/udp    # users & groups database
afs3-vlserver   7003/udp    # volume location database
afs3-kaserver   7004/udp    # AFS/Kerberos authentication service
afs3-volser     7005/udp    # volume management server
afs3-errors     7006/udp    # error interpretation service
afs3-bos        7007/udp    # basic overseer process
afs3-update     7008/udp    # server-to-server updater
afs3-rmtsys     7009/udp    # remote cache manager service
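When reading tcpdump output it helps to map port numbers back to services. Below is a small lookup sketch built directly from the table above; afs_port_service is a hypothetical name. To watch all of these ports at once instead of only 7001, a tcpdump filter such as udp portrange 7000-7009 should work.

```sh
# Hypothetical helper: name the AFS service on a UDP port (from the table above).
afs_port_service() {
    case $1 in
        7000) echo afs3-fileserver ;;
        7001) echo afs3-callback ;;
        7002) echo afs3-prserver ;;
        7003) echo afs3-vlserver ;;
        7004) echo afs3-kaserver ;;
        7005) echo afs3-volser ;;
        7006) echo afs3-errors ;;
        7007) echo afs3-bos ;;
        7008) echo afs3-update ;;
        7009) echo afs3-rmtsys ;;
        *)    echo unknown; return 1 ;;
    esac
}

# Example: capture traffic on every AFS port, then look one of them up.
#   tcpdump -vv -s 1500 udp portrange 7000-7009
#   afs_port_service 7003    # prints afs3-vlserver
```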

When tcpdump doesn't help, try:

fstrace setset cm -active
# make your error happen
fstrace dump cm

3.27 Tuning client cache for huge data

Use -chunk 17 or greater on the afsd command line. Be careful: with certain cache sizes afsd crashes on startup (at least on Linux and Tru64 Unix), possibly when the dcache is too small. Try something like:

/usr/vice/etc/afsd -nosettime -stat 12384 -chunk 19

> So I ran the full suite of iozone tests (13), but at a single file
> size (128M) and one record size (64K).  I set the AFS cache size to
> 80000K for both memcache and diskcache.

Note that memcache size and diskcache size are different things.
In the case of memcache, a fixed number of chunks are allocated
in memory, such that numChunks * chunkSize = memCacheSize.  In
the case of disk cache, there are a lot more chunks, because the
disk cache assumes not every chunk will be filled (the underlying
filesystem handles disk block allocation for us).  Thus, when you
have small file segments, they use up an entire chunk worth of
cache in the memcache case, but only their size worth of cache
in the diskcache case.

-- kolya
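The arithmetic behind the advice above, as a sketch. It assumes that the afsd -chunk value is the base-2 logarithm of the chunk size in bytes (so -chunk 17 means 128 KiB chunks), that cache sizes are given in 1K blocks, and, for memcache, the relation quoted above (numChunks * chunkSize = memCacheSize). The function names are hypothetical.

```sh
# Hypothetical helpers illustrating the cache arithmetic described above.

# Chunk size in bytes for an afsd "-chunk N" value (2^N bytes).
chunk_bytes() {
    echo $(( 1 << $1 ))
}

# Number of memcache chunks: memCacheSize / chunkSize, both in bytes.
# Any remainder smaller than one chunk is dropped by integer division.
memcache_chunks() {
    cache_kb=$1
    chunk_log2=$2
    echo $(( $cache_kb * 1024 / (1 << $chunk_log2) ))
}

# Example: an 80000K memcache with 64K chunks (-chunk 16) holds 1250 chunks,
# and even a 1K file occupies one whole 64K chunk of it.
#   memcache_chunks 80000 16    # prints 1250
```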

3.28 Setting up PAM with AFS

Solaris

auth        sufficient    /lib/security/pam_afs.so debug try_first_pass ignore_root
auth        required      /lib/security/pam_env.so
auth        sufficient    /lib/security/pam_unix.so likeauth nullok
auth        required      /lib/security/pam_deny.so

account     required      /lib/security/pam_unix.so

password    required      /lib/security/pam_cracklib.so retry=3 type=
password    sufficient    /lib/security/pam_unix.so nullok use_authtok md5 shadow
password    required      /lib/security/pam_deny.so

session     sufficient    /lib/security/pam_afs.so set_token
session     required      /lib/security/pam_limits.so
session     required      /lib/security/pam_unix.so

# reafslog is to unlock dtlogin's screensaver
other  auth sufficient /usr/athena/lib/pam_krb4.so reafslog

3.29 How can I have a Kerberos realm different from the AFS cell name? How can I use an AFS cell across multiple Kerberos realms?

OpenAFS defaults to using the Kerberos realm obtained by uppercasing the cell name. You can instead tell it which Kerberos realm to use with a truncated krb.conf file:

/usr/afs/etc/krb.conf        # Transarc paths
/etc/openafs/server/krb.conf # FHS paths

You do not list any KDCs in this file, just space-separated realms on a single line (see also question 3.39). Older AFS versions accept at most 2 realms in this file; OpenAFS 1.6 and later allow any number of realms.
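A sketch of both behaviours described above: the default realm is the uppercased cell name, and krb.conf holds the realms space-separated on one line with no KDC entries. The function names and the realms EXAMPLE.COM and AD.EXAMPLE.COM are hypothetical placeholders; write to /usr/afs/etc/krb.conf or /etc/openafs/server/krb.conf depending on your path layout.

```sh
# Hypothetical helpers; EXAMPLE.COM and AD.EXAMPLE.COM are placeholders.

# The realm OpenAFS assumes by default: the cell name, uppercased.
cell_default_realm() {
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

# write_krb_conf FILE REALM...: write the realms on a single line,
# separated by spaces (no KDC entries belong in this file).
write_krb_conf() {
    file=$1
    shift
    printf '%s\n' "$*" > "$file"
}

# Example (Transarc paths):
#   cell_default_realm foo.bar.baz    # prints FOO.BAR.BAZ
#   write_krb_conf /usr/afs/etc/krb.conf EXAMPLE.COM AD.EXAMPLE.COM
```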

3.30 What are the bos instance types? How do I use them?

There are, as of this writing, 4 bos instance types:

  • simple - a single program which will be kept running as needed.
  • cron - a single program, plus a time at which it will be automatically run; typically used for cell backups. The time looks like 04:00 to run every day at a given time, or sun 04:00 to run once a week. Times may be given in 24-hour or 12-hour format (with an am/pm suffix); weekdays may be spelled out or abbreviated to 3 characters. Case is ignored. (A legacy usage allows the string "now" to be used; use bos exec instead.)
  • fs - a standard fileserver which has three programs that must be run together in a particular way. The fs server type will take care of starting, stopping, and restarting these programs in order to keep them working together.
  • dafs - demand attach fileservers are similar to standard fileservers, but have an additional component program to be synchronized. The dafs server type will ensure these are started, stopped, and restarted correctly while maintaining synchronization.

3.31 afsd gives me "Error -1 in basic initialization." on startup

When starting afsd, I get the following:

# /usr/vice/etc/afsd -nosettime -debug
afsd: My home cell is 'foo.bar.baz'
ParseCacheInfoFile: Opening cache info file '/usr/vice/etc/cacheinfo'...
ParseCacheInfoFile: Cache info file successfully parsed:
       cacheMountDir: '/afs'
       cacheBaseDir: '/usr/vice/cache'
       cacheBlocks: 50000
afsd: 5000 inode_for_V entries at 0x8075078, 20000 bytes
SScall(137, 28, 17)=-1 afsd: Forking rx listener daemon.
afsd: Forking rx callback listener.
afsd: Forking rxevent daemon.
SScall(137, 28, 48)=-1 SScall(137, 28, 0)=-1 SScall(137, 28, 36)=-1 afsd: Error -1 in basic initialization.

Make sure the kernel module has been loaded. Modern OpenAFS startup scripts should ensure this and report an error if it cannot be loaded, but startup scripts from older versions or on systems which can't use loadable kernel modules (requiring the kernel to be relinked) will not catch this and you will get these errors from SScall.
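On Linux, you can check for the module before starting afsd. This is a hedged sketch: it assumes a /proc/modules-style listing and that the module is called openafs (current releases) or libafs (older builds); other platforms need their own tools, such as modinfo on Solaris. The function name afs_module_loaded is hypothetical.

```sh
# Hypothetical helper: succeed if an AFS kernel module appears in a
# /proc/modules-style listing (the first field is the module name).
# Assumption: the module is named "openafs" or, in older builds, "libafs".
afs_module_loaded() {
    modfile=${1:-/proc/modules}
    grep -Eq '^(openafs|libafs)[[:space:]]' "$modfile"
}

# Example: refuse to start afsd without the kernel module.
#   if ! afs_module_loaded; then
#       echo "AFS kernel module is not loaded" >&2
#       exit 1
#   fi
```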

3.32 Error "afs: Tokens for user of AFS id 0 for cell foo.bar.baz are discarded (rxkad error=19270407)"

elmer@toontown ~$ translate_et 19270407
19270407 (rxk).7 = security object was passed a bad ticket

or alternately

elmer@toontown ~$ grep 19270407 /usr/afsws/include/rx/*
/usr/afsws/include/rx/rxkad.h:#define RXKADBADTICKET (19270407L)

A common cause of this problem (error 19270407) is the use of periods (".") in Kerberos 5 principals. If you have a Kerberos principal such as my.name@REALM.COM and create the corresponding pts user my.name, you will get the cryptic error above. If you want to use such principal names and have OpenAFS 1.4.7 or later, you can pass the -allow-dotted-principals option to all server daemons; see that option in the fileserver (or any other server daemon) documentation for more information. (The underlying problem is that, for compatibility reasons, OpenAFS uses Kerberos 4 name rules internally. While "." was the name component separator in Kerberos 4, in Kerberos 5 it is "/", so OpenAFS translates "." to "/" when passing names to Kerberos for verification. A Kerberos 5 name with an embedded period therefore cannot be used directly without disabling the translation; but with the translation disabled, you cannot easily use Kerberos 5 names with multiple components. There is ongoing work in this area because rxgk requires proper support for these names.)

In general, the translate_et utility can be used to find out what an AFS error number means. This only works for AFS errors; some utilities may also report Kerberos errors in this way, and translate_et will not work for these. Some sites have alternative utilities that understand Kerberos as well as AFS errors (see for example /afs/sinenomine.net/user/ballbery/public/translate_err).
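The table name that translate_et prints is encoded in the error number itself, following the usual com_err layout: the low 8 bits are the index of the error within its table, and the next 24 bits hold up to four table-name characters of 6 bits each, drawn from A-Z, a-z, 0-9 and _. The sketch below (et_decode is a hypothetical name) recovers only the table and index, not the message text, and lowercases the table name to match the translate_et output above. Assuming 64-bit shell arithmetic, it should also handle negative Kerberos codes such as the -1765328370 from question 3.43.

```sh
# Hypothetical helper: split a com_err error code into "table.index".
et_decode() {
    code=$1
    # 6-bit character values 1..63 index this set; 0 means "no character".
    charset='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_'
    # Bits 8-31 hold the table name: up to four 6-bit characters.
    num=$(( ($code >> 8) & 16777215 ))
    name=''
    i=3
    while [ "$i" -ge 0 ]; do
        ch=$(( ($num >> (6 * $i)) & 63 ))
        if [ "$ch" -ne 0 ]; then
            # ch is 1-based, matching cut's 1-based character positions.
            name=$name$(printf '%s' "$charset" | cut -c "$ch")
        fi
        i=$(( $i - 1 ))
    done
    name=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
    # The low 8 bits are the error's index within its table.
    echo "$name.$(( $code & 255 ))"
}

# Example:
#   et_decode 19270407       # prints rxk.7
#   et_decode -1765328370    # prints krb5.14
```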

3.33 I have tickets and tokens, but still get Permission denied for some operations.

This can be caused by the dotted-principal problem described in question 3.32, or by not being listed in a server UserList (/usr/afs/etc/UserList or /etc/openafs/server/UserList).

Also beware that, as described above, UserList accepts only Kerberos 4 name syntax: use joe.admin instead of joe/admin. See https://lists.openafs.org/pipermail/openafs-devel/2002-December/008673.html and the rest of the thread.
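The name translation described above can be sketched as follows; k5_to_k4_name is a hypothetical helper, not an OpenAFS tool, and it simply drops the realm (assuming a principal in the local realm). Note that my/name and my.name both come out as my.name, which is exactly the ambiguity that makes dotted principals troublesome.

```sh
# Hypothetical helper: turn a Kerberos 5 principal such as joe/admin@REALM
# into the Kerberos 4-style joe.admin form used in UserList.
# Assumption: the principal is in the local realm, so the realm is dropped.
k5_to_k4_name() {
    princ=${1%%@*}                   # strip "@REALM", if present
    printf '%s\n' "$princ" | tr '/' '.'
}

# Example:
#   k5_to_k4_name joe/admin@EXAMPLE.COM   # prints joe.admin
#   k5_to_k4_name my.name@EXAMPLE.COM     # prints my.name
#   k5_to_k4_name my/name@EXAMPLE.COM     # also prints my.name
```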

3.34 Recovering broken AFS cache on clients

>> Does anyone have a trick to force AFS to refresh its cache (for a
>> particular directory or even for all files?) The only way I know
>> how to accomplish this is to reboot, stop in single user mode,
>> rm -rf the cache files and let AFS rebuild everything.
>
> fs flush and fs flushv have cured corruption problems in the past
> on some of our clients.

Thanks for the tip - I was not aware of the flush* subcommands.
Here's a little of what I saw today:

ls -la
/bin/ls: asso.S14Q00246.all.log: Bad address
/bin/ls: asso.S14Q00246.all.lst: Bad address
/bin/ls: chr14markers.txt: Bad address
/bin/ls: geno.summary.txt: Bad address
/bin/ls: global.ind.S14Q00246.all.txt: No such device
/bin/ls: global.S14Q00246.all.txt: No such device
total 103
[ other ls results as usual ]

Flushing a particular file had no effect (the same error as shown above appears). Flushvolume took a long time, but when it eventually completed, the ls -la behaved exactly as one would expect.

Recent OpenAFS (1.6.4 and newer) has an fs flushall command in addition to the flush and flushvol commands.

Older AFS versions sometimes corrupted their cache filesystems in ways that fs flushvol cannot fix. Sometimes this can be corrected with

root@toontown ~# fs setca 1; fs setca 0

(set the cache to minimum size, and then back to normal; beware that in most versions of Transarc AFS, you will have to specify the actual cache size instead of 0!). If this does not work, you can force a cache rebuild by shutting down OpenAFS and removing /var/vice/cache/CacheItems (it is not necessary to remove all cache files), although you may want to remake the filesystem (mkfs or newfs) instead in case there is actual filesystem corruption.
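If your cache is not in /var/vice/cache, the cacheinfo file (the one afsd parses in question 3.31) tells you where it is: it holds a single line of the form mountdir:cachedir:blocks. A hypothetical helper to extract the cache directory, assuming the Transarc path /usr/vice/etc/cacheinfo by default:

```sh
# Hypothetical helper: print the cache directory from a cacheinfo file,
# whose single line looks like /afs:/usr/vice/cache:50000.
cacheinfo_dir() {
    # mountdir and blocks are read only to consume the other fields.
    IFS=: read -r mountdir cachedir blocks < "${1:-/usr/vice/etc/cacheinfo}"
    printf '%s\n' "$cachedir"
}

# Example, with OpenAFS shut down on the client:
#   rm -f "$(cacheinfo_dir)/CacheItems"
```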

If this happens regularly, please file an OpenAFS bug.

3.35 What does it mean for a volume to not be in the VLDB?

If a volume is not in the VLDB, you will be unable to perform operations on it using its name; all "vos" operations will need to be done using its numerical id, server, and partition. Furthermore, if a volume is not in the VLDB, it cannot be reached via mountpoints.

3.36 What is a Volume Group?

You can think of a Volume Group as an RW volume, and all of the clones of that RW (its RO clone, BK clone, and any other clones). All of the volumes in a Volume Group on the same fileserver can share storage for data that is the same between all of them. This is why, for example, an RO clone usually takes up very little disk space; since an RW and its RO clone are in the same Volume Group, they can share storage for unchanged data.

All of the volumes in a group usually have very similar volume ID numbers. For example, if an RW volume has ID 536870915, its RO clone will typically be 536870916. However, this is not required, as volume ID numbers can be almost anything. You can even manually specify what volume ID number you want for a volume when you create the volume with "vos create".

Currently, you can only have about 8 volumes in a Volume Group. However, this limitation is due to technical details of the fileserver "namei" disk backend. If that backend is improved in the future, or if different backends are developed, the number of volumes in a volume group could be much greater.

Some low-level documentation may refer to a "volume group ID". This is always the same as the RW volume ID.
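Since the volume group ID is the RW ID and clones usually get nearby IDs, the typical numbering can be written down directly. This sketch assumes the usual, but not guaranteed, pattern where the RO clone is RW + 1 (as in the example above) and the backup clone is RW + 2; always check the real IDs with vos examine. The helper name is hypothetical.

```sh
# Hypothetical helper: print the IDs a volume group usually has, assuming
# the common (not guaranteed) pattern RO = RW + 1 and BK = RW + 2.
typical_group_ids() {
    rw=$1
    # The volume group ID is always the RW volume ID.
    echo "group=$rw rw=$rw ro=$(( $rw + 1 )) bk=$(( $rw + 2 ))"
}

# Example (compare with the real IDs shown by "vos examine"):
#   typical_group_ids 536870915
#   # prints group=536870915 rw=536870915 ro=536870916 bk=536870917
```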

3.37 What is a Clone?

A clone of a volume is a read-only copy of that volume which shares on-disk storage with the original volume. Backup volumes are a particular kind of clone volume. Read-only replicas which reside on the same partition as their read-write volume are another particular kind of clone volume. In some other storage systems this kind of volume is called a "snapshot".

Clone volumes must belong to the same volume group (see previous question) as the volume which they are a clone of.

In addition to backup and readonly clones, you may create up to three additional clones of a volume. To do this, use "vos clone".

When you "vos remove" a volume, its "backup" clone will also be removed automatically. However, clones created with "vos clone" are not removed automatically. Unfortunately, these "dangling clones" will no longer be in the VLDB (see above). They belong to a volume group whose leader (RW volume) no longer exists, which is a somewhat undefined state for AFS. Such volumes should be manually deleted as soon as possible.

3.38 What is a Shadow?

A shadow of a volume is a duplicate of that volume which resides on a different partition. Shadow volumes do not share storage with their original volumes (unlike clones). A readonly volume on a different partition than its readwrite volume could be considered one particular example of a shadow volume; however, in practice the term "shadow volume" is used to refer to volumes created with "vos shadow" and not to refer to readonly volumes.

A shadow of any readwrite volume may be created using the "vos shadow" command. This will create a new volume which is a shadow of the original volume, and will copy the contents of the old volume to the new volume. This will also set a bit in the header of the new volume that identifies it as a shadow volume. Shadow volumes do not appear in the VLDB (see above) -- "vos shadow" does not create a VLDB entry and "vos syncvldb" ignores shadow volumes.

You can "refresh" a shadow volume from its original with "vos shadow -incremental"; vos shadow then copies only data which has changed, so it is very efficient. This operation first checks that the target of the operation is a shadow volume, to prevent the administrator from accidentally corrupting a non-shadow volume. However, if you refresh the shadow of one volume from a different readwrite volume, the shadow will be corrupted and will have to be deleted.

You can remove the shadow bit from a volume's header with "vos syncvldb -force". This will remove the shadow bit and create a VLDB entry for the volume, deleting any previous entry for the RW volume. However, the RW volume itself will not be deleted; it will simply exist without a VLDB entry.

Attempting to create shadows of two different RW volumes on the same partition with the same name is prohibited by the volserver. Technically it is possible to create two shadow volumes with the same name on different partitions; however, this is not advisable and may lead to undefined behavior.

(Some AFS administrators may refer to an RO clone of an RW volume on the same server/partition as a "shadow"; this terminology predates the existence of shadow volumes and should be avoided.)

3.39 Can I authenticate to my AFS cell using multiple Kerberos realms?

Yes. This can be useful if your organization has multiple Kerberos realms with identical user entries: For example you might have an MIT Kerberos realm for Unix-like systems, and an Active Directory domain for Windows with synchronized accounts.

In order to make this work, you need to do 4 things.

  1. Add a key for the afs service to the additional realm and store it in a keytab:

    $ kadmin -q "ank -e des-cbc-crc:v4 -kvno <new kvno> afs/your.cell.name@YOUR.SECOND.REALM.NAME"
    $ kadmin -q "ktadd -e des-cbc-crc:v4 -k /path/to/afs.keytab afs/your.cell.name@YOUR.SECOND.REALM.NAME"
    

    Note that you must specify a kvno for the key that differs from the kvno of your existing key(s) in the original realm. You can check the kvno of the existing keys by running "asetkey list" on one of your servers. Since keys must be in ascending order in the AFS KeyFile, it is easiest to make the new kvno higher than any existing key's kvno.

    It's also worth noting that the process of adding the key to a keytab (at least with MIT krb5) actually creates a new key first, so your kvno will end up being higher than what you specified when you added the principal. You can check on the current kvno by using the command "kadmin -q getprinc afs/your.cell.name@YOUR.SECOND.REALM.NAME".

  2. Add the new key to the KeyFile on your AFS servers:

    $ asetkey add <kvno> /path/to/afs.keytab afs/your.cell.name@YOUR.SECOND.REALM.NAME
    

    Note that the kvno here needs to be the same one as is reported by the kadmin getprinc command.

  3. Create an AFS krb.conf with your additional realm's name in it, and place it on all of your AFS servers; see question 3.29.

  4. Restart your AFS servers.
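Steps 1 and 2 both hinge on choosing a kvno above every existing one. A trivial sketch of that rule (next_kvno is hypothetical; feed it the kvnos you read off "asetkey list"). Remember that ktadd may bump the kvno again, so the value you finally pass to asetkey add must be the one kadmin getprinc reports.

```sh
# Hypothetical helper: print a kvno one higher than the largest given.
next_kvno() {
    max=0
    for k in "$@"; do
        if [ "$k" -gt "$max" ]; then
            max=$k
        fi
    done
    echo $(( $max + 1 ))
}

# Example, if "asetkey list" showed keys with kvnos 3, 4 and 5:
#   next_kvno 3 4 5    # prints 6
```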

At this point you should be able to run:

kinit you@YOUR.SECOND.REALM.NAME
aklog

and receive the same privileges as if you had run:

kinit you@YOUR.CELL.NAME
aklog

3.40 How can I ensure that the userids on client machines match the users' pts ids?

You can use libnss-afs for this.

3.41 What is Fast Restart?

When compiled with --enable-fast-restart, the file server will start up immediately, without first salvaging any volumes which cannot be attached.

Disadvantages of Fast Restart include:

  1. Volumes in need of salvage remain offline until an administrator intervenes manually
  2. On an inode-based fileserver, salvaging a single volume crawls every inode; therefore, salvaging volumes individually (rather than partition-at-a-time) duplicates work.

In OpenAFS 1.6 there is a demand attach fileserver which provides even faster restart while reducing the drawbacks; you should use it instead.

3.42 Why does AFS restart itself spontaneously at 4:00am every Sunday?

This became the default behavior back when OpenAFS servers had problems with leaking memory and other resources. These days, it is generally considered unnecessary.

You can disable this behavior with:

bos setrestart -server $SERVER_NAME -time never -general
bos setrestart -server $SERVER_NAME -time never -newbinary

Newer versions of OpenAFS do not enable this by default.

3.43 Why do I get an error -1765328370 when authenticating?

tweety@toontown ~$ translate_err -1765328370
krb5 error -1765328370 = KRB5KDC_ERR_ETYPE_NOSUPP

(See question 3.32 for translate_err.) Usually this means that your KDC has support for des-cbc-crc and other weaker encryption types turned off. Re-enable support for DES encryption types and you will get further.

Check /etc/krb5.conf and make sure it has something like the following in it:

[libdefaults]
allow_weak_crypto = true

Also check kdc.conf (possibly located in /var/kerberos/krb5kdc; check the documentation for your Kerberos packages) and make sure that des-cbc-crc:normal is in the supported_enctypes list.
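A rough way to check the client-side setting. weak_crypto_allowed is a hypothetical helper and only a text match, not a real krb5.conf parser: it looks for allow_weak_crypto = true (case-insensitively) inside the [libdefaults] section and does not recognise other boolean spellings Kerberos may accept.

```sh
# Hypothetical helper: succeed if FILE (default /etc/krb5.conf) sets
# "allow_weak_crypto = true" inside its [libdefaults] section.
weak_crypto_allowed() {
    awk '
        { line = tolower($0) }
        # Any "[section]" header switches sections.
        line ~ /^[ \t]*\[/ { in_lib = (line ~ /^[ \t]*\[libdefaults\]/) }
        in_lib && line ~ /^[ \t]*allow_weak_crypto[ \t]*=[ \t]*true/ { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "${1:-/etc/krb5.conf}"
}

# Example:
#   weak_crypto_allowed || echo "allow_weak_crypto is not enabled" >&2
```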

There is ongoing work to remove the need for DES enctypes.