Hack 60 Using CVS to Manage Data on Multiple Machines

Work with your data wherever you are without fear of getting out of sync with your home machine.

Here's the scenario: instead of just using one Mac, you regularly use two (a desktop and a laptop) and would like to keep up-to-date copies of all your data on all of your machines. After all, when working at home, you want to take advantage of the large monitor and dual processors of a desktop Power Mac, and when you are on the road, you want all the portability of an iBook or a PowerBook. Most solutions to this problem are haphazard and error prone.

However, a tool that software developers use can help you. It's called CVS. And with it you can work with all your data no matter where you are.

60.1 What Is CVS?

CVS is an open source tool that provides version control. Version control is the practice of maintaining information about a project's development by tracking changes and coordinating the development efforts of many programmers. CVS uses a centralized repository (sometimes called an archive or a depot) to store all the information about each and every file, as well as every change to those files, contained in a project. These kinds of systems are used in projects small and large, including the development of operating systems like Mac OS X.

Each developer on the project has a copy of these files on her own machine. As a developer makes changes, she commits them back into the central repository, giving the other developers access to the latest code. This allows many people to cooperate on the same source files with a minimum of fuss. If two developers change the same file at the same time, CVS refuses the second commit until the second developer updates her copy and resolves any conflict. Usually, these conflicts are dealt with easily, and development proceeds.

CVS supports all sorts of additional operations that are useful to large teams. However, for our purposes (which are much less demanding than software development), we can take the functionality that we've just described and use it to solve the problem of managing our own data on multiple machines. Even if you are the only person to use your data, CVS can help you maintain it easily on as many machines as your bank account can fund.

CVS comes as part of Mac OS X's Developer Tools. In order to use it, you'll need to install the Mac OS X Developer Tools [Hack #55].

60.2 Using CVS

So, how should we use CVS for the purpose of replicating our data on multiple machines? First, you need to identify a machine that can serve as the repository. If you have two machines, such as an iBook and a Power Mac, then you should use the Power Mac as your repository. If you are lucky enough to have a third machine that you use as a server for other purposes (maybe you are hosting your own domain or web site), then you should probably use that machine to store your repository.

Once the repository is set up, you can access it from the machine on which you set it up or from other machines. The first case, where both the repository and the working copy of your files are located on the same machine, is an example of local usage. The second case (for example, when you check out your files onto your iBook) is called remote usage. In both situations, you use the same set of CVS commands, but you have to do a bit more setup work for the remote case.

60.3 Creating the Repository

Once you've decided on which machine to place the repository, you have to pick where on that machine you want your repository to live. You want to make sure it's in a location that you'll remember easily later. For my setup, I use the /Library/Depot directory. Once you've decided where you want it, create the directory and then initialize your repository with the following commands:

[Mercury ~] duncan% mkdir /Library/Depot
[Mercury ~] duncan% cvs -d /Library/Depot init

The -d argument lets CVS know where the repository is located. init tells CVS to initialize the directory as a new repository. This blesses the directory as a CVS repository and installs a copy of the files that will control how it works.
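As a quick sanity check after running init, you can confirm that the directory now contains the CVSROOT administrative directory. The following is a minimal sketch in Bourne-style shell; the function name is_cvs_repository is made up for illustration and is not part of CVS.

```shell
# Report whether a directory looks like an initialized CVS repository.
# "cvs init" creates a CVSROOT subdirectory that holds the
# administrative files, so its presence is used as the test here.
# (is_cvs_repository is a hypothetical helper, not a CVS command.)
is_cvs_repository() {
    [ -d "$1/CVSROOT" ]
}

# Usage, with the path from the text above:
#   is_cvs_repository /Library/Depot && echo "repository is ready"
```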

60.4 The First Checkout

To make sure all is well, we are going to perform an initial checkout of the repository. To do so, make an empty directory (on the same machine as the repository) and execute the following command in that directory:

[Mercury ~/tmp] duncan% cvs -d /Library/Depot checkout .

Once again, the -d argument lets CVS know the directory in which the repository is located. The checkout . (don't forget the dot) tells CVS to check out a copy of everything in the repository. You should see the following output from CVS:

cvs checkout: Updating .
cvs checkout: Updating CVSROOT
U CVSROOT/checkoutlist
U CVSROOT/commitinfo
U CVSROOT/config
U CVSROOT/cvswrappers
U CVSROOT/editinfo
U CVSROOT/loginfo
U CVSROOT/modules
U CVSROOT/notify
U CVSROOT/rcsinfo
U CVSROOT/taginfo
U CVSROOT/verifymsg

The files that were checked out are the administration files. By editing, and then checking these files back in, we can change how CVS works. Mostly, we will want to leave these alone for our use, but there is one file that we will need to modify.

60.5 Identifying Binary Files

In addition to several quirks, CVS has one major irritation: it wants to treat all files as text files and can't, by itself, tell the difference between text and binary. It wants to treat all files as text because then it can save space in the repository by storing only the difference between revisions. For HTML files, this is great. However, for binary files that we work with all the time, such as Microsoft Word files (.doc) or Excel files (.xls), this strategy falls on its face: text-oriented processing such as keyword expansion and line-ending conversion can make a mess of your data.

To fix this, edit the CVSROOT/cvswrappers file to look like this:

# This file affects handling of files based on their names.
#
# The -t/-f options allow one to treat directories of files
# as a single file, or to transform a file in other ways on
# its way in and out of CVS.
#
# The -m option specifies whether CVS attempts to merge files.
#
# The -k option specifies keyword expansion (e.g., -kb for binary).
#
# Format of wrapper file ($CVSROOT/CVSROOT/cvswrappers or .cvswrappers)
#
# wildcard [option value][option value]...
#
# where option is one of
# -f from cvs filter value: path to filter
# -t to cvs filter value: path to filter
# -m update methodology value: MERGE or COPY
# -k expansion mode value: b, o, kkv, &c
#
# and value is a single-quote delimited value.
# For example: 

# binary files

*.ai -k 'b'
*.doc -k 'b'
*.bmp -k 'b'
*.class -k 'b'
*.classes -k 'b'
*.dmg -k 'b'
*.eps -k 'b'
*.gif -k 'b'
*.gz -k 'b'
*.GZ -k 'b'
*.icns -k 'b'
*.jar -k 'b'
*.jpg -k 'b'
*.jpeg -k 'b'
*.nib -k 'b'
*.ofile -k 'b'
*.pdf -k 'b'
*.png -k 'b'
*.ppm -k 'b'
*.ppt -k 'b'
*.pqg -k 'b'
*.prj -k 'b'
*.ps -k 'b'
*.psd -k 'b'
*.sl -k 'b'
*.strings -k 'b'
*.tif -k 'b'
*.tiff -k 'b'
*.ttf -k 'b'
*.xls -k 'b'
*.Z -k 'b'
*.zip -k 'b'

This is not an exhaustive list, but it serves as the day-to-day list that I use in my repository. Make sure that any binary files that you plan on putting in your repository are on this list.
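One way to check the list against your own data before adding it is to scan a directory for files whose extension has no -k 'b' entry. The following is a rough sketch in Bourne-style shell; list_unwrapped is a hypothetical helper name, and both paths are supplied by you. It will also list genuine text files (such as .html), so treat the output as a prompt for review rather than a verdict.

```shell
# Print files under a directory whose extension has no "-k 'b'" entry
# in a cvswrappers file.
#   $1 = path to a cvswrappers file, $2 = directory to scan
# (list_unwrapped is a hypothetical helper; variables are global
# because POSIX sh has no "local".)
list_unwrapped() {
    wrappers=$1
    dir=$2
    # Only files with a dot in their name have an extension to check.
    find "$dir" -type f -name '*.*' | while IFS= read -r file; do
        ext=${file##*.}
        # Succeeds if a line reads: *.<ext> -k 'b'
        if ! awk -v pat="*.$ext" -v q="'" \
            '$1 == pat && $2 == "-k" && $3 == (q "b" q) { found = 1 }
             END { exit !found }' "$wrappers"; then
            printf '%s\n' "$file"
        fi
    done
}

# Usage:
#   list_unwrapped CVSROOT/cvswrappers ~/Pictures
```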

Once you have edited the file, you need to check it back in. To do this, issue the following command:

[Mercury ~/tmp] duncan% cvs commit -m "Sync"

This tells CVS to commit our changes back to the repository. The -m argument is the commit message that will be kept in the repository. When you execute this command, you should see the following output:

cvs commit: Examining .
cvs commit: Examining CVSROOT
Checking in CVSROOT/cvswrappers;
/Library/Depot/CVSROOT/cvswrappers,v <-- cvswrappers
new revision: 1.2; previous revision: 1.1
done
cvs commit: Rebuilding administrative file database

This output will tell you each and every action that is taken by CVS. In this case, it notices that we've modified one of the configuration files and rebuilds its administrative database.

You might notice that we didn't use the -d argument to CVS this time. We need to tell CVS where the repository is only if we haven't checked it out yet into the directory that we are working in. Once checked out, CVS leaves itself enough information to figure things out.
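The information CVS leaves behind lives in a CVS subdirectory inside each checked-out directory; its Root file records the repository location given at checkout. As a small sketch (show_cvs_root is a hypothetical helper name), you could print it like this:

```shell
# Print the repository location recorded in a working copy.
#   $1 = working directory to inspect (defaults to the current one)
# (show_cvs_root is a hypothetical helper, not a CVS command.)
show_cvs_root() {
    dir=${1:-.}
    if [ -f "$dir/CVS/Root" ]; then
        cat "$dir/CVS/Root"
    else
        echo "no CVS/Root in $dir; pass -d to cvs explicitly" >&2
        return 1
    fi
}

# Usage, from inside a checked-out directory:
#   show_cvs_root
```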

60.6 Checking Out on Remote Machines

To check out a repository on other machines, we are going to use the ability to run CVS over SSH. This requires two things:

The SSH server is up and running [Hack #71] on the machine that the repository is located on.
The CVS_RSH environment variable is set [Hack #52] on the client machine that we are going to check out the repository onto.

There are a few different ways you can satisfy the second requirement. You can set the environment variable on the command line with the setenv command. To do this, simply execute the following line:

[Titanium ~/tmp] duncan% setenv CVS_RSH ssh
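The setenv line above is tcsh syntax. If you happen to use a Bourne-style shell such as sh or bash instead, a rough equivalent would be the following; this is an aside for those shells, not something the setup in this hack requires.

```shell
# Bourne-style equivalent of "setenv CVS_RSH ssh":
# assign the variable, then export it so that cvs inherits it.
CVS_RSH=ssh
export CVS_RSH
```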

Of course, this will soon become annoying, as you'll always have to remember to execute this command. You could always set it in your ~/.tcshrc file, but the better option is to set it in your ~/.MacOSX/environment.plist file. This will make sure that it is set for every application that runs, allowing programs that have built-in CVS integration, such as Project Builder, to use your repository seamlessly. All you need to do is create the ~/.MacOSX directory (if it doesn't exist) and save the following as your environment.plist file:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist SYSTEM "file://localhost/System/Library/DTDs/PropertyList.dtd">
<plist version="0.9">
<dict>
<key>CVS_RSH</key>
<string>/usr/bin/ssh</string>
</dict>
</plist>

This is by far the best solution, although you'll need to log out of your machine and back in for it to take effect.

Once you've done this, you're ready to check out the repository. To do so, we're going to use a variant of the earlier cvs checkout command that tells CVS the repository is located on a different machine. This command takes the form cvs -d :ext:[user]@[machine]:[repository directory] checkout . (with the trailing dot). On my machine, I execute the following:

[Titanium ~/tmp] duncan% cvs -d :ext:duncan@Mercury.local:/Library/Depot checkout .

Once again, don't forget the dot at the end! If this is the first time that you've used SSH between your machines, you'll see some output asking if you are sure you want to connect. You will then be challenged for your password for the machine containing the repository. After that, the files will be checked out as before.
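If you check out onto several machines, it can help to assemble the -d value from its three parts rather than retyping it. Here is a minimal sketch in Bourne-style shell; ext_root is a hypothetical helper name, and the user, machine, and directory are the ones used in this hack.

```shell
# Build a remote repository location of the form
#   :ext:[user]@[machine]:[repository directory]
#   $1 = user, $2 = machine, $3 = repository directory
# (ext_root is a hypothetical helper, not a CVS command.)
ext_root() {
    printf ':ext:%s@%s:%s\n' "$1" "$2" "$3"
}

# Usage:
#   cvs -d "$(ext_root duncan Mercury.local /Library/Depot)" checkout .
```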

There is another way to access a CVS repository remotely (called pserver access), but it is more difficult to set up and not as secure for our purposes. If you'd like to set up a CVS pserver, consult a good CVS book (see the See Also section later in this hack).

60.7 Day-to-Day Use

Now that we've successfully checked out the repository onto two machines, we're ready to start using CVS for our files. The rest of this hack will give you the basic commands you need to work with your new repository.

60.7.1 Adding files

Let's say that we want to keep some pictures in the repository. To do so, we create a Pictures subdirectory in our checked-out copy of the repository, copy the images into it, and then add the files to CVS. The following commands illustrate how we do that:

[Mercury:~/tmp] duncan% mkdir Pictures
[Mercury:~/tmp] duncan% cp ~/Pictures/me.jpg Pictures/me.jpg
[Mercury:~/tmp] duncan% cvs add Pictures
Directory /Library/Depot/Pictures added to the repository
[Mercury:~/tmp] duncan% cvs add Pictures/me.jpg
cvs add: scheduling file 'Pictures/me.jpg' for addition
cvs add: use 'cvs commit' to add this file permanently
[Mercury:~/tmp] duncan% cvs commit -m "Sync"
cvs commit: Examining .
cvs commit: Examining CVSROOT
cvs commit: Examining Pictures
RCS file: /Library/Depot/Pictures/me.jpg,v
done
Checking in Pictures/me.jpg;
/Library/Depot/Pictures/me.jpg,v <-- me.jpg
initial revision: 1.1
done

To check out the file onto the other machine, we issue the cvs update command as follows:

[Titanium:~/tmp] duncan% cvs update -d

The -d option to the update command tells CVS to check out any subdirectories that were added since the last time we performed an update. You should see the following output:

cvs update: Updating .
cvs update: Updating CVSROOT
cvs update: Updating Pictures
U Pictures/me.jpg

Voila! Your data is now mirrored and updated between multiple machines. Anything you add to one machine will appear on other machines. All you need to remember to do is to add files to the repository, commit any changes you make, and regularly run the cvs update -d command.
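The update-then-commit routine can be wrapped in a small function so that it is harder to forget a step. The following is a sketch under assumptions: sync_working_copy is a made-up name, the default message "Sync" simply mirrors the examples above, and new files still have to be added with cvs add first.

```shell
# Bring a working copy up to date, then commit local changes.
#   $1 = working copy directory (defaults to the current one)
#   $2 = commit message (defaults to "Sync", as in the examples above)
# (sync_working_copy is a hypothetical helper, not a CVS command.)
sync_working_copy() {
    dir=${1:-.}
    message=${2:-Sync}
    (
        # The subshell keeps the caller's current directory unchanged.
        cd "$dir" || exit 1
        # Update first: CVS will not commit a file that is out of date.
        cvs update -d || exit 1
        cvs commit -m "$message"
    )
}

# Usage:
#   sync_working_copy ~/tmp
```

Updating before committing means any changes made on the other machine are merged into your copy first; if the update reports a conflict, resolve it by hand and run the function again.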

60.7.2 Removing files

Occasionally, you'll want to remove a file from the repository. To do so, simply remove the file from your local copy, then issue a cvs delete command (delete is a synonym for cvs remove, which is why the output refers to cvs remove). Here's an example:

[Mercury:~/tmp] duncan% rm Pictures/me.jpg
[Mercury:~/tmp] duncan% cvs delete Pictures/me.jpg
cvs remove: scheduling 'Pictures/me.jpg' for removal
cvs remove: use 'cvs commit' to remove this file permanently
[Mercury:~/tmp] duncan% cvs commit -m "Sync"
cvs commit: Examining .
cvs commit: Examining CVSROOT
cvs commit: Examining Pictures
Removing Pictures/me.jpg;
/Library/Depot/Pictures/me.jpg,v <-- me.jpg
new revision: delete; previous revision: 1.1
done

Moving files is a pain with CVS. There is no cvs move command, so you have to delete the file from where it was and add it to wherever else you want it to be.
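The delete-and-add sequence for a move can be sketched as a small Bourne-style shell function. The name cvs_move is hypothetical, the sketch assumes the destination directory has already been added to CVS, and the file's revision history does not follow it to the new name.

```shell
# Move a file within a working copy by removing it from CVS under the
# old name and adding it under the new one.
#   $1 = existing path, $2 = new path
# (cvs_move is a hypothetical helper; CVS itself has no move command.)
cvs_move() {
    old=$1
    new=$2
    mv "$old" "$new" || return 1
    # The old file is gone from disk, so it can be scheduled for removal.
    cvs remove "$old" || return 1
    # Schedule the file for addition under its new name.
    cvs add "$new"
}

# Usage, followed by a commit to make the change permanent:
#   cvs_move Pictures/me.jpg Pictures/duncan.jpg
#   cvs commit -m "Sync"
```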

60.8 See Also

This hack gets you started with using CVS to manage your data. However, at some point you'll probably want to dig deeper into what CVS can do. The following resources can be of help:

CVS Pocket Reference (http://www.oreilly.com/catalog/cvspr/) by Gregor N. Purdy (O'Reilly). This small and affordable guide gives you the complete list of CVS commands and options to those commands.
The CVS web site (http://www.cvshome.org/) contains the source code for CVS, FAQs, and the 184-page official user manual for CVS by Per Cederqvist et al.

- James Duncan Davidson