Joe St Sauver, Ph.D.
Director, User Services and Network Applications
joe@uoregon.edu
One of the special features of the new NetApp filers is their ability to take periodic read-only "snapshots" of the filers' contents, thereby providing protection against accidental file loss. While we've always done backups to magnetic tapes stored locally and offsite (and we continue to do so now), those mag tape backups were only taken once a day, and were really meant as insurance against catastrophic system failure rather than as a convenient "safety net" capable of easily handling backup and restoration of individual user files.
Reflecting that intent--as well as the staff time required to consult with users to determine what was lost, plus the time to retrieve, mount, and scan the tapes and then restore the files to disk--charges for file restoration from system mag tape backups have always been steep.
Moreover, retrieving a file from magnetic tape could potentially be delayed for days at a time, depending on when a given file happened to be lost. For example, a file that was accidentally deleted late on a Friday would generally not be able to be restored until Monday morning at the earliest.
The backup situation on Darkwing has improved significantly with our new filers' ability to take snapshots. Snapshots are fast, painless, and free--a great safety net we're happy to make available for your use on Darkwing. Now you can "go back in time" and retrieve copies of your files exactly as they appeared hours or days earlier. You can also recover from accidental file deletions, botched editing sessions, and most other routine file-related misadventures, unaided and at your own convenience.[1]
Historical snapshots of your files are stored in a read-only .snapshot ("dot snapshot") subdirectory in your account, located immediately under your home directory. If you'd like to see what that subdirectory looks like, log into Darkwing using ssh[2], then type:
% cd $HOME/.snapshot <-- change to the .snapshot directory
% ls -la <-- see what files/directories are there
You'll notice that the .snapshot subdirectory contains a series of additional subdirectories:
- hourly.0, hourly.1, hourly.2 ... through hourly.26, ranging from the most recently taken snapshot (hourly.0) back to a snapshot that was taken 26 hours ago (hourly.26)
- nightly.0, the most recent nightly snapshot, through nightly.29
- weekly.0, the most recent weekly snapshot, through weekly.3
When it's time for a new snapshot to be taken, each of the existing directories gets rotated, meaning each directory is renamed and moved. For example, when a new hourly snapshot is taken:
- the oldest hourly snapshot (hourly.26) goes away
- hourly.25 becomes the new hourly.26
- hourly.24 becomes the new hourly.25, and so forth, down to hourly.0, which becomes the new hourly.1
- the snapshot just taken becomes the new hourly.0
A similar process happens for the nightly snapshots each night, and for the weekly snapshots each week.
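The hourly rotation can be sketched as a small POSIX sh function. This is only a simulation meant to run in a scratch directory you create yourself: the real .snapshot directories are read-only and the filer performs the rotation internally, so the function name rotate_hourly and its arguments are hypothetical. The 27 slots (hourly.0 through hourly.26) follow the schedule described above. One detail the sketch makes visible is that the renames must run from the oldest slot downward; renaming hourly.0 first would move it inside the still-existing hourly.1 rather than replacing it.

```shell
#!/bin/sh
# Hypothetical simulation of an hourly snapshot rotation, run in a
# scratch directory (never in $HOME/.snapshot, which is read-only).
# Assumes 27 hourly slots: hourly.0 (newest) .. hourly.26 (oldest).

# rotate_hourly DIR NEWSNAP
#   DIR     - scratch directory holding hourly.N subdirectories
#   NEWSNAP - directory holding the freshly taken snapshot
rotate_hourly() {
    dir=$1
    newsnap=$2
    max=26

    # The oldest hourly snapshot goes away.
    rm -rf "$dir/hourly.$max"

    # Shift each remaining snapshot up one slot, starting with the oldest,
    # so no rename lands on a directory that has not been moved yet.
    i=$((max - 1))
    while [ "$i" -ge 0 ]; do
        if [ -d "$dir/hourly.$i" ]; then
            mv "$dir/hourly.$i" "$dir/hourly.$((i + 1))"
        fi
        i=$((i - 1))
    done

    # The newly taken snapshot becomes hourly.0.
    mv "$newsnap" "$dir/hourly.0"
}
```

The nightly and weekly series would rotate the same way, with 29 and 3 as the highest slot numbers.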
NetApp makes snapshots in a very nimble and efficient way, taking advantage of the fact that while you may have hundreds or thousands of files in your account, most of them don't change very often. This allows the filer to simply store pointers to content, plus a relatively modest set of file changes, rather than physically replicating all your files on a character-for-character basis every time a snapshot is taken. (For more information about how snapshots work, see "Snapshot™ Technology," http://www.netapp.com/products/software/snapshot.html)
Assume that over the course of a week or so you created a data file on Darkwing called mydata.txt with several thousand observations, entering a few hundred observations per day. While doing final editing on that dataset, you accidentally made a catastrophic mistake, one that ruined hundreds of observations. Moreover, assume you failed to notice that mistake before you saved the file (although it became obvious as soon as you began your analyses). What to do?
If mydata.txt was okay as of the snapshot taken an hour or so ago, you can simply restore a clean copy of the file from that snapshot. (To avoid confusion, let's call the restored version of the file mydata2.txt.)
Begin by logging into Darkwing securely via ssh. Then type:
% cd $HOME/.snapshot/hourly.0 <-- change to the snapshot directory
% cp mydata.txt $HOME/mydata2.txt <-- copy the snapshot's version of the file to your home directory
% cd $HOME <-- change to your home directory and begin using the recovered file
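If you restore files often, the copy step can be wrapped in a small guard that refuses to overwrite an existing file, in the same spirit as restoring to a new name such as mydata2.txt. This is a minimal sketch in POSIX sh; restore_from_snapshot is a hypothetical helper, not a command provided on Darkwing, and it assumes FILE is given relative to your home directory, so that its snapshot copy sits at $HOME/.snapshot/SNAPSHOT/FILE as in the mydata.txt example.

```shell
#!/bin/sh
# restore_from_snapshot FILE SNAPSHOT DEST   (hypothetical helper)
# Copy FILE (a path relative to $HOME) out of $HOME/.snapshot/SNAPSHOT
# to DEST, refusing to overwrite anything already at DEST.
restore_from_snapshot() {
    src="$HOME/.snapshot/$2/$1"
    dest=$3

    if [ ! -f "$src" ]; then
        echo "no copy of $1 in snapshot $2" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "refusing to overwrite $dest" >&2
        return 1
    fi
    # -p keeps the snapshot copy's modification time and permissions.
    cp -p "$src" "$dest"
}
```

For the example above, restore_from_snapshot mydata.txt hourly.0 "$HOME/mydata2.txt" recovers the clean copy, after which diff "$HOME/mydata2.txt" "$HOME/mydata.txt" shows what the bad edit changed. Because the function uses sh syntax, run it from sh or bash rather than from csh or tcsh, which have no shell functions.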
Need an earlier version? Restore the file from hourly.1 or hourly.2 (or from nightly.0, nightly.1, etc.) instead of from hourly.0.
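To decide which snapshot to restore from, it can help to see which snapshots hold a copy of the file and whether those copies differ. The following is a minimal sketch in POSIX sh: list_snapshot_versions is a hypothetical helper name, and the slot counts are taken from the hourly/nightly/weekly schedule described earlier, so they would need adjusting if that schedule changes. It prints each snapshot name followed by the CRC checksum and byte count reported by cksum; consecutive snapshots with identical checksums almost certainly hold the same version. Note that the output follows the series order (hourly, then nightly, then weekly), which is not strictly chronological, since nightly.0 can be newer than hourly.26; ls -l on a candidate copy shows its modification time.

```shell
#!/bin/sh
# list_snapshot_versions FILE   (hypothetical helper)
# For FILE (a path relative to $HOME), print each snapshot directory that
# holds a copy, plus the CRC checksum and byte count from cksum.
# Slot counts assume hourly.0-26, nightly.0-29, weekly.0-3.
list_snapshot_versions() {
    for series in hourly:26 nightly:29 weekly:3; do
        name=${series%%:*}
        max=${series##*:}
        i=0
        # Count upward explicitly; a shell glob such as hourly.* would
        # sort hourly.10 ahead of hourly.2.
        while [ "$i" -le "$max" ]; do
            copy="$HOME/.snapshot/$name.$i/$1"
            if [ -f "$copy" ]; then
                echo "$name.$i $(cksum < "$copy")"
            fi
            i=$((i + 1))
        done
    done
}
```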
If you're a UO user and have questions or comments about data snapshots on Darkwing, please feel free to contact me at joe@uoregon.edu. We'd love to hear what you think about the new snapshot facility.
[1] Obviously, a user can restore a file from a snapshot only if (a) that file was on the filer long enough to be captured as part of a snapshot, and (b) the restoration is done while the snapshot is still available online.
This means that if you rapidly create a file and then accidentally delete it before it's been around long enough to have a snapshot made of it, the file cannot be restored from a snapshot.
Similarly, if you accidentally delete a file and then go abroad for three months, when you return the following quarter, the file's snapshot will no longer be available (although you may still be able to pay a fee to have the file restored from tape backups by Systems staff).
[2] If you don't know how to log in securely, please see the instructions at http://micro.uoregon.edu/security/ssh/