
Linux Backups Made Easy

mfago writes "A colleague of mine has written a great tutorial on how to use rsync to create automatic 'snapshot-style' backups. Nothing is required except for a simple script, although it is thus not necessarily suitable for data-center applications. Please try to be gentle on his server: it is the $80 computer that he mentions in the tutorial. Perhaps try the Google cache." An excellent article answering a frequently asked question.
  • thank you... (Score:3, Insightful)

    by cmckay ( 25124 ) <cameron.mckay@co ... u ['ora' in gap]> on Saturday September 07, 2002 @12:12PM (#4212589) Homepage
    ...for posting a link to the Google cache in the story description on the main page! mfago, you are a genius!

    Perhaps more article submitters (or editors) could add these links more frequently?
    • Re:thank you... (Score:2, Interesting)

      by zrodney ( 253699 )
      Google cache

      yes -- that was a refreshing change from the usual postings where the page is /.ed. Thank you!
    • Also, wait for Google to block requests referred from slashdot.org when they find out how much bandwidth is at stake :)

      I prefer the idea that has been suggested by many previously, putting copies of linked articles right here on Slashdot.
    • Re:thank you... (Score:2, Insightful)

      No way, I'd rather Joe Blow's server go down than waste Google's bandwidth. Google doesn't have any ads on their cache pages. Slashdot should set up their own caching, or pay for a caching service, if they want to link it from the main page.
      • Re:thank you... (Score:4, Insightful)

        by Anonymous Coward on Saturday September 07, 2002 @01:38PM (#4212888)
        Are you serious? Crush some guy's server rather than using the publicly available Google copy, because the Google page DOESN'T HAVE ADS?????? Who pays this guy for his server and bandwidth?? Do you make sure every page you view has ads on it? Are you a marketing exec or something??

        This "ads pay for everything on the internet" mentality is INSANE!!
      • Yes, let's not take unfair advantage of the hapless fools at Google who aren't putting ads on their cached pages. Give me a break. Last I checked, those cached pages start with some text about Google, and they contain google.com in the URL. EYEBALLS == REVENUE for Google.

        I'm sure they're perfectly happy to get the exposure from Slashdot linking to their cache. If they weren't, I'm sure their programmers could figure out if ($ENV{'HTTP_REFERER'}=~/slashdot/i) {print "Content-type: text/plain\n\nplease don't link directly to our cache.";}
    • by mikerubel ( 606951 ) on Saturday September 07, 2002 @03:49PM (#4213281) Homepage
      This slashdotting comes as a bit of a surprise; many readers have sent me improved scripts that I haven't quite gotten around to posting yet. I'll try to put them up later this weekend when the slashdotting dies down.

      The site was never down; it's just that my roommate, a Windows user, noticed the connection was slow and reset the cable modem. He's quite upset about being unable to play Warcraft III. :)

      I've never had a slashdot nick before, so I just created this one today. I'll try to go through some of the comments and provide useful feedback.

      Thanks for your interest everyone!

      Mike

      • I know that listing my actual backup configuration here is a security risk; please be kind and don't use this information to crack my site. However, I'm not a security expert, so if you see any vulnerabilities in my setup, I'd greatly appreciate your help in fixing them. Thanks!

        First suggestion: Don't list your actual backup configuration.
  • First Mirror (Score:4, Informative)

    by doublem ( 118724 ) on Saturday September 07, 2002 @12:13PM (#4212590) Homepage Journal
    I had the chance to be the first post, but decided to mirror the site first.

    My mirror is here [matthewmiller.net]
  • 'man dump' (Score:2, Interesting)

    What's wrong with dump? It works great, and you can send stuff to gzip, bzip2, etc for data compression... even pipe the stuff over ssh to a server somewhere else. Dump also supports incremental backups. It also works on a lower level than rsync (which works on the filesystem level) and supports multiple volumes easily.
    • Why not read the article? Then you'll see why the author thinks rsync is a better tool for network-based backups. You may not agree with the author, but if you actually took the time to read the article you'd see that he is fully aware of the existence of dump.
    • Dump works well, except if you accidentally created a filesystem using reiserfs before you decided on the backup method... I don't like using tar for backups; it just doesn't work quite as nicely (and takes too many arguments to get a clean backup). Dump works at a low enough level to make everything so simple.
    • The other thing I forgot to mention is that rsync does not support incremental backups. Sure, it will incrementally update the tree on the other end, but it will not allow you to go back to your filesystem snapshot from last saturday if you have done an rsync of your data since that point. It doesn't effectively keep a backup of old data, it just syncs the current data. This would make it difficult to recover from, for example, a box that was hacked and trojanned last week when you've done an rsync since.
      • Re:'man dump' (Score:2, Informative)

        by GigsVT ( 208848 )
        Read the fucking article, that's the point. He uses hard links to make a second copy of the backed up directory, exploiting the fact that rsync always unlinks before changing a file, thereby effectively doing incremental backups without wasting hard disk space.
        • There's no need to get abusive, geesh... grow up.

          I specifically said "rsync" does not support that. Whether he fools around with a script to do it for him is another story. MY POINT is that dump does this ALL FOR YOU, and would have required LESS time to implement and would be a more reliable solution, as it was designed for doing filesystem backups!
          • Re:'man dump' (Score:4, Informative)

            by GigsVT ( 208848 ) on Saturday September 07, 2002 @12:45PM (#4212704) Journal
            It's an expression, it's not particularly abusive.

            rm -rf backup.3
            mv backup.2 backup.3
            mv backup.1 backup.2
            cp -al backup.0 backup.1
            rsync -a --delete source_directory/ backup.0/

            There. That's the script basically. Add more snapshot levels as needed, stick it in cron at whatever interval you need.

            dump only supports ext2/3. This supports any file system, and retrieving a file from backups is as simple as running "cd" to the directory of the snapshot you need and "cp" the file out.

            I run backups from Linux to IRIX and other UNIX systems using GNU rsync and OpenSSH. This little trick is going to be very handy for me. I can't waste my time worrying about which filesystem type the files came from originally.
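
            For example, a crontab entry along these lines (the script path here is just a placeholder) would run the five lines above nightly at 4:20:

            20 4 * * * /usr/local/bin/make_snapshot.sh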
      • Re:'man dump' (Score:2, Informative)

        by TarpaKungs ( 466496 )
        Hi

        rsync --backup-dir ...

        2 years ago I wrote a script to do pretty much what the linked article describes, i.e. maintain a duplicate set of data areas on another machine via rsync.

        I use the --backup-dir option to relocate copies of the files which the current rsync run would otherwise delete or modify.
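
        A minimal sketch of that idea, with invented hostnames and paths (this is not my actual script), looks something like:

        # push today's state; anything rsync would delete or overwrite gets
        # shunted into a dated directory on the backup host instead
        rsync -a --delete -b \
            --backup-dir=/backups/changed/`date +%Y-%m-%d` \
            /home/ backuphost:/backups/current/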

        With a bit of rotation, we can give users a full view of their home directory as of last night, and also restore files effectively from each day of the week, going back 7 days in our case.

        Sure does cut down on the number of tape restore requests.

        As mentioned, it is incredibly efficient - we deal with about 900GB of data backed up in this way - but rsync actually transfers only about 10-30GB of differences each night.

        Only problem is my script was a crap prototype which is why I'm not letting anyone see it ;-)

        But I do have a design in my head for a more professional effort (which will be open-sourced) - I might even get enough peace at work to write it one day!
        • Re:'man dump' (Score:2, Informative)

          by mikerubel ( 606951 )
          Hi TarpaKungs,

          I was originally using the --backup-dir trick, and you're right, it allows you to back up the same data. The advantage to doing it as described in the article is that you get what appear to be full backups at each increment. This makes it simpler for your users, who can now think of the backup directories as full backups.

          Hope that helps--

          Mike

      • So Linux 2.4 was released with a major known bug that causes a critical backup feature not to work at all, putting you at risk of losing all your work?

        I thought we beat up Billy Boy for doing that.
      • Comment removed based on user account deletion
        • Dump doesn't work with reiserFS, sync or no sync. AFAIK, it only works with the ext* systems, and it depends on the filesystem's internal structure being known. Low-level backups are bad.

          Tar or other systems that get the files through the regular file reading interface are better because they take advantage of the filesystem interface abstraction layer instead of going around it. That works well, and there's no reason to do backups otherwise. None. Not a single one. IMHO. :)
          • Dump doesn't work with reiserFS

            The fact that reiserfs doesn't include a "dump" of its own isn't a failing in "dump", but a failure of the ReiserFS developers.

            Yes, dump is and always was fs-specific. That's something that's always been understood.

            It's also the only way to back up ACLs and other extended metadata. Data backup is good, but file metadata is important, too. You wouldn't back up your data with no file names, would you? File names are just a small part of the metadata associated with a file. Tar and cpio only get a subset of that data.

            As filesystems move toward storing more metadata (ACLs and extended attributes now, and ReiserFS is moving toward ever more complex metadata), backup programs are going to have to be extended to store that information in the archives. Until they do, only dump is reliable.

            Spread the word.
            • My point is that going around the kernel-provided filesystem access methods is bad. Dump's *implementation* is a bad one. If there's data stored that can't be read using standard utilities and the standard filesystem interface, then it shouldn't need to be backed up.
              • The problem isn't that the data can't be read by standard interfaces, but that tar and cpio just don't know about them yet. ACLs, for example, are a critical feature, long missing from open source UNIX platforms (they're common on some other UNIX platforms, and of course NT).

                Tar and cpio back up the standard UNIX permission set, but that set is really inadequate. Until they can back up the full set of ACLs, they're basically useless on systems that use them.

    • one good reason... there are NO good HOW-TOs for dump on ext2/3. man is great, but somebody has to write something humanly readable/understandable for those of us who just don't bother to read all the features of a command before using it.

      step 1) this
      step 2) that
      step 3) done

  • Works great! (Score:3, Insightful)

    by schmutze ( 205465 ) on Saturday September 07, 2002 @12:20PM (#4212617) Homepage
    I work with Mike and started using his scripts a while back for my own department. With HD space so cheap these days, it makes sense to have an online backup, especially for those of us who can't afford a NetApp. It really saves time for restoring those everyday user deletes. Way to go Mike!
  • I am the "computer guy" for a small company, and I use this method to make back-ups of our Samba file server. It's great! The main file server has Samba and everyone works off of it. The backup server has almost twice the disk space, but it doesn't really need that much. It never seems to be more than a couple of percent bigger. I keep 'snapshots' going back various time intervals up to a week, and do the tape backup off of the backup machine early in the morning. Thank you Mike Rubel!
  • So... (Score:1, Informative)

    by Squeezer ( 132342 )
    Slashdot is now a reference for tutorials? Ever try www.tldp.org or www.linuxtoday.com (they post links to tutorials).
  • I use tar to maintain critical daily backups. I am still pretty new to Linux, so does this essentially do the same thing?
    • by Anonymous Coward
      And people wonder why computer techs get a bad name.
      • Yet another Anonymous Coward spewed forth:

        [referring to using 'tar' to do daily backups]
        And people wonder why computer techs get a bad name.

        Eh? There's nothing wrong with tar per se. For example, let's say you want to transport your backups over a network securely (i.e., via ssh). Your choices are:

        1. Allow ssh access with no password (public-key access, preferably). I'm leery of this, because allowing anything like this to run automatically means entrusting all the auth data to the machine, where it can be compromised.

        2. Copy the backups asynchronously, separately from making them, allowing user-initiated authentication. This was the approach I opted for when I had to put together a backup system overnight at one company.

        A couple of cron jobs ran incremental tars on a list of directories, storing them in the scratch partition with higher permissions (so user processes cleaning up after themselves couldn't nuke them accidentally). Then at my leisure I would run the transport script (mornings about 10 AM, typically) which would suck the backups across and copy them to the tape. This worked fine for the time the project was active. Note that I was backing up to tape, which meant I needed to manually rotate tapes anyway, so this system helped ensure that new backups didn't overwrite old ones if I came in late -- and we definitely did not want these backups exported to our network. I also had the advantage of only needing to worry about one server.
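
        A stripped-down sketch of that arrangement (all hosts, paths, and times invented; GNU tar's -g is --listed-incremental):

        # on the server, from root's crontab: nightly incremental tar into the scratch partition
        30 2 * * * tar -g /var/backups/home.snar -czf /scratch/backups/home-`date +\%Y\%m\%d`.tar.gz /home

        # later, run by hand from the operator's side: pull the archives over and spool them to tape
        scp 'server:/scratch/backups/home-*.tar.gz' /var/tmp/staging/
        tar -cf /dev/st0 /var/tmp/staging/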

        Just because tar is old and a bit... esoteric at times, doesn't mean it's therefore automatically a stupid idea to use it. If it's what you know, and it gets the job done, there's no need to feel guilty about not using a fancier system. Even Linus likes tar, because it's rock-solid reliable.

        Now if you have (faint hope) a valid criticism of this guy's use of tar in his environment, then I'm all ears. But I doubt that, since he didn't give enough detail for you to have one.

        I don't know why I even bother with this given it's an AC post, except that assholes like this are a major reason why Linux advocates get a bad rep.

  • I use rdist to do much the same thing.
    A simple example for my home directory is:

    #
    # Make a local copy of the contents of the home directory.
    # Also make a local copy populated with hard links.
    #
    # This has the effect of preserving snapshots through time
    # without too much overhead. (Cost of hard links + changed files.)
    #

    ~ -> localhost
    install -oremove /misc/backup/current ;
    except ~/tmp ;
    cmdspecial ~ "DATE=`/bin/date +\"%%Y-%%m-%%d.%%T\"` ; cp -al /misc/backup/current /misc/backup/snapshot.$DATE" ;

    Note that I get dated backup directories, and that I can add as many "except" clauses as I want, so I don't need to back up junk directories.
    (.mozilla caches, etc.)
    My backup drive is mounted via automount, so it is rarely mounted. Just change "localhost" to host the backup on another machine.
  • seems reasonable. tar would back up files, and dd, unreal as the syntax is, would also do the same thing.

    I guess the whole thing goes to prove that, within anything computer related, there is more than one way to do it. Clever tutorial, gang. =^_^=

  • Check out glastree (Score:3, Informative)

    by Soylent Beige ( 34394 ) on Saturday September 07, 2002 @12:47PM (#4212715)
    Been using a script called glastree on several production file servers for quite some time now.
    It works just great! At one site I've got about 7 weeks of depth from 3 different servers, all mirrored via ssh-nfs on one lowly Pentium 133. We still spin tapes, mind you, but glastree has been flawless.

    Been meaning to buy the author a virtual beer for some time now . . .

    http://igmus.org/code/

    From the website:
    'The poor man's daily snapshot, glastree builds live backup trees, with branches for each day. Users directly browse the past to recover older documents or retrieve lost files. Hard links serve to compress out unchanged files, while modified ones are copied verbatim. A prune utility effects a constant, sliding window.'
    • 'The poor man's daily snapshot, glastree builds live backup trees, with branches for each day. Users directly browse the past to recover older documents or retrieve lost files. Hard links serve to compress out unchanged files, while modified ones are copied verbatim. A prune utility effects a constant, sliding window.'

      It looks like this might be almost the exact same thing as is linked in the article. It's the same basic premise.
  • We have a hybrid network of Win2k and Linux servers at work. Our backup server is a Win2k box with a little over a terabyte of storage. We have an internal "utility" Linux box that is running Samba and rsync. For our production Linux boxes, we only have rsync to use for backups. The interesting way we back up the production boxes is by rsyncing to a backup share on the Win2k backup server that is mounted on the utility box using samba. At first, we thought that this would make things kinda slow, but actually they run at full speed.

    Just thought I'd share our little Linux backup experience.
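
    For anyone who wants to try the same thing, the moving parts are roughly as follows (share names, hosts, and credentials are made up):

    # on the Linux utility box: mount the Win2k backup share, then rsync a
    # production box into it over ssh
    smbmount //backupsrv/backups /mnt/backup -o username=backup,password=secret
    mkdir -p /mnt/backup/prodbox/www
    rsync -a --delete prodbox:/var/www/ /mnt/backup/prodbox/www/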
  • Somewhere, Michael is laughing his butt off at having slashdotted his friend, especially considering he posted the story with a mirror so 'it's not his fault'.*

    *(change all pronouns to the appropriate gender)

  • Does anyone know how something like this would work on Mac OS X? The backup utility is the only thing I like about their .NET^H^H^HMAC service.

    • SilverKeeper (http://www.silverkeeper.com/ [silverkeeper.com]) from LaCie is the only free backup solution for OS X of which I am aware. While not as full-featured as Retrospect, it's not bad if all you want to back up is your /Users directory and maybe a few other things. You can set up specified things to back up, and then restore them or synchronize them. (In fact, its synchronization feature makes it extremely handy with an iPod, where you can use it to ensure that the Documents folder on both devices is always the same without having to delete the folder on the iPod and then recopy it each time.) If you need to back up everything on the disk, however, pretty much your only choice is going to be to use the extremely buggy ditto command with the command line utilities for manipulating .dmgs, or alternatively to purchase Retrospect. .Mac's backup solution is awful and does not seem, IMVHO, to offer anything over SilverKeeper. You'd be better off spending that $100 on Retrospect anyway if that is the only thing you are interested in.
    • You can do the exact same thing in OS X. You just need to get rsync, which can be installed as part of fink [sourceforge.net] if you have it.

      Otherwise, you can get the rsync sources yourself and build it without too much trouble.

    • My version of OS X has cron and rsync. I don't see why it wouldn't work. It should even be possible for someone to write a simple GUI for those who would like that.
    • This will work fine with OS X if you use UFS.

      This won't work with HFS because of the file forks. If you use UFS with OS X, the file forks appear as normal files. E.g., if you have a file named "foo", "._foo" is the resource fork. I don't know where they keep the Finder fork, and I've never cared to investigate.

      Here's a tip if you have to use OS X for a file server of any kind: use two partitions (or two disks), one HFS and one UFS. The OS and any applications are installed on the HFS partition and all data goes on UFS. Use HFS for the OS because a lot of stuff breaks when running under UFS, and UFS performance is still roughly twice as bad as HFS in 10.2 (run your own little benchmarks if you don't believe me). Keep user data on UFS so you can use tools like tar, rsync, etc. to back up and manipulate files. Remember, tar won't work on most HFS files (those with forks). If you're deploying OS X Server, you should definitely keep user data on a separate partition anyway, since any tiny little mistake (e.g., an LDAP typo in Directory Assistant) will require a reformat-reinstall.

      Another tip: if you create a tarball off a UFS filesystem and then untar that onto a HFS filesystem, it will preserve the forks correctly. This has come in quite useful in making "setup" scripts for end-user machines, where all the applications to install are stored in tarballs created on a UFS machine and you can untar them onto the target HFS machine (the advantage is that you can script this - add in a couple of niutil commands and you can recreate a user machine in a couple minutes from one script).

      I have a couple of OS X Server machines (bosses like the GUI user management stuff). I just tried rsync over NFS to a Linux box and it works fine since the data is on a UFS partition on the OS X Server box. PITA to set up an NFS share remotely (since I don't have Macs at home -> no Remote Desktop, no usable VNC servers for OS X -> have to do it over ssh -> must figure out how NFS exports are stored in netinfo -> gnashing of teeth), but it works and I might try this little trick next week since we're not doing anything systematic for backups on the OS X boxen.

      Also, radmind [umich.edu] is a great tool for managing filesystems of OS X client machines. It supports HFS (by using AppleSingle internally).

  • by Anonymous Coward
    Backups are for wimps. Real men upload their data to an FTP site and have everybody else mirror it.
  • by Anonymous Coward on Saturday September 07, 2002 @12:54PM (#4212739)
    This sounds great; I would like to thank the author for the article. Only one thing really should be added: the way you should do rsync for a backup server is to do rsync over ssh with a passwordless connection. (see http://www.unixadm.net/howto/rsync-ssh.html with google cache [216.239.53.100])

    Also, it should probably be done from the real server to the backup server, so that you can't just break into one machine and get into all of them. (If you break into the real machine as root you could still get into the backup machine, but that's the lesser risk.)

    This allows the backup machine to have only one open port: ssh, which can be tcp-wrapped to allow connections only from the machines that it backs up.
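
    Roughly, with invented hostnames and paths, the setup looks like this (the key is dedicated to backups and has no passphrase):

    # on the real server: make a passwordless key, then push nightly over ssh
    ssh-keygen -t rsa -f /root/.ssh/backup_key -N ''
    rsync -a --delete -e 'ssh -i /root/.ssh/backup_key' /etc /home backuphost:/backups/realserver/

    # on the backup server, /etc/hosts.allow limits ssh to the machines it backs up:
    #   sshd: 192.168.1.10 192.168.1.11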

  • by Dr. Awktagon ( 233360 ) on Saturday September 07, 2002 @12:56PM (#4212748) Homepage

    I've been doing backups this way on Linux for aLongTime(tm). On FreeBSD I've also used dump/restore to an NFS-mounted RAID drive (does dump work okay on Linux these days? I've always been afraid to try it for some reason, maybe earlier versions weren't stable).

    rsync is just so cool. First of all, it can work over the network through ssh, or through its own daemon (faster), or on a local filesystem. You can "pull" backups from the server or "push" them from the client. Over the network, it can divide the files into blocks and send just the blocks that are different. It has a fairly sophisticated way to specify files to exclude/include (for instance, exclude /home/*/.blah/* can be used to not save the contents of everybody's .blah directory, but keep the directory itself). You can set up a script to back up just given subdirectories so you can checkpoint your important project without backing up the whole show, etc.

    I use it both to save over the network using the rsync daemon, and to a local separate drive. On a local drive it's great, because you can easily retrieve files that you've accidentally deleted, just using cp. It's also great for stuff like "diff -r /etc /backups/etc" to see if something changed.

    I never thought of his technique for incremental backups, but since it uses hard links, I wonder how that interferes with the original hard links in your files?? Looks interesting.

    rsync has many flags and options; here are the ones I use to pull complete backups from another host onto a local drive (yeah, --archive is a bit redundant here).

    rsync --verbose --archive --recursive --links --hard-links \
    --perms --owner --group --devices --times --sparse \
    --delete --delete-excluded --numeric-ids --stats --partial \
    --password-file=/root/.rsyncd.password \
    rsync://backupuser@xyz.dom.com/full/ \
    /backups/systems/xyz/
  • by heydan ( 112791 ) on Saturday September 07, 2002 @12:57PM (#4212753) Homepage
    The backup scheme described here uses hard links to avoid storing multiple copies of identical files, but when a large file changes even in a small way it stores a whole fresh copy of that file. rdiff-backup is more efficient because it stores one complete copy of your current tree with reverse diffs that allow you to step back to previous versions if you need to. If a large file changes in a small way, only the reverse diff is stored to encode that. This is very handy for cases where, for example, a multiple megabyte e-mail inbox has had just a few kilobytes of new messages appended to the end (although the rsync/rdiff-backup algorithm is also efficient with changes in the middle of a file). Being more efficient in this way translates directly to an increase in the number of past versions you can fit in the same space which can make all the difference if it takes you a while to realize that a given file has been accidentally deleted or damaged.

    http://rdiff-backup.stanford.edu/
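
    If I remember the interface right, basic use looks roughly like this (paths are examples; check the docs for the exact restore syntax):

    rdiff-backup /home /backups/home
    rdiff-backup -r 10D /backups/home/somefile /tmp/somefile.10-days-ago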
    • Thanks for mentioning this!

      Rdiff-backup is an excellent utility, and Ben Escoto (its author) and I link to each other. You must realize, though, that the purposes are different. Rdiff-backup is more space efficient for things like text, email, and so on. My rotating snapshot trick is less space-efficient, but much simpler for the average user to understand ("just go into your snapshot directory and copy the old file back into reality"). It works on all kinds of files, and barely touches the CPU (since it isn't doing diffs). I would use rdiff-backup for administrative backups of email, code, and that sort of thing, where text is involved and user restore is not an issue.

      Different tools for different jobs!

      Mike

    • I've used rsync for my backups until now, but I've downloaded rdiff-backup 0.9.5 and I love it already!

      New users: use the development version, it's a lot more efficient if you have a lot of small files, because it uses librsync instead of executing rdiff for each file. I've measured a factor 20 speedup on my devel directory!
  • I was about to start using --backup-dir with my rsyncs to do incrementals, but this is a lot more slick. Right now I just run it with --delete weekly, so my live backups vary from none to 7 days old for deleted files. We run tapes too, so it wasn't a big deal, but the tape robot is on the way out, so I needed to get true incrementals going soon.

    It's stories like this that keep me reading Slashdot. (Other than ranting on YRO stories, but that is nowhere near as cool as a neat trick like this.)
  • by MadAndy ( 122592 ) on Saturday September 07, 2002 @01:09PM (#4212795)
    This method, like most backup solutions, doesn't take a backup as of a specific instant, but instead takes it over a period of time - the length of time required to make the backup - which can be a problem if the data being backed up is changing all the time.

    A few years ago I saw a neat (expensive!) disc array that could 'freeze' the disc image at a single point in time so that a backup could be taken from the frozen image. The backup software saw only the frozen image, while the rest of the OS saw the disc as normal, including updates made after the freeze occurred. The disc array maintained the frozen image until the backup was complete, guaranteeing a true snapshot as of a specific instant in time.

    I wonder whether such a thing would be possible in software. Possibly it can even be done through cunning application of the tools that we already have. I imagined that you might be able to do something like it by extending the loopback device interface. Does anyone out there have any cunning ideas?

    • by gordon_schumway ( 154192 ) on Saturday September 07, 2002 @01:21PM (#4212827)
      Then you should check out LVM. From the LVM HOWTO [tldp.org]:
      A wonderful facility provided by LVM is 'snapshots'. This allows the administrator to create a new block device which is an exact copy of a logical volume, frozen at some point in time. Typically this would be used when some batch processing, a backup for instance, needs to be performed on the logical volume, but you don't want to halt a live system that is changing the data. When the snapshot device has been finished with the system administrator can just remove the device. This facility does require that the snapshot be made at a time when the data on the logical volume is in a consistent state, later sections of this document give some examples of this.
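
      In practice that looks roughly like this (volume group, names, and sizes are just examples along the lines of the HOWTO):

      # set aside 500MB for changes made while the backup runs
      lvcreate -L 500M -s -n homesnap /dev/vg00/home
      mount -o ro /dev/vg00/homesnap /mnt/homesnap
      rsync -a --delete /mnt/homesnap/ /backups/home.0/    # or tar/dump it, whatever you use
      umount /mnt/homesnap
      lvremove -f /dev/vg00/homesnap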
    • A few years ago I saw a neat (expensive!) disc array that could 'freeze' the disc image at a single point in time so that a backup could be taken from the frozen image. The backup software saw only the frozen image, while the rest of the OS saw the disc as normal including updates made after the freeze occurred. The disc array maintained the frozen image until the backup was complete, guaranteeing a true snapshot as at a specific instant in time.

      Sounds like the Network Appliance [netapp.com] Filer's "snapshot" feature, but less advanced. (You can also get exactly the feature described under Linux purely in software, via LVM, now.) Under the NetApp version, you gain an extra directory ".snapshot", which contains previous versions of each file. So, if you screw up editing some file (delete/corrupt it, whatever) you can just grab a previous snapshot copy. Like having a series of online backups - but without all the extra space+hardware needs. Like CVS, but without the hassle (or fine-grained control) of doing "commits". Just tell the Filer "take a snapshot now" and 30 seconds later, it's done. Or "take snapshots every hour".

      Neat feature - you could almost get this using LVM under Linux, but not quite...

      • This and the remote mirroring is why I love our NetApps so much. I have never had to pull files from tape for anything that is on the NetApp, because we have it set to take snapshots hourly during the day, daily for a week, plus each Friday for a month. This way you have tight granularity for the day and week and can still pull back a file from up to a month ago. I don't care that our NetApp F880 cluster is around $150K for only 4TB of raw space (about 2TB of usable space); it pays for itself in lower admin time and the basically zero loss of data it provides. (Yes, we still do tape backups, but mostly for disaster recovery; like I said, I have never in 2 years pulled anything from tape for the NetApp.)
    • FreeBSD 5 will ship with UFS snapshots which will do what you want; it's also used to freeze the disk state for background fsck's, among other things. They're even stackable.
    • As others have noted, you can get snapshots using LVM.

      What I would really like, however, is the ability to have the file system keep versions of a file as the file is written to or deleted; I don't want a snapshot every hour, I want a new single-file snapshot for every change to the file. And I want to be able to set or clear an attribute to control which files/directories this gets done in (i.e., chattr +u [linuxcommand.org], which currently doesn't really do anything). And I want the old snapshots to age and vanish on their own, say, 3 days after they are made (or however many days the sysadmin chooses).

      Under Windows, with Norton Utilities, you can get this sort of functionality with the Norton Protected Recycle Bin. I have been wishing for this on Linux for quite some time.

      I remember reading about something called the "Snap filesystem" which would someday offer this, but I can't find anything about it now on the web.

      steveha
    • A few years ago I saw a neat (expensive!) disc array that could 'freeze' the disc image at a single point in time so that a backup could be taken from the frozen image.

      We used to do this years ago before any such "options" were provided by drive manufacturers.

      We were doing large Oracle backups, and there were issues with taking too much time to do a backup.

      What we did was to throw some extra drives into the (at the time, software) RAID, so that we had a mirror of what we wanted to backup. At backup time, we'd shut down the Oracle instance, break the mirror, and then re-start the Oracle instance. The whole procedure resulted in less than 2 minutes of downtime for the instance, which was more than acceptable. We'd then take the "broken" mirror, re-mount it under a "temp" mount point, and then take our time backing it up (it usually took about 6-8 hours). Once we were finished backing it up, we'd then re-attach the broken mirrors and re-silver it. This was all done via software RAID, before journalling was available.

      We did this about once a week, and it worked out great.

  • Not snapshots (Score:5, Informative)

    by Florian Weimer ( 88405 ) <fw@deneb.enyo.de> on Saturday September 07, 2002 @01:12PM (#4212804) Homepage
    The method Mike describes does not create snapshots, so you can't use it to create consistent backups: Files can be written while they are read by rsync, and lots of software (including databases) requires cross-file data consistency (some broken software even expects permanent inode numbers!). rsync can be used for backups (if you trust the algorithm), but in most cases, you have to do other things to get a proper backup.

    At home, I store xfsdump output encrypted with GnuPG on an almost public (and thus untrusted) machine with lots of disk space (on multiple disks). At work, I do the same, but the untrusted machine is in turn backed up using TSM. In both cases, incremental backups work in the expected way. Of course, all this doesn't solve the snapshot problem (I'd probably need LVM for that), but with the encryption step, you can more easily separate the backup from your real box (without worrying too much about the implications).
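
    The pipeline for that is roughly (key ID and hostname are invented):

    # level-0 dump of /home, encrypted locally, stored on the untrusted box
    xfsdump -l 0 - /home | gpg -e -r backup-key | ssh untrusted-box 'cat > dumps/home.l0.gpg'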
    • Re:Not snapshots (Score:2, Informative)

      by mikerubel ( 606951 )
      These are not snapshots in the sense of LVM or NetApp; they do not freeze the whole filesystem at a particular point in time between atomic transactions. This technique is a hack for something like a small-office file server. It helps deal with accidental deletions or overwrites, which seem to account for the majority of restore jobs. Think of it as an easier and more intuitive replacement for tar-to-tape. If you're running a database where every transaction counts, you'll need to spend the money and buy a more reliable system!

      Mike

  • by ywwg ( 20925 ) on Saturday September 07, 2002 @01:13PM (#4212808) Homepage

    if [ `df | grep /mnt/backup | wc -l` != "1" ]
    then
        echo "Backup drive not mounted, skipping procedure"
        exit 2
    fi
    cd /mnt/backup
    rsync -vaz --exclude-from=/root/exclude $1 $2 $3 $4 $5 / .


    where exclude =
    /mnt/cdrom
    /mnt/usb
    /mnt/backup
    /mnt/abyss1
    /mnt/abyss2
    /proc
    /tmp


    stick in a cronjob. you can also add --delete if you want. it's basic, but easy.
  • Please try to be gentle on his server: it is the $80 computer that he mentions in the tutorial.

    ... Slashdotted!!!

    Did anyone else think to themselves..I'm gunna click on that link just because it said go easy on it?

    #!/bin/sh
    rm -Rf /SAVE/bkup.tar.gz.5
    mv /SAVE/bkup.tar.gz.4 /SAVE/bkup.tar.gz.5
    mv /SAVE/bkup.tar.gz.3 /SAVE/bkup.tar.gz.4
    mv /SAVE/bkup.tar.gz.2 /SAVE/bkup.tar.gz.3
    mv /SAVE/bkup.tar.gz.1 /SAVE/bkup.tar.gz.2
    mv /SAVE/bkup.tar.gz /SAVE/bkup.tar.gz.1
    tar -zcf /SAVE/bkup.tar.gz /etc /var/spool/mail /home /var/www

    Then I have an FTP script that runs once per day on the OTHER server sitting there (dare I say, the MS box) that grabs the bkup.tar.gz from the Linux box... and does much the same as far as replication goes.

  • at 4.20? is that right? the tutorial he included on rsync alluded to this.

    I guess it's better to trust your server at 4.20 than the operator. Well, for many operators, that is. Even if it's 4.20pm, I'd still prefer to let the machine do the critical work instead of some sysadmins. Knowing what I know about many sysadmins at 4.20, that is...

    [hint: double entendre on 420. not sure if the author knew this or not. or maybe I just stated what was terribly obvious.]

    • You're right, 4:20 is a good time to do this. Let the system do the work for me while the sysadmin has a toke^H^H^H^H smoke. :)

      IMHO, this is a great solution - I've been looking for something like this for fuss-free backups at work. Voila.

      Being the only "computer guy" at work sucks ass when you're the programmer/sysadmin/engineer/tech. Gah.

  • I don't consider snapshot backups backups; they're snapshots.

    I've been using a utility called Flexbackup -- it's a Perl script which will do multi-level backups (i.e. incremental), spew to tape or file, use tar, afio or dump, and handle compression. Oh yes, and it will use rsh/ssh for network backups. I wish I could buy the author a beer or few, but it seems to be unsupported now. Oh well.

    Email me if you want a copy and can't find it. I've also got a patch to fix a minor table of contents bug with modern versions of mt.

    • I don't consider snapshot backups backups; they're snapshots.
      Care to explain the difference for the uninitiated? Why can't a "snapshot" serve the same function as a backup?
      • Care to explain the difference for the uninitiated? Why can't a "snapshot" serve the same function as a backup?

        I didn't say it couldn't serve as a backup, but it's not a backup in the sense that I can keep the last 6 months' worth of changes and pull from any of them. With snapshots I need to either keep 6 months of full daily backups or postprocess the daily snapshots and turn them into differential backups.

        An example might help. I do daily backups of our servers. Let's call the daily backups level 3 backups. Now each week I do a level 2 backup. Each month I do a level 1 backup, and every quarter I do a level 0 backup. Let's analyze:

        • Level 0 - full backup, every quarter
        • Level 1 - Monthly backup, just changes from the last month's backup
        • Level 2 - weekly, just changes from last week's backup
        • Level 3 - daily, just changes from yesterday
          • I store the Level 0 backups on DVD-(+?)RW, and the rest on two 6-tape magazines. Level 1&2 on DDS3 IIRC, and Level 3 on DDS. I can pull back any file changed in the last quarter, just like someone could pull back a file from a particular day in CVS.

        With full snapshot backups this would take an insane amount of disk space. As I said earlier I could postprocess the snapshots and create differential backups but why do the extra work when tar/afio does this automatically? RSync isn't that special, and with an incredible script like Flexbackup it's even less special.

        It would be great if rsync could tell the other end "this file has changed, here are the changes" and have the backing-up end copy the file and apply the changes -- i.e. allowing the creation of differential backups. That's not what it's designed for, though.
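
        For the curious, a cut-down two-level version of the scheme above, done with plain dump in cron, looks like this (tape device and filesystem are just examples; the real setup uses the extra levels described above):

        # weekly full, then dailies that catch everything changed since that full
        0 2 * * 0    dump -0u -f /dev/nst0 /home
        0 2 * * 1-6  dump -1u -f /dev/nst0 /home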

  • It seems that it would be much more efficient if each application handled its own backup scheme. I don't need to back up my whole drive. Certainly not my mp3s or my applications.
    • Anthony,

      You can exclude any part of the filesystem from the backups, or particular types of files, or files that match a particular pattern; see the "exclude" section in the rsync man page.
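
      For example, something like this (the patterns are just illustrations):

      rsync -a --delete --exclude-from=/etc/backup.exclude /home/ /backups/current/

      # where /etc/backup.exclude contains one pattern per line, e.g.:
      #   *.mp3
      #   *.iso
      #   .mozilla/*/Cache/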

      I'm not sure I agree that applications should handle their own backups! Don't forget that applications are run as their owners, so if they are broken or hacked, they can destroy the backups too. Far better, I think, to have the backups removed where user-level processes can't touch them. And probably a lot simpler too!

      Mike

      • You can exclude any part of the filesystem from the backups, or particular types of files, or files that match a particular pattern; see the "exclude" section in the rsync man page.

        I don't know about you, but my filesystem certainly isn't organized enough for that to be useful.

        Don't forget that applications are run as their owners, so if they are broken or hacked, they can destroy the backups too.

        Well, I was thinking more along the lines of backing up to a third party server over the internet, in which case there wouldn't be permission to delete old copies until after a certain period of time. I dunno, in the case of my system, there's very little that needs to be backed up. In fact, I really can't think of anything.

  • I'm wondering what happens to the hard links when rsync decides it only needs to update part of a file. If it is guaranteed to write a brand-new file with the merged changes, that's good. If, on the other hand, it changes the backup file in-place, then all the older backups that are only hard links will also see those changes, and that's a Bad Thing.

    Anyone know anything about this issue? I can't find the necessary info in the rsync docs [anu.edu.au].

    Judging by the fact that this technique does seem to work, I presume that rsync never modifies a file in-place, but I wonder if that's a guarantee, or just the current behaviour?

    (Also, I am aware of the --whole-file command-line argument, but that's an orthogonal issue.)
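
    One quick way to check the behaviour for yourself (directory names made up):

    mkdir src && echo one > src/f
    rsync -a src/ snap0/
    cp -al snap0 snap1          # hard-linked "copy" of the snapshot
    echo two > src/f
    rsync -a src/ snap0/
    ls -i snap0/f snap1/f       # different inode numbers: the old data survives in snap1
    cat snap1/f                 # still prints "one"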

    • I just found the answer looking through Mike Rubel's source code:
      # step 4: rsync from the system into the latest snapshot (notice that
      # rsync behaves like cp --remove-destination by default, so the destination
      # is unlinked first. If it were not so, this would copy over the other
      # snapshot(s) too!
      I wonder how he discovered this? I can't find it in the man page.
      • Re:The answer? (Score:2, Interesting)

        by mikerubel ( 606951 )
        I wonder how he discovered this? I can't find it in the man page.

        Rsync source code, then a lot of testing! :)

        Mike

        ps: You're right, if there is any change in the file, the original is unlinked first, then the new one is written over top of it. So it does work as advertised! Thanks for your help answering questions btw.

  • That's what I'm using at the moment. I use a cron job to throw all my important directories into my repository every night. Then I burn it onto an RW.

    This works because I don't throw my mp3/ogg, pr0n, etc. into the repository. I'll have to figure out a new solution when I hit the 650MB/800MB limit, but it works for now. I'll probably just have my repository on a different computer and use ssh, or get another HD specifically for backup purposes.

    I started using this system after reading the Pragmatic Programmer [addall.com]. They recommend using CVS for everything that is important. It's great for more than just code. And this way, whenever I install a new distro, I have all my settings, since I save my .emacs, .mozilla, .kde, etc. directories.
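
    The cron side of that can be as simple as the following (assuming the important directories live inside a checked-out module; the path is invented):

    # commit whatever changed in the tracked files every night at 3:45
    45 3 * * * cd /home/me/config && cvs -q commit -m "nightly snapshot" > /dev/null 2>&1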

  • Backing up to your disk is all very good against errors of manipulation, but what if the disk fails?

    And what about people like me who backup to a DLT (or whatever) tape drive? Not much use then either.

    In any case I don't see this as being extremely useful in the real world (i.e. beyond the casual backing up of a home machine)...

    • Hard drives come out as being much cheaper than tape, even in the long run. You don't need removable disks; you just need to have the machine in a different building if possible. A tape library to hold the amount of data that I need to hold would be over $5K, and then I would have to buy tapes, which are around $100 apiece. That doesn't seem very economical to me, given that for less money I can build two 1TB (and yes, that's a T for terabyte) backup systems and put them both in separate buildings. That way if one completely fails I still have all of my backups.
  • I have a similar script called rsync-backup [stearns.org]. This one does automatic daily snapshots, works over ssh, and uses rsync and hardlinks (to save space), chroot, and an ssh forced command for security.
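
    The forced-command part is, roughly, a line like this in ~/.ssh/authorized_keys on the machine being backed up (the exact rsync --server options have to match what your client actually sends, so treat this as a placeholder):

    command="rsync --server --sender -logDtpr . /home/",no-pty,no-port-forwarding ssh-rsa AAAA...key... backup@backupbox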

