Linux

Best Backup Server Option For University TV Station? 272

Posted by samzenpus
from the saving-the-reruns dept.
idk07002 writes 'I have been tasked with building an offsite backup server for my university's television station to back up our Final Cut Pro Server and our in-office file server (a Drobo), in case the studio spontaneously combusts. Total capacity between these two systems is ~12TB. Not at all full yet, but we would like the system to have the same capacity so that we can get maximum life out of it. It looks like it would be possible to get rack space somewhere on campus with Gigabit Ethernet and possibly fiber coming into our office. Would a Linux box with rsync work? What is the sweet spot between value and longevity? What solution would you use?'
This discussion has been archived. No new comments can be posted.


  • by Zlurg (591611) on Wednesday September 16, 2009 @10:43PM (#29449669)
    Holy crap we're approaching the need for an Ask Slashdot FAQ. I feel old.
    • Re:Done to death. (Score:5, Informative)

      by Magic5Ball (188725) on Wednesday September 16, 2009 @10:54PM (#29449739)

      Cue usual discussion about defining the problem correctly, choose the right tool for the job, etc.

      Specifically:
      "Would a Linux box with rsync work?" - It depends on the objective business requirements you've defined or been given. If those requirements include "has to be implemented on Foo operating system", then those requirements are not just for a backup solution.

      "What is the sweet spot between value and longevity?" - Simple: Graph accumulated TCO/time based on quotes from internal and external service providers. Throw in some risk/mitigation. Find the plot which best meets your cost/time/business requirements.

      "What solution would you use?" - Almost certainly not the solution you would use, because my needs are different. What is your backup strategy? What are your versioning requirements? What are your retention requirements? (How) do you validate data? Who should have access? What is an acceptable speed for access to archived data? What's an acceptable recovery scenario/timeline, etc.

      If you do not already know the answers to those questions, or how to find reasonable answers, ask neighboring university TV stations until you find one which has implemented a backup solution with similar business requirements to yours, and copy and paste the appropriate bits. You'll likely get better answers from people who have solved your exact problem before if you search Google for the appropriate group/mailing list for your organization's level of operating complexity and ask there, instead of asking generalists on Slashdot and hoping that someone from your specialist demographic is also here.

      • So, some quick answers here:

        "Would a Linux box with rsync work?" - It depends on the objective business requirements you've defined or been given. If those requirements include "has to be implemented on Foo operating system", then those requirements are not just for a backup solution.

        However, the fact that it's been suggested means it probably would work. A better solution (also old enough to be in the FAQ) is rdiff-backup.

        "What solution would you use?" - Almost certainly not the solution you would use, because my needs are different.

        True, you often need a custom solution. Just as often, a generic solution works. For much of the population, if they're on OS X, I'd say use Time Machine. If they like Internet backup, I'd say use Jungle Disk. And so on.

        In this case, yes, there are questions that need to be asked regarding the volume of data. But the differences between vari

        • by dgatwood (11270)

          In this case, for the Mac OS X installation, my answer would be the same as any other user (or at least the client side portion is the same):

          Time Machine to an XServe attached to a giant hardware RAID 5 array via fibre channel. In other words, the same way I back up my laptop except with a serious server providing the disk instead of an ABS....

          You should be able to back up the office file server to a Mac OS X Server box just as easily as you could back up to a Linux box, but the reverse isn't true. Backin

          • Backing up a Mac OS X installation with resource forks, extended attributes, etc. to a Linux box is nontrivial at best

            Depends what you need. If it's just someone's laptop, a raw disk image is still useful. If it's an external drive or a network share, you can format the drive such that it can be plugged directly into a backup server, and you can use a Linux fileserver.

            My choice would be a Mac laptop, a disk image, and a Linux fileserver for anything that won't fit on internal storage.

    • by neiras (723124) on Wednesday September 16, 2009 @10:55PM (#29449745)

      I feel old.

      Well, your UID makes you older than me.

      I SAID, YOUR UID MAKES YOU OLDER THAN ME.

      Also, my name is NOT "sonny boy", and this is my lawn, not yours. Where do you think you are, old timer?

    • Especially since this isn't even an "Ask Slashdot", it's in the "Linux" category. It's just the editors not reading their own site. "Throw this out there, this should be some red meat for the troops," that sort of thing.
  • by neiras (723124) on Wednesday September 16, 2009 @10:50PM (#29449713)

    Try one of these babies [backblaze.com] on for size. 67TB for about $8,000.

    There's a full parts list and a Solidworks model so you can get your local sheet metal shop to build cases for you.

    Talk to a mechanical engineering student on campus, they can probably help with that.

    • by Anonymous Coward on Wednesday September 16, 2009 @11:02PM (#29449785)
      You might have mentioned the Slashdot article [slashdot.org] on these from two weeks ago.
    • Different Solutions (Score:3, Informative)

      by Anonymous Coward

      My university is developing a local backup and co-location data center, and I have been one of the major forces in deciding what software we go with. If you are looking for Linux-style freedom, as mentioned before, rsync is all you need. If you happen to be looking for something more professionally supported, there are many options, but I will tell you some of what I have seen. At significant cost, the primary system I run into is EVault, which works OK, is very stable, and doesn't have too many crazy features.

      • by mlts (1038732) * on Thursday September 17, 2009 @12:54AM (#29450553)

        Backups for UNIX, backups for Windows, and backups all across the board almost require different solutions.

        For an enterprise "catch all" solution, I'd go with TSM, Backup Exec, or Networker. These programs can pretty much back up anything that has a CPU, although you will be paying for that privilege.

        If I were in an AIX environment, I'd use sysback for local machine backups and backups to a remote server.

        If I were in a general UNIX environment, I'd use bru (it used to be licensed with IRIX, and has been around so long that it works without issue on any UNIX variant). Of course, there are other solutions that work just as well, both freeware and commercial.

        If I were in a solidly Windows environment, I'd use Retrospect, or Backup Exec. Both are good utilities and support synthetic full backups so you don't need to worry about a full/differential/incremental schedule.

        If I were in a completely mixed environment, I'd consider Retrospect (it can back up a few UNIX variants as well as Macs), Backup Exec, or an enterprise level utility that can back up virtually anything.

        Please note, these are all commercial solutions. Bacula, Amanda, tar over ssh, rsync, and many others can work just as well, and likely will be a lot lighter on the pocketbook. However, for a business, some enterprise features, like copying media sets or backing up a database while it is online to tape or other media for offsite storage, may be something to consider for maximum protection.

        The key is figuring out what you need for restores. A backup system that is ideal for a bare metal restore may be a bit clunky if you have a machine with a stock Ubuntu config and just a few documents in your home directory. However, having 12 terabytes on Mozy, and needing to reinstall a box from scratch that has custom apps with funky license keys, would be a hair puller. The best thing is to use some method of backups for the "oh crap" bare metal stuff, then an offsite service just in case you lose your backups at that location.

        Figure out your scenario too. Are multiple Drobos good enough, or do you need offsite storage in case the facility is flooded? Is tape an option? Tape is notoriously expensive per drive, but is very economical once you start using multiple cartridges. Can you get away with plugging in external USB/SATA/IEEE 1394 hard disks, backing to them, then plopping them in the Iron Mountain tub?
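Of the free options mentioned above, "tar over ssh" is about as simple as offsite backup gets. Here is a minimal sketch; the hostname and paths are invented for illustration, and the script is only written out and syntax-checked, not run against a real server:

```shell
cat > /tmp/tar-over-ssh.sh <<'EOF'
#!/bin/sh
# Pull a gzipped tar of the media share from the studio server,
# dated so successive runs sit side by side on the backup box.
set -e
DATE=$(date +%Y-%m-%d)
ssh backup@studio.example.edu 'tar czf - /srv/media' \
    > "/backup/media-$DATE.tar.gz"
EOF
bash -n /tmp/tar-over-ssh.sh
```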

        • by cblack (4342)

          Do not consider Backup Exec in a partially Linux/UNIX environment or one with large numbers of data files.
          That is all.

        • VMWare Snapshots

          Are you backing up just data, or configurations or what? Backup Solutions are nice and all, but you're still missing something .... all the crap^H^H^H^H configurations that you've collected over the years of using that particular setup.

          And once you go to VMWARE (or other VM product) you'll quickly realize that the abstraction away from specific Hardware is very nice indeed.

          However, if one is REALLY concerned about backups, a duplicate hardware setup in a separate location sitting idle (or co

        • Re: (Score:3, Informative)

          by pnutjam (523990)
          for a multi-vendor environment, take a look at Unitrends [unitrends.com]. I use them and they are really sweet: disk-to-disk, any OS, bare-metal Windows (and Linux), hot-swappable off-site drive or off-site vaulting. Plus, there is no charge for clients if you want to back up a database or Exchange server. It's all-inclusive, even the open-file client.
          In my experience, getting open files backed up is the hardest thing in a 24/7 environment.
    • by illumin8 (148082) on Thursday September 17, 2009 @12:03AM (#29450257) Journal

      Try one of these babies on for size. 67TB for about $8,000.

      There's a full parts list and a Solidworks model so you can get your local sheet metal shop to build cases for you.

      Talk to a mechanical engineering student on campus, they can probably help with that.

      Better yet, just subscribe to Backblaze and pay $5 a month for your server. Problem solved.

      • by Firehed (942385)

        For that much data, that's only a practical solution if you've got a dedicated 100Mbit or faster (1Gbit?) line just to upload. And downloading the data back is going to take quite some time as well.

        Plus I think the $5/mo is only for home/personal use - that tends to be the case with most of their competition at least.

      • by mlts (1038732) * on Thursday September 17, 2009 @12:31AM (#29450411)

        Remote storage at a provider like Backblaze, Mozy, or Carbonite is a good tertiary-level backup, just in case your site goes down, but you are limited by your Internet pipe. A full restore of terabytes of video through a typical business Internet connection will take a long time, perhaps days. Of course, one could order a hard disk or several from the backup company, but then you are stuck waiting for the data to physically arrive.

        Remote storage is one solution, but before that, you have to have local ones in place for a faster recovery should a disaster happen. The first line of defense against hard disk failure is RAID. The second line of defense would be a decent tape drive, a tape rotation, and offsite capabilities. This way, if you lose everything on your RAID (malware or a blackhat formats the volume), you can stuff in a tape, sit on pins and needles for a couple hours, and get your stuff back, perhaps a day or two old.

        For a number of machines, the best thing to have would be a backup server with a large array and D2D2T (disk to disk to tape) capabilities so you can do fast backups through the network (or perhaps through a dedicated backup fabric), then when you can, copy them to tape for offline storage and the tub to Iron Mountain.

        Of course, virtually all distributed backup utilities support encryption. Use it. Even if it is just movies.

      • Problem: Windows/Mac only.

    • Recommending a backup solution where if one power supply dies you immediately corrupt the entire array? Yeah, that's JUST what he needs...
      • So build two.

        A backup server doesn't need redundancy if it's a backup server.

        • by mysidia (191772) on Thursday September 17, 2009 @01:27AM (#29450711)

          The hard drives are desktop class: not designed for 24x7 operation, and not designed for the massive write traffic that server backups generate.

          Latent defects on disks are a real concern.

          You write your data to a disk, but there's a bad sector, or miswrite, and when you go back later (perhaps when you need the backup), there are errors on the data you are reading from the disk.

          Moreover, you have no way of detecting it, or deciding which array has recorded the "right value" for that bit...

          That is, unless every bit has been copied to 3 arrays.

          And every time you read data, you compare all 3. (Or that you have two copies and a checksum)

          Well, the complexity of this redundancy reduces the reliability overall, and it has a cost.

          • That is, unless every bit has been copied to 3 arrays.

            3 arrays? Why? Do it in software, do RAID 5. How likely is it that you'll have a bad sector, or a miswrite, that hits both the stripes and the parity?

            you have no way of detecting it, or deciding which array has recorded the "right value" for that bit...

            Or use ZFS. It'll checksum everything, so yes, it'll know which array has the right value.
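The self-healing ZFS behavior described here comes from ZFS checksumming every block, so on read (or during a scrub) it knows which copy is good and repairs the bad one. A hypothetical setup, with placeholder device names; the commands are written to a file and syntax-checked here rather than executed, since they need real disks and root:

```shell
cat > /tmp/zfs-mirror.sh <<'EOF'
#!/bin/sh
set -e
# Two-way mirror; every block carries a checksum, so on read ZFS
# knows which side holds the good copy and rewrites the bad one.
zpool create backup mirror /dev/ada1 /dev/ada2
# A scrub walks the whole pool, verifying and repairing everything:
zpool scrub backup
zpool status backup
EOF
bash -n /tmp/zfs-mirror.sh
```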

    • by drsmithy (35869)

      Try one of these babies on for size. 67TB for about $8,000.

      Although, if you want a solution that's fast and reliable, you probably shouldn't.

  • by belthize (990217) on Wednesday September 16, 2009 @10:51PM (#29449715)

    A couple of details you'd need to fill in before people could give legitimate advice.

    What's the rate of change of that 12TB? Is it mostly static or mostly dynamic? I would assume it's mostly write-once, read-rarely video, but maybe not.

    Do you have a budget? As cheap as practical, or is there leeway for bells and whistles?

    Is this just disaster recovery? You say if the station gets slagged you want a backup. How quickly do you want to restore: minutes, hours, next day?

    Do you need historical dumps ? Will anybody want data as it existed last month ?

    Is it just data you're dumping, or some Windows app complete with Windows registry junk that needs to be restored? (I don't know anything about Final Cut Pro.)

    If you just want to dump data and restore isn't critical (you just need to be able to do it in some time frame), then sure, rsync'ing to some striped 6 (or 12) TB SATA array is plenty good.

    • by Krondor (306666) on Wednesday September 16, 2009 @11:25PM (#29449981) Homepage

      The parent is absolutely right. We don't have enough details to really make a recommendation, but if the question is "can rsync reliably replicate 12 TB with an average rate of churn over a 1 Gbps link?", the answer is an emphatic and resounding YES!

      I used to maintain an rsync disaster recovery clone that was backing up multiple NetWare, Linux, Unix, and Windows servers to a central repository in excess of 20 TB over primarily 100 Mbps links. We found that our average rate of churn was 1% / day which was easily accomplished. It was all scripted out with Perl and would notify on job status each night or failures. Very easy to slap together and rock solid for the limited scope we defined.

      When you get into more specifics on HA, DR turnaround times, maintained permissions, databases and in-use files, versioning, etc., things can get significantly more complicated.
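As a concrete illustration of the kind of scripted, notifying rsync replication described above, here is a minimal nightly-mirror sketch. The host, path, and mail address are invented, and the script is only written out and syntax-checked here, not run against a real server:

```shell
cat > /tmp/nightly-mirror.sh <<'EOF'
#!/bin/sh
LOG="/var/log/backup/$(date +%Y%m%d).log"
# -a preserves permissions and timestamps; --delete keeps the
# mirror exact (see the caveats about deletions elsewhere in
# this thread).
rsync -a --delete -e ssh \
    root@fcserver.example.edu:/Volumes/Media/ \
    /backup/fcserver/ > "$LOG" 2>&1
STATUS=$?
mail -s "fcserver backup exit=$STATUS" admin@example.edu < "$LOG"
EOF
bash -n /tmp/nightly-mirror.sh
```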

      • Re: (Score:3, Informative)

        by mcrbids (148650)

        I second that motion....

        We do something similar with rsync, backing up about 6-8 TB of data. We have PHP scripts that manage it all and version the backups, keeping them as long as disk space allows. Heck, you can even have a copy of our scripts [effortlessis.com] free of charge!

        With these scripts, and a cheap-o tower computer with a huge power supply and mondo-cheap SATA drives, we manage to reliably back up a half-dozen busy servers off-site, off-network, to a different city over the Internet, automagically every night.

        Yes, mo

      • Your analysis may not work in this case. This is not a backup system for a large number of business/educational users. It's for a relatively small number of video editing stations. One new video project can easily generate hundreds of gigabytes of new data that needs to be backed up. The average daily churn rate may be comparable, but the peak churn could well be many times that.

        Digitized video is not usually backed up the same way as conventional files or databases. Raw digitized video files do not c
    • by Anonymous Coward on Wednesday September 16, 2009 @11:42PM (#29450131)


      Is it just data you're dumping or some windows App complete with Windows registry junk that needs to be restored (don't know anything about Final cut pro)

      If you think Windows registry junk could possibly be involved with Apple's pro video software, you are quite right, you don't know anything about it.

  • by mnslinky (1105103) * on Wednesday September 16, 2009 @10:52PM (#29449731) Homepage

    That's all you need. We even use a script to create versioned backups going back six months, using Perl as a wrapper.

    Assuming the same paths, edit to your liking. I've made the scripts available at http://www.secure-computing.net/rsync/ [secure-computing.net] if you're interested. It requires that the system running the script have root SSH access to the boxes it's backing up. We use passwordless SSH keys for authentication.

    The README file has the line I use in my crontab. I didn't write the script, but I've made a few modifications to it over the years.

    • by moosesocks (264553) on Wednesday September 16, 2009 @10:57PM (#29449759) Homepage

      Actually, I'd suggest using OpenSolaris so that you can take advantage of ZFS. Managing large filesystems and pools of disks is *stupidly* easy with ZFS.

      You could also do it with Linux, but that would require you to use FUSE, which has a considerable performance penalty. I'm not sure about the state of ZFS on FreeBSD, although I imagine that the Solaris implementation is going to be the most stable and complete. (For what it's worth, I've been doing backups via ZFS/FUSE on Ubuntu for about a year without any major problems)

      • Re: (Score:2, Interesting)

        by Anonymous Coward

        Actually, I'd suggest using OpenSolaris so that you can take advantage of ZFS. Managing large filesystems and pools of disks is *stupidly* easy with ZFS.

        You could also do it with Linux, but that would require you to use FUSE, which has a considerable performance penalty. I'm not sure about the state of ZFS on FreeBSD, although I imagine that the Solaris implementation is going to be the most stable and complete. (For what it's worth, I've been doing backups via ZFS/FUSE on Ubuntu for about a year without any major problems)

        The FreeBSD port of ZFS actually works pretty damn nicely. I'm using a RAID Z configuration on my FreeBSD 7.2 server and it works great!
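For reference, creating a raidz pool and taking a snapshot after each backup run takes only a few commands. These are illustrative only, with placeholder device names, and are written to a file and syntax-checked here rather than executed:

```shell
cat > /tmp/zfs-snapshots.sh <<'EOF'
#!/bin/sh
set -e
# Single-parity raidz pool across four placeholder disks:
zpool create tank raidz /dev/da0 /dev/da1 /dev/da2 /dev/da3
zfs create tank/backups
# One snapshot after each backup run = free point-in-time restores:
zfs snapshot "tank/backups@$(date +%Y-%m-%d)"
zfs list -t snapshot
EOF
bash -n /tmp/zfs-snapshots.sh
```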

    • by fnj (64210)

      We even use a script to create versioned backups going back six months using perl as a wrapper.

      Kudos for publishing the code! Can you comment on your script vs rsnapshot [rsnapshot.org], which is an established incremental rsync based solution which also uses hard links to factor out unchanging files? Rsnapshot is also a perl script, by the way.

        Well, I can see that mnslinky's script can be used for offline backups, but I don't think rsnapshot does this (I haven't studied either in depth).

        My host uses rsnapshot, and it is very convenient to use.

  • by darkjedi521 (744526) on Wednesday September 16, 2009 @10:55PM (#29449747)

    Does your university have a backup solution you can make use of? The one I work at lets researchers onto their Tivoli system for the cost of the tapes. I think I've got somewhere in the neighborhood of 100TB on the system and ended up being the driving force behind a migration from LTO-2 to LTO-4 this summer. If you are going to roll your own and use disks, I'd recommend something with ZFS - you can make a snapshot after every backup so you can do point-in-time restores.

    Also, I'd recommend more capacity on the backup than you have now, to allow versioning. I was the admin for a university film production recently (currently off at, I believe, Technicolor being put to IMAX) and I've lost track of the number of times I had to dig yesterday's or last week's version off of tape because someone made a mistake that was uncorrectable.

    • by Cato (8296)

      Having looked a bit at ZFS: it really needs x64 hardware with plenty of RAM (2GB plus), and Solaris has by far the best implementation. FreeBSD 7.x is next, followed by Linux's ZFS/FUSE. From the reports I've seen it's a bit early to trust it on a non-Solaris platform, and even on Solaris there are some bugs (all IMHO, and there are production users on FreeBSD who are happy with it).

      LVM on Linux lets you do snapshots, but after losing thousands of files and several LVM logical volumes, inc

  • Just build a clone (Score:4, Insightful)

    by pla (258480) on Wednesday September 16, 2009 @11:03PM (#29449795) Journal
    What solution would you use?

    First of all, I love linux. Use it for my own file servers, and media machines, and routers, and pretty much everything except desktops.

    That said...

    For your task, I would probably just build an exact duplicate of the "real" machine and sync them nightly. Always keep in mind that if you have no way to quickly recover from a disaster, you don't actually have a backup.


    That said, and if possible, I would also build the "backup" machine with more storage than the "real" machine. As someone else pointed out, you'll probably discover within a few days that your food-chain-superiors have no concept of "redundancy" vs "backup" vs "I can arbitrarily roll my files back to any second in the past 28 years". Having at least nightly snapshotting, unless your entire dataset changes rapidly, won't eat much extra disk space but will make you sleep ever so much better.
    • by SheeEttin (899897)

      For your task, I would probably just build an exact duplicate of the "real" machine and sync them nightly. Always keep in mind that if you have no way to quickly recover from a disaster, you don't actually have a backup.

      Of course, the only problem with that is if you have a hardware failure on-site, the backup, being built of the same thing, is probably going to fail about the same time.

        Good point - that's why all the disks in a RAID array should come from different manufacturers, or at least different batches/manufacturing plants, and your 'spare' server should be a different brand or built from different components.

        In the mid-'90s I was working for a training company in London, and they hosted all their training data, courseware, disk images, etc. on a big RAID 5 array with five disks. One day, the tech guy arrived at work to discover the drive bearings had seized on three disks!

    • First of all, I love linux. Use it for my own file servers, and media machines, and routers, and pretty much everything except desktops.

      Why wouldn't you use it for your desktops?
      • Re: (Score:3, Insightful)

        by petrus4 (213815)

        Why wouldn't you use it for your desktops?

        Linux still doesn't have the "interface complexity vs. implementation complexity" problem completely balanced on the desktop just yet; although, to be fair, neither does anyone else (except maybe Apple, and that's a maybe).

        Ubuntu can make a very pretty looking desktop, but updates will often hose the entire system, and in my experience, it can also crash if you give it a hard look.

        On the other hand, you can use LFS, Slack, or Arch to make yourself something extremely hardware efficient and robu

    • Re: (Score:2, Insightful)

      by atarashi (303878)

      Well, first you would need to define goals.
      What do I want to backup? (only Data, or OS + Apps + Data)
      Is my Data rather static or does it change a lot?
      How fast does it change?
      Do I have enough bandwidth to cope with the backup? (12TB is a lot! It would take more than a day to copy it over a GBit link... so, how much of the data changes over a day?)
      Do I need daily backups? Or even hourly?
      How fast do I need to restore everything?
      Do I need different versions? (Then the needed storage might be much higher than 12

  • by Z8 (1602647) on Wednesday September 16, 2009 @11:05PM (#29449805)

    You may want to check out rdiff-backup [nongnu.org] also. It produces a mirror like rsync, and uses a similar algorithm, but keeps reverse binary diffs in a separate directory so you can restore to previous states. However, because it keeps these diffs in addition to the mirror, it's better if you have more space on the backup side.

    There are a few different frontends/guis to it but I don't have experience with them.
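Basic rdiff-backup usage looks something like the following. The host and paths are hypothetical, and since rdiff-backup may not be installed, the commands are written to a file and syntax-checked rather than run:

```shell
cat > /tmp/rdiff-backup-demo.sh <<'EOF'
#!/bin/sh
set -e
# Mirror the media tree; reverse diffs accumulate on the
# destination in an rdiff-backup-data/ directory.
rdiff-backup /srv/media backuphost::/backup/media
# Restore the tree as it looked ten days ago:
rdiff-backup -r 10D backuphost::/backup/media /tmp/media-restore
# Reclaim space by dropping increments older than six months:
rdiff-backup --remove-older-than 6M backuphost::/backup/media
EOF
bash -n /tmp/rdiff-backup-demo.sh
```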

    • Re: (Score:3, Interesting)

      by metalhed77 (250273)

      I love rdiff backup but I'd never use it on any large datasets. I attempted to use it on ~ 600 GB of data once with about 20GB of additions every month and it ran dog slow. As in taking 6+ hours to run every day (there were a lot of small files, dunno if that was the killer).

      For larger datasets, like what the poster has, I'd go with a more comprehensive backup system, like bacula. I use that to backup about 12TB and it's rock solid and fast. There's a bit of a learning curve, but the documentation is very g

    • by pla (258480)
      You may want to check out rdiff-backup also. It produces a mirror like rsync, and uses a similar algorithm, but keeps reverse binary diffs in a separate directory so you can restore to previous states.

      Seriously people, learn the tools you have available on any stock Linux system.

      Even assuming you run a much older system with an FS that doesn't support online snapshotting... "cp -al <source> <destination>". Period.
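The `cp -al` trick deserves a concrete demo: a hard-linked copy costs almost no space, and because tools like rsync replace changed files rather than rewriting them in place, the snapshot's copies survive later changes. This toy example (invented paths) is runnable on any Linux box with GNU coreutils:

```shell
# Start clean so the demo can be re-run.
rm -rf /tmp/cpal-demo
mkdir -p /tmp/cpal-demo/live
echo "take 1" > /tmp/cpal-demo/live/cut.txt
# Hard-link snapshot: directories are created, files are linked.
cp -al /tmp/cpal-demo/live /tmp/cpal-demo/snap-monday
# Both names point at the same inode, so the snapshot is ~free:
stat -c %i /tmp/cpal-demo/live/cut.txt
stat -c %i /tmp/cpal-demo/snap-monday/cut.txt
# Replace (don't edit in place!) the live file, as rsync does:
echo "take 2" > /tmp/cpal-demo/live/cut.tmp
mv /tmp/cpal-demo/live/cut.tmp /tmp/cpal-demo/live/cut.txt
cat /tmp/cpal-demo/snap-monday/cut.txt   # still prints: take 1
```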
  • Why a backup server? (Score:3, Interesting)

    by cyberjock1980 (1131059) on Wednesday September 16, 2009 @11:08PM (#29449845)

    Why not a complete duplicate of all of the hardware? If the studio combusts you have an exact copy of everything.. hardware and all. If you use any kind of disk imaging software, you can simply recover to the server with the latest image and lose very little data.

  • lose the drobo (Score:2, Informative)

    I recommend losing the Drobo as fast as you can - I know four people who bought these, and all four lost data in the first year.
    • Re: (Score:3, Interesting)

      by mlts (1038732) *

      I have not heard of any catastrophic data losses firsthand, but I don't like my data stored in a vendor specific format I couldn't dig out by plugging the component drives into another machine.

      If you are a homebrew type, you might consider your favorite OS of choice [1] that can do software RAID, building yourself a generic server level PC, and use that for your backups. This way, when you need more drives, you can go to external SATA frames.

      [1]: Almost all UNIX variants support RAID 5, Linux supports RAI

  • by Len (89493) on Wednesday September 16, 2009 @11:10PM (#29449861)
    Everything your TV station broadcasts will automatically be backed up here. [mininova.org]
  • BackupPC (Score:4, Informative)

    by dissy (172727) on Wednesday September 16, 2009 @11:15PM (#29449887)

    What I use is BackupPC [sf.net]. It's a very nice web front end to tar over ssh.

    For Linux, all the remote servers need is sshd listening somewhere, with the BackupPC server's public key in an authorized_keys file. It will pipe tar streams over an SSH connection.

    For Windows, it can use Samba to back up over SMB.

    I run a copy on my home file server, which backs up all the machines in the house, plus the couple servers I have out in colo.

    When it performs an incremental backup, after it is done it will populate its timestamped folder with hard links to the last full backup for duplicated files, so restoring from any incremental will still get the full version no matter when it was last backed up.

    Also, after each backup, it will hash every file against the previous backup. If two files match, it deletes the second copy and again hard-links it to the first copy of the file.
    I have nearly three months' worth of backup retention, with backups every three days (every day on a couple of hosts), but for the base system and files that rarely change, each 'copy' does not take up the same amount of disk space again.
    It is very good at saving disk space.

    Here are some stats from its main page as an example:

    There are 7 hosts that have been backed up, for a total of:
            * 26 full backups of total size 38.34GB (prior to pooling and compression),
            * 43 incr backups of total size 0.63GB (prior to pooling and compression).

    Pool is 10.11GB comprising 108499 files and 4369 directories (as of 9/16 01:00),

    Restoring gives you a file browser with checkboxes. After you tell it what you want, it can send you a tar(.gz) or .zip file, or it can directly restore the files via tar over SSH back to the machine they were on; by default in the original location, but that can be changed easily too.

    The main downside is the learning curve. But once you get things down, you end up just copying other systems as templates, updating the host/port/keyfile/etc. settings.
    Also, with all those hard links, it is a pain to do any file/folder manipulation on its data dir. Most programs won't recognize the hard links and will just copy the files, easily taking up the full amount of storage.

    But it works just as well with only itself and one remote server: schedule it to start at night and stop in the morning, set your frequency and how much space to use before it deletes old backups, and let it run.
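The pooling/hard-link behavior described above can be mimicked in a few lines of shell, which makes it clear why repeated full backups cost so little space. This is a toy illustration with made-up paths, not BackupPC's actual code:

```shell
# Start clean so the demo can be re-run.
rm -rf /tmp/bpc-pool /tmp/bpc-backup1 /tmp/bpc-backup2
mkdir -p /tmp/bpc-pool /tmp/bpc-backup1 /tmp/bpc-backup2
echo "station logo" > /tmp/bpc-backup1/logo.png
echo "station logo" > /tmp/bpc-backup2/logo.png
# Hash each backed-up file; identical content collapses into one
# pooled inode shared by every backup that contains it.
for f in /tmp/bpc-backup1/logo.png /tmp/bpc-backup2/logo.png; do
    h=$(sha256sum "$f" | cut -d' ' -f1)
    if [ -e "/tmp/bpc-pool/$h" ]; then
        ln -f "/tmp/bpc-pool/$h" "$f"   # duplicate: link to pool
    else
        ln "$f" "/tmp/bpc-pool/$h"      # new content: add to pool
    fi
done
stat -c %h /tmp/bpc-pool/*   # link count 3: pool entry + both backups
```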

    • We back up 15TB nightly (using tar over NFS) with BackupPC running on two servers, each with 10TB of storage, pulling data from a high-performance NAS (BlueArc). We retain 30 days of incremental backups and do a full of the various home directories every 30 days.
    • Re:BackupPC (Score:4, Informative)

      by IceCreamGuy (904648) on Thursday September 17, 2009 @12:35AM (#29450435) Homepage
      I couldn't agree more; BackupPC is really great. Not only does it support tar over SSH and SMB, but it also supports rsync over SSH, rsyncd, and now, in the new beta, FTP. I back up everything to a NAS and then rsync that every weekend to another DR disk (you have to be careful about hard links when copying the pool, since it uses them in the de-duplication process). There are several variants of scripts available on the wiki and other sites for initiating shadow copies on Windows boxes, and with a little tinkering you can even get that working on Server 2008, though of course it really shines with *nix boxes. Highly recommended - the only drawbacks are that, as the parent mentioned, the learning curve can be intimidating at first, and the project has been pretty quiet the past few years since the original developer stopped working on it. Amanda (the MySQL backup company) seems to have picked it back up, and they are the ones who released the most recent beta. Did I mention it has a really convenient web interface, emails about problems, auto-retries failed backups (while it's not in a blackout period), and somebody wrote a great Nagios plugin for it? I'm pretty sure I did, oh yes definitely.
    • by JayAEU (33022)

      Very true indeed, BackupPC really is a one-stop solution for doing sensible backups of any number of hosts (local or remote) over a long time. The learning curve isn't as steep anymore, since they introduced a more capable web interface.

      I also have yet to see another program that does what BackupPC does any faster.

    • by j_sp_r (656354)

I switched away from BackupPC because the archive size started exploding. Lots of large files getting changed a little was the main reason, I think. Also, copying the backup pool takes ages.

    • Re: (Score:2, Interesting)

      by miffo.swe (547642)

      I love BackupPC more today than ever. I had a run with some of the more often used commercial offerings and the grass is NOT greener on the other side. Despite fancy wizards and support BackupPC beats any one of them anytime.

      I backup about 230 GB of user data each night and still the pool is only 241 GB after many months of use.

      "There are 6 hosts that have been backed up, for a total of:

      * 51 full backups of total size 1895.95GB (prior to pooling and compression),

  • by Jeremy Visser (1205626) on Wednesday September 16, 2009 @11:15PM (#29449891) Homepage

Don't use rsync to make backups. You don't just want to back up against spontaneous combustion - inevitably, there will be accidental deletions and the like occurring in your studio. If you use rsync (with --delete, as any sane person would, otherwise your backup server will fill up in days, not years), then when some n00b runs `rm -rf ~/ReallyImportantVideos`, they'll be deleted from the backup too.

Remember that pro photography website that went down, because their "backup" was a mirroring RAID setup? Yep - they lost all their data in one fell swoop when somebody accidentally deleted the whole lot. Don't make the same mistake.

    Use an incremental backup tool. Three that come to mind are rdiff-backup [nongnu.org], Dirvish [dirvish.org], and BackupPC [sourceforge.net].

    I would think that rdiff-backup would suit your needs best. I currently use BackupPC at home, which is great for home backups, but I think that it's overkill (and possibly a bit limited) for what you want.

    Hope this helps!

    • Oh dear...when will Slashdot learn to escape stuff with UTF-8? On PHP, it's easy -- htmlentities($unsafe, ENT_COMPAT, 'utf-8') will do the trick. Not sure what Perl needs.

    • by pla (258480) on Wednesday September 16, 2009 @11:26PM (#29449991) Journal
      Don't use rsync to make backups. Because you don't just want to backup against spontaneous combustion - inevitably, there will be accidental deletions and the like occurring in your studio.

      rsync actually includes an option to make hardlinked snapshots as part of the syncing process, nowadays.

      Personally, I don't trust it and always do that part manually, then let rsync do what it does best... But yeah, even "vanilla" rsync contains exactly the functionality you mention.
      • Re: (Score:3, Informative)

        by adolf (21054)

        *nod*, at least for various definitions of "manually."

        I have a script which makes a hard-linked clone of the latest backup, and then rsyncs to that (with some manner of special commandline switch which is made for this scenario and that I can't be bothered to look up right now). It's easy, and it lets me have layered backups not totally unlike (though nowhere near as slick as) Netapp's snapshots.

        I have done bare-metal restores of Linux boxen from backups made like this. Works just fine, with an iota of bo

    • by MrNemesis (587188) on Thursday September 17, 2009 @07:20AM (#29452031) Homepage Journal

rsync makes it pretty easy to implement a bargain-basement backup system if you're willing to do a bit of hacking around with scripts and soft/hard links. Make your backups into e.g. /backups/2009/09/17/* and update the symlink for /backups/latest to point to that dir; when the next backup comes along, use --link-dest=/backups/2009/09/17/ to hardlink all files that have stayed the same, but copy over the newer versions into your /backups/latest. This way you get a) the absolute minimum space taken up without resorting to snapshots and b) an easy way of looking at and restoring individual files or the whole tree from a given date/time. For bonus points set up a vacuum script that automagically deletes the oldest backups whenever your backup partition gets to 90% full or whatever. Run your set of scripts every hour or so (but don't forget to include lock files/semaphores so you don't end up running nine instances of the script simultaneously).

      As far as syncing large amounts of data, firstly use rsync 3 if you can - it's a hojillion times faster with large numbers of files and much easier on your memory. If you're going over the internet, tunnel through SSH using inline compression (if your data is easily compressible that is) - heck, tunnel through SSH on your private network, rsync makes it ridiculously easy. Using this technique I managed to keep a mirror of a 2TB file server over a 2Mbps SDSL link no more than an hour or two out of date.

      That's how I remember it working anyway - don't have a box I can try it out on here, but in all honesty rsync and a bit of bash/python/whatever is capable of reproducing all sorts of "enterprisey" backup features for zero cost and almost zero effort (and, I'll almost certainly say, zero approval from your boss). IMHO it's one of the killer apps of UNIX.

      Disclaimer: I am not an employee of Rsync Overlord Corp, just a satisfied customer ;)

If you're considering doing incremental or archival backups, I would look into using dar. It's sort of like tar on steroids, and a great little utility. It's also nothing like bleeding edge: it runs on both Linux and BSD platforms and has a Windows port (that I've never used). Combining dar with ssh and some simple shell scripts might be the sort of solution you're looking for.

  • by TD_3G (595883)
    While our storage needs are nowhere near that size, I can attest to the greatness of Bacula. The hardware part is probably up to you, but as far as software, I cannot preach this software enough. 1) It's completely cross platform in terms of systems you can pull data from. The Director and Storage Daemon run flawlessly on every distro of Linux I've tried it on (Slackware, Debian, and Fedora)... and the restores are easy as pie with some of the available interfaces. Configuration is a pain and can take
  • iSCSI rocks... and these things have everything built in. Seriously cool units. Costly though - but you know where that money goes when you use it - or should I say, spend 10 minutes setting it up and then job done.
  • ... BitTorrent pirates. You'll always find last night's shows backed-up on TPB the next morning. Yaaarrr!

But rsnapshot works even better. When I worked for the RI Sec State's office we found tape backup wasn't cutting it for us. We picked up a cheapie HP server, loaded it up with storage, and bought a bunch of terabyte-capacity external drives for offsite use.

    You don't know what a relief it was to be able to go to a web interface and restore files from there. Worked great with linux boxes, but you had to jump through a few hoops to deal with the Windows servers we had.
    • by skogs (628589)

      mod parent up. rsnapshot is painless and elegant.

  • by failedlogic (627314) on Wednesday September 16, 2009 @11:43PM (#29450135)

Have each student create their "own TV station" as part of their degree requirement - no matter the area of study. Similar to research essays, you'll get the following results: 1) students who completed the assignment with no outside assistance 2) students that copied certain small portions of the data you are backing up and presented it as their own 3) students that plagiarize everything - yes, some students will debate that the same content the TV station has accumulated over the years - all 12 TB - is actually their original work.

    As this data appears on the University network, the entire TV station will be backed-up in a local "Cloud". And if these types of assignment become popular at other universities, you can expect to find redundant off-site backups. By this point, the 12 TB will appear on BitTorrent (and probably on Newsgroups and IRC for the dedicated plagiarists). A full restore will only take a few days - as long as the full 12 TB is seeded.

    • Re: (Score:3, Funny)

      by Shinobi (19308)

      "3) students that plagiarize everything - yes some students will debate that the same content the TV station has accumulated over the years - all 12 TB - is actually their original work."

      And then you can also flag future FSF cultists. Win-win. ;)

I went down the current list of comments, and for all the people who write their own rsync tools, please go review 'rsnapshot'. It's quite efficient; its major flaw is that it lists snapshots as 'hostname.1', 'hostname.2', etc., instead of 'hostname.YYYYMMDD', which would ease things for users grabbing their own old files from online.
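A minimal rsnapshot.conf for a setup like the one in the question might read roughly as follows; the paths and retention counts are invented, and note that real rsnapshot configs require literal TAB characters between fields:

```
snapshot_root	/backups/rsnapshot/
interval	daily	7
interval	weekly	4
backup	root@studio:/srv/fcp/	studio/
backup	/srv/office/	office/
```

With that in place, a couple of cron entries calling `rsnapshot daily` and `rsnapshot weekly` do the rotation automatically.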

    • Re: (Score:3, Insightful)

      by Janek Kozicki (722688)

rsnapshot + mdadm RAID 6. Agreed 100%. That's what I'm currently using. It has worked like a charm for over 2 years now (with a single HDD failure in the meantime).

  • Here's what I do (Score:3, Interesting)

    by MichaelCrawford (610140) on Thursday September 17, 2009 @12:36AM (#29450443) Homepage Journal
    First let me point out that there are natural disasters that could potentially take out your backup, if it's on the same campus as your TV station - think of Hurricane Katrina. And for sure you want your Final Cut projects to survive a direct nuclear hit.

    Anyway, I have a Fedora box with a RAID 5 made of four 1 TB disks. There is a partition on the RAID called /backup0. That's not really a backup, but more meant as a convenience. I back up all my data to /backup0, then right away use rsync to copy the new data to an external drive that is either /backup1 or /backup2.

    I have a safe deposit box at my bank. Every week or two I swap the external drive on my desk with the external drive in the safe deposit box.

    So the reason I have that /backup0 filesystem is so that I don't have to sync the two external drives to each other - otherwise I would have to make twice as many trips to the bank, and there would be some exposure were my house to burn down while I had both external drives at home.

    My suggestion for you is to find two other University facilities that are both far away, and offer to trade offsite backup services with them.

    You would have two backup servers in your TV station - one for each of your partners - and they would also each have two, one each for you, as well as for each other.

    That way only a hit by a large asteroid would lose all your data.

    I got religion about backing up thoroughly after losing my third hard drive in twenty years as a software engineer. Fortunately I was able to recover most of that last one, but one of the other failures was a total loss, with very little of its data being backed up.

    • Quite commonly backups are done by copying an entire filesystem, and then doing incremental backups of just the files that have changed.

      I'm very concerned about just being able to find the particular file that I need, so I have my backups organized by topic - on each of my backup filesystems, there is a directory for my financial data, for my source code, for each of my websites and so on.

      In each directory I put a bzip2ed tarball named for the date - for example "OggFrog_SVN_2009-09-16.tar.bz2". Most o

  • simple cheap and easy

  • If online backup is an option, why not try http://www.wuala.com/ [wuala.com] ?

Not to be pedantic, but it ain't a lot to back up. Just get a pair of MSA2000s with 1TB SATA disks. Total cost £20,000 inc tax. MSA2000fc if you can do fibre. Then just get an LTO4 tape robot (HP, Overland or similar) and do a disk-to-disk-to-tape backup setup. That way you not only have tape backups in case the entire place burns down, you also have disk-based backups for a quick restore when someone accidentally deletes a file. Also, with D2D2T the throughput will be high enough to quickly complete the backups in

  • These are all built on top of rsync and turn it into a real backup tool by storing multiple versions of your files. The challenge will be the very large video files, but if you only write to these once, they are a good option.

    Rsnapshot uses hard links combined with rsync --delete - rather than actually delete an old copy of a file, it unlinks it, and when there are no changes in a file, it simply creates a link to it under the current snapshot. It's not as space efficient as DAR but your big files are pro
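The hard-link trick these tools rely on is easy to demonstrate directly; the directory and file names below are made up for the demo:

```python
import os

base = "/tmp/snap-demo"
os.makedirs(f"{base}/day1", exist_ok=True)
os.makedirs(f"{base}/day2", exist_ok=True)

# day1 holds the only real copy of an unchanged file...
with open(f"{base}/day1/show.mov", "w") as f:
    f.write("video bytes")

# ...and day2's "copy" is just another name for the same inode.
if os.path.exists(f"{base}/day2/show.mov"):
    os.remove(f"{base}/day2/show.mov")   # keep the demo re-runnable
os.link(f"{base}/day1/show.mov", f"{base}/day2/show.mov")

# Both snapshots see the file, but it is stored on disk exactly once;
# deleting day1's name only drops the link count, day2 keeps the data.
print(os.stat(f"{base}/day1/show.mov").st_ino ==
      os.stat(f"{base}/day2/show.mov").st_ino)  # prints True
```

This is also why large files that change even slightly defeat the scheme: a one-byte change breaks the link and forces a full new copy.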

  • This is something that I wrote and use myself and for my customers. It is easy to set up and use.

    The backups on the archive server appear as complete copies of directories of the backed up machines. There will appear to be one complete backup for each day - this lets you find/restore a consistent set of files from a particular day.

    The script cleverly avoids copying files that have not changed. It economises on disk use by only keeping one copy of each file - but makes that one copy appear in the various

  • by Mysticalfruit (533341) on Thursday September 17, 2009 @10:28AM (#29453269) Journal
    Even though I'm writing this from a linux box, if you're going to be storing that much data and you want to do it cheaply, you should really look at ZFS as the filesystem of choice for the backend.

    As for moving the data over there, sure use rsync and then use zfs's snapshot features so you have some rollback capability.

Why ZFS? I'm envisioning that you're going to need a mid-range machine (dual power supplies) with a whole pile of JBOD hanging off it. You could spend the money on something that does hardware-based RAID, but if you're cost-conscious, your best route is to buy a JBOD box and fill it with 1.5TB disks. You could try to manage all of this with LVM and possibly XFS, but it would be a nightmare. ZFS basically rolls RAID/LVM/FS into a single layer, so adding disks to your array becomes trivial. Also, I would recommend that each user/application get its own sub-filesystem on the array; that way you'll have much finer granularity for snapshots/quotas/etc.
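The layering described above comes down to a handful of admin commands; the pool name, device names, and quota are purely illustrative:

```shell
# One raidz pool across the JBOD disks (device names will differ)
zpool create backuppool raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0

# A sub-filesystem per user/application, each with its own quota
zfs create backuppool/fcp
zfs set quota=4T backuppool/fcp

# Snapshot after each rsync run for cheap rollback
zfs snapshot backuppool/fcp@2009-09-17
zfs rollback backuppool/fcp@2009-09-17   # undo a bad sync if needed
```

Growing the array later is just another `zpool add` with a new set of disks.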

I didn't intend this post to be an advertisement for ZFS, but I have such a setup with ~14TB of disk on it right now and it works great. As for the OS on top, you could go with OpenSolaris, or Nexenta (which is just Debian rolled on top of the OpenSolaris kernel).
