UNIX Process Cryogenics?
shawarma asks: "Due to a recent
power outage, I've had to shut down a server running a process that had
been running for ages calculating something. The job it was doing would
have been done in a few days, I think, but I had to shut it down before the
UPS ran out of juice. This got me thinking: Why can't I freeze down the
process and thaw it back up at a later time? It ought to be possible to take
all the connected memory pages and save them in some way, preserve file
handles and pointers, and everything. Maybe net-connections would die,
but that's understandable. Has any work been done in this field? If not,
shouldn't there be? I'd like to contribute in some way, but I think it's a bit
over my head." Laptops have been doing this in some form for years: most laptops, when they run out of power or when told to by the user, will go into "suspend" mode, which is similar to what the poster is describing. Outside of laptops, however, I haven't seen this done. Sleeping processes also do something similar, sending their memory pages into swap so other running processes can use the memory. What, if anything, is preventing someone from taking this a step further?
the mode you are speaking of (Score:2, Informative)
Re:the mode you are speaking of (Score:2, Informative)
Saving application state (Score:2, Insightful)
problematic (Score:2)
External dependencies (Score:3, Insightful)
We do it in Condor (Score:5, Informative)
Free-as-in-beer, on most major UNIX platforms. Check out our publications, we have several that give all the details you'd need to write it yourself.
Plenty of others, too - libckpt, there was a "Checkpointing Threaded Programs" paper at USENIX this past summer... there are some kernel patches that can do it, most of them under the GPL.
Re:We do it in Condor (Score:5, Informative)
As the poster said, there are plenty of others:
I'm sure I left several out. Checkpoint-restart has been part of the high-performance computing scene for years. Having been a sysadmin on large, high-performance computing platforms for the last few years of my professional life, my experiences with checkpoint-restart have been a mixed bag. All of the existing systems have limitations. Depending on the application, those limitations can be no problem, or they can be deal-breakers.
OS X needs this especially (Score:5, Interesting)
Re:OS X needs this especially (Score:2)
Re:OS X needs this especially (Score:5, Interesting)
I don't know how directly comparable this example might be, but I used to use VMware (under Linux) to suspend Win98 when I didn't need it. If I needed to do something under Win98 (like browse the web), VMware would load up Win98 where I last left it. It saved the minute or so of waiting for the VM to POST and load Win98.
(If VMware provided better support for DirectX, I might not have needed to switch my home workstation from Linux to Win2K. It's been more than a year since I checked, though, so things might've improved.)
Re:OS X needs this especially (Score:2)
Re:OS X needs this especially (Score:2)
- j
Re:OS X needs this especially (Score:2)
Re:OS X needs this especially (Score:2, Interesting)
Which is funny, because VMware has exactly this capability.
It needs some refinement, and sometimes it's slow when it picks back up again, but in my experience it generally works. It is obviously not only possible, but implementable using current technology.
BeOS? (Score:2)
Search on "Checkpointing" (Score:3, Redundant)
There have been a number of projects that do this under Unix over the years. Many of them do it for the purpose of process migration. Others do it just for recovery.
One such project that I used in the early 90s was Condor.
The typical approach is to do something along the lines of forcing a core dump and then doing some magic to restart the process from the core file.
Hmm, VMWare can do this in a different way. (Score:5, Interesting)
While this isn't quite what you are looking for, it spawns an idea of the level to which this can be taken. Think of how neat it is for distributed applications. Of course, something like this has to exist somewhere. . .
Extended core dump? (Score:5, Interesting)
I suspect that all the pieces of a solution are written and it's just a tricky pick-choose-and-integrate problem.
And damn but I'd love to have this ability.
--G
Re:Extended core dump? (Score:4, Interesting)
When compiling Emacs from the sources, the initial executable file is only a (relatively) small virtual machine executing elisp bytecode.
Then, it is started, and several basic elisp packages are loaded and initialized.
Once initialized, it makes a dump of itself on a file on disk (IIRC actually dumping core by sending a fatal signal to itself).
The dump is prepended with an appropriate loader which restores the Emacs process (in its initialized state) in memory, and the resulting file is used as the main Emacs binary (what you can usually find in /usr/bin).
This works for Emacs because it knows when it is checkpointed, and special care is taken not to do anything that depends on parts of the running environment that can't be fully restored.
hhgttg (Score:3, Funny)
In that case, Arthur Dent should know the answer.
eros-os (Score:2, Interesting)
http://slashdot.org/article.pl?sid=99/10/28/01512
about an operating system with "journaled" processes of a sort, that would automatically back up images of its processes.
you can (Score:5, Informative)
Re:you can (Score:5, Informative)
http://falcon.sch.bme.hu/~seasons/linux/swsusp.ht
this is what you need.
Re:you can (Score:2, Insightful)
Re:you can (Score:3, Informative)
At least, last time I checked that's how it was. There may have been improvements made. It would require somewhat major changes to the VM and each filesystem in the current Linux implementation to get it working with journalled systems, or if Linux finally gets a journal-capable VM (similar to IRIX's, perhaps), it would just require some VM changes if it's done right.
(Begin semi-OT stuff)
Oh, and please, please everyone ask Linus not to rip out memory zones just because it's a BSD-like idea.
Kernel 2.6 will probably be able to support hibernation without funkiness in the filesystems themselves, just a good VM setup. The new framebuffer system (Ruby) will rock, too (think 'echo "640x480-16@60" >
process migration is the term you want (Score:2, Interesting)
Obviously those techniques would apply to what you are asking about.
google has lots of links about it [google.com]
it's encrypted in your brain waves! (Score:5, Funny)
Volgons? (Score:3, Offtopic)
Re:Volgons? (Score:2)
Re:it's encrypted in your brain waves! (Score:2)
That must have annoyed the Vogons, who were coming to do the same thing. Not to mention the mice!
I took a quick look... (Score:2, Funny)
I think the same solution would apply here: Find Arthur Dent.
No need, my good man (Score:2, Offtopic)
You sure of that? (Score:4, Funny)
Resurrecting core files (Score:2)
Suspend (Score:4, Informative)
To make suspend work, you'd have to dump your entire memory image to disk. Then you swap in the entire image, kernel and user pages alike.
Re:Suspend (Score:2)
This CAN be trivially done on any un*x i know... (Score:2, Redundant)
1) Force a core dump of the process;
2) Use the core and process image to restart it (for example in a debugger such as gdb, if you don't want to write specialized software).
To the best of my knowledge the perl "compiler" uses precisely this technique to produce perl "executables" - it dumps them out as a core right after compilation and reuses it later on.
You can do this to a kernel as well, if you REALLY want to.
However, since many things may indeed depend on the state of the kernel, files, network connections, devices, etc., doing this is not advisable.
Good coding practice for long-running processes is to actually spend some time writing the state-saving functionality to support process restart.
Anyway (call it a flame if ya will), but the fact that this even needs asking is disquieting - the level of technical knowledge here gets reduced day after day.
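That state-saving practice can be sketched in a few lines of Python. This is a hypothetical example, not anyone's actual code: the filename sum.ckpt and the toy summing workload are assumptions. The checkpoint is written to a temp file and renamed into place, so a crash mid-write never leaves a corrupt checkpoint behind.

```python
import json, os, tempfile

CKPT = "sum.ckpt"  # assumed checkpoint filename

def save_state(state):
    # Write to a temp file and rename, so a crash mid-write
    # never corrupts the last good checkpoint.
    fd, tmp = tempfile.mkstemp(dir=".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CKPT)

def load_state():
    # Resume from the last checkpoint if one exists.
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            return json.load(f)
    return {"i": 0, "total": 0}

def run(n, every=1000):
    state = load_state()
    while state["i"] < n:
        state["total"] += state["i"]   # the "real work" (a toy sum)
        state["i"] += 1
        if state["i"] % every == 0:
            save_state(state)          # periodic checkpoint
    save_state(state)
    return state["total"]
```

Kill the process at any point and rerun it: at most `every` iterations of work are lost and redone.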
Re:This CAN be trivially done on any un*x i know.. (Score:2)
Re:This CAN be trivially done on any un*x i know.. (Score:3, Informative)
Solaris Suspend & Resume (Score:3, Informative)
The per-process sounds neat, but usable only if you've got a simple critical task you're running. For a more complicated application, multiple processes may be working together, and you'd have to suspend all of them at the same time.
One big question I would have would be file handles... if you restore a process that thinks it owns file handle #5 and some other process is already using it, it would be awkward to get either process to use a different handle.
File Descriptors are per-process (Score:3, Informative)
More important is how do you tell the kernel what file descriptor 5 pointed to? What if the file/pipe doesn't exist any more?
Future of Process Management (Score:3, Interesting)
What's said here is certainly very reasonable. But the extensions of what's being suggested are even more fantastic. Once a process is completely removed from memory, with file handles and storage and status all kept away safely, is there any reason that the process is really tied to that computer? Why wouldn't it be possible to take that 'frozen' process, transfer it to another machine with access to the same filesystem on some level (some translation of file handles would likely be necessary), and thaw it there, allowing someone to move a running process to another machine? Need to replace your web server's only CPU, but don't want downtime? Move the process to a backup machine, replace the original's hardware, and move the process back.
I even thought I had heard that someone was working on just such a project, or at least thinking about the details of implementing it. (I'm just getting started in learning UNIX internals myself). Anybody have more references to information on this sort of thing?
different approach: Savepoints (Score:2, Interesting)
Of course this solution is not as general as the "process cryogenics" you describe, but it's also easier to implement because you have more information about the problem.
Re:different approach: Savepoints (Score:2)
No reason not to (Score:2)
There's no reason why you can't do it either in an app by saving state or in the OS by saving memory to disk as on a laptop.
GEOS had the concept of state-saving in the OS circa 1990, so it's nothing new. The UI saves its state, what apps are running, what windows are open, etc. and restores it exactly as you left it when you restart. If an app has extra data to save, such as where it was in a lengthy computation, it can save it, too.
A slightly different approach than brute-force writing out all of used memory, but both work quite well with the speed of current hard drives.
Checkpoint/restart (Score:3, Interesting)
A friend of mine (Hugh Redelmeier) ran a very long (~400 day) computation on a PDP-11 in the mid-1970s. The program ran stand-alone, and part of the test plan involved flipping the power switch on and off a few times -- very amusing to watch the program keep on running right through power failures. (Main memory on the machine in question was magnetic cores, which are non-volatile.)
Re:Checkpoint/restart (Score:2)
I was peripherally involved in some early efforts to include checkpoint/restart in POSIX with respect to standardizing fault tolerance and high availability features. I was a US DoD employee at the time. The military's interest was to be able (in a semi-portable standard way) to reset to a known good previous state in the case of some arbitrary failure mode in safety critical systems, i.e. flight controls, stores (weapons) management, etc. AFAIK, the POSIX standards efforts never went very far due to many different, sometimes conflicting needs. The more business-oriented high availability people had needs for very similar OS functionality that was markedly different in character from the military's viewpoint. My involvement ended in the early to mid 90's, so my understanding of the situation may be more than a little stale.
VMWare (Score:2, Informative)
Creed
VMWare isn't a solution to a cpu bound process (Score:2)
Granted, it is a solution, but your job that ran in 3 days just got pushed out to a week. It's just a tradeoff.
What the poster really needs is to rewrite the program to drop intermediate data along the way. If you have hourly checkpoints you can minimize the amount of data lost. How to implement checkpoints is left as an exercise for the reader.
Build in persistence yourself. (Score:5, Insightful)
--Blair
P.S. Alternatively, you could write a program to have the rebooted computer pull scrabble tiles from a bag structure and print them to the screen. You might at least get some clue as to whether it was asking the right question.
Re:Build in persistence yourself. (Score:3, Insightful)
1. Multiply ones digits
2. Multiply tens digit by ones digit
3. Multiply previous result by ten
4. Add results from steps 1 & 3
5. Display previous result.
If my program crashes at any point before step 5, I have to start all over. So, I save my intermediate results at step 1, step 2, step 3, and save my final result at step 4. This is checkpointing my intermediate steps.
Your suggestion, on the other hand, is to periodically save the entire system state. This is checkpointing the processes.
I see a need for both types of checkpointing - applications periodically checkpointing data (like the autosave feature in the market-leading word processor) and system-state saves (like the sleep feature of some laptops). Reliability and recoverability should be engineered in at all layers.
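The five numbered steps above could be checkpointed like this. A toy sketch, under stated assumptions: the two-digit-by-one-digit case and the filename mult.ckpt are mine, and each step's result is saved so a crash never loses more than one step of work.

```python
import json, os

CKPT = "mult.ckpt"  # assumed checkpoint filename

def load():
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            return json.load(f)
    return {}

def save(state):
    with open(CKPT, "w") as f:
        json.dump(state, f)

def multiply(a, b):
    # a is two digits, b is one digit; each step is skipped if
    # a previous run already checkpointed its result.
    s = load()
    if "step1" not in s:
        s["step1"] = (a % 10) * b          # 1. multiply ones digits
        save(s)
    if "step2" not in s:
        s["step2"] = (a // 10) * b         # 2. multiply tens digit by ones digit
        save(s)
    if "step3" not in s:
        s["step3"] = s["step2"] * 10       # 3. multiply previous result by ten
        save(s)
    if "step4" not in s:
        s["step4"] = s["step1"] + s["step3"]  # 4. add results from steps 1 & 3
        save(s)
    return s["step4"]                      # 5. display previous result
```

A rerun after a crash at any step replays only the steps that never made it to disk.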
User Control (Score:2, Interesting)
This next one would complicate things a bit: the user should also be able to wake up the process the same way, i.e. kill -WAK $PID. This means that an index of hibernated processes also needs to be kept synchronized between the kernel process tables and a file on disk, to be preserved between reboots.
Maybe I'll write another kernel patch...
Been there, done that (Score:2, Informative)
For long-running processes, rather than shut down the process when the UPS kicks in, I've always found it easier to have the program snapshot its data tables periodically (say every half-hour) and build a "resume from disk" feature into the program. This lets you restart the program from its last check-point even in the event of uncontrolled program termination (e.g. kill -9 and the like).
-JS
The hardware will be a big issue.... (Score:2)
3Com PCMCIA cards are about the only ones I've used that allow the laptop to power them down and back up again, and resume network activity without a complete machine reboot.
Think of VMware as a process wrapper (Score:2, Insightful)
bb
Hibernation comments are missing the point (Score:5, Insightful)
I think the feature to be discussed is Operating System (not BIOS) level support of the hibernation of a single process. It'd be nice if I could do a:
kill -HIBERNATE `cat
and have that program get frozen to disk. Then if I could resurrect just that process later it'd be a handy feature for the long running program that you want to postpone until after you've done whatever you needed to do in single user mode.
Re:Hibernation comments are missing the point (Score:5, Insightful)
#!perl
use strict;
my $pid = $$;
print $pid;
If you stop it between those two $pid commands, there's no guarantee that you're going to get the same pid value back. Programs would have to be specifically programmed to handle this sort of thing (there are other examples, this is just the most basic; network programs particularly would have problems).
Re:Hibernation comments are missing the point (Score:3, Insightful)
That will not be easy (Score:2, Interesting)
If you are using a scanner, or a mouse, or whatever, that device may not be there or may not be available when the process is brought back. Furthermore you may have a file descriptor opened on a local (or network shared) file which no longer exists or has changed drastically.
There are further non-device-dependent problems with shared memory, opened-but-unlinked files, parent PID, IPC resources.
Having said all of the above... I suppose that for the very rare case that your program is completely memory and CPU dependent you could retire and recover a task.
my $0.02
Apple Tried this with OS 9 (Score:3, Interesting)
The idea was that when you put your computer to sleep, instead of keeping the SDRAM (or whatever the laptop had) powered to preserve the memory contents, it would write it all to a special sector on the hard drive that the firmware knew to read from when starting from sleep. This allowed sleep to be even more low-power than it already is, since a hard drive does not require power to retain data.
EPCKPT (Score:5, Informative)
--
This would be useful for more than just blackouts (Score:2)
Yeah, you could "nice" down the process so that it doesn't slow things down while you're logged in... but then system processes at higher priorities might slow down your number crunching when you're not logged in... It'd be best to be able to run it at high priority at night only... ya know, use those unused cycles.
Application level solution. (Score:2, Interesting)
Just because the OS doesn't support it automagically it doesn't mean that you can't solve it for yourself with a little bit of extra work and planning.
Software suspend (Score:2, Informative)
App Specific "Resume" (Score:2)
Long ago and far away (about 15 years ago) I recall that TeX was frequently built in a fashion that required running the binary on some "initialization" information. That process took some nontrivial amount of time back in those days (I'm sure now it would be an eyeblink), and the program could be made to \dump its state in some way.
Then, when you ran TeX in everyday circumstances, the digested initialization file was read in by the application as part of the usual startup process.
I'm probably botching the explanation of how this really worked, but I guess my point is that the "resume" function had to be coded into the specific application.
Windows 2000 and Hibernation (Score:5, Informative)
Once you've enabled it, you create a hibernation file on the C: drive. Hibernation should only take place when there is minimal disk activity (e.g., don't hibernate while trying to save your Word document). The system saves the contents of RAM to the hard drive, and then shuts down. When the machine boots, a flag has been set (I assume) indicating the system should resume from hibernation... so the hibernation file is read from disk and written to RAM and you're back up and running, in less time than it takes to boot. Plus it keeps your uptime from resetting back to zero.
Some things to note:
You will need WHQL-certified drivers, or at least properly-written drivers. I have a SB Audigy and the first drivers I used (the ones on the included CD) caused a blue screen on resume from hibernation. When an updated driver was released, it fixed this issue.
Applications need to be properly-written as well, as there is some sort of Win32 suspend signal that is sent to apps just before the system hibernates, so the app must support this and the resume command when the system is restored.
Hibernation works great on my laptop and on my workstation, and I especially like the fact that I don't need to create a separate partition or install special drivers to make it work (you can even use it on an NTFS formatted drive).
Re:Windows 2000 and Hibernation (Score:2, Funny)
Who would have thought it was possible.
Rule 1 with hibernation: no Creative products.
Re:Windows 2000 and Hibernation (Score:3, Interesting)
APM BIOSes can also do this, but they aren't as standard: Often the implementation details are specific to the hardware. For instance, Phoenix BIOSes (at least as of two years ago, I haven't messed with this stuff much since then) tend to want to put the STD (suspend-to-disk) data in a special file in a Windows partition, while some others (Dell for sure, since I used to work this stuff for them) save this info in a special STD partition (type 84, IIRC) which is a more generic solution, but requires more knowledge when setting up the box. (When was the last time you thought you might need an STD partition when building your box? BTW, they should be at a minimum, PhysicalMemorySize + 1 MB for state info, video register settings, etc.)
Re:Windows 2000 and Hibernation (Score:3, Interesting)
Agreed, and as you go on to explain, and I believe I alluded to in my post, there are many proprietary implementations via the BIOS or DOS drivers, etc.
My point was that Windows 2000 separates the hibernation feature from the BIOS. As far as the BIOS can tell, the system is booting normally... but once the BIOS loads the NTLDR, Windows takes over of course and handles the hibernation. This is why it works so well and does not have all of the "stupid issues" such as custom drivers, partitions, or the like. The end result is not a MS-only function, but the implementation is, as far as I can tell.
Re:Windows 2000 and Hibernation (Score:3, Interesting)
And simply having a WHQL-certified driver doesn't necessarily mean it'll work. I had a Future Domain SCSI controller in my computer that loaded with the default Win2k WHQL driver, but I could never hibernate it. When I swapped it out for an Adaptec 2940UW, I was able to enable Hibernation in my Control Panel settings.
Process-saving is known, but not what you want (Score:4, Informative)
But isn't it overkill for a data-crunching operation? As many other people have noted, it would seem you're much better off checkpointing your data to disk, rather than relying on low-level OS process wizardry.
Sig: What Happened To The Censorware Project (censorware.org) [sethf.com]
Already available for Linux (Score:2, Interesting)
Bad coding? (Score:2)
James
Cryogenic freeze / Hibernation (Score:2, Interesting)
This would also be good for tracking down bugs using the "before and after" technique.
Such a program could be tied into the UPS monitor in such a way as to save everything that couldn't be stopped.
CDC Cyber 205 (Score:5, Interesting)
As usual, this is ancient. Back at FSU, we had a CDC Cyber 205, a vector pipeline supercomputer, back in 1985. Any process could be crashed for a shutdown, and it produced a file that worked exactly like an executable and resumed computation from the time it was crashed.
Re:Yeah, CDC's NOS/BE could do this 25 years ago (Score:3, Insightful)
Because we're hopelessly caught up in trying to reinvent a somewhat limited computing paradigm (unix). No one, except for some CompSci projects that never really go anywhere, has any real interest in making a new operating system that builds on the lessons of all the previous operating systems and includes reasonable features like process checkpointing/suspension.
I'd bet there are patent considerations as well -- maybe many of the good OS features are not reproducible due to existing patents.
How hard could this be to experiment with? (Score:5, Interesting)
I was thinking about this and here was my dirty hacky idea. You need kexec, lobos, or something similar (actually a fairly modified version of it); you'll also need on the order of 8MB of disk space and some kernel mods, which might not be that extensive.
I was thinking we develop some driver or process that consumes all of the memory and CPU in a system. It forces all of the processes to swap out, it would probably need to be a driver of sorts on current linux systems. Then it could dump the kcore out to a file somewhere, sync it, and hibernate. Then when the kernel boots up, if the right arg is passed in it could either load this image back in to ram in place of the kernel and then jump into it (easier said than done) early in the boot (page tables are made long before you have access to the drives and such so the logistics of this would need to be figured out) or it could boot up and use a different swapper partition and then have some kind of tool like kexec to load that image back in to ram and start it up. Or something, some how you should be able to recover the state of the system. File handles and everything would be there.
The harder part would be hardware and network transparency. You'd need to modify all of your drivers to make sure that the hardware could be reset and they could deal with it. I think it's a little easier for the network side because it would be similar to simply unplugging the network cable, you have open sockets that are talking to nothing and some software can deal with that pretty well. There is also some kind of system integrity or robustness piece that is needed, if the system some how changes when you bring your old image back it could break things, munge files, etc..
doesnt SETI@home do this, sorta? (Score:3, Informative)
the seti@home client uses its *.sah files to save the state of a calculation. of course, this is program dependent, not OS dependent. I guess if you have the source files for the program doing the counting.....
STANDALONE CONDOR CHECKPOINTING (Score:5, Informative)
Using the Condor checkpoint library without the remote system call functionality and outside of the Condor system is known as
"standalone" mode checkpointing.
To link in standalone mode, follow the instructions for linking Condor executables, but replace condor_syscall_lib.a with libckpt.a. If you
have installed Condor version 5.62 or above, you can easily link your program for standalone checkpointing using the condor_compile
utility with the little-known "-condor_standalone" option. For example:
condor_compile <compiler> -condor_standalone [options/files....]
where <compiler> is any of cc, f77, gcc, g++, ld, etc. Just enter "condor_compile" by itself to see a usage summary, and/or refer to the condor_compile man page for additional information.
Once your program is relinked with the Condor standalone-checkpointing library (libckpt.a), your program will sport two new command line arguments: "-_condor_ckpt <filename>" and "-_condor_restart <filename>".
If the command line looks like:
exec_name -_condor_ckpt <filename>
then we set up to checkpoint to the given file name.
If the command line looks like:
exec_name -_condor_restart <filename>
then we effect a restart from the given file name.
Any Condor command line options are removed from the head of the command line before main() is called. If we aren't given
instructions on the command line, by default we assume we are an original invocation, and that we should write any checkpoints to the
name by which we were invoked with a "ckpt" extension.
To cause a program to checkpoint and exit, send it a SIGTSTP signal. For example, in C you would add the following line to your code:
kill( getpid(), SIGTSTP );
Note that most Unix shells are configured to send a TSTP signal to the foreground process when the user enters a Ctrl-Z. To cause a program to write a periodic checkpoint (i.e., checkpoint and continue running), send it a SIGUSR2:
kill( getpid(), SIGUSR2 );
In addition to the command-line parameters interface described above, a C interface is also provided for restarting a program from a
checkpoint file. The prototypes are:
void init_image_with_file_name( char *ckpt_name );
void init_image_with_file_descriptor( int fd );
void restart( );
The init_image_with_file_name() and init_image_with_file_descriptor() functions are used to specify the location of the checkpoint file.
Only one of the two must be used. The restart() function causes the process image from the specified file to be read and restored.
Search in the slashdot archives for kernel patches (Score:5, Informative)
Just found it here [kernel.org], it's the 'swsusp' patch.
Java has lightweight persistence... (Score:2, Interesting)
Doesn't matter... (Score:2, Funny)
Solid-state memory (Score:2)
The disadvantages are speed (solid-state memory is getting faster all the time, but it is still slower than volatile RAM), cost, and lack of current standardized implementations (I'm not even sure there are any working implementations).
For some background research in solid-state memory, check out this site [nta.org] (it's a bit old, but still interesting).
It is possible...but it could be messy... (Score:3, Interesting)
I think the better solution is to define a new signal called "SIGFREEZE" and have programs include code that can handle such an event. Let each program figure out how to save its own stuff.
A good example would be a program that was calculating pi. The programmer would have to implement a signal handler that, when it received a SIGFREEZE, would stop the computation and write what it's currently working on out to a file. The other thing the programmer should be doing is periodically writing the data out to a file anyway. Then the programmer should also implement a command line option that would facilitate reloading from a saved state.
That's my take on it...
If you see any problems with it... bring it on.
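Here is a sketch of that scheme under stated assumptions: since SIGFREEZE is hypothetical, SIGUSR1 stands in for it, the workload is a Leibniz-series pi calculation, and pi_state.json is an assumed filename. The handler dumps the work in progress to disk and exits; on startup the program reloads it if present.

```python
import json, os, signal, sys

STATE_FILE = "pi_state.json"       # assumed filename
state = {"terms": 0, "acc": 0.0}   # work in progress

def freeze_handler(signum, frame):
    # On "SIGFREEZE" (SIGUSR1 here), save state and exit.
    with open(STATE_FILE, "w") as f:
        json.dump(state, f)
    sys.exit(0)

signal.signal(signal.SIGUSR1, freeze_handler)

def resume():
    # Reload saved state, if a previous run was frozen.
    global state
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            state = json.load(f)

def step():
    # One term of the Leibniz series: pi/4 = 1 - 1/3 + 1/5 - ...
    k = state["terms"]
    state["acc"] += (-1) ** k / (2 * k + 1)
    state["terms"] += 1
```

In use: send the process SIGUSR1 (kill -USR1 $PID) to freeze it, then restart the program; resume() picks up where it left off.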
Checkpointing? (Score:2)
If memory serves me (hey, it is Friday after all and both brain cells are pretty tired) we looked into something like what the poster was asking about years ago. In those days, we were running some simulations on a PDP-11/70 that took 7-10 days to complete. In the event of a general power failure we wouldn't have been able to run on backup power for very long. DEC's RSX had a feature whereby a task could be checkpointed to disk. Then, presumably, it could be reloaded and resumed at the same state it was in at the time of the checkpoint. We never did implement it since it would have introduced too much delay into the project schedule (adding it to the simulation, testing, etc.) but it sounds like the sort of thing that could be useful in current day OSs. Anyone know of any general purpose operating systems today that have this feature? I haven't heard of any and wonder (not too seriously, mind you) if anyone sells core memory for a PC architecture computer. Of course, it wouldn't be very fast but you'd worry a lot less about power failures that are longer than the UPS's ability to provide power.
Sun Already Does This (Score:4, Interesting)
10 years ago I worked on a Unisys Unix box that did it automatically, meaning you could pull the power out of the wall without any warning and then plug it back in later. When the system rebooted, it would say "there's been a power failure, recovering" and then put all the processes back the way they were before. Even with an open vi session where I was actively typing, I wouldn't lose more than a character or two.
I found out the machine had it quite by accident because my loser boss turned the box off one evening without doing a proper shutdown... Once I saw what it did, this required further testing.
Still, what would be even better is if it could be done on a per-process basis. I can think of many reasons why you might want to suspend a process for a few days and bring it back later (say, something you only wanted to run outside of work hours) but had no intention of shutting the whole box down. And this should be implemented in the kernel, not by hacking each program to provide this functionality.
A case for Python (Score:3, Informative)
Python [python.org] supports a concept that it calls 'pickling' (which is also known as Object Serialization).
It's extremely easy to save the state of any object along with the objects it references to disk with literally a couple of lines of code (like, 3). You cannot pickle whole processes, but it's effortless to write some skeleton code to resume the process from its last pickle. You can also define specific methods in each object that are called on pickle/unpickle for special cases (restoring network connections, for example).
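For instance, a hypothetical Job class (the names here are illustrative, not from any real library) can be saved and restored in a couple of lines, with the __getstate__/__setstate__ hooks handling the unpicklable bits such as a live connection:

```python
import pickle

class Job:
    def __init__(self):
        self.progress = 0
        self.conn = object()    # stand-in for a live network connection

    def __getstate__(self):
        # Called on pickle: drop the live resource from the saved state.
        s = self.__dict__.copy()
        del s["conn"]
        return s

    def __setstate__(self, s):
        # Called on unpickle: restore state, then "reopen" the connection.
        self.__dict__.update(s)
        self.conn = object()

job = Job()
job.progress = 42
with open("job.pkl", "wb") as f:    # the couple of lines to save...
    pickle.dump(job, f)

with open("job.pkl", "rb") as f:    # ...and to restore later
    restored = pickle.load(f)
```

The same pattern scales to arbitrarily nested object graphs, since pickle follows references automatically.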
The fact that it's an interpreted language shouldn't deter you. Python integrates easily with modules compiled from C, allowing you to accelerate time critical aspects of your code while rapidly developing the not so critical aspects.** Python was designed to solve the problems you're working on.
Oh, and if you're short on time, don't worry; Python is extremely easy to learn.
** As most programmers have found, about 90% of their program's execution is spent in 5% of their code.
Re:Use Windows XP (Score:2)
Re:Use Windows XP (Score:2)
My laptop has no problems suspending/hibernating linux.
The question here is about process hibernation, not the whole box.
Darwin/MacOS X (Score:4, Informative)
Re:Use Windows XP (Score:2, Funny)
Ewan
Re:Use Windows XP (Score:4, Redundant)
Re:Really worth the effort? (Score:4, Insightful)
This sounds like common sense to me. You never know when the disk is going to poop, the power shut off, the network reset.
At my old job, we were required to record the status of all jobs that took longer than an hour (on a 6 cpu SGI). They never crashed on their own, but I would usually interrupt them if the requirements changed or whatever. If they ever did crash, then there was a record of exactly where they left off.
Re:Really worth the effort? (Score:3, Informative)
Re:Really worth the effort? (Score:2)
Or is that too sensible?
(and if it's a proprietary package and it can't pick up from where it left off, find a different one).
Re:Really worth the effort? (Score:3, Insightful)
Re:Really worth the effort? (Score:3)
Re:Really worth the effort? (Score:2)
This could be done without doing anything to your BIOS; you could just dump all the memory allocated to a certain program to disk and put that process in a list of hibernating processes. What's so hard about that?
But it gains you nothing (Score:2)
It only is worth it if you expect to have to halt the program more than once. Assuming only one halt and restart, VMware is still slower.
Re-crashing problem (Score:2)