How To Use a Terabyte of RAM
Spuddly writes with links to Daniel Phillips' work on the Ramback patch, and Jonathan Corbet's analysis of it up on LWN. The experimental new design for Linux's virtual memory system would turn a large amount of system RAM into a fast RAM disk with automatic sync to magnetic media. We haven't yet reached the point where systems, even high-end boxes, come with a terabyte of installed memory, but perhaps it's not too soon to start thinking about how to handle that much memory.
Memory usage (Score:5, Interesting)
Re:Memory usage (Score:5, Interesting)
Re:You only need 16GB of RAM for this to be useful (Score:3, Interesting)
In all honesty, though, I don't really get the point of this. Isn't the buffer cache already supposed to be doing kind of the same thing, only with a less strict mapping?
What about copy-on-write for executables? (Score:4, Interesting)
Re:You only need 16GB of RAM for this to be useful (Score:3, Interesting)
Not so far off (Score:4, Interesting)
By Moore's Law, we should hit 1TB in a high-end server in 6 years, in high-end desktops (assume 8GB of RAM, currently selling for $180 CAD) in 10.5 years, and in the average midrange desktop (assume 2GB of RAM, currently selling for $45 CAD) in 13.5 years.
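Those figures can be sanity-checked with a few lines of arithmetic, assuming the classic rule-of-thumb that capacity doubles every 18 months (the function name is made up for illustration):

```python
import math

def years_to_reach(current_gb, target_gb, doubling_years=1.5):
    """Years until capacity reaches target, doubling every 18 months."""
    doublings = math.log2(target_gb / current_gb)
    return doublings * doubling_years

# High-end desktop: 8 GB today -> 1 TB is 7 doublings
print(years_to_reach(8, 1024))    # 7 * 1.5 = 10.5 years

# Midrange desktop: 2 GB today -> 1 TB is 9 doublings
print(years_to_reach(2, 1024))    # 9 * 1.5 = 13.5 years

# Hitting 1 TB in a server in 6 years implies ~64 GB installed today
print(years_to_reach(64, 1024))   # 4 * 1.5 = 6.0 years
```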
We might be a while off in consumer applications, but for high-end servers, 6 years doesn't seem very far away.
Video Streaming Server (Score:3, Interesting)
http://www.motorola.com/content.jsp?globalObjectId=7727-10991-10997 [motorola.com]
Sounds like a good use for a terabyte of RAM to me.
Disclosure: I currently work for Motorola, but I don't speak for them, and don't have any involvement with this product beyond salivating over it when it was announced that we were buying BroadBus.
We'll be there soon enough. (Score:3, Interesting)
take it to the next step... (Score:5, Interesting)
Next step beyond that: stop using a filesystem at runtime. Just assume your data can all fit in memory (why not, if you have a terabyte of it?) This simplifies the code and prevents a lot of duplication (why copy from RAM to RAM, just to make the distinction that one part of RAM is a filesystem and another part is the working copy?) But you will need a simple way to serialize the data to disk in case of power-down, and a simple way to restore it. This does not need to be a multi-threaded, online operation: when the system is going down you can cease all operations and just concentrate on doing the archival.
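The shutdown-time archival step described above can be sketched in a few lines, assuming the working set is ordinary in-memory objects (the `state` dict and the `state.bin` filename are invented for illustration):

```python
import os
import pickle
import tempfile

def archive(state, path):
    """Serialize the whole in-memory state to disk at shutdown.
    Write to a temp file, fsync, then rename over the old archive
    so a crash mid-write never destroys the previous copy."""
    d = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=d)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename on POSIX

def restore(path):
    """Reload the archived state at boot; start empty if none exists."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return {}

state = restore("state.bin")
state["boots"] = state.get("boots", 0) + 1
archive(state, "state.bin")
```

As the comment says, this doesn't need to be multi-threaded or online: quiesce everything, dump once, done.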
This assumption changes software design pretty fundamentally. Relational databases for example have historically been all about leaving the data on the disk and yet still fetching query results efficiently, with as little RAM as necessary.
Next step beyond that: system RAM will become non-volatile, and the disk can go away. The serialization code is now used only for making backups across the network.
Now think about how that could obsolete the old Unix paradigm that everything is a file.
Re:You only need 16GB of RAM for this to be useful (Score:5, Interesting)
Re-inventing the disk cache wheel (Score:4, Interesting)
Re:Memory usage (Score:1, Interesting)
Re:1 TB of memory... (Score:4, Interesting)
In the 80s, the overhead of a lisp machine just to make your application customizable was absurd (hence the emacs jokes). Writing an editor all in C was a great idea. Speed! Memory savings! This approach made vi very popular.
Now that it's 2008 and every new computer has a few gigs of RAM, it's not so absurd to write an editor in a dynamic language running on top of a minimal core. An experienced elisp coder can add non-trivial functionality to emacs in just a few hours. emacs makes that easy and enjoyable.
vi(m) may use less memory, but that just doesn't matter anymore. If you want to customize it (non-trivially), you have to hack vim and recompile. So while emacs jokes are hilarious, making them dates you to the early 80s. There is no reason to write tiny apps in assembly anymore. Big apps that can be extended are a much better approach.
Re:1 TB of memory... (Score:5, Interesting)
Re:Cache as RAM, RAM as hard disk (Score:3, Interesting)
Then a power outage wouldn't be an issue. Power comes up, machine PXE boots off a machine in a neighboring town, state, country, whatever.
I know--not really feasible, but you'd be the king of basement dwellers if you could pull it off...
Floating point voxel octree Google Earth (Score:4, Interesting)
Re:1 TB of memory... (Score:2, Interesting)
I don't program much in Lisp, although I have some familiarity with it, but on Linux my editor and my window manager are both written in Lisp: emacs and stumpwm. They work quite well... stumpwm includes an entire lisp interpreter in its binary and comes in at just 33M; you can hit C-t : at any time to evaluate a Common Lisp expression, and of course the window manager can be modified on the fly if you're a leet haxxor.
Re:You only need 16GB of RAM for this to be useful (Score:3, Interesting)
Granted, it doesn't run Linux (or if it does, it's kept hidden from the user). But with these awesome specifications, I have to wonder why they don't just sell general-purpose computers -- people would port Linux to them, and they'd clean up! Is there something special about their processors that makes them good at running Java, or what?
Re:You only need 16GB of RAM for this to be useful (Score:5, Interesting)
Here's a question... if you actually had a system with 1TB of RAM, wouldn't you like to see a lot of your hard drive contents loaded into RAM in the background? You have the RAM to store it, and you know it can be discarded at any time because it's just cache memory, not committed memory. I mean, you've gone to all the trouble and cost of getting yourself that much RAM... do you ONLY want to make use of it all on the rare occasion you need to edit a 500-megapixel picture in Photoshop? Do you want your RAM to sit idle the rest of the time, and have your hard drive grind away?
Speed vs tmpfs? (Score:5, Interesting)
How it seems to work:
Actual "ramdisk" -- that is, like /dev/rd -- that is, appears as a block device. You can run whatever filesystem you want on it, but it's still serializing and writing out to... well, RAM, in this case. No sane way for the kernel to free space on that "disk" that's not actually used.
How I wish it worked:
No Linux that I know of has used an actual ramdisk in forever. Instead, we use tmpfs -- a filesystem which actually grows or shrinks to our needs, up to an optional configurable maximum size. It'll use swap if available/needed. It's basically a RAM filesystem, instead of a RAM disk.
Even initrds are dead now -- we use initramfs. Basically, instead of the kernel booting and reading a ramdisk image directly to /dev/rd0, it instead boots and unpacks a cpio archive (like a tarball, but different/better/worse) into a tmpfs filesystem, and uses that.
So, how I would like this to work is, use a tmpfs filesystem -- as I suspect it will be faster, and in any case simpler, than a ramdisk -- and back it to a real filesystem on-disk. The only challenge here is that it's not as deterministic -- it would be more like a cp than a dd.
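The "more like a cp than a dd" point can be made concrete: backing a tmpfs to disk means walking a file tree and copying what changed, not streaming a block image. A minimal sketch, with hypothetical mount points:

```python
import os
import shutil

def sync_tree(src, dst):
    """Copy files from a tmpfs mount to its disk backing store,
    skipping files whose on-disk copy is already up to date.
    File-level, not block-level: a cp, not a dd."""
    for root, dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target = os.path.join(dst, rel)
        os.makedirs(target, exist_ok=True)
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(target, name)
            # copy2 preserves mtime, so this comparison stays valid
            if not os.path.exists(d) or os.path.getmtime(s) > os.path.getmtime(d):
                shutil.copy2(s, d)

# e.g. sync_tree("/mnt/ramfs", "/var/backing")  # hypothetical paths
```

This is also where the non-determinism the comment mentions comes from: the walk sees files mid-change, unlike a single atomic block dump.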
An even better (crazier) idea:
Use a filesystem like XFS or Reiser4 -- something which delays allocation until a flush. In either case, it would take a bit of tweaking -- you want to make sure no writes, or fsyncs, block while writing to disk, so long as the power is on -- but you'll hopefully already be caching an obscene amount anyway, so reads will be fast.
In this case, forcing everything out to disk could be as simple as "mount / -o remount,sync" -- or something similar -- forcing an immediate sync, and all future writes to be synchronous.
Conclusion:
Either of the two ideas I suggested should work, and could perform better than a traditional ramdisk. If it is, in fact, a simple disk-backed ramdisk (not ram filesystem), then it's both not as flexible (what if your app suddenly wants 50 gigs of RAM in application space?) and a bit of a hack -- probably a hack around traditional disk-backed filesystems not being able to take advantage of so much RAM by themselves.
In fact, glancing back at TFA, it seems there are some inherent reliability concerns, too:
Now, true, this should never happen, but in the event it does, the inherent problem here is that the ramdisk doesn't know anything about the filesystem, and so it doesn't know in what order it should be writing stuff to disk. Ext3 journaling makes NO sense for a ramdisk when the ramdisk itself knows nothing about the journal -- the journal is just going to slow down the RAM-based operation. Compare this to a sync call to XFS -- individual files might be corrupted, but all the writes will be journaled in some way, so at least the filesystem structure will be intact.
This gets even better with something like Reiser4's (vaporware) transaction API. If the application can define a transaction at the filesystem level, then this consistent-dump-to-disk will happen at the application level, too. Which means that while it would certainly suck to have a UPS fail, it wouldn't be much worse than the same happening to a non-ramdisk device, at least as far as consistency goes. (Some data will be lost, no way around that, but at least this way, some data will be fine.)
DSOrganize (Score:2, Interesting)
Re:Memory usage (Score:3, Interesting)
I can get a 64-bit mobo and a 64-bit proc, and still have problems finding one that can take more than 8 gigs of RAM.
I want to load up my games into a RAM disk and play them from there. I did it in the bad ol'/good ol' days. I want to put a 2-hour movie entirely in RAM. I want 100+ gigabytes of RAM, damn it. I've been stuck at 4 gigs for years. Enough already.
also, I want a pony.
Some of us do have access to 1TB or more of RAM (Score:5, Interesting)
All RAM is used as cache anyway. When an application allocates some RAM, it's in lieu of directly manipulating the permanent (disk) storage because it's horribly horribly slow. That's really an operating system failure. Network file systems, disk, RAM should all be completely transparent, the OS should abstract all that away and allow application programmers to handle it simply as storage.
Re:Some of us do have access to 1TB or more of RAM (Score:4, Interesting)
I have a gaming rig I custom-built 5 or 6 years ago with some very sweet OCZ RAM with 2-2-2-2 timings, but now when I was wish-listing a new gaming PC, the best RAM I could find was 3-4-4-15 timings. That's ALMOST HALF THE SPEED, which means it's going to hit those 'unable to fetch RAM for the CPU' stalls TWICE AS OFTEN, with horrendous results... And it's getting worse: DDR3 RAM is all running at 5-5-5-15 timings stock, and mind you, 4-4-4-15 is the normal variety of DDR2 'fast' RAM -- that was again OCZ overclocked RAM...
With multi-core processors this is only going to get worse. With a dual-processor rig, to truly keep both processors from missing cache you realistically need 1-1-1-10 RAM, and they KEEP MAKING THINGS WORSE by bumping up the amount of 'burst' data the RAM can put out, instead of how FAST the RAM can be accessed and reloaded!
Really, with such pathetic timings, a dual core is realistically going to spend about 20% of its cycles 'waiting on RAM' if it needs randomly accessed memory that can't be 'burst read'. A lot of applications need random access: databases, server farms, complex 3-D video game graphics... The reason 512MB graphics cards cost so much is that they all need REALLY FAST random-access memory, way faster than 'stock' DDR3... and the reason frame rates don't scale well with more processor pipelines is that those cards keep missing strokes because the system wasn't able to load the memory in time for the processor to work on it...
I can't think of a single mainstream computing need to 'burst' more GB/second instead of improving latency, yet the crazy computer scientists keep making it worse by engineering for 'burst' mode rather than latency.
It almost makes one want to use normal DDR1 RAM, with its sweet 2-2-2-2 timings, instead of DDR2...
Databases have a better answer (Score:2, Interesting)
I really don't like the idea that this is racing with the UPS. When the battery gets old, its ability to hold a charge drops, and a timeout that was sufficient in the old days won't be anymore. I've also had situations where the battery was supposed to tide you over till the generator kicked in, but the system was never tested for the generator failing at exactly the wrong moment.
I'm pretty sure the answer to this is a simple generation number on the blocks so you can use a database checkpoint scheme.
1) Every write to the ramdisk brands that block with an ever increasing number (transaction number).
2) When you initiate a checkpoint the driver finds all blocks that have changed since the last checkpoint and writes them to a "physical log", followed by the checkpoint marker.
3) The same blocks are then written to the actual disk area; n.b. application writes to these blocks must be diverted in the meantime.
4) The "physical log" is cleared.
5) Block diversions from (3) are cleared.
Using this well known scheme the disk is always either in a consistent state or easy to get there.
Note the "diversions" may mean that clean blocks must be discarded from the ramdisk/cache to prevent the applications being blocked by the checkpoint.
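Steps 1-5 above can be sketched as a toy in-memory model, assuming a block-array ramdisk (class and field names are invented, and the log and disk area are stand-in lists rather than real devices):

```python
class CheckpointedRamdisk:
    def __init__(self, nblocks):
        self.blocks = [b""] * nblocks
        self.gen = [0] * nblocks       # per-block transaction number
        self.txn = 0                   # ever-increasing counter
        self.checkpoint_txn = 0        # txn at the last checkpoint
        self.physical_log = []         # stand-in for the on-disk log
        self.disk = [b""] * nblocks    # stand-in for the real disk area

    def write(self, idx, data):
        # (1) every write brands the block with the next transaction number
        self.txn += 1
        self.blocks[idx] = data
        self.gen[idx] = self.txn

    def checkpoint(self):
        # (2) log every block changed since the last checkpoint,
        #     then append the checkpoint marker
        dirty = [i for i in range(len(self.blocks))
                 if self.gen[i] > self.checkpoint_txn]
        self.physical_log = [(i, self.blocks[i]) for i in dirty]
        self.physical_log.append(("CHECKPOINT", self.txn))
        # (3) write the same blocks to the actual disk area
        for i, data in self.physical_log[:-1]:
            self.disk[i] = data
        # (4, 5) clear the log; a real driver would also drop the diversions
        self.physical_log = []
        self.checkpoint_txn = self.txn

rd = CheckpointedRamdisk(4)
rd.write(0, b"hello")
rd.write(2, b"world")
rd.checkpoint()   # disk now holds blocks 0 and 2
```

If a crash happens mid-checkpoint, the marker never lands in the log, so recovery can discard the partial log and keep the previous consistent disk image -- which is what makes the disk "always either in a consistent state or easy to get there."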
If you want the system to be able to 'roll forward' after a crash, you need a transaction log where updates are written out as they happen. Because the log is linear, it writes at the maximum transfer rate of the disk; but it's still limited in performance.
This also looks a lot like doing a backup from a volume snapshot.
Re:You only need 16GB of RAM for this to be useful (Score:2, Interesting)