Reaching Beyond Two-Terabyte Filesystems 173
Jeremy Andrews writes: "Peter Chubb posted a patch to the lkml, with which he's now managed to mount a 15 terabyte file (using JFS and the loopback device). Without the patch, Peter explains, "Linux is limited to 2TB filesystems even on 64-bit systems, because there are various places where the block offset on disc is assigned to unsigned or int 32-bit variables."
Peter works on the Gelato project in Australia. His efforts include cleaning up Linux's large filesystem support, removing 32-bit filesystem limitations. When I asked him about the new 64-bit filesystem limits, he offered a comprehensive answer and this interesting link. The full thread can be found here on KernelTrap.
Reaching beyond terabytes, beyond pentabytes, on into exabytes. I feel this sudden discontent with my meager 60 gigabyte hard drive..."
Testing (Score:3, Interesting)
Re:Testing (Score:2)
# This creates a "sparse file" of 16 terabytes.
# It will not test all attributes of file creation,
# as the blocks on disk are not actually written,
# but it will fail on modern Linux boxes. Now,
# the question of whether Perl is 64-bit clean,
# down to the seek(2) call is interesting....
$tmpf = "ohmyyourabigoneaintcha";
open(TESTFILE, ">$tmpf") or die "open: $!";
# seek takes (handle, position, whence): seek 16 TB from the start
seek(TESTFILE, (1024**4) * 16, 0) or die "seek: $!";
print TESTFILE "x";   # write one byte so the file actually has a size
close(TESTFILE);
print "Test file ($tmpf) is ", -s $tmpf, " bytes\n";
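The same sparse-file probe can be sketched in Python, which uses 64-bit file offsets internally. This is an illustration at a smaller (4 GB) scale, and assumes the underlying filesystem supports sparse files:

```python
import os
import tempfile

# Seek past the 32-bit boundary and write one byte; on a filesystem with
# sparse-file support only a single block is actually allocated.
offset = (1 << 32) + 5            # just past 4 GB, unrepresentable in 32 bits
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.seek(offset)
    f.write(b"x")
    path = f.name

size = os.path.getsize(path)               # apparent size: offset + 1 bytes
allocated = os.stat(path).st_blocks * 512  # bytes actually on disk
print(f"apparent {size} bytes, allocated {allocated} bytes")
os.remove(path)
```

The gap between the apparent size and the allocated size is exactly what makes this a cheap test: the 4 GB of "data" before the final byte is a hole, never written to disk.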
Re:Testing (Score:1)
Re:Testing (Score:1)
Since NTFS support under Linux is pretty shoddy, maybe it's time to get serious here and switch to Windows 2000. Recall that NTFS theoretically has NO maximum file size. [pcguide.com]
On the other hand, if you are doing your calculations using Linux-proprietary software, you could mount the Win2k storage array as a samba volume under Linux, and store your data over, say, gigabit ethernet. Another solution is to write proxy software that sits between the program and the actual filesystem. The data would appear contiguous in a "virtual filesystem", which would actually consist of multiple files in the real file system.
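That "in-between filesystem" idea can be sketched in a few lines (all names and the chunk size here are made up for illustration): a wrapper exposes one big logical byte range but stores it as many small chunk files, each safely under any per-file limit.

```python
import os

class ChunkedFile:
    """One large logical file stored as many small chunk files on disk."""

    def __init__(self, directory, chunk_size=2 ** 30):  # 1 GB chunks
        self.directory = directory
        self.chunk_size = chunk_size
        os.makedirs(directory, exist_ok=True)

    def _path(self, index):
        return os.path.join(self.directory, "chunk.%06d" % index)

    def write(self, offset, data):
        while data:
            index, within = divmod(offset, self.chunk_size)
            n = min(len(data), self.chunk_size - within)
            path = self._path(index)
            mode = "r+b" if os.path.exists(path) else "wb"
            with open(path, mode) as f:
                f.seek(within)
                f.write(data[:n])
            offset, data = offset + n, data[n:]

    def read(self, offset, length):
        out = bytearray()
        while length > 0:
            index, within = divmod(offset, self.chunk_size)
            n = min(length, self.chunk_size - within)
            try:
                with open(self._path(index), "rb") as f:
                    f.seek(within)
                    piece = f.read(n)
            except FileNotFoundError:
                piece = b""
            out += piece.ljust(n, b"\0")   # holes read back as zeros
            offset, length = offset + n, length - n
        return bytes(out)
```

With an 8-byte chunk size, for example, a write spanning offsets 5..15 lands partly in chunk 0 and partly in chunk 1, and reads back transparently.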
Since this software is pretty new, I don't know if I'd trust it with any Terabyte-sized files right now.
To see a real-world example of huge amounts of data, visit Microsoft TerraServer. [msn.com] From the site:
"All the imagery and meta-data displayed on the TerraServer web site is stored in Microsoft SQL Server databases. The TerraServer image data is partitioned across three SQL Server 2000 1.5 TB databases. USGS aerial imagery is partitioned across two 1.5 TB databases. The USGS topographical data is stored in a single 1.5 TB database. Each database server runs on a separate, active server in the four-node Windows 2000 Datacenter Server cluster..." (Let mySQL try THAT...)
"Microsoft TerraServer runs exclusively on Compaq servers and storage arrays. Compaq Corporation donated the 4 Compaq ProLiant 8500 database servers. The disk storage equipment, 13.5 TB in total, was donated by the StorageWorks division of Compaq Corporation. The web servers are eight Compaq ProLiant DL360, "1u" processors."
See... Bill DOES know where you live!
Re:Testing (Score:2)
1.5 TB < 2 TB
Re:Testing (Score:1)
Re:Testing (Score:2)
Actually, big organizations with lots of money are much more likely to use a free OS for custom implementations, because it's a lot more reliable and faster (and cheaper, but they have tons of cash, so that's not the deciding point) to modify Linux/BSD than to hope some other corp will put out an OS that works for you.
Re:Testing (Score:2)
Did you notice that IDC did not release numbers this year for the first time?
Probably because Windows is no longer number 1.
And Windows was only dominating "old" niches like print servers and file servers. In every market that is younger than 10 years, Windows' presence is pretty weak.
Brain Contents (Score:3, Interesting)
Aside from all sorts of quantum fiddly bit problems, I wonder just how long it will be before we can store the state of every neuron in a brain (doesn't have to be human, at least not at first) on a hard drive.
Of course, then what would you do with it?
Re:Brain Contents (Score:1)
Seriously, I'm wondering how exactly they "estimate" that.
Re:Brain Contents (Score:2)
I don't think the fact that the brain is not digital should prevent equivalency efforts. Music is not initially digital either, but we still know the issues in translating. Generally, the same issues apply: how "accurate" do you want the representation? For example, does the neuron "firing threshold" value need to be stored at double precision? Maybe one byte is enough. Do we need to store the activation curve for each one, or can each cell be tagged into a "group" that supplies a sufficient activation approximation formula? That we don't really know.
How accurate the representation needs to be is still hotly debated. We can do things to our brain like drink wine or coffee, which alter its state a bit, and kill some cells, yet it does not crash (at least not stay crashed). Thus, it does seem to have fairly high tolerances, meaning that super-detailed emulation is probably not necessary for a practical representation.
Re:Brain Contents (Score:2)
Second, the estimate goes up along with our notion of what big storage is. I don't think any serious person would ascribe any sort of byte value. But if you think about it, we don't have that much capacity in our minds; our data storage is extremely lossy, and we have very good minds that derive a likely past state from very few details.
Re:Brain Contents (Score:1)
BTW How do you back up a exo-byte, with an Iomega exo-drive?
What's up with Constellation 3D and those other guys who are developing TB-capable disks? Anyone get it to market yet or soon? C3D said on their website that they had an HD-TV recorder working and displayed it at a trade show, but I haven't seen anything about it yet.
Re:Why, make HAL of course! (Score:1)
Re:Brain Contents (Score:2)
I like to think of hypnotism as some sort of "Debug Mode" that allows direct access to the lower levels of your brain that people have trouble accessing normally.
Re:Brain Contents (Score:2)
Understanding the mind is an iffy proposition. In a way it's like measurement of sub-atomic particles: to define and measure your subject you must contextualize your own relationship to it, so you have no choice but to define meanings arbitrarily, and nobody is satisfied that what you're doing is scientific. You can use language to describe the mind, but the mind is a concept that only has meaning within language and at the same time is the source of your language skills, so it has a tendency to infinitely recede from definition and lead to endless bickering over petty details that do little to clarify the object at hand.
But simply recording enough sensory data to replay the sum total of one's sensory experiences is totally doable.
I sat down and looked at how much DivX 640 by 352 compressed video you'd get on a terabyte, and it was like five or six months of non-stop video and audio data at fairly decent quality. Given that blue-laser DVD is supposed to be hitting a terabyte per disc already, it looks like recording the sights and sounds of one's entire life, albeit at compromised quality, will easily become doable, if not commonly done, in our lifetimes.
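As a rough check of the five-or-six-months figure (the ~70 KB/s bitrate here is an assumption; the post doesn't state one):

```python
bitrate = 70 * 1024                 # bytes/sec, assumed for DivX 640x352
terabyte = 2 ** 40
months = terabyte / bitrate / (30 * 24 * 3600)
print(f"about {months:.1f} months of video per terabyte")
```

At that rate one binary terabyte holds just under six months of continuous video and audio, consistent with the estimate above.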
Pentabytes? (Score:4, Funny)
Re:Pentabytes? (Score:2)
Re:Pentabytes? (Score:1)
What's greater than Exabyte ? (Score:2)
When things go larger and larger, I get confused.
Okay, I know what's Exabyte, but what are the-still-larger ones ?
Re:What's greater than Exabyte ? (Score:2)
That buys you 6 more orders of magnitude...good enough for government work!
Re:mod this up... what the hell is a peNtabyte?? (Score:1, Funny)
Hmmmm (Score:1)
No, it doesn't.
He's got a lot of work (Score:1)
General Kernel stuff
Fix all kernel warnings
All kernel warnings? That's almost like being a fire-fighter in hell..
OS X does this for some time now. (Score:2, Flamebait)
It's not very surprising that Linux is lacking these features. It's more hobbyist-style and still contains some serious design failures, like missing the microkernel Mac OS X has had for some time now.
Many people here at Slashdot bitch at the academic/professional world, but with examples like this you see that professional, thoughtful design always pays off in time.
Re:OS X does this for some time now. (Score:2, Interesting)
Yes, it's hobbyist-based. Yes, it's great that FreeBSD supports it. Honourable! But Linux has had more important features to implement before this - because only a very few people have had access to disks like these.
However, 2 TB is not that much - and it's about time Linux supports it.
Re:OS X does this for some time now. (Score:1)
As far as what you actually said, I think we have a chicken-and-egg fallacy here that actually seems to limit the scope of Linux. You say that 0% of Linux's intended users need 60 TB (or >2 TB). But that's just it -- as long as Linux doesn't support 60 TB files, none of the people who need 60 TB files will use Linux. Who is doing the intending here? Is there some group that decides what are "intended" markets for Linux? No, I see people applauded all the time for using Linux in random and completely unintended uses, and it is amazing how many different ways Linux can be used.
So what are you trying to say anyway -- that it is ok that Linux isn't as good as FreeBSD/OS X because anybody who uses Linux is not going to be worried about big-time stuff anyway? Yuk.
I think this is a great patch -- it fixes a problem that didn't need to be there and that prevented Linux from entering a fairly important niche. This opens up another group of "intended" users.
Re:OS X does this for some time now. (Score:1)
On the other hand, approximately 0% of Linux's intended users need 60 TB at this time.
Probably because 100% of users who need 60 TB at this time see that Linux can't do it, and decide to use something else.
Re:OS X does this for some time now. (Score:1)
Missing? You're going to have to be a bit more clear; it is a bit like saying that a car is clearly defective because it doesn't use the type of engine you like.
Re:OS X does this for some time now. (Score:1)
Re:OS X does this for some time now. (Score:2, Funny)
Really? Who did you take it from?
- A.P.
Re:OS X does this for some time now. (Score:2)
- A.P.
Re:OS X does this for some time now. (Score:2)
And what do you mean by "automatical"? Overall, I think your post probably has more propaganda than real experience behind it.
-Paul Komarek
Speed is not all that counts (Score:1)
Microkernels may be slightly slower by nature than monolithic kernels like Linux, but the difference is rapidly becoming a nonissue with increasing processor speeds and better kernel designs.
In the meantime, microkernels are allowing for a host of new and useful features that monolithics just can't do: user-mounted file systems, increased security at the kernel level, dramatically increased ease and speed of development of kernel-level components, the ability to load entire separate operating systems interfacing with the same or separate hardware with no external software... to name a few.
Eventually speed will no longer be considered a primary goal; in fact, it is slowly but surely becoming trivial. Microkernels will win out if monolithic/Linux advocates can only use the speed argument to try to show the superiority of their kernel.
Some Disk Array (Score:1)
Now, last time I looked, the biggest common HD was a 180 GB Seagate Barracuda, so they would still need nearly 100 of these babies to get to 15 TB, costing well over $100,000, and that's before you get to the power/housing/cooling nightmare.
Or do they have some fancy way to store bits using thin air that the rest of us don't know of?
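A quick check of the drive count, using the post's own 180 GB figure:

```python
target_gb = 15 * 1024                # 15 TB expressed in binary GB
drive_gb = 180                       # one Seagate Barracuda, per the post
drives = -(-target_gb // drive_gb)   # ceiling division
print(f"{drives} drives needed")     # 86, i.e. "nearly 100"
```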
Re:Some Disk Array (Score:2)
Re:Some Disk Array (Score:1)
Re:Some Disk Array (Score:1)
Of course, they probably actually did it with a real file, but there's no reason it couldn't work this way.
Re:Some Disk Array (Score:1)
Now if I can just get the 760-MPX chipset to stop locking up every time the system boots, I'll be happy and finally post benchmarks.
-TheDarAve-
Re:Some Disk Array (Score:2, Informative)
The next round of storage servers that I buy will probably be even bigger, and it'd be nice to be able to use them as one big partition. Pity that I'll have to wait for 2.6 for that.
Re:Some Disk Array (Score:1)
pentabytes? (Score:5, Funny)
Re:pentabytes? (Score:2)
Re:pentabytes? (Score:2)
--me
Re:pentabytes? (Score:2)
No, it's the symbol used for protection when computers summon the minion of The Gates and The Balmer: Clippit!
Peter Chubb... (Score:1)
Wow! (Score:5, Funny)
Keep it up guys - until they create some sort of 'Linux kernel mailing list' the Slashdot front page is my only source for this information.
Re:Wow! (Score:2)
I suppose you suggest everybody wade through 250 mails/day to find the interesting ones? The logical extension of your argument is that no news sites are needed because people can do their own research.
xfs for linux (Score:5, Informative)
2^63 = 9 x 10^18 = 9 exabytes
check out the feature list. [sgi.com]
Re:xfs for linux (Score:1)
Re:xfs for linux (Score:2)
arithmetic? (Score:3, Informative)
For those who wish to communicate with the rest of the world, the following calculations actually make sense:
For the uninitiated, these terms are described here [cofc.edu]
Even accounting for your typographical error, 2^63 != 9 * 10^18 (9223372036854775808 != 9000000000000000000)
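Both points are easy to verify: 2^63 is about 9.22 x 10^18, which rounds to "9 exabytes" in decimal units but is exactly 8 EiB in binary units:

```python
max_offset = 2 ** 63                 # largest signed 64-bit byte count
print(max_offset)                    # 9223372036854775808
print(max_offset / 10 ** 18)         # ~9.22 decimal exabytes
print(max_offset // 2 ** 60)         # exactly 8 binary exabytes (EiB)
```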
Re: -bibytes (Score:2)
Intelligent people have no problem with the idea that a kilobyte has 1,024 characters. Hard drive manufacturers always have, but they are hardly paragons worthy of emulation.
Stamp out the kibibyte nonsense now, before it gets any further.
Re:xfs for linux (Score:2, Insightful)
This is exactly the problem that was addressed by the patch referenced in the story.
Do you really need it? (Score:2, Interesting)
The programmers didn't imagine that in a couple of years the base would be so big that it wouldn't fit on any available HDD.
Maybe it will be a lesson for some people who misuse file system features?
Re:Do you really need it? (Score:1)
Now I won't have to.. (Score:1)
Looks like only two filesystems work today (Score:1)
Only two filesystems, XFS and JFS, seem to really work with hard disks larger than 2 TB.
Files that big (Score:3, Interesting)
but I worry about other data types.
For example, I grumble at the MS stupidity of putting all data files into one large container file in a database under Access in Windows. Which is why I never use it. I prefer discrete files. If one gets hosed, then it is easier to fix.
Obviously a database that big would run into other performance issues as well, some of which are handled by Moore's law, and some of which aren't.
For similar reasons, I tend to divide my drive into various partitions, regardless of which OS I use.
Re:Files that big (Score:1)
Nooooo! another 150 spam emails and the database will corrupt!
Re:Files that big (Score:2)
Personally, I'd love to see MS use its Jet (Access) database for their next version of Windows - they'd lose all their market share in five days tops.
Actually (Score:1)
when a terabyte is not a terabyte (Score:1, Informative)
As you may know if you've been following recent IEC [www.iec.ch] and IEEE [ieee.org] standards (or if you've ever bothered to figure out exactly how large a terabyte is), what disk manufacturers call a terabyte and what this article calls a terabyte differ slightly.
When used in the standard way [nist.gov], the "tera" prefix means 1 * 10^12, so a terabyte would be 1 000 000 000 000 bytes. Unfortunately, computer systems don't use base 10 ("decimal"), they use base 2 ("binary"). When trying to express computer storage capacities, somebody noticed that the SI [www.bipm.fr] prefixes [www.bipm.fr] kilo, mega, giga, tera, and so on (meaning 10^3, 10^6, 10^9, 10^12, and so on) were close to 2^10, 2^20, 2^30, 2^40, and so on, and began using the same prefixes for both. In that binary sense a "terabyte" is 2^40 = 1 099 511 627 776 bytes, which is how this article uses the word.
This discrepancy causes some confusion [pels.org]. For instance, if you could afford to purchase such a 2 terabyte hard disk, you might well be annoyed when your system tells you your disk is almost 200 gigabytes (2 * (2^40 - 10^12)) smaller than you thought it would be (most systems would report a 2 terabyte disk as a 1.8 terabyte disk).
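The discrepancy in the paragraph above can be computed directly:

```python
decimal_tb = 10 ** 12                # "terabyte" as disk vendors use it
binary_tb = 2 ** 40                  # "terabyte" as this article uses it
labeled = 2 * decimal_tb             # a disk sold as "2 TB"
shortfall = 2 * (binary_tb - decimal_tb)
print(shortfall / 10 ** 9)           # ~199: "almost 200 gigabytes" smaller
print(labeled / binary_tb)           # ~1.82: reported as a "1.8 TB" disk
```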
The moral of the story is one of:
Interestingly the Slashdot community seems to think [slashdot.org] it should be a combination of 1 and 2.
Re:when a terabyte is not a terabyte (Score:2)
The solution is to label hard drives in accordance with the rest of computer technology. A kilobyte is 1,024 bytes, not 1,000. The kibibyte does not exist!
fsck times (Score:4, Funny)
Danny.
Re:fsck times (Score:2)
I prefer the former myself.
Re:fsck times (Score:2, Funny)
Wow, fsck used to mean fsck, and not....uh...ahem
You know you read /. (and Penny Arcade) too much when you read that and think about Gabe putting a harddrive down his pants.
What about NFS File systems (Score:1)
Truman Show (Score:2)
You could fit movies of everything anyone's ever seen on a Beowulf cluster of these filesystems!
Re:Truman Show (Score:2)
I forget the specifics, but let's say 30 years x an average of 400 cameras (the number grew as the show got larger) x good-quality DivX (0.3 MB/sec).
That works out to roughly 114 PB for the Truman Show.
I'm sure there is a lot of cruft that isn't required -- like 8 hours a night, among other things, and you only actually need a few of the cameras recording at any given time -- but it's still a staggering amount of storage for decent viewing quality.
I also left out any additional channels dedicated to describing it, extra sound channels for announcers, etc.
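Redoing that arithmetic with the stated figures (30 years, 400 cameras, 0.3 MB/s per camera) gives a total in the hundreds of petabytes:

```python
seconds = 30 * 365 * 24 * 3600       # 30 years of continuous recording
cameras = 400
rate = 0.3e6                         # 0.3 MB/s per camera, as assumed above
total = seconds * cameras * rate     # total bytes
print(f"{total / 1e15:.0f} PB")      # ~114 PB
```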
Re:Truman Show (Score:1)
That's 13.822174 GB per day. Let's say Truman lives until he's 80. That's 402501.70688 GB of film, which is what? 393.068073 TB? And that's just assuming one camera.
Re:Truman Show (Score:1)
No problem (Score:2, Funny)
Problem solved: Use lzip [sourceforge.net]
MBA Managers won't notice ;-)
For the hardcore, we can build lzip into the FS. So we'll have Reiserfs, ext2, ext3, JFS, and lzipFS. Heck lzipFS might be faster than RAM!
Re:No problem (Score:2)
tunelzipfs -c [compression %]
Re:No problem (Score:3, Funny)
In other words, when you try to save a file to lzipFS it might as well return "yeah" immediately. You tell lzipFS to fsync() and it'll return "yeah" immediately.
class lzipFS {
    /* what we save will be lossy anyway, so what's the point? */
    .....
    long int fsync() {
        // cache->doflush();
        return YEEEEAH_FSYNC_SUCCESSFUL;
    }
    .....
};
Re:No problem (Score:2)
Maybe Peter Chubb is wrong. (Score:2, Informative)
From the Linux Kernel mailinglist on the status of XFS merge into 2.5:
I know it's been discussed to death, but I am making a formal request to you to include XFS in the main kernel. We (The Sloan Digital Sky Survey) and many, many other groups here at Fermilab would be very happy to have this in the main tree. Currently the SDSS has ~20TB of XFS filesystems, most of which is in our 14 fileservers and database machines. The D-Zero experiment has ~140 desktops running XFS and several XFS fileservers. We've been using it since it was released, and have found it to be very reliable.
Uh, so Peter Chubb says there is a 2 TB limit, but these science guys at Fermilab are using Linux with 20 TB of XFS filesystems on the SGI XFS port.
Re:Maybe Peter Chubb is wrong. (Score:1)
-Aaron
So, what about the maximum filesize? (Score:2)
Not an impossible amount of data (Score:2, Informative)
1) Where would you store it?
Well, you could store it in a holographic Tapestry drive [inphase-technologies.com]. The prototype, just unveiled a few months ago, stores 100GB in a removable disk, and that is nowhere near the max density of the technology. In their section on projections for the tech, they say that a floppy-sized disk should hold about 1TB in a couple years. Impressive.
2) What would you do with it?
Well, other than high-definition video or scientific experiments, nothing on your own PC, unless you are making a database of all the MP3s ever made or backing up the Library of Congress. But on a file server, you could easily use this much space. The 2TB limit will probably never affect most home users (realizes he will be quoted as an idiot in 10 years when 50TB HDs are standard). On the other hand, Tapestry will probably be useful in portable devices, especially video cameras.
Filesystem on tape (Score:3, Funny)
Woohoo! A filesystem on a tape drive, that's what I need.
Re:Filesystem on tape (Score:2)
I had a 2 GB DAT streamer working under Win98 with some special program that presented the tape as just another drive letter. It was really cool, except for the latency.
Re:Filesystem on tape (Score:1)
Swappable parts, divorce from OS (Score:2)
The "native" disk storage could be used as a kind of cache. The "big fat" storage would be like a *service* that could be local or remote. The OS would not care. It simply makes an API call to the "storage service".
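A minimal sketch of what that "storage service" API might look like (all names here are hypothetical; a real one would add authentication, error handling, and a network transport):

```python
from abc import ABC, abstractmethod

class StorageService(ABC):
    """Hypothetical interface: the OS sees only this, not the disks behind it."""

    @abstractmethod
    def write(self, key, offset, data): ...

    @abstractmethod
    def read(self, key, offset, length): ...

class InMemoryStorage(StorageService):
    """Toy local backend; a remote backend would speak the same interface."""

    def __init__(self):
        self.objects = {}

    def write(self, key, offset, data):
        buf = bytearray(self.objects.get(key, b""))
        if len(buf) < offset + len(data):
            buf.extend(b"\0" * (offset + len(data) - len(buf)))
        buf[offset:offset + len(data)] = data
        self.objects[key] = bytes(buf)

    def read(self, key, offset, length):
        return self.objects.get(key, b"")[offset:offset + length]
```

The caller only ever sees read/write on a key; whether the backend is a local disk cache or a remote array is invisible to it, which is the point of the divorce from the OS.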
Re:Swappable parts, divorce from OS (Score:1)
But the storage device needs to run on something. It needs to have an IP stack, a network card driver, filesystem support, etc., and so it needs an OS.
Re:Swappable parts, divorce from OS (Score:2)
Maybe a "controller" of some sort. I was thinking that any networking would be handled by a "manager OS", but not by the controller itself. The manager OS would not be using its own file system. IOW, the manager OS might still have to be local to the controller. However, if you have a direct connection between the disk system and the application's OS, then you would not need a separate manager OS for networking.
I suppose there are a lot of different ways to partition it all. My point is that a big file/disk system can exist independently of the OS, so that even Windows 3.1 could access huge amounts of storage without having it built into the OS.
Trademark infringement (Score:3, Interesting)
Couldn't it weaken the trademark to have Western Digital or Seagate making a '9 exabyte' hard drive? Or HP or Sony making an 'exabyte-class' tape drive? Wouldn't a judge find (in favor of Exabyte) that the consumer would easily be confused?
*The USPTO are idiots.*
Re:Trademark infringement (Score:2)
64 bit = rest, finally (Score:1)
The good news is, once we move to a 64-bit processor, that's it. We'll correct the code one more time and that's the end of it, since 64-bit ints are sufficient for any imaginable program.
Re:64 bit = rest, finally (Score:1)
Re:64 bit = rest, finally (Score:2)
since 64 bit ints are sufficient for any imaginable program.
That's just like saying, no one would ever need more than 640K.
Time? (Score:1)
When's that going to be fixed?
Re:NTFS (Score:1)
I'm sorry, but I have a hard time seeing where any programmer would even get near the source code. Especially with the amount of bulk in it, it'd be like Microsoft open-sourcing Windows. No one would touch it to update it because it's way too stinkin' big for anyone to dig through in any sort of timely manner without getting serious eyestrain or going insane.
I'll probably get modded down for this, but I'll byte:
"Being that there are 8 megs of space reserved for Windows use that are unmounted upon boot and are never really viewable unless you know the OS call. At the fest they explained they used this space to optimize the boot time on XP."
Are they storing that somewhere in the MFT by chance?
-TheDarAve-
Re:NTFS (Score:1, Offtopic)
NTFS file system limits are a lot more real world and not equivalent to JFS's.
JFS was designed to handle up to 2 Petabytes but is limited to where it currently is (on Linux and OS/2 Warp).
Don't confuse MS design limits with MS real world released version actual limits.
what is an exabyte? (list of prefixes beyond gigs) (Score:1)
"9 exabytes" big, which is roughly 1,000,000 terabytes
Very roughly, perhaps. 9 exabytes is actually 9,000,000 terabytes.
For those that haven't got hard disks this big, here's a list [firmware.com] of names for sizes beyond megabytes and gigabytes.
Re:humm, just a simple question... (Score:1)
Re:humm, just a simple question... (Score:2)
I'm sure nuclear simulations, or any natural simulation (like weather) will create massive datasets too.
Re:humm, just a simple question... (Score:1)
Re:humm, just a simple question... (Score:2)
You want at least a 20 km grid resolution (actually, you want rather better than that, but we have real-world constraints in my business :-) -- that means something like a 200x200x25 grid. So (at 4 bytes/number) one 3D state variable occupies 4 MB. For air pollution, you will need about 60 such variables (12 meteorology state variables, and another 48 or so chemistry variables): 240 MB per time step.
The summer air pollution season is about 100 days long, and you'll want to use a time step of half an hour or better (by the Courant condition, wind speed gives the translation factor between spatial resolution and the required temporal resolution), so that's 4800 time steps.
240 MB/step times 4800 steps -- about a terabyte.
Go to a (better) 10KM resolution, and the compute time and data set size go up by 2^4. fwiw.
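The arithmetic in this comment can be tallied in a few lines:

```python
grid_points = 200 * 200 * 25          # 20 km resolution over the domain
var_bytes = grid_points * 4           # one 3D state variable, 4-byte numbers
step_bytes = var_bytes * 60           # 12 meteorology + ~48 chemistry variables
steps = 100 * 48                      # 100-day season, half-hour time steps
total = step_bytes * steps
print(step_bytes / 1e6)               # 240.0 MB per time step
print(total / 1e12)                   # ~1.15 TB for the season
```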
Re:Here's how we could get around it... (Score:1)