Will New Object Storage Protocol Mean the End For POSIX? (enterprisestorageforum.com)
"POSIX has been the standard file system interface for Unix-based systems (which includes Linux) since its launch more than 30 years ago," writes Enterprise Storage Forum, noting the POSIX-compliant Lustre file system "powers most supercomputers."
Now Slashdot reader storagedude writes: POSIX has scalability and performance limitations that will become increasingly important in data-intensive applications like deep learning, but until now it has retained one key advantage over the infinitely scalable object storage: the ability to process data in memory. That advantage is now gone with the new mmap_obj() function, which paves the way for object storage to become the preferred approach to Big Data applications.
POSIX features like statefulness, prescriptive metadata, and strong consistency "become a performance bottleneck as I/O requests multiply and data scale..." claims the article.
"The mmap_obj() developers note that one piece of work still needs to be done: there needs to be a munmap_obj() function to release data from the user space, similar to the POSIX function."
Re:we've all been cyber-crippled. (Score:4, Interesting)
What I want to know is how the fsck they're still posting anonymously when Slashdot disabled it on D1, D2, and mobile. Guess they removed the checkbox from the frontend but didn't actually disable it on the backend. Typical clueless n00b dev shit.
Otherwise, I'd guess it's the Slashdot editors who have been posting Naz1 swastikas and these spam rants.
Lameness filter: these witless fucks blocked the word Naz1.
Re: (Score:2)
You'd think by 2020 even new devs would've learned not to trust the client, since that's been the root cause of 90% of the exploits we see these days.
Of course, it's also possible the back-end is such a rat's nest of convoluted code that they're afraid to touch it.
Re:we've all been cyber-crippled. (Score:4, Interesting)
The last person I worked for couldn't code their way out of a paper bag. They mandated that everything be done on the frontend at all costs, because they couldn't grasp their own backend. Then they wondered how they kept getting "hacked".
Well, you didn't disable the shit, you just put a piece of tape over it... dumbass.
This person also used upwards of *200* external javascript modules in a single page load, because they refused to write their own code. I mean, to the point where, instead of using the fucking math operators of the language, they found a third party package that wrapped each one in a function, "add()", "subtract()", and so on.
That job was miserable, because they were also convinced that they were God's gift to programming. Very thankful to have gotten out.
Re: (Score:1)
Sounds pretty Dunning-Kruger to me... assuming the add() and subtract() weren't for an arbitrary-precision math library that was actually needed or something.
JavaScript bitwise operators are nuts, though. Since the only numeric type is 64-bit floating point (which has a 53-bit significand, of course), bitwise ops convert the floating-point arguments to signed 32-bit integers, perform the operation, and then convert the result back to 64-bit floating point.
I implemented a rudimentary Excel file g
Re: (Score:2)
Sounds pretty Dunning-Kruger to me... assuming the add() and subtract() weren't for an arbitrary-precision math library that was actually needed
You don't understand enough about how code is written, that's why you got stuck on top of Mt Stupid. And accused somebody else of it. (They always do...)
Here is the detail you missed: They don't even write JS anymore. They're writing code in some other language that gets fed through a code generator that outputs JS. It isn't optimized in such a way that it can reduce itself to native functions, instead they wrap everything so that they can apply the object semantics from the source language, instead of havi
Re: (Score:1)
I only learned enough JavaScript to build a small personal project to my own satisfaction, and I only used vanilla JS.
I guess you're talking about things like React or TypeScript, which, yes, are very popular. The way I read GP, it sounded like they were complaining about someone pulling in a vanilla JS library that wrapped the math operators in functions. If GP was merely complaining about the use of a framework, that puts their post in quite a different light.
Re: (Score:2)
The last person I worked for couldn't code their way out of a paper bag.
The irony is that this is precisely why all the stuff the AC is whining about is true. I quit web development because I got fed up with the egos and complete lack of discipline.
Re: (Score:2)
They probably hired somebody off a scripting site to write it, even though they've got lots of people on here that have experience with their codebase. LOL
Re: (Score:2)
I read the article and... (Score:5, Insightful)
Re:I read the article and... (Score:5, Insightful)
The article is quite confusing. I think what this is saying is that they built a function() that allows you to emulate mmap on a cloud store, like S3.
And TFA has nonsensical gems like this:
Using memory mapping to copy object data into the device means that all the data is temporarily stored and processed on the device rather than in POSIX.
Having used mmap() many times over the years, as well as the more usual read/write APIs, I simply have to ask, "What?"
Re: (Score:2)
Re: (Score:3)
So does you sentence mean anything? Or not mean anything?? Of what do my sentences mean or not??? God I am so confused!!!
Back to reality: these semantic paradoxes really shook up the foundations of mathematics about 100 years ago, and trying to solve them directly led to the development of computer science.
Re: (Score:2)
So does you sentence mean anything? Or not mean anything??
yes.
Re:I read the article and... (Score:5, Insightful)
What they've done here is create an mmap()-like function, which can map a remotely stored "file/object" into local memory.
This, however, implies that you now have an HTTP client, a JSON parser, and god knows how many libraries in the kernel. What could go wrong with that....
Re: (Score:2)
What need is there for an mmap-like function when mmap() itself works on NBDs (Network Block Devices)? You can even use NBDs as swap.
Since NBD works similarly to FUSE (only much simpler, of course), you can implement all the HTTP code in userland, and since you don't have to support all the intricacies of the HTTP protocol (the only thing you need to support is fetching ranges), all of that could be done very simply, right now.
I'm no Linux historian, but NBDs have probably been supported for at least two decades.
Re: (Score:2)
Someone unfamiliar with a subject is trying to write an article about it.
What they've done here is create an mmap()-like function, which can map a remotely stored "file/object" into local memory.
This, however, implies that you now have an HTTP client, a JSON parser, and god knows how many libraries in the kernel.
That seems a longer jump even than an underpants gnome could make.
It has nothing to do with any of that, it is like switching from local storage with hand-coded metadata to in-memory ProtocolBuffers or something.
And even in the awful article, it explains that the advantage of mmap() is that it doesn't go through the kernel, and that this adds an mmap_obj() that also doesn't go through the kernel, to replace the POSIX code, which does. So you've got that backwards.
But this is already what we do in embe
Re: (Score:2)
rather than in POSIX.
I don't even understand how a compatibility standard is a place!
Re: (Score:3)
"The article is quite confusing. "
You read the article? With that uid?
Wait until your dad finds out you're using his account.
Re: (Score:2)
I thought it was all the kids with uids > 3M that didn’t read the articles ever???
--
I lost access to this account for over 10 years starting in 2007. Every day, every hour, every minute, john the ripper worked its ass off. I knew the password was within 8 alphanumerics plus a “-” and “@” and started with a specific number. 2,176,782,336 possibilities.
I dutifully attempted to crack my slashdot.org password, every 15 seconds, for years. Their support team never ever responded.
On
Re: (Score:2)
What no F'n fail2ban ??
Bloody idiots!
Re: I read the article and... (Score:2)
I guess this explains why so many low UID accounts seem like they have turned into troll accounts.
Re: (Score:2)
What? no ... that's not the problem.... (Score:4, Interesting)
Re: (Score:3)
I get the feeling that stuff like this is "The old is bad, let's throw it out completely and build something new and completely different".
Then you'll discover that you only create discontent among everyone that's going to use it.
Breaking backwards compatibility is one of the worst things you can do in a system because it kills well-working old solutions and ways of working with little or no benefit.
Re:What? no ... that's not the problem.... (Score:4, Insightful)
And in return, every read becomes a write, and you lose all the parallelism of read-mostly workloads. Nope, atime's crazy. Relatime is a good hack, but better would be throwing that misfeature away.
Similarly I can go on about POSIX locking, fcntl(..., F_[GS]ETLK(W)?, ...) vs. flock. fcntl ties the lock to the process and drops it if you close *any* descriptor for the file... but supports range locks. flock ties the lock to the open file description, so if you fork while you hold the lock, your child shares the lock... but doesn't support range locks. fcntl lock calls *set the lock state* instead of taking a lock... so if you lock [0-10] and [5-15] and then unlock [5-10], you have [0-4] and [11-15] locked... don't lose your state. And so on...
rename? Rename we should keep. And hard links while we're at it.
Re: What? no ... that's not the problem.... (Score:1)
We all set noatime on our block device mounts, right? mtime is good enough for me.
Re: (Score:2)
While we're at it, can we please have: Unicode filenames; a rule that filenames cannot start with a hyphen; and shell filename globbing that always prefixes a directory component (one of './', '../' or '/')?
Um ... (Score:3)
No.
Great News! (Score:5, Funny)
From TFA:
The need for a POSIX interface could be bypassed altogether with object storage by using a REST interface for applications.
For many years I've been wishing that they'd replace the bloated, slow and hard-to-understand POSIX API with a simple, streamlined, high-performance interface like REST.
The only downside I see is having to spend dozens of hours in meetings deliberating over which calls should be "POST" vs "PUT". But nevertheless, that will be well worth it for this upgrade!
Re: (Score:2)
In 2.0 we're expecting an upgrade. There will be no new functionality, but the REST API will be like this: cp: {"originalFileName": "path", "copiedFileName": "path"}.
The new way is more correct, and anyone who doesn't "cling to the old" will have no problem spending two days rewriting parts of their code for the update.
Re: Great News! (Score:2)
Can you see how insanely verbose and cumbersome this is?
And I mean "insane" as in literally mentally insane.
A plain text parser?? UTF-8 at the bottom. Basically a compiler at the top. Tons of escaping and variant data types in the middle. Data & CPU waste level: Over 9000.
If you absolutely need variable length fields, at least use binary markup! You can still have the editor translate binary numeric tokens to plain text tokens back and forth, using a simple map file. Unicode and ASCII/ANSI already do th
Re: (Score:3)
Re: Great News! (Score:5, Insightful)
It's all fun and games until someone thinks you are serious and inflicts this sort of monstrosity on people.
Re: Great News! (Score:2)
Re: (Score:2)
Re: (Score:1)
From TFA:
The need for a POSIX interface could be bypassed altogether with object storage by using a REST interface for applications.
For many years I've been wishing that they'd replace the bloated, slow and hard-to-understand POSIX API with a simple, streamlined, high-performance interface like REST.
The only downside I see is having to spend dozens of hours in meetings deliberating over which calls should be "POST" vs "PUT". But nevertheless, that will be well worth it for this upgrade!
Use REST with RUST. That'll lead to DECAY.... right?
Thanks, I'm here all week.
Hype, hype and more hype. (Score:5, Informative)
What they are talking about is adding a function that would allow proper utilization of object storage. Honestly, this is like saying epoll would be the end of POSIX. Frankly, if they standardized how object storage worked then they could even get it into a future version of POSIX.
Everything about this article is hype, even if object storage is a major component of what Big Data uses in the future.
Re: (Score:2)
Object storage makes me think of serialisation. I am currently working on a binary serialisation format and API that can represent lists, tuples, records, dictionaries, etc. Think of it as a binary JSON, with the emphasis on access speed. That is what object storage means to me: structured data on disk. I may be way off here, so I won't mind being called an idiot, well not much, anyway.
Probably not (Score:2)
I mean, the API will still exist, if only for the massive amount of legacy code that expects it to exist. Much like Win32 isn't going anywhere any time soon.
Re: (Score:2)
I agree with you. It's actually a pretty useful thing too; it's handle-based and C-friendly. The idea of a message loop and event-driven programming is also very useful.
Didn't we go through this once before? (Score:2)
At one time object-oriented databases were all the rage - destined to make SQL databases obsolete. Where are they now?
Object storage? Snake oil, methinks.
Re: (Score:2)
Betteridge (Score:5, Informative)
No.
POSIX has been the standard file system
POSIX isn't a file system. POSIX is also a lot more than the file I/O spec. Perhaps an object storage spec will be added to POSIX. It's been done for DBMS systems already.
Next step: Whose object model shall we adopt? Let the competition begin. I'll get the popcorn.
Nothing to see here, likely GPT-3 content (Score:5, Insightful)
The TFA was written by a marketing bot or human drone and contains many nuggets of wisdom such as:
"POSIX has been the standard file system interface for Unix-based systems (which includes Linux) since its launch more than 30 years ago. Its usefulness in processing data in the user address space, or memory, has given POSIX-compliant file systems and storage a commanding presence in applications like deep learning that require significant data processing"
"POSIX has its limits, though and features like statefulness, prescriptive metadata, and strong consistency become a performance bottleneck as I/O requests multiply and data scales, limiting the scalability of POSIX-compliant systems. That's often an issue in deep learning[...]"
"Object storage is the most scalable of the three forms of storage (file and block are the others) because it allows enormous amounts of data in any form to be stored and accessed. "
"Using memory mapping to copy object data into the device means that all the data is temporarily stored and processed on the device rather than in POSIX."
"The SSD or other external device has much more available space for computing. The external device (a form of secondary storage for that computer) connects directly to the computer system and the CPU has a path to the data in the device: it is available almost as main memory while attached. Memory stays in the SSD during computing, and actively accessing the data—particularly the metadata—becomes much faster."
"Network computing power and speed will skyrocket. Though this may have its limitations - transferring data in file and block storage to object storage, for one - it will mean new developments for data-intensive computing."
Re: (Score:3)
It really is just gibberish. What the fuck would an object-oriented filesystem be other than some vast linked list with oodles of metadata. This is how the Presentation Manager worked on top of HPFS on OS/2, so that you could use inheritance to make special kinds of files and folders.
Re: (Score:3)
It really is just gibberish. What the fuck would an object-oriented filesystem be other than some vast linked list with oodles of metadata. This is how the Presentation Manager worked on top of HPFS on OS/2, so that you could use inheritance to make special kinds of files and folders.
Hell, I feel like classic MacOS system software did this better with the Resource Manager and empty data forks by the '80s.
Re: (Score:3)
I've been told /dev/null is webscale, but I don't know if it supports sharding...
Object store is kind of a cult. They have some points that are frequently valid and may justify a 'POSIX-lite' where some POSIX guarantees that are expensive could be relaxed, but in general in a local context an object store model doesn't generally outdo a POSIX filesystem. POSIX over remote data stores is where things get messy, and why over-the-network software sometimes benefits by skipping POSIX guarantees to get some perf
Re: (Score:3)
Possibly by GPT-3 or similar. Expect more of this kind of nonsense in the future.
https://www.theguardian.com/co... [theguardian.com]
What's more concerning is that it slipped past the editors (not too surprising, though, since they are probably also bots) and that people here are discussing the content at face value.
The article is spam. Slashdotters (at least those who are neither bots nor Russian trolls) should know better.
why isn't systemd doing this? (Score:2)
Re: (Score:2)
In all seriousness - when I read the summary, my first thought was "is this a new project of Poettering's?"
Re: (Score:1)
Maybe this new system will be Poettering all the way down.
Don't give him any ideas! (Score:3)
Why TF would you invoke that name? He's gonna see your post and *do it*.
Next it'll replace files in systemd, so all systemd systems will have to use that shit.
File::read is dead! (Score:2)
Kids re-inventing things, badly, yet again. (Score:3)
A file system is a database is an object store is a network is a graph is a structured binary file is a whatever.
It's all just different interfaces optimized for different use cases.
And humble files are not going away anytime soon.
Also, seriously, look up what "POSIX" actually is. Because I doubt you really know.
Doubtful POSIX file semantics will go away (Score:3)
First problem: map the object into what representation in memory? C++ has a different in-memory representation than Ruby, which in turn differs from JavaScript. In fact it probably varies depending on which flavor of the language you're using, not just the language. And some parts of the representation that, for instance, tie the object to the code needed to implement its class can't really be represented in the storage representation, because they aren't known until an application goes to access the object. POSIX file-access functions may go away for some application programmers who're working in a specific language within a specific framework and with a specific object-storage system implemented for that language and framework, but the fundamental calls to deal with physical storage will still be there, and the only question will be how many layers of the stack exist between the application programmer and the physical storage access.
What will make for a game-changer is a new method of physical storage that follows different rules from address-based random-access storage (eg. content-addressable memory). Developers have, with the rise of fast hard drives (or devices that look/act like hard drives), forgotten the fun of dealing with different kinds of physical storage (eg. ones that have to be physically accessed sequentially, you can access the next or previous bit but you can't jump around except by repetitively scanning across each bit in the desired direction in turn).
Re: (Score:2)
I've been told by a few people I find credible that the problem is generally not with the POSIX calls people are accustomed to making, but with the guarantees the filesystem must comply with, particularly in a remote cached context. An 'object store' approach is simpler to implement in a fast way than NFS.
Of course, 'object store' is a bit vague and devoid of standardization, and generally only useful to a human after another layer of software has abstracted it somehow, so it's a bit silly to imagine
Re: (Score:2)
Those semantics are there to guarantee that the filesystem behaves the way people expect it to. The problems if you relax those constraints are the same problems you get in relational databases if you relax the rules surrounding transactions or eliminate transactions entirely. We've already seen the results when we started implementing RESTful services in front of databases: SQL record locking became impossible because the Read and Update operations had to be in separate transactions, so someone had to come
DOS 1.0 also did not use hierarchical file storage (Score:2)
I'm not sure what they are saying in this article, but it seems like they want to get rid of all these complicated subdirectories and filesystems and write data directly to storage.
DOS 1.0 also did not have a hierarchy, because there were no subdirectories. File names were 11 characters long. That was it. No fancy schmancy redundant metadata. They had drive letters, though.
Re: (Score:2)
Neither did the C64 or TRS-80. Back to the basics!
POSIX ... (Score:2)
... lalalalalLALALlal!
Don't spoil it I haven't seen "Piece of Shit 8" yet!
TFA is the term referring to the Author, yes? (Score:3)
What TF is this about? End of POSIX? In one particular use case, perhaps maybe possibly, create a new API... but this is an eFFing generalization that is worthy of Fox News!
Now can someone explain if there is even the tiniest beam of reason here? How is mmap() different from mmapobj()? (No, I do not have a weekend to spend trying to understand mmapobj -> NVMeOF -> RDMA.) I suspect it is about resource discovery, not the actual write/read verbs, perhaps?
I see it as a simple mmap() that accesses non-local "files"? (file = "an entity consisting of a sequence of bytes" with random/block access capability, perhaps?). I know I am missing something here, so help me please.
I actually <SHUDDER> RTFA (Score:3)
To borrow a turn of phrase, it's not even wrong.
I have no idea what they think is so magical about object storage. Inodes are objects.
As for the whole thing about fabrics somehow magically working, they don't. Sure, you can share memory over a fabric, but the overhead tends to eat you alive. There are fantastically expensive systems that can do so without terrible performance, but even then, they tend to be fragile and nowhere near as fast as local memory. Certainly that has nothing to do with the end of POSIX, most experiments in that direction take place on POSIX systems.
As for the rest, I guess they've never heard of the AS/400?!?
mmapobj() vs mmap() ? (Score:2)
Is this about how quickly one can mmap an object (an object still being a "file" == sequence of bytes) without having to go through a POSIX "file system"?
Why would it not be possible by creating a specialized file system that does not have the semantics of directories? (Say every file is identified by some unique key, without any semantics to the key value.) So how is mmapobj() different from using mmap() with such a file system? (I plead the Fifth in terms of understanding the POSIX overhead of open() and other APIs.)
What is the
Reinventing the wheels (Score:2)
The ideological obsession with Unix "everything is a file", and the more general "everything is one" simply breaks down in contact with the real world.
Crappy article, but maybe useful in-between (Score:5, Interesting)
- The problem associated with the POSIX file system model is that Unix file systems such as Btrfs, ZFS, Ext4, XFS, etc. are translation layers to store information on block devices. There are definitely many technologies in these different file systems that add resiliency and even performance; for example, ZFS has RAID, write logs, and read caches. XFS has excellent hashing for integrity and also has a write log that can be stored on low-latency devices (though it's quite limited). And for all that, on the multi-exabyte storage systems I'm working on in high performance computing for scientific processing, we tend to simply place XFS on top of RAID for massive storage. It's reliable and it's safe. In the core of the HPC, though, we would never consider this.
In high performance computing, we tend to have massive near-line storage systems, as our data sets are... well, huge. The project I'm working on generates 2TB of data per second, 24/7, for decades at a time. We then have to keep that data online and accessible for 25 years (the current mandate; it looks like we'll be getting a grant for another 25). So we go for massive and cheap.
XFS and RAID are not a great option for this, but using technologies like dCache, we get geo-replication on top of pretty much anything. In fact, it allows us to scale pretty well between disk and tape. At the moment, we have at least several petabytes of hard disk storage in about 100 countries, and often much, much more in online tape carousels.
RAID is quite terrible for performance, since RAID-6 writes (and we always use RAID-6) are painfully expensive and resyncing after a single disk failure can easily take weeks. So we tend to waste a huge amount of space by making smaller RAIDs... typically only 10-14 disks each. In my current little project, I have 22x13-drive RAIDs. It takes 2-3 days to resync a disk.
- For online storage (rather than the nearline mentioned above), we generally have petabytes of RAM. Using tools like Slurm, which is an HPC job scheduler, data is copied from disk (almost always spinning disk) into RAM, or onto SSD if the data set exceeds a few petabytes. In biological computing, it's not uncommon for a single job to need 10 or more petabytes of storage. Currently this SSD is often 12Gb/s SAS connected, though we're seeing a lot more use of Intel Optane or similar technologies coming in between. If you're attempting to compare the characteristics of a single strand of DNA against a few hundred other strands, the original strand is typically stored in RAM and the other strands are loaded as segments across nodes.
- High performance computers, and this may come as a shock... tend to make use of high performance technologies. As such, we use Infiniband for clustering. We are investigating RDMA over Converged Ethernet at this time; it is attractive since Ethernet tends to be a little bit ahead of Infiniband in bandwidth... generally at a high cost in latency. Converged Ethernet is actually a ruinous disaster based on 802.3 flow control with an 802.1q class of service to "prioritize it". The problem is that even with the best Ethernet switches on earth, the generally flawed design of Ethernet almost always requires store-and-forward packet forwarding. Infiniband is almost always cut-through... Ethernet tended to get around this by overprovisioning, but it's still not a very good solution. Infiniband is probably going to be around for a while... even though it's much more expensive.
- This brings us to file systems on the HPC nodes themselves. The author of the article seems to have gotten very confused. He was under the impression that NVMe over Fabrics would be a good solution. It is a truly awful solution in HPC. First of all, we use structured data in high performance computing. This can be a file, as you'd find on a file system like Lustre, or it could be an object, as he represents it.
- Also, NVMe over Fabric is a fabulously stupid design as the NVMe proto
Re:Crappy article, may be "placed" (Score:3)
Besides being crappy, it also looks rather like someone created it out of thin air...
The citation points to an Enterprise Storage Forum (ie, Eweek) article, which eventually points to the Prior Art Database, at https://priorart.ip.com/IPCOM/... [ip.com]
A search of Google and Google scholar for "User-level low-latency access via memory semantics to objects in
Re: (Score:1)
Can you get in touch? (Score:1)
Hi LostMyBeaver,
Given everything you say, you sound like you're working in the same industry and locale as me.
I'm trying to better understand the use cases involving fast object storage access in scientific research and HPC, specifically in genetics.
I'm not at the same scale compared to the numbers you're using but is there any chance you would be willing to get in touch to talk?
---
2783b3d7-9aef-4f15-804f-7de056ef4204@anonaddy.me
Forget POSIX, file sizes should be 255 bytes. (Score:1)
Blockchain? (Score:1)
But does it have blockchain?
If there is no blockchain then it is doomed.
dumb article (Score:2)
POSIX has outlasted other object stores. (Score:2)
Object stores have been the next big thing for 25 years or so. They'll get there, but to say that they'll obsolete the POSIX file-system interface indicates a gross misunderstanding of what people use filesystems for. Kind of comparable to saying that bitcoin will obsolete credit cards, or that iPhone will obsolete automobiles.