Memory Sealing 'mseal' System Call Merged For Linux 6.10 (phoronix.com)
"Merged this Friday evening into the Linux 6.10 kernel is the new mseal() system call for memory sealing," reports Phoronix:
The mseal system call was led by Jeff Xu of Google's Chrome team. The goal with memory sealing is to also protect the memory mapping itself against modification. The new mseal Linux documentation explains:
"Modern CPUs support memory permissions such as RW and NX bits. The memory permission feature improves security stance on memory corruption bugs, i.e. the attacker can't just write to arbitrary memory and point the code to it, the memory has to be marked with X bit, or else an exception will happen. Memory sealing additionally protects the mapping itself against modifications. This is useful to mitigate memory corruption issues where a corrupted pointer is passed to a memory management system... Memory sealing can automatically be applied by the runtime loader to seal .text and .rodata pages and applications can additionally seal security-critical data at runtime. A similar feature already exists in the XNU kernel with the VM_FLAGS_PERMANENT flag and on OpenBSD with the mimmutable syscall."
The mseal system call is designed to be used by the likes of the GNU C Library "glibc" while loading ELF executables to seal non-writable memory segments or by the Google Chrome web browser and other browsers for protecting security sensitive data structures.
The problem with too many threads (Score:2)
What you want are processes that don't share address space except regions you explicitly share through IPC.
Re: The problem with too many threads (Score:1)
In theory, that is already the case. In practice, you need to modify (add/remove) your allocated memory pages as the application's demands change dynamically, and that information needs to be passed to other processes. This leads to bugs where a faulty (or malicious) program can modify or read memory outside its bounds; typically, if it is an attacker, the goal is to read memory or to elevate themselves to higher privileges.
All this does is basically put what every antivirus out there (should) do (hey something is read/writing out of bounds) in hardware by making the CPU responsible for raising an exception/interrupt whenever this kind of stuff happens.
Re: (Score:3)
All this does is basically put what every antivirus out there (should) do (hey something is read/writing out of bounds) in hardware by making the CPU responsible for raising an exception/interrupt whenever this kind of stuff happens.
No.
There's no hardware about this. This is purely software.
This prevents something else running in your address space from altering the permissions of a page, permanently.
Page permissions themselves were always there, and handled by hardware.
Re: (Score:2)
There are, however, still reasons to do this.
While process isolation is a good solution to not require this, it has a large performance cost.
For things where performance is not terribly sensitive, a standard clone() without CLONE_VM [see: fork] will do the job. For high-bandwidth, low-latency communication between workers, you're going to want that CLONE_VM.
Are we finally at the limits of von Neumann? (Score:2)
Will we go back to Multics protection rings, or even to capability machines (https://en.wikipedia.org/wiki/Capability-based_addressing) like the Intel 432 (https://en.wikipedia.org/wiki/Intel_iAPX_432) and its ilk?
Back in the mid '80s I worked on an Intel/Siemens collaboration to produce a second generation capability architecture, along with operating system, compilers, development environment, etc. The technology in that was well ahead of its time, but the marketing never figured out how to sell it. And
Re: Are we finally at the limits of von Neumann? (Score:1)
The problem with putting everything in hardware is that it becomes rigid and slow, like x86 which is littered with leftovers from bygone eras just so you can boot Windows on a current-gen Intel Xeon Platinum. And Windows is pretty much the only OS still out there that needs it. Intel has seen the writing on the wall and is going the ARM/Power way and booting out old 32-bit crap and Microsoft is following kicking and screaming (eg. eliminating VBscript and attempting to eliminate 32-bit Office plugins, DLL i
Re:uh ok (Score:4, Insightful)
The variable-length instruction set puts a significant wattage cost in the lower-end chips
As does a fixed-length instruction set that forces you to waste bytes you don't need.
This reminds me of the stupid contemporary armchair computer-science experts extolling to us the superiority of RISC, without having enough understanding of it all to even know the label has essentially zero meaning on a superscalar CPU.
There was a time when fixed-width instructions and RISC were winners.
Superscalar CPUs happened, much to the surprise of everyone- and then it ceased to matter.
Re: (Score:3)
Fairly dense fixed- and double-length instructions with the capability of instruction fusing in the pipeline would be a decent compromise. That has been demonstrated on ARMv8 and RISC-V that I know of, and possibly on others. And x86-64 of course also performs instruction fusion and can encode a lot of useful instructions in a few bytes.
Re: (Score:2)
Fairly dense fixed- and double-length instructions with the capability of instruction fusing in the pipeline would be a decent compromise.
Indeed, and as you mention, that pretty much describes Arm and RISC-V.
And x86-64 of course also performs instruction fusion and can encode a lot of useful instructions in a few bytes.
Indeed. I'm fairly certain Intel was the first to do so.
We can debate the virtues of flexible instruction width vs. fixed, but ultimately we can really only speculate, simply because there are too many variables.
What we *can* look at is that there simply isn't a large delta in performance-per-watt right now in the Arm/x86 space. There was, but it's largely been closed. And rapidly.
With RISC-V.... well, th
Re: uh ok (Score:2)
ARM was seemingly too far behind x86, POWER, SPARC, and MIPS to catch up. But in the last 10 years we learned that if you put a lot of transistors on a chip and add 64-bit and some SIMD instructions, then all you need is decent cache and memory controllers for it to work.
I think for RISC-V to make it in the data center there needs to be a subset of required features that server OSes adopt as a base feature set, and for someone to make really big chips. And it will probably come together more qui
Re: (Score:2)
There is exactly 1 Arm manufacturer on the planet that is competitive with Intel and AMD.
And indeed- all it was *ever* going to take was someone to throw enough transistors and cache at the problem.
RISC-V has caught up to older Arm cores, and at a rapid pace. I have no doubt at all that they'll catch up to Intel and AMD as well. This is without any changes to the architecture even- the market for better chips just grows organically from cheap
Re: (Score:3)
Well, the ability to emulate one instruction set architecture (ISA) on another is old hat. I remember running Virtual PC on my Mac almost 30 years ago, and someone telling me that's how they debugged Windows NT installations. It was much quicker to restart Virtual PC than to wait for the physical PC to reboot after a Blue Screen of Death. Now I suspect someone would just run old Windows stuff in a container (running Wine?)
Re: (Score:3)
iAPX432 was interesting, but slow. So slow, in fact, that people figured out it was faster to run the programs on the channel processor(s), AKA the 8086.
That's not a claim that it would have remained slow; the protection mechanisms in x86 have seen a decent speedup over time. Only the latest of those speedups have led to lost correctness and potential for information leakage.
Re: (Score:3)
The Intel 80960, widely used in embedded systems, was a derivative of the 432 without the capability stuff. It's not clear to me the speed problems were related to the capability approach, but I'm not a hardware expert.
Re:Are we finally at the limits of von Neumann? (Score:4, Interesting)
CHERI [cam.ac.uk] is a modern capability architecture/extension that is being worked on a lot by multiple research groups and companies.
Most implementations have been on FPGAs. Codasip has announced that they are working on commercial CHERI/RISC-V chips for safety-critical embedded systems.
There is also an official RISC-V working group working on creating an official memory tagging extension (but I'm unsure what its goals are...)
The most popular architecture with tagged memory could have been the IBM AS/400 ... and PowerPC has a compatibility mode [devever.net] using the ECC bits as tags, for running binary-translated AS/400 code. I think it requires a trusted compiler and hardware code-pointer integrity for it to be as safe as CHERI, though.
A wasted effort (Score:2, Informative)
This new mseal() system call is an okay temporary solution.
Long term, I would prefer for systemd to do my memory allocations for me.
Re: (Score:1)
After that, the sky's the limit.
Any chance this leads to hardware changes? (Score:1)
While at university I was taught that there are (or were) two popular memory architectures for computing. One is a shared memory that puts data and executable code in the same memory space (the von Neumann architecture), which is an efficient use of space/hardware but has penalties for speed and security. The other (the Harvard architecture) keeps executable code in one memory area and data in a different memory area, which has an advantage in speed and security but at the cost of transistors and therefore cost in dollars.
In reality few computers are so trivi
Re: (Score:2)
That wouldn't even need a hardware change on x86-64 systems, just going back to the non-flat model where the CS/DS/SS/ES registers can contain segment selectors instead of being forced to a single segment. That would enable segmented memory again, allowing a second method of controlling memory-access permissions via segments. It'd also allow the stack and heap to grow without running into each other. The capability is there, since the FS and GS segment registers can point to separate segments.
Re: (Score:2)
Why would we need to go back to segmented memory when we have an MMU and page tables?
Re: (Score:2, Informative)
Having separate stacks for code and parameters can help too.
It seems common practice to often pass some parameters on the call stack: https://en.wikipedia.org/wiki/... [wikipedia.org]
This mixing of code and data is bad hygiene:
call stack = code
parameters = data
Separating the stacks will make this harder: https://en.wikipedia.org/wiki/... [wikipedia.org]
It's a lot harder to overwrite the return address on the call stack with your data when that data goes to a different stack (e.g. parameter stack).
It doesn't protect against all exploits bu
Re: (Score:2)
On a per-byte basis, the stack is nearly completely function-local variables (at least for most compiled languages).
Should we have a third stack for that?
I'm not intrinsically against it. There is a performance cost, though: with a separate return-pointer stack, you can basically guarantee it'll never be cached when it's reached.
Same problem applies if you split parameter and local data stacks.
It seems to me that just not usin
Re: (Score:3)
In reality few computers are so trivial that they fit neatly into either category, but for the most part a shared-memory system is used. A system that puts in hardware bits to flag certain areas of memory as executable or not would be some compromise between the two systems I just described. Putting in bits that control for read-only, write-only, or read-write would also add to security. This could be done at the level of an 8-bit byte, a 32-bit (or whichever) "word" in the system, or on banks or pages of memory. As this system of memory protection is described, it sounds like it is implemented at the operating-system level as part of memory management. With some help from hardware, with extra bits in the right places, that would add to the security and simplify the code, while adding what I expect to be a trivial extra cost in transistors and dollars.
You literally just described an MMU.
Yes, that is exactly how it works.
It's done in pages. 4KB usually, for x86, though up to 1GB for huge pages.
Re: (Score:2)
Both architectures are alive. Harvard is mostly used in controllers though. I do think that most of the current security problems come from doing things too cheaply. This is obviously even worse on the software side. The first line of defense is the application code itself.
Obligatory T2 (Score:2)
"Modern CPUs support memory permissions such as RW..."
I couldn't help but think of this [youtu.be] from T2... lol.
386 Protected Memory (Score:1)
Re: (Score:2)
Good question. Probably just too many people trying to do things too cheaply. That will do it.
Re: (Score:2)
User-mode processes do not have direct access to PTEs. They must ask the OS to set permissions on their ranges of memory.
The problem with this, of course, is that if your process is compromised, the code that has managed to execute can simply remove the restrictions you have placed (since the OS cannot distinguish you from it).
This lets you tell the OS that you will be making no more requests for permission changes to a particular range of memory.
Re: (Score:2)
No. You fundamentally misunderstand this.
Today, we're in x86-64 long mode, with 32-bit code still running in protected mode.
What this does, is allow a process to tell the OS that nothing else coming from its address space should be able to use madvise(), mprotect(), etc, to change the permissions on a page.
I.e., it's a tool for a process to protect itself from itself, or more importantly, things that may have been able to execute code in its address space.
Looks like another step in the arms-race (Score:2)
At some point nobody will understand all the protection mechanisms and when to use them anymore and the attackers will win permanently.
Re: (Score:3)
Rather, it limits what OS calls a process is allowed to make after a certain point.
POSIX gives you fun tools like madvise() and mprotect().
You can use them to mark regions of your memory RO, for example.
However, someone nefarious can simply unmark them once they've managed to execute code in your address space.
So, now we have mseal(). This lets you tell the OS you plan to make no further changes to the PTEs
Re: (Score:2)
I do understand what this one does. I am not sure how well it will work or rather how widely it will be applicable and how much it actually hinders an attacker. Well, at least the kernel folks are very careful about introducing new mechanisms like this one.
Re: (Score:2)
I've been one of the voices asking for this for a couple of years now.
Mapping of pages as NX and RO is a critical part of the security of a contemporary process. It's all handled by the loader and linker for you.
However, there's always been the asterisk behind it- in that if something is able to run enough code in your process, it can easily undo those mappings, turning a potentially minor and difficult to use exploit into a full compromise of the process.
Soon, you will see l
Re: (Score:2)
It will work fantastically.
I rather doubt that. I have seen too many claims like that for new mechanisms. The only thing interesting is whether it will be broken this year or take a little longer.
Re: (Score:2)
Broken? How could it possibly be broken? All this does is prevent a process from using syscalls that change the permissions on memory ranges.
Will this stop all other attacks from happening? Of course not- it just closes up a hole that exists right now- which is code being able to mark pages that were R-- to RWX. That will not be "breakable", as you can't do that without the OS anyway (user space programs don't have access to PTEs)
Re: (Score:2)
Let's talk again after a few Black Hat conferences have passed.
Re: (Score:2)
Again, I think you're confused about what this is.
It sets a flag for a memory range in the kernel and says, "if the process asks us to change permissions on this, reply with EPERM."
It's like setting a file to immutable, and then trying to call rename() or chmod() on it. It will fail. Nobody wondered "when are people going to figure out how to bypass the immutable flag on files?!?!?!?!"
Re: (Score:2)
I understand what this change does. I just think you are confused as to what that actually accomplishes. And I predict that somebody will find a way to work around this in the near future, and hence this is only a minor improvement. Sure, some attack patterns will have to evolve. But likely not that much, and attack effort and success chances will not change much. On the minus side, some people could mistake this for a major improvement and write even crappier code.
Re: (Score:2)
I understand what this change does.
That's not 100% clear at this juncture.
I just think you are confused as to what that actually accomplishes.
I know exactly what it accomplishes. It closes a design hole in POSIX, one that has been closed by other operating systems for a while.
And I predict that somebody will find a way to work around this in the near future and hence this is only a minor improvement.
There's no working around it, like I said. It's like adding an immutable flag for a file.
The hole here was a design hole, not a mistake hole.
The hole has been plugged.
Analogously speaking, nobody has managed to "work around" the MMU enforcing R-- perms on a page.
They were just able to utilize the POSIX design hole to change the