
Memory Sealing 'mseal' System Call Merged For Linux 6.10 (phoronix.com)

"Merged this Friday evening into the Linux 6.10 kernel is the new mseal() system call for memory sealing," reports Phoronix: The mseal system call was led by Jeff Xu of Google's Chrome team. The goal with memory sealing is to also protect the memory mapping itself against modification. The new mseal Linux documentation explains:

"Modern CPUs support memory permissions such as RW and NX bits. The memory permission feature improves security stance on memory corruption bugs, i.e. the attacker can't just write to arbitrary memory and point the code to it, the memory has to be marked with X bit, or else an exception will happen. Memory sealing additionally protects the mapping itself against modifications. This is useful to mitigate memory corruption issues where a corrupted pointer is passed to a memory management system... Memory sealing can automatically be applied by the runtime loader to seal .text and .rodata pages and applications can additionally seal security-critical data at runtime. A similar feature already exists in the XNU kernel with the VM_FLAGS_PERMANENT flag and on OpenBSD with the mimmutable syscall."

The mseal system call is designed to be used by the likes of the GNU C Library "glibc" while loading ELF executables to seal non-writable memory segments or by the Google Chrome web browser and other browsers for protecting security-sensitive data structures.
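A minimal sketch of what sealing looks like from user space, assuming a 64-bit Linux 6.10+ kernel. It invokes the system call through syscall() rather than a libc wrapper, and falls back to syscall number 462 when the installed headers do not define __NR_mseal (an assumption to verify for your architecture). The function name seal_demo() is hypothetical. It makes a page read-only, seals it, and then checks that mprotect() and munmap() on that page are refused with EPERM.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: headers older than Linux 6.10 lack __NR_mseal; 462 is the
   number mseal received in the common syscall table. Verify per arch. */
#ifndef __NR_mseal
#define __NR_mseal 462
#endif

/* Raw wrapper for mseal(addr, len, flags); flags must currently be 0. */
static int sys_mseal(void *addr, size_t len, unsigned long flags)
{
    return (int)syscall(__NR_mseal, addr, len, flags);
}

/* Hypothetical demo. Returns 1 if the sealed page rejected mprotect() and
   munmap() with EPERM, 0 if mseal() itself is unavailable (older kernel,
   32-bit build, seccomp filter, ...), and -1 on any other outcome. */
int seal_demo(void)
{
    long pg = sysconf(_SC_PAGESIZE);
    assert(pg > 0);
    size_t len = (size_t)pg;

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    assert((uintptr_t)p % len == 0);        /* mseal() needs a page-aligned start */

    p[0] = 42;                              /* fill in "security-critical" data */
    if (mprotect(p, len, PROT_READ) != 0) { /* then make it read-only */
        munmap(p, len);
        return -1;
    }

    if (sys_mseal(p, len, 0) != 0) {        /* not sealed: clean up and report */
        munmap(p, len);
        return 0;
    }

    /* The mapping is now fixed for the life of the process; the page is
       intentionally never unmapped. */
    if (mprotect(p, len, PROT_READ | PROT_WRITE) == 0 || errno != EPERM)
        return -1;
    if (munmap(p, len) == 0 || errno != EPERM)
        return -1;
    return p[0] == 42 ? 1 : -1;             /* data is still readable */
}
```

On a kernel with mseal() support the call is expected to return 1; on older kernels the sealing step fails and it returns 0 instead.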

  • by Anonymous Coward

    What you want are processes that don't share address space except regions you explicitly share through IPC.

    • In theory, that is already the case. In practice, you need to modify (add/remove) your allocated memory pages as the application's demands change dynamically, and that information needs to be passed to other processes, etc., which leads to bugs where a faulty (or malicious) program can modify or read memory outside its bounds. If it is an attacker, the goal is typically to read memory or elevate their privileges.

      All this does is basically put what every antivirus out there (should) do (hey something is

      • All this does is basically put what every antivirus out there (should) do (hey, something is reading/writing out of bounds) in hardware, by making the CPU responsible for raising an exception/interrupt whenever this kind of stuff happens.

        No.
        There's no hardware involved in this. This is purely software.
        This prevents something else running in your address space from altering the permissions of a page, permanently.
        Page permissions themselves were always there, and handled by hardware.

    • You are correct that this exists to solve the problem of lack of trust between threads in the same address space.
      There are, however, still reasons to do this.
      While process isolation is a good way to avoid needing this, it carries a large performance cost.

      For workloads that are not terribly performance-sensitive, a standard clone() without CLONE_VM [see: fork] will do the job. For high-bandwidth, low-latency communication between workers, you're going to want CLONE_VM.
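The isolation trade-off in this comment can be illustrated with fork(), which, like clone() without CLONE_VM, gives the child its own copy of the address space. In this sketch (isolation_demo() is a hypothetical name), a child writes to an ordinary variable and to a page mapped with MAP_SHARED | MAP_ANONYMOUS; only the write to the explicitly shared page is visible to the parent.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo. After a child writes 1 to both locations, stores what
   the parent sees: out[0] from an ordinary (private) variable, out[1] from
   a page explicitly mapped MAP_SHARED. Returns 0 on success, -1 on error. */
int isolation_demo(int out[2])
{
    assert(out != NULL);

    int private_value = 0;                   /* each process gets its own copy */
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    pid_t pid = fork();                      /* no CLONE_VM: separate address space */
    if (pid < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {                          /* child */
        private_value = 1;                   /* changes only the child's copy */
        *shared = 1;                         /* lands in the shared page */
        _exit(0);
    }

    int status = 0;
    pid_t w = waitpid(pid, &status, 0);
    out[0] = private_value;                  /* parent's copy is untouched */
    out[1] = *shared;                        /* child's write is visible */
    munmap(shared, sizeof *shared);
    return (w == pid && WIFEXITED(status)) ? 0 : -1;
}
```

After a successful call the parent sees out[0] == 0 and out[1] == 1. Threads sharing one address space (CLONE_VM) would see both writes, which is why protections within a single address space matter for them.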
  • Will we go back to Multics protection rings, or even to capability machines (https://en.wikipedia.org/wiki/Capability-based_addressing) like the Intel 432 (https://en.wikipedia.org/wiki/Intel_iAPX_432) and its ilk?

    Back in the mid '80s I worked on an Intel/Siemens collaboration to produce a second generation capability architecture, along with operating system, compilers, development environment, etc. The technology in that was well ahead of its time, but the marketing never figured out how to sell it. And

    • The problem with putting everything in hardware is that it becomes rigid and slow, like x86, which is littered with leftovers from bygone eras just so you can boot Windows on a current-gen Intel Xeon Platinum. And Windows is pretty much the only OS still out there that needs it. Intel has seen the writing on the wall and is going the ARM/Power way, booting out old 32-bit crap, and Microsoft is following, kicking and screaming (e.g., eliminating VBscript and attempting to eliminate 32-bit Office plugins, DLL i

      • This post can only be responded to with "lolwut?"
    • Long term? I wouldn't be surprised if we see capability pointers come back. Particularly since we've got so many extra transistors to play with these days.
    • by sjames ( 1099 )

      iAPX432 was interesting, but slow. So slow, in fact, that people figured out it was faster to run the programs on the channel processor(s), AKA the 8086.

      That's not a claim that it would have remained slow; the protection mechanisms in x86 have seen a decent speedup over time. Only the most recent of those speedups have led to lost correctness and potential for information leakage.

      • The Intel 80960, widely used in embedded systems, was a derivative of the 432 without the capability stuff. It's not clear to me the speed problems were related to the capability approach, but I'm not a hardware expert.

    • by Misagon ( 1135 ) on Monday May 27, 2024 @05:31AM (#64502071)

      CHERI [cam.ac.uk] is a modern capability architecture/extension that is being worked on a lot by multiple research groups and companies.
      Most implementations have been on FPGAs. Codasip has announced that they are working on commercial CHERI/RISC-V chips for safety-critical embedded systems.

      There is also an official RISC-V working group working on creating an official memory tagging extension (but I'm unsure what its goals are...)

      The most popular architecture with tagged memory could have been the IBM AS/400 ... and PowerPC has a compatibility mode [devever.net] using the ECC bits as tags, for running binary-translated AS/400 code. I think it requires a trusted compiler and hard code pointer integrity for it to be as safe as CHERI, though.

  • A wasted effort (Score:2, Informative)

    by Anonymous Coward

    This new mseal() system call is an okay temporary solution.

    Long term, I would prefer for systemd to do my memory allocations for me.

    • It's only a matter of time before it takes over the job of libc's ELF loader.
      After that, the sky's the limit.
  • While at university I was taught that there are (or were) two popular memory architectures for computing. One is a shared memory that puts data and executable code in the same memory space, which is an efficient use of space/hardware but has penalties for speed and security. The other keeps executable code in one memory area and data in a different memory area, which has an advantage in speed and security but at the cost of transistors and therefore cost in dollars.

    In reality few computers are so trivi

    • That doesn't even require new hardware on x86-64 systems; it just means going back to the non-flat model, where the CS/DS/SS/ES registers can contain distinct segment selectors instead of all being forced into a single segment. That would enable segmented memory again, allowing a second method of controlling memory access permissions via segments. It'd also allow the stack and heap to grow without running into each other. The capability is there, since the FS and GS segment registers can point to separate segments.

      • Keeping your stack and heap from growing into each other is quite easy. You don't need segment registers, you put them on different pages with an unmapped guard page between them.

        Why would we need to go back to segmented memory when we have an MMU and page tables?
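A sketch of the guard-page approach from the comment above, assuming Linux and an anonymous mapping. Here the guard is a PROT_NONE page rather than a truly unmapped hole; for this purpose it behaves the same way, because any access to it faults. The probe runs in a child process so the resulting SIGSEGV does not kill the caller. The name guard_page_demo() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo. Lays out [usable][guard][usable]; the middle page is
   PROT_NONE, so running off either neighbour faults instead of silently
   overwriting the other. Returns 1 if a child touching the guard page was
   killed by SIGSEGV, 0 if it was not, -1 on setup error. */
int guard_page_demo(void)
{
    long pg = sysconf(_SC_PAGESIZE);
    assert(pg > 0);
    size_t page = (size_t)pg;

    char *base = mmap(NULL, 3 * page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;
    char *guard = base + page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(base, 3 * page);
        return -1;
    }
    base[page - 1] = 1;                /* last byte before the guard: fine */
    guard[page] = 1;                   /* first byte after the guard: fine */

    pid_t pid = fork();                /* probe in a child to contain the fault */
    if (pid < 0) {
        munmap(base, 3 * page);
        return -1;
    }
    if (pid == 0) {
        *(volatile char *)guard = 1;   /* the MMU refuses this access */
        _exit(0);                      /* reached only if no fault occurred */
    }

    int status = 0;
    pid_t w = waitpid(pid, &status, 0);
    munmap(base, 3 * page);
    if (w != pid)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}
```

The call is expected to return 1: writes next to the guard succeed, while the write into it is stopped by the page permissions.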
        • Re: (Score:2, Informative)

          by Anonymous Coward

          Having separate stacks for code and parameters can help too.

          It is common practice to pass some parameters on the call stack: https://en.wikipedia.org/wiki/... [wikipedia.org]

          This mixing of code and data is bad hygiene:
          call stack = code (return addresses)
          parameters = data

          Separating the stacks will make this harder: https://en.wikipedia.org/wiki/... [wikipedia.org]

          It's a lot harder to overwrite the return address on the call stack with your data when that data goes to a different stack (e.g. parameter stack).

          It doesn't protect against all exploits bu

          • Stack is used for more than just parameters and return pointers, though.

            On a per-byte basis, the stack is nearly completely function-local variables (at least for most compiled languages).
            Should we have a third stack for that?

            I'm not intrinsically against it. There is a performance cost, though. With your separate return pointer stack, you can basically guarantee it'll never be cached when it's reached.
            Same problem applies if you split parameter and local data stacks.

            It seems to me that just not usin
    • In reality few computers are so trivial that they fit neatly into either category, but for the most part a shared memory system is used. A system that puts in hardware bits to flag certain areas of memory as executable or not would be some compromise between the two systems I just described. Putting in bits that control for read-only, write-only, or read-write would also add to security. This could be done at the level of an 8-bit byte, a 32-bit (or whichever) "word" in the system, or on banks or pages of memory. As described, this system of memory protection sounds like it is implemented at the operating system level as part of memory management. Some help from hardware, with extra bits in the right places, would add to the security and simplify the code, at what I expect to be a trivial extra cost in transistors and dollars.

      You literally just described an MMU.
      Yes, that is exactly how it works.
      It's done in pages: usually 4KB on x86, though up to 1GB for huge pages.
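To make the page granularity concrete: mprotect() (and mseal()) act on whole pages and expect a page-aligned start address, so callers round addresses down and lengths up to page boundaries. A small sketch, assuming the page size is a power of two (true on Linux); the helper names are illustrative only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Page size reported by the kernel (commonly 4096 on x86-64). */
size_t page_size(void)
{
    long pg = sysconf(_SC_PAGESIZE);
    assert(pg > 0);
    return (size_t)pg;
}

/* Round an address down to the start of its page; mprotect() and mseal()
   require a page-aligned start address. pg must be a power of two. */
uintptr_t page_align_down(uintptr_t addr, size_t pg)
{
    assert(pg != 0 && (pg & (pg - 1)) == 0);
    return addr & ~((uintptr_t)pg - 1);
}

/* Round a length up to a whole number of pages (no overflow check). */
size_t page_align_up(size_t len, size_t pg)
{
    assert(pg != 0 && (pg & (pg - 1)) == 0);
    return (len + pg - 1) & ~(pg - 1);
}
```

For example, with 4096-byte pages, address 0x12345 rounds down to 0x12000, and a 1-byte length rounds up to one full page.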

    • by gweihir ( 88907 )

      Both architectures are alive, though Harvard is mostly used in microcontrollers. I do think that most of the current security problems come from doing things too cheaply. This is obviously even worse on the software side. The first line of defense is the application code itself.

  • "Modern CPUs support memory permissions such as RW..."

    I couldn't help but think of this [youtu.be] from T2, lol.

  • I don't understand why we're reinventing the protected mode introduced by the i386 back in 1986. Code memory blocks cannot be written (except by the OS). Data memory blocks cannot be executed. Where did all this go over the past nearly 40 years??
    • by gweihir ( 88907 )

      Good question. Probably just too many people trying to do things too cheaply. That will do it.

      • Too many people misunderstanding what things do, I'd say.

        User-mode processes do not have direct access to PTEs. They must ask the OS to set permissions on their ranges of memory.
        The problem with this, of course, is that if your process is compromised, the code that has managed to execute can simply remove the restrictions you have placed (since the OS cannot distinguish you from it).

        This lets you tell the OS that you will be making no more requests for permission changes to a particular range of memory.
    • What?

      No. You fundamentally misunderstand this.
      Today, we're in x86-64 long mode, and 32-bit code still runs under protected mode.

      What this does is allow a process to tell the OS that nothing else coming from its address space should be able to use madvise(), mprotect(), etc., to change the permissions on a page.
      I.e., it's a tool for a process to protect itself from itself, or more importantly, things that may have been able to execute code in its address space.
  • At some point nobody will understand all the protection mechanisms and when to use them anymore and the attackers will win permanently.

    • This isn't that complicated. It has nothing to do with physical protection of the memory.
      Rather, it restricts which memory-management calls a process is allowed to make on a given range after a certain point.

      POSIX tools give you fun things like madvise() and mprotect().
      You can use mprotect(), for example, to mark regions of your memory RO.

      However, someone nefarious can simply unmark them once they've managed to execute code in your address space.
      So, now we have mseal(). This lets you tell the OS you plan to make no further changes to the PTEs.
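The hole described in this comment is easy to demonstrate on Linux without mseal(): a page made read-only with mprotect() can be made writable again by the same process, and the kernel does not object. A minimal sketch; unprotect_demo() is a hypothetical name, and its two mprotect() calls stand in for "legitimate hardening" and "injected code" respectively.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo. Returns 1 if a page that had been made read-only
   could be switched back to read-write and modified by the same process,
   0 if not, -1 on setup error. */
int unprotect_demo(void)
{
    long pg = sysconf(_SC_PAGESIZE);
    assert(pg > 0);
    size_t len = (size_t)pg;

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    p[0] = 'A';
    if (mprotect(p, len, PROT_READ) != 0) {      /* the "hardening" step */
        munmap(p, len);
        return -1;
    }

    /* Nothing stops any code in this address space from undoing it. */
    int undone = mprotect(p, len, PROT_READ | PROT_WRITE) == 0;
    if (undone)
        p[0] = 'B';                              /* the write now succeeds */
    int result = undone && p[0] == 'B';
    munmap(p, len);
    return result;
}
```

This is expected to return 1. Sealing the page with mseal() between the two mprotect() calls is what would turn the second call into an EPERM failure.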
      • by gweihir ( 88907 )

        I do understand what this one does. I am not sure how well it will work or rather how widely it will be applicable and how much it actually hinders an attacker. Well, at least the kernel folks are very careful about introducing new mechanisms like this one.

        • It will work fantastically.
          I've been one of the voices asking for this for a couple of years now.

          Mapping of pages as NX and RO is a critical part of the security of a contemporary process. It's all handled by the loader and linker for you.
          However, there's always been an asterisk: if something is able to run enough code in your process, it can easily undo those mappings, turning a potentially minor and difficult-to-use exploit into a full compromise of the process.
          Soon, you will see l
          • by gweihir ( 88907 )

            It will work fantastically.

            I rather doubt that. I have seen too many claims like that for new mechanisms. The only thing interesting is whether it will be broken this year or take a little longer.

            • lol.
              Broken? How could it possibly be broken? All this does is prevent a process from using syscalls that change the permissions on memory ranges.

              Will this stop all other attacks from happening? Of course not. It just closes a hole that exists right now: code being able to change pages that were R-- to RWX. That will not be "breakable", as you can't do that without the OS anyway (user space programs don't have access to PTEs).
              • by gweihir ( 88907 )

                Let's talk again after a few Black Hat conferences have passed.

                • Gladly, lol.

                  Again, I think you're confused about what this is.
                  It sets a flag for a memory range in the kernel and says, "if the process asks us to change permissions on this, reply with EPERM."
                  It's like setting a file to immutable, and then trying to call rename() or chmod() on it. It will fail. Nobody wondered "when are people going to figure out how to bypass the immutable flag on files?!?!?!?!"
                  • by gweihir ( 88907 )

                    I understand what this change does. I just think you are confused as to what that actually accomplishes. And I predict that somebody will find a way to work around this in the near future, and hence this is only a minor improvement. Sure, some attack patterns will have to evolve. But likely not that much, and attack effort and success chances will not change much. On the minus side, some people could mistake this for a major improvement and write even crappier code.

                    • I understand what this change does.

                      That's not 100% clear at this juncture.

                      I just think you are confused as to what that actually accomplishes.

                      I know exactly what it accomplishes. It closes a design hole in POSIX, one that has been closed by other operating systems for a while.

                      And I predict that somebody will find a way to work around this in the near future and hence this is only a minor improvement.

                      There's no working around it, like I said. It's like adding an immutable flag for a file.
                      The hole here was a design hole, not a mistake hole.
                      The hole has been plugged.
                      Analogously, nobody has managed to "work around the MMU enforcing R-- perms on a page."
                      They were just able to utilize the POSIX design hole to change the
