Math Programming Linux

Linux Random Number Generator Sees Major Improvements (phoronix.com) 80

An anonymous Slashdot reader summarizes some important news from the web page of Jason Donenfeld (creator of the open-source VPN protocol WireGuard): The Linux kernel's random number generator has seen its first set of major improvements in over a decade, improving everything from the cryptography to the interface used. Not only does it finally retire SHA-1 in favor of BLAKE2s [in Linux kernel 5.17], but it also at long last unites '/dev/random' and '/dev/urandom' [in the upcoming Linux kernel 5.18], finally ending years of Slashdot banter and debate:

The most significant outward-facing change is that /dev/random and /dev/urandom are now exactly the same thing, with no differences between them at all, thanks to their unification in the commit "random: block in /dev/urandom". This removes a significant age-old crypto footgun, already accomplished by other operating systems eons ago. [...] The upshot is that every Internet message board disagreement on /dev/random versus /dev/urandom has now been resolved by making everybody simultaneously right! Now, for the first time, these are both the right choice to make, in addition to getrandom(0); they all return the same bytes with the same semantics. There are only right choices.
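
For anyone who wants to poke at the three interfaces side by side, here is a minimal Python sketch. It assumes a Linux system; `read_random_sources` is a made-up helper name, and the guards around `os.getrandom` and the device paths are only there so the sketch degrades gracefully elsewhere. Note that each call returns fresh, independent bytes: "the same bytes" in the quote refers to the same underlying source and semantics, not identical output.

```python
import os

def read_random_sources(n=16):
    """Return up to n random bytes from each kernel RNG interface that is available."""
    results = {}
    # getrandom(2) with flags=0; os.getrandom only exists on Linux builds of Python.
    if hasattr(os, "getrandom"):
        results["getrandom(0)"] = os.getrandom(n, 0)
    for path in ("/dev/random", "/dev/urandom"):
        if os.path.exists(path):
            # buffering=0 so that only n bytes are requested from the device.
            with open(path, "rb", buffering=0) as dev:
                results[path] = dev.read(n)
    # os.urandom is the portable wrapper Python offers on every platform.
    results["os.urandom"] = os.urandom(n)
    return results

if __name__ == "__main__":
    for name, data in read_random_sources().items():
        print(f"{name:>14}: {data.hex()}")
```

On a kernel with the unified devices, all of these should behave alike once the RNG has been seeded; on older kernels, /dev/random may block or return fewer bytes than requested.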

Phoronix adds: One exciting change to also note is the getrandom() system call may be a hell of a lot faster with the new kernel. The getrandom() call for obtaining random bytes is yielding much faster performance with the latest code in development. Intel's kernel test robot is seeing an 8450% improvement with the stress-ng getrandom() benchmark. Yes, an 8450% improvement.
  • by Registered Coward v2 ( 447531 ) on Sunday March 20, 2022 @07:18AM (#62373823)

    The upshot is that every Internet message board disagreement on /dev/random versus /dev/urandom has now been resolved

    The argument will shift to why there should be 2 different versions. Never underestimate the ability of the internet to continue an argument unabated.

    • The argument will shift to why there should be 2 different versions.

      The argument has now shifted to the fact that there will be only one version, with two names. Probably the node numbers will be the same, we can hope anyway.

    • by gweihir ( 88907 )

      There is no argument that there should be two versions. It is completely clear that there should be. The problem is that they now may have broken the CPRNG by making a 3rd version that quite possibly has bad seeding but will not tell the client software about it and then they removed the two existing versions. Really bad security engineering IMO.

      • There is no argument that there should be two versions. It is completely clear that there should be. The problem is that they now may have broken the CPRNG by making a 3rd version that quite possibly has bad seeding but will not tell the client software about it and then they removed the two existing versions. Really bad security engineering IMO.

        Thank you for proving my point.

        • by gweihir ( 88907 )

          Thank you for proving my point.

          You do not seem to understand the subject matter. Well, anyway. This is hardly a surprise. Most of the arguments by people who have a clue what this is about are not about whether there should be two versions; they are about when to use which one. You probably have not followed that discussion at all and are now trying to appear smart by throwing in a meaningless generality.

          • Thank you for proving my point.

            You do not seem to understand the subject matter.

            You seem not to understand humor.

            • by gweihir ( 88907 )

              Thank you for proving my point.

              You do not seem to understand the subject matter.

              You seem not to understand humor.

              Well, when it is indistinguishable from being condescending, no.

    • The argument will now shift to: WTF, why have they broken /dev/urandom to make it bug-identical to /dev/random? For pretty much forever, non-Linux systems like OpenBSD have had "high quality pseudo-random output data without ever blocking" (from the manpage), while with Linux, after years and years and years of debate, the solution is... to break urandom so it has the same problem as random? WTF?
  • I see lots of talk about performance, but there are stories of weak RNG leading to flawed crypto. How does security of the new RNG compare to the old implementation?
    • That would be covered by the words "...improving everything from the cryptography to..." in the third line.

      • It's very important to start with a good amount of entropy though, and that can be very tricky, especially at boot-up. That's not important for most Linux systems, but for embedded systems that need security, getting that initial entropy is hard. I.e., you need the random number generator seeded before you can talk on the network, and you can't use time of day as part of it because you haven't authenticated with the time server yet... So really, it's good to have a hardware randomizer, even if slowish, just for

    • by gweihir ( 88907 )

      I would expect the RNG is ok, but the new seeding seems to be somewhere between "hopeful" and "wishful thinking". Especially the "cycle jitter" thing seems like an accident waiting to happen.

      • by kriston ( 7886 )

        I use HAVEGED [issihosts.com] on my virtual machines. The current implementation seems to seed well on virtual machines.

  • by Anonymous Coward

    Now, for the first time, these are both the right choice to make, in addition to getrandom(0); they all return the same bytes with the same semantics.

    Not very random if they're always returning the same number. :) At least they'll be equally broken!

  • by OzPeter ( 195038 ) on Sunday March 20, 2022 @07:58AM (#62373899)

    I'd rather see the BLAKEs7 RNG.

  • > they all return the same bytes

    Doesn't sound very random to me.

  • by sinij ( 911942 ) on Sunday March 20, 2022 @08:52AM (#62373977)
    These changes, especially the switch to BLAKE2s [kernel.org], all but guarantee that Linux will not be able to get NIST certified (and consequently adopted in government, healthcare, and financial applications that require certification). So good job giving Red Hat even more reasons to completely fork the kernel in RHEL 9.
    • Overlays (Score:4, Interesting)

      by JBMcB ( 73720 ) on Sunday March 20, 2022 @10:47AM (#62374251)

      For as long as I've been compiling the kernel, which goes back to the boxed copies of Red Hat, there have been patch overlays for various use cases, including at least two or three hardened kernel overlays and a few for various under-supported architectures. I'd imagine a certified overlay would be in the works. Forking is not required, especially since, IIRC, the RNG is now modularized.

      • by sinij ( 911942 )

        Forking is not required, especially since, IIRC, the RNG is now modularized.

        Good to know.

    • by AmiMoJo ( 196126 )

      Probably shouldn't let NIST certification keep users on old, broken algorithms. If NIST can't keep up and is required for US government use, the US government has a serious problem.

      • by sinij ( 911942 )

        Probably shouldn't let NIST certification keep users on old, broken algorithms.

        They don't. SHA-1 has an issue with collision attacks, which is categorically different from its use case in random.c as part of the LPRNG mixing function.

        • by Agripa ( 139780 )

          Probably shouldn't let NIST certification keep users on old, broken algorithms.

          They don't. SHA-1 has an issue with collision attacks, which is categorically different from its use case in random.c as part of the LPRNG mixing function.

          After what NIST did, they cannot be trusted certifying any security system.

          https://en.wikipedia.org/wiki/... [wikipedia.org]

  • " they all return the same bytes with the same semantics"

    I hope this was a mistake, because if every call to getrandom(0) returns the same bytes, well, that's the definition of a constant.

    • Seemed pretty clear to me that they mean that your next random number will be the same regardless of whether you read from /dev/random or /dev/urandom, or call getrandom(0)

      A.k.a. they are now all aliases to the same function call, and you can use any of them to get the exact same results.

      • by Entrope ( 68843 )

        Yes, I think you correctly described the meaning. But there's always going to be somebody who thinks like https://xkcd.com/221/ [xkcd.com] .

        • You're right - there *always* will be. So why waste time and energy trying to avoid the inevitable?

          a.k.a.: Never argue with a fool - people won't be able to tell the difference.

    • Sure, it makes the security people upset, but the testers are more than overjoyed at the consistency and that the tests never fail no matter how often they run them!

  • by WaffleMonster ( 969671 ) on Sunday March 20, 2022 @09:13AM (#62374035)

    Replace the hash algorithm with one nobody uses or has ever heard of because speed. Would literally rather have it switched to MD5 HMAC than BLAKE.

    Promote removal of blocking calls. If you don't care about speed and want to wait for high quality non-diluted randoms too bad so sad.

    • Replace the hash algorithm with one nobody uses or has ever heard of because speed. Would literally rather have it switched to MD5 HMAC than BLAKE.

      1) Just because you have never heard of it does not mean that "nobody" has. 2) For the purposes in the kernel, they need a fast algorithm. This is why neither SHA-256 nor SHA-512 was chosen. 3) No one is preventing you from using whatever hash function you want on your own Linux install.

      • 1) Just because you have never heard of it does not mean that "nobody" has.

        "nobody" is not intended literally. The point is that BLAKE is not widely used or known. One can expect a commensurately non-existent effort for formal approval or testing of unused algorithms in the real world. Compare that with established hash algorithms, where substantial effort has been invested to approve, test and compromise them over many years.

        2) For the purposes in the kernel, they need a fast algorithm. This is why neither SHA-256 nor SHA-512 was chosen.

        I disagree, there is no valid performance reason for this and no significant performance difference between SHA-1 and SHA-256 algorithms. Anyone who needs sup

        • The point is that BLAKE is not widely used or known.

          Just because you don't see it in Slashdot stories doesn't make it not widely known. It very nearly became the standard algorithm for SHA-3, making it to the final round of selection at NIST. Let cryptographic experts do cryptographic stuff, and fight a different battle from your armchair.

          I disagree, there is no valid performance reason for this

          Somehow you seem to have missed the decades of arguments around /dev/random and /dev/urandom if you think performance is not relevant.

          The algorithm is not user selectable. Idea of users custom patching kernels is obviously a nonstarter.

          Since when is the idea of custom patching kernels a non-starter? Have you ever heard of a thin

        • "nobody" is not intended literally. The point is that BLAKE is not widely used or known.

          By you. That is the point. According to wikipedia [wikipedia.org] "BLAKE was submitted to the NIST hash function competition by Jean-Philippe Aumasson, Luca Henzen, Willi Meier, and Raphael C.-W. Phan. In 2008, there were 51 entries. BLAKE made it to the final round consisting of five candidates but lost to Keccak in 2012, which was selected for the SHA-3 algorithm." Certainly people knew about it especially in cryptographic circles. Just not you.

          One can expect a commensurately non-existent effort for formal approval or testing of unused algorithms in the real world.

          BLAKE made it to the final round of 5 for NIST testing to replace SHA-1 and SHA

  • Random numbers made EZ:

    1) Retrieve the front page of a dozen frequently-updated sites, ads and all (e.g. CNN, Imgur, Reddit, Yahoo News, etc)

    2) Concatenate the HTML of the pages into a string and strip out all the non-numeric characters. You'll end up with a bloody long string of numbers and it'll be a wildly different string each time.

    3) Pick the 71st number in that result. That character will be pretty fucking random.

    • by ELCouz ( 1338259 )
      interesting and also hard to attack. Just dependent on Internet though.
      • Yes, it's got to have some sort of access to one or more sources of ever-changing data that can't be predicted.

        If you don't have network access you could do it with the over-the-air RDS output from commercial radio stations or the NOAA Weather Radio All Hazards network, both of which have non-stop streams of constantly-varying data. (You could even do it with the random static between AM/FM stations as long as you can get the static into a digital format of some sort.)

        For web pages, on every access the tags

    • by aegl ( 1041528 )

      Random numbers made EZ:

      1) Retrieve the front page of a dozen frequently-updated sites, ads and all (e.g. CNN, Imgur, Reddit, Yahoo News, etc)

      2) Concatenate the HTML of the pages into a string and strip out all the non-numeric characters. You'll end up with a bloody long string of numbers and it'll be a wildly different string each time.

      3) Pick the 71st number in that result. That character will be pretty fucking random.

      Sounds appalling. Concatenation? Likely find the 71st number is always from the first of those sites, so no extra entropy from the other 11. Basing your random numbers on a source that an adversary can also read? Also very bad.

      • Sounds appalling. Concatenation? Likely find the 71st number is always from the first of those sites,

        You're correct- I missed a step of jumbling the pages and/or the string- my bad.

        So, mix the pages or the string up in some ridiculous way, divide by your dog's birthday or the number of cups of coffee you had in the last year, it doesn't matter. Convert it all to Italian and then to Vietnamese. It doesn't matter, just do some pointless transform on the string to bork it up.

        Basing your random numbers on a source that an adversary can also read? Also very bad.

        But you can't read it- what YOU read will be a very different page from mine under the hood- it'll have different ads, different trac

        • Comment removed based on user account deletion
          • You do know this'll make your random number algorithm dependent upon another random number algorithm?

            So it'll be double randomized and therefore gooder. lol

            But your post brings up a deeper, more interesting question: is anything really random?

            I've had to come to the unfortunate conclusion that no matter how deep you look, there really is nothing that's actually random.

    • I HOPE you're kidding. In case you're not:

      Doing that, you're more than 5 times as likely to end up with "3" than "8". It's very not random.

      This page has 1910 of the digit "2" and 755 of "5".
      The news sites will probably have a slightly higher concentrations of "2" because it's 2022. Actually, let's see the distribution:

      CNN.com
      count  digit
      11464  0
      6877   1
      4581   2
      4555   3
      2997   4
      2784   5
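
A count like the one above is easy to reproduce. This is only a sketch of how such a tally could be made, not the script that produced these numbers; `digit_counts` and the sample string are made up for illustration, and fetching the actual pages is left out.

```python
from collections import Counter

DIGITS = "0123456789"

def digit_counts(text):
    """Count how often each decimal digit 0-9 occurs in text."""
    counts = Counter(ch for ch in text if ch in DIGITS)
    # Include digits that never occur, so skew is easy to spot.
    return {d: counts.get(d, 0) for d in DIGITS}

if __name__ == "__main__":
    # Hypothetical stand-in for the HTML of a news front page.
    sample = "<span>March 20, 2022</span> <a href='/2022/03/20/story-100'>"
    for digit, count in digit_counts(sample).items():
        print(count, digit)
```

If the digits were uniformly random, every count would be close to one tenth of the total. A spread like the one in the CNN table is nowhere near that.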

      • Doing that, you're more than 5 times as likely to end up with "3" than "8".

        But those are two of my favorite numbers.

      • Yes, zero is the most common digit to appear in a string of numbers, even in binary.

      • by kmoser ( 1469707 )
        Basically Benford's Law [wikipedia.org]
        • Benford's law tells us that actual numbers that occur in the world have non-random digits. So that alone would be enough to make this non-random.

          On top of that, there's writing, which uses digits in non-random ways.

          It's ALSO the case that on news sites, a headline would say "Man bites dog", or "man bites 2 dogs" or "man bites 3 dogs". It would never say "1 man bites 1 dog".

          General popular news is more likely to say "almost 1/4 of ..." than to say "almost 1/9 of ..."

        • Ignore raymorris; listen to Donald Knuth: "Random numbers should not be generated with a method chosen at random."

          See Knuth's "Algorithm K" for illustration. You can find it on this page [informit.com].

          Pretty sure JustAnotherOldGuy was being facetious. It's hard to imagine the kernel loading 12 webpages every time it needs a random number.

      • I really appreciate that you actually checked this. Thanks for the high-effort post!
  • by gweihir ( 88907 ) on Sunday March 20, 2022 @10:02AM (#62374141)

    This seems like a really, really bad idea. The reason for the discussions was not that the design was bad. The reason was that there are two ways to do it:
    1. Give a lot of pseudo-randomness even if seeding is not yet good (old /dev/urandom)
    2. Stop until seeding is good (old /dev/random)

    Using 1. can result in all sorts of security catastrophes, like bad SSH keys, bad link encryption, bad disk encryption, etc., hence it is a really bad idea in many situations. Using 2. can block for a long, long time. But it should do that. Now behavior 2. has become the norm, which is probably overall a good thing, _but_ to "fix" the blocking, the "cycle counter jitter" mechanism is used. This mechanism seems more than doubtful to me, because it relies on the CPU behaving non-deterministically during code execution. As most CPUs are mostly fully deterministic and you only get a small amount of variation by reactions to thermal changes, this seems to be an elaborate way to lie to yourself and potentially create very bad seeding. In addition, there _are_ CPUs that behave completely deterministically for this, and hence we now have /dev/(u)random seeding being insecure on some platforms or behaving in the old way and being maybe (or maybe not) secure on AMD64 and some other platforms. This seems really bad design to me and is more an accident waiting to happen than a solution for anything.

    Now, there is hope that most distros fix this somehow by boot-up seeding and gathering entropy in a different additional way on installation. But not all will. And some "clever" developers may even disable this. And there are people that will copy VMs (a bad situation for gathering entropy in the first place) and then do nothing to fix this bad design and hence end up with the same or very similar keys generated on the copies. There is some "hopeful engineering" that desperately hopes that when a VM gets copied the CPRNG will be reseeded, apparently from a call from the VM hypervisor, but that requires all hypervisors doing it right.

    Now, the designer behind this seems to have thought of everything, which is good. But the solutions to the questions that need to be asked sound highly questionable to me, and like things that a developer would do, but very much not things a security expert would do. Real-world impact is pretty unclear. At least on AMD64, RDRAND is probably still used as "whitener", which will at least fix things to some degree, _but_ RDRAND is a compromised design on Intel and RDRAND is broken on a large number of AMD CPUs, so that "fix" is doubtful as well.

    I think from Kernel 5.17 on, /dev/(u)random needs to be regarded as likely insecure and other entropy gathering needs to be employed in addition if you want to assure security. This is a huge step back.

    Also, /dev/urandom was changed a while ago to be much faster, so I do not think this is "the first major change in 10 years".
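
To make the "cycle counter jitter" idea concrete, here is a user-space Python sketch of the general technique: read a high-resolution clock in a tight loop and hash the differences. This is not the kernel's code; the function names are made up, and `time.perf_counter_ns` stands in for a CPU cycle counter. Hashing only concentrates whatever unpredictability the timings contain; it cannot create any, which is exactly the concern on deterministic CPUs.

```python
import hashlib
import time

def sample_timing_deltas(samples=256):
    """Collect differences between successive high-resolution clock reads."""
    deltas = []
    last = time.perf_counter_ns()
    for _ in range(samples):
        now = time.perf_counter_ns()
        deltas.append(now - last)  # non-negative: the clock is monotonic
        last = now
    return deltas

def fold_into_seed(deltas):
    """Hash the deltas into a 32-byte value with BLAKE2s."""
    h = hashlib.blake2s()
    for d in deltas:
        h.update(d.to_bytes(8, "little"))
    return h.digest()

if __name__ == "__main__":
    deltas = sample_timing_deltas()
    print("distinct delta values:", len(set(deltas)))
    print("seed:", fold_into_seed(deltas).hex())
```

The number of distinct delta values gives a rough feel for how much the timing actually varies on a given machine. It is not an entropy estimate.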

    • "cycle count jitter" does sound like shit

      If people can attack roulette wheels then they can attack "cycle count jitter" also.

      Therefore it only pretends to address the concerns of those worried about this rng's internal state being somewhat predictable.

      Anyone who thinks I am not on track here, simply enlighten us as to the metric being used to measure this internal jitter's quality as an unpredictable source of noise. Please point us to a dataset of these measurements taken across processor families, s
      • by gweihir ( 88907 )

        "cycle count jitter" does sound like shit

        It probably is, and it probably is especially bad in situations where you are entropy starved already. Like a small embedded device (e.g. router) that needs to generate long-term crypto keys on first startup. This has gone wrong drastically before: https://factorable.net/weakkey... [factorable.net]
        I guess Linus does not read papers like that one.

        Well. It seems some mistakes need to be made a lot of times before people learn. Bad CPRNG seeding is really a very old problem.

        • Yup, and some developers in those situations, having a partial knowledge of cryptography and/or physics and/or math can come up with really terrible ideas. Always get the crypto guys involved in a secure system from day one. But more and more chips have built in true RNGs now based on physical principles, so those should be chosen when there's an option.

          • by gweihir ( 88907 )

            Indeed. But some HW generators have a compromised design, e.g. Intel RDRAND. We not only need to have good HW solutions, but to get them we also need to keep assholes doing politics out of engineering.

    • I think from Kernel 5.17 on, /dev/(u)random needs to be regarded as likely insecure and other entropy gathering needs to be employed in addition if you want to assure security. This is a huge step back. . . Also, /dev/urandom was changed a while ago to be much faster, so I do not think this is "the first major change in 10 years".

      /dev/urandom was changed in 2016 to use Daniel Bernstein's ChaCha20 as its CPRNG to be more secure. The new /dev/random uses BLAKE2s, which is based on BLAKE, which in turn is based on Daniel Bernstein's ChaCha cipher.
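
BLAKE2s is not exotic, by the way: it ships in Python's standard `hashlib`. A small sketch comparing it with SHA-1 on the same input follows; `digest_summary` is a made-up helper name, and this only shows the user-space hash functions, not how the kernel uses BLAKE2s internally.

```python
import hashlib

def digest_summary(data: bytes):
    """Return (name, digest size in bytes, hex digest) for SHA-1 and BLAKE2s."""
    rows = []
    for name, ctor in (("sha1", hashlib.sha1), ("blake2s", hashlib.blake2s)):
        h = ctor(data)
        rows.append((name, h.digest_size, h.hexdigest()))
    return rows

if __name__ == "__main__":
    for name, size, hexdigest in digest_summary(b"entropy pool contents"):
        print(f"{name:8} {size:2} bytes  {hexdigest}")
```

SHA-1 produces a 20-byte digest; BLAKE2s produces 32 bytes by default.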

    • There are hardware random number generators that do appear in some SoCs. These should be standardized in security modules so that every PC can assume it exists and there's more pressure to include in smaller chips as well. Hardware doesn't mean it's fast but it's fast enough for seeding during a boot up.

      • by gweihir ( 88907 )

        There are hardware random number generators that do appear in some SoCs. These should be standardized in security modules so that every PC can assume it exists and there's more pressure to include in smaller chips as well. Hardware doesn't mean it's fast but it's fast enough for seeding during a boot up.

        Indeed. Unfortunately, some people let themselves be coerced and Intel RDRAND is a compromised design, i.e. a design you cannot and should not trust because there is no practical way to verify it works as advertised. On the other hand, the far simpler VIA C3 generator is not a compromised design and still delivers >10kbit of entropy per second even if used very conservatively. The only problem getting this fixed now is to keep assholes doing politics out of engineering. That apparently may still take a

  • I tried to use the C GLIBC rand() pseudorandom number generator and the Intel Core i5 CPU TRNG RANDR asm instruction.

    They both gave numbers whose basic statistics are correct (mean and variance), but the time series is wrong and has numerology problems and other wrong numbers. For example, srand(seed) gave different results using the same seed.

    Do you know how to fix this?
    Intel support did not want to do anything.
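
For reference, a seeded pseudorandom generator is supposed to be fully reproducible: in C, srand() with the same seed is specified to give the same rand() sequence, so differing results usually point to something else in the program. The sketch below shows the expected behaviour using Python's `random.Random` as a stand-in for glibc's rand(), which is an assumption of this example; `sequence` is a made-up helper name.

```python
import random

def sequence(seed, n=5):
    """Return the first n values from a generator seeded with seed."""
    rng = random.Random(seed)  # private generator, independent of global state
    return [rng.randrange(0, 32768) for _ in range(n)]

if __name__ == "__main__":
    print(sequence(1234))
    print(sequence(1234))  # same seed: same values
    print(sequence(1235))  # different seed: different values
    # SystemRandom draws from the OS RNG and ignores seeds entirely.
    print([random.SystemRandom().randrange(0, 32768) for _ in range(5)])
```

If two runs with the same seed disagree, things worth checking are whether something else reseeds or draws from the shared generator in between, for example another thread or a library call.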
    • >Intel Core i5 CPU TRNG RANDR
      Do you mean RdRand/RdSeed?

      >but time-series is wrong and have numerology problems and other wrong numbers
      Please explain
      What's a numerology problem?
      RdRand is reseeded every 0.5us or so.
      RdSeed is reseeded every number.

      >Intel support did not want to do anything.
      Well, ask here. You're unlikely to get an RNG expert on the support portal. Only sometimes do they manage to route questions to me.

    • srand() is ancient, if this is a POSIX system. It has been deprecated for eons. A lot of compilers or static analysis tools will warn if srand/rand are being used and tell you to use srandom/random.

      What's funny, and annoying, is that early on the devs on a system I am working on used rand() as the API, with a secure random generator underneath it, seeded by a hardware random number generator. But rand() is used in soooo many places in the code that no one's gone and replaced them all, and when I suggest it then

  • The Linux kernel RNG is a stain on the world of RNGs

    It has multiple pools for no reason - check
    Counting entropy bits wrong even with the new scheme - check
    No OHTs - check
    Tries to use interrupt timing as an entropy source, so double counting multiple sources - check
    Fails to provide a full entropy service - check
    Blocks even when the platform provides a fast entropy source (most do these days) - check
    Does a silly race-hazard infected dance at a vm fork when the CPU solved it properly in HW over 11 years ago -

  • Which one used to spit out random bytes when you moved the mouse? It was great when old installers would make you do it to increase entropy. Will that still be possible? Or did they replace it with the OTHER random behavior?
  • Not only does it finally retire SHA-1 in favor of BLAKE2s ...

    Get back to me when they're up to Blakes7 [wikipedia.org]. :-)
