
Linux Developers Consider On-Screen QR Codes For Kernel Panics 175

An anonymous reader writes "Linux kernel developers are currently evaluating the possibility of using QR codes to display kernel oops/panic messages. Right now a lot of text is dumped to the screen when a kernel oops occurs, most of which isn't easily archivable by normal Linux end-users. With QR codes as Linux oops messages, a smart-phone could capture the display and either report the error string or redirect them to an error page on Kernel.org. The idea of using QR codes within the Linux kernel is still being discussed by upstream developers."

  • Good idea (Score:5, Insightful)

    by Primate Pete ( 2773471 ) on Saturday April 05, 2014 @07:24PM (#46672895)
    I'm not sure how hard it would be to pull this off in practice, but kudos to the team for improving (or at least thinking about) better usability from the kernel out.
    • Re: (Score:2, Insightful)

      by Anonymous Coward

      how soon until someone accidentally posts a QR code containing confidential information, since they cannot read it themselves.

      • Re:Good idea (Score:5, Insightful)

        by Kjella ( 173770 ) on Saturday April 05, 2014 @07:42PM (#46672999) Homepage

        Very unlikely.. the information in a QR code is probably just enough to say "I run kernel X (build Y) and it crashed with error code Z at instruction 12345 in module 123", if it was a kernel dump that's different but I have seen these without the QR codes and there's nothing sensitive there.

        • Re:Good idea (Score:4, Informative)

          by Zocalo ( 252965 ) on Saturday April 05, 2014 @07:59PM (#46673081) Homepage
          It might actually be more than that. Worst case, the screen is in 80x25 text mode (assuming a PC), which gives 2,000 binary bits, but if you start playing around with extended ASCII graphics characters you could probably encode a KB of data quite easily. Hardly a crash dump, but easily enough to get across the essentials.
          • Re: (Score:2, Interesting)

            by Anonymous Coward

            You just have to reprogram the VGA font table with 2-wide by 4-high bitmaps (2x4 is 8 bits, so all 256 combinations fit into the standard VGA font table). You then have 16000 pixels to work with instead of 2000: a bitmap display with a 160x100 pixel resolution. VGA text mode is 640x400 pixels and the standard VGA font is 8x16 pixels, so each virtual pixel is 4x4 screen pixels.

            BTW, you can't encode 2000 bits into a QR code with a 2000 bit bitmap, as it has parity and spatial clock recovery built in to the code.

            Sin
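            The font-table trick described above can be sketched roughly as follows. This is only an illustration of the arithmetic, not kernel code: the function names are made up for the example, and it assumes an 80x25 text screen with an 8x16 font where each glyph index encodes a 2x4 block of virtual pixels.

```python
# Illustrative sketch: treat an 80x25 text screen as a 160x100 bitmap by
# giving each of the 256 glyphs a distinct 2x4 pattern of "virtual pixels".
COLS, ROWS = 80, 25          # text cells on screen
CELL_W, CELL_H = 2, 4        # virtual pixels per text cell (2 * 4 = 8 bits)


def glyph_index(bitmap, col, row):
    """Pack the 2x4 block of virtual pixels under one text cell into 0..255."""
    idx = 0
    for dy in range(CELL_H):
        for dx in range(CELL_W):
            bit = bitmap[row * CELL_H + dy][col * CELL_W + dx]
            idx = (idx << 1) | (1 if bit else 0)
    return idx


def to_text_cells(bitmap):
    """Convert a 100-row by 160-column bitmap into 25 rows of 80 glyph indices."""
    return [[glyph_index(bitmap, c, r) for c in range(COLS)] for r in range(ROWS)]


def glyph_scanlines(idx):
    """Return the 16 one-byte scanlines of the 8x16 glyph for a given index.

    Each virtual pixel becomes a 4x4 block of screen pixels: the left
    virtual column is the high nibble, the right one the low nibble.
    """
    lines = []
    for dy in range(CELL_H):
        left = (idx >> (7 - (dy * CELL_W))) & 1
        right = (idx >> (7 - (dy * CELL_W + 1))) & 1
        byte = (0xF0 if left else 0) | (0x0F if right else 0)
        lines.extend([byte] * 4)   # 4 scanlines per virtual row
    return lines
```

            With this layout, glyph 0 is blank, glyph 255 is a solid block, and writing the returned indices into the text buffer reproduces the bitmap at 4x4 screen pixels per virtual pixel.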

          • Re: (Score:3, Insightful)

            by rnturn ( 11092 )

            ``Hardly a crash dump, but easily enough to get across the essentials.''

            Here's a crazy idea: instead of working on displaying cutesy graphics images that need to be decoded using a smart phone and a web site, what about actually generating a freakin' crash dump? Is there a technical reason that Linux is unable to do this? If crash dumps are really not possible, how about a plain 'ol text file in the root directory containing the reason for the crash/panic?

            • Just from my own experience the only kernel panic I've ever encountered was due to a failed SATA controller. But conceivably bugs in new beta file systems (you jumped on the btrfs bandwagon yet?), hard disk failure, SATA failure, or failure in the PCI controller could all stop a dump from being written.

              Mind you that's not a reason not to do it, just that there are times when it's not useful. The same machine with the dead SATA controller also had Windows on it which managed to bluescreen without creating a crash dump.

              • Just from my own experience the only kernel panic I've ever encountered was due to a failed SATA controller.

                In recent times I have also had a kernel panic due to a failed nVidia driver.

                • I have a couple of bad micro SD cards. Put one into a card reader on a Linux machine, try to read some of the files, and presto magnifico, kernel panic every time. Windows handles this corruption a lot more elegantly, incidentally, but the hardware is toast either way.
            • Re:Good idea (Score:5, Insightful)

              by Pinhedd ( 1661735 ) on Sunday April 06, 2014 @04:51AM (#46674649)

              Kernel crashes occur when the kernel enters an inconsistent or invalid state from which it cannot recover.

              When a user program fails, the kernel maintains consistency, can cleanly terminate the process, and can accurately report the cause of the failure if need be (illegal instruction, deadlock, access violation, etc...).

              When a kernel fails the very systems that it relies on to report failures may very well be compromised by whatever caused the kernel to fail in the first place. As such, any kernel fault reporting needs to be incredibly robust and as independent of other kernel mechanisms as possible. Dumping text to a serial terminal is the preferred method because it's incredibly simple and relies on nothing else, meaning that barring a failure of the system memory it should always act as a reliable fallback.

              Dumping kernel memory to a disk might fail if the state of the file system is compromised, if the storage controller is compromised, or if any number of intermediary systems are compromised by the inconsistent state of the kernel. Many operating systems do attempt to dump crash memory to the swap file / swap partition as this is less likely to cause data corruption than writing to a particular file in the file system.

              It "can" be done, but that does not necessarily make it a good idea.

            • Re:Good idea (Score:4, Insightful)

              by AmiMoJo ( 196126 ) * on Sunday April 06, 2014 @06:54AM (#46674981) Homepage Journal

              You usually don't want to write to the filesystem in the event of a kernel panic. It could make things worse and corrupt it. Once you kernel panic you are basically screwed and can't rely on any services beyond really low level BIOS stuff to work. Poking some text to the screen buffer is about it.

              Windows does core dumps using a specially reserved area of the boot drive and using low level boot driver calls. It can still fail but at least has a fairly low probability of damaging the filesystem further. I suppose Linux could maybe dump to the swap partition or something.

            • Re:Good idea (Score:4, Insightful)

              by Lemming Mark ( 849014 ) on Sunday April 06, 2014 @07:35AM (#46675123) Homepage

              As AmiMoJo also noted, when you have a kernel panic all bets are off regarding which parts of the kernel are OK. If the behaviour of the disk driver or filesystem has been affected, it could damage your filesystem to try to write a kernel dump into a normal disk partition. It might work but it does seem a good idea to be properly paranoid. I didn't know that Windows uses a special reserved area of the boot drive - that does make sense as a solution!

              There have been various systems for crash dumping under Linux, though. I think the de-facto solution (the one that was accepted by the kernel devs) ended up being kdump, which is based on kexec (kexec is "boot directly to a new kernel from an old kernel, without a reboot"). This allows full crash dumps with (hopefully) decent safety, so it is possible to do this if configured.

              In kdump, you have a "spare" kernel loaded in some reserved memory and waiting to execute. When the primary kernel panics it will (if possible) begin executing the dump kernel, which is (hopefully) able to reinitialise the hardware and filesystem drivers, then write out the rest of memory to disk. I'm not sure how protected kdump's kernel is from whatever trashed the "main" kernel but there are things that would help - for instance, if they map its memory read only (or even keep it unmapped) so that somebody's buffer overflow can't just scribble on it during the crash.

              Obviously, having a full kernel available to do the crashdump makes it easier to do other clever tricks, in principle - such as writing the dump out to a server on the network. That's not new, in that there used to be a kernel patch allowing a panicked kernel directly to write out a dump to network, it just seems easier to do it the kdump way, with a whole fresh kernel. Having a fully-working kernel, rather than one which is trying to restrict its behaviour, means you can rely on more kernel services - and probably just write your dumper code as a userspace program! Having just installed system-config-kdump on Fedora 20, I see that there's an option to dump to NFS, or to an SSH-able server - the latter would never be sanely doable from within the kernel but pretty easy from userspace.

              Various distros do support kdump. I think it's often not enabled by default and does require a (comparatively small) amount of reserved RAM. So that's some motivation for basic QR code tracebacks. I suppose another reason is if they expect they can mostly decipher what happened from a traceback, without the dump being necessary - plus, with a bug report you can easily C&P a traceback.

              This discussion has just inspired me to install the tools, so maybe I'll find out what it's like...
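              For anyone wanting to check whether a kdump crash kernel is actually set up, here is a rough sketch. It assumes a Linux system that exposes /sys/kernel/kexec_crash_loaded and that the memory reservation shows up as a crashkernel= token on the kernel command line; the function names are made up for this example.

```python
from pathlib import Path


def parse_kdump_state(cmdline, crash_loaded):
    """Summarise kdump readiness from the text of the two kernel files.

    cmdline      -- contents of /proc/cmdline
    crash_loaded -- contents of /sys/kernel/kexec_crash_loaded ("0" or "1")
    """
    reserved = None
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            # Keep the raw value (e.g. "128M" or "auto") without interpreting it.
            reserved = token.split("=", 1)[1]
    return {"reserved": reserved, "loaded": crash_loaded.strip() == "1"}


def read_kdump_state(cmdline_path="/proc/cmdline",
                     loaded_path="/sys/kernel/kexec_crash_loaded"):
    """Read the live system files (Linux only) and summarise them."""
    return parse_kdump_state(Path(cmdline_path).read_text(),
                             Path(loaded_path).read_text())
```

              If "reserved" is None or "loaded" is False, a panic will most likely not hand over to a dump kernel, which is the situation where an on-screen traceback is all you get.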

            • In case of a panic, you don't really want to be messing around with disks, in case you break something.

            • If crash dumps are really not possible, how about a plain 'ol text file in the root directory containing the reason for the crash/panic?

              It is easily possible to do what you want. Trivially so... so why isn't it done? There is an answer to that question my friend and the answer is this:

              A kernel will panic when it detects that the environment it is running in is not the same environment it thinks it is running in. What this means is that the kernel can no longer be certain of anything, up to and including whether or not it can write coherently to a file system. Rather than potentially trashing your file system, the kernel just prints to the s

        • by Guspaz ( 556486 )

          They're encoding the kernel oops. Here's one example oops they're using in that thread:

          http://levex.fedorapeople.org/... [fedorapeople.org]

        • Re:Good idea (Score:5, Informative)

          by Levex ( 3606037 ) on Sunday April 06, 2014 @02:08AM (#46674261)
          We are encoding the full Oops, i.e. from the "cut here" to the "end trace" marker. The classic text output won't ever go away, and we have already created a configuration option called CONFIG_QR_OOPS that can disable this entirely. In case your distro or you have compiled it in and you don't want QR codes on your screen, I just added a new kernel parameter, currently called 'qr_oops', which can disable it as well.
          • Thank you. You are awesome! I like the compile time and parameter passing configuration options to both be available. You are smart and doing it right. Keep on keeping on. :)
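          To make the "cut here" to "end trace" span concrete, here is a small userspace sketch that pulls that region out of a captured log. It is not the patch's code: the function name is hypothetical, and it assumes the usual marker lines containing "[ cut here ]" and "[ end trace".

```python
def extract_oops(log_text):
    """Return the text from the first "cut here" line through the next
    "end trace" line (inclusive), or None if no complete span is found."""
    lines = log_text.splitlines()
    start = None
    for i, line in enumerate(lines):
        if start is None and "[ cut here ]" in line:
            start = i
        elif start is not None and "[ end trace" in line:
            return "\n".join(lines[start:i + 1])
    return None
```

          Whatever this returns is roughly the payload that would have to fit into the QR code, so its length is the number to compare against QR capacity limits.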

      • by icebike ( 68054 )

        how soon until someone accidentally posts a QR code containing confidential information, since they cannot read it themselves.

        Since the crash handler itself generates the code that takes your phone's browser directly to the report site, this isn't going to be a problem.

        Have you never actually used a QR code that leads to a web site?

  • Huh? (Score:2, Insightful)

    by Anonymous Coward
    And if no one with a phone is there?
    • Re:Huh? (Score:5, Interesting)

      by ledow ( 319597 ) on Saturday April 05, 2014 @07:38PM (#46672973) Homepage

      You lose nothing.

      Anything that could have been logged to disk will have been.

      Anything that couldn't is probably FAR TOO LONG to even start taking down any other way and almost certainly will cut through the screen buffer limit anyway (every kernel panic I've had - which is about a dozen I think - was like that).

      Let's compare and contrast to, say, Windows. Bluescreen with minidump and error code that has 7 million potential causes.

      At least with a QR code, for those totally undumpable errors, you stand half a chance of snapping it and providing several kilobytes of useful information for someone to work from - that they know hasn't been transcribed wrongly. And can be taken from even a completely hung machine.

      It's a good idea. Someone needs to make a patch for it. The biggest problem - as always - will be making sure you can get to the point that you can write to the video memory and do so with enough processing / storage to be able to write something useful into the QR code.

      • by Rich0 ( 548339 )

        Yup. For the most part the only way I can capture a panic/dump is by phone anyway, and a QR code would allow me to capture a lot more info than 25 lines of text in an image file.

        If I post an image, I'm lucky if anybody looks at it. If I post the text of the error, there is a good chance that somebody will stumble on it using Google and actually make some use of it.

        • "If I post an image, I'm lucky if anybody looks at it. If I post the text of the error, there is a good chance that somebody will stumble on it using Google and actually make some use of it."

          ^-THIS!

          While on the subject, any way to get people to stop doing how-tos with video? I can't print a video

  • by Anonymous Coward

    I worry about the long-term applicability of QR codes. In 10 years, are they still going to be convenient to read, or is some guy going to have to dig out an old smartphone to read the error output from his decade old system?

    • No, just rebuild the kernel. It should be a build option for text or QR panics.

    • by Nikker ( 749551 )
      The QR code has really nothing to do with smartphones. You could take a picture of it with any camera and upload it. QR Codes are just a software library that evaluates based on white and black marks within the picture.
    • I strongly support backwards compatibility, but chances are you won't have anyone to send it to, if you're running a 10 year old kernel. No one will want to debug that, except maybe yourself, but then you'll probably have all the stuff you need to read it (and might even have the text output piping to somewhere else, too).
  • by MichaelSmith ( 789609 ) on Saturday April 05, 2014 @07:51PM (#46673049) Homepage Journal

    QR codes are highly redundant and don't actually contain much data. There isn't enough space for a stack trace or anything like that. Probably not even a register dump on those big modern CPUs.

    • The 40-L QR code format (177x177) encodes 2953 bytes. That is a useable amount of data for a kernel dump.

      • Re: (Score:2, Insightful)

        by Anonymous Coward

        1) No, 2953 bytes is not enough for a "kernel dump". "Kernel dump" as a term/phrase doesn't even make any sense, come to think of it. Did you mean a stack trace? Register dump? Because "kernel dump" makes me think of "memory dump", i.e. dumping all contents of RAM to swap + rebooting system (which later notices the crash dump header in swap and hopefully extracts it).

        2) If just a stack trace or register dump: 40-L may be too high a resolution to reliably work when using a mobile phone camera to take a p

        • by jeremyp ( 130771 )

          I have a better idea: how about just keeping things how they are. People using mobile phones to take a photo of a stack trace + register dump mostly works reliably (barring wobbly hands).

          ^^ This.

          Add a bit of OCR software and you have a system that can both be read by humans without the aid of special software and by computers to produce textual output with a bit of special software (you need a bit of special software anyway for QR codes, so you don't lose anything).

      • by Guspaz ( 556486 )

        Except L implies the lowest amount of error correction, making it the hardest to read, and few devices will read version 40 codes anyhow. They're enormous.

        • You don't need high levels of error correction. That is meant for printed codes that may be damaged. You aren't likely to be displaying on a screen that is so broken that large chunks are missing.

          • by Guspaz ( 556486 )

            It also means that, in marginal conditions, it can reconstruct code blocks that it didn't manage to read correctly. It's not just useful for printed codes.
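    The size and capacity trade-off argued over above can be put into numbers. This is a back-of-the-envelope sketch: the module-count formula and the version 40 byte-mode capacities are the standard QR figures, while the function names and the 4-module quiet zone default are choices made for this example.

```python
# Byte-mode capacity of a version 40 QR code at each error correction level.
V40_BYTE_CAPACITY = {"L": 2953, "M": 2331, "Q": 1663, "H": 1273}


def qr_modules(version):
    """Side length in modules of a QR code of the given version (1..40)."""
    if not 1 <= version <= 40:
        raise ValueError("QR versions run from 1 to 40")
    return 17 + 4 * version


def fits_on_screen(version, width_px, height_px, scale=1, quiet_zone=4):
    """True if the code, plus a quiet zone on every side, fits at `scale`
    screen pixels per module."""
    side_px = (qr_modules(version) + 2 * quiet_zone) * scale
    return side_px <= min(width_px, height_px)
```

    For example, a version 40 code is 177 modules wide; with a quiet zone it fits a 640x400 screen at 2 pixels per module, but not the 160x100 virtual bitmap from the font-table trick. Moving from level L to H cuts the payload from 2953 to 1273 bytes.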

  • Inscrutable error messages seem to be par for the course when you're looking at kernel panics.

  • by jpellino ( 202698 ) on Saturday April 05, 2014 @08:08PM (#46673117)

    Anything's an improvement over:
    "My computer froze."
    "What happened?"
    "It put some message on the screen."
    "What did it say?"
    "Something about an error."
    "What error?"
    "I dunno. It had some numbers and letters and stuff."

    • by Anonymous Coward on Saturday April 05, 2014 @08:29PM (#46673207)

      And with QR codes, the conversation becomes this:

      "My computer froze."
      "What happened?"
      "It put some white and black crap on the screen."
      "What did it say?"
      "How the fuck should I know? It was random white and black dots! Like a fucking Rorschach test!"
      "It probably was a kernel panic. What was the error?"
      "I dunno, because like I said, ALL IT HAD WAS SOME DOTS AND SHIT. Then it rebooted! So it's gone! FUCK!"

      How is that an improvement? Yes it's a change, but it's not an improvement.

      • Re: (Score:3, Insightful)

        by Anonymous Coward

        I doubt the kernel developer that implements this would forget to put the message

        "Take a photo of these black-and-white dots and send it to crash@kernel.org so we can try to figure out what happened. Thanks for making the Linux kernel better!"

        at the top of the black and white dots.

        • by Flammon ( 4726 )

          There's no need to take a photo and send it; the QR code can include a URL with crash info. The user would simply need to scan it and follow the link.

          The message could be: "Please scan QR code to report Linux kernel bug."
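          One way such a link could carry the crash info is to compress the oops text and put it in the URL itself. The sketch below is only a guess at the shape of it: the report address is a placeholder, the function names are made up, and nothing here is taken from the actual patch. It checks the result against the 2953-byte limit of a version 40, level L code.

```python
import base64
import zlib

QR_40L_BYTES = 2953                          # byte-mode capacity of a 40-L code
REPORT_URL = "https://example.org/oops?d="   # placeholder, not a real endpoint


def oops_to_url(oops_text, base=REPORT_URL):
    """Compress the oops text and embed it in a URL small enough for one QR code."""
    packed = zlib.compress(oops_text.encode("utf-8"), 9)
    token = base64.urlsafe_b64encode(packed).decode("ascii")
    url = base + token
    if len(url) > QR_40L_BYTES:
        raise ValueError("oops too large for a single 40-L QR code")
    return url


def url_to_oops(url, base=REPORT_URL):
    """Inverse of oops_to_url: recover the original oops text on the server side."""
    token = url[len(base):]
    return zlib.decompress(base64.urlsafe_b64decode(token)).decode("utf-8")
```

          Base64 adds about a third to the compressed size, so how much text fits depends on how well the oops compresses.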

      • by AmiMoJo ( 196126 ) *

        I think most people with smartphones recognize QR codes now, so there is at least a chance they will be able to take an image or use a decoder app on it. Well, next time anyway. And hopefully it will have enough info for you to know what happened.

    • by Anonymous Coward

      Really? You think your end user who hasn't got the brains to take a screenshot of human readable text and send it to you and who probably has never even heard of QR codes is going to have the presence of mind and technical knowledge and ability to take a picture of the code and send it to you?

      That has to be one of the dumbest things I've heard on slashdot...and that's REALLY saying something.

      It's even more worrying that the Linux Kernel devs are giving this idea the time of day.

      • Yes, I do. http://i.imgur.com/zMyvT.jpg [imgur.com]

        I think having the option to scan the QR code with a simple message to do so is one more way to get the info needed.
        Aiming a smartphone at the screen is easier than framing a screen with your phone's camera and hoping for a solid shot without a flash before it does something even stranger.

        They're used on beer ads, chain pizza ads, breakfast cereal and at Disney parks.
        So yes, I think the average end user has a shot at this.

        I'm thinking of Windows in particular, that us

    • Anything's an improvement over:
      "My computer froze."
      "What happened?"
      "It put some message on the screen."
      "What did it say?"
      "Something about an error."
      "What error?"
      "I dunno. It had some numbers and letters and stuff."

      "Show me!"
      "I already rebooted it."

      Personally I would rather have a more sophisticated crash dump system, like other OSs, because whatever is going to fit in a QR code isn't going to help much unless you're looking up known issues in an enterprise Linux vendor's bug database. That's assuming they can cram a stack trace into QR codes, AAAAND you have a problem that leaves a predictable stack trace.

      I don't remember the last time I had a Solaris system crash that didn't leave a dump (try not to giggle). It wo

  • How about a slight modification of a classic: Just change the background color of the display. Even 1 byte RGB gives you 256 messages. (I guess lighting would affect this.)
    • How about a slight modification of a classic: Just change the background color of the display. Even 1 byte RGB gives you 256 messages. (I guess lighting would affect this.)

      Even if we could accurately capture the precise background color value of the display, how could only one byte give enough information for anything useful?

  • No way! (Score:3, Funny)

    by msobkow ( 48369 ) on Saturday April 05, 2014 @10:57PM (#46673757) Homepage Journal

    I am NOT buying a fucking cell phone to read a core dump.

    Just fuck right off already. Not everyone wants a digital leash.

    • You don't need a cellphone to decode QR code images.

      Just sayin', like.

    • by msobkow ( 48369 )

      Troll?

      How so?

      Because I don't want to buy into the digital leash mentality that "everyone owns a cell phone?"

      Fuck you.

  • This is why my alternate OS eschews absolute minimalism and includes mandatory "userspace" features in its design, so it can rely on them being present. I handle the whole (multi) boot process within the OS, so I can launch other OSs from within a running instance. Boot process integration was necessary for firmware segmented loading (optionally put part of the OS in firmware, see: Coreboot). Since the OS handles boot itself it can avoid immediately crapping all over memory at boot and instead upon soft-b

  • The matrix (Score:3, Insightful)

    by BlazingATrail ( 3112385 ) on Sunday April 06, 2014 @12:56AM (#46674095)
    I prefer all my BSOD, crashes and core dumps to use the Matrix dripping green characters and pixel crap method of reporting errors. It's easier to see the patterns. Guru meditation # 42
  • I think it's a great idea to make error reporting easier. I recently experienced an oops but didn't report it because there was no immediate way to do it. However relying on a framebuffer being present is a mistake, in my case it was on an embedded headless system, and framebuffers generally are available only on desktops which are far from being the majority of Linux usage.

    • Yeah, the more deeply embedded, the bigger the pain. Most embedded things have some sort of serial port, and I believe you can send plain text dumps over that.

  • Just show smileys (Score:3, Interesting)

    by Anonymous Coward on Sunday April 06, 2014 @03:10AM (#46674415)

    Linux must be ubuntufied. We need to hide everything because it's way too complicated for the common user or his dog. We need more splash-screens to hide all the stuff that makes no sense anyway. Who wants to know if a module didn't get loaded? As a matter of fact, we should remove unnecessary logs (like message, dmesg, audit), because nobody gives a rat's ass. Also: Why have a console? Or init-mode 3? People want the graphical stuff, let's get rid of all the ballast like command-line. Those few people still using ancient tools like 'make', 'vi' or (oh my god) 'ifconfig' should go and find themselves something else to brag with. Linux MUST go mainstream.

    • That is somewhat a problem, I agree. Please make this QR code display friendly, such as "Your operating system kernel has crashed. For advanced users, a QR code containing additional information is provided below. By taking a photograph of it, you can help the developers to solve the problem. [QR code picture] Press any key to restart." Maybe include the Tux logo there to show that this is about Linux.

      Wayland should improve the situation too, as we can settle on a proper graphics mode earlier in the boot.

    • We need to hide everything because it's way too complicated for the common user

      I don't really care about the common user: I like linux much the way it is. However, I don't like transcribing whole screenfuls of text by hand to report bugs. That massively, royally sucks.

      This isn't about making Linux shiny-happy it's about making it easier for developers to debug the kernel.

  • Do you want to mail them physical screenshots? With a qr code, you can mail text.

  • (Not true as I haven't communicated that idea to anyone - and an idea can't be stolen anyway).

    There are a number of advantages to doing this: non-technical people are likely to be familiar with QR codes, most people have access to digital cameras and the resources to convert the QR code to a link, and it works as long as the screen can be written to. Storage system failures or network system failures wouldn't be a hindrance to providing a thorough failure analysis.

    The sole disadvantage IMHO is that one have

  • It says, `` ... most of which isn't easily archivable by normal Linux end-users.'' Abnormal Linux end-users easily archive the text. If you have to use QR codes ... maybe you aren't the right kind of Linux end-user. Just saying.

  • Good idea, but I hope they keep all existing systems in place, and make it optional. Graphics drivers are massively complex, and are probably a significant source of oops. If displaying a QR code means that the kernel needs to interact more with the drivers, and (oh god i hope not) change the resolution to display a QR code, then I expect more fail. People can take photos of the crash messages in 80x25 character consoles anyway, so let's not destroy that.
