Linux Kernel Developer Chris Mason's New Initiative: AI Prompts for Code Reviews (phoronix.com)
Phoronix reports:
Chris Mason, the longtime Linux kernel developer best known as the creator of Btrfs, has been building a Git repository of AI review prompts for LLM-assisted code review of Linux kernel patches. This initiative has been underway for some weeks now, and the latest work was posted today for comments... The Meta engineer has been investing a lot of effort into making this AI/LLM-assisted code review accurate and useful to upstream Linux kernel stakeholders. It has already shown positive results, and at the current pace it could play a helpful part in Linux kernel code review moving forward.
"I'm hoping to get some feedback on changes I pushed today that break the review up into individual tasks..." Mason wrote on the Linux kernel mailing list. "Using tasks allows us to break up large diffs into smaller chunks, and review each chunk individually. This ends up using fewer tokens a lot of the time, because we're not sending context back and forth for the entire diff with every turn. It also catches more bugs all around."
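The chunking idea Mason describes can be sketched in a few lines. This is a hypothetical illustration of splitting a unified diff into per-file review tasks, not his actual tooling (his real prompts and scripts live in the repository the article mentions):

```python
import re

def split_diff_into_tasks(diff_text):
    """Split a unified diff into one review task per file.

    Hypothetical sketch of the chunking idea: each LLM review turn
    then only needs context for one file's hunks instead of the
    whole patch, which is where the token savings come from.
    """
    tasks = []
    current = None
    for line in diff_text.splitlines():
        # A "diff --git a/<path> b/<path>" header starts a new file's chunk.
        m = re.match(r"^diff --git a/(\S+) b/\S+", line)
        if m:
            current = {"file": m.group(1), "hunk_lines": []}
            tasks.append(current)
        elif current is not None:
            current["hunk_lines"].append(line)
    return tasks
```

Each task can then be sent to the model with only its own hunks as context, rather than shipping the entire diff back and forth on every turn.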
Re: (Score:3)
I'm not sure the business numbers are all that important when it comes to code. We already have them trained on _a lot_ of code, and since they're more focused they can be smaller without being useless compared to the full size ones. If we can run them locally on a single GPU, it doesn't go away when the bubble pops and the big players stop throwing away money.
As with any tool, they need to be used where they actually offer value. Which is definitely not to architect solutions, but to sanity check smaller c
Re: Don't be stupid, people (Score:3)
Re: (Score:3)
The hidden cost, especially for up-and-coming devs, is that knowledge isn't being retained, so a solved problem will end up being re-solved by an LLM again. And again. And again.
Worse yet, given how human creativity works, this means we won't see novel applications or solutions. Just layer after layer of mostly functional AI slop.
Re: (Score:2)
If you just use it to create an initial mock-up, I don't think that cost occurs.
Re: (Score:2)
The hidden cost, especially for up-and-coming devs, is that knowledge isn't being retained, so a solved problem will end up being re-solved by an LLM again. And again. And again.
Worse yet, given how human creativity works, this means we won't see novel applications or solutions. Just layer after layer of mostly functional AI slop.
Funny thing is, the more AI consumes CPU and RAM for its data centers, the more necessary it is to write performant and optimized code, both for CPU cycles and for memory constraints, since consumer devices are going to have worse compute resources available going forward due to cost.
The dinosaurs among the devs (of which I sadly count myself as one), know how to write code that squeezes every bit of useful work from every clock cycle, and how to use the least amount of memory necessary to a
Re: (Score:2)
So the more layer after layer the AI slop generates, the more job security actual skilled developers will have.
That is certainly true. And since junior people will get reduced in numbers and very few learn actual skills anymore, actually skilled developers may all but die out. Will take a while, but we are looking at a potential catastrophe in the making.
Re: (Score:3)
Indeed. And when it comes to code security or architecture, LLM-type AI is a complete catastrophe. Now, who is supposed to handle those when junior people get scarce and those left do not learn the basics anymore?
I predict that even if LLMs stay around and available, LLM code will cause a delayed catastrophe.
Re: Don't be stupid, people (Score:3)
Correction:
They've been trained on a lot of UNCHECKED code.
Re: (Score:2)
Correction: They've been trained on a lot of UNCHECKED code.
Garbage in, garbage out.
Re: (Score:2)
Indeed. And they do not know about newer developments, including new security bugs, documentation updates, features made obsolete, new regular bugs, etc.
A coding model becomes a problem when not updated for a year or two.
Re: (Score:2)
The 10-20% difference that you're quoting is actually huge. It's a difference between a successful cancer treatment and deadly poison. It's a difference between a building standing strong, or collapsing and killing thousands of people. Correctness matters. If you can't rely on results, the tool is useless in a professional setting where your reputation and life of others depends on it.
As for the PhD-level problems - most of the time there's nothing to solve. AI just serves you a reheated pancake of the sol
Re: (Score:2)
First, you need to retrain frequently for code as well unless you want to stay on low amateur level. Think security problems (of which tons of new ones are discovered all the time), new libraries, old stuff getting deprecated, etc. And second, you cannot run the large coding LLMs locally in a meaningful way. And you would need to be able to get them in the first place.
So, yes, the catastrophic (3 years in) business numbers matter. In fact, they are critical.
Re: (Score:2)
This is starting to look like bad faith argumentation to me. Remember that the premise here was using LLMs as assistants for code review, not code production. Decent coding assistant LLMs that can be run locally already exist, and have for years. They're not as good as full-size models running in data centers, of course, but they're also not useless. As long as they are able to catch _some_ security issues, at an acceptable signal-to-noise ratio, they offer value in freeing up reviewer brain bandwidth to f
Re: (Score:1)
Re: (Score:2)
The question isn't "is AI ever useful", but rather "is it useful enough today for the specific use case?". That is what this guy is exploring. My gut feeling is that it isn't, but I don't have the experience to know for sure, and neither does anyone else.
Well, most of us don't have the specific experience to know whether AI is useful enough for Chris Mason's specific use case, but we have enough from our own.
I have found that AI is good enough for a first draft of code, or for providing comments on existing code -- but I want a human to review whatever it generates, and would expect the normal suite of other tools (linters, SAST/DAST, fuzzers, etc.) to pass the code before publishing it. I recently interviewed someone else with 25-ish years of professional
Re: (Score:1)
Well, most of us don't have the specific experience to know whether AI is useful enough for Chris Mason's specific use case, but we have enough from our own.
I have found that AI is good enough for a first draft of code, or for providing comments on existing code
I have no idea how good AI is right now for this task, but presumably existing off-the-shelf dedicated s/w for doing code inspections will use way less resources than an LLM. Regardless of the inspection tool, human review after the code has passed that inspection is always helpful, if for nothing else than maintaining styles, standards, and for awareness of what is being done. Awareness is important until we turn the whole shebang over to the LLMs when we are enslaved.
It will be interesting to see the poin
Re: (Score:2)
I have no idea how good AI is right now for this task, but presumably existing off-the-shelf dedicated s/w for doing code inspections will use way less resources than an LLM.
Based on my experience, I am almost certain you are right: the static code analyzer / static application security testing tool that I have used professionally needs fewer resources than an LLM. But on the other hand, an LLM might catch things that the special-purpose tool does not. The guy I interviewed said the race conditions escaped his static analyzer, and I've seen even a locally hosted mid-size (120B parameter) LLM flag cut-and-paste errors that an SCA tool might miss. (I did not run a dedicated an
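For the curious, the cut-and-paste class of bug being described looks something like this toy example (purely illustrative, not from any real codebase), where a condition was duplicated instead of adjusted:

```python
def in_bounds(x, y, width, height):
    # Copy-paste bug: the second clause was duplicated from the first
    # and still checks x against width instead of y against height.
    return 0 <= x < width and 0 <= x < width  # BUG: should be 0 <= y < height

def in_bounds_fixed(x, y, width, height):
    # Corrected version: each coordinate is checked against its own limit.
    return 0 <= x < width and 0 <= y < height
```

Classic static analyzers often stay silent here because both versions are type-correct and lint-clean; a pattern-matching reviewer (human or LLM) is more likely to notice the suspicious repetition.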
Re: (Score:2)
You can't have the specific expertise, because he's training it himself. Expect it to catch the kinds of errors he trains it to catch, and to probably do a good job at that...and a worse than lousy job on other kinds of errors.
Re: (Score:2)
Indeed. And that is only one of the problems. It is already on show-stopper level though.
Let's face it, to do things competently (and find the issues AI misses), you need a lot of experience. Experience can only be gotten from doing things. If you let the AI do the easier things, you will never get to the skill level needed to master the harder things. If enough people do that, it could be catastrophic.
Re: (Score:2)
That's NOT a problem. Don't expect any tool to be the be-all and end-all. It's really useful to have a bunch of classes of errors be automatically detectable.
Re: (Score:2)
You do not understand what the problem here is. I am pretty sure I do.
Re: Don't be stupid, people (Score:3)
Re:Don't be stupid, people (Score:5, Insightful)
First, LLM-type AI may not actually be around in any suitable way in a few years. The business numbers are catastrophic.
That’s not an argument, it’s astrology with a spreadsheet. Even if Vendor X faceplants into a crater, the workflow Chris is talking about doesn’t evaporate. These are prompts and scripts that turn “big diff, big context” into small, reviewable chunks. Swap the engine, keep the tooling. The kernel has outlived entire tech empires, compilers, VCSes, “next big things,” and at least three “Linux is doomed” decades. Tools that reduce reviewer fatigue stick around because reviewers keep using them, not because a quarterly earnings call went well.
Also: the LKML thread is about making AI review less magical by structuring it, scoping it, and forcing it to show its work. That’s basically the opposite of “bet the farm on a single vendor’s hype cycle.”
Second, LLM-type AI misses what is really important, namely quality of architecture and interfaces
Correct in the most trivial way possible: lint won’t design your subsystem either -- and nobody claimed it would. This isn’t “let the chatbot be a maintainer,” it’s “use a tool to catch more bugs while humans stay responsible for architecture and interfaces.”
Kernel review is layered. Humans do the high-level “does this belong, does it fit, is the interface sane, does it age well?” work. Tools do the tireless “did you miss a refcount, a NULL check, a lock ordering hazard, a surprising call path” work. Chris is explicitly carving the diff into tasks, extracting call graphs, and even cross-checking lore and Fixes tags. That’s a checklist machine, not an architect. Complaining it’s not an architect is like complaining grep can’t write a better filesystem -- which Chris *obviously* can do... :)
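One of those mechanical cross-checks, pulling Fixes: tags out of a commit message so they can be matched against lore, is easy to sketch. The tag format below is the kernel's documented convention, but the helper itself is a hypothetical illustration, not Chris's actual tooling:

```python
import re

# Kernel convention: Fixes: <12+ hex char sha> ("subject of the broken commit")
FIXES_RE = re.compile(r'^Fixes:\s+([0-9a-f]{12,40})\s+\("(.+)"\)', re.M)

def extract_fixes_tags(commit_message):
    """Return (sha, subject) pairs for every Fixes: tag in the message."""
    return FIXES_RE.findall(commit_message)
```

Extracted facts like these give the model something concrete to check against (does the referenced commit exist, does the subject match) instead of free-associating over the whole diff.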
and is bad at finding security problems outside of toy examples.
If your model is “AI must find every non-toy security bug or it’s worthless,” then congrats, you’ve also just declared static analyzers, fuzzers, and humans worthless, because none of them are complete. In reality, we stack imperfect tools and get better outcomes. Syzkaller doesn’t understand architecture either, yet it finds terrifyingly real bugs. Sparse doesn’t grok interfaces, yet it saves us from type and annotation mistakes shooting us in the foot. Smatch doesn’t have to grok the dev's intent to catch patterns reviewers miss at 2AM.
AI review is the same category: a probabilistic pattern spotter that can flag suspicious deltas fast, especially when you constrain context, force targeted questions, and make it operate on extracted facts instead of vibes. That’s exactly what this informal RFC is doing, including extra rigor around syzbot reports.
If you don’t want to use the prompts, don’t. But don’t pretend “VC math scary” and “AI isn’t a maintainer” are substantive rebuttals to an RFC, even an informal one, about reducing token waste and catching more bugs with a structured, auditable review pipeline.
Re: (Score:1)
First, LLM-type AI may not actually be around in any suitable way in a few years. The business numbers are catastrophic.
That’s not an argument, it’s astrology with a spreadsheet.
That you do not understand business analytics does not mean they do not work. Seriously. Your ignorance is strong.
Re: (Score:2)
And if you don't have the hardware to run one, there are many crowdsourced solutions, or even companies that are profitable who do nothing but inference on local models.
It's very true that training is a big fucking problem. Companies building out datacenters to house their GPU horsepower requirements are a big fucking problem.
Raw rented-hors
Re: (Score:2)
This is a use case where I could see LLM being decent, subject to a couple of constraints:
- The submitter is always able to advance it to the next stage regardless of what the LLM says
- A human review is always the next step after the LLM review
LLM code review is actually one of the more innocuous situations, it offers very small, digestible indirect feedback about code. It's still usually wrong, but it occasionally will catch something useful that was otherwise overlooked. It may spare the reviewers from
Re: (Score:2)
So essentially catch submissions by incompetents and AI-slop? That could work. That may even become necessary, given how many idiots think AI turns them magically into good coders.
But it seems that is not what the person from the story is trying to do.
Wtf is a kernel stakeholder? (Score:3)
Is that pointy-hair speak for kernel code contributors?
Re: (Score:3)
Based on the summary, to me the use of "upstream" indicates "maintainers". As in the people responsible for approving and merging.
Re: (Score:2)
"stakeholder" in this case means "contributors" and "people who benefit from those contributions" so maintainers and distro developers.
Re: (Score:2)
Disagree [I am not a kernel contributor]. If you look at the mailing list, patches go through multiple iterations, with back-and-forth between code reviewers.
Code ought still to be reviewed by humans, but if an AI can reduce patchset iterations from, say, 8 to 5 based on the experience of a veteran kernel developer, then it has performed a service.
And maybe the AI can introduce a special, pedantic Linus-mode to pre-emptively yell at you when your approach is bad! :)
Re: (Score:2)
Yeah, I'd rather be warned privately that my code likely has some issues than have it pointed out publicly! Of course for educational purposes some people need to make public mistakes, but it need not be as many as are doing it now.
Re: (Score:2)
What does that have to do with *what* or *who* is reviewing your code? If you're submitting PRs to the kernel, you're way beyond the concern of being self-conscious about shit like that.
Re: (Score:1)
Imagine getting your first code reviewed instantly instead of waiting a day. And then iterating 5 times with the bot before the first human sees it and tells you what's against the guidelines. You save a lot of time. You have less wait time to the first feedback and can prepare a good patchset, and the maintainers only need to read your code 3 times instead of 8. Everyone wins.
Re: HELP! (Score:2)
Probably confusion caused by too many drugs.
Re: (Score:2)
Help! I am trapped inside the fortune cookie factory, being forced to write quips that sound funny if you append [in bed]!
I wish to subscribe to your fortune cookie newsletter.
Did AI write the summary? (Score:3)
It's good to know he has been working on a thing he has been working on.
Why should I trust him... (Score:2)
When btrfs RAID5/6 has been broken for well over a decade? The filesystem is a shitshow of bugs and bad implementation compared to ZFS.