Linux's Multi-Grain Timestamps Short-Lived: Removed From The Kernel After A Few Weeks (phoronix.com)
An anonymous reader shared this report from Phoronix:
One of the new features merged for the Linux 6.6 kernel was multi-grained timestamps for the VFS layer, along with wiring them up for the EXT4, Btrfs, XFS, and Tmpfs file-systems. This alternative to coarse-grained timestamps, though, ended up exposing some problems, and this week, ahead of Linux 6.6-rc3, the feature has been stripped entirely from the kernel.
Multi-grain timestamps were intended to address cases where the current coarse-grained timestamps can be ineffective for updating change/modification times with a lot of I/O potentially happening within the once per jiffy timestamp... Multi-grained timestamps, though, were only to be selectively enabled to avoid the performance overhead.
Christian Brauner of Microsoft, who originally submitted the feature for Linux 6.6, went ahead and submitted the pull request for dropping the short-lived kernel feature, which has already been honored... "As there are multiple solutions discussed, the honest thing to do here is not to fix this up or disable it but to cleanly revert. The general infrastructure will probably come back, but there is no reason to keep this code in mainline."
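To make the coarse-grained problem concrete, here is a minimal Python sketch (not kernel code) of why two writes landing in the same jiffy can be indistinguishable by mtime. It assumes HZ=250, i.e. a 4 ms jiffy, and models the coarse clock as simply rounding down to the last tick; `coarse_timestamp` and `mtime_changed` are hypothetical helpers.

```python
# Toy model of coarse-grained (per-jiffy) file timestamps.
# Assumption: HZ=250, so one jiffy = 4 ms. Real kernels use various HZ
# values, and these helper names are hypothetical, not kernel APIs.

JIFFY_NS = 4_000_000  # assumed jiffy length in nanoseconds


def coarse_timestamp(now_ns: int) -> int:
    """Round a fine-grained clock reading down to the last jiffy tick."""
    return now_ns - (now_ns % JIFFY_NS)


def mtime_changed(before_ns: int, after_ns: int) -> bool:
    """Would a program comparing stat() mtimes notice the second write?"""
    return coarse_timestamp(after_ns) != coarse_timestamp(before_ns)


first_write = 10_000_000   # t = 10 ms
second_write = 11_000_000  # t = 11 ms, still inside the 8..12 ms jiffy

# Both writes get the timestamp 8 ms, so the change is invisible.
print(mtime_changed(first_write, second_write))  # False

# A write in the next jiffy (t = 13 ms -> 12 ms) is distinguishable.
print(mtime_changed(first_write, 13_000_000))  # True
```

Multi-grain timestamps were meant to close exactly this gap without paying for a fine-grained clock read on every write.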
Re:well thought out, thanks Linus (Score:5, Insightful)
Well, at least they realized it was a mistake and removed it instead of doubling down: pretending there is no problem, telling users who do have a problem that they're holding it wrong, and eventually calling it the next best thing since the invention of the microchip despite all reports from actual users that it's bollocks.
Re:well thought out, thanks Linus (Score:4, Insightful)
Right,
A project as large as Linux has a lot of complex interactions between components. There are lots of intersections between domains, and people can't be experts in all of them.
It is inevitable that surprises and corner cases will emerge from big architectural lifts like this. This isn't going to be the last time something like this gets reverted, and that isn't a bug, it's a feature.
Re: (Score:2)
Corrupts: no. Detects corruption that would go unnoticed on most other filesystems: yes, and it's being told that your data has gone to shit, instead of silently getting crap back, that makes people complain.
Things are especially bad on disks "designed for Microsoft Windows", which rely on the assumption that an unsafe shutdown on Windows would corrupt data anyway, so the disk can skip obeying barriers for some benchmark bump. And btrfs is intolerant of hardware that actively lies instead of repo
Re: (Score:2)
Btrfs' own documentation lists quite a few of its features as only "mostly okay". It's one thing for a system to get corrupted due to a hardware issue; it's quite another to get corrupted due to a file system bug. Virtually every set of instructions for using btrfs comes with caveats and warnings.
There are other filesystems that detect corruption that are actually fully stable.
Re: (Score:2)
There are other filesystems that detect corruption that are actually fully stable.
Care to tell me which ones? You can't have checksums with overwrite in place; ext4 has recently grown metadata checksums, but they 1. don't cover data, and 2. are vulnerable to replay attacks (which is effectively what disks lying about barriers are).
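A toy Python model of the replay point, assuming a disk that acknowledges a flush but silently drops the write: a checksum stored inside the block itself (roughly the in-place metadata checksum approach) still validates the stale contents, while a check held by the parent catches the mismatch. ZFS keeps the child's checksum in the block pointer; btrfs achieves a similar effect by recording the expected generation in the parent. `Block`, `self_check` and `parent_check` are illustrative names, and `zlib.crc32` stands in for whatever checksum a real filesystem uses.

```python
# Toy model: a self-contained checksum cannot detect a dropped write
# ("replay" of stale data), but a checksum recorded by the parent can.
# All names are hypothetical; zlib.crc32 is just a stand-in checksum.
import zlib
from dataclasses import dataclass


@dataclass
class Block:
    data: bytes
    csum: int  # checksum stored alongside the data in the same block


def make_block(data: bytes) -> Block:
    return Block(data, zlib.crc32(data))


def self_check(block: Block) -> bool:
    """In-place style: verify the block against its own embedded checksum."""
    return zlib.crc32(block.data) == block.csum


def parent_check(block: Block, expected_csum: int) -> bool:
    """Parent-pointer style: verify against the checksum the parent recorded."""
    return zlib.crc32(block.data) == expected_csum


old = make_block(b"version 1")
new = make_block(b"version 2")

# The filesystem thinks it wrote `new` and records its checksum in the
# parent, but the disk ignored the flush and lost the write, so the
# location still holds stale yet internally consistent contents.
on_disk = old
parent_csum = new.csum

print(self_check(on_disk))                 # True: stale block looks valid
print(parent_check(on_disk, parent_csum))  # False: the mismatch is caught
```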
Re: well thought out, thanks Linus (Score:2)
Furthermore, TFS fails to point out that the MGTs were pulled into the RELEASE CANDIDATE for 6.6, not the FINAL release. Big difference. Release candidates are designed as the final stop to see if something is viable to stay in the mainline.
Being merged into a release candidate, and then removed after spotting issues is *exactly* how this is intended to work.
Re: (Score:1)
That's because Lennart isn't involved.
Re: (Score:2)
You should reconsider your chosen life path.
The fix: (Score:4)
While discussing various fixes, the decision was to go back to the drawing board and ultimately to explore a solution that involves exposing such fine-grained timestamps only to NFS internally, and never to userspace.
This is a real fix because it works for existing configurations. With this fix, the multi-grain timestamp feature becomes superfluous, which makes dropping it entirely the obvious choice from a code maintenance perspective.
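One hypothetical way to picture that direction, sketched in Python under loudly stated assumptions: the inode keeps a strictly increasing fine-grained value that only an in-kernel consumer such as the NFS server reads, while stat() keeps returning the coarse timestamp, so existing userspace sees no change in behavior. `Inode`, `nfs_change_attr` and the 4 ms jiffy are invented for illustration; this is not the actual kernel design.

```python
# Hypothetical sketch: fine-grained change tracking kept internal (for an
# NFS-like consumer) while userspace only ever sees coarse timestamps.
# Names and the 4 ms jiffy are assumptions for illustration.
from dataclasses import dataclass

JIFFY_NS = 4_000_000  # assumed jiffy length (HZ=250)


@dataclass
class Inode:
    coarse_mtime: int = 0  # what stat() reports to userspace
    fine_change: int = 0   # internal only, never returned by stat()

    def write(self, now_ns: int) -> None:
        self.coarse_mtime = now_ns - (now_ns % JIFFY_NS)
        # Keep the internal value strictly increasing so every write is
        # visible to change detection, even within a single jiffy.
        self.fine_change = max(now_ns, self.fine_change + 1)

    def stat_mtime(self) -> int:
        return self.coarse_mtime

    def nfs_change_attr(self) -> int:
        return self.fine_change


inode = Inode()
inode.write(10_000_000)
before = (inode.stat_mtime(), inode.nfs_change_attr())
inode.write(11_000_000)
after = (inode.stat_mtime(), inode.nfs_change_attr())

print(before[0] == after[0])  # True: userspace mtime unchanged in this jiffy
print(before[1] < after[1])   # True: the internal consumer sees the write
```

Since userspace never sees the fine-grained value, it can never observe two files' timestamps ordered inconsistently, which is the kind of problem that led to the revert.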
Re: The fix: (Score:1)
Re: (Score:2)
I've tried to comprehend this from multiple sources, but it sounds like caching issues may make it possible for a later timestamp change to be overwritten by an earlier one. Certainly with some kinds of files, say database files that remain open and are updated frequently, updating timestamps on every change could create a lot of I/O overhead. Even the implementation in question seems to rely on downtime to update timestamps, which means if a system becomes very busy,
Re: (Score:2)
"The kernel will elide fine-grain timestamp updates when no one is actively querying for them to avoid performance impacts. So a sequence like write(f1) stat(f2) write(f2) stat(f2) write(f1) stat(f1) may result in f1's timestamp being older than the final f2 timestamp even though f1 was written to last, because the second write to f1 didn't update its timestamp.
Such plot holes can lead to subtle bugs when programs compare timestamps. For example, the nap() function will estimate that it needs to wait one ns on a fine-grain timestamp enabled filesystem between subsequent calls to observe a timestamp change. But in general we don't update timestamps at finer than one-jiffy granularity if we think that no one is actively querying for fine-grain timestamps, to avoid performance impacts."
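The sequence in that quote can be replayed with a simplified Python model, assuming a 4 ms jiffy and a per-file "queried" flag: a write gets a fine-grained timestamp only if the file was stat()ed since its last update; otherwise it gets the coarse per-jiffy time. The `File` class and its fields are hypothetical, and the real kernel code differed in detail.

```python
# Simplified model of the reverted multi-grain behaviour: fine-grained
# timestamps only for files someone has stat()ed since their last update.
# The 4 ms jiffy and all names are assumptions for illustration.
JIFFY_NS = 4_000_000  # assumed HZ=250


class File:
    def __init__(self, name: str) -> None:
        self.name = name
        self.mtime = 0
        self.queried = False  # set by stat(), cleared by the next write

    def write(self, now_ns: int) -> None:
        if self.queried:
            self.mtime = now_ns  # someone is watching: use the fine clock
            self.queried = False
        else:
            coarse = now_ns - (now_ns % JIFFY_NS)
            self.mtime = max(self.mtime, coarse)  # never move backwards

    def stat(self) -> int:
        self.queried = True
        return self.mtime


f1, f2 = File("f1"), File("f2")
# The quoted sequence, all inside one jiffy (times in ns):
f1.write(1_000_000)
f2.stat()
f2.write(2_000_000)
f2_time = f2.stat()
f1.write(3_000_000)
f1_time = f1.stat()

print(f1_time, f2_time)   # 0 2000000
print(f1_time < f2_time)  # True, although f1 was written last
```

Nobody stat()ed f1 between its two writes, so the second write only gets the coarse time and f1 keeps its old timestamp: the ordering inversion the revert message describes.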
multi-grain? (Score:2, Insightful)
Re: (Score:2)
if you're a horse.
Wow Microsoft is evolving (Score:1, Offtopic)
They still spew out crap code, but now they take it back all by themselves.
Patch had a bug, got rolled back (Score:2)
Re: (Score:2)
A developer being upright and doing the right thing, even if it involves removing his own code, is an example of integrity and humility that should not go unnoticed, particularly not in a field dominated by narcissists.
Re: Patch had a bug, got rolled back (Score:2)
particularly when it's M$ owning up to what happened.
Re: Patch had a bug, got rolled back (Score:1)
Re: Patch had a bug, got rolled back (Score:2)
MMMM tasty... (Score:2)
I gave the multigrain timestamps a try, but honestly, I prefer the whole-kernel ones - I'm glad they took them out.
And don't get me started on my chip preferences.
Re:MMMM tasty... (Score:5, Funny)
Have you tried the Dave's Killer 21 Whole Timestamps and Seeds?
Re: MMMM tasty... (Score:2)
The seeds aren't random enough...
Ah yes (Score:2)
As opposed to the 100% whole wheat timestamps.