Data Storage Linux

Linus Torvalds Has 'Robust Exchanges' Over Filesystem Suggestion on Linux Kernel Mailing List (theregister.com) 121

Linus Torvalds had "some robust exchanges" on the Linux kernel mailing list with a contributor from Google. The subject was inodes, notes the Register, "which as Red Hat puts it are each 'a unique identifier for a specific piece of metadata on a given filesystem.'" Inodes have been the subject of debate on the Linux Kernel Mailing list for the last couple of weeks, with Googler Steven Rostedt and Torvalds engaging in some robust exchanges on the matter. In a thread titled, "Have the inodes all for files and directories all be the same," posters noted that inodes may still have a role when using tar to archive files. Torvalds countered that inodes have had their day. "Yes, inode numbers used to be special, and there's history behind it. But we should basically try very hard to walk away from that broken history," he wrote. "An inode number just isn't a unique descriptor any more. We're not living in the 1970s, and filesystems have changed." But debate on inodes continued. Rostedt eventually suggested that inodes should all have unique numbers...

In response... Torvalds opened: "Stop making things more complicated than they need to be." Then he got a bit shouty. "And dammit, STOP COPYING VFS LAYER FUNCTIONS. It was a bad idea last time, it's a horribly bad idea this time too. I'm not taking this kind of crap." Torvalds's main criticism of Rostedt's approach is that the Google dev didn't fully understand the subject matter — which Rostedt later acknowledged.

"An inode number just isn't a unique descriptor any more," Torvalds wrote at one point.

"We're not living in the 1970s, and filesystems have changed."

  • Ignoring the characters of the story, it'd be nice to summarize the concepts involved. Bit patterns used as numerics or not, what's the suggested reference index under debate here as an alternative?
    • What do you mean? This is Slashdot, a contemporary, vaguely tech-flavoured "news" site. The story is about the outrage of Torvalds using CAPS LOCK.
    • Re: (Score:3, Insightful)

      by AlanObject ( 3603453 )

      The inode concept was introduced when physical media was fairly small and it was fairly easy to predict and provide for the maximum likely number of files on that file system. Although the data structure was limited in various ways and subject to disruption (fsck it) it just isn't practical for the multi-petabyte file systems that are appearing more often these days.

      It has served us well many years, but the $5,000 10MB hard drive just isn't something we buy anymore.

      I don't know what the preferred solut

      • Re: (Score:3, Insightful)

        by dfghjk ( 711126 )

        You've managed to conflate identifiers with data structures more successfully than the quotes have and contributed nothing to resolving that confusion. But sure, "fixed-sized tables" don't seem like a "preferred solution", not that anyone suggested otherwise or even defined what the "solution" would solve.

        "It has served us well many years, but the $5,000 10MB hard drive just isn't something we buy anymore."

        Nor was it anything we bought when Linux was conceived. And "inodes" aren't a Linux creation. This

      • by sfcat ( 872532 )
        You also don't know about this issue. inodes are not going away. Nobody is saying that. What is going away is giving inodes unique identifiers. And one of the problems (an edge case) being discussed involves tars that cross filesystems of different types. See how complex this is? Now, even though I probably know more than you about this, I still am pretty sure I don't know enough to go head to head with Linus on this. The problem is that this ignorant Google engineer did. He had multiple chances to sto
    • by bsolar ( 1176767 )

      From what I understand, the underlying issue is that some pseudo-fs don't behave like "real" filesystems in all respects, causing some operations to not function properly, like trying to tar the files or performing some forms of copy.

      So there seems to be some effort in trying to make these pseudo-fs "technically" behave like real filesystems in those scenarios, but by doing that the code would become more complex and Linus' issue is that it's worthless complexity, since trying to tar or copy those pseudo-fs

  • I'm a software developer who is trying to forge a path towards development in the Linux ecosystem. In order to have some experience in this area on my resume, I've started working on a personal project to enable automatic versioning of certain files on the file system (every time you save a file that has versioning enabled, the kernel creates a copy-on-write version of that file, storing only the blocks that have changed). Before reading this exchange, I had very low expectations that my changes to the VF
    • Like the RH article mentioned, inodes are only unique per-filesystem, having a mount-point in the middle of a hierarchy where someone will run tar or find might mess up the results.

      Even then iirc some networked filesystems like Samba might not even expose inode information. Dunno if Linux makes up inode numbers from pathnames or it's just the same? (I remember reading about it being a contentious issue back in the day to allow Samba mounts due to the lack of inode numbers, I guess the tide has turned since)
      • Like the RH article mentioned

        This is Slashdot - what is an article?

        inodes are only unique per-filesystem

        I had considered this but dismissed it because I didn't think applications did anything with inodes due to the exact reason mentioned. While it may be more efficient for an application to work with inodes directly, there are many reasons that can get you into trouble and your safest bet is to work with the path. Then again, the path can change if a device is mounted to a different location later. For

        • How does one detect hard-links if one doesn't have a filesystem unique identifier (inode)?
          Multiple backup systems (including tar) do this.
          • by sfcat ( 872532 )
            I believe by format. Links across filesystems include an identifier for the other filesystem in the inode id stored in the first FS. I think that's how it works but I don't remember. It's been a while since I looked at ext4. Also, different FSes will do these things differently so there probably isn't one answer to your question.
            • There is one answer to my question, though.
              st_ino, and st_dev in the POSIX stat structure, which say they, together, uniquely identify a file on the system.
              st_ino is the file serial number. Also known as the inode.
              • by sfcat ( 872532 )
                That's not an inode. An inode is the data structure containing file data or directory metadata. It's arranged in a b-tree usually. st_ino is the inode ID. Also, this only works on certain filesystems, not them all.
                • That's not an inode.

                  Yes, it is.

                  An inode is the data structure containing file data or directory metadata.

                  You're confusing implementation with specification.
                  inodes are an opaque reference. They may refer to a data structure, they may not.

                  It's arranged in a b-tree usually.

                  Shut up. You don't even know what the fuck a b-tree is.

                  st_ino is the inode ID.

                  lol. No. That's why it's not called "inode ID".
                  that's why the ino_t type isn't a struct.

                  Also, this only works on certain filesystems, not them all.

                  No, it works on them all, except for the one in question.
                  Check /dev, /sys/, /proc, any tmpfs (/dev/shm/ is a common one), or a network filesystem.
                  All of them.

                  There's a reason for this:
                  The POSIX specification demands it.
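
The claim in this subthread, that `st_dev` and `st_ino` together identify a file, can be checked from userspace. The following is a minimal sketch, assuming a filesystem that supports hard links and reports stable inode numbers. The helper names `file_identity` and `demo` are made up for illustration and are not from the thread.

```python
import os
import tempfile


def file_identity(path):
    """Return the (st_dev, st_ino) pair that stat() reports for path."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def demo():
    """Create a file, a hard link to it, and an unrelated file, then compare identities."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original.txt")
        alias = os.path.join(d, "alias.txt")
        other = os.path.join(d, "other.txt")
        for p in (original, other):
            with open(p, "w") as f:
                f.write("data")
        os.link(original, alias)  # second name for the same file
        return (
            file_identity(original) == file_identity(alias),  # same file, two names
            file_identity(original) == file_identity(other),  # same content, different file
            os.stat(original).st_nlink,                       # link count after os.link
        )
```

On a filesystem that behaves as the POSIX text describes, the two hard-linked names compare equal while the unrelated file with identical contents does not. A filesystem that hands out the same inode number to every file would make the second comparison unreliable, which is the userspace concern raised here.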

    • Re:Yikes (Score:5, Informative)

      by Burdell ( 228580 ) on Saturday February 03, 2024 @03:05PM (#64211152)

      The discussion in question is purely about virtual filesystems like /proc and /sys, where inodes don't really exist or matter (you already can't just tar up those filesystems, so unique inodes don't matter). The reason Linus got hot is that the same person keeps making broken code submissions to "fix" something that is not a problem (and instead making more problems in the process), and has so far apparently mostly ignored the more gentle approach.

      On the other hand, someone else has been working on another virtual FS, and he and Linus (and Al Viro, the lead VFS dev) have had a long back and forth and productive discussion about issues and how to fix them.

      If you do your homework on how to integrate with the existing systems, can describe what you are doing, and take productive feedback, Linus and most of the other kernel developers are just fine to work with. This is not "news", this is somebody looking to MAKE news of "see Linus is a bad person!" because someone else repeatedly poked him until he got mad.

      • Honestly after reading your description it sounds like whatever problem the guy had was with their own custom software not working right and blaming the kernel for it? I mean I get that "tar" won't just work, but it's just tar you know? If it's some kind of backup software you should be checking mounted file systems anyway.
      • You absolutely can tar virtual filesystems, and people have been for ages.
        The particular filesystem in question, probably not, but /dev and /proc- absolutely.
        And as long as we're on this bone-dead line of reasoning, what about tmpfs?

        Linus is being a fuckwit, here. It's rare, but he does it on occasion. He's proposing breaking his own rule- which is don't break userspace. To accommodate all the ways userspace has come up with to get around deficiencies in POSIX.
      • The Linux VFS layer doesn't rely solely on inodes, so that makes sense; dentries matter more with finding what you are looking for. That can be somewhat confusing to someone with a more traditional filesystem background.
        • No, but userspace does.
          inodes are part of the POSIX contract.
          st_ino and st_dev uniquely identify a dentry in a specific filesystem. They, together (and with i_generation in Linux) constitute the File Serial Number.
    • by jd ( 1658 )

      You might want to look at the work done on Nilfs2 in this area. It never hurts to know what others have learned, whether or not you pursue a similar approach.

      It's also worth considering, given your goal. Nilfs2 is widely available and used, and has been for some time. I don't know Linus' feelings towards it, but they haven't stopped it being popular.

    • Inodes are only unique to a filesystem. That's it. If you import a foreign filesystem, how would you prevent collision? I suppose we could require shaXXXsums, but that is going to be extremely burdensome. Some filesystems, like ZFS, do have something like that but to require it for all filesystems would be burdensome.
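
Because inode numbers are only unique within one filesystem, tools that walk a tree usually pair them with `st_dev`, or stop at mount boundaries altogether. Here is a rough sketch of the second approach, similar in spirit to `find -xdev`. The function names are hypothetical, and the sketch assumes that a change in `st_dev` marks a different mounted filesystem.

```python
import os


def same_filesystem(path_a, path_b):
    """Return True if both paths report the same st_dev."""
    return os.lstat(path_a).st_dev == os.lstat(path_b).st_dev


def walk_one_filesystem(root):
    """Yield file paths under root without descending into other mounts."""
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune subdirectories whose device differs from the starting directory,
        # so inode numbers seen during this walk all come from one filesystem.
        dirnames[:] = [
            d for d in dirnames
            if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev
        ]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Staying on one device sidesteps collisions between inode numbers from different filesystems, at the cost of skipping whatever is mounted below the starting point.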
  • Not In a Regular FS (Score:5, Informative)

    by glum64 ( 8102266 ) on Saturday February 03, 2024 @12:39PM (#64210810)
    Mind you, the entire discussion is pertinent solely to the behaviour of pseudo-filesystems where the semantics of inode is very much different, if meaningful at all.
    • And here I thought the entire discussion was about how rude Linus was once again.

      However it seems to me he is only rude in cases where polite already failed, and once again he proves that rude works in those cases.

      How many times do you have to politely decline what is basically the same PR as the last one declined? The PRs don't stop with polite. They stop with rude.
    • Oh, context that changes the entire ordinary interpretation of the conversation!

      TYFYS.

    • where the semantics of inode is very much different, if meaningful at all.

      I disagree entirely, and so does every linux archiver I know of.
      Note, from an implementation standpoint, tmpfs is as pseudo as the others. It has no actual block backend, and simply forwards requests upstream to the VFS.
      Inodes are entirely synthetic, and meaningful.

  • by 93 Escort Wagon ( 326346 ) on Saturday February 03, 2024 @01:22PM (#64210914)

    Who does he think he is - Poettering?

  • Isn't a filesystem-wide unique Inode ID required to implement hardlinks?

    As in, any two files with the same Inode ID, size, and a hard link count greater than 1 must be hardlinks of each other if they are on the same mounted partition.

    • It gets complicated if you have a file system that supports snap shot volumes. At the low-level the answer is no to unique inodes.

      • Subvolumes, and ya. However, it's easy enough to make your archiver aware of the subvolume, and to treat it like another mountpoint.
        The point here is, that a certain behavior has been taken for granted for ages by userspace, and no real replacement is offered.
        Linus is calling what tar does stupid because he has dug himself a hole he can't get out of. There is no good way to identify hard-links without a filesystem unique identifier.
        • by sfcat ( 872532 )
          What tar does is stupid. What tar should do is work at the level of the path's actual character contents and the POSIX API. What it does is read the inode table and work from that. In all fairness, the code was written in a time when reading the inode table was important for speed. We don't live in that world anymore (speed is important, reading the inode table no longer gives you that advantage). What we should do is rewrite tar so it works on top of a filesystem instead of reaching into its internals
          • by Dwedit ( 232252 )

            TAR is not a compression tool, it is a tool to stream a series of files to another stream. Capturing that stream to a file just happens to be one possible way to use TAR. It can also be used as a hardlink-preserving file copy command when used with pipes. Can even pipe it through netcat and now it's a way to transfer files over a network.

            But it's horrible for creating archive files. No seeking support, need to stream through the whole archive file (possibly skipping through the data if the TAR file is n

            • Depends on the purposes of the archive, but your listed shortcomings are undeniable.

              The real advantage to tar, is merely that it conforms to specifications. You can generally re-create anything you may find on a POSIX compliant filesystem with it. Could a better format do that job? Definitely. I think we're all still waiting for that, though.
          • hard links are part of POSIX, you ignorant fuckstick.
            There is no POSIX inode table- inode is merely a synonym for file serial number in POSIX.

            Now shut the fuck off, and scamper back into your bullshit-filled basement.
    • Isn't a filesystem-wide unique Inode ID required to implement hardlinks?

      To implement, no. To use them with any kind of fucking rationality from userspace, yes.

      • by sfcat ( 872532 )
        Why do you need to use them at all? The abstraction from userspace should be the files and directories, not reaching into the internals of the FS for some magic number.
        • by Dwedit ( 232252 )

          Maybe you want to back up files and maintain the hardlinks.

          Or maybe you want to know if writing to one file will modify another file.

          Your tool for creating hardlinks is a userspace tool.

        • What? lol.
          Why do you need to use them at all?

          Ok, toolshed. You clearly have no fucking idea what you are talking about here.
          st_ino, st_dev, and st_nlink in struct stat is why.

          These aren't magic numbers from the filesystem. They can be sequential, they can be whatever you want- they just must be unique per mountpoint. POSIX doesn't guarantee that an inode will forever stay the same. It merely guarantees that st_dev + st_ino can uniquely identify a file.
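
The use of `st_ino`, `st_dev`, and `st_nlink` described in this thread can be sketched as a small scanner that groups paths referring to the same file. This is an illustration of the general technique only, not the actual implementation in tar or any other archiver, and `find_hardlink_groups` is a hypothetical name.

```python
import os
import stat
from collections import defaultdict


def find_hardlink_groups(root):
    """Group regular files under root that share one (st_dev, st_ino) pair."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Only regular files with more than one link can have another name.
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                groups[(st.st_dev, st.st_ino)].append(path)
    # Keep only groups where at least two names were found inside root.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

An archiver can store the data once per group and record the remaining names as links. The grouping is only as trustworthy as the identifiers the filesystem reports, which is why non-unique inode numbers matter to such tools.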
  • Thanks for the insight into the kernel dev process. It's always super interesting!

  • Leadership at work (Score:5, Informative)

    by dcooper_db9 ( 1044858 ) on Saturday February 03, 2024 @01:35PM (#64210952)
    You have to read through the thread to understand the context. A developer on the list had said "lets not get into this here". Rather than taking the conversation offline, the Google developer plowed forward. Torvalds isn't lashing out at someone. He's reining in a thread that's gone off the rails.
  • Rostedt eventually suggested that inodes should all have unique numbers...

    As I understand it inodes themselves are supposed to be unique numbers. How would adding a second unique number to that already-unique number make things better?

    "Yo dawg, we heard you like unique numbers, like inodes, so we added a unique number to your unique number."

    • Unique numbers are not always trivial.

      You can choose huge random numbers, so the probability of collision is close to 0, but huge random numbers are easy on a PC, not necessarily so easy on some random ARM device with a crappy ring oscillator RNG. Linux runs on many things. Also big random numbers are, well, big. So they take up space. I don't think inode IDs are 256 bits or more. The internet tells me they are 32 bits. So not nearly big enough for a globally unique identifier. Maybe big enough for a locally un
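
The size argument above can be made concrete with the standard birthday approximation. This sketch assumes identifiers are drawn uniformly at random, which is not how filesystems normally allocate inode numbers, and it takes the 32-bit figure from the comment as given rather than as a verified fact. `collision_probability` is a made-up helper name.

```python
import math


def collision_probability(n_items, id_bits):
    """Approximate the chance that at least two of n_items random IDs collide.

    Uses the birthday approximation p = 1 - exp(-n(n-1) / (2 * 2**id_bits)).
    """
    space = 2 ** id_bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-n_items * (n_items - 1) / (2 * space))


p32 = collision_probability(100_000, 32)    # roughly 0.69
p256 = collision_probability(100_000, 256)  # vanishingly small
```

With 100,000 random 32-bit identifiers a collision is more likely than not, while 256-bit identifiers make one negligible. That is the trade-off between identifier size and uniqueness that the comment describes.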

  • If Linux inode numbers aren't unique, then what does this mean for using the "st_ino" and "st_dev" members of a "struct stat" in the POSIX/X-Open standard to uniquely identify a file?

    • by Junta ( 36770 )

      In the original context of the suggestion, it was about breaking that behavior specifically in the context of eventfs, not a "real" filesystem.

      It did seem to suggest that btrfs also breaks that behavior with mounted snapshots... Of course, this scenario gets tricky, one could imagine that a snapshot *should* by all rights be able to have an independent set of inodes, but if it's a filesystem construct without a distinct device, then that seems awkward, since you would expect to map st_dev to a devnode,

  • by gavron ( 1300111 ) on Saturday February 03, 2024 @02:16PM (#64211050)

    If you don't like what Sony does, don't buy Sony. If you don't like what Linux does, don't use Linux. If you don't like how Linus Torvalds expresses himself, don't engage with him. Each "tool" has its use, purpose, cost, and downsides. I happen to like his style, as do other coders. Some puff pastries are offended, and some call him a relic. Be that as it may nobody forces you to buy Elon Musk's Tesla or Steve "I stole a liver" Jobs' iPhone or Linus' "Shut the hell up" Torvalds or Kimi "I know what I'm doing" Raikonnen's driving.

    Onto Inodes. The original article didn't go into it, so I will. Consider this a shortened layman's version so as not to confuse the puff pastries upset that Linus has a brain AND a mouth.

    Inodes are data. They are structured in a way that allows matching up some part of other data (e.g. files) with some part of other data (e.g. who owns the files, who is allowed to access them, for what purposes, creation time, etc.) This makes inodes "metadata" which is "data about other data." People like to say "Oh it's metadata" as if that makes some difference. It does not. It's data, and like all nongarbage data, has some purpose.

    An inode may contain a unique identifier. It's not required, and in some filesystems isn't even an option. A lightweight file system doesn't need the inode to be independently tracked, cataloged, nor identified. Its purpose can be met so long as its association with the right data object(s) is(are) maintained.

    There is no question that there IS DEFINITELY A NEED to maintain information ABOUT files in a filesystem. At the very least, the location of the contents and an association with a 'handle' (which can be abstracted into "filename" or "path") are necessary for simple human interaction. If humans didn't have to list a directory's contents ('ls') or view a folder then the whole concept of the 'handle' becomes something which may not be comprehensible to humans. Examples of this abound in the various cache directories (such as Google-Chrome's cache, Firefox's cache, etc.) and the abstraction is a very long hexadecimal string.

    Given that there are many filesystem types, some physical, some virtual, some simple, some layered, and each has its strengths and weaknesses*, the global concept of inode in Linux is a matter of discussion. That's what's occurring here. Linus has his opinions, colored by decades of experience, some of which comes from reviewing proposals that never made it into Linux, others in watching evolution of various metadata storage mechanisms. Google hires great people (Hi Bruce!) so there's no doubt some of them are fantastic filesystem concepts people, but ... not... all of them are.

    You want to take sides on this one? It's a free country. Take a side. Before you judge someone's style, think to yourself "Would I rather Linus was bold in his statements and kept cruft out of the kernel" or "Would I rather some idiot was 'Dictator for a day' and messed up the kernel so badly nobody could fix it."

    E
    * I've gone out of my way to leave out discussions of filesystem weaknesses that aren't in the filesystem itself, but are about the actions of the eponymous developer.

  • Not fit to weigh in on this one, only to comment that a lot of people might envy having those same engineering problems that first engaged you in school, still your reason to tear-hair decades on in the career.

    Linus is actually farther along in his career arc than the pond scum that had to go apologize to Congress for their products this week; an apology cheerfully given, as it was not just cheap, but free. Congress has no intention of actually legislating or regulating away their oligopoly and their freed

  • Add some sound effects, such as a cat hissing, and this could make the script for geek sitcom...
  • At one point, Linus had a tirade that the kernel should never ever break userspace. Which I appreciated.

    Then someone brings up that tar and find break if inodes are funky, and Linus counters that the real problem is "tar does stupid things".

    The original context was inodes in a pseudo-filesystem, in which case I'd tend to understand the "no meaningful inodes" scenario, but the tar example cited was traversing into snapshots, which tar would do when snapshot access looks like just anothe

    • The "tar does stupid things" really caused the hair on the back of my neck to raise.
      tar, and every linux archiver I'm aware of uses inodes to detect hard-links. This is a problem on btrfs snapshot subvolumes, as you mentioned, but in that case, it can at least be detected and treated like another mount.
      The idea that userspace doesn't need inodes, and that only jackasses use them, is wrongheaded. Linus is wrong. It happens.
      • by sfcat ( 872532 )
        He's not wrong. What needs to happen is if detecting hard-links is so important, then there needs to be a function all FSes implement that tells you if a path is a hard or soft link. Then the tools just use that rather than reaching into internal data structures that are implementation specific. This would clean up all of these issues in a clean way that reduces bug count significantly. He understands there is a leaky abstraction here that is a bug farm. Just because compression tools do this, doesn't
        • He absolutely is wrong, says the POSIX specification.
          You don't implement a new fucking kernel interface to do the job of an existing POSIX interface- that's quite literally against the standing rules of the kernel. Stop talking out of your ass.
        • by Junta ( 36770 )

          What needs to happen is if detecting hard-links is so important, then there needs to be a function all FSes implement that tells you if a path is a hard or soft link. Then the tools just use that rather than reaching into internal data structures that are implementation specific. This would clean up all of these issues in a clean way that reduces bug count significantly. He understands there is a leaky abstraction here that is a bug farm. Just because compression tools do this, doesn't make it right.

          Well that is specified already:

          Identifies the device containing the file. The st_ino and st_dev, taken together, uniquely identify the file. The st_dev value is not necessarily consistent across reboots or system crashes, however.

          Per the glibc documentation of the POSIX-defined attributes st_ino and st_dev. So technically there is a mechanism: the combination of backing device and the filesystem curated inode.

          This is not reaching into implementation specific data structures, this is following the specification in terms of how files are explicitly declared to behave.

          If anything, it would make sense for filesystems wishing to opt out of meaningful inodes to report back some newly dubbed 'special' value. Giv

  • There's a huge range of filesystems now on Linux, although some are poorly maintained and are being dropped, which is a shame.

    This makes Linux an ideal target for people to experiment with crazy schemes, because they can do a like-for-like comparison with a truly wondrous range of approaches.

    Instead of talking on the list, if he has a good idea, he should try it out. Talk is cheap, benchmarks and robustness are the true measure.

    Duplication of functionality isn't an issue if the idea is sound and proves its

    • That touches a key element which many people seem to overlook. In this type of kernel development, correctness rules above all else. If something is incorrect, especially if it violates the contracts of the operating system or potentially loses data, the harsh lesson will follow.

      Linus has absolutely had serious problems with being a jerk in his reactions, but the topics he is triggered by are nearly always about correctness.

      Prove that it works, have data to back it up, be able to prove that the behavior

  • Which has more f-words per sentence, "Robust Exchange" or "Animated conversation" ?
