
Linus Torvalds Expresses His Hatred For Case-Insensitive File-Systems (phoronix.com)
Some patches for Linux 6.15-rc4 (updating the kernel driver for the Bcachefs file system) triggered some "straight-to-the-point wisdom" from Linus Torvalds about case-insensitive filesystems, reports Phoronix.
Bcachefs developer Kent Overstreet started the conversation, explaining how some buggy patches for their case-insensitive file and folder support were upstreamed into the Bcachefs kernel driver nearly two years ago: When I was discussing with the developer who did the implementation, I noted that fstests should already have tests. However, it seems I neglected to tell him to make sure the tests actually run... It is _not_ enough to simply rely on the automated tests. You have to have eyes on what your code is doing.
Overstreet added "There's a story behind the case insensitive directory fixes, and lessons to be learned." To which Torvalds replied.... "No."
"The only lesson to be learned is that filesystem people never learn."
Torvalds: Case-insensitive names are horribly wrong, and you shouldn't have done them at all. The problem wasn't the lack of testing, the problem was implementing it in the first place. The problem is then compounded by "trying to do it right", and in the process doing it horrible wrong indeed, because "right" doesn't exist, but trying to will make random bytes have very magical meaning.
And btw, the tests are all completely broken anyway. Last I saw, they didn't actually test for all the really interesting cases — the ones that cause security issues in user land. Security issues like "user space checked that the filename didn't match some security-sensitive pattern". And then the shit-for-brains filesystem ends up matching that pattern *anyway*, because the people who do case insensitivity *INVARIABLY* do things like ignore non-printing characters, so now "case insensitive" also means "insensitive to other things too"....
Dammit. Case sensitivity is a BUG. The fact that filesystem people *still* think it's a feature, I cannot understand. It's like they revere the old FAT filesystem _so_ much that they have to recreate it — badly.
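A minimal sketch of the kind of failure Torvalds describes, under loudly stated assumptions: `hypothetical_fs_key` is an invented stand-in for a filesystem that ignores some non-printing characters and folds case, and the `IGNORED` set is illustrative only. It is not taken from any real filesystem.

```python
# Assumption: an illustrative set of zero-width code points that the
# hypothetical filesystem drops before comparing names.
IGNORED = {"\u200b", "\u200c", "\u200d", "\ufeff"}

def hypothetical_fs_key(name: str) -> str:
    """Invented lookup key: drop ignored characters, then fold case."""
    return "".join(ch for ch in name if ch not in IGNORED).casefold()

def userspace_is_protected(name: str) -> bool:
    """A naive user-space check that compares the name exactly."""
    return name == ".git"

sneaky = ".G\u200bit"  # capital G plus a zero-width space

# The user-space check lets the name through...
print(userspace_is_protected(sneaky))
# ...but the hypothetical filesystem treats it as the same entry as ".git".
print(hypothetical_fs_key(sneaky) == hypothetical_fs_key(".git"))
```

The point of the sketch is only that the two layers disagree about what "the same name" means; any real attack depends on the actual folding rules in use.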
And this led to a very lively back-and-forth discussion.
Slashdot's summary of the highlights:
Overstreet replied that there's "an established need" for case insensitive directories. ("So I don't want any whining about [how] 'bad' they are. I want them to work, and I want the code to be clean and maintainable, and not make my eyes bleed when I have to go and debug it.")
Torvalds responded, explaining why the code was partitioned off:
if filesystem people were to see the light, and have a proper and well-designed case insensitivity, that might change. But I've never seen even a *whiff* of that. I have only seen bad code that understands neither how UTF-8 works, nor how unicode works (or rather: how unicode does *not* work — code that uses the unicode comparison functions without a deeper understanding of what the implications are)...
Overstreet replied:
Since you're not saying anything about how you think filesystems get this wrong, this is just trash talking. I haven't seen anything that looks broken about how case insensitivy is handled. And honestly I don't think the security "concerns" are real concerns anymore, since we're actively getting away from directories shared by different users — /tmp — because that's caused _so_ many problems all on its own.
Torvalds: The thing is, you absolutely cannot make the case-insensitive lookup be the fast case...
Overstreet: That's precisely what the dcache code does, and is the source of the problems...
Torvalds: I think you're confused, and don't know what you are talking about. You'd better go learn how the dcache actually works.
Overstreet: No, what I wrote is exactly how CI lookups work with the dcache. Go have a look.
Torvalds: Kent, I literally wrote most of that code, and you are claiming that the CI case is trying to be the fast case...
Overstreet: The subject is CI lookups, and I'll eat my shoe if you wrote that... Are you being obtuse on purpose? I'm saying the CI case is a combination of overoptimized and poorly designed. I'm not saying the CI case is trying to be the fast path relative to case-sensitive lookups, that would be insane.
Torvalds: Start chomping. That nasty code with d_compare and d_hash goes way back. From a quick look, it's from '97, and got merged in in 2.1.50... If you get dentry aliases, you may be doing something wrong.
Overstreet: Yeah, Al just pointed me at generic_set_sb_d_ops()... And you never noticed that the complaints I had about the dcache bits didn't make sense and how I said it should work was how it actually does work? Heh.
Torvalds: That's a funny way of saying "Oh, Linus, you were right in the first place when you called me out on my bullshit"...
Blame Micro~01 (Score:5, Funny)
Re: (Score:3, Informative)
The parent post got down-modded, but it touches on a point here (even if it does not say it explicitly).
The only reason for having support for case-insensitive file-systems is when you need to interface with systems where that is the default: such as Microsoft Windows.
For a Linux-native file system, the use-case would be to use a Linux system as a file server for users on MS-Windows.
Re:Blame Micro~01 (Score:4, Insightful)
Well but even then that functionality would be far better served in the file server daemon as it already needs to do things like character set conversion and file name conversion.
Re: (Score:2)
"user friendliness" (Score:5, Insightful)
It's all about "user friendliness". I put that in quotes, because this is well-intentioned but counterproductive stuff. Don't stress the user out about "help.txt" vs. "Help.txt" - surely they mean the same thing. The problem is that this generalizes very poorly as soon as you leave the realm of basic ASCII. As Linus points out, there are two heart-emojis - is one lower-case and the other upper-case?
This fits right in with the idiocy of hiding file extensions. Don't stress the user over whether it's a JPEG or a PNG, you can just show some completely random indication that it's a pic. But many users have a dangerous half-clue, so they know file extensions exist. So life is fine when they get CutePic.jpg.exe and the ".exe" is hidden.
Or the current trend in hiding actual directories. Where did that file get saved? How many different places do Windows and iOS store files? How many of those are hidden? Which ones are in the cloud and which are local? It's all magic and your average user has absolutely no clue...
Re:"user friendliness" (Score:5, Insightful)
This fits right in with the idiocy of hiding file extensions. Don't stress the user over whether it's a JPEG or a PNG, you can just show some completely random indication that it's a pic. But many users have a dangerous half-clue, so they know file extensions exist. So life is fine when they get CutePic.jpg.exe and the ".exe" is hidden.
That plus the decision to make the (hidden) extension semantically meaningful by having it decide what actually happens when you click it was the most stupid decision ever in the realm of user interfaces. With the extensions hidden, you actually can't know if you're running an unknown exe, or starting an image viewer that will show a jpg. Either one without the other would be OK(-ish) and could have saved billions of dollars in lost productivity.
Re:"user friendliness" (Score:5, Insightful)
This is deranged-level stupidity at work and not "user friendliness", agreed. The first thing I do when I have the misfortune to do a Windows installation is make it show everything. How was hiding stuff ever acceptable or a good idea? It leads to crap where people think they open, say, a picture but what they really do is run a script. And, TADA!, one of the most dangerous remote attack vectors (executable email attachment, these days often in more complex forms) is born!
Re: (Score:2, Interesting)
Part of it is because Unicode is broken and doesn't provide adequate support to developers. Part of it is because major operating systems like Windows are case insensitive and developers are really looking for how best to support NTFS and FAT, not how to avoid security issues on Linux.
Replacing Unicode would do a lot to help with this. Most of the rest is down to less than ideal operating system design.
Re:"user friendliness" (Score:4, Insightful)
Replacing Unicode won't help. Filenames are even allowed to contain broken Unicode. Do you want to get an error code from the kernel, because the encoding the userspace tries to use is unknown to the kernel?
Linus' point is, that the kernel should be agnostic of encodings. A byte is a byte. For case conversion you need to know the encoding and the uppercase-lowercase tables. And consider, that they change with new Unicode releases. That's a huge mess.
Do you really want to mount your filesystem with -o unicode_version=2025 so your 2030 system knows why two files with the "same" filename according to Unicode 2030 are allowed? And now think about the use-case of copying files around. So your hard drive is still mounted with unicode_version=2025, but your USB stick is mounted with unicode_version=2030. When you now copy two files with different filenames in 2025 and the same in 2030, you're surprised why your PC can store them and the USB stick overwrites the first with the second one.
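To make the contrast concrete, here is a rough Python sketch of the two comparison policies being argued about. `bytes_equal` and `folded_equal` are hypothetical helper names, and the "NFC then casefold" rule is just one possible policy chosen for illustration, not what any particular kernel or filesystem does.

```python
import unicodedata

def bytes_equal(a: bytes, b: bytes) -> bool:
    """Encoding-agnostic policy: names match only if the bytes match."""
    return a == b

def folded_equal(a: bytes, b: bytes) -> bool:
    """Illustrative Unicode-aware policy: NFC-normalize, then casefold.

    Names that are not valid UTF-8 fall back to byte comparison here;
    that fallback is itself a design decision someone has to make.
    """
    try:
        sa, sb = a.decode("utf-8"), b.decode("utf-8")
    except UnicodeDecodeError:
        return a == b

    def key(s: str) -> str:
        return unicodedata.normalize("NFC", s).casefold()

    return key(sa) == key(sb)

# The result of folded_equal depends on the Unicode tables shipped with
# the runtime, which is exactly the versioning problem described above.
print("Unicode data version in this Python:", unicodedata.unidata_version)
print(bytes_equal(b"Help.txt", b"help.txt"), folded_equal(b"Help.txt", b"help.txt"))
```

The byte policy never needs to know an encoding or a table version; the folded policy needs both, plus an answer for names that do not decode at all.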
Re: (Score:2)
I don't want to use case insensitive filesystems at all, but I can see that there is a use case for them. It's not wrong that it is interoperation with honestly broken operating systems, but that doesn't change the fact that I might need to do this awful, awful thing.
I'm not saying Linus is wrong about them being bad in any way, but they do have to exist, and they do have to do what Windows does in some cases. We cannot simply pretend Windows doesn't exist any more than I can pretend it doesn't suck.
Re: (Score:2, Troll)
Suppose Linux runs an application which tries to read/write from a file on a case insensitive third party source system.
The application itself shouldn't need to know what kind of filesystem is used by the third party (If you disagree, imagine thousands of applications which all have special code for every possible third party). So the responsibility for doing the right thing properly lies with the OS, which mean
Re: (Score:2)
Apart from that, all valid UTF-8 is allowed. Since you have to check whether a filename is valid anyway, having to check for valid UTF-8 is trivial.
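For what the validity check itself looks like, a small sketch in Python (`is_valid_utf8` is a hypothetical helper; it relies on Python's strict UTF-8 decoder and says nothing about how a kernel would do it):

```python
def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw filename bytes decode as strict UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(is_valid_utf8("café.txt".encode("utf-8")))  # well-formed name
print(is_valid_utf8(b"caf\xe9.txt"))              # Latin-1 bytes, not UTF-8
```

Checking validity is the easy part; deciding what to do with the names that fail the check is where the disagreement in this thread starts.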
Re: (Score:2)
MacOS does allow | in filenames, as does FreeBSD. Windows does not.
Which leads to problems if I save a web page as pdf on the Nextcloud share folder on my Mac and later try to access it on the same Nextcloud share folder on my Windows pc.
Because most web browsers have the default filename as the section of the website, and that is generally a decently descriptive name for the file, and there are often | characters in that. On Windows they get changed to something else, usually _, but MacOS web browsers don
Re: (Score:3)
Re: (Score:2)
Yeah, and that's why we need an official support library that handles that kind of thing and tells you when it doesn't make sense.
Re: (Score:2)
Re: (Score:3)
On MacOS, to count the letters in a Unicode string, you use Swift (which gets it right finally),
How do you know it gets it right?
Re:"user friendliness" (Score:4, Insightful)
IMHO the larger point of view of this class of problems is to allow multiple specifications. This introduces an ambiguity, which is fine and nice from some perspectives, but then you have to maintain this long-term.
A different example of the same beast is to introduce two ways to call a functionality in some library. Some codes will use one way, others another. Now you are stuck maintaining both ways. Diagnostic scripts become complex because there is more than one way to do it.
So there is a benefit to being stringent and avoiding ambiguity.
Re:"user friendliness" (Score:4, Insightful)
It DOES affect developers, when the "open file" code works as tested on their machine but not on machines with different case sensitivity. Then it becomes a problem for end users.
Re: (Score:2)
This fits right in with the idiocy of hiding file extensions
What about the idiocy of using file extensions? It's smarter to analyze the files to determine their formats. For performance, you reasonably want to cache that information, and I was sure the filesystem would do that for us by now, but trusting the extension is just about the last thing you should do anyway.
Re: (Score:2)
> This fits right in with the idiocy of hiding file extensions
Or having file extensions in the first place, that's the real underlying issue there. The name of a file shouldn't contain required metadata. Most operating systems, including Unix-like ones like GNU/Linux, but also OG MacOS and AmigaOS (to name but two), unhampered by the need to have a file system that, if you squinted at it, kinda looked like CP/M's, didn't feel the need to have every not-to-be-trusted executable end with ".EXE", ".COM", ".
Re: (Score:2)
If you are more or less directly interacting with a user: you will almost certainly want to be able to handle case insensitive searches. You'll probably also want to let them change the font and the text size and color and have the timestamps displayed correctly for their region and time zone.
It's just much less clear how much of that you want to make the filesystem's problem. You could drag a full timezone system into the filesystem so that timest
Well it's an incredibly hard problem (Score:5, Informative)
Suddenly you are dealing with natural language processing on a file system level. It may seem trivial to do case insensitive matching for US-ASCII, but on a grander scale it's not.
For example in German if you have a ß-character and you want to compare that word to an upper-case version, it can be either SS, SZ or ẞ. There are characters for which no upper- or lower-case variant exists. Upper-case is language dependent. For example in German-German the upper-case variants of äöü are ÄÖÜ, while in Swiss-German it's not uncommon to write Ae, Oe and Ue.
It's just incredibly hard to do that correctly. Of course it might still be worth if there was some strong argument for doing it, however nobody has brought that forward.
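The ß case is easy to reproduce with Python's built-in string methods, which apply the default (language-independent) Unicode case mappings. This is only a sketch of why round-tripping fails, not a statement about how any filesystem folds names:

```python
word = "straße"

upper = word.upper()        # default mapping turns ß into "SS"
round_trip = upper.lower()  # lowercasing cannot recover the ß

capital_sharp_s = "\u1e9e"  # ẞ, LATIN CAPITAL LETTER SHARP S

print(upper)                        # STRASSE
print(round_trip == word)           # False: "strasse" != "straße"
print(capital_sharp_s.lower())      # ß
print("ß".casefold())               # ss
print(len(word), len(upper))        # 6 7: the name even changes length
```

So three spellings (ß, ẞ, SS) can all stand for the same word, and which of them compare equal depends on whether you use upper(), lower(), or casefold(), before any language-specific rule is even considered.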
Re:Well it's an incredibly hard problem (Score:4, Insightful)
Re: (Score:2)
Indeed. Because Microsoft selected the most stupid solution possible back when.
Re: (Score:2)
It was no big problem while characters were still 7-bit. It became a mess when they went 8-bit with codepages. It is a catastrophe with Unicode.
Re: (Score:3)
It is actually very simple to do correctly: Different bit-pattern - different name. Done. The problem comes in when people that do not understand computing and have no clue about IT security think they have the perfect solution and then proceed to push that on people. In particular, Unicode should have been delayed at least 20 years longer. I talked to one of the original designers 30 years ago, and the security implications of, for example, two code-points rendering on the same glyph or a glyph having differen
Re: (Score:2)
The security problems don't come from case-insensitiviity or from glyphs having different encodings, but from your insistence that security requires reinventing the wheel, badly.
Re: (Score:2)
Did you even read what I wrote? I just argued for different bit-pattern - different name. And I did so because of things like there not being any universal meaning for variants of glyphs. Did you intend to answer somebody else?
Re: (Score:2)
Ideally Unicode would come with a nice library that handles all that stuff for you. Unfortunately Unicode is kinda broken so it's not nearly as easy to do as it should be, but it's not impossible.
I'd implement it as language metadata, which is where the current implementation of Unicode is lacking because it is focused entirely on characters, and wrongly merged some of them too. Anyway, language metadata, which is then used to generate libraries for different languages and operating systems, so for the deve
Re: (Score:2)
Yes but we are talking about filesystems and natural language support. Which means that you'll somehow have to tell the file system which language the directory is in. We'd need to change all of the APIs for that. Particularly for multi-user systems where different users may speak different languages... or multilingual single-user systems, that's a nightmare.
You can't abstract that away from the programmer since it needs to know what rules it needs to apply. Having the same script doesn't mean the rules for
Re: (Score:2)
Well that's why I mentioned that Unicode is deficient because it lacks any concept of languages, only characters.
The use case is that most files are stored on case insensitive filesystems, at least on personal computers, because of NTFS and FAT.
Re: (Score:2)
So essentially you are suggesting an explosion in complexity with everyone having to change their computers in major ways... just to replicate some functionality in some file systems? A functionality most users wouldn't even know is there?
Re: (Score:3)
The two most common desktop systems are Windows and MacOS. Both, par default, use case-preserving, case-insensitive file systems. This isn't a minor use-case. It is the majority of computers that people use day to day. "Everyone having to change their computer" really means Linux users having to change their computer's way of functioning when interacting with these systems (external drives and SMB network shares are likely the main ways). If you want to interoperate, it is a requirement.
Re: (Score:2)
But where is the use case? You can store all the data from a case-insensitive file system on a case-sensitive file system. Users typically click on filenames or use autocomplete so they don't care about case-insensitiveness. There is probably no software around that fails if it could create "FILE" and "file" in the same directory.
It's like saying, Linux should strip long filenames as Windows can just do 8.3. Yes you can use Samba to provide short file names to Windows systems, but that's an issue that must
Re: (Score:2)
The computer only knows the user types a bunch of characters with a particular keyboard layout. The computer doesn't know which exact language a character belongs to. There is no automatic and reliable way to assign the language metadata. Multiple languages can be mixed together in the same document or database record entry.
Since multiple languages can mix together, Unicode must focus on characters and embed whatever language metadata into some form of characters. Side channel is not going to work.
Re: (Score:2)
Re: (Score:2)
For example in German if you have a ß-character and you want to compare that word to an upper-case version, it can be either SS, SZ or ẞ. There are characters for which no upper- or lower-case variant exists. Upper-case is language dependent. For example in German-German the upper-case variants of äöü are ÄÖÜ, while in Swiss-German it's not uncommon to write Ae, Oe and Ue.
Oh, yeah!
Locale dependent file system behavior! What could POSSIBLY go wrong!
Unicode is a mess (Score:3)
I agree that the questions of whether case-insensitive file names should be supported, and how to do it indeed are two different questions.
You can't always get to choose to implement only the features that you want. For most of us developers in the real world, there are always external demands.
I'd think that for a file system, the technical problems are about 1) filtering which filenames are allowed when renaming/creating a file, 2) collating filenames when doing lookups. #2 depends on #1: you can only look up filenames that exist.
I once started writing a text editor, but got bogged down so much in the difficulties of the tangled mess that is Unicode that I eventually gave up on it.. for a while at least.
But it made me form an opinion on how to approach it: No program can support all of Unicode. It is practically not possible. You'd have to restrict yourself to supporting a subset, and then make sure that all your text is only within that subset.
Then be honest and document how the restrictions meet the demands from the users.
This applies of course to Unicode filenames in file systems, and for file-systems there is a long history of legacy systems with restrictions in what file names are allowed, so I don't think (experienced) users would be surprised. ... but you'd have to make sure that filtering decision is only in one place in the software stack, so that you have control over the behaviour when the new revision is released.
If you are designing a new piece of format -- such as a file system --, then you yourself get to choose your subset of Unicode to support -- and to choose a subset that you are able to transform into a case-insensitive form.
You could expand the set of supported Unicode in a future revision
Instead of bickering, I think the Linux kernel developers who work on the core should help the file system developers on that last mentioned aspect.
Re: (Score:2)
Unicode was always a mess and badly designed. I talked to one of the designers about 30 years back. They never even thought about security implications, for example. Now we have crap like different codes rendering on the same glyph, causing URLs that look to go to one site to actually go to another. Or we have source-code that you cannot read without a converter because the letters are Sanskrit. And more really bad ideas.
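The look-alike problem can be shown in a few lines of Python. The host name `bank.example` is made up for illustration, and `non_ascii_names` is a hypothetical helper:

```python
import unicodedata

latin = "bank.example"
mixed = "b\u0430nk.example"  # U+0430 is CYRILLIC SMALL LETTER A

def non_ascii_names(s: str) -> list[str]:
    """List the Unicode names of every non-ASCII character in s."""
    return [unicodedata.name(ch) for ch in s if ord(ch) > 127]

print(latin == mixed)                   # False, despite looking alike
print(non_ascii_names(mixed))           # ['CYRILLIC SMALL LETTER A']
print(latin.encode("utf-8") == mixed.encode("utf-8"))  # False: bytes differ
# Compatibility normalization does not merge the two letters either.
print(unicodedata.normalize("NFKC", mixed) == latin)   # False
```

A byte-wise comparison tells the two strings apart without any tables; it is the rendering layer, not the comparison, that makes them indistinguishable to a person.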
Re: (Score:2)
From what I've seen, the problem is normal users (Score:3)
You know, most of the normal-ish people who employ us and/or work with us. Try to explain to THEM why it matters in what case they type those letters - you'll either see their eyes quickly glaze over, or you'll see their opinion of you quickly plummet (or possibly both). Soon, women won't talk to you, children will point and laugh, and dogs will start growling as you walk by.
I've gotten to where I just advise people to always stick with lower case - and, if asked "why", I tell them so that it's not confusing for those poor "others" who aren't as savvy as they themselves are. It seems to work.
Re: (Score:3)
It is _really_ simple: An "a" and an "A" are different, right? And you write it "Joe", not "jOe", right? Anybody that "has their eyes glazing over" at that is beyond help...
Re: (Score:2)
It isn't that simple.
The problem is that case matters but it also doesn't matter. It depends upon context. For example "It is raining" and "I hope that it is raining" ... do we have two different words "It" / "it"? The VAST majority of people would say no, there is just a single word - they are not different despite being encoded/displayed differently. The capitalization is there to make it clear where the sentence starts and not because "it"/"It" are different. Sometimes case does distinguish meaning. "One
Re: (Score:2)
Sure, from a computer file security standpoint, case sensitivity makes sense.
As you point out, people will regard some things as functionally the same, even when they use different characters. Hence case insensitivity makes more sense for people.
And then there is the whole raft of instructions that have something like: "Enter the text above (without quotes)". Some people will regard an instruction of Enter the system name ("main_system" in this case) that the system name is main_system. And others will thin
Re: (Score:2)
Sorry, but we are talking about _names_ here. And there, it is that simple.
Re: (Score:2)
I'm going to steal a comment I just read in osnews (user cevvalkoala)...
Set up a firm called NIKe and try defending that at a trademark court, arguing case sensitivity.
Write a book named HaRRy PottER and see if you’ll get sued or not.
Re: (Score:2)
Trademark law prohibits similar trademarks. Hence you have no point. But let me use this one: How about I call you "mr. BArKEY". Is that your alias here?
Re: (Score:2)
Try Horry Pottar and you get the same problem, even when your filesystem can distinguish between the names.
Re: (Score:2)
Sorry, but we are talking about _names_ here. And there, it is that simple.
Seems so simple I don't understand why some aren't getting it.
Capital S = 01010011 Lower case s = 01110011 I mean - that is what you are saying - different characters are different in binary, right? I just woke up and only on my first coffee, so maybe I'm confused?
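Those two bit patterns can be checked directly; a tiny Python sketch (`bits` is a hypothetical helper name):

```python
def bits(ch: str) -> str:
    """Return the code point of a single character as 8 binary digits."""
    return format(ord(ch), "08b")

print("S =", bits("S"))  # 01010011
print("s =", bits("s"))  # 01110011

# In ASCII the two cases of a letter differ in exactly one bit (0x20).
print(hex(ord("S") ^ ord("s")))  # 0x20
```

That single-bit difference is why ASCII-only case folding looks trivial; nothing comparable holds once names go beyond ASCII.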
Re: (Score:2)
You got that exactly right. What probably confuses you is that there are actually, incredibly, people that disagree with this blatantly obvious, simple and robust approach. I have learned that most people are not smart but really do not know that and hence voice the most stupid things with confidence. Hence I am not confused by that anymore. The Dunning-Kruger effect is probably the most important finding about humans, ever.
One thing is that some people always need to be contrarian and desperately need to feel
Re: (Score:2)
The conversion table for characters other than ASCII changed over time and will probably change in the future again. The point Linus is making is, that your program can't rely on the filename always meaning the same when it has to rely on Unicode tables. You can rely on it being the same if you treat it like raw bytes, though. And now consider a filename containing broken Unicode. What should the normalization do? You get a lot of special cases when normalizing Unicode. And ambiguous filenames (not in displ
Re: (Score:2)
Re: (Score:2)
Indeed. And that is why filenames _need_ to be treated as raw bytes for the purpose of equality. Anything else will only cause problems with no compensating benefits at all. Unix got that right. DOS, Windows, MacOS screwed it up and it is a completely unnecessary but serious problem.
I talked to one of the original Unicode designers 30 years ago. They were not even aware this had security and other implications back when. It was like these people never considered that strings can be subject to computations a
Re: (Score:2)
"Hatred" is strong, but it is stupid (Score:2)
In particular, the MS people made the most stupid decision they could here. Not the only place where they did that.
The sane thing is to see different bit-patterns as different names. Anything else is a hack and deeply stupid. If you really need something like it, do it on the application side to make sure everybody gets how stupid the idea is and it gets avoided. And do not provide any library or kernel support for it, ever.
Case-insensitive but not... (Score:2)
Homoglyphs-insensitive! That would be a fun filesystem. Detect homoglyphs.
Re: (Score:3)
You can't have homoglyphs in today's environment.
Re: (Score:2)
You can't have homoglyphs in today's environment.
It's non-binary glyphs, you insensitive clods!
An indication.. (Score:2)
Case sensitive is just wrong (Score:3)
Re: (Score:2)
Re: (Score:2)
Actually that's a pretty good illustration against Linus' point.
Do it on the right layer (Score:2)
Linus is correct that case-insensitive filenames can be problematic at the filesystem level. However, there is no issue in having a file manager that normalizes filenames. For instance, if "case-Insensitive.txt" already exists, a save-file dialog could reject saving "case-insensitive.txt" or warn "File exists with different capitalization. [Overwrite existing file] [Create second file with different capitalization]" allowing the user to avoid creating duplicate files due to varying capitalization.
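A rough sketch of what such a file-manager-level check could look like, in Python. `find_case_conflicts` is a hypothetical helper and `str.casefold()` is just one possible notion of "same name apart from capitalization":

```python
def find_case_conflicts(existing: list[str], new_name: str) -> list[str]:
    """Return existing names that differ from new_name only by case.

    An exact match is not reported: that is an ordinary overwrite,
    which a save dialog already has to handle.
    """
    folded = new_name.casefold()
    return [n for n in existing if n != new_name and n.casefold() == folded]

directory = ["case-Insensitive.txt", "notes.txt"]
conflicts = find_case_conflicts(directory, "case-insensitive.txt")
if conflicts:
    # A real dialog would offer "overwrite" or "create second file" here.
    print("File exists with different capitalization:", conflicts)
```

The filesystem underneath stays strictly byte-based; the policy, and any later change to it, lives in one user-facing place.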
Re: (Score:2)
Linus is correct that case-insensitive filenames can be problematic at the filesystem level
He is absolutely not correct. I've been using case-insensitive filenames on my Mac and iPhone for over 30 years with no problem.
The only problem that I ever had was over 35 years ago: Say you give a command to change the name of a file from X to Y. The file system will find a file named "X" according to its rules, give an error if there isn't one, and change the name to Y. Someone decided to make an optimisation: If you change the name from X to X then _nothing_ is done. Makes sense. But they applied tha
Re: (Score:2)
Year of Linux on the Desktop (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
This. Linux has always been built by hobbyists, for hobbyists. This attitude from Torvalds is an example of why it doesn't catch on with regular people. No regular person would think a file name MyNotes is a different file from mynotes. If you can't meet people where they are, they will not use your tech.
Re: (Score:2)
The problem with case sensitivity (Score:2)
File systems exist to organize, store, and retrieve data. They're useless if people can't find what they want in a large collection of files.
You know what does a great job of hiding files from users? Case sensitivity.
I don't care how 'technically superior' case sensitivity is, if you're championing it for a user-exposed file system, you're wrong.
Re: (Score:2)
I remember having to check for both cases of a letter in AppleSoft so I can see Linus's point, but I also wasted a lot of time in Linux searching for files because I was looking for the wrong capitalization. Then someone invented CamelCasing because spaces were evil or were used to separate commands, but the _ was too hard to type and the - just wasn't popular for some other reason.
So is it camel casing, Camel Casing, Camel casing, CamelCasing, camel_casing, camel-casing, or some combination?
Computers are s
Re: (Score:2)
>I remember having to check for both cases of a letter in AppleSoft so I can see Linus's point
I've done some coding, but likely nothing compared to what most in this discussion have done, so forgive me if this is a question with an obvious answer:
Why not convert your search variables to all lower case? And optionally, convert all spaces to underscores. If there's no pre-existing routine for that, you only have to write it once.
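As a sketch of that suggestion in Python (`normalize_query` is a hypothetical name, and the rule is exactly the one proposed: lower-case everything, turn spaces into underscores):

```python
def normalize_query(text: str) -> str:
    """Lower-case the text and replace spaces with underscores."""
    return text.lower().replace(" ", "_")

variants = ["camel casing", "Camel Casing", "CamelCasing",
            "camel_casing", "camel-casing"]
for v in variants:
    print(f"{v!r:16} -> {normalize_query(v)!r}")
```

This collapses "camel casing", "Camel Casing" and "camel_casing" into one key, but "CamelCasing" and "camel-casing" still come out different, so the routine only helps with the variants it was written for.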
Re: (Score:2)
Ok (Score:2)
So what you're all saying is that VMS (still) rocks!
Re: (Score:2)
Had to scroll way too far for this comment. AS/400 and mainframe stuff was case insensitive for a long time.
Technically right (Score:2)
There is zero value in being able to have the following folder names
Temp
tEmp
teMp
temP
Zero. Value. Only confusion and broken code to be found here.
Re:MacOS (Score:4, Interesting)
Try rsync-ing directories from a Linux machine, and you might observe where the macOS "case-insensitive but case-preserving" paradigm breaks down.
Re:MacOS (Score:5, Interesting)
Case-aware but case-insensitive is the absolutely most stupid choice possible, and it comes (AFAIK) from Windows. Typical MS quality "decision making". If the bits are different, the name needs to be different. Anything else is a disaster and an invitation for serious problems and attacks.
Re: (Score:3, Informative)
It goes back to at least the original MacOS (mid-80s). Microsoft just copied the Mac's way of doing things for Windows 95. As I said in another post, Apple probably did user-testing to determine which one to use (they were big on user-testing when they created the Macintosh).
And in the mid-80s, cyberattacks really weren't part of the design criteria!
Re: (Score:3)
It goes back to at least the original MacOS (mid-80s). Microsoft just copied the Mac's way of doing things for Windows 95.
I remember this being in DOS as well. None of the OS utilities ever created lowercase entries in FAT, but if you had some program which did so, it didn't cause any problems. My first DOS was 3.0 I think, I had a 5150 but it was upgraded with an AST card with another 384kB, a RTC, and IIRC a decent UART like a 16450, and I had a 30MB MFM disk on a Xebec controller. I used a lot of weird addons to try to make the interface less terrible at different times. 4DOS, cshell (a layer for the DOS shell that would sw
Re: (Score:3)
This is an even older legacy problem that got solved stupidly. Remember punchcards? No lowercase letters in these. Then lowercase letters became available and the morons making the decision just decided to map them to the corresponding uppercase letters. Unix got it right: Different code, different letter. The ElCheapo option got it wrong and it is still wrong today.
Re: (Score:2)
I honestly think the case insensitivity of those ultra limited platforms was a reasonable idea, because statistically nobody was using GUIs at the time and managing those filenames would have been irritating with the terrible CLIs in CP/M, DOS and so on, which were necessarily bad in order to save memory on systems with 64kB or even less. I believe that the failing in Microsoft land was that they didn't go to case sensitivity when they moved to long file names. They could have easily stuck with case insensi
Re:MacOS (Score:4, Informative)
Apparently the situation is a bit messy on MacOS too:
https://mjtsai.com/blog/2017/0... [mjtsai.com]
(note about case-insensitive filenames in an update below the main article)
Re: (Score:2)
Linux is built from the beginning around case sensitivity, so switching the other way breaks things. The use case they are trying to figure out is interoperability between Windows (especially games, like Valve) and FAT32.
Re: (Score:2)
I have a git repository in which some files (for very good reason) have identical names except for capitalization. If I check it out on a Mac, some of the files disappear because of name collisions.
Case insensitive filesystems are broken, on Mac just like everywhere else.
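A rough way to spot this kind of collision before checking out, sketched in Python. `find_case_collisions` is a hypothetical helper, and `str.casefold()` is only an approximation of whatever comparison a given filesystem actually applies.

```python
from collections import defaultdict


def find_case_collisions(paths):
    """Group paths that differ only by letter case (hypothetical helper)."""
    groups = defaultdict(list)
    for path in paths:
        # casefold() is Python's caseless-matching key; a real filesystem
        # may use different rules, so this is only an approximation.
        groups[path.casefold()].append(path)
    return [sorted(g) for g in groups.values() if len(g) > 1]


tracked = ["README.md", "src/Makefile", "src/makefile", "docs/Notes.txt"]
print(find_case_collisions(tracked))
```

The input could be the output of `git ls-files`, one path per line.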
Re:technical project management reply to module ow (Score:5, Interesting)
Treating upper case and lower case letters the same is asking for trouble. Keep them separate things at all cost. Or don't allow characters other than ASCII characters in file names (which creates all sorts of additional problems).
Re: (Score:3)
Don't forget about the dotless i/I with dot above horror in Turkish, where you actually should do the localization at word, not text, file or sentence level (nobody would expect the i in foreign proper nouns like e.g. Biden or Florida to lose its dot when going through any kind of "normalization").
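A small Python illustration of why this is nasty. Python's string methods use the default, locale-independent Unicode mappings, so this shows what happens when the Turkish rules are not applied; it is not a model of any particular filesystem.

```python
dotless_i = "\u0131"      # ı  LATIN SMALL LETTER DOTLESS I
dotted_cap_i = "\u0130"   # İ  LATIN CAPITAL LETTER I WITH DOT ABOVE

# Default mappings: the Turkish pairings i/İ and ı/I are not preserved.
print("i".upper() == "I")                 # plain ASCII pairing
print(dotless_i.upper() == "I")           # ı uppercases to the ASCII I
print(dotless_i.upper().lower() == "i")   # the round trip does not return ı
print(dotted_cap_i.lower() == "i\u0307")  # İ lowercases to i + combining dot above
```

All four comparisons hold, which means an upper/lower round trip can silently change a Turkish name, and lowercasing can change the number of code points in it.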
Re:technical project management reply to module ow (Score:5, Informative)
The issue is that different languages have different rules for "case insensitivity". In French for instance, the accent disappears when converting a lower case letter into an upper case letter.
Some people may be doing that but that's not the right way. At least definitely not a rule of French:
https://dictionnaire.lerobert.... [lerobert.com]
https://vitrinelinguistique.oq... [gouv.qc.ca]
https://www.academie-francaise... [academie-francaise.fr]
Re: (Score:2, Insightful)
Whatever the technical issue, the underlying issue is that humans have to use computers, and humans don't see case as significant. To a human, the file name MyNotes is the same as the file name mynotes, and they certainly wouldn't expect the two file names to be two separate files. If you didn't have to worry about humans, then Torvalds is right. But if you want humans to use your system, then you have to design it for humans.
Re: technical project management reply to module o (Score:2)
Re: (Score:2)
humans don't see case as significant
Humans *do* see case as significant. Do the names ExpertsExchange and ExpertSexChange look the same? Should they be the same file?
Can you come up with an algorithm that accurately and predictably determines when humans will or won't see two names as the same?
If the name uses an alphabet the user isn't familiar with, can the user accurately determine which lower-case glyph matches which upper-case glyph, in the same way that the computer does?
What if the user or name uses a language different from the one
Re: (Score:2)
Humans *do* see case as significant. Do the names ExpertsExchange and ExpertSexChange look the same? Should they be the same file?
Depends on what is being capitalized. Humans may not notice EXcel.exe is not the same as Excel.exe and assume they are the same application. The counterpoint to Linus is that domain names are not case sensitive. The part of the URL after the domain name is case sensitive. Imagine if every website owner had to register all case sensitive variants of their name to keep malicious actors from setting up fake websites. Chase Bank would need to own Chase.com, cHase.com, CHase.com, etc.
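The host/path split is visible in Python's standard URL parser. The URL below is a made-up example; the only behavior relied on is that `urlsplit(...).hostname` is returned in lower case while the path is left untouched.

```python
from urllib.parse import urlsplit

parts = urlsplit("https://CHase.com/Accounts/Login")

# .hostname is returned in lower case, so case variants of a domain collapse.
print(parts.hostname)
# The path is left exactly as written; the server decides whether case matters.
print(parts.path)
```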
Re: (Score:2)
Similarly, would a human expect "MyNotes" and "My Notes" to be the same file? If so, should the filesystem also handle such cases?
What about "MyNotes!" or " MyNotes"?
Do humans consistently consider filetypes/filename extensions to be part of the name or not? Should "MyNotes.txt" be the same file as "MyNotes.rtf"? What about "MyNotes.txt.txt" or "MyNotes.1.txt"? Should filename extensions use the same rules as filenames?
Re:technical project management reply to module ow (Score:4, Insightful)
Unicode has rules whether letters are the same. These rules are locale independent. It doesn't matter whether French or German has different and possibly changing rules.
HFS+ and APFS on MacOS and iOS have non-trivial rules for whether two strings compare equal. That's what you need: A set of rules telling you whether two strings are equal. If I take the same hard drive, these rules must always give the same result. That's all you need.
In the Eszett case, the rule is quite simple: Character comparison to check string equality is done inside the file system, and it is explicitly not done according to the rules of the latest Unicode version, but according to the rules of one fixed version. Eszett and ss, Ss, sS or SS compare equal in a case-insensitive comparison in the file system. They did when the file system was created, and they always will, forever. And the Unicode codepoint that is nowadays used for capital Eszett didn't compare equal in a case-insensitive file system comparison back in the day, and it never ever will.
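To make the ß versus "ss" point concrete, here is how it looks in Python. This is only an illustration of full case folding; Python uses the Unicode tables bundled with the interpreter, not the frozen table of any filesystem, so it is not a model of HFS+ or APFS.

```python
name_a = "Stra\u00dfe"   # "Straße", written with ß
name_b = "STRASSE"

# Simple lowercasing keeps ß as a single character, so the names differ.
print(name_a.lower() == name_b.lower())

# Full case folding expands ß to "ss", so the names match.
print(name_a.casefold() == name_b.casefold())
```

The first comparison is false and the second is true, so which names collide depends entirely on which rule set is picked.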
Now if some userland code does incorrect case-insensitive comparisons, that's its fault. And note that HFS+ and APFS don't compare codepoints, they compare letters according to Unicode rules. So you can have 1, 2 or 3 code points representing the same letter.
Summary: You need ONE 100% fixed set of rules that tells you whether two filenames are considered equal or not. These rules must be fixed forever. And this comparison should be made available to all code. It would be useful to have a consistent hash function as well.
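A sketch of what "one comparison plus a consistent hash" could look like, in Python. The names `name_key`, `names_equal` and `name_hash` are hypothetical, and the recipe (decompose, case-fold, decompose again) is an assumption chosen for illustration, not the algorithm HFS+ or APFS uses. It also fails the "fixed forever" requirement, since Python's tables change with the interpreter's Unicode version (see `unicodedata.unidata_version`).

```python
import hashlib
import unicodedata


def name_key(name: str) -> str:
    """Hypothetical comparison key: decompose, case-fold, decompose again."""
    decomposed = unicodedata.normalize("NFD", name)
    return unicodedata.normalize("NFD", decomposed.casefold())


def names_equal(a: str, b: str) -> bool:
    """Two names are equal when their keys are identical."""
    return name_key(a) == name_key(b)


def name_hash(name: str) -> str:
    """Hash of the key, so names that compare equal also hash equal."""
    return hashlib.sha256(name_key(name).encode("utf-8")).hexdigest()


precomposed = "caf\u00e9"    # é as one code point
decomposed = "CAFE\u0301"    # E followed by a combining acute accent
print(names_equal(precomposed, decomposed))
print(name_hash(precomposed) == name_hash(decomposed))
```

Deriving the hash from the same key as the comparison is what keeps the two consistent: any pair of names that compares equal necessarily hashes equal.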
Re: (Score:3)
I spent a bunch of time on this a number of years ago. Case-insensitive filenames are fundamentally problematic. The best that can be done is to have an authoritative file name, and one or more user-facing file name(s). Trying to compare file names in a case-insensitive manner runs afoul of these issues:
1. Character sets. Even the Windows Unicode character set gets expanded, so there is no invariant character set. This pretty much ends any concept of having a fixed set of rules.
2. File systems cr
Re:technical project management reply to module ow (Score:4, Insightful)
Overstreet ignored that and went on to insult the dcache code that Linus wrote, without knowing that Linus wrote it, or understanding the code itself. Linus was not impressed from a technical point of view.
Re: (Score:2)
The typical case is developers doing everything on Windows with no regard for case sensitivity in their file names and imports, then being surprised when their apps don't work after deploying on a Linux server.
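One way to catch this before deployment is to check that a referenced file exists with exactly the spelling used, even on a case-insensitive development machine. A minimal Python sketch; `exists_exact_case` is a hypothetical helper and only checks the last path component, not the parent directories.

```python
import os
import tempfile


def exists_exact_case(path: str) -> bool:
    """Check that the final path component exists with exactly this spelling.

    Hypothetical helper; it only checks the last component, not parents.
    """
    directory, name = os.path.split(os.path.abspath(path))
    try:
        # listdir reports names as stored, so the comparison is exact.
        return name in os.listdir(directory)
    except OSError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "Utils.py"), "w").close()
    print(exists_exact_case(os.path.join(tmp, "Utils.py")))
    print(exists_exact_case(os.path.join(tmp, "utils.py")))
```

Comparing against the directory listing rather than calling `os.path.exists` is the point: on a case-insensitive filesystem the latter would accept both spellings.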
Re: (Score:2)
You may not like it, but any OS with an X at the end was generally considered to be a support nightmare back then. Has much changed today?
No, a lot of people are still making a lot of dumb excuses.
It is common for Windows shops to replace tons of functionality in Windows with third party software, even just basic functions like remote software installation or VPN, to say nothing of security monitoring. That's because Windows is a support nightmare. But somehow operating systems ending in x are bad.
With that said, it still seems logical that if you're implementing filesystems with case insensitivity in the Linux kernel, you're going to need th