Ubuntu Data Storage Linux News

Ubuntu Will Switch To Base-10 File Size Units In Future Release

Posted by Soulskill
from the stay-above-the-belt dept.
CyberDragon777 writes "Ubuntu's future 10.10 operating system is going to make a small, but contentious change to how file sizes are represented. Like most other operating systems using binary prefixes, Ubuntu currently represents 1 kB (kilobyte) as 1024 bytes (base-2). But starting with 10.10, a switch to SI prefixes (base-10) will denote 1 kB as 1000 bytes, 1 MB as 1000 kB, 1 GB as 1000 MB, and so on."
This discussion has been archived. No new comments can be posted.

  • by g-to-the-o-to-the-g (705721) on Saturday March 27, 2010 @12:03PM (#31639998) Homepage Journal

    If you read closely, you'll see that the summary is kind of misleading. What Canonical is actually doing is using SI prefixes for base-10 units, and IEC prefixes for base-2 units.

    In other words, they will use 1kB for 1000 bytes and 1KiB for 1024 bytes. This is a good thing, it just means the UI should be consistent and you don't need to second-guess.

  • by Svartalf (2997) on Saturday March 27, 2010 @12:16PM (#31640126) Homepage

    Considering that kilobytes predate SI units... I kind of doubt that it broke anything established.

  • Absolutely BS (Score:2, Informative)

    by Island Admin (1562905) on Saturday March 27, 2010 @12:17PM (#31640142)
    Oh this makes me sooooo grumpy. FFS, who does the International System of Units think they are? 1024 bytes does equal 1 kilobyte ... always has. That's what I was taught in school. If I had answered 1000 bytes = 1 kilobyte, it would have been zero marks.

    According to the Oxford Dictionary: noun Computing a unit of memory or data equal to 1,024 bytes.
    According to Webster's Dictionary: A unit of information equal to 1024 bytes.
    According to Cambridge Dictionary: a unit of measurement of computer memory consisting of 1024 bytes
    According to http://dictionary.reference.com/browse/kilobyte [reference.com]:

    –noun Computers.
    1. 1024 (2^10) bytes.
    2. (loosely) 1000 bytes. Symbol: K, KB

    So until the guardians of the English language change .... 1 kilobyte = 1024 bytes. Finished.
  • Re:Really annoying (Score:1, Informative)

    by Anonymous Coward on Saturday March 27, 2010 @12:26PM (#31640230)

    CMD+I to open the info window, which shows you the exact number of bytes. If file size concerns you that much, you ought to know it to the byte, not to the thousandth or millionth byte.

    For example: 2.56 GB on disk (2,561,880,064 bytes)

    Makes much more sense than 2.56*1024*1024. Join the rest of the world with the SI units.

  • by gweihir (88907) on Saturday March 27, 2010 @12:28PM (#31640260)

    1kb never was 1024 bytes. It is either 128 bytes or 125 bytes. 'b' is bit, 'B' is byte and that distinction is rather important.

  • by Schraegstrichpunkt (931443) on Saturday March 27, 2010 @12:31PM (#31640298) Homepage

    a few years ago you didn't need to: 1kb was 1024 byte. it was defined like that.

    No, it wasn't. It meant, variously: 1000 bytes, 1024 bytes, 1000 bits, 1024 bits, or "approximately 1000 bits/bytes". There was also the goofiness that if you transferred at 64 kbps for 10 seconds, you ended up with 62.5 kb, and when you formatted your 10 GB hard drive, you ended up with only 9.3 "GB" of space.

    It confuses ordinary people for no good reason.
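The shrinking-drive effect described above is just one byte count divided by two different constants. A minimal sketch, assuming the drive holds exactly the advertised number of decimal gigabytes (the function name is made up for illustration):

```python
# Sketch: one byte count, shown with a decimal (GB) and a binary (GiB) divisor.
GB = 10**9    # SI gigabyte: 1,000,000,000 bytes
GiB = 2**30   # IEC gibibyte: 1,073,741,824 bytes


def advertised_to_gib(capacity_gb: float) -> float:
    """Convert a capacity in decimal gigabytes to binary gibibytes."""
    return capacity_gb * GB / GiB


# A drive sold as 10 GB, as a base-2 tool would report it.
print(round(advertised_to_gib(10), 1))  # 9.3
```

Nothing is lost on the disk; only the divisor changes, which is why the label matters more than the number.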

  • Re:Interesting (Score:3, Informative)

    by LBArrettAnderson (655246) on Saturday March 27, 2010 @12:39PM (#31640382)

    Yes, I may have been wrong with "for the most part," but there are definitely some (at least one) manufacturers who are advertising capacities with base 2 sizes. OCZ (one of the biggest vendors of SSDs), for example. Their "120GB" drives are 121.60 GB, and I assume they're actually the same size as other companies' 128GB drives. What I was getting at was both industries have started to "cave" at the same time, so we still aren't sure which way we're going.

  • by darkpixel2k (623900) <aaron@heyaaron.com> on Saturday March 27, 2010 @12:42PM (#31640422) Homepage

    Hard drives, on the other hand, have nothing that is fundamentally based on a power of 2.

    Well--except that pesky material on the surface of a disk that can store either a '1' state or a '0' state. Most people call that a 'bit'. Strangely enough, that 'binary' state is conducive to measuring in powers of two...

  • by Fastolfe (1470) on Saturday March 27, 2010 @12:45PM (#31640446)

    before that communications was already using SI kilo

    If you're talking about "communication" terms like megabits, this is because the base is 'bit', not 'byte'. The confusion only exists when you're talking about bytes. Anything dealing with bits has always been base-10.

  • by deniable (76198) on Saturday March 27, 2010 @12:45PM (#31640448)
    Only if you're using 8 bit bytes.
  • by tuomoks (246421) <tuomo@descolada.com> on Saturday March 27, 2010 @12:50PM (#31640512) Homepage

    Sorry, 512 or whatever base-2 sector size is not arbitrary - the disk controlling hardware / buffers / controllers / channels / etc and especially the transfer sizes, multipliers in headers, and so on are (still) base-2. If you ever do performance / capacity calculations or estimates for storage size, etc, you very fast find base-2 very handy.

    The disk size error is not a big deal - there always is an overhead that changes by storage type, file system, fixed physical characteristics, key / data compression used, replication, whatever - so? The public (and I think many in IT) really don't know and/or have to know more than if they have enough or need more!

  • Re:Annoying... (Score:5, Informative)

    by Kjella (173770) on Saturday March 27, 2010 @01:08PM (#31640706) Homepage

    Because the context is a problem every time you mix computers and what you're doing on a computer. Let's say you record a CD, 16 bits/sample @ 44.1 kHz. That's a bitrate of 16 * 44.1 = 705.6 kbit/s, right? What if I want to send it over the LAN too? What if I need to allocate a memory buffer, is it still 705.6 kbit/s? And what if I want to store it to disk, do I need to allocate 705.6 kbit per second of music? Computers aren't remotely consistent with themselves: a 100 Mbit LAN is 100,000,000 bits/second. Hard drives too, but they're hardly the only ones; floppies weren't even consistent with themselves, most being 1.44*1000*1024 bytes.

    Things get confusing all the time because a 1 MB, 1 kHz (1024*1024*1000) bus is not equal to a 1 kB, 1 MHz bus (1024*1000*1000), which is why nobody dealing with networks ever used kilo = 1024. The 56k modem is 56,000 bit/s, ISDN is 64,000 bit/s and so on right up to SATA 6 Gbit/s, which is 6,000,000,000 bit/s (and even more confusing because it's in 8/10 bit encoding, but that's another story). So both inside and outside the machine we're switching between base 2 and base 10 all the time.

    A particularly confusing item was codecs. Should they follow the "size" standard so a 128 kbit/s MP3 would take up 128 kbit/s, or the network standard so that a 128 kbit/s would take 128 kbit/s of network bandwidth? I think now most settled on k = 1000, that is to say if you encode a one second clip at 128 kbit/s it'll only take up 125 kbit on your disk. Confusing as fuck? Hell yeah. Let's just settle this and be done with it, with the i = base 2, without it base 10. Just forget the lame names, and let the prefixes do the talking. MB = megabyte, MiB = megabyte. That's what I'm doing at least.
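The bitrate arithmetic in this comment can be checked in a few lines. This is only a sketch: the helper names are invented here, and the 16 * 44.1 figure is taken as given from the comment (one channel of audio).

```python
# Sketch: the same bit counts read with k = 1000 and with k = 1024.
SAMPLE_BITS = 16
SAMPLE_RATE_HZ = 44_100


def kbit_decimal(bits: int) -> float:
    """Kilobits with k = 1000 (the network convention)."""
    return bits / 1000


def kbit_binary(bits: int) -> float:
    """'Kilobits' with k = 1024 (the old size convention)."""
    return bits / 1024


pcm_bits_per_second = SAMPLE_BITS * SAMPLE_RATE_HZ  # 705,600 bits each second
print(kbit_decimal(pcm_bits_per_second))  # 705.6
print(kbit_binary(pcm_bits_per_second))   # 689.0625

# One second of a 128 kbit/s stream (k = 1000), measured with k = 1024.
print(kbit_binary(128 * 1000))  # 125.0
```

The stream never changes; the 128 versus 125 gap comes entirely from which "k" the reader assumes.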

  • Re:Really annoying (Score:3, Informative)

    by Schraegstrichpunkt (931443) on Saturday March 27, 2010 @01:11PM (#31640722) Homepage

    I've been ... for 20+ years and I do _not_ want to change

    People like you are best ignored.

  • by gnasher719 (869701) on Saturday March 27, 2010 @01:21PM (#31640826)

    The only people who lie about this have been the HDD manufacturers. Wasn't there a class action about that some time back? I expect the court didn't understand the problem of my 120GB drive actually being under 112!

    It seems the court understood the matter very well and decided that when a hard drive contains 120 billion bytes, and the prefix "G" means "1 billion" as all international standards say, then calling it a "120 GB" hard drive is absolutely justified and correct, and it is not the hard drive manufacturer's fault if some idiot programmer displays it incorrectly as 112 GB.

  • by hanabal (717731) on Saturday March 27, 2010 @01:34PM (#31640918)

    The kilo prefix is derived from the Greek word ("chilioi"), meaning thousand. It was originally adopted by Antoine Lavoisier and his group in 1795, and introduced into the metric system in France with its establishment in 1799.

    So while "SI" wasn't around, it was already an established standard.

  • Re:Good move (Score:4, Informative)

    by kevingolding2001 (590321) on Saturday March 27, 2010 @02:54PM (#31641620)

    Before, the situation was simple.

    Everything not binary-represented-information related used base-10.

    Everything binary-represented-information related (computing related, bandwidth related etc) used base 2,

    According to Wikipedia, big bandwidth [wikipedia.org] is measured in base 10.

    I guess it was not so simple after all.

  • by prockcore (543967) on Saturday March 27, 2010 @04:17PM (#31642252)

    All file systems structures are in base-2 units.

    Well that's not true in the slightest.

    Let's look at the Apple II floppy disk, just because I happen to have actual stats on that from writing an emulator.

    The only part that's base-2 is the amount of data in a sector. 256 bytes.
    That 256 bytes is encoded in "6-and-2" encoding, making it actually take up 342 bytes on disk. Not base-2.
    There are 74 bytes of sector header, and self-syncing data attached to that, making each sector actually take up 416 bytes on disk. Not base-2.
    The self-syncing bytes are actually 6 sets of 10 bits each. 60 bits. Not base-2.
    Each track can hold approximately 6900 bytes. That allows around 16.5 sectors per track, but since you can't have half a sector (and the amount of data that can be stored on a track isn't absolute), they round down to 16 sectors per track, and leave the rest of the track unused. (Earlier formats only had 13 sectors per track because they used "5-and-3" encoding, so each sector took up more space on the track).
    The disk has 35 tracks per side. That's not base-2.

    The only thing involved in this disk that is base-2 is the amount of data stored per sector, and that was a completely arbitrary decision to have it match RAM.. you could actually store more data on a disk if sectors didn't store exactly 256 bytes.
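The sector arithmetic above can be reproduced as a rough sketch. The constants are the ones quoted in the comment; treating "6-and-2" as six payload bits per disk byte is an assumption made here to arrive at the 342 figure, and the variable names are illustrative.

```python
import math

# Figures quoted in the comment above.
DATA_BYTES = 256      # payload per sector
OVERHEAD_BYTES = 74   # sector header plus self-sync bytes
TRACK_BYTES = 6900    # approximate capacity of one track
TRACKS = 35

# Assumption: "6-and-2" stores 6 payload bits in each byte written to disk.
encoded_bytes = math.ceil(DATA_BYTES * 8 / 6)    # 342
sector_on_disk = encoded_bytes + OVERHEAD_BYTES  # 416

# Only whole sectors fit, so 16.5-ish rounds down to 16.
sectors_per_track = TRACK_BYTES // sector_on_disk
usable_bytes = sectors_per_track * TRACKS * DATA_BYTES

print(encoded_bytes, sector_on_disk, sectors_per_track, usable_bytes)
# 342 416 16 143360
```

Of these numbers, only the 256-byte payload is a power of two, which is the point the comment is making.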

  • by commodore64_love (1445365) on Saturday March 27, 2010 @05:31PM (#31642808) Journal

    The only time 1 KB == 1024 bytes is when discussing memory due to the base 2 nature of computers/transistors. In all other cases kilo == 1000 per the definition from ~200 years ago.

    When I say I have a 750k connection, I don't mean 750*1024. I mean 750,000 bits per second. My hard drive is 500 gigabytes or 500 billion bytes. A 20 kilohertz AM station is not 20*1024 but simply 20000 hertz (cycles per second) wide. And so on.

  • Re:Mod parent up (Score:1, Informative)

    by Bigjeff5 (1143585) on Saturday March 27, 2010 @07:48PM (#31643632)

    Unfortunately as everytime we leap another 10^3 we're off by another 2.4%, and by the time we get to 10^12 we're off by 10%.

    It only looks that way because we aren't leaping by 10^3, we are leaping by 2^10. The byte itself is 2^3.

    Notice any similarities? Base your kilos on binary units instead of decimals and kilobytes = 1024 is what you get.

    The problem is we are representing binary units in decimal, to make things easier on ourselves, and then trying to treat those units exactly the same as ordinary decimal units. They aren't exactly the same.

    If manufacturers didn't try to fudge their numbers by breaking convention, a "300gb" drive would actually show up as 300gb exactly in your computer. But manufacturers have fooled us. They use the 2^3 byte, just like everybody else, but they don't want to use 2^10 for figuring kilo, mega, and giga like everybody else.

    Basically the hard drive manufacturers have been screwing you for years, and you're defending them for it. You don't "lose" 2.4% per 10^3, the HD manufacturers have been lying to you by 2.4% each iteration by breaking the 2^10 convention.
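Whichever side is to blame, the size of the gap between the two conventions is easy to compute. A small sketch (the function name is invented for illustration) that compares 1024^n with 1000^n for each prefix step:

```python
# Sketch: how far the binary reading of a prefix sits above the decimal one.
def gap_percent(step: int) -> float:
    """Percentage by which 1024**step exceeds 1000**step."""
    return (1024**step / 1000**step - 1) * 100


for step, prefix in enumerate(["kilo", "mega", "giga", "tera"], start=1):
    print(f"{prefix}: {gap_percent(step):.2f}%")
# kilo: 2.40%
# mega: 4.86%
# giga: 7.37%
# tera: 9.95%
```

So the "2.4% per step" and "about 10% at 10^12" figures quoted at the top of this comment hold regardless of which convention is considered the correct one.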

    I'm kinda surprised to see that it is the Linux community that has fallen so hard for this deception.

  • by Artemis3 (85734) on Saturday March 27, 2010 @07:55PM (#31643680)

    I read the policy and consider it correct.

    There are two ways to fix the abuse of the SI standard for base-2:

    1. Correct the application to divide by 1,000 and keep on using SI prefixes.
    2. Correct the application to keep on dividing by 1,024 but use the IEC prefixes.

    So, use the IEC prefixes and you don't need to change much. It's just a little i.
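The two fixes listed above amount to pairing each divisor with the matching set of labels. A minimal sketch, not taken from Ubuntu's actual code; `format_size` and its `binary` flag are hypothetical names:

```python
# Sketch: divide by 1000 with SI labels, or by 1024 with IEC labels.
SI_UNITS = ["B", "kB", "MB", "GB", "TB"]
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB"]


def format_size(num_bytes: int, binary: bool = False) -> str:
    """Format a byte count using SI (base-10) or IEC (base-2) prefixes."""
    base, units = (1024, IEC_UNITS) if binary else (1000, SI_UNITS)
    value = float(num_bytes)
    for unit in units[:-1]:
        if value < base:
            return f"{value:.2f} {unit}"
        value /= base
    # Anything larger than the table covers stays in the largest unit.
    return f"{value:.2f} {units[-1]}"


print(format_size(2_561_880_064))               # 2.56 GB
print(format_size(2_561_880_064, binary=True))  # 2.39 GiB
```

Either output is unambiguous, because the label always tells the reader which divisor was used.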

    Furthermore, this was approved in 1998. Don't you think you had enough time to adapt by now?
    http://physics.nist.gov/cuu/Units/binary.html [nist.gov]

    I'm sure there are more distros and programs implementing this.

  • by Bigjeff5 (1143585) on Saturday March 27, 2010 @08:34PM (#31643884)

    SI doesn't own kilo, Greek does.

    The metric system is based on powers of 10. Oddly enough, kilobyte is based on 10 powers of two. Now, you say that isn't the same thing, and you're right. That's because decimal doesn't mesh well with binary, but we understand decimal much better than we do binary.

    Kilobytes, megabytes, and gigabytes are metric representations of binary. 2^10, 2^20, 2^30, it's the exact same concept. That's why they chose to use kilo, mega, and giga in the first place - because it is conceptually the same thing, and just as easy to understand if you know what it actually means.

    disk makers have been using kilo to mean 1000 for years, and it'll probably never be really sorted out.

    That's because disk makers have been trying to make their disks look bigger than the actually are for years. Coming up with a new unit that nobody understands, which fucks with the entire nomenclature used throughout the computer, all to accommodate hard drive manufacturers as they try to convince you that they are selling you more than they actually are, is fucking retarded.

    Whoever came up with the kibibyte probably works (or worked) for a hard drive manufacturer.

  • by street_astrologist (1522063) on Saturday March 27, 2010 @11:36PM (#31644784)

    By posting while logged in - duh

  • Re:Mod parent up (Score:4, Informative)

    by GigaplexNZ (1233886) on Saturday March 27, 2010 @11:59PM (#31644922)

    It only looks that way because we aren't leaping by 10^3, we are leaping by 2^10.

    Except that the hardware doing the "leaping" are the hard drives that are measured in base 10, so we are "leaping" by 10^3.

    The byte itself is 2^3.

    First of all, a byte is not always defined as 2^3, it depends on the architecture (you are thinking of the octet, which on most common hardware is equivalent to a byte). Also, the size of a byte is not governed by SI prefixes as it isn't a prefix. Applying an SI prefix to the byte should still follow SI prefix rules.

    Notice any similarities?

    I noticed that 2^3 is 7 orders of magnitude off (in a power of 2 system) being a match to the 2^10 convention. The only reason 2^10 is chosen is because it comes close enough to emulating 10^3. Why emulate when you can just use 10^3 directly?

    If manufacturers didn't try to fudge their numbers by breaking convention, a "300gb" drive would actually show up as 300gb exactly in your computer.

    While it might be a convention to use 2^10, it's not the SI standard (so it shouldn't try to redefine SI prefixes, it should use its own system). Also, on my computer it does report as 300GB in the partitioning tool that I use (cfdisk).

    You keep coming back to the byte being 2^3. This is a somewhat silly argument as the byte is the smallest addressable unit in RAM. The size of the byte itself is largely irrelevant to the discussion of the prefix definition.
