Fedora Metrics Help Whole Linux Community

Posted by kdawson on Tue Jan 30, 2007 03:42 PM
from the doing-the-numbers dept.
lisah writes "When Fedora released Fedora Core 6 late last year, the team decided to track the number of users with unique IP addresses who connected to yum in search of updates for a new installation of FC6. According to the data they collected, FC6 crossed the one-million user mark in just 74 days. Fedora Project Leader Max Spevack says that while it's great to use metrics to better understand what users want, the real value lies in its ability to encourage hardware vendors to offer more Linux-oriented goods and services. Spevack told Linux.com: '[W]e always say we wish hardware vendors had more [Linux-capable] drivers. Well, if you can go to them and say, "Hey, there's millions of people using this," then maybe they will listen. In the real world, you need data to prove your case. Well, here it is.'" Linux.com and Slashdot are both owned by OSTG.
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • But.... (Score:5, Funny)

    by CaymanIslandCarpedie (868408) on Tuesday January 30 2007, @03:45PM (#17818856) Journal
    Doesn't collecting data make you evil?
    • No (Score:5, Insightful)

      by DrYak (748999) on Tuesday January 30 2007, @04:01PM (#17819116) Homepage
      Collecting non-personally-identifying data that would be logged anyway during normal server operation (httpd/ftpd daemons will log connections anyway, whether or not the FC owners choose to do something with it) and publishing only the compiled form (the total number, as opposed to the complete obfuscated [rot5 scrambled ?] list, AOL-style) ISN'T EVIL (it's just like the "number of visitors" counters back in the old Web 1.0 days).

      Collecting data in an opt-in manner, like http://counter.li.org/ [li.org], to compile statistics ISN'T EITHER.

      Collecting data that doesn't necessarily need to be collected for technical reasons (IP address vs. Pentium serial number), without telling the user first and without asking the user's permission first, THAT IS EVIL (and regularly done by Microsoft and other objects of hatred of the /. crowd).
        • by Kelson (129150) * on Tuesday January 30 2007, @04:36PM (#17819700) Homepage Journal

          Except collecting the IP addresses then using them for marketing purposes is not necessary

          How are they using the IP address for marketing purposes? They're using the number of IP addresses. No one can take the information they've released and determine that a computer at x.x.x.x is running Fedora. (And the information they have, they would have had anyway -- just like Slashdot knows the IP address you posted from.) As the GP said, it's no different from a website processing its server logs and reporting that it had X unique visitors during period Y.

          Come to think of it, since yum fetches data over HTTP, it is a website processing its server logs and reporting the number of unique visitors.
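To make the "website processing its server logs" point concrete, here is a minimal sketch of counting distinct client addresses. It assumes Apache-style Common Log Format lines; the `/fc6/` path prefix, the sample log lines, and the function name are made up for illustration and are not Fedora's actual setup or method.

```python
def count_distinct_ips(log_lines, path_prefix="/"):
    """Count distinct client addresses among requests under path_prefix.

    Assumes Common Log Format, where the first whitespace-separated field
    is the client address and the seventh is the requested path.
    """
    seen = set()
    for line in log_lines:
        parts = line.split()
        if len(parts) < 7:
            continue  # skip malformed or truncated lines
        ip, path = parts[0], parts[6]
        if path.startswith(path_prefix):
            seen.add(ip)
    return len(seen)


# Hypothetical log lines; addresses come from the documentation ranges.
SAMPLE_LOG = [
    '192.0.2.10 - - [30/Jan/2007:15:42:01 +0000] "GET /fc6/updates/repodata/repomd.xml HTTP/1.1" 200 951',
    '192.0.2.10 - - [30/Jan/2007:15:42:05 +0000] "GET /fc6/updates/repodata/primary.xml.gz HTTP/1.1" 200 48211',
    '198.51.100.7 - - [30/Jan/2007:16:03:44 +0000] "GET /fc6/updates/repodata/repomd.xml HTTP/1.1" 200 951',
    '203.0.113.5 - - [30/Jan/2007:16:10:12 +0000] "GET /fc5/updates/repodata/repomd.xml HTTP/1.1" 200 940',
]

print(count_distinct_ips(SAMPLE_LOG, "/fc6/"))  # distinct addresses asking for FC6 data
```

Only the size of the set is reported; the addresses themselves never leave the function, which is the "compiled form" distinction made above.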

            • There might be an outcry if Microsoft did that, just because people hate Microsoft and think Microsoft is evil, but that wouldn't mean that doing it would be evil. (So, Microsoft may in fact be evil, but not necessarily everything they do is evil, and moreover, just because they could do something, doesn't make it evil.)

              There's nothing wrong with saying "x people accessed Windows Update this [year|month|day]." That's no different from the hit counters that used to exist on every web site. (And which were tacky, and I thank God that people finally realized this.)

              What would be evil, and the temptation they need to avoid, is to take their server logs and start mining them for data that can be sold or used for malicious purposes; i.e. personally identifying information about what users are using what versions of Windows, or even how often they're updating, etc.

              Aggregate information about hits is something that HTTP servers and their operators do all the time. Where it gets evil is when you have cookies tracking particular users across multiple sites, etc.
        • That's why the parent said "in compiled form." Red Hat isn't publishing the IP address list it has collected, it is compiling the number of unique IP addresses seeking FC6 upgrades and using that number as a statistic.

          This is no more 'evil' than the management of Dolphin Stadium in Miami counting the number of people who pass through the turnstiles and publishing that number to show how many people came to Miami to watch the Super Bowl.
    • Doesn't collecting data make you evil?

      Only if you call the process "activation" instead of "metrics".
      • Re:But.... (Score:5, Informative)

        by spevack (210449) * on Tuesday January 30 2007, @04:51PM (#17819968) Homepage
        the day they start requiring registration or creating GUIDs is the day I give the shove to Fedora -- and I've been an RH user for 8-9 years.

        As the "Fedora Project Leader", the Fedora buck stops with me, so to speak.

        And I promise you that I will NEVER require anyone to "register" Fedora in order to download updates, or stuff like that.

        Neither I, nor the Fedora Board, which is Fedora's governing body, will allow some sort of "required registration" in order to get the full Fedora experience.

        Download. Install. Update. If that's the extent of a person's interaction with Fedora, fine by me. We hope, of course, that there will be a fourth step, that being: Contribute

  • by quixote9 (999874) on Tuesday January 30 2007, @03:49PM (#17818906) Homepage
    I have legacy hardware, and too little knowledge, so I'm too afraid to switch from Core 3 to 6. God only knows what would break, and I sure don't know enough to work around it. But if I could get 6, I'd be in their statistic too. There's bound to be more people like me, who can't get 6 for some reason. So that number is a low estimate!
    • by Intron (870560) on Tuesday January 30 2007, @04:26PM (#17819504)
      So just fire up a live CD with a recent kernel and try it out. You don't have to upgrade if it doesn't work. Hardware drivers are in the kernel, so just testing the right kernel on your system will tell you whether it works (mostly).

      FC3 uses kernel 2.6.9
      FC6 uses kernel 2.6.18
    • Re: (Score:2, Informative)

      you need to learn to use Slackware, it is the best distro for old hardware...
    • by Znork (31774) on Tuesday January 30 2007, @04:41PM (#17819796)
      Personally, I rsync from a mirror and have a local repository, so I have a whole bunch of machines that don't get counted. Stuff like that will result in the numbers being a bit off.

      "so I'm too afraid to switch from Core 3 to 6."

      If you upgrade that rarely, I'd suggest you take a look at CentOS. CentOS 4 will be a far smaller leap (RHEL4 is close to FC3/FC4), and you'd be on a maintained platform again.
      • Not to mention multiple computers hiding behind NAT; they would probably appear to be one system, due to the single IP address, unless the software for determining "hits" is smart enough to look at the transactions and realize that the same IP address just requested the same data 4 times over, and thus is probably 4 machines on a LAN behind a NAT router. I suspect that it is not, though, and thus you're almost certainly underestimating the number of installed systems.

        That doesn't mean the metric is worthles
        • Re: (Score:3, Informative)

          From the article:
          "We believe it is reasonable to equate a "new IP address checking in" with "a new installation of FC6", with the following caveats:
          1. Users who have dynamic IP addresses will likely be counted multiple times, which inflates the number by some amount.
          2. Users who are behind NAT, corporate proxies, or who rsync updates to a local mirror before updating will not be counted at all.

          The anecdotal evidence that we receive from different groups, companies, and organizations makes it quite clea
  • Saddly... (Score:5, Interesting)

    by DrYak (748999) on Tuesday January 30 2007, @03:49PM (#17818908) Homepage
    Sadly, this metric will very quickly be attacked because of all the users who have broadband connections with an IP that changes every 24 hours.

    Maybe counting how many different IPs downloaded *1* given critical update would be more precise (based on the assumption that even users with a non-permanent IP will download the patch once to secure their machines, and then won't download it again).

    But even if it lacks precision, it is still a good indicator that Linux *IS* in fact popular and much more widespread than people think.
    It just lacks sales figures to prove it. ...

    Especially when compared to the many "Vista didn't get a warm welcome" reports we read these days.
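A rough sketch of the "count one given update" idea, under the stated assumption that each machine fetches that file exactly once. The request data, the file name `example-security-fix.rpm`, and both function names are hypothetical; this is not how Fedora computes its figure.

```python
def distinct_ips(requests):
    """Distinct addresses across all requests (inflated by changing IPs)."""
    return len({ip for ip, _path in requests})


def distinct_ips_for_file(requests, filename):
    """Distinct addresses that fetched one specific file."""
    return len({ip for ip, path in requests if path.endswith(filename)})


# Hypothetical traffic: the 198.51.100.x machine changes address daily,
# the 203.0.113.9 machine keeps its address.
REQUESTS = [
    ("198.51.100.1", "/updates/repomd.xml"),
    ("198.51.100.2", "/updates/repomd.xml"),
    ("198.51.100.2", "/updates/example-security-fix.rpm"),
    ("198.51.100.3", "/updates/repomd.xml"),
    ("203.0.113.9", "/updates/repomd.xml"),
    ("203.0.113.9", "/updates/example-security-fix.rpm"),
    ("203.0.113.9", "/updates/repomd.xml"),
]

print(distinct_ips(REQUESTS))                                       # every address seen
print(distinct_ips_for_file(REQUESTS, "example-security-fix.rpm"))  # one hit per machine
```

In this toy data two machines show up as four addresses overall but as two when only the single update is counted. Machines behind NAT would still collapse into one address either way.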
    • Re: (Score:3, Insightful)

      Sadly this metric will be very quickly attacked because of all users who have broadband connections with IP changing every 24 hours.
      All users? I don't think my cable IP address (dynamically assigned) has changed in over a year.
      • because of all users who have broadband connections with IP changing every 24 hours.

        All users?

        You misplaced the invisible parenthesis: "all (users who have broadband connections with IP changing...)"

        "with" must refer to "connections" and cannot refer to "users" in that sentence.

      • All users? I don't think my cable IP address (dynamically assigned) has changed in over a year.

        Comcast users usually don't see an IP change unless they powercycle their modem and restart their computer or router. When I was on ATTBI, my IP remained the same until we switched to Comcast's IP block.

        Some DSL users constantly gain a new IP when their IP lease is up. It's unfortunate that DSL has gone this route as it used to be a guaranteed static IP.

        But in general, this "statistic" means absolutely squat.
        • Comcast users usually don't see an IP change unless they powercycle their modem and restart their computer or router. When I was on ATTBI, my IP remained the same until we switched to Comcast's IP block.

          Well, I have Comcast cable and neither rebooting, nor powercycling the cable modem will get a new IP address. I think it is tied to the computer's MAC address. When we moved house, I got a new Cable account, a new cable modem and even then, I still got the same IP address at my new house (I think it was sti

          • Well, I have Comcast cable and neither rebooting, nor powercycling the cable modem will get a new IP address. I think it is tied to the computer's MAC address. When we moved house, I got a new Cable account, a new cable modem and even then, I still got the same IP address at my new house (I think it was still ATTBI then).

            Your lease is tied to both MAC addresses but it will expire over time. If no one else is in line to nab the IP you are using when your lease expires and you restart/powercycle, you'll regai
        • But in general, this "statistic" means absolutely squat. No one is going to give a shit if 100 million people downloaded something -- Microsoft is what managers hear the most about and that's what they are generally inclined to want.

          The point is that if you can go to ATI or Netgear and say 'you are going to lose X million sales if you don't develop a Linux driver', then ATI & Netgear will pay more attention than if you go to them & say 'we want drivers for Linux'. It's about numbers. When Marketdr

    • Re:Saddly... (Score:5, Informative)

      by spevack (210449) * on Tuesday January 30 2007, @03:55PM (#17819026) Homepage
      Actually, the Fedora folks address that very point. Quoting from the Fedora Project wiki, and its page on Statistics:

      "Accuracy of metrics

      We believe it is reasonable to equate a "new IP address checking in" with "a new installation of FC6", with the following caveats:

      1. Users who have dynamic IP addresses will likely be counted multiple times, which inflates the number by some amount.

      2. Users who are behind NAT, corporate proxies, or who rsync updates to a local mirror before updating will not be counted at all.

      The anecdotal evidence that we receive from different groups, companies, and organizations makes it quite clear that group (2) is significantly larger than group (1). As such, we believe that the true numbers in the field are higher than the numbers on this page."
      • The question is: will dynamic IPs result in too many or too few hits?
        If you count the IP during installation, you are most likely to complete that installation using the same IP. Then the next time somebody installs using that IP, it will not be counted as new. So both dynamic IPs and people behind NAT might actually give a lower estimate than the actual value.
      • I'll concur with that quote. I have enough Fedora boxes behind NATs at home and at work to make up for several dozen dynamic IPs.

    • the team decided to track the number of users with unique IP addresses who connected to yum in search of updates for a new installation of FC6.

      They're counting how many new systems check in for the first time, and most people probably aren't constantly reinstalling and downloading updates, so there's very little you could attack in these figures.
    • Re: (Score:3, Informative)

      The numbers will be inflated, but also deflated by places like the one where I work that have multiple FC6 hosts behind the same router.
  • I don't think commercial support is about the number of users; it's more about vendors not wanting their precious IP laid bare to anyone who can download the kernel source (i.e. everyone).

    Also, in these times of big companies patenting everything, the source could reveal infringement.
  • by ColonelPanic (138077) <pmk@googl[ ]om ['e.c' in gap]> on Tuesday January 30 2007, @03:51PM (#17818938)
    IP addresses are necessarily unique ("one of a kind"). You mean "distinct" here.
  • by RyoShin (610051) <[moc.liamg] [ta] [orakut]> on Tuesday January 30 2007, @03:54PM (#17818986) Homepage Journal

    Well, if you can go to them and say, "Hey, there's millions of people using this,"
    Actually, it's a million computers using this (that's actually at least a million computers, as multiple PCs may be behind one public IP). Especially amongst the more computer-oriented people (of which the Linux community has many), it's not uncommon to have more than one computer running the same OS. I myself have three computers, two of which run Windows (the third is being put together). While these are tied to one DSL line, one of them, a laptop, may travel to other wireless networks and thus change IPs, so I could be recorded under two unique IPs but be only one person.

    Not saying there isn't a vast number of Linux users (I'm sure there are well over a million individual Linux users - that's a third of 1% of just the American population), just that numbers from data like this can be skewed.
    • Re: (Score:3, Insightful)

      You have it backwards. Are you going to download 3 fedora CDs because you have 3 computers? Maybe if they are differing archs... but that's not normally the case. Thus, the number would be LARGER than the one they gave, because many people use the same CD for more than one install, give their CDs away after using them, etc.
      • Re: (Score:3, Informative)

        You are quite correct; one person would not download the CD three times.

        However, that's not how they're collecting the data:

        the team decided to track the number of users with unique IP addresses who connected to yum in search of updates

        While you need only one CD to do multiple installs, it is my understanding that each machine has to run YUM itself. They've also thought of what you mentioned.

        According to Spevack, it's not enough to simply count how many times the distribution has been downloaded

        Now, the

    • This is good enough for the intended purpose. Every computer is hardware that may use a peripheral device. Vendors will have to pay attention and develop drivers if the numbers are high enough.
  • by currivan (654314) on Tuesday January 30 2007, @03:56PM (#17819030)
    I just installed FC6 on a machine yesterday, and they made it impossible to do anything without connecting to their server. I'm keeping the machine off the network, but apparently there's no way to install packages from the DVD without first downloading the update lists from their mirrors.

    The Add/Remove gui (and yum) crashes if DNS isn't available. After some research, I was able to hack the yum .repo files to point to the DVD instead of the internet, but it still crashes with mysterious errors about media uris. I finally gave up and installed Ubuntu instead. So no, this doesn't help the whole Linux community. We'd be furious if Microsoft imposed this sort of requirement on new installations.
    • Re: (Score:2, Informative)

      What about yum --disablerepo=* localinstall or rpm ?
      • I used --disablerepo to get rid of the ones that pointed to the web, and only included my custom DVD repo, but yum dies with another error about a media uri that didn't appear anywhere in the files. It mangled the file:///media/disk uri and put a bunch of random digits in front of it, then complained it doesn't exist. Maybe I could debug it if I actually had the relevant packages on the machine, but there's no reason to put up with this. I shouldn't even have to know about yum, let alone rpm, to install
        • Re: (Score:2, Informative)

          Unfortunately the repodata provided on the CD & DVD is not useable by yum but creating a local yum repository [city-fan.org] is quite easy once you know how.

          Installing packages from the original media is great just after you've loaded the system, but remember the good old days when you would be given a prompt like: to complete this change you need to insert disk 3 of the installation media. Good luck finding the original disks a year or two after installing the PC.

          I believe the majority people are happy that y

    • I just installed FC6 on my macbook pro over the weekend, and I had no internet connection at all during the entire process (I regularly work offline). It worked fine, so I can only assume that your case is an isolated incident.
      Regards,
      Steve
    • Huh? (Score:3, Informative)

      I just did a retro-fit upgrade and an install on two machines and neither went to the "yum" repository mirrors to do an update till after they finished their first reboot where I had to activate the update manually (and get the gpg keys installed).

      - I remember that "install" at some point gave me an option to install against latest package in the "yum" repositories, which I do not do for speed.
      - I remember the "upgrade" and "install" screens from Anaconda being different. The "upgrade" never asked me to up
  • One issue they mention (and many people here will mention) is

    "1) Users who have dynamic IP addresses will likely be counted multiple times, which inflates the number by some amount."

    To counteract this, once you hit the 6-month mark you simply delete IPs that haven't been used in 1-2 months. By doing that you practically guarantee that whatever number you have is an underestimate, and that number becomes a lot more authoritative.
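The pruning step could look something like the sketch below. It assumes you keep a last-seen date per address; the 60-day window, the sample dates, and the function name are illustrative choices, not anything Fedora has said it does.

```python
from datetime import date, timedelta


def prune_stale(last_seen, today, max_idle_days=60):
    """Keep only addresses seen within the last max_idle_days days."""
    cutoff = today - timedelta(days=max_idle_days)
    return {ip: seen for ip, seen in last_seen.items() if seen >= cutoff}


# Hypothetical last-seen table, roughly six months after release.
TODAY = date(2007, 7, 30)
LAST_SEEN = {
    "198.51.100.1": date(2007, 7, 29),  # still checking in
    "198.51.100.2": date(2007, 2, 1),   # probably an old dynamic address
    "203.0.113.9": date(2007, 6, 15),   # still within the window
    "203.0.113.10": date(2007, 5, 1),   # idle for about three months
}

active = prune_stale(LAST_SEEN, TODAY)
print(len(active))  # addresses that survive pruning
```

The trade-off is the one described above: machines that rarely update get dropped along with stale dynamic addresses, so the surviving count leans low rather than high.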

    Still it's awesome to see the numbers for Fedora are that high considering the dissa
      • Re:Sweet (Score:5, Funny)

        by quantaman (517394) on Tuesday January 30 2007, @04:18PM (#17819378)

        I'm ready to move. So where should we put the city?
        Well I was thinking Canada or Europe, heck why not Luxembourg [wikipedia.org]? With a population of only 465,000 we'd make up a majority of the population and be able to form a government.

        Welcome to Fedoraland!
        • ... why not Luxembourg? With a population of only 465,000 we'd make up a majority of the population and be able to form a government.

          For starters because Luxembourg won't let you move in and get citizenship all that easily.

          And the population is rich enough to enforce their will: Pretty much every adult is an officer of several international corporations, at some serious pay each. This is because Luxembourg's laws make it advantageous to headquarter there, but require at least one citizen as a major officer.
          • ... why not Luxembourg? With a population of only 465,000 we'd make up a majority of the population and be able to form a government.

            For starters because Luxembourg won't let you move in and get citizenship all that easily.

            And the population is rich enough to enforce their will: Pretty much every adult is an officer of several international corporations, at some serious pay each. This is because Luxembourg's laws make it advantageous to headquarter there, but require at least one citizen as a major officer.

            Besides: Taking over by settling creates serious (sometimes deadly) opposition from those already there who FORMERLY ran their own government.

            If you want to create a settlement where you can run your own government up to a significant level, try Oregon. If they remain true to their history, once you've established a significant colony of like-minded people, if you have a beef with the rest of your county they'll split it and give you your own county composed of you and your like-minded settlers. Then you can elect your own supervisors and sheriff, tax each other, maintain the roads your way, etc.

            (Which is what makes the Rajneeshees' attempted takeover of Wasco County - by food-poisoning a salad bar at a local restaurant shortly before the election - such a stupid move: The state had already offered them a county composed of their own settlement and the roads to it.)

            -1 Buzzkill

        • Re:Sweet (Score:5, Funny)

          by gclef (96311) on Tuesday January 30 2007, @04:49PM (#17819930)
          Fedoraland? Bah. Tuxembourg!
  • by Locutus (9039) on Tuesday January 30 2007, @04:00PM (#17819110)
    Given the numbers coming out, I'd think that it sure can't hurt for these guys to post the number they are.

    Here (2nd page) Mark Shuttleworth mentioned Ubuntu having 8 million active users:

    http://redherring.com/PrintArticle.aspx?a=20497&sector=Briefings [redherring.com]

    Now what are the hardware vendors waiting for? Permission from Microsoft?

    LoB
    • by spevack (210449) * on Tuesday January 30 2007, @04:17PM (#17819366) Homepage
      The key difference, IMHO, is that in Fedora we are trying to demonstrate *where* our numbers are coming from, as opposed to just giving a number with no context.

      It's also important to realize that this metric is just for Fedora Core 6, not "all instances of Fedora 1-6".
      • Re: (Score:3, Insightful)

        I wasn't interested in a "my numbers are bigger than yours" discussion and obviously, there are more TOTAL Fedora users than the number of Fedora 6 users.

        And yes, it's a big deal having data and the technique for getting those numbers. Shuttleworth didn't state where the numbers came from but also wasn't asked. My guess is those numbers came from their date servers since I've seen default Ubuntu installations setting /etc/default/ntpdate to point to ubuntu.com servers.

        Anyway, it is great these numbers are get
  • Why only now? (Score:5, Insightful)

    by Pecisk (688001) on Tuesday January 30 2007, @04:01PM (#17819112)
    Personally, I don't understand the shyness/lack of will/underrating of ourselves in this case. Look at Firefox: they built a whole PR campaign around those numbers! And in case you think they don't matter... THEY DO. They are real numbers that can be verified, checked, compared, etc.

    I think most of the problem with using the "look at the numbers, the user count is huge, man" meme is that a lot of geeks simply don't see this argument as valid (those numbers can't be wrong, etc. etc.). They would rather convince hardware developers that they MUST get those damn specs out (by some hidden moral or simple common sense, which, I agree, exists in this case too) than try to wow them over to the community side (presentations, numbers, proof of concept (you don't have to care about the driver, etc.)).

    We need more actions like SpreadFirefox, period. Done right, they just work.
  • by spevack (210449) * on Tuesday January 30 2007, @04:06PM (#17819202) Homepage
    I'm the guy who actually maintains that Statistics page on the Fedora wiki.

    The real "story" here is a couple of things:

    THING 1 -- We're making the best effort that we can at showing the world how many installations of Fedora Core 6 we know about.

    THING 2 -- We're being upfront about the assumptions and caveats that go along with that number. Quoting:

    "Accuracy of metrics

    We believe it is reasonable to equate a "new IP address checking in" with "a new installation of FC6", with the following caveats:

    1. Users who have dynamic IP addresses will likely be counted multiple times, which inflates the number by some amount.
    2. Users who are behind NAT, corporate proxies, or who rsync updates to a local mirror before updating will not be counted at all.

    The anecdotal evidence that we receive from different groups, companies, and organizations makes it quite clear that group (2) is significantly larger than group (1). As such, we believe that the true numbers in the field are higher than the numbers on this page."

    THING 3 -- We're also being upfront about how that number is generated.

    I'm not trying to spin the data in any way. I'm just putting it up there, and trying to do so as objectively as possible. Anyone can draw their own conclusions, or compare it to data from other distributions, if you can find similar reporting.
    • It's cool that Fedora is sharing this stuff openly. It would be great if you could also be upfront about the plans for FC7. Which variant of these FC7 Metrics proposals [fedoraproject.org] have you decided on going with?
      • Re: (Score:2, Informative)

        Well, we'll keep doing what we currently are doing. In addition, the idea currently under consideration is an OPTIONAL screen in firstboot where a user can choose to let us know more about their hardware and/or installed package set.

        KEY POINT TO MAKE: If a user says "no, go away and leave me alone", we will respect that.

        To anyone who wants to be part of the discussion, feel free to follow the Fedora Infrastructure list.

        http://www.redhat.com/mailman/listinfo/fedora-infrastructure-list [redhat.com]
  • I use Kubuntu, but the concept is the same. When I use aptitude, it hits something.archive.ubuntu.com, and I get counted as one person, since I am behind NAT.

    However, I have six machines, all of them on Ubuntu server or Kubuntu. One is AMD64, the rest are i386.

    So, that skews the numbers for sure.

    I wish the Linux Counter were taken more seriously. They used to put an automated email message in Slackware, so the likelihood of you registering was high. Otherwise, it is good for comparative studies only, not