Caldera Unix Linux

Claimed Proof That UNIX Code Was Copied Into Linux 578

Posted by kdawson
from the copied-by-a-spider-on-lsd dept.
walterbyrd writes "SCO's ex-CEO's brother, a lawyer named Kevin McBride, has finally revealed some of the UNIX code that SCO claimed was copied into Linux. Scroll down to the comments where it reads: 'SCO submitted a very material amount of literal copying from UNIX to Linux in the SCO v. IBM case. For example, see the following excerpts from SCO's evidence submission in Dec. 2005 in the SCO v. IBM case:' There are a number of links to PDF files containing UNIX code that SCO claimed was copied into Linux (until they lost the battle by losing ownership of UNIX)." Many of the snippets I looked at are pretty generic. Others, like this one (PDF), would require an extremely liberal view of the term "copy and paste."

  • More details, and a downloadable archive here - because there's no telling how long those files will remain on McBride's blog.

    Also, we find out more about streams, and how SCOsource was bogus.

    • For those not logged in who don't see the download url in my sig [slushdot.com]

      "In a blog post dated July 10th, 2010, Kevin McBride has leaked almost 50 of the code comparisons that were submitted in evidence in SCO v. IBM. You can download the archive. [slushdot.com]

      Read on to view individual files if you don't want to download the whole thing.

      Linux STREAMS

      We also learned that the whole STREAMS fuss was not about linux, but about a product distributed by gcom, a provider of legacy solutions.

      Their Linux STREAMS (LiS) product provides a couple of loadable drivers that would intercept calls to the old streams api and convert them. In other words, far from the allegations that the linux kernel contained code that infringed streams, it's evident from the need of an add-on loadable module that the linux kernel does not contain any STREAMS code.

      Of particular note, and probably a source of much consternation to SCO and their proponents, is that LiS itself doesn't implement streams either, just does protocol translation. So neither linux nor LiS contains infringing code.

      The whole end-user $699 license was a scam

      In my view, contract violations by IBM would not result in liabilities by other Linux users.

      So according to Kevin McBride, one of the lawyers who worked on the case, there was no reason for end users to take out a license. It's logical to conclude that SCOsource was a protection scam. So what happened? To me, it looks like SCO lawyer-shopped until they found attorneys who were willing to go along with the scheme for a price - everyone has their price, and in this case, it was $30,000,000.00.

      The Appeal of SCO's loss to Novell - Novell will probably win.

      Will Novell win the current SCO appeal? Probably. Will Novell donate the UNIX copyrights to the Linux community if it wins the current appeal? Probably-although Novell's Linux activities have been difficult to predict in recent years.

      So it's pretty much as we suspected all along.

      • by Runaway1956 (1322357) on Sunday July 11, 2010 @11:10PM (#32871126) Homepage Journal

        Ho-hum.

        I'm not a coder. I couldn't create a kernel if my life depended on it. I couldn't code a hungry cat to catch a mouse. 1/2 or more of what I read in code is gibberish to me.

        But, one thing is pretty sure. Linus Torvalds wrote Linux, and his programming background came directly from Unix. OF COURSE he is going to write the same commands he has used a thousand times in the same way. OF COURSE there are going to be lines that look very much the same, sometimes even identical.

        Now, open any dozen books that are 50,000 words in length. Search for strings that are duplicated between the books. Entire sentences, or phrases, it hardly matters. Just do the search. Anyone who is used to playing with databases can probably search those dozen books, and find numerous instances of phrases that were copy/pasted from one author's book to another. In fact, I'll bet that technical and factual books will have a higher incidence of matching phrases and sentences than works of fiction - but fiction will have its share as well.

        And, before we do this data base mining, we need to set up some method of assigning a variable string for proper names. In the wife's romance novels, we would be looking for " $ kissed $ ". It will be repeated so often that you can't help seeing the plagiarism. One author after another rips it from his predecessors.

        50 instances are claimed for "copying". Out of how many lines of code? Good grief. I guess it should have been mandatory that Linus write any code not only in a different programming language, but in a different language than English. Then, he MIGHT have been safe. Maybe. Not likely though, because SCO can probably read Chinese, or hire some scumbag lawyer who can.

        • I've seen cases where me and another person are working on code independently, and when it came time to merge, we had both ended up creating the same variable names, and pretty much the same code.

          About the only difference was in indentation - mine is "always put the opening brace on the same line, one true tab, else in same column as if, no braces for any single-line condition to a control structure (for, if, else, while, etc)". Even the comments were pretty much the same.

          In this case, though, some of the code is from BSD - which is perfectly fine.

          • by bennomatic (691188) on Monday July 12, 2010 @12:17AM (#32871480) Homepage
            I learned perl from someone who named all his variables with variations on "foo" and "bar". Back in those days, if I was writing something short and simple enough, it was hard for me to break the habit of naming things $foo, $bar, $boo, $far, $foofoo, etc. I'll bet a lot of our code looked like it was from the same person :)
            • by sg_oneill (159032) on Monday July 12, 2010 @02:08AM (#32871962)

              And here's the magic. Linus learned his style by closely reading Andrew Tanenbaum's books and reading the Minix code, which of course is what you're supposed to do with Minix. So did most OS coders who had their education back then.

              The end result, of course, is that everyone's code ends up looking like Tanenbaum's, which is not a bad thing; the guy is up there with the gods in terms of importance to O/S theory.

              • by LWATCDR (28044) on Monday July 12, 2010 @10:57AM (#32874628) Homepage Journal

                Actually it is even simpler than that. The code in the PDF I saw was for ELF.
                They were all typedefs. ELF is a well-documented format, and none of the code that shows copying was actual functional code.
                As I was reading the code I was thinking just how trivial the example was, but also how well written both .h files were. You could tell exactly what each variable in the typedef did. It also looked like a lot of my own code when I am having a good day.
                Also, these are .h files! They are not functional code blocks, just definitions. Of course the definitions for the typedefs of a well-documented file format will look a lot alike!
                It is a huge duh, but an attorney who knows nothing about programming might not understand that.
                If this was an example of the infringement I would say the court did a great job when they tossed it out.

          • Re: (Score:3, Interesting)

            by gringer (252588)

            mine is "always put the opening brace on the same line, one true tab, else in same column as if, no braces for any single-line condition to a control structure (for, if, else, while, etc)"

            Coding style like this makes me cringe, particularly the thing about no braces for single-line conditionals -- it makes it far too easy to make mistakes because you indent code and forget that indentation doesn't mean it's part of the conditional (unless you are using python, of course).
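The pitfall described above can be sketched in a few lines of C. This is only an illustration; the function names are made up for the example and do not come from any code in the case. Some compilers can warn about this pattern (GCC has -Wmisleading-indentation), but the code is valid C either way.

```c
/* Hypothetical example: counts how many statements run for a given flag. */
int actions_without_braces(int flag)
{
    int actions = 0;
    if (flag)
        actions++;      /* governed by the if */
        actions++;      /* indented to match, but runs unconditionally */
    return actions;
}

/* Same intent with braces: both statements belong to the conditional. */
int actions_with_braces(int flag)
{
    int actions = 0;
    if (flag) {
        actions++;
        actions++;
    }
    return actions;
}
```

With a zero flag the braceless version still performs one increment, which is exactly the kind of mistake the indentation hides.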

        • by MSG (12810) on Monday July 12, 2010 @03:15AM (#32872264)

          OF COURSE he is going to write the same commands he has used a thousand times in the same way

          I'm sure this is one of the reasons it's best to call the system GNU: Linus didn't write any of the "commands". Linus wrote a kernel and GNU ended up adopting it. The GNU project wrote the system "commands".

          Just do the search.

          Trivia: Actually, the people at exbiblio found that there is very little repetition of text in literature. Any four or five word sequence in a common magazine article is likely to appear in very few or no other texts. That fact is foundational to their technology.

        • by jimicus (737525) on Monday July 12, 2010 @04:32AM (#32872516)

          Now, open any dozen books that are 50,000 words in length. Search for strings that are duplicated between the books. Entire sentences, or phrases, it hardly matters. Just do the search. Anyone who is used to playing with databases can probably search those dozen books, and find numerous instances of phrases that were copy/pasted from one author's book to another. In fact, I'll bet that technical and factual books will have a higher incidence of matching phrases and sentences than works of fiction - but fiction will have its share as well.

          Actually, that's not true. There is some evidence to suggest you only need a remarkably short string of words to uniquely identify a piece of English prose - it's this kind of thing that cheating-detection algorithms rely on.

          But we're talking about a structured programming language - with far more structure and rules than the English language - and the things that are at issue are by and large implementations of existing standards. The final link in TFS is a comparison of ELF utility header files, FFS. They've got to look fairly similar or they won't be any use for dealing with ELF executables! Even then they're sufficiently different that it would probably have been easier to write from scratch than it would be to execute the "copy/paste/obfuscate" cycle that is being alleged.
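As a rough illustration of why ELF-related code converges, here is a minimal sketch of a check for the four identification bytes that the ELF format fixes at the start of every file (0x7f followed by 'E', 'L', 'F'). The macro and function names below are invented for this example and are not taken from either header in the exhibit; the byte values are the part any implementation has to share.

```c
#include <stddef.h>

/* Values fixed by the ELF format itself, not chosen by any implementer.
   The macro names are hypothetical; real headers use their own names. */
#define DEMO_ELF_MAGIC0 0x7f
#define DEMO_ELF_MAGIC1 'E'
#define DEMO_ELF_MAGIC2 'L'
#define DEMO_ELF_MAGIC3 'F'

/* Returns 1 if buf begins with the ELF identification bytes, else 0. */
int looks_like_elf(const unsigned char *buf, size_t len)
{
    if (buf == NULL || len < 4)
        return 0;
    return buf[0] == DEMO_ELF_MAGIC0 && buf[1] == DEMO_ELF_MAGIC1 &&
           buf[2] == DEMO_ELF_MAGIC2 && buf[3] == DEMO_ELF_MAGIC3;
}
```

Two people writing this independently can vary the names and the formatting, but not the constants or their order.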

          • by silentcoder (1241496) on Monday July 12, 2010 @07:30AM (#32873084) Homepage

            >But we're talking about a structured programming language - with far more structure and rules than the English language

            Not to mention a far smaller vocabulary, the complete absence of abstract forms of speech (no metaphors, similes) and in fact of even fundamental sentences.
            The vast majority of sentences in a programming language are verb(subject); THAT'S it. A rare few have an "object" (e.g. substr(S1,S2)), but at heart that's 99% of the lines in a program. Simple commands. There are identifiers, control concepts (loops and conditionals) and structural stuff (classes, functions and the like) but these make up very little of the bulk. The implementation section consists of commands and variables for them to act on.
            Thus for the same algorithmic task, barring minor changes in indentation and identifier naming (which will be minor because both are matters on which standards exist, and within organisations some standard or other is usually enforced), the statistical likelihood of two programmers writing an identical solution to the problem is very high. After all, programming is maths and there are only so many ways you can calculate the same equation - which is basically all any algorithm does.

            You need a lot more than a few functions with identical structures to prove copyright violation when the scope for individual change is that much more limited. Creativity in programming is VERY rarely coming up with a NEW algorithm for an old task. Nearly always it lies in how we combine algorithms with one another to solve the bigger problems. The bits and pieces of code are like nuts and bolts, every engine has a million of them and they all look pretty much the same.

        • by MrHanky (141717) on Monday July 12, 2010 @05:03AM (#32872616) Homepage Journal

          Is that so? Let's see if we take a phrase from your own comment: "a higher incidence of matching phrases [google.com]". One hit. Not bothering with linking to them all, but how about "rips it from his predecessors"? One hit. "strings that are duplicated between the books"? One hit. "his programming background came directly from Unix"? One hit. "open any dozen books"? One.

          I have, of course, duplicated them in this comment, meaning there will be two hits very soon. BTW, these are all the strings I searched for, giving your comment a 100% originality rating (admittedly, I didn't search for "I'm not a coder", which I expect would show up several times).

          Duplication of whole sentences in ordinary human language is actually quite uncommon for all but the most trivial declarations and stock phrases ("Just do the search" gives 3 million hits; "Just do the twist" gives 105 000).
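The kind of search being argued about here can be sketched directly: count how many runs of k consecutive words from one text also appear, word for word, in another. The following is a minimal, brute-force sketch under simplifying assumptions (words are whitespace-separated, matching is case-sensitive, punctuation stays attached to words, and texts longer than MAX_WORDS words are truncated). The function names are made up for the example, and the static buffers mean it is not reentrant.

```c
#include <ctype.h>
#include <string.h>

#define MAX_WORDS 4096

/* Split text into whitespace-separated words, recording where each word
   starts and how long it is. Returns the word count (at most MAX_WORDS). */
static size_t split_words(const char *text, const char **start, size_t *len)
{
    size_t n = 0;
    const char *p = text;
    while (*p != '\0' && n < MAX_WORDS) {
        while (*p != '\0' && isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            break;
        start[n] = p;
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        len[n] = (size_t)(p - start[n]);
        n++;
    }
    return n;
}

/* Count the positions in text a where a run of k consecutive words also
   occurs, word for word, somewhere in text b. */
size_t shared_ngrams(const char *a, const char *b, size_t k)
{
    static const char *wa[MAX_WORDS], *wb[MAX_WORDS];
    static size_t la[MAX_WORDS], lb[MAX_WORDS];
    size_t na = split_words(a, wa, la);
    size_t nb = split_words(b, wb, lb);
    size_t count = 0;

    if (k == 0 || na < k || nb < k)
        return 0;

    for (size_t i = 0; i + k <= na; i++) {
        int found = 0;
        for (size_t j = 0; j + k <= nb && !found; j++) {
            size_t m = 0;
            while (m < k && la[i + m] == lb[j + m] &&
                   memcmp(wa[i + m], wb[j + m], la[i + m]) == 0)
                m++;
            found = (m == k);
        }
        count += (size_t)found;
    }
    return count;
}
```

For instance, shared_ngrams("open any dozen books that are long", "please open any dozen books today", 4) finds exactly one shared four-word run. Running the same count over source files rather than prose is where the two sides of this thread disagree about what a match means.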

    • by goombah99 (560566) on Sunday July 11, 2010 @10:55PM (#32871028)

      Comparing a variable named elf_t_arname to one named elf_c_arname is not very convincing. The suffix is generic, the prefix is activity specific, and the middle letter is presumably some datatype indicator.
      Where it gets dicey is when there are structs and every variable in one struct has a somewhat similarly named variable in the other. This does arouse suspicion. Even if you forget the variable names for a moment, any pattern like bool,real,real, *real, int, *char,*char,*bool,.... that is identical between two structs would be an improbable occurrence, and when you see it in back-to-back structs it becomes nearly impossible to happen by chance.

      The key question then is whether there is some structural reason why the two might share an identical struct. For example, is there an ELF spec that defines a protocol for communication or the way a record on disk is serialized (i.e. packed)? If so, then of course these will occur like this. Or perhaps both are derived from a common BSD ancestor, so both vary only slightly.

      If the answer is no, there was no reference implementation and no ancestor, then I'd say that for examples like 251, McBride has some evidence.

      However for most of the ones he cites there is no there, there.

      • by johnmoe (103704) on Sunday July 11, 2010 @11:19PM (#32871180)
      • by Fizzl (209397) <`ten.lzzif' `ta' `lzzif'> on Monday July 12, 2010 @12:44AM (#32871596) Homepage Journal

        This does arouse suspicion. Even if you forget the variable names for a moment, any pattern like bool,real,real, *real, int, *char,*char,*bool,.... that is identical between two structs would be an improbable occurrence, and when you see it in back-to-back structs it becomes nearly impossible to happen by chance.

        Umm, no. This is only logical. If he had the documentation for the data types, or even if he reverse engineered them, they would be identical. I have done some trampolining of functions to/from code to which I only have binary access. If I am handling a struct, I would examine the structure in memory and make my struct identical so I could copy it with a simple memcpy(to, fro, sizeof(struct)), or replace references by just changing the pointer to a different place.
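The mirroring technique described above might look something like the following. This is a sketch only: both struct types and the helper function are hypothetical, invented to show two independently named declarations that end up with the same member types in the same order because the data layout dictates them. The compile-time checks assume a C11 compiler.

```c
#include <stddef.h>
#include <string.h>

/* Layout as one (hypothetical) vendor header might declare it. */
struct vendor_rec {
    int    id;
    double weight;
    char   name[16];
};

/* An independently written mirror: different identifiers, but the same
   member types in the same order, since the data in memory dictates them. */
struct mirror_rec {
    int    rec_id;
    double rec_weight;
    char   rec_name[16];
};

/* Fail at compile time if the two layouts ever diverge (C11). */
_Static_assert(sizeof(struct vendor_rec) == sizeof(struct mirror_rec),
               "size mismatch");
_Static_assert(offsetof(struct vendor_rec, name) ==
               offsetof(struct mirror_rec, rec_name),
               "offset mismatch");

/* Copy the raw bytes of a vendor record into the mirror type. */
void mirror_from_vendor(struct mirror_rec *dst, const struct vendor_rec *src)
{
    memcpy(dst, src, sizeof *dst);
}
```

The byte-for-byte copy only works because the type sequence matches, which is why a matching sequence of member types on its own says little about where a declaration came from.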

  • Shocking (Score:5, Funny)

    by Kenoli (934612) on Sunday July 11, 2010 @10:24PM (#32870826)
    How dare they copy/paste those blank lines!
    • Re: (Score:3, Funny)

      by bsDaemon (87307)

      between that and pre-processor #include directives from the standard C library and POSIX stuff, well... damn. How could any judge have failed to see this!? /sarcasm

    • Re:Shocking (Score:4, Informative)

      by physicsdot (530505) on Monday July 12, 2010 @12:07AM (#32871436)

      How dare they copy/paste those blank lines!

      Just in case you thought you were kidding: http://www.mcbride-law.com/wp-content/uploads/2010/07/Tab-422.pdf [mcbride-law.com]

      Line 22 is blank, and is indicated as being copied.

    • The first 33 lines (Score:3, Interesting)

      by Lorens (597774)

      Sure, they copied all those blank lines, and in at least one case (tab 247) they also copied the BSD copyright header. Shocking! Funnily enough SCOX removed those lines on both sides. Kind of them.

      They also copied strn?casecmp definition (tab 241). For quite astonishing values of "copied":

      -SCOX
      +RedHat glibc

      -38: int strcasecmp();
      -39: int strncasecmp();

      +53: /* Compare S1 and S2, ignoring case. */
      +54: extern int strcasecmp (__const char *__s1, __const char *__s2)
      +55: __THROW __attribute_pure__;
      +57: /* Compare no more than N chars of S1 and S2, ignoring case. */
      +58: extern int strncasecmp (__const char *__s1, __const char *__s2, size_t __n)
      +59: __THROW __attribute_pure__;

      This is clearly an extremely grave violation. However, it is interesting to note that SCOX did not complain about the definitions of all the other string functions. Maybe because the header of their file specifies

      In addition, portions of such source code were derived from Berkeley
      4.3 BSD under license from the Regents of the University of
      California.

      Pre

  • comments added... (Score:3, Interesting)

    by nacks1 (60717) on Sunday July 11, 2010 @10:28PM (#32870860) Homepage Journal

    I find it rather funny that the Linux code is well commented but the SVR4 code has little to no comments at all. Just because the function names are the same doesn't mean it was copied. It just means that the coders implemented functions with the same names (and I bet that the Linux versions worked rather differently than the original SVR4 code).

    • Re: (Score:3, Insightful)

      I agree. As a programmer myself, I saw significant differences between the two sets of code in each example that SCO claimed was "evidence". I would have to look at more of the examples, but from what I saw, if I were the judge, I'd tell SCO to stop wasting everybody's time.
  • by wandazulu (265281) on Sunday July 11, 2010 @10:31PM (#32870868)

    I'm really sorry, but there was some code that was already written that was just too good to pass up for the project I was on:


    #include <stdio.h>
    int main(int argc, char* argv[])
    {
            printf("Hello World!\n");
            return 0;
    }

    Now that I'm using Java, it won't happen again.

    • Re: (Score:3, Funny)

      by mysidia (191772)
      Are you sure it hasn't already happened again?

      // hello.java
      /**
       * This application greets the world.
       *
       * @deprecated Earth has been destroyed by global warming; this is superseded by the goodbye class
       */
      @Deprecated
      public class hello
      {
          public static void main(String[] args)
          {
              System.out.println("Hello World!");
          }
      }

  • SCO! (Score:3, Funny)

    by Anonymous Coward on Sunday July 11, 2010 @10:38PM (#32870918)

    Die, Monster, Die!

  • libelf!?! (Score:5, Informative)

    by Dahamma (304068) on Sunday July 11, 2010 @10:39PM (#32870926)

    I actually find it ironic that libelf was picked as an example of infringement. I can tell you first hand that the (more standard) UNIX/Solaris libelf is NOT compatible with the Linux/libc libelf. And I can also tell you that after pointing this out to Ulrich Drepper he really didn't give a shit... (I think his approximate words were "It's been like that for a while, too late, I won't change it").

    Their only mistake was actually naming it "libelf"... since it is most definitely NOT the same library...

  • Oh Good (Score:4, Insightful)

    by bky1701 (979071) on Sunday July 11, 2010 @10:53PM (#32871020) Homepage
    More news about imaginary property. How much time and money does our society waste on propping up this outdated concept that you can own an idea? "#include " constitutes copy and pasting? I guess every program on earth violates the copyright of the guy who first wrote "int main(", and whoever started the convention of naming C++ files cpp or cxx should be hiring a lawyer about now. Money is to be made.
    • Re: (Score:3, Insightful)

      by jbengt (874751)
      It's not copyright infringement to write something required for a specific functionality (e.g. POSIX compliance or API compatibility) even if it's exactly the same words. Copyright only covers creative expression. If the ways of saying something are limited, that's not covered by copyright.
      Of course that didn't stop the SCO trolls from trying and wasting everyone's time.
      Delayed justice is no justice at all.
      • Re: (Score:3, Insightful)

        by bky1701 (979071)
        And were it some small group of developers that SCO sued rather than Red Hat/Novell/IBM, it would have been worse than delayed justice. Copyright does not protect the little guy, much like the majority of our legal system.
  • by haruchai (17472) on Sunday July 11, 2010 @11:09PM (#32871120)

    this fiaSCO has been running on for nearly 8 years - what the hell is up with the courts that they keep this bullshit alive.
    Kill -9 all | sort > /dev/null

  • by DrJimbo (594231) on Sunday July 11, 2010 @11:13PM (#32871158)
    This code was the last big unknown in this long sorry saga. Even if SCO owned the copyrights (and hadn't distributed it under the GPL, and hadn't signed the UnitedLinux agreement, etc.), it is now crystal clear that SCO's Microsoft-funded anti-Linux campaign was based on a stack of frivolous lawsuits.

    I think Darl's brother is scrambling to cover his backside so that when the disbarments and criminal charges come down, he has a chance to escape.

    Groklaw (of course) has IBM's response [groklaw.net] to SCO's claims that these paltry examples are worth BILLIONS of dollars in copyright damages. None of the code they offered is protectable under copyright law. Some of it is BSD code that everyone is free to use however they want (if they include the copyright notice). A lot of it is header files that were not copy-and-pasted which are nearly impossible to protect under copyright law. Then they have some snippets of generic code. Given the size of the source code for Linux, it would be astounding if there weren't some similar snippets. The idea that this is proof that Linux violated any Unix copyrights is totally absurd. The idea that these generic snippets are what made Linux enterprise-ready is beyond insane.

    The recent SCO v. Novell case decided that SCO never even owned the copyrights it was suing about. And then instead of the millions of lines of code they claimed were infringing, they presented this meager collection of totally unprotectable snippets. I sure hope SCO's lawyers get severely punished for perpetrating this fraud on the court for the past seven years.
    • by UnknowingFool (672806) on Monday July 12, 2010 @01:20AM (#32871758)
      In Gates v. Bando [groklaw.net] the Tenth Circuit adopted the abstraction-filtration-comparison test (first set out by the Second Circuit in Computer Associates v. Altai), which has become the standard in software copyright infringement cases. Specifically, in the filtration step, all elements which are not protected by copyright must be removed from consideration. In this case, most of the code falls under scènes à faire: "expressions that are standard, stock, or common to a particular topic or that necessarily follow from a common theme or setting . . . these external factors may include: hardware standards and mechanical specifications." Most of the code was simply declarations needed for compatibility and cannot be copyrighted.
  • by Fr33thot (1236686) on Sunday July 11, 2010 @11:59PM (#32871394)
    "until they lost the battle by losing ownership of UNIX" Those are all the words you needed to read. You cannot lose ownership of that which you never did own. SCO's gambit was to gain ownership by bamboozling everyone. You know one party is blowing smoke in these issues when they refuse to point to the infringing code outside of court. If the quote included is to be believed, they lost on appeal, and now that they have filed yet another appeal they are suddenly going to show us all the holy grail of "infringing" code. In each case where they've brought up "infringing" code, it was either released by themselves, or was code they didn't own in the first place.
  • by softcoder (252233) on Monday July 12, 2010 @12:03AM (#32871422)

    Just because Linux and Unix have some of the same lines of code, does NOT mean that linux copied the code from unix.
    The code could have come from BSD for example and in fact there are several instances where linux and Unix share (or shared) the same BSD code.

    The code could also have come from implementing the POSIX standard. The PDF linked to seems to be an implementation of errno.h, which I believe is part of the POSIX standard.
    So again just because the code appears in Unix, does NOT mean that Unix had copyright ownership of that code.

    To prove its case SCO would have had to prove that:
    a) Linux had lines of code that were substantially similar to Unix. (some minor examples provided but even that was not definitive)
    In fact the judge who supervised the discovery kept asking for details and at the end of the multi year discovery process, said, "Is this all you've got?"

    b) Unix had copyrights to the code in question (again not proven)

    c) SCO owned the Unix copyrights (again not proven)

    d) SCO never granted the rights to use that code in any way. In fact Caldera (aka SCO) distributed a version of Linux under the GPL which in effect granted GPL license to any of their code that happened to be in Linux.

    So even if all of a, b, and c were true,
    they STILL did not have a case for infringement.
    I almost wish that SCO had owned the UNIX copyrights, because then this whole issue would have been resolved by now, instead of relying on Novell.

    softcoder.

  • by DrJimbo (594231) on Monday July 12, 2010 @12:26AM (#32871526)
    The courts have established that in order to determine software copyright infringement (for non-literal copying, which is what we have here, but filtration is required even for literal copying), one must perform what is called the Abstraction, Filtration, Comparison Test [ladas.com]. In court documents related to the code in question, SCO admitted they did not perform this test on this code. They claimed that was IBM's job. The article linked to above explains the test:

    1. break down the plaintiff’s program into its constituent structural parts (“abstraction”);

    2. examine each part for incorporated “ideas,” elements taken from the public domain, methods of operation, processes or procedures, or otherwise unprotected material (“filtration”); and

    3. compare the remaining kernel of creative expression, if any, to the work alleged to infringe at each level of abstraction (“comparison”).

    They further explain:

    The scenes à faire doctrine is often applied in software cases because it is frequently impossible to write a program in a particular computing environment without employing certain standard programming techniques and design elements. This is because certain functions, data elements, and the order of operation of a program can be dictated by such things as the type of computer on which the program will run, the programming language used, the operating system environment, governmental requirements, industry demands and standards, and widely accepted programming practices.

    I suspect the reason SCO didn't filter this code is because if they did, there would be nothing at all left to present to the court as their fig leaf to avoid being charged with perpetrating a fraud on the court.

  • by harlows_monkeys (106428) on Monday July 12, 2010 @12:57AM (#32871656) Homepage

    I spent several years as a Unix kernel hacker, working extensively with AT&T source code. I also went to law school and was one bad case of writer's block away from becoming a copyright lawyer. Thus I found those code snippets quite interesting, both from my Unix kernel hacker perspective and my almost-became-a-copyright-lawyer perspective.

    My conclusion, from the half dozen or so of his samples that I looked at? They show nothing remotely resembling copyright violation.

    Copyright covers expression, not ideas. What that means when dealing with functional works, such as computer programs, is that things that anyone implementing that functionality will have to do are unlikely to be covered by copyright.

    All of the functions I saw that were allegedly copied were very simple functions. All they did was check arguments to make sure they were legal, return the expected error code if not, or return some very simple value otherwise.

    Even if the corresponding functions in Linux were exact matches to the SCO code, it would probably not be enough to support an inference of copying, because there just aren't a lot of ways to reasonably express such simple functions. And they were not exact matches. One would check for a null pointer by comparing to NULL, one would use if(!p), for instance.

    The header files are more similar, so copying is more believable there. The problem with SCO's case there is that the elements in the header files I looked at are entirely dictated by compatibility requirements. There's no copyrightable expression in them.

    To summarize, SCO's claims appear to fall into two groups. First, things where the implementation is so simple that it is not possible to infer copying from similarity since the similarity is imposed by the nature of the function. Second, things where there may have been copying--of things that aren't protected by copyright.
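To make the first group concrete, here is a hedged sketch of what such a trivial argument-checking function tends to look like, written twice. Neither version is taken from the exhibits; the function names, the choice of EINVAL, and the -1 return value are assumptions made for illustration. The two differ only in how the null check is spelled, comparing against NULL versus using if (!p), which is the sort of difference described above.

```c
#include <errno.h>
#include <stddef.h>

/* Version A: explicit comparison against NULL. */
int get_count_a(const int *p)
{
    if (p == NULL) {
        errno = EINVAL;   /* conventional "invalid argument" code */
        return -1;
    }
    return *p;
}

/* Version B: same behaviour, with the check written as a negation. */
int get_count_b(const int *p)
{
    if (!p) {
        errno = EINVAL;
        return -1;
    }
    return *p;
}
```

There is very little room to express this differently: validate the argument, set the expected error code, return a simple value. Close similarity between two such functions is therefore weak evidence of copying.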
