Caldera Unix Linux

Claimed Proof That UNIX Code Was Copied Into Linux 578

Posted by kdawson
from the copied-by-a-spider-on-lsd dept.
walterbyrd writes "SCO's ex-CEO's brother, a lawyer named Kevin McBride, has finally revealed some of the UNIX code that SCO claimed was copied into Linux. Scroll down to the comments where it reads: 'SCO submitted a very material amount of literal copying from UNIX to Linux in the SCO v. IBM case. For example, see the following excerpts from SCO's evidence submission in Dec. 2005 in the SCO v. IBM case:' There are a number of links to PDF files containing UNIX code that SCO claimed was copied into Linux (until they lost the battle by losing ownership of UNIX)." Many of the snippets I looked at are pretty generic. Others, like this one (PDF), would require an extremely liberal view of the term "copy and paste."

  • comments added... (Score:3, Interesting)

    by nacks1 (60717) on Sunday July 11, 2010 @10:28PM (#32870860) Homepage Journal

    I find it rather funny that the Linux code is well commented but the SVR4 code has little to no comments at all. Just because the function names are the same doesn't mean it was copied. It just means that the coders implemented functions with the same names (and I bet that the Linux versions worked rather differently than the original SVR4 code).

  • Re:Cheat Detection (Score:3, Interesting)

    by Darkness404 (1287218) on Sunday July 11, 2010 @10:34PM (#32870886)
    The difference is that, when it comes to computer code, there are only a handful of solutions to a given problem that still make sense.

    On the other hand, there are tons of ways of conveying a simple fact in English. Consider the statement "George Washington was the first president of the US." The same statement might read: "The first president of the US was George Washington." "George Washington was elected as the first president of the US." "The first president in office in the US was George Washington." And so on.
  • by marcansoft (727665) <hector@@@marcansoft...com> on Sunday July 11, 2010 @10:47PM (#32870984) Homepage

    Of course it looks rearranged. It's a header file. Some of the ELF constants come straight from the ELF spec. The #ifndef stuff is bog standard code, there are a finite number of ways of writing that and the one presented happens to be the most common. The #include is another "duh" - of course you have to #include the right header, that doesn't mean it's copied. The header file is presumably deliberately compatible with the original, hence the function definitions are prototype-compatible (while being considerably different in style).

    There is nothing indicative of code copying in that PDF. The Linux header is just about as different as it can be while remaining source-compatible, as it should be.
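
    To make the point about "bog standard" header code concrete, here is a minimal sketch of an include guard around an enum and a prototype. It is not taken from either header in the PDF: demo_elf.h, Demo_Kind and every other DEMO_/demo_ name are invented for illustration. The header text is pasted twice to mimic a double #include.

```c
/* First "inclusion" of a hypothetical header, demo_elf.h.
   All DEMO_/demo_ names are invented for this sketch. */
#ifndef DEMO_ELF_H
#define DEMO_ELF_H

typedef enum {
    DEMO_K_NONE = 0,
    DEMO_K_AR,
    DEMO_K_ELF,
    DEMO_K_NUM      /* number of kinds, keep last */
} Demo_Kind;

const char *demo_kind_name(Demo_Kind kind);

#endif /* DEMO_ELF_H */

/* Second "inclusion" of the same text: DEMO_ELF_H is already defined,
   so the preprocessor skips everything up to the matching #endif.
   Without the guard, the enum would be redefined and compilation
   would fail. */
#ifndef DEMO_ELF_H
#define DEMO_ELF_H
typedef enum { DEMO_K_NONE = 0, DEMO_K_AR, DEMO_K_ELF, DEMO_K_NUM } Demo_Kind;
const char *demo_kind_name(Demo_Kind kind);
#endif /* DEMO_ELF_H */

/* The implementation that would normally live in the matching .c file. */
const char *demo_kind_name(Demo_Kind kind)
{
    switch (kind) {
    case DEMO_K_AR:  return "archive";
    case DEMO_K_ELF: return "elf";
    default:         return "none";
    }
}
```

    Two people writing such a guard independently will end up with nearly the same three preprocessor lines, because there is little else to write.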

  • by goombah99 (560566) on Sunday July 11, 2010 @10:55PM (#32871028)

    Comparing a variable named elf_t_arname to one named elf_c_arname is not very convincing. The suffix is generic, the prefix is activity specific, and the middle letter is presumably some datatype indicator.
    Where it gets dicey is when there are structs and every variable in the struct has a somewhat similarly named variable in the other one. This does arouse suspicion. Even if you forget the variable names for a moment, any pattern like bool,real,real, *real, int, *char,*char,*bool,.... that is identical between two structs would be an improbable occurrence. And when you see it in back-to-back structs, it becomes nearly impossible to happen by chance.

    The key question then is whether there is some structural reason why the two might share an identical struct. For example, is there an ELF spec that defines a protocol for communication or the way a record on disk is serialized (i.e. packed)? If so, then of course these will occur like this. Or perhaps both are derived from a common BSD ancestor, so both vary only slightly.

    If the answer is no, there was no reference implementation and no ancestor, then I'd say that for examples like 251, McBride has some evidence.

    However, for most of the ones he cites, there is no there there.

  • by bky1701 (979071) on Sunday July 11, 2010 @11:01PM (#32871072) Homepage
    Well, let's see.

    "ELF_C_..." - each of these is the name of a type in C. I don't see how this is even a bit creative. I had a very similar enum in a program I wrote, except with data types from a 3D engine. My guess is that ELF_C means it would be the ELF binary format's C data type. Nothing to see here.

    "ELF_K/ELF_T" - It says in the open source one that these are descriptors as well. More or less the same; universal concepts if you're going to be programming a C compiler. I bet you can find an enum just like this in visual C++.

    Hmm... beyond the headings, that's really all that is in that file. If you really have been a programmer for 20 years, you have without question violated thousands of copyrights... if that file does.
  • by Anonymous Coward on Sunday July 11, 2010 @11:05PM (#32871096)

    (same AC) Again: 20 years of programming experience, but no experience with some POSIX IEEE document that is apparently very important for *NIX. If this is a published standard, obviously they'll be similar. I just didn't understand why every function definition from line 138 to 176 is copied verbatim. The same cryptic series of letters, which obviously imply meaning (strptr, newscn, getshdr, getphdr, newehdr, elf_flagelf, elf32_fsize), are VERBATIM. I think a more effective summary would have been to link to the public domain specification with these exact terms mentioned in the document so we could all laugh at how their only evidence is a public spec-based implementation.

    Instead I see a long list of identical function names, enum types, enum labels, and identically named structure fields. When I google the relationship between POSIX and ELF, I don't exactly get any hints for something that's supposed to be a public standard (I'm assuming it's related to http://en.wikipedia.org/wiki/Executable_and_Linkable_Format).

    How much touching up, commenting, refactoring, general massaging is necessary for it to become "unique" code? I think if you got a jury that wasn't as studied as me, you're right, Linux would have been screwed without a little more explanation.

  • by GigaplexNZ (1233886) on Sunday July 11, 2010 @11:22PM (#32871210)

    Likewise, if someone was allowed to copyright particular words, they would have control over a segment of the publishing industry.

    If the RIAA can claim copyright infringement because someone sang a copyrighted song in public, one could claim copyright infringement if those copyrighted words were spoken in public. You'd have control over a lot more than just the publishing industry.

  • I've seen cases where me and another person are working on code independently, and when it came time to merge, we had both ended up creating the same variable names, and pretty much the same code.

    About the only difference was in indentation - mine is "always put the opening brace on the same line, one true tab, else in same column as if, no braces for any single-line condition to a control structure (for, if, else, while, etc)". Even the comments were pretty much the same.

    In this case, though, some of the code is from BSD - which is perfectly fine.

  • Re:Cheat Detection (Score:3, Interesting)

    by Chibi Merrow (226057) <mrmerrowNO@SPAMmonkeyinfinity.net> on Sunday July 11, 2010 @11:47PM (#32871324) Homepage Journal

    Now I'm not a system programmer, so I may be completely off base, but that looks like a system call for supporting ELF binaries. Meaning it's an implementation of a public standard. Meaning there's a very limited number of correct ways of doing this. Possibly only one correct way of doing this.
    Every file I've randomly selected has been like this; either references to the ELF format (which is a public standard), or ABI type stuff (which is also a public standard). That's what was rumored to be SCO's "copying evidence" like six or so years ago... And now we've found out it's the truth.

    They are either insane or they are "dumb like a fox," because anyone who knows anything about UNIX system development could tell you that this isn't stuff you could sue over.

  • by Jahava (946858) on Sunday July 11, 2010 @11:58PM (#32871392)

    Of course it looks rearranged. It's a header file. Some of the ELF constants come straight from the ELF spec. The #ifndef stuff is bog standard code, there are a finite number of ways of writing that and the one presented happens to be the most common. The #include is another "duh" - of course you have to #include the right header, that doesn't mean it's copied. The header file is presumably deliberately compatible with the original, hence the function definitions are prototype-compatible (while being considerably different in style).

    There is nothing indicative of code copying in that PDF. The Linux header is just about as different as it can be while remaining source-compatible, as it should be.

    Commenting further on that, here is a link to the System V Reference Specs [freestandards.org], one of which is the ELF Tool Interface Standard Specification [freestandards.org]. This contains not only several constants, structures, and function names, but suggests function prototypes and programming style.

    Like you said, any author wishing to build an ELF-capable system would almost have to have that exact same code. There are only so many ways to build an enum or struct following the exact TIS specifications, and there is no virtue in paraphrasing C code.

    Much of the rest of the code is libc and POSIX prototypes (and more headers), all of which are covered in the System V ABI [freestandards.org] specification. Anybody wishing to build a POSIX [wikipedia.org]-compatible system would have to define those prototypes.

    Several of the function implementations with similarities are very basic functions. Most of the similarities are in the constant names (rather than the specific implementation of those simple functions), and the constant names are defined by ... the TIS spec. The remainder is a no-brainer. See, for example, Tab 422 [mcbride-law.com]. This is a simple accessor method. There are only so many ways to retrieve a value from a structure...

  • by 10101001 10101001 (732688) on Monday July 12, 2010 @12:01AM (#32871410) Journal

    The best part? The Tab 229 example includes a define for RTLD_GLOBAL but in the SCO code its value is 4 instead of the 0x100 used in Linux. Why the discrepancy? Well, probably because the FSF was cloning BSD [googlebit.com], not Unix (as BSD was probably more popular and readily available to many than one of the myriad Unix forks). Oops.

    Perhaps McBride is unaware of the BSD lawsuit? Certainly, if anyone has any room to complain, it'd be Berkeley. However, given that the examples seem to repeatedly have jumbled lines or inconsistent values, I'd imagine that regular reverse-engineering was employed in the construction of most of the headers. I.e., it just further highlights how unlikely it is that there was copyright infringement.

  • by softcoder (252233) on Monday July 12, 2010 @12:03AM (#32871422)

    Just because Linux and Unix have some of the same lines of code does NOT mean that Linux copied the code from Unix.
    The code could have come from BSD, for example, and in fact there are several instances where Linux and Unix share (or shared) the same BSD code.

    The code could also have come from implementing the POSIX standard. The PDF linked to seems to be an implementation of errno.h, which I believe is part of the POSIX standard.
    So again, just because the code appears in Unix does NOT mean that Unix had copyright ownership of that code.

    To prove its case SCO would have had to prove that:
    a) Linux had lines of code that were substantially similar to Unix. (some minor examples provided but even that was not definitive)
    In fact the judge who supervised the discovery kept asking for details and at the end of the multi year discovery process, said, "Is this all you've got?"

    b) Unix had copyrights to the code in question (again not proven)

    c) SCO owned the Unix copyrights (again not proven)

    d) SCO never granted the rights to use that code in any way. In fact Caldera (aka SCO) distributed a version of Linux under the GPL which in effect granted GPL license to any of their code that happened to be in Linux.

    So even if all of a, b, and c were true,
    they STILL did not have a case for infringement.
    I almost wish that SCO had owned the UNIX copyrights, because then this whole issue would have been resolved by now, instead of relying on Novell.

    softcoder.

  • by tomhudson (43916) <barbara DOT huds ... a-hudson DOT com> on Monday July 12, 2010 @12:41AM (#32871590) Journal

    Or they both legitimately got it from BSD [freebsd.org], or linux got it from the standard.

    HISTORY

    The ELF header files made their appearance in FreeBSD 2.2.6. ELF in
    itself first appeared in AT&T System V UNIX. The ELF format is an
    adopted standard.

    We know from the AT&T settlement that there's a lot of BSD in AT&T Unix, and that even some of the non-BSD AT&T stuff simply isn't protectable by copyright.

  • by harlows_monkeys (106428) on Monday July 12, 2010 @12:57AM (#32871656) Homepage

    I spent several years as a Unix kernel hacker, working extensively with AT&T source code. I also went to law school and was one bad case of writer's block away from becoming a copyright lawyer. Thus I found those code snippets quite interesting, both from my Unix kernel hacker perspective and my almost-became-a-copyright-lawyer perspective.

    My conclusion, from the half dozen or so of his samples that I looked at? They show nothing remotely resembling copyright violation.

    Copyright covers expression, not ideas. What that means when dealing with functional works, such as computer programs, is that things that anyone implementing that functionality will have to do are unlikely to be covered by copyright.

    All of the functions I saw that were allegedly copied were very simple functions. All they did was check arguments to make sure they were legal, return the expected error code if not, or return some very simple value otherwise.

    Even if the corresponding functions in Linux were exact matches to the SCO code, it would probably not be enough to support an inference of copying, because there just aren't a lot of ways to reasonably express such simple functions. And they were not exact matches. One would check for a null pointer by comparing to NULL, one would use if(!p), for instance.
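
    A minimal sketch of what two such functions can look like. These are not the functions from SCO's exhibits; struct demo_hdr and both function names are hypothetical. The only difference between them is how the null check is spelled.

```c
#include <stddef.h>

/* Hypothetical structure standing in for some descriptor. */
struct demo_hdr {
    int kind;
};

/* Style 1: explicit comparison against NULL. */
int demo_kind_a(const struct demo_hdr *h)
{
    if (h == NULL)
        return -1;      /* illegal argument: report an error code */
    return h->kind;
}

/* Style 2: logical negation of the pointer. */
int demo_kind_b(const struct demo_hdr *h)
{
    if (!h)
        return -1;      /* same check, different spelling */
    return h->kind;
}
```

    Both behave identically for every input, which is the sense in which there are few reasonable ways to express such a function.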

    The header files are more similar, so copying is more believable there. The problem with SCO's case there is that the elements in the header files I looked at are entirely dictated by compatibility requirements. There's no copyrightable expression in them.

    To summarize, SCO's claims appear to fall into two groups. First, things where the implementation is so simple that it is not possible to infer copying from similarity since the similarity is imposed by the nature of the function. Second, things where there may have been copying--of things that aren't protected by copyright.

  • by ipX (197591) on Monday July 12, 2010 @01:22AM (#32871776)
  • Re:First post (Score:4, Interesting)

    by Cylix (55374) * on Monday July 12, 2010 @01:26AM (#32871802) Homepage Journal

    The pdf linked in the document is a snippet for what looks like a struct for the elf API interface. This specification is open and judging by the code they are using it exactly as intended.

    I'm going to guess the majority of their findings are specifically computer generated. They may have known first hand what the code was or even where it came from. However, if pressed to say how they discovered these violations I'm quite sure they would fall back on "the program made the mistake your honor." This would generate a plausible stance when the foundation began to crumble.

    Going further out on a limb, I'm also guessing this is why they would never release any of the alleged violations. In days a website similar to Groklaw would be up for everyone to review, identify and mark the source of the "violation," i.e., this is a struct for the ELF library specification or this is a header of a BSD library. (Remember that BSD ancestry is likely still there in large chunks.)

    All of this happening in the court room and they had to know there were big holes in the allegations. Even a cursory glance reveals that some of the crap submitted is just that. This was a court room poker face with a huge bluff that many parties would just settle. I suppose it worked because too many people rolled over and handed out free cash.

  • by gringer (252588) on Monday July 12, 2010 @01:27AM (#32871814)

    mine is "always put the opening brace on the same line, one true tab, else in same column as if, no braces for any single-line condition to a control structure (for, if, else, while, etc)"

    Coding style like this makes me cringe, particularly the thing about no braces for single-line conditionals -- it makes it far too easy to make mistakes because you indent code and forget that indentation doesn't mean it's part of the conditional (unless you are using python, of course).
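
    A small sketch of the mistake being described (both function names are made up). In the first function the second statement is indented as if it belonged to the if, but without braces it always runs.

```c
/* Braceless style: only the first statement is controlled by the if. */
int demo_braceless(int flag)
{
    int count = 0;
    if (flag)
        count += 1;
        count += 10;    /* indented, but NOT part of the conditional */
    return count;
}

/* Braced style: both statements are controlled by the if. */
int demo_braced(int flag)
{
    int count = 0;
    if (flag) {
        count += 1;
        count += 10;
    }
    return count;
}
```

    With flag set to 0, demo_braceless() returns 10 while demo_braced() returns 0. Some compilers can warn about this pattern (GCC has -Wmisleading-indentation), but the warning is easy to miss.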

  • The first 33 lines (Score:3, Interesting)

    by Lorens (597774) on Monday July 12, 2010 @02:22AM (#32872026) Journal

    Sure, they copied all those blank lines, and in at least one case (tab 247) they also copied the BSD copyright header. Shocking! Funnily enough SCOX removed those lines on both sides. Kind of them.

    They also copied strn?casecmp definition (tab 241). For quite astonishing values of "copied":

    -SCOX
    +RedHat glibc

    -38: int strcasecmp();
    -39: int strncasecmp();

    +53: /* Compare S1 and S2, ignoring case. */
    +54: extern int strcasecmp (__const char *__s1, __const char *__s2)
    +55: __THROW __attribute_pure__;
    +57: /* Compare no more than N chars of S1 and S2, ignoring case. */
    +58: extern int strncasecmp (__const char *__s1, __const char *__s2, size_t __n)
    +59: __THROW __attribute_pure__;

    This is clearly an extremely grave violation. However, it is interesting to note that SCOX did not complain about the definitions of all the other string functions. Maybe because the header of their file specifies

    In addition, portions of such source code were derived from Berkeley
    4.3 BSD under license from the Regents of the University of
    California.

    Presumably the other string functions came from BSD, and ignoring character case was a UNIX improvement that BSD couldn't have thought of by themselves. Right? Right??

  • by Dantoo (176555) on Monday July 12, 2010 @02:24AM (#32872040)

    Ostriches aren't Australian, they're African. Omelettes can be made from emu eggs and I have tasted one. It really wasn't any different to one prepared from hen's eggs. It looked no different to this observer. Compare an emu egg to a hen's egg and they are quite different in size, colour and even texture internally and externally. The formula (recipe) however was just for a standard omelette that we would all recognise by sight instantly. Interestingly, it tasted like one prepared from hen's eggs as well. Couldn't tell the finished product apart.

    POSIX header files also look remarkably similar to this observer. If code is being written to a required formula so that it interacts correctly with other code (a standard) then there should be little surprise that it looks the same.

    Egg analogies make me hungry.
     

  • by Score Whore (32328) on Monday July 12, 2010 @02:46AM (#32872142)

    Why i,j,k and not a,b,c?

  • by moronoxyd (1000371) on Monday July 12, 2010 @02:53AM (#32872172)

    The truth is that code was reused from a UNIX derivative, which is now (somewhat disputably) owned by SCO.

    Did I miss a verdict here?
    As far as I know, it is right now only a claim, not yet proven.

    And using the terms "truth" and "SCO" in one sentence... well, it just feels wrong.

  • by MSG (12810) on Monday July 12, 2010 @03:15AM (#32872264)

    OF COURSE he is going to write the same commands he has used a thousand times in the same way

    I'm sure this is one of the reasons it's best to call the system GNU: Linus didn't write any of the "commands". Linus wrote a kernel and GNU ended up adopting it. The GNU project wrote the system "commands".

    Just do the search.

    Trivia: Actually, the people at exbiblio found that there is very little repetition of text in literature. Any four or five word sequence in a common magazine article is likely to appear in very few or no other texts. That fact is foundational to their technology.

  • by MSG (12810) on Monday July 12, 2010 @03:27AM (#32872296)

    Just because 2 programs have hooks or functions called "ReadX" does not mean there was any copying involved.

    On the other hand... [slashdot.org]

  • by spitzak (4019) on Monday July 12, 2010 @03:44AM (#32872360) Homepage

    Clicking on these I find a lot of .h files implementing POSIX and BSD standards (here is a choice one that is such an absurd claim of copyright violation that I can't believe they did it: http://www.mcbride-law.com/wp-content/uploads/2010/07/Tab-2421.pdf [mcbride-law.com]) Most of the others are not quite that bad.

    So you don't waste your time, after quite a lot of clicking I finally found some actual code: http://www.mcbride-law.com/wp-content/uploads/2010/07/Tab-415.pdf [mcbride-law.com]

    Here we see that they both used the name "elf" to name a pointer to the ELF structure! Why the chances of two programmers deciding to do that must be astronomical!

    I stared at this thing for quite a while trying to match up the code, as it certainly is different. Finally figured it out: the Unix code goes to the i'th field in the structure and returns it if and only if the "index" field in it is equal to i. The Linux code instead searches and returns the first field with the index field equal to i, whether or not it is at i. Umm, this seems to be a pretty significant difference!
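
    The two behaviours described above can be sketched roughly as follows. This is a reconstruction from the description only, not the code in Tab 415; struct demo_scn and both function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical table entry carrying its own index. */
struct demo_scn {
    size_t index;
    int payload;
};

/* "Direct" lookup: go to slot i and return it only if its index field is i. */
const struct demo_scn *demo_get_direct(const struct demo_scn *tab,
                                       size_t n, size_t i)
{
    if (tab == NULL || i >= n)
        return NULL;
    return tab[i].index == i ? &tab[i] : NULL;
}

/* "Search" lookup: return the first entry whose index field is i,
   wherever it happens to sit in the table. */
const struct demo_scn *demo_get_search(const struct demo_scn *tab,
                                       size_t n, size_t i)
{
    if (tab == NULL)
        return NULL;
    for (size_t k = 0; k < n; k++) {
        if (tab[k].index == i)
            return &tab[k];
    }
    return NULL;
}
```

    On a table whose entries are out of order, the direct version returns NULL where the search version finds the entry, so the two are not interchangeable.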

    This is such a load of bullshit that Mr. McBride should be ashamed.

  • by slack_justyb (862874) on Monday July 12, 2010 @04:24AM (#32872490)
    Having done patches for some of our Linux servers, I can tell you that most structs, unions, etc... are in the order that they are specified in the man page for a given function. Same way with BSD and other open source projects that I've seen.

    If you take a look at something like (just picking something) IP, you'll notice the struct is in the same order as the actual packet. Does it have to be that way? No, but it's usually what everyone goes with.
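
    As a rough illustration, here is an IPv4 header struct whose members simply follow the field order of the packet as laid out in RFC 791. The struct and function names are hypothetical (real systems have their own definitions), and byte-order conversion of the multi-byte fields is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Members appear in the same order as the fields on the wire. */
struct demo_ipv4_hdr {
    uint8_t  ver_ihl;    /* version (high 4 bits), header length in 32-bit words (low 4 bits) */
    uint8_t  tos;        /* type of service */
    uint16_t total_len;  /* total length */
    uint16_t id;         /* identification */
    uint16_t frag_off;   /* flags and fragment offset */
    uint8_t  ttl;        /* time to live */
    uint8_t  protocol;   /* next-level protocol */
    uint16_t checksum;   /* header checksum */
    uint32_t src;        /* source address */
    uint32_t dst;        /* destination address */
};

/* Extract the IP version from the first byte. */
unsigned demo_ipv4_version(const struct demo_ipv4_hdr *h)
{
    return (unsigned)(h->ver_ihl >> 4);
}

/* Header length in bytes: the low nibble counts 32-bit words. */
unsigned demo_ipv4_hdr_bytes(const struct demo_ipv4_hdr *h)
{
    return (unsigned)(h->ver_ihl & 0x0F) * 4u;
}
```

    Anyone writing such a struct from the spec will land on essentially this member order, whatever names they pick.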

    All the examples that I saw on the site seem a bit too generic to be called copying. I know that some of the snippets have a bit of history, aka didn't come from SCO or Linux, but I'm not that buff in kernel programming to know the difference.
  • by jimicus (737525) on Monday July 12, 2010 @04:32AM (#32872516)

    Now, open any dozen books that are 50,000 words in length. Search for strings that are duplicated between the books. Entire sentences, or phrases, it hardly matters. Just do the search. Anyone who is used to playing with databases can probably search those dozen books, and find numerous instances of phrases that were copy/pasted from one author's book to another. In fact, I'll bet that technical and factual books will have a higher incidence of matching phrases and sentences than works of fiction - but fiction will have its share as well.

    Actually, that's not true. There is some evidence to suggest you only need a remarkably short string of words to uniquely identify a piece of English prose - it's this kind of thing that cheating-detection algorithms rely on.
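
    A toy sketch of the idea, counting how many n-word sequences from one text also appear in another. This is not any real plagiarism detector; the function names are invented, and the code assumes text that is already lower-cased with words separated by single spaces.

```c
#include <stddef.h>
#include <string.h>

/* Byte length of n consecutive words starting at p, or 0 if fewer remain. */
static size_t demo_span(const char *p, int n)
{
    const char *q = p;
    for (int w = 0; w < n; w++) {
        if (*q == '\0')
            return 0;
        while (*q != '\0' && *q != ' ')
            q++;                    /* consume one word */
        if (w + 1 < n) {
            if (*q != ' ')
                return 0;           /* ran out of words */
            q++;                    /* step over the separator */
        }
    }
    return (size_t)(q - p);
}

/* Advance to the start of the next word (or to the terminating NUL). */
static const char *demo_next(const char *p)
{
    while (*p != '\0' && *p != ' ')
        p++;
    if (*p == ' ')
        p++;
    return p;
}

/* Count the n-word windows of a that also occur as n-word windows of b. */
int demo_shared_ngrams(const char *a, const char *b, int n)
{
    int shared = 0;
    for (const char *p = a; *p != '\0'; p = demo_next(p)) {
        size_t len = demo_span(p, n);
        if (len == 0)
            break;                  /* fewer than n words left in a */
        for (const char *s = b; *s != '\0'; s = demo_next(s)) {
            size_t blen = demo_span(s, n);
            if (blen == 0)
                break;              /* fewer than n words left in b */
            if (blen == len && memcmp(p, s, len) == 0) {
                shared++;
                break;
            }
        }
    }
    return shared;
}
```

    For "the quick brown fox jumps" against "a quick brown fox sleeps", two of the 2-word windows are shared but only one of the 3-word windows, and none at 5 words, which hints at why longer word sequences are so distinctive.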

    But we're talking about a structured programming language - with far more structure and rules than the English language - and the things that are at issue are by and large implementations of existing standards. The final link in TFS is a comparison of ELF utility header files, FFS. They've got to look fairly similar or they won't be any use for dealing with ELF executables! Even then they're sufficiently different that it would probably have been easier to write from scratch than it would be to execute the "copy/paste/obfuscate" cycle that is being alleged.

  • by MrHanky (141717) on Monday July 12, 2010 @05:03AM (#32872616) Homepage Journal

    Is that so? Let's see if we take a phrase from your own comment: "a higher incidence of matching phrases [google.com]". One hit. Not bothering with linking to them all, but how about "rips it from his predecessors"? One hit. "strings that are duplicated between the books"? One hit. "his programming background came directly from Unix"? One hit. "open any dozen books"? One.

    I have, of course, duplicated them in this comment, meaning there will be two hits very soon. BTW, these are all the strings I searched for, giving your comment a 100% originality rating (admittedly, I didn't search for "I'm not a coder", which I expect would show up several times).

    Duplication of whole sentences in ordinary human language is actually quite uncommon for all but the most trivial declarations and stock phrases ("Just do the search" gives 3 million hits; "Just do the twist" gives 105 000).

  • by SkunkPussy (85271) on Monday July 12, 2010 @06:57AM (#32873000) Journal

    So essentially Darl McBride took over a failing company, employed his brother as counsel. Then proceeded to embark on a huge programme of litigation until the company was dead. Thus transferring assets from SCO to his family.

    To what extent is this legitimate?

  • Re:First post (Score:5, Interesting)

    by jimfrost (58153) * <jimf@frostbytes.com> on Monday July 12, 2010 @08:26AM (#32873310) Homepage

    That's true, but in the push to get UNIX into the commercial space the SysV interfaces were released as an open specification. This was actually covered during the trial.

    The fact of the matter was that the Linux folk didn't copy code, something that would have been obvious to any observer following its development. The idea that there were vast amounts of stolen code was ludicrous if you knew anything at all about the internal structure of the two operating systems.

    There was always the possibility of code that got injected during the large commercial code donations by e.g. IBM or SGI, and in fact the only piece of code that showed actual derivation came from SGI ... But it turned out to be both a very small amount of code and buggy to boot. As soon as people got a look at it they excised it in favor of working, original, code.

    I personally expected it to go more the way of the AT&T versus BSD case, where it turned out that AT&T had stolen tons of code from BSD, not the other way around. The Linux emulation layer in SCO UNIX seemed a particularly likely candidate. Either that turned out not to be the case or IBM simply didn't push the issue (perhaps because SCO was having so much trouble proving anything in their claims) though.

    SCO's strategy always seemed to me to be a shakedown: scare companies into license agreements. Why they went after one of the deepest pockets first is beyond me; IBM was very likely to fight given their investment, but it was clear early on that management was not very competent.

  • Re:First post (Score:5, Interesting)

    by Anonymous Coward on Monday July 12, 2010 @09:11AM (#32873612)

    And the idea that this key book to early '80s PC tech (still worryingly relevant today!) was somehow missing from all the bookshelves reachable by the Compaq BIOS writing department is just silly.

    You don't know what you're talking about. I was there at the time: Compaq had administrative staff remove the BIOS listings from all IBM tech ref manuals before they were given to the engineers. (This was especially easy to do because they came in the form of ring binders.)

    At one point, since I didn't work on writing BIOS code, I was assigned to be the one designated guy who could disassemble the IBM BIOS for a certain model. When the BIOS developers got stumped by a compatibility problem, they could send me a question, and I was allowed to poke around in the IBM ROM and then give a "Magic 8 Ball" type vague answer.

    Here's a bit of trivia: A few PC applications wouldn't work unless the ID string "IBM" appeared at a certain address within the BIOS code. Compaq developers worked out a way to make those bytes at that address appear in part of an actual executable code sequence instead.
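
    To see why this is even possible: in 16-bit x86 code the one-byte opcodes 0x40-0x47 are INC and 0x48-0x4F are DEC on a 16-bit register, so the ASCII bytes 'I', 'B', 'M' also decode as valid instructions. The sketch below only decodes those bytes; it is an illustration and makes no claim about which instruction sequence Compaq actually used. The function name is made up.

```c
#include <stddef.h>
#include <stdio.h>

/* 16-bit register names in x86 encoding order. */
static const char *const demo_reg16[8] = {
    "ax", "cx", "dx", "bx", "sp", "bp", "si", "di"
};

/* Decode a single-byte 16-bit INC/DEC opcode into text.
   Returns 1 on success, 0 if the byte is outside 0x40..0x4F. */
int demo_decode_incdec(unsigned char op, char *out, size_t outsz)
{
    if (op < 0x40 || op > 0x4F)
        return 0;
    snprintf(out, outsz, "%s %s",
             op < 0x48 ? "inc" : "dec", demo_reg16[op & 7]);
    return 1;
}
```

    Fed the bytes 'I' (0x49), 'B' (0x42) and 'M' (0x4D), it yields "dec cx", "inc dx" and "dec bp".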

  • Re:First post (Score:2, Interesting)

    by Anonymous Coward on Monday July 12, 2010 @10:08AM (#32874162)

    You just reminded me of something amusing. Back in the early 1980's I met the guy who was in charge of the Unix business at AT&T. He told me that they had a problem - because they were a regulated utility, they were not allowed to make a profit on Unix. But of course, they also wanted it to pay for itself. But the lower the price they put on it, the more companies bought licenses, so the revenues went up instead of down!

    -- not logging in because it's too much hassle for one comment.

  • Two things (Score:3, Interesting)

    by br00tus (528477) on Monday July 12, 2010 @10:56AM (#32874624)
    First, SCO released early versions of Unix [mckusick.com] to the public under a BSD-like license in 2002. So if any of that code happened to be in Linux, it is there legally. Only code added in later versions of Unix would be affected.

    Secondly, a lot of code in Linux is created to follow POSIX standards. There is code in SCO Unix and Linux which looks very similar, but the source for the material is the IEEE POSIX standard, not SCO Unix source code.

  • Of course it's insane. We've had 7 years of insanity - why should it suddenly stop now?

    The latest speculation is that insiders suspect SCO will be in Chapter 7, and it's time to start making with the "plausible deniability" game because the creditors will be getting a closer look at SCO's internals.

Those who do not understand Unix are condemned to reinvent it, poorly. -- Henry Spencer
