Linux and OSS to Aid the Library of Congress
flakeman2 writes with a link to a Linux.com article about Linux's new role at the Library of Congress. The national archive of books is looking to begin an ambitious digitization project, aimed at getting some rare and crumbling documents into the public record online. These will include "Civil War and genealogical documents, technical and artistic works concerning photography, scores of books, and the 850 titles written, printed, edited, or published by Benjamin Franklin. According to Brewster Kahle of the Internet Archive, which developed the digitizing technology, open source software will play an 'absolutely critical' role in getting the job done. The main component is Scribe, a combination of hardware and free software. 'Scribe is a book-scanning system that takes high-quality images of books and then does a set of manipulations, gets them in optical character recognition and compressed, so you can get beautiful, printable versions of the book that are also searchable,' says Kahle." Linux.com and Slashdot.org are both owned by OSTG.
Re:Hmm... (Score:5, Informative)
"the Internet Archive has migrated Scribe entirely to Linux, and Windows support has been dropped."
Seems focused on Linux to me.
Help the Library of Congress save American History (Score:5, Funny)
Re:Help the Library of Congress save American Hist (Score:1, Funny)
Re: (Score:2)
Re:Help the Library of Congress save American Hist (Score:1)
Re:Help the Library of Congress save American Hist (Score:2)
They tried with Windows first...;-) (Score:1, Troll)
Suffice it to say, they settled on Linux. The Microsoft version had psychic powers, apparently!
more info (Score:5, Informative)
It's only natural (Score:3, Funny)
Re: (Score:2)
Re: (Score:1)
Re: (Score:1)
No matter how you look at it (Score:3, Interesting)
Arguably one of the most important repositories of information in the U.S. is about to be available via OSS software and not MS products. For all the efforts that MS put out in Mass. this has got to be a kick in the face! Just wow!
The most important part is not free software (Score:5, Interesting)
Oh, it will be. (Score:3, Interesting)
Re: (Score:2)
Good thing I decided to comment before reading the article.
All copyrighted works should be held (Score:4, Interesting)
The revisions to the law would not be infringing freedom of speech; in fact, by allowing the free copying of works that did not further the arts or the sciences, it would be limiting copyright's impact upon the freedom of speech. If people are really concerned about the quality of content, they should remember that eliminating the profit motive will have a substantial impact upon the amount of questionable content that is out there, including movies, music, pictures and literature. Most of the members of the RIAA and the MPAA have a total disregard for the harm their content causes to society. Let them feel some of the pain: wipe out the copyright protections on some of their more divisive content ;).
Re: (Score:2)
Way to kill the entire non-fiction genre.
Re: (Score:3, Funny)
Sir, you pre-suppose that morally questionable articles do not serve to further the arts or the sciences. I protest most heartily. Every modern technology is served by serving pornography, as we all know. Let us firstly ponder the case
Re: (Score:2)
When Shakespeare was writing his work, do you think he thought 'I'll improve the arts and be known throughout the ages as a great writer' or do you think he merely enjoyed his work and liked the money? At the time, I'm sure nobody thought his work even a fraction as important as we now think it is.
So how are we to judge works of today? We obviously
Re: (Score:2)
Should work that attacks family values be protected by the tax dollars that are taken from families? Copyright protect
Re: (Score:2)
Re: (Score:3, Insightful)
Uh-uh. Let's repeat the same errors from the past, keeping what the current generation deems "of excellent moral quality", and censoring everything else, just like some works of Michelangelo [wikipedia.org] were. People must remember, what is of q
Re: (Score:2)
Re:All copyrighted works should be held (Score:5, Insightful)
Stop right there.
When the purpose of your organisation is, to put it in very simple terms, "catalogue everything", you can't start making exceptions on moral grounds on the simple basis that what constitutes "questionable moral quality" today may be totally different tomorrow. Furthermore, who gets to define "questionable moral quality"? The closest anyone's ever come to creating such a definition is to say "Well, I can't actually come up with a concrete definition but I knows it when I sees it".
Re: (Score:2)
I support free speech, 'FREE' as in 'FREE'. You want society to allow you to generate a profit at society's expense, then you are
Re: (Score:2)
The one copy that goes to a nation's library hardly constitutes a great profit at society's expense.
And the whole point of my post was that society changes. What may be considered perfectly acceptable today may not have been 100 years ago. Pre-marital sex immediately springs to mind, but I'm sure the
Re: (Score:2)
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)
You don't have to be a lawyer, and you are mistaken - just look at the Copyright Office [copyright.gov] website. Simply creating a work in fixed form copyrights it [copyright.gov]. If you want to be able to prove in court later that you are the creator of said work, however, it's best to register your copyrights with the Library of Congress. It used to be that you were required to put a copyright notice on your works or you could lose the copyright, but that's no longer true.
Re: (Score:2)
Re: (Score:2)
There's nothing in copyright law that provides for the "poor man's copyright registration" you're talking about, and I don't think there are any cases in which it proved someone the true owner of a copyright. All it proves is that the work was made into a fixed form at some time. It's an easily forged method - you can just mail yourself an unsealed empty envelope, fill it with whatever you want when you receive it, then seal it - and any attorney you're up against will certainly bring that up in the court
Re: (Score:2)
Re: (Score:1)
Re: (Score:1)
That's why it's modded as flamebait.
I mean, honestly, putting communism and homosexuality together? Hah.
Scribe? (Score:2)
Re: (Score:2)
Re: (Score:3, Informative)
http://sourceforge.net/projects/scribesw [sourceforge.net]
Re: (Score:3, Funny)
Excellent project!!! (Score:3, Interesting)
The question now is: would they accept technical contributions from the public (I mean, OS geek communities), just like other open source projects? I know a lot of people would be eager to join. How about a SETI-like system to harness the power of desktop computers around the world to help with image processing and OCR? Hey, I got 4 decent desktop computers that can contribute at least 8 hours/day each.
How much data? (Score:2)
Full spectrum scanning (Score:2)
Sure, I imagine most of the consumption in the future will be done in a digital environment, but it would be nice if future generations had t
The sad part of digitization. (Score:5, Interesting)
Eventually we will have no physical record of these writings and may someday learn from the digital copies that Benjamin Franklin, George Washington, and others had offered enthusiastic support for wiretapping and other forms of electronic surveillance [huffingtonpost.com].
OCR software is still closed source (Score:1, Redundant)
The OCR software from Scribe is still closed source.
What OCR-Engine do they use? (Score:1)
How much? (Score:2)
Replacing paper documents with digital documents.. (Score:2)
Re:Replacing paper documents with digital document (Score:1)
Re: (Score:2)
Quality as well as quantity, please (Score:4, Informative)
It does, of course, vary a lot depending on the style of image. Bold illustrations for children's books, for example, do better at, say, 800dpi greyscale or colour. Fine steel engravings with lines at, say, less than a tenth of a degree from horizontal (they were done by hand after all) and that come out only a couple of pixels wide even at 1200dpi just turn into gray mush with weird banding artefacts until you go to a higher resolution (I use 2400dpi). There's a widely-cited study indicating that an "ultra-high" scan resolution of 400dpi is more than sufficient, based on an extremely small sample of images.
The damage that's done by poor quality digitization is that it makes it harder to justify doing a better job in the future.
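To put rough numbers on the resolution point, here is a back-of-the-envelope sketch. It assumes an engraved line that comes out 2 pixels wide at 1200dpi (my reading of "a couple of pixels"), and the function name pixels_across is just made up for illustration; it only shows how many pixels that same line would span at other resolutions.

```python
def pixels_across(width_inches: float, dpi: int) -> float:
    """Number of pixels a feature of the given physical width spans at a scan resolution."""
    return width_inches * dpi

# Assumption: the line is 2 pixels wide at 1200dpi, i.e. 2/1200 of an inch.
line_width_in = 2 / 1200

for dpi in (400, 800, 1200, 2400):
    print(f"{dpi:>4} dpi: {pixels_across(line_width_in, dpi):.2f} px")
```

Under that assumption the line covers less than one pixel at 400dpi, which fits with fine engravings turning to mush at that setting, and about four pixels at 2400dpi.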
Oh NOESZZZ (Score:1)
Just in case you're immeasurably thick.
What they don't mention... (Score:1)