Li18nux Effort Announced
The Li18nux effort is now underway. The group seems to have ambitious goals and plenty of members (not to mention a multi-lingual press release). The effort is supposed to lead to documentation, components, etc. in many many languages, including languages with different character sets. Japan and many other nations seem to be fairly represented.
Ha, totally useless! (Score:1)
Let me illustrate what a Swedish user interface could look like under your scheme. Guess what the English message was!
The list is endless!
Lars
--
Old Swedish saying: The sea sucks
Teaching the world Chinese and Finnish (Score:1)
Li18nux is free development's way of routing around the language restrictions of computers and the net (we need a joint word for these).
--
Kiina on paljon aikuisempi kieliperhe jonka käyttäjäkunta on mahdollisesti maailman laajin. Ja jolla on hyvin monipuolinen merkkivalikoima
Li18nux on vapaan kehityksen tapa kiertää tietokoneiden ja verkon (joille tarvitsemme yhteisen termin) kielirajoittuneisuus.
--
Kinesisk är en mycket mera vuxen språkfamilj och har kanske mest talare i hela världet. Och det har flera bokstäver än något annat språk
Li18nux presenterar fria programmerings kunskap att svara problemet av spräktvungande i datorer och nätverket (vi behöver ett enkelt ord för båda).
(Förlåt! Jag var tvungen
Re:thanks (Score:1)
Re:Teaching the world English (Score:1)
>English, but none are as easy to switch the world
>over to.
biggest problem would be to get americans to write proper english, instead of that bastardised slang they use. colour is supposed to have a 'u' in it.
Re:Use UTF-8 (Score:1)
Step 1: good
Step 2-4: Oh my. Big problems there. Major over-simplification
Step 5: Replace compile-time static sizeof() with run-time dynamic GetLengthOfNextUtf8CharInThisString() and get faster and smaller??? Easier to debug??? Replace pString++ and pString-- with pString = GetToNextUtf8CharInThisString( pString ) and pString = GetToPreviousUtf8String( pString ) and get smaller and faster programs that are easier to debug???
Just do a proper translation to and from Unicode (UTF-16) on entry to and exit from your main program, and then you can suddenly work in multiple locales via mbsrtowcs() and the like.
Yes, use UTF-8, but use it wisely.
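A rough sketch of the "translate on entry and exit" approach, written in Python rather than C for brevity. Python's str type stands in for the wide-character string that mbsrtowcs() would produce, and the function names are made up for this illustration. It shows why per-character stepping is simple after decoding and fragile on raw UTF-8 bytes.

```python
def reverse_text(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Decode on entry, work on whole code points, encode on exit."""
    text = raw.decode(encoding)            # entry: bytes -> code points
    reversed_text = text[::-1]             # stepping is per code point here
    return reversed_text.encode(encoding)  # exit: code points -> bytes


def reverse_bytes_naively(raw: bytes) -> bytes:
    """Treat every byte as a character (the pString++ habit applied to UTF-8)."""
    return raw[::-1]


if __name__ == "__main__":
    word = "r\u00e4ksm\u00f6rg\u00e5s".encode("utf-8")  # "räksmörgås"
    print(reverse_text(word).decode("utf-8"))
    try:
        reverse_bytes_naively(word).decode("utf-8")
    except UnicodeDecodeError as exc:
        print("byte-wise reversal broke the encoding:", exc.reason)
```

Note that stepping by code point is still not the same as stepping by user-visible character once combining marks are involved, so this only addresses the multibyte part of the problem.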
Re:Teaching the world English (Score:1)
Exactly. At this moment English is the dominant language when people who speak two different languages try to communicate, kind of like a lingua franca of the world (at least the western hemisphere). This has always been the case. In medieval times Latin was the dominant language, everyone who was anything knew Latin. However, you are not taking into account that English is not dominant everywhere. In parts of Africa Swahili is used for this purpose, in South America it's Spanish (except for Brazil, of course). In parts of Asia it's Mandarin Chinese, in Papua New Guinea Tok Pisin. The situation may be totally different 100 years from now; English took on this role only relatively recently.
>I see no practical reason to maintain several hundred different languages and translate all the world's knowledge constantly between them and not just standardize on one.
There are lots of practical reasons. People like to use the language they are most proficient in; I for one would rather express my feelings and opinions in Icelandic (my mother tongue) than English. And there are political reasons, too. If you go to, for example, France or Germany you may find it difficult to use English 'cause people don't want to use English although they may know it well enough. This is mainly due to U.S. imperialism in the period of the Cold War. You don't gain friends by placing an army base close to a foreign capital.
Language preservation is another thing. At the moment there is a constant bombardment of English everywhere, in movies, music and computers. It's quite difficult to prevent English vocabulary from seeping into your language in a situation like that, thus risking losing a large part of the native vocabulary. This is one reason why people like to translate their desktops into their native language. Getting giants like Microsoft to consider translations of their software is difficult, especially for languages with smaller populations. But with open source you can just take matters into your own hands and do something about it.
>In the US, we have a whole lot of different cultures all living under one roof. Russian, Chinese, Japanese, Korean, British, African, Brazilian, Egyptian, Indian, and so on. Everyone seems to accept the fact that in order to participate in a meaningful way in our society, you must learn English. The culture doesn't go away just because you learn a second language but English becomes your primary language.
Methinks you're actually comparing apples and oranges. Of course, when people immigrate to a country where one language is dominant they tend to learn the dominant language or they would isolate themselves from the rest. If you are trying to compare that to standardizing a language somewhere where a totally different language is already dominant then you're on the wrong track. Not only would it be difficult, but in some areas close to impossible. In poorer countries where education is already lacking you don't make people learn English just like that. It would take years of planning and the cost would be great.
Besides, if people are ready to make an effort to speak and use their language, including translating software, who are you to complain? It doesn't really concern you, does it?
Regards,
Guðmundur Erlingsson, gudmuner-nospam@lexis.hi.is (forgot my password:-)
Re:Use UTF-8 (Score:1)
Re:Use UTF-8 (Score:1)
Internationalisation is NOT easy (Score:2)
I'm all for using Unicode across the board - this would be an extremely good thing, but it doesn't even begin to solve internationalisation problems. The major European languages use variants of the Roman or Cyrillic alphabets and place spaces between words, as do many other languages that have been alphabetised in the last two centuries. That can be handled easily enough. However, even among these languages, there are some serious issues. Alphabetical order is different from English in many European languages, such as Swedish, Spanish, Icelandic, and even French (just to name a few). In languages like Hungarian, Finnish and Turkish, complex hyphenation rules have to be taken into account. Indexing of compound characters can be a real nightmare. And in German, ü, ö and ä are sometimes indexed with ue, oe and ae for orthographic reasons. Yet another complication: books in French have the table of contents in back. Structured publishing software has to take this kind of thing into account too.
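To make the alphabetical-order point concrete, here is a small Python sketch with two hand-written sort keys. Both functions and the word lists are illustrative assumptions, not a complete collation implementation: one places å, ä, ö after z as Swedish does, the other expands umlauts to ue, oe, ae in the style of German phone-book indexing. Real software would more likely lean on the platform's locale support (for example locale.strxfrm) or a collation library.

```python
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
_SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

# Hypothetical expansion table for phone-book style indexing.
_GERMAN_EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}


def swedish_key(word):
    """Rank letters by their position in the Swedish alphabet."""
    return [_SWEDISH_RANK.get(ch, len(_SWEDISH_RANK) + ord(ch))
            for ch in word.lower()]


def german_phonebook_key(word):
    """Index ü, ö, ä as ue, oe, ae before comparing."""
    return "".join(_GERMAN_EXPANSIONS.get(ch, ch) for ch in word.lower())


if __name__ == "__main__":
    words = ["zebra", "ärlig", "apa", "öl", "ål"]
    print("code point order:", sorted(words))
    print("Swedish order:   ", sorted(words, key=swedish_key))
    names = ["Muller", "Müller", "Mueller"]
    print("phone-book order:", sorted(names, key=german_phonebook_key))
```

Plain code point sorting puts ä before å, which is wrong for Swedish, and it would sort Müller after Muller, which is not where a German phone book puts it.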
Going global, this is just the beginning of the problems. Chinese, Japanese and Korean don't usually employ spaces between words at all, messing up line justification algorithms. Korean writers sometimes use Sino-Korean (Chinese characters) and sometimes the Korean syllabary for the same words - should they be indexed together? In Japanese, unusual kanji characters are often accompanied by hiragana characters that are either above or to the right of the kanji to aid in pronunciation and comprehension - this is called a ruby, and it's an I18N nightmare. Korean sometimes does the same thing, as do the Chinese speakers in Taiwan with their bopomofo system. Many Chinese characters have two forms - a traditional one still used by many overseas Chinese and simplified characters used in the PRC. This is another indexing problem. Vietnamese uses a very complex variant on the Roman alphabet today, but only a century ago, they too still used Chinese characters, and that double system imposes still more constraints. Also, some Japanese, Korean and Chinese texts are written top to bottom and right to left. Books are generally printed right to left, the reverse of western order. Oh, and I won't bother you with the utter nightmare of alphabetical order in Chinese. Take a look at a Chinese dictionary if you want to see it in action.
Arabic, Farsi, Urdu, Hebrew and Yiddish (among other languages) are written right to left, and Arabic and Hebrew sometimes write the vowels and sometimes don't. This is a major consideration when doing, for example, searches in documents. Allowing a terminal or any application to accept characters in both right to left and left to right (and top to bottom for traditional Mongolian and sometimes for other Asian languages, as described above) is not very easy, and to truly internationalise, it has to be done across the board. Not to mention the frequent occasions when both data in the local language and English or some other western language have to be mixed.
Indian and southeast Asian languages each have their own alphabet, but they are often derived from common sources (usually Sanskrit). Although the letters in each language may be different, there are certain equivalencies between them that are often taken into account when indexing or transcribing. The old Indian telegraph system used to use these similarities to provide a unified code for data exchange. How is an I18N system to take these into account?
The Unified Canadian Syllabic system (used for some dialects of Cree, Ojibwa and Inuktitut) has compound letters that have to be taken into account, as well as a unique system of rotating the letter to indicate part of its phonetic value. This matters in making data entry systems (e.g. keyboards) and in alphabetisation.
And of course, it's a huge nightmare when one system has to provide for unknown combinations of languages, e.g. a Russian who needs to use Japanese, a Pakistani who does business in Chinese, or an Israeli linguist doing work in Cree.
And, lastly and even less pleasantly, an I18N project has to take into account all the existing half-assed standards for encoding and working with these languages. There are at least three different systems for Japanese alone (EUC, JIS, Shift-JIS), and four for Chinese (GB, Big5, JIS, EUC). The dominant system for encoding Inuktitut is just a font the main newspaper in Iqaluit made up so they can put their articles on the web (Nunatsiaq News [nunatsiaq.com]).
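As a quick illustration of the competing Japanese encodings, this Python sketch encodes the same three characters with the standard library codecs euc_jp, shift_jis and iso2022_jp. Treating "JIS" as ISO-2022-JP is my assumption about what is meant, and the helper names are invented for the example.

```python
TEXT = "\u65e5\u672c\u8a9e"  # "日本語" ("Japanese language")
CODECS = ("euc_jp", "shift_jis", "iso2022_jp")


def encode_all(text, codecs=CODECS):
    """Return the byte representation of the text under each codec."""
    return {name: text.encode(name) for name in codecs}


def decode_with_wrong_codec(data, wrong_codec):
    """Decode bytes with a codec they were not written in, keeping going on errors."""
    return data.decode(wrong_codec, errors="replace")


if __name__ == "__main__":
    encoded = encode_all(TEXT)
    for name, data in encoded.items():
        print(f"{name:12} {data.hex(' ')}")
    # Guessing the wrong encoding yields garbage rather than the original text.
    print(decode_with_wrong_codec(encoded["shift_jis"], "euc_jp"))
```

The same text comes out as three different byte strings, so any program that receives Japanese text has to know or guess which system produced it.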
There are no simple answers for internationalisation, and we can't just tear down all the existing software and rebuild it to work right. I hope the OSS community is up to this because this is a major project that has to be undertaken in a unified way, from top to bottom, or else it will not succeed.
Pronouncation? (Score:1)
Re:the world would be better off. (Score:1)
Language is the filter through which you see the world.
As a native English speaker, I see the world through a different sort of filter. A simple but, I think, effective example:
I think of Morning, Afternoon, Evening, and Night to refer to times of the day. Spanish has only three of these words, I think.
If you eliminate a language, that way of looking at the world dies with it. A culture is robbed of its medium, and so on.
Learning a common second language is a possibility. But, language is a cognative prison. I know I misspelled it, but try and respect the way other people look at the world. One isn't right or wrong, just different.
Bye
esperanto is simple (Score:1)
Point is, nobody wanted to use it, no one needed to, everyone spoke English. The bar actually had a decent percentage of non-natives, but English has become the second language of choice. If you want to argue this on Slashdot, do it in a language other than English. You wouldn't get much response, huh?
Re:Myth on i18n and "One World Language" (Score:2)
Unless one of the two is Dutch. Those folks are amazing (my old German teacher says "the Dutch have mouths shaped for languages"). Many Dutch people speak four or more languages. I was at a workshop in Italy once, watching this Dutch woman I know carrying on conversations in English, German, Dutch, and Italian all at once, switching languages as she talked to each person at the table. And she doesn't think of herself as an outstanding linguist; that kind of language ability is considered typical.
Re:Should /. go with the times? (Score:1)
~GoRK
MS i18n vs. unix i18n (Score:1)
guys,
it seems that a lot of people have the idea that internationalization means language translation: certainly it does not.
the critical step right now is to get standard input methods working for a broad range of languages, in a way that works for all programs. i have used a japanese language input method in windows, and it works uniformly in word, on netscape forms, and even in notepad, IIRC.
by contrast, emacs 20.x has its own kind of input method, which cannot be used by other programs. a better (?!?) solution: redirect standard input to a program that converts keyboard input to the language of your choice. this is how UNIX is engineered in the first place, and it's a far less klugy way than windows! but, for now, windows works much better!!
what do you guys think?
sh_
re: english usage on decline? really? (Score:1)
can you support that with reference, please?
Here's a claim [actonwebexport.com] that there are up to 1.5 billion people using English. (350 mil native, 300 mil ESL, the rest learning.)
Re:Teaching the world English (Score:2)
Hong Kong's population is such a small proportion of the total population of China that it's not funny. Even then, English proficiency in Hong Kong seems to be declining.
You say that the idea is not to force everyone to speak English, but then you come up with ideas like "requiring a year or two of English", only publishing information on the Internet in English, making most software English-only, etc. etc. etc. What the hell is that if not forcing people to use English!?
Strangely enough, the people who I know in industrial nations that don't speak English seem to get along just fine. I hope to God that you're not a software developer, because you'd no doubt drag us back to the stone age of 7-bit ASCII on teletypes or something.
And what is it with your obsession with "superior" and "inferior" languages? There is NO SUCH THING! Ask any linguist - every language has its complex areas. In your first comment, you stated that:
- English has no gender-specific terms.
There are, however, significant differences in male and female speech.
- English has no formal/informal dialects.
Oh, and I suppose you speak to your boss the same way that you speak to your children, your friends, your dog...
- English has no tone-dependent words.
Maybe not, but a large portion of conversation depends on differences in emphasis. Consider "EX-tract"-"ex-TRACT", "PER-vert"-"per-VERT" or any of the hundreds of other lexical items that rely on emphasis to distinguish between noun/verb forms. Also remember that without emphasis, it becomes extremely difficult to recognise sarcasm, humor, etc. (Just look at all the misunderstandings that occur on
- English has no time-dependent words.
Look at English's use of tense and compare it with that of Japanese, Chinese, the Polynesian languages, or many others. They are all simpler in structure.
I'm really not sure what else I can say to someone who seems determined to ignore the right of 80% of the world's population to speak the language that they choose to speak. Grow up and try leaving your own country once in a while.
Re:Compiler Concept (Score:1)
Take a look at the GNU gettext system. It's not automatic, but it provides a very easy way to replace all text messages in a program with those from a particular locale - and you don't have to recompile every time you want to use a different language.
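For anyone who has not seen the catalog model in action, here is a minimal Python sketch using the standard library's gettext module, which follows the same approach as GNU gettext. The domain name demo-app, the locale directory path and the DictTranslations class are assumptions made up for this illustration; a real program would ship compiled .mo catalogs rather than an in-memory dictionary.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical stand-in for a compiled .mo catalog, backed by a dict."""

    def __init__(self, catalog):
        super().__init__()
        self._catalog = dict(catalog)

    def gettext(self, message):
        # Fall back to the original (source language) string when untranslated.
        return self._catalog.get(message, message)


def load_translator(domain, localedir, languages):
    """Look for <localedir>/<lang>/LC_MESSAGES/<domain>.mo at run time.

    With fallback=True a missing catalog is not an error: the program
    simply keeps showing its built-in messages.
    """
    return gettext.translation(domain, localedir=localedir,
                               languages=languages, fallback=True)


if __name__ == "__main__":
    _ = load_translator("demo-app", "/nonexistent-locale-dir", ["sv"]).gettext
    print(_("File not found"))  # no catalog installed, message is unchanged

    _ = DictTranslations({"File not found": "Filen hittades inte"}).gettext
    print(_("File not found"))
```

The program text only ever contains the source-language strings wrapped in a lookup call, which is why switching languages needs no recompilation.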
AmigaOS locale system (Score:2)
When AmigaOS 2.1 was released, it had something called locale. This was basically a system where a small catalog of the text used in any application was stored in a file, and could fairly easily be translated to other languages. The user just selected their language of choice in the locale preferences, and any application that had the appropriate locale catalog available would load it up and be presented in the user's language.
It seemed to work quite well... Well, it had problems: large differences in the lengths of translated words, for instance, would cause nasty UI mess-ups. But we're talking 1992-1993 here, so it was a fairly nice thing to have that early. From a programmer's point of view, making your applications 'locale-aware' was very simple: it involved a few minor changes to your code, and then running a tool over your code to extract the locale catalog information. You ended up with a program which was written to use a native language by default, and then would support any language for which the translated locale catalog was available on the system.
I don't think the system had support for non-latin character sets (I could be wrong, I only played with it briefly), so it wasn't the best solution, but it was fairly impressive for its time...
How very chauvinistic! (Score:2)
Spoken like a true English-speaker. Probably from the USA, right? Sigh.
For every mod hip progressive netizen who thinks that English is the obvious World Interlanguage, and oh, wouldn't it be better if all those misguided little multi-colored and warring peoples in those places with unpronounceable names just stopped using that yip-yap jibber-jabber they call a language, at least in public, and start speaking The Obvious Choice, English, there's another person who thinks roughly the same thing about French, or Chinese, or Russian, or any one of a hundred other of the world's languages.
As for what would be easier, Esperanto would be a good candidate. The Esperanto community has some pretty convincing data that it's actually easier to learn than most other languages, even for those whose native tongue is not derived from the same Latin that Eo is. But hey, that'd mean that you would have to bend your poor little brain around something new, and we couldn't have that, now, could we? Better that the wogs learn to talk proper, eh?
Sorry, you probably deserved only about half of that. I'm calmer now. On the odd chance that you or others would care for some facts in place of my ranting, see esperanto.net [esperanto.net] for some actual details about a real alternative.
Re:No, i18n == internationalization (Score:1)
Linternationalizationux project?
Holy cow!
Re:Teaching the world English (Score:2)
English could just as easily be Spanish or French or Russian. It really doesn't matter. I chose English simply because it is used a lot in the global setting.
I see no practical reason to maintain several hundred different languages and translate all the world's knowledge constantly between them and not just standardize on one.
In the US, we have a whole lot of different cultures all living under one roof. Russian, Chinese, Japanese, Korean, British, African, Brazilian, Egyptian, Indian, and so on. Everyone seems to accept the fact that in order to participate in a meaningful way in our society, you must learn English. The culture doesn't go away just because you learn a second language but English becomes your primary language.
Now, one might point out that you do lose a piece of yourself here and it is true. I, being a white middle-class male, have pretty much no specific ethnic culture to speak of and I'll be honest, I don't see it as a big loss.
One more thing: during this discussion I've tried to be as sensible and practical as possible and just throw out some ideas which I'd hope people would comment on with an open mind.
Please try and refrain from making personal attacks when criticizing posts.
--
Re:How very chauvinistic! (Score:1)
-----------
"You can't shake the Devil's hand and say you're only kidding."
It's not gnome that does that. (Score:1)
Know rather than guess (Score:1)
I know you are wrong. Good guess, and one shared by a lot of other folks. It has been shared down through the ages by a lot of people about a lot of languages--Latin, French, Russian, and so on. But it was wrong then, and is wrong now. Also, your assertion that English "has a very simple grammar" is quite wrong. As is your assumption that "Sooner or later most of the world's population will acquire this international English as a second language." English usage is actually on the decline worldwide.
But you don't need to continue guessing and getting it wrong--these issues have been studied pretty extensively, and there's a lot of hard data and experience to back it up. Do some reading and speak from a position of strength next time, okay? A good place to start is on one of the many Esperanto sites [esperanto.net] around the net. Those folks have a strong interest in comparative linguistics, and have good links to the straight poop.
Re:Lacking in content? (Score:1)
export LANG=sv_SE
in your .bashrc on your favourite Linux distro and get a lot of program output automatically translated for you (I thought your native tongue had to be Swedish, just like mine, after reading your user info ;)
Hence, in case you want your software translated, you may try to use this translation effort.
Another comment regarding this story: I know that i18n is a huge and complex process, and it has to be divided into several sub-processes and efforts, but I still hope that there won't be a lot of new translation APIs and translation groups that don't work together in their efforts. I hope this Li18n initiative will act much like a common resource for Linux software translation, and work with the translation efforts that already exist. We don't need the translation work done twice by different groups.
Today there is already the Gnome i18n group, the KDE i18n group, the GNU/FSF translation project, the various documentation translation groups, the Mozilla i18n project, and various other groups, and although they don't work with the same translations, I wish they could work more closely together, without duplicating efforts with translation APIs, translation software, documentation for translators, etc. Just my thoughts.
Superior language (Score:1)
Cool list. It is interesting to note how Esperanto stacks up against your criteria. Check it out [esperanto.net] for yourself and see.
Re:Myth on i18n and "One World Language" (Score:1)
1) English is the second most spoken language, with Mandarin Chinese first and Spanish third.
2) English speaking is far, far more widespread than that of Mandarin, which is prevalent in China only.
3) English is the language spoken between airplanes and air traffic controllers throughout the world.
4) Most technical papers are now put out in English, and most of the crap on the internet is in English.
5) If two people from different countries without a common language meet, they are most likely to start trying to communicate in English.
Don't get me wrong; I think everybody should be multilingual and have respect for multiple languages. I personally have some fluency in Spanish and Japanese in addition to English. Where it gets really interesting is in places like India, where English was acquired from the British, but is being slowly altered to be their own.
--------
Hi, Jim, nice to meet you here (Score:1)
Re:Myth on i18n and "One World Language" (Score:1)
thejeff
Re:Teaching the world English (Score:1)
-----------
"You can't shake the Devil's hand and say you're only kidding."
Re:Know rather than guess (Score:1)
Re:Stark? Esperanto's where it's at... (Score:1)
More to the point, one of the most important reasons for internationalization is that, regardless of whether users can understand menu items and error messages, they want to be able to use text in their native languages. If you have a database of clients, you'll want their names represented in a form that the clients and the post office will recognize.
As for believing you about Esperanto, I'm the Team Leader of the Esperanto Translation Team for the Free Translation Project. I can't think of any better way that I can support it than to make it easier to use free software in Esperanto and for editing Esperanto text. One recurring theme in soc.culture.esperanto is the complaint about the lack of software support for the Esperanto alphabet.
Re:How very chauvinistic! (Score:1)
world, but if there are other Esperantists reading this who would like to help out the Free Translation Project, the Esperanto Team can use your help. The mailing list for the team is down right now, so e-mail me at dsplat@rochester.rr.com for details. English or Esperanto queries are welcome.
Re:MS i18n vs. unix i18n (Score:1)
I simply think that you should be able to input text into *any* program in *any* language -- it should be transparent to the application. I guess that means reworking X a bit (?). It would seem rather silly for each particular widget set to support an IME on its own.
Re:i18n == international?! Please! (Score:1)
Well, for those doing any work at all in the field, those terms are very well established. Since the project is initially to coordinate all those working in the field to be unified on Linux, it seems like a very good choice of names. See their charter [li18nux.org] for why I have that impression.
For the Java stuff, try the Java documentation itself: http://java.sun.com/products/jdk/1.1/docs/guide/intl/index.html [sun.com]
...eton suoires a nO (Score:1)
Due to the uncertainty you claim inherent in the grammar being used here, I'd expect you to expect others to be confused by your post.
Well, infer what you want, but the person speaking here says these words have meaning.
Not to be too contrary, but Latin is a dead language. No one still living even knows how those words were pronounced.
Re:MS i18n vs. unix i18n (Score:1)
i know. i wasn't responding to your post in particular there. nevermind.
about needing japanese version of windows--no, you don't. i downloaded some program (i can't find it now, and i'm not at my home computer, but it also lets you view japanese characters on webpages), installed it, and could immediately write to a lot of apps including notepad.
about your point about inputting text into any program in any language (which i think is the main point for both of us), i definitely agree. i don't know if it should be x's job, or be built into the kernel (i don't like this way), or if it should simply be a command line program that uses pipes to talk to any application (like by redirecting stdin).
sh_
Re:Use UTF-8 (Score:1)
I would estimate that 99.5% of all the processing done on character strings would work with UTF-8 if it just treated bytes with the high bit set as "letters".
For instance there is absolutely no reason strlen() has to return any value other than the number of bytes in a string and it should be obvious that you don't have to rewrite it for this. Take a look at every use of strlen you can find and see if it is ever used for any purpose other than to measure how much memory is needed to store the string. I think you will find there are zero other uses.
Though less obvious, the same thing is true of almost every other operation done in C on characters.
Another way of looking at it is to remember that English (and most other languages) is made of "words" and these "words" are already stored as "multibyte sequences". But writing code that does not break up words (which would make it unreadable) apparently is not too hard, and nobody seems to think that "p = nextword(p)" is a function that needs to be in the C library.
If you are still convinced that having fixed-size storage for each character is important, the other killer is that wchar does not solve it! Look up "combining characters" and you will see.
I am sorry about ranting, but imho "wide characters" are one of the most stupid things ever invented. They are pushed by people who fear being politically incorrect (i.e. "don't show a bias toward English") and have made it impossible to do internationalization where we can just look at a text file and see it in the language it was written.
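The argument in this comment can be sketched in a few lines of Python (chosen for brevity; the helper names are invented for the example). It relies on one property of UTF-8 that I am sure of: every byte of a multibyte sequence has the high bit set, so an ASCII byte such as a space never appears inside one. The last part shows the combining-character point, where even one storage unit per code point does not give one unit per visible character.

```python
import unicodedata

WORD = "sm\u00f6rg\u00e5s"   # "smörgås", using precomposed letters
PHRASE = WORD + " bord"
DECOMPOSED = "e\u0301"       # "e" followed by COMBINING ACUTE ACCENT


def storage_size(text: str) -> int:
    """Bytes needed to store the text as UTF-8 (what C's strlen would count)."""
    return len(text.encode("utf-8"))


def split_on_spaces(data: bytes) -> list:
    """Split UTF-8 bytes on ASCII spaces without decoding anything."""
    return data.split(b" ")


if __name__ == "__main__":
    print("bytes:", storage_size(WORD), "code points:", len(WORD))
    print([part.decode("utf-8") for part in split_on_spaces(PHRASE.encode("utf-8"))])
    # Two code points, one visible character: fixed-width storage per code
    # point still does not mean one slot per character on screen.
    print(len(DECOMPOSED), len(unicodedata.normalize("NFC", DECOMPOSED)))
    print(unicodedata.combining("\u0301") != 0)
```

Byte-oriented code keeps working as long as it only splits or searches on ASCII bytes; operations that truncate at an arbitrary byte offset or count visible characters are the cases that need extra care.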
Re:MS i18n vs. unix i18n (Score:1)
Arg! That's right... I forgot all about those! Might you be talking about AsianSuite or AsianPack?
There would probably need to be some sort of conversion engine [library] that programs could link against. Then, the same core functionality (converting kana to kanji) can be handled by the library, but the application (or even another add-on library) could handle displaying the candidate list, etc... Maybe GNU readline could be modified to do this. As far as X goes, I don't even know where to start! There are too many widget sets out there, and if the IME were a part of X, it would probably feel awkward. The kernel shouldn't really be involved because it needn't be tied to X.
Admittedly, I'm short on ideas about this. :)
Re:Pronouncation? (Score:1)
thanks (Score:1)
Li 18 nux (Score:1)
:)
Should /. go with the times? (Score:2)
_____________
Gnome i18n (Score:2)
I realized I could do things like,
$ LANG=es_ES gnome-help-browser
or even run my whole X session in another
language,
$ LANG=fr_FR startx
Not everything is translated, but it's
still pretty impressive.
Sounds interesting (Score:2)
I think it will be difficult not to run into that again because a great part of the software for Linux is written by single persons, who do not necessarily have the time or the resources to translate all of their software into multiple languages. The Open Source concept of course comes in very handy here (Everybody can read the code and translate the text messages himself) but I guess there will always be some programs which won't be available in your preferred language.
I appreciate the effort which will add even more popularity to Linux but personally I will always stick to the English version of Linux.
Why don't they just learn english like we did? (Score:1)
Re:Should /. go with the times? (Score:1)
Re:Should /. go with the times? (Score:2)
The best part about the language stuff in HTTP is the 1st, 2nd, 3rd preferences... I have mine set to show me English first, then Japanese, then Spanish, which is the order in which I can comprehend languages, from perfect to something that I can read most of to something that Babelfish can machine translate!
~GoRK
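The preference ordering described above travels in the HTTP Accept-Language request header, where each language can carry a q weight. Below is a simplified Python sketch of how a server might read it; the function names are hypothetical and the parser ignores details such as the "*" wildcard, so treat it as an outline rather than a complete implementation.

```python
def parse_accept_language(header):
    """Return language tags ordered by descending q value (default q is 1.0)."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        lang, _, params = item.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # unparsable weight: treat as "not acceptable"
        prefs.append((lang.strip(), quality, position))
    # Highest q first; ties keep the order in which the browser sent them.
    prefs.sort(key=lambda pref: (-pref[1], pref[2]))
    return [lang for lang, quality, _ in prefs if quality > 0]


def pick_language(header, available):
    """Choose the first preferred language the site can actually serve."""
    for lang in parse_accept_language(header):
        if lang in available:
            return lang
        base = lang.split("-")[0]  # e.g. "en-GB" falls back to "en"
        if base in available:
            return base
    return None


if __name__ == "__main__":
    header = "en, ja;q=0.8, es;q=0.5"
    print(parse_accept_language(header))
    print(pick_language(header, {"ja", "es"}))
```

With the header shown, a site that only has Japanese and Spanish pages would serve the Japanese one, matching the "second preference" behaviour described in the comment.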
the world would be better off. (Score:1)
Re:Should /. go with the times? (Score:1)
However, at some future point when neural net computing has been developed much further than at present, a computer may be able to do a much more acceptable job of translation (since it will act like a human mind or even better than a human mind in being able to interpret these nuances and appropriately translate them).
\/my eight pennies/\
Re:thanks (Score:1)
Um... it's afternoon where I am. Perhaps you should think about the fact that not everyone is on US time before posting to an article about i18n.
Re:Lacking in content? (Score:1)
The local i18n projects are much friendlier.
CJK (Score:2)
linuxi18n.org (Score:1)
Compiler Concept (Score:1)
Linux locale system (Score:1)
It's being distributed by default with many Linux distros (try man gettext) and is essentially a tool for software developers to make their software translatable. This is done by storing all the program output strings in a file, making translation easy.
After a software package is translated into a language and sent back to the developer, the translation is distributed by default with all instances of that software, and all the end user has to do is to set an environment variable (in bash export LANG=xx, where xx is the language code, like ja for Japanese, sv for Swedish etc.) to get all output on his system in that language, in case a translation exists.
It works very well, except for the sad fact that far from all software is translated, not to mention the documentation.
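Roughly how the environment variable gets turned into a catalog choice can be sketched in Python. The function below is a hypothetical simplification written for this illustration: it checks LANGUAGE, LC_ALL, LC_MESSAGES and LANG in that order, strips any codeset or modifier, and yields the full locale name followed by the bare language code.

```python
def language_candidates(env):
    """Derive catalog names to try from locale environment variables."""
    for var in ("LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG"):
        value = env.get(var)
        if value:
            break
    else:
        return []  # nothing set: the program keeps its built-in messages

    candidates = []
    for entry in value.split(":"):  # LANGUAGE may list several, e.g. "sv:de"
        locale_name = entry.split(".")[0].split("@")[0]  # drop ".UTF-8", "@euro"
        for name in (locale_name, locale_name.split("_")[0]):
            if name and name not in candidates:
                candidates.append(name)
    return candidates


if __name__ == "__main__":
    print(language_candidates({"LANG": "sv_SE.UTF-8"}))
    print(language_candidates({"LANGUAGE": "fr:es", "LANG": "sv_SE"}))
```

A program would then try each candidate in turn and fall back to its untranslated strings if no catalog exists, which is why setting LANG is harmless for software that has not been translated.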
Myth on i18n and "One World Language" (Score:2)
It's not a direct reply to the article, but I just want to point out two myths about the subject.
It doesn't. The original goal of Unicode was 1) to provide the virtually unlimited number of characters needed to express text in any language, and 2) to integrate the namespaces of the individual encodings, so wherever a character is located, you can tell what that character really means (existing encoding schemes "switch" mode by context, so this can't be done).
As it turned out, Unicode failed to accomplish BOTH. Although it is superior in some respects to the current schemes, it has solved nothing in concept. I know most ASCII-only people, who will probably never experience problems with this, don't care, but I just hope people stop saying "Unicode is the land of promise" type marketspeak.
If you're talking about population, then everyone should be forced to speak either Chinese or Spanish by now... Although English seems to be the dominant language in some world out there (which probably includes only the USA, Australia, and (part of) Europe), things are different on the Earth as a whole.
I sometimes wonder where people really mean when they use the word "world"...
Re:Teaching the world English (Score:1)
The world certainly needs a universal language to communicate in and I think English has already become this universal language - a rather simple, dumbed down version, but enough to get along with under most circumstances. Sooner or later most of the world's population will acquire this international English as a second language. That won't have much effect on the use of the respective national language. And it won't replace the need to translate from English to these national languages if you want acceptance in a foreign country. If someone wants to use Linux and he has to revert to his rather sketchy knowledge of English in order to read a man page of a command he doesn't know about either, he will probably never switch to Linux if he doesn't have to, because he can get another, simpler to use OS in his native language.
Re:Teaching the world English (Score:1)
//rdj
Re:Linux locale system (Score:1)
I also don't know if this tool works with languages other than C. I'm just a translator, not a software developer... ;)
Re:It's not gnome that does that. (Score:1)
--
Re:Teaching the world English (Score:1)
Before you ask me to do that, perhaps you should refrain from making comments that most people would view as a troll.
The Free Translation Project (Score:1)
Stark? Esperanto's where it's at... (Score:1)
Derived from European languages, it was designed to be a simple, effective language for use in business etc, and to allow easier access to foreign language teaching resources for those whose native tongue was not widespread. It was also mooted to be an official language of the EC (where did that other E go from EEC, anyway?), but everybody realised English was better since everybody spoke it already. Except the French, who insisted that their language be adopted too, followed by the Germans. Way to go, EC dudes!
Well, if you don't believe me about Esperanto, here's a quote from one of the finest literary minds this century:
"My advice to all who have the time or inclination to concern themselves with the international language movement would be: "Back Esperanto loyally." - J.R.R. Tolkien
On a serious note... (Score:1)
-----
Misconceptions about Language (Score:1)
That said, allow me to say quickly what I believe qualifies me to say a little bit (or maybe not so little
1. Language !== Writing
Many (most) people from literate cultures tend to make the mistake of thinking that writing is language, in the sense that a statement about the writing system used for a particular language is applicable to the language itself. In fact the vast majority (ca. 90%) of all living languages (approx. 4000-7000) have no written form other than in descriptive systems used by linguists. Spoken language evolved (unless you're from Kansas
Example: Someone above mentioned something about languages using the Latin alphabet and equated it to Latin-descended languages.
Counter-Example: Hungarian, Estonian, Finnish, and various Eskimo languages (to name a very few) all use the Latin alphabet. None of these languages is remotely related to Latin or English. Russian, the various Caucasian languages (e.g. Udi), and a number of Finno-Ugric languages (e.g. Khanty) all use the Cyrillic alphabet; however, Russian is related to English (both belong to the Indo-European family), Khanty to Hungarian (and more distantly to Estonian and Finnish), and Udi only to the other Caucasian languages.
2. Language X is Superior/has simpler Grammar/is easier to Learn/is more elegant/is more sensible/etc. than Language Y
It's all relative. Learning German is easier for an English speaker than a Finnish speaker because German and English are closely related (both Western Germanic); likewise, learning Hungarian is easier for a Finnish speaker than an English speaker since the large number of postfixes and postpositions to express case isn't as daunting to her since Finnish uses a similar system. And that's completely ignoring natural talent, previous exposure to other foreign languages, etc.
The fact of the matter is that languages are vastly too complicated to make any blanket comparisons. Think about doing a regex search in Assembly and writing directly to processor registers in pure Perl to get a very rough idea of what I mean (and this comparison really doesn't do justice to natural languages).
Hmmmm.... it's getting a little long here and I need to do a little work before my boss gets annoyed, so I'll continue later...
Chris
Re:Use UTF-8 (Score:1)
Basically, if you go wchar_t or 16-bit (not always the same), the algorithms are simpler and the code itself is smaller. The only things that should be any larger are static strings and memory use. Then again, you shouldn't have a lot of fixed strings in your program to begin with. :-)
If you went to purely UTF-8 internally instead, you'd end up with the problem of needing more complex handling code, and thus a larger and slower program. Also, if you use wchar_t then you can easily go to and from whatever the local encoding is. With UTF-8 internally you'd have to either write code for all that yourself, or do a conversion from local encoding to wchar_t and then from that to UTF-8.
strlen() is quite often used to count characters. There are times when this is the proper use, and in those cases wcslen() is used instead. It is just as fast, or maybe even faster, since on Linux wchar_t is an int, which by the C language spec is the natural (and so most efficient) size for the processor.
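To make the bytes-versus-characters distinction concrete, here is a small sketch in Python rather than C. It assumes that counting Unicode code points is close enough to what wcslen() reports on a 32-bit wchar_t, and that counting encoded bytes matches what strlen() reports on a UTF-8 buffer. The sample string is made up.

```python
# "Grüße, 世界" written with escapes so the exact code points are unambiguous.
text = "Gr\u00fc\u00dfe, \u4e16\u754c"

utf8_bytes = text.encode("utf-8")       # variable width: 1 to 4 bytes per character
utf32_bytes = text.encode("utf-32-le")  # fixed width: 4 bytes per character, no BOM

char_count = len(text)        # code points, roughly what wcslen() would count
byte_count = len(utf8_bytes)  # bytes, what strlen() would count on UTF-8 data

print(char_count, byte_count, len(utf32_bytes))  # prints: 9 15 36
```

The fixed-width form costs more memory (36 bytes against 15 here), but its length in characters is just the byte length divided by four.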
Your caution about combining characters is a good point. There still are issues, but in practice your code should encounter those fairly rarely, and thus should still execute quickly. But still, this needs to be accounted for. (And often just converting those to the pre-composed form is a viable option)
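A quick sketch of that last option, converting combining sequences to pre-composed form, using Python's standard unicodedata module (NFC is the Unicode name for the composed normalization form). The second sample is there as a reminder that not every sequence has a pre-composed equivalent.

```python
import unicodedata

decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))  # prints: 2 1
print(composed == "\u00e9")            # prints: True

# No pre-composed "q with acute" exists, so this stays two code points
# and the code still has to cope with combining characters.
stubborn = unicodedata.normalize("NFC", "q\u0301")
print(len(stubborn))                   # prints: 2
```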
From working with stuff over the last 7 years including Chinese, Japanese and Korean, I can say in the stuff I've done and seen, Unicode makes it a whole lot easier. No, it doesn't solve everything, but it does make a lot of things much easier.
As always, the proper thing to do is carefully analyze your needs and then implement what makes sense for the specific application. I just recommend trying to stick with more straightforward code and only 'optimize' when needed as determined by actual performance measurements.
Re:Teaching the world English (Score:1)
I'm a Scot, and you can tell what part of Scotland someone is from by their accent in most cases (the Aberdeen accent is very different from the Edinburgh one, and the Glasgow one also). None of these sound like Scouse (Liverpool), or people from the West Country (Bristol etc.). I'm at university in the South, and here accents are noticeably different once again.
For example a Scot would in general pronounce "bury" "buhry", and someone from the South of England would say "berry".
Of course, all of these are gross stereotypes.
None of them sound like Received Pronunciation, which is probably what you think of as English, and even _that_ is nothing like the Hollywood version of the "English" accent.
Interestingly, most people in the UK seem to appreciate the massive variation in accent across Northern America (the US and Canada).
So stating the existence of a "british" accent is somewhat fallacious. However, British English is somewhat different from American English in many grammatical aspects (trivial examples being in spelling, such as colour/color and flavour/flavor). In general, British English seems to have absorbed more in the way of external influences; this probably has more to do with the proximity of other languages and the country's seafaring history than anything else.
On a computing level, the keymap of UK keyboards is somewhat different to American ones, with (for example) the £ (UKP) symbol on shift-3, and ~ and # next to return on the right-hand side. As you can see, therefore, internationalisation is necessary even between dialects of the same language!
More Unicode Info -- please! (Score:1)
Unicode does not seem to address the input of characters. Does anyone know of any good input methods for Unicode?
Canadian, eh? (Score:1)
Huh?! I didn't know we had our own language! I thought us Canadians spoke English / French / German / Polish / etc.
Old Canadian Joke:
Q. How do you spell Canada?
A. C, eh? N, Eh? D, eh?
Cheers
Re:MS i18n vs. unix i18n (Score:1)
i am not sure, but now i have the idea that this works with anything written with microsoft foundation classes or something like that. (i could look it up, but at this point i'm too lazy...) it's not a bad deal for developers since you don't have to build your apps to be i18n aware...
anyway, this is a huge thing for linux right now... when i went to japan in '97, windows was already deployed there in people's homes. that's the entire system, every program written with support for japanese. and now, W2k is supposed to be written "languageless" which certainly sounds like a good thing. anyway, i'm talking too much about windows here, but that's prolly because windows is definitely out in front on this front (to make a pun and mix a metaphor). oh well, maybe the unix guys, and especially the free unix guys, will get into the game soon. here's hoping.
sh_
i18n == international?! Please! (Score:1)
Re:the world would be better off. (Score:1)
--Brian
Use UTF-8 (Score:1)
Step 1: Use UTF-8
Step 2: Use UTF-8 for everything.
Step 3: If you think you need to use fixed-sized characters, think again and use UTF-8.
Step 4: search for and delete all remaining occurrences of "wchar" and replace with UTF-8
Step 5: delete C++ string and stream templates that think the storage unit has to be bigger than 8 bits, making them far faster and smaller and easier to debug. Delete all "if" statements that test the "size of a character". Delete all interfaces that take "strings" that are made of things bigger than bytes. Watch your code get far, far smaller and faster and suddenly work in multiple locales!
Step 6: Did I forget to mention you should use UTF-8 for everything!
Thank you, I hope my instructions have been helpful in giving you a clue.
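For reference, this is roughly what stepping through UTF-8 involves once the fixed-size character is gone. It is a sketch in Python, not anyone's production code: the function names are made up, and it trusts the lead byte without validating the continuation bytes that follow.

```python
def utf8_sequence_length(lead_byte):
    """Length in bytes of the UTF-8 sequence that starts with lead_byte."""
    if lead_byte < 0x80:
        return 1            # 0xxxxxxx: plain ASCII
    if 0xC2 <= lead_byte <= 0xDF:
        return 2            # 110xxxxx
    if 0xE0 <= lead_byte <= 0xEF:
        return 3            # 1110xxxx
    if 0xF0 <= lead_byte <= 0xF4:
        return 4            # 11110xxx
    raise ValueError("not a valid UTF-8 lead byte: 0x%02X" % lead_byte)

def split_utf8(data):
    """Walk forward through a bytes object, one encoded character at a time."""
    chars = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        chars.append(data[i:i + n])
        i += n
    return chars

def previous_char_start(data, i):
    """Walk backward from index i to the start of the preceding character."""
    i -= 1
    # Continuation bytes all look like 10xxxxxx, so skip over them.
    while i > 0 and (data[i] & 0xC0) == 0x80:
        i -= 1
    return i

sample = "a\u00e4\u4e16\U0001F600".encode("utf-8")
print([len(c) for c in split_utf8(sample)])      # prints: [1, 2, 3, 4]
print(previous_char_start(sample, len(sample)))  # prints: 6
```

Both directions work without any table of string positions, because a continuation byte can never be mistaken for a lead byte.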
No, i18n == internationalization (Score:3)
That is, "i" + 18 letters + "n".
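The same rule as a throwaway Python function (the name numeronym is just a label for this sketch):

```python
def numeronym(word):
    """First letter + count of the letters in between + last letter."""
    if len(word) < 3:
        return word  # nothing in between to count
    return "%s%d%s" % (word[0], len(word) - 2, word[-1])

print(numeronym("internationalization"))  # prints: i18n
print(numeronym("localization"))          # prints: l10n
```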
Cheers,
ZicoKnows@hotmail.com
Huh? (Score:1)
Why is this unfortunate? It's helped me a tremendous deal when communicating with Japanese friends (My Japanese is improving, but still not great -- the more I can practice it in real life situations, the better). Anyway, it seems to me like Microsoft's doing a great thing here and I'm not seeing what you have against it.
Cheers,
ZicoKnows@hotmail.com
Re:Use UTF-8 (Score:2)
And yes, you are a troll here.
The Language of Love... (Score:1)
_____________
Uh, that would be the UNIX locale system (Score:1)
Locale creates a layer of abstraction that allows you to work with character types, alphabetic sort order, and representations of numbers, dates and currency. So you don't have to know that in German, 'ä' is an alphanumeric that gets sorted after 'a' and before 'b', the decimal point is a comma, and the day of the month comes before the month, separated by a dot. You just set your locales to 'de', and the system functions do it all correctly.
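A rough sketch of what that looks like in code, using Python's standard locale module as a thin wrapper over the C library calls. The locale names in the list are guesses at what a given system might have installed; if none of them is available the program keeps running under the default locale, so the actual output depends on the machine.

```python
import locale
import time

def try_set_locale(candidates):
    """Activate the first locale from candidates that the system provides."""
    for name in candidates:
        try:
            locale.setlocale(locale.LC_ALL, name)
            return name
        except locale.Error:
            continue
    return None  # none installed: stay on the current locale

# Assumed spellings of the German locale name; they vary between systems.
active = try_set_locale(["de_DE.UTF-8", "de_DE.utf8", "de_DE"])

words = ["b", "\u00e4", "a"]                         # "ä" written as an escape
sorted_words = sorted(words, key=locale.strxfrm)      # locale-aware sort order
number = locale.format_string("%.2f", 1234.5, grouping=True)
date = time.strftime("%x", time.gmtime(0))            # locale's date format

print(active, sorted_words, number, date)
```

On a system with a working German locale, the sort should put 'ä' between 'a' and 'b', and the number and date should come out with the comma and dot conventions described above, all without any German-specific code in the program.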
But those are only some of the problems surrounding I18N. Locale doesn't and can't solve them all. Setting up a non-US keyboard after installation, for example, is always a problem, and it's not covered very well by existing documentation.
Re:Huh? (Score:1)
I just wish it was the Unix community who had this problem solved first.
Re:i18n == international?! Please! (Score:1)
The Mozilla site has a pretty good introduction [mozilla.org] to the goals and problems of I18N and L10N.
Re:i18n == international?! Please! (Score:1)
Re:Myth on i18n and "One World Language" (Score:2)
2. You forgot to include large parts of Africa and South Asia in the part of the world where educated people speak English.
Re:the world would be better off. (Score:1)
Actually, the French Govt has passed laws about usage of foreign language (esp. English) in the national media, everything from what the radio plays to actual words used to describe things eg "computer" is "ordinateur" and must always be referred to as such...
If we tried that in England, we'd immediately be denounced as the worst sort of racist scum, the rabid "little Englanders". But we've never had that in our history, anyway - look at how many English words are of continental descent... eg Rendez-vous etc... That is one strength of the language - I think English evolves at a faster rate than most other languages...
Industry Consortium? (Score:1)
From the web site, li18nux looks like the kind of industry consortium we know from the commercial world, where a number of big companies get together to create some new technology.
Nothing wrong with that, I just wonder how it will work in the Linux world.
Re:the world would be better off. (Score:1)
"Stark n'est pas parlé ici. Veuillez utiliser Starque à la place!"
Re:Teaching the world English (Score:1)
And as for the totally arrogant attitude of "English is a superior language", that is just such complete crap that I'm not even going to start to refute it.
You propose that everybody speak English - well, if somebody came to your home tomorrow and told you that from now on, your children would be educated in Japanese and everyone in your house would be forced to speak it, what would you say to them? Think about it for a while.
Re:Teaching the world English (Score:1)
China's highly educated population (Hong Kong, etc) also has a large percentage of English speakers.
The idea is not to force everyone to speak English, only to know it so that once you meet someone who speaks English or you need to use a computer, you can do so and receive a first-rate experience.
The phase out would be gradual; require all students to take a year or two of English, only publish information on the bulk of the Internet in English, make most software English-only, submit scientific articles in English, submit technological breakthroughs in English.
Really, most of this stuff already happens by itself. I really don't see how people in industrial nations can live without knowing English and not be a step behind the rest of the world.
There are several languages that are superior to English, but none are as easy to switch the world over to.
--
Re:Teaching the world English (Score:1)
Re:Teaching the world English (Score:1)
Lacking in content? (Score:2)