Next: Examples Up: GNU Mailman, Internationalized Previous: Unicode

Other Issues

There are some operational issues that need to be addressed for an internationalized application such as Mailman. Care must be taken when marking the source code for translation so that the text is split in a grammatically clear way. For example, full sentences should be used whenever possible, since sentence fragments cannot be translated sensibly in every language. Plural forms and grammatical gender pose particularly thorny problems. Python 2.3's gettext module supports plural forms, but only alpha releases of Python 2.3 have been made available as of this writing. English nouns are not gendered, so English source strings sometimes need to be rewritten to accommodate translators whose languages do have gendered nouns.
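As a rough illustration of the advice above, the following sketch marks one complete sentence per message and selects the plural form through the gettext module. It is written for a current Python 3 interpreter rather than the Python 2.3 alphas mentioned in the text, and the function name pending_notice and the message wording are hypothetical, not taken from Mailman's source.

```python
import gettext

# NullTranslations is the standard-library fallback used when no catalog is
# installed; it returns the source strings unchanged, so this sketch runs
# without any compiled .mo file.
translations = gettext.NullTranslations()
ngettext = translations.ngettext


def pending_notice(listname, count):
    """Return a notice built from one whole, translatable sentence.

    Hypothetical helper for illustration only.
    """
    # Each msgid is a complete sentence with named placeholders, so a
    # translator can reorder the words and pick the right plural form.
    template = ngettext(
        "There is %(count)d pending request for the list %(listname)s.",
        "There are %(count)d pending requests for the list %(listname)s.",
        count,
    )
    return template % {"count": count, "listname": listname}


print(pending_notice("mailman-users", 1))
print(pending_notice("mailman-users", 3))
```

Building the same notice by concatenating fragments such as "There are ", a number, and " pending requests" would force every language into English word order, which is the problem the paragraph above warns about.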

Python supports a number of specific character encoding "codecs" in the standard distribution. While Python has built-in support for most Western codecs, Asian codecs in particular are not included. Fortunately, Japanese, Korean, and Chinese codecs are available as third-party distributions.
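One way an application might cope with codecs that may or may not be installed is to probe for them at run time. The sketch below uses the standard codecs.lookup() call; the helper name codec_available and the list of encodings checked are assumptions chosen for illustration. Later Python releases bundle CJK codecs, so the result depends on the interpreter in use.

```python
import codecs


def codec_available(name):
    """Return True if this interpreter can find a codec called `name`.

    Hypothetical helper for illustration only.
    """
    try:
        codecs.lookup(name)
    except LookupError:
        # Raised when no registered codec search function knows the name.
        return False
    return True


# Report which of a few Western and Asian encodings can be used here.
for encoding in ("iso-8859-1", "euc-jp", "euc-kr", "big5"):
    status = "available" if codec_available(encoding) else "missing"
    print(encoding, status)
```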

Internationalization is a lot more than simply translating strings; many other values, from currencies to dates, must also be localized if they are to be displayed correctly for a particular language or country. Long-term goals include wrapping IBM's ICU library [ICU] in Python.
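To make the point about dates concrete, here is a minimal sketch that formats one date according to a named locale. It uses Python's standard locale and time modules, not ICU, and is not how Mailman handles dates; the function name format_date_for is hypothetical. Which locale names are accepted depends on the operating system, so only the always-present "C" locale is used here.

```python
import locale
import time


def format_date_for(locale_name, when):
    """Format a struct_time using the date convention of `locale_name`.

    Hypothetical helper for illustration only. Raises locale.Error if the
    operating system does not provide the requested locale.
    """
    saved = locale.setlocale(locale.LC_TIME)  # query the current setting
    try:
        locale.setlocale(locale.LC_TIME, locale_name)
        # %x asks for the locale's preferred date representation.
        return time.strftime("%x", when)
    finally:
        # Always restore the previous locale, since the setting is global.
        locale.setlocale(locale.LC_TIME, saved)


when = time.strptime("2003-04-08", "%Y-%m-%d")
print(format_date_for("C", when))
```

Because setlocale() changes process-wide state, a long-running server would need more care than this sketch shows, which is one reason a dedicated library is attractive.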

While internationalization imposes some performance overhead, the effect is negligible. In an application such as Mailman, the performance of the mail server that Mailman feeds messages to, the network bandwidth, and the performance of the operating system and file system have a far greater influence on the performance of the system than does the Mailman software. Internationalization has imposed no perceived performance penalty.

Internationalization has increased the size of the software distribution, since by default the download contains the message catalogs for all supported languages. The current catalog contains over 1200 message IDs and is approximately 228 KB in size. The translated and compiled catalog files range from 80 to 300 KB in size, depending on the completeness of the translation. In all, the message catalogs themselves add approximately 16 MB to the uncompressed program source code. The templates add approximately another 3 MB. For this reason, future releases of Mailman may provide an English-only distribution, with separately downloadable language packs.


Barry Warsaw 2003-04-08