Which character coding to use for text files?

Post #1, 04-19-2008, Anton81

Using KDE and various editors I sometimes get a mess with character encoding
(with German umlauts). What is the best way to keep the files consistent?
Which encoding is best to avoid problems?
Is there any central configuration (KDE,X11) apart from setting preferences
in the editor?

Post #2, 04-19-2008, Enrique Perez-Terron

On Sat, 19 Apr 2008 12:55:52 +0200, Anton81 wrote:

> Using KDE and various editors I sometimes get a mess with character
> encoding (with German umlauts). What is the best way to keep the files
> consistent? Which encoding is best to avoid problems?


UTF-8 is the best, I think. It can encode every character in Unicode, so
umlauts and practically every other script are covered.

Notice, however, that no matter what encoding you pick, if you get a file
from somewhere that is encoded using a different scheme, you will still
have some mess.
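
To illustrate the kind of mess meant here, a minimal sketch, assuming the
iconv tool is installed (it normally comes with glibc). It takes the UTF-8
bytes of "ü" and misreads them as ISO-8859-1, which is effectively what an
editor set to the wrong encoding does. The variable names are made up for
the example.

```shell
#!/bin/sh
# The two bytes 0xC3 0xBC are the UTF-8 encoding of "ü".
utf8_bytes=$(printf '\303\274')

# Misread those bytes as ISO-8859-1 and convert them to UTF-8 again:
# each byte is treated as its own character, so one "ü" becomes two.
garbled=$(printf '%s' "$utf8_bytes" | iconv -f ISO-8859-1 -t UTF-8)

# Count the bytes: 2 for the original, 4 for the garbled version.
printf '%s' "$utf8_bytes" | wc -c
printf '%s' "$garbled" | wc -c
```

On a UTF-8 terminal the garbled text shows up as "Ã¼". The same tool also
does the repair in the other direction: "iconv -f ISO-8859-1 -t UTF-8
old.txt > new.txt" converts a Latin-1 file to UTF-8 (old.txt and new.txt
are placeholder names).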

> Is there any
> central configuration (KDE,X11) apart from setting preferences in the
> editor?


Yes, sort of. However, each distro has its own ways.

In all cases, it works through having your login shell execute the command

export LANG=de_DE.UTF-8

The "de" part is your language (Deutsch). The "DE" part is your territory
(Deutschland). After the dot comes your default encoding.
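
The three parts can be pulled apart with plain shell parameter expansion.
This is only a sketch to show the structure of the name; the variable
names are my own, and it assumes the value has the full
language_TERRITORY.ENCODING form.

```shell
#!/bin/sh
# Split a locale name of the form language_TERRITORY.ENCODING into its parts.
locale_name="de_DE.UTF-8"

language=${locale_name%%_*}   # everything before the first "_"
rest=${locale_name#*_}        # everything after the first "_"
territory=${rest%%.*}         # everything before the first "."
encoding=${locale_name#*.}    # everything after the first "."

echo "language=$language territory=$territory encoding=$encoding"
```

To see what is currently in effect, run "locale"; "locale -a" lists the
locales installed on the system.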

After that, all programs started from your login shell inherit this
setting. (Notice that the desktop, KDE/Gnome/whatever, is usually also
started from a login shell.)

What makes this happen is that the login shell (sh, bash and similar),
when it starts, looks for a file named

/etc/profile

If it finds the file, it runs the commands contained in the file.

You can edit the file and put the above command there. However, next time
you upgrade the software, this file will be overwritten with whatever
comes with your distro.

If you understand the contents of the file you will probably find it
directs the shell to look for further files in /etc/profile.d. This seems
to be universal with all distros now.

If you add a file named "zz-local.sh" in this directory, the commands in
this file will be run after all else has been run. The value you set for
LANG in this file will stick.

Oh, yes, if you add a file in /etc/profile.d, make sure to make it
executable: "chmod a+x /etc/profile.d/zz-local.sh".
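
A minimal sketch of what such a file could contain, assuming you want
German with UTF-8 and that the de_DE.UTF-8 locale is installed on your
system:

```shell
# /etc/profile.d/zz-local.sh
# Read by the login shell via /etc/profile, after the other files
# in /etc/profile.d, so the value set here is the one that sticks.
LANG=de_DE.UTF-8
export LANG
```

Log out and in again, then check the result with "echo $LANG".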

However, it might be better to play nice with your distro. For instance,
Fedora, my distro, places a small file in /etc/sysconfig/i18n, with the
following contents:

LANG="en_US.UTF-8"
SYSFONT="latarcyrheb-sun16"

Then my distro also puts a file "lang.sh" in said directory
/etc/profile.d, and this file contains the code to pick up the contents of
/etc/sysconfig/i18n. On my distro, the user is supposed to edit the latter
file. I guess there is also some entry in the desktop menu, which updates
this file. Yes indeed, I just checked it, there is a language setting in
the administration/language menu. However, this tool does not let me set
the encoding, as this is supposed to be UTF-8 for everybody (according to
my distro's ideas).

So, look around for your distro's thingie. If you can't find out, add the
file zz-local.sh in /etc/profile.d. (The zz in the file name is so it
comes last in an alphabetical listing. That makes it run last also.)

This will set the language and encoding for all users, including root.
If you want to set it just for one user, edit the file ".bashrc" in that
user's home directory.

(Notice the dot at the start of the file name. Most of the time, files
with such names are not shown, they are like "hidden" files. However, you
can type the file name in your editor's "open" dialog.)

Again, if possible, play nice with your distro, and you will avoid surprises.
Most distros have a reasonably complete user guide and installation
guide, where you can find this. Tip: search for the words lang, language,
locale, encoding, internationalization, and i18n.

In my case, Fedora, I can also place a file ".i18n" in my home directory
to override the file /etc/sysconfig/i18n.

(If you wonder about the name i18n, it is "internationalization"
abbreviated. 18 is the number of letters between the first "i" and the
last "n".)
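
The rule is easy to check in the shell. A small sketch; the function name
"numeronym" is made up here, and it only makes sense for words of three or
more letters:

```shell
#!/bin/sh
# Build an abbreviation of the i18n kind:
# first letter, number of letters in between, last letter.
numeronym() {
    word=$1
    len=${#word}
    first=$(printf '%s' "$word" | cut -c1)
    last=$(printf '%s' "$word" | cut -c"$len")
    echo "${first}$((len - 2))${last}"
}

numeronym internationalization   # prints i18n
numeronym localization           # prints l10n
```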

Regards

Post #3, 04-20-2008, Darren Salt

I demand that Enrique Perez-Terron may or may not have written...

[snip]
> (If you wonder about the name i18n, it is "internationalization"
> abbreviated. [...])


No it's not. It's the usual abbreviation for "internationalisation". :-)

--
| Darren Salt | linux or ds at | nr. Ashington, | Toon
| RISC OS, Linux | youmustbejoking,demon,co,uk | Northumberland | Army
| + Buy less and make it last longer. INDUSTRY CAUSES GLOBAL WARMING.

All laws are basically false.

Post #4, 04-20-2008, phil-news-nospam@ipal.net

On Sat, 19 Apr 2008 23:41:21 +0100 Darren Salt <news@youmustbejoking.demon.cu.invalid> wrote:
| I demand that Enrique Perez-Terron may or may not have written...
|
| [snip]
|> (If you wonder about the name i18n, it is "internationalization"
|> abbreviated. [...])
|
| No it's not. It's the usual abbreviation for "internationalisation". :-)

That depends on whether you are referring to en_UK or en_US.

--
|WARNING: Due to extreme spam, I no longer see any articles originating from |
| Google Groups. If you want your postings to be seen by more readers |
| you will need to find a different place to post on Usenet. |
| Phil Howard KA9WGN (email for humans: first name in lower case at ipal.net) |

Post #5, 04-21-2008, Darren Salt

I demand that phil-news-nospam@ipal.net may or may not have written...

> On Sat, 19 Apr 2008 23:41:21 +0100 Darren Salt
> <news@youmustbejoking.demon.cu.invalid> wrote:
>> I demand that Enrique Perez-Terron may or may not have written...

>> [snip]
>>> (If you wonder about the name i18n, it is "internationalization"
>>> abbreviated. [...])

>> No it's not. It's the usual abbreviation for "internationalisation". :-)

> That depends on whether you are referring to en_UK or en_US.


Ukrainian English? Are you sure? ;-)

--
| Darren Salt | linux or ds at | nr. Ashington, | Toon
| RISC OS, Linux | youmustbejoking,demon,co,uk | Northumberland | Army
| + Interception of this message for advertising purposes is not permitted.

It's worse than that, it's physics, Jim!

Post #6, 04-21-2008, phil-news-nospam@ipal.net

On Mon, 21 Apr 2008 01:01:36 +0100 Darren Salt <news@youmustbejoking.demon.cu.invalid> wrote:
| I demand that phil-news-nospam@ipal.net may or may not have written...
|
|> On Sat, 19 Apr 2008 23:41:21 +0100 Darren Salt
|> <news@youmustbejoking.demon.cu.invalid> wrote:
|>>> I demand that Enrique Perez-Terron may or may not have written...
|>> [snip]
|>>> (If you wonder about the name i18n, it is "internationalization"
|>>> abbreviated. [...])
|>> No it's not. It's the usual abbreviation for "internationalisation". :-)
|
|> That depends on whether you are referring to en_UK or en_US.
|
| Ukrainian English? Are you sure? ;-)

There is an en_UA? Wow, I never knew that ;-)