
On Thursday 30 April 2009 10:00:41 Detlev Zundel wrote:
I wasn't even thinking about language comments but simply names especially in git-commit messages.
i have no problem with localized names. i do have a problem with localized comments and source code.
Ok, so we're more or less in accord - remembering that names also show up in git-logs and thus may force an encoding on the patches.
the encoding is going to be forced regardless of location of the unicode chars. and in the case of mailman, it's going to get base64 encoded.
Obviously I'm lucky that I don't have an umlaut in my name, but if I had, what would I do?
currently, like every other person with an umlaut, you use an "e". ö -> oe.
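A minimal sketch of the "ö -> oe" workaround described above. The helper name `fold_umlauts`, the mapping table, and the sample name are made up for illustration; nothing here comes from git or from the project's tooling.

```python
# Hypothetical helper: fold German umlauts into ASCII digraphs (e.g. ö -> oe)
# so that a name in a commit message stays pure 7-bit ASCII.
UMLAUT_MAP = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}

def fold_umlauts(name: str) -> str:
    """Replace each mapped character; leave everything else untouched."""
    return "".join(UMLAUT_MAP.get(ch, ch) for ch in name)

# Example name is invented for the demonstration.
folded = fold_umlauts("Jörg Müller")
print(folded)
```

The result can be encoded as plain ASCII, so it forces no particular encoding on the patch.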
This was a solution of the 1900s, but I was under the impression that modern systems can cope with non-ascii characters, no?
if it means the patch gets base64 encoded, then Wolfgang isn't going to accept it. so that method is still relevant.
if mailman were fixed, then i imagine Wolfgang wouldn't have a problem anymore (assuming he even noticed) ...
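To illustrate why a single non-ASCII character can make a patch unreadable in transit, here is a sketch using Python's standard `email` package. It is not mailman's actual code path, only an example of a MIME library choosing base64 for a UTF-8 body; the sample sign-off lines and addresses are invented.

```python
from email.mime.text import MIMEText

# Two versions of the same (invented) sign-off line.
ascii_patch = "Signed-off-by: Joerg Mueller <joerg@example.com>\n"
utf8_patch = "Signed-off-by: Jörg Müller <joerg@example.com>\n"

ascii_msg = MIMEText(ascii_patch, _charset="us-ascii")
utf8_msg = MIMEText(utf8_patch, _charset="utf-8")

# The ASCII body is sent as-is; the UTF-8 body is base64 encoded by default.
print(ascii_msg["Content-Transfer-Encoding"])  # 7bit
print(utf8_msg["Content-Transfer-Encoding"])   # base64

# This is what the UTF-8 body looks like on the wire: no longer a
# human-readable patch until it has been decoded again.
print(utf8_msg.get_payload())
```

Decoding the payload restores the original text, but anyone reading the raw mail sees only the base64 blob.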
Resort to latin-1?
I was unclear - what I meant was to use an "iso 8859-1" encoding without explicitly stating it, i.e. that the > 0x7f byte is from 8859-1?
presumably you mean ISO 8859-1, and in that case, that character set is the same as the first 256 code points of unicode (it was designed that way).
Well it may be at the same unicode position, but the encoding in byte strings is still different of course. The 8859-1 umlaut "ä" (= 0xe4 in 8859-1) is encoded as 0xc3 0xa4 in utf-8 encoding which seems slightly different.
i meant it's forward compatible. any character encoded in ISO 8859-1 will work properly with unicode by design. i didn't mean that unicode encodings will be single byte and thus the same as ISO 8859-1.
this charset btw covers most Western European languages. -mike
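Both points made in this exchange can be checked in a few lines: ISO 8859-1 byte values coincide with Unicode code points, yet the UTF-8 byte sequence for the same character differs. This is a small sketch using only Python's built-in codecs; the variable names are arbitrary.

```python
ch = "ä"

code_point = ord(ch)               # Unicode code point of the character
latin1_bytes = ch.encode("latin-1")  # ISO 8859-1: one byte
utf8_bytes = ch.encode("utf-8")      # UTF-8: two bytes for this character

print(hex(code_point))     # 0xe4
print(latin1_bytes.hex())  # e4
print(utf8_bytes.hex())    # c3a4

# Every ISO 8859-1 byte decodes to the code point with the same number.
same_positions = all(bytes([b]).decode("latin-1") == chr(b) for b in range(256))
print(same_positions)

# A bare 0xe4 byte is not valid UTF-8, so an undeclared 8859-1 byte in a
# patch cannot simply be read as UTF-8.
try:
    b"\xe4".decode("utf-8")
    bare_byte_is_valid_utf8 = True
except UnicodeDecodeError:
    bare_byte_is_valid_utf8 = False
print(bare_byte_is_valid_utf8)
```

So the character sets agree on positions, while the byte-level encodings agree only for the ASCII range (0x00 to 0x7f).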