
On 31/03/2019 19:28, Alexander Graf wrote:
Hi Simon, Alex,
On 31.03.19 04:18, Simon Glass wrote:
Hi Andre,
On Fri, 22 Mar 2019 at 19:32, Andre Przywara <andre.przywara@arm.com> wrote:
The character set used by U-Boot's built-in fonts is the old "code page 437" (from the original IBM PC). However, people would probably expect UTF-8 on a terminal these days, and the UEFI code definitely does.
Provide a conversion routine to convert a UTF-8 byte stream into a CP437 character code. This uses a combination of arrays and switch/case statements to provide an efficient way of translating the large Unicode character range to the 8 bits used for CP437.
This fixes UEFI display on the DM_VIDEO console, which was garbled for any non-ASCII characters, for instance for the block graphic characters used by Grub to display the menu.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
 drivers/video/Makefile            |   1 +
 drivers/video/utf8_cp437.c        | 170 ++++++++++++++++++++++++++++++++++++++
 drivers/video/vidconsole-uclass.c |   8 +-
 include/video_console.h           |   9 ++
 4 files changed, 186 insertions(+), 2 deletions(-)
 create mode 100644 drivers/video/utf8_cp437.c
OMG unicode comes to U-Boot. This might be the beginning of the end.
Well, while I can understand reservations against the (complexity of) Unicode, it isn't too bad after all:
- We don't blow everything up to 16 bits, instead we just use UTF-8 here, which is a quite clever way of keeping things mostly 8 bits.
- There is some Unicode in U-Boot already, namely in the UEFI code. Here the UCS-2 encoding (fixed 16 bits) used in UEFI gets converted into UTF-8. This works nicely these days because on serial lines there is probably a UTF-8 capable terminal emulator on the other end.
- The Truetype font console already uses a Unicode-to-glyph-ID mapping. It's just not in effect because it expects an int, but we only have a signed char as input, so we effectively limit everything to 7-bit ASCII.
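To illustrate the UCS-2 to UTF-8 point, here is a minimal sketch of how a single 16-bit code unit expands to one, two or three bytes. The function name `ucs2_to_utf8` and its signature are made up for this example and are not the helpers the UEFI code in U-Boot actually uses. It assumes plain UCS-2 input, so surrogate pairs are not handled.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not U-Boot's real API): encode one UCS-2 code
 * unit as UTF-8. Writes up to three bytes into out[] and returns the
 * number of bytes written. Surrogates are not treated specially.
 */
static size_t ucs2_to_utf8(uint16_t ch, uint8_t out[3])
{
	if (ch < 0x80) {
		/* 7-bit ASCII stays a single byte */
		out[0] = (uint8_t)ch;
		return 1;
	}
	if (ch < 0x800) {
		/* 11 bits: 110xxxxx 10xxxxxx */
		out[0] = (uint8_t)(0xC0 | (ch >> 6));
		out[1] = (uint8_t)(0x80 | (ch & 0x3F));
		return 2;
	}
	/* 16 bits: 1110xxxx 10xxxxxx 10xxxxxx */
	out[0] = (uint8_t)(0xE0 | (ch >> 12));
	out[1] = (uint8_t)(0x80 | ((ch >> 6) & 0x3F));
	out[2] = (uint8_t)(0x80 | (ch & 0x3F));
	return 3;
}
```

ASCII text therefore costs nothing extra, while something like the full block character U+2588 becomes the three bytes E2 96 88.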
Can we make this a Kconfig option to avoid increasing code size? We can imply it when EFI is enabled.
Sure, sounds easy enough. Just be aware that the current situation is somewhat broken, since Truetype is somewhat ISO8859-1/ASCII, bitmap fonts use CP437, but serial terminals (emulators) probably use UTF-8 these days. So the option would be to switch between (7-bit) ASCII and Unicode? Or between "current mess" and Unicode?
Actually I found a better way to fix both bitmap and Truetype fonts in a joint effort, so I will send a different patch later on, considering a Kconfig option.
This looks vaguely familiar. Take a look at include/cp437.h. We even have a Kconfig option for it already :).
But this a) is for converting FAT file name entries, and b) goes from CP437 to Unicode, whereas we need it the other way round (incoming UCS-2/UTF-8 to CP437 glyphs). I briefly considered a reverse lookup scheme, but found this switch-case/array-look-up combination more elegant, given that we need to do this on every character.
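For readers without the patch at hand, the switch-case/array idea can be sketched roughly as below. This is not the code from drivers/video/utf8_cp437.c: the names `utf8_feed`, `unicode_to_cp437` and `struct utf8_state` are invented for this example, only a handful of characters are mapped, and overlong encodings are not rejected. The point is only to show a dense Unicode run handled by an array and the scattered code points handled by a switch, with a small state machine collecting the UTF-8 bytes.

```c
#include <stdint.h>

#define CP437_UNMAPPED '?'	/* fallback glyph, an assumption of this sketch */

/* Dense run U+2588..U+2593 (block elements); 0 means no CP437 glyph. */
static const uint8_t block_elems[] = {
	0xDB, 0, 0, 0, 0xDD, 0, 0, 0, 0xDE, 0xB0, 0xB1, 0xB2,
};

/* Map one Unicode code point to CP437 (small illustrative subset). */
static uint8_t unicode_to_cp437(uint32_t cp)
{
	if (cp < 0x80)
		return (uint8_t)cp;
	if (cp >= 0x2588 && cp <= 0x2593) {
		uint8_t c = block_elems[cp - 0x2588];

		return c ? c : CP437_UNMAPPED;
	}
	switch (cp) {	/* scattered code points */
	case 0x00E9: return 0x82;	/* e with acute */
	case 0x00FC: return 0x81;	/* u with diaeresis */
	case 0x2500: return 0xC4;	/* light horizontal line */
	case 0x2502: return 0xB3;	/* light vertical line */
	case 0x250C: return 0xDA;	/* top left corner */
	case 0x2510: return 0xBF;	/* top right corner */
	case 0x2514: return 0xC0;	/* bottom left corner */
	case 0x2518: return 0xD9;	/* bottom right corner */
	default:     return CP437_UNMAPPED;
	}
}

struct utf8_state {
	uint32_t cp;	/* code point accumulated so far */
	int remaining;	/* continuation bytes still expected */
};

/*
 * Feed one byte of a UTF-8 stream. Returns the CP437 code once a
 * character is complete, or 0 while inside a multi-byte sequence
 * (so a literal NUL byte is indistinguishable here). A byte that
 * breaks a sequence is dropped and reported as CP437_UNMAPPED.
 */
static uint8_t utf8_feed(struct utf8_state *st, uint8_t byte)
{
	if (st->remaining > 0) {
		if ((byte & 0xC0) != 0x80) {
			st->remaining = 0;
			return CP437_UNMAPPED;
		}
		st->cp = (st->cp << 6) | (byte & 0x3F);
		if (--st->remaining > 0)
			return 0;
		return unicode_to_cp437(st->cp);
	}
	if (byte < 0x80)
		return byte;
	if ((byte & 0xE0) == 0xC0) {
		st->cp = byte & 0x1F;
		st->remaining = 1;
	} else if ((byte & 0xF0) == 0xE0) {
		st->cp = byte & 0x0F;
		st->remaining = 2;
	} else if ((byte & 0xF8) == 0xF0) {
		st->cp = byte & 0x07;
		st->remaining = 3;
	} else {
		return CP437_UNMAPPED;	/* stray continuation or invalid lead */
	}
	return 0;
}
```

A console putc path could keep one `struct utf8_state` (zero-initialised) per device, call `utf8_feed()` for every incoming byte and only draw a glyph when the return value is non-zero. Both the array index and the switch are cheap per character, which is what matters here.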
Cheers, Andre.