[PATCH v3 0/7] video: Add UTF-8 support for UEFI applications

Two years ago Andre submitted DM_VIDEO improvements for UEFI applications that use UEFI's EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL, but did not follow up with the suggested changes. This series takes care of the UTF-8 support, which is required to correctly draw the symbol and box drawing characters used by UEFI applications such as grub2 and sd-boot.
Compared to Andre's version this version has the following changes:
- use and extend existing conversion functions from lib/charset.c
- convert to Unicode code points for truetype console support
- conversion is conditional on CONFIG_CHARSET
- use escape sequences in tests as proposed by Heinrich
Link: https://lore.kernel.org/u-boot/20220110005638.21599-1-andre.przywara@arm.com...

Signed-off-by: Janne Grunau j@jannau.net
---
Changes in v3:
- added Reviewed-by tag
- removed unnecessary u8 casts
- limited utf-8 conversion buffer to 5 bytes as documented
- added missing console_utf_to_cp437() documentation
- adapted EFI text output self-tests according to review comments
- dropped wait after EFI text output self tests
- added StrToFat EFI self test to ensure Unicode code points which map to code page 437 code points 1-31 are converted to '_'
- Link to v2: https://lore.kernel.org/r/20240210-vidconsole-utf8-uefi-v2-0-88c03db60de2@ja...
Changes in v2:
- use "CONFIG_IS_ENABLED(CHARSET)" instead of EFI_LOADER
- rewritten commit message for mapping CP437 cp 1-31
- extended utf8_to_utf32_stream() documentation as suggested by Heinrich
- Link to RFC: https://lore.kernel.org/r/20240117-vidconsole-utf8-uefi-v1-0-539f7ce74fb9@ja...
---
Andre Przywara (2):
      efi_selftest: Add international characters test
      efi_selftest: Add box drawing character selftest

Janne Grunau (5):
      lib: charset: Fix and extend utf8_to_utf32_stream() documentation
      video: console: Parse UTF-8 character sequences
      lib/charset: Map Unicode code points to CP437 code points 1-31
      efi_selftest: Add geometric shapes character selftest
      efi_selftest: Update StrToFat() unit test after CP437 map extension
 drivers/video/console_normal.c                    |  6 ++-
 drivers/video/console_rotate.c                    | 16 ++++---
 drivers/video/console_truetype.c                  |  8 ++--
 drivers/video/vidconsole-uclass.c                 | 18 +++++---
 drivers/video/vidconsole_internal.h               | 19 ++++++++
 include/charset.h                                 | 16 +++++--
 include/cp1250.h                                  | 12 ++++-
 include/cp437.h                                   | 12 ++++-
 include/video_console.h                            | 10 +++--
 lib/charset.c                                     |  9 ++--
 lib/efi_loader/efi_unicode_collation.c            |  2 +-
 lib/efi_selftest/efi_selftest_textoutput.c        | 54 +++++++++++++++++++++++
 lib/efi_selftest/efi_selftest_unicode_collation.c | 12 +++++
 13 files changed, 162 insertions(+), 32 deletions(-)
---
base-commit: 866ca972d6c3cabeaf6dbac431e8e08bb30b3c8e
change-id: 20240117-vidconsole-utf8-uefi-fa23b4ac65d6
Best regards,

From: Janne Grunau j@jannau.net
Suggested-by: Heinrich Schuchardt xypron.glpk@gmx.de
Reviewed-by: Heinrich Schuchardt xypron.glpk@gmx.de
Signed-off-by: Janne Grunau j@jannau.net
---
 include/charset.h | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/charset.h b/include/charset.h
index 44034c71d3..f1050c903d 100644
--- a/include/charset.h
+++ b/include/charset.h
@@ -324,11 +324,21 @@ int utf_to_cp(s32 *c, const u16 *codepage);
 int utf8_to_cp437_stream(u8 c, char *buffer);

 /**
- * utf8_to_utf32_stream() - convert UTF-8 stream to UTF-32
+ * utf8_to_utf32_stream() - convert UTF-8 byte stream to Unicode code points
+ *
+ * The function is called for each byte @c in a UTF-8 stream. The byte is
+ * appended to the temporary storage @buffer until the UTF-8 stream in
+ * @buffer describes a Unicode code point.
+ *
+ * When a new code point has been decoded it is returned and buffer[0] is
+ * set to '\0', otherwise the return value is 0.
+ *
+ * The buffer must be at least 5 characters long. Before the first function
+ * invocation buffer[0] must be set to '\0'."
  *
  * @c: next UTF-8 character to convert
  * @buffer: buffer, at least 5 characters
- * Return: next codepage 437 character or 0
+ * Return: Unicode code point or 0
  */
 int utf8_to_utf32_stream(u8 c, char *buffer);
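To make the byte-at-a-time contract documented above concrete, here is a minimal sketch of a streaming UTF-8 decoder in plain C. It is not the U-Boot implementation: `sketch_utf8_stream()` is a hypothetical name, malformed input is simply dropped (an assumption of this sketch), and overlong encodings and surrogates are not rejected. It does show why the buffer needs 5 bytes: up to 4 sequence bytes plus the terminating '\0'.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length of the sequence started by @lead, or 0 if @lead cannot start one. */
static int utf8_seq_len(uint8_t lead)
{
	if (lead < 0x80)
		return 1;
	if ((lead & 0xe0) == 0xc0)
		return 2;
	if ((lead & 0xf0) == 0xe0)
		return 3;
	if ((lead & 0xf8) == 0xf0)
		return 4;
	return 0;
}

/*
 * Hypothetical sketch of the documented contract: feed one byte at a time.
 * Returns the code point once a sequence is complete (and resets buffer[0]
 * to '\0'), otherwise 0. @buffer must hold 5 bytes and start out as "".
 */
int32_t sketch_utf8_stream(uint8_t c, char *buffer)
{
	static const uint8_t lead_mask[5] = { 0, 0x7f, 0x1f, 0x0f, 0x07 };
	size_t len = strlen(buffer);	/* bytes accumulated so far */
	int32_t cp;
	size_t i;
	int need;

	if (c == 0) {			/* NUL cannot be stored in @buffer */
		buffer[0] = '\0';
		return 0;
	}
	if (len > 0 && (c & 0xc0) != 0x80) {
		buffer[0] = '\0';	/* truncated sequence: drop it */
		len = 0;
	}
	if (len == 0 && utf8_seq_len(c) == 0)
		return 0;		/* stray continuation or invalid lead byte */

	buffer[len++] = (char)c;
	buffer[len] = '\0';		/* len <= 4, so index 4 is the maximum */

	need = utf8_seq_len((uint8_t)buffer[0]);
	if ((int)len < need)
		return 0;		/* wait for more bytes */

	/* Keep the payload bits of the lead byte, then 6 bits per continuation. */
	cp = (uint8_t)buffer[0] & lead_mask[need];
	for (i = 1; i < len; i++)
		cp = (cp << 6) | ((uint8_t)buffer[i] & 0x3f);
	buffer[0] = '\0';
	return cp;
}
```

A caller keeps the buffer across calls (the video console patch in this series stores it as `utf8_buf` in `struct vidconsole_priv`) and only draws a glyph when the return value is non-zero.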

On 3/16/24 22:50, Janne Grunau via B4 Relay wrote:
> From: Janne Grunau j@jannau.net
>
> Suggested-by: Heinrich Schuchardt xypron.glpk@gmx.de
> Reviewed-by: Heinrich Schuchardt xypron.glpk@gmx.de
> Signed-off-by: Janne Grunau j@jannau.net
This patch was already merged. No need to resend it.
Best regards
Heinrich

From: Janne Grunau j@jannau.net
efi_console and UEFI applications (grub2, sd-boot, ...) pass UTF-8 character sequences to vidconsole, which results in wrong glyphs for code points outside of ASCII. The truetype console expects Unicode code points, while consoles based on bitmap fonts expect code page 437 code points. To support both, convert UTF-8 to UTF-32 and pass Unicode code points to vidconsole_ops.putc_xy(). The code points can be used directly in console_truetype and, after conversion to code page 437, in console_{normal,rotate}.
This fixes rendering of international, symbol and box drawing characters used by UEFI applications.
Signed-off-by: Janne Grunau j@jannau.net
---
 drivers/video/console_normal.c      |  6 ++++--
 drivers/video/console_rotate.c      | 16 ++++++++++------
 drivers/video/console_truetype.c    |  8 ++++----
 drivers/video/vidconsole-uclass.c   | 18 +++++++++++++-----
 drivers/video/vidconsole_internal.h | 19 +++++++++++++++++++
 include/video_console.h             | 10 ++++++----
 6 files changed, 56 insertions(+), 21 deletions(-)

diff --git a/drivers/video/console_normal.c b/drivers/video/console_normal.c
index a0231293f3..34ef5a5229 100644
--- a/drivers/video/console_normal.c
+++ b/drivers/video/console_normal.c
@@ -7,6 +7,7 @@
  */

 #include <common.h>
+#include <charset.h>
 #include <dm.h>
 #include <video.h>
 #include <video_console.h>
@@ -63,7 +64,7 @@ static int console_move_rows(struct udevice *dev, uint rowdst,
 	return 0;
 }

-static int console_putc_xy(struct udevice *dev, uint x_frac, uint y, char ch)
+static int console_putc_xy(struct udevice *dev, uint x_frac, uint y, int cp)
 {
 	struct vidconsole_priv *vc_priv = dev_get_uclass_priv(dev);
 	struct udevice *vid = dev->parent;
@@ -73,8 +74,9 @@ static int console_putc_xy(struct udevice *dev, uint x_frac, uint y, char ch)
 	int pbytes = VNBYTES(vid_priv->bpix);
 	int x, linenum, ret;
 	void *start, *line;
+	u8 ch = console_utf_to_cp437(cp);
 	uchar *pfont = fontdata->video_fontdata +
-			(u8)ch * fontdata->char_pixel_bytes;
+			ch * fontdata->char_pixel_bytes;

 	if (x_frac + VID_TO_POS(vc_priv->x_charsize) > vc_priv->xsize_frac)
 		return -EAGAIN;
diff --git a/drivers/video/console_rotate.c b/drivers/video/console_rotate.c
index 65358a1c6e..e4303dfb36 100644
--- a/drivers/video/console_rotate.c
+++ b/drivers/video/console_rotate.c
@@ -7,6 +7,7 @@
  */

 #include <common.h>
+#include <charset.h>
 #include <dm.h>
 #include <video.h>
 #include <video_console.h>
@@ -67,7 +68,7 @@ static int console_move_rows_1(struct udevice *dev, uint rowdst, uint rowsrc,
 	return 0;
 }

-static int console_putc_xy_1(struct udevice *dev, uint x_frac, uint y, char ch)
+static int console_putc_xy_1(struct udevice *dev, uint x_frac, uint y, int cp)
 {
 	struct vidconsole_priv *vc_priv = dev_get_uclass_priv(dev);
 	struct udevice *vid = dev->parent;
@@ -77,8 +78,9 @@ static int console_putc_xy_1(struct udevice *dev, uint x_frac, uint y, char ch)
 	int pbytes = VNBYTES(vid_priv->bpix);
 	int x, linenum, ret;
 	void *start, *line;
+	u8 ch = console_utf_to_cp437(cp);
 	uchar *pfont = fontdata->video_fontdata +
-			(u8)ch * fontdata->char_pixel_bytes;
+			ch * fontdata->char_pixel_bytes;

 	if (x_frac + VID_TO_POS(vc_priv->x_charsize) > vc_priv->xsize_frac)
 		return -EAGAIN;
@@ -145,7 +147,7 @@ static int console_move_rows_2(struct udevice *dev, uint rowdst, uint rowsrc,
 	return 0;
 }

-static int console_putc_xy_2(struct udevice *dev, uint x_frac, uint y, char ch)
+static int console_putc_xy_2(struct udevice *dev, uint x_frac, uint y, int cp)
 {
 	struct vidconsole_priv *vc_priv = dev_get_uclass_priv(dev);
 	struct udevice *vid = dev->parent;
@@ -155,8 +157,9 @@ static int console_putc_xy_2(struct udevice *dev, uint x_frac, uint y, char ch)
 	int pbytes = VNBYTES(vid_priv->bpix);
 	int linenum, x, ret;
 	void *start, *line;
+	u8 ch = console_utf_to_cp437(cp);
 	uchar *pfont = fontdata->video_fontdata +
-			(u8)ch * fontdata->char_pixel_bytes;
+			ch * fontdata->char_pixel_bytes;

 	if (x_frac + VID_TO_POS(vc_priv->x_charsize) > vc_priv->xsize_frac)
 		return -EAGAIN;
@@ -227,7 +230,7 @@ static int console_move_rows_3(struct udevice *dev, uint rowdst, uint rowsrc,
 	return 0;
 }

-static int console_putc_xy_3(struct udevice *dev, uint x_frac, uint y, char ch)
+static int console_putc_xy_3(struct udevice *dev, uint x_frac, uint y, int cp)
 {
 	struct vidconsole_priv *vc_priv = dev_get_uclass_priv(dev);
 	struct udevice *vid = dev->parent;
@@ -237,8 +240,9 @@ static int console_putc_xy_3(struct udevice *dev, uint x_frac, uint y, char ch)
 	int pbytes = VNBYTES(vid_priv->bpix);
 	int linenum, x, ret;
 	void *start, *line;
+	u8 ch = console_utf_to_cp437(cp);
 	uchar *pfont = fontdata->video_fontdata +
-			(u8)ch * fontdata->char_pixel_bytes;
+			ch * fontdata->char_pixel_bytes;

 	if (x_frac + VID_TO_POS(vc_priv->x_charsize) > vc_priv->xsize_frac)
 		return -EAGAIN;
diff --git a/drivers/video/console_truetype.c b/drivers/video/console_truetype.c
index 14fb81e956..bc3ec1c31f 100644
--- a/drivers/video/console_truetype.c
+++ b/drivers/video/console_truetype.c
@@ -262,7 +262,7 @@ static int console_truetype_move_rows(struct udevice *dev, uint rowdst,
 }

 static int console_truetype_putc_xy(struct udevice *dev, uint x, uint y,
-				    char ch)
+				    int cp)
 {
 	struct vidconsole_priv *vc_priv = dev_get_uclass_priv(dev);
 	struct udevice *vid = dev->parent;
@@ -281,7 +281,7 @@ static int console_truetype_putc_xy(struct udevice *dev, uint x, uint y,
 	int row, ret;

 	/* First get some basic metrics about this character */
-	stbtt_GetCodepointHMetrics(font, ch, &advance, &lsb);
+	stbtt_GetCodepointHMetrics(font, cp, &advance, &lsb);

 	/*
 	 * First out our current X position in fractional pixels. If we wrote
@@ -290,7 +290,7 @@ static int console_truetype_putc_xy(struct udevice *dev, uint x, uint y,
 	xpos = frac(VID_TO_PIXEL((double)x));
 	if (vc_priv->last_ch) {
 		xpos += met->scale * stbtt_GetCodepointKernAdvance(font,
-							vc_priv->last_ch, ch);
+							vc_priv->last_ch, cp);
 	}

 	/*
@@ -320,7 +320,7 @@ static int console_truetype_putc_xy(struct udevice *dev, uint x, uint y,
 	 * return NULL;
 	 */
 	data = stbtt_GetCodepointBitmapSubpixel(font, met->scale, met->scale,
-						x_shift, 0, ch, &width, &height,
+						x_shift, 0, cp, &width, &height,
 						&xoff, &yoff);
 	if (!data)
 		return width_frac;
diff --git a/drivers/video/vidconsole-uclass.c b/drivers/video/vidconsole-uclass.c
index 22d55df71f..5f89f6a521 100644
--- a/drivers/video/vidconsole-uclass.c
+++ b/drivers/video/vidconsole-uclass.c
@@ -11,6 +11,7 @@

 #include <common.h>
 #include <abuf.h>
+#include <charset.h>
 #include <command.h>
 #include <console.h>
 #include <log.h>
@@ -20,7 +21,7 @@
 #include <video_font.h>		/* Bitmap font for code page 437 */
 #include <linux/ctype.h>

-int vidconsole_putc_xy(struct udevice *dev, uint x, uint y, char ch)
+int vidconsole_putc_xy(struct udevice *dev, uint x, uint y, int ch)
 {
 	struct vidconsole_ops *ops = vidconsole_get_ops(dev);

@@ -426,8 +427,8 @@ error:
 	priv->escape = 0;
 }

-/* Put that actual character on the screen (using the CP437 code page). */
-static int vidconsole_output_glyph(struct udevice *dev, char ch)
+/* Put that actual character on the screen (using the UTF-32 code points). */
+static int vidconsole_output_glyph(struct udevice *dev, int ch)
 {
 	struct vidconsole_priv *priv = dev_get_uclass_priv(dev);
 	int ret;
@@ -455,7 +456,7 @@ static int vidconsole_output_glyph(struct udevice *dev, char ch)
 int vidconsole_put_char(struct udevice *dev, char ch)
 {
 	struct vidconsole_priv *priv = dev_get_uclass_priv(dev);
-	int ret;
+	int cp, ret;

 	if (priv->escape) {
 		vidconsole_escape_char(dev, ch);
@@ -489,7 +490,14 @@ int vidconsole_put_char(struct udevice *dev, char ch)
 		priv->last_ch = 0;
 		break;
 	default:
-		ret = vidconsole_output_glyph(dev, ch);
+		if (CONFIG_IS_ENABLED(CHARSET)) {
+			cp = utf8_to_utf32_stream(ch, priv->utf8_buf);
+			if (cp == 0)
+				return 0;
+		} else {
+			cp = ch;
+		}
+		ret = vidconsole_output_glyph(dev, cp);
 		if (ret < 0)
 			return ret;
 		break;
diff --git a/drivers/video/vidconsole_internal.h b/drivers/video/vidconsole_internal.h
index 0ec581b266..bb0277ee45 100644
--- a/drivers/video/vidconsole_internal.h
+++ b/drivers/video/vidconsole_internal.h
@@ -6,6 +6,9 @@
  * (C) Copyright 2023 Dzmitry Sankouski dsankouski@gmail.com
  */

+#include <charset.h>
+#include <config.h>
+
 #define FLIPPED_DIRECTION 1
 #define NORMAL_DIRECTION 0

@@ -142,3 +145,19 @@ int console_simple_get_font(struct udevice *dev, int seq, struct vidfont_info *i
  * See details in video_console.h select_font function
  **/
 int console_simple_select_font(struct udevice *dev, const char *name, uint size);
+
+/**
+ * Internal function to convert Unicode code points to code page 437.
+ * Used by video consoles using bitmap fonts.
+ *
+ * @param codepoint	Unicode code point
+ * @returns code page 437 character.
+ */
+static inline u8 console_utf_to_cp437(int codepoint)
+{
+	if (CONFIG_IS_ENABLED(CHARSET)) {
+		utf_to_cp(&codepoint, codepage_437);
+		return codepoint;
+	}
+	return codepoint;
+}
diff --git a/include/video_console.h b/include/video_console.h
index bde67fa9a5..8b5928dc5e 100644
--- a/include/video_console.h
+++ b/include/video_console.h
@@ -43,6 +43,7 @@ enum {
  * @col_saved:	Saved X position, in fractional units (VID_TO_POS(x))
  * @row_saved:	Saved Y position in pixels (0=top)
  * @escape_buf:	Buffer to accumulate escape sequence
+ * @utf8_buf:	Buffer to accumulate UTF-8 byte sequence
  */
 struct vidconsole_priv {
 	struct stdio_dev sdev;
@@ -66,6 +67,7 @@ struct vidconsole_priv {
 	int row_saved;
 	int col_saved;
 	char escape_buf[32];
+	char utf8_buf[5];
 };

 /**
@@ -124,12 +126,12 @@ struct vidconsole_ops {
 	 * @x_frac:	Fractional pixel X position (0=left-most pixel) which
 	 *		is the X position multipled by VID_FRAC_DIV.
 	 * @y:		Pixel Y position (0=top-most pixel)
-	 * @ch:		Character to write
+	 * @cp:		UTF-32 code point to write
 	 * @return number of fractional pixels that the cursor should move,
 	 * if all is OK, -EAGAIN if we ran out of space on this line, other -ve
 	 * on error
 	 */
-	int (*putc_xy)(struct udevice *dev, uint x_frac, uint y, char ch);
+	int (*putc_xy)(struct udevice *dev, uint x_frac, uint y, int cp);

 	/**
 	 * move_rows() - Move text rows from one place to another
@@ -403,12 +405,12 @@ void vidconsole_pop_colour(struct udevice *dev, struct vidconsole_colour *old);
  * @x_frac:	Fractional pixel X position (0=left-most pixel) which
  *		is the X position multipled by VID_FRAC_DIV.
  * @y:		Pixel Y position (0=top-most pixel)
- * @ch:		Character to write
+ * @cp:		UTF-32 code point to write
  * Return: number of fractional pixels that the cursor should move,
  * if all is OK, -EAGAIN if we ran out of space on this line, other -ve
  * on error
  */
-int vidconsole_putc_xy(struct udevice *dev, uint x, uint y, char ch);
+int vidconsole_putc_xy(struct udevice *dev, uint x, uint y, int cp);

 /**
  * vidconsole_move_rows() - Move text rows from one place to another
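For bitmap-font consoles, the code point that now reaches `putc_xy()` is first reduced to a CP437 byte and only then used to index the font, as in `ch * fontdata->char_pixel_bytes` above. The sketch below mimics that path with a hypothetical `sketch_glyph_offset()` and a deliberately tiny excerpt of the CP437 table (not the full `codepage_437` array). ASCII passes through unchanged and unmapped code points fall back to '?', which is what the international-characters selftest in this series expects from bitmap fonts. The 16 bytes per glyph used in the checks is an assumption (an 8x16 font at 1 bit per pixel).

```c
#include <stddef.h>
#include <stdint.h>

/* Excerpt only: a few Unicode code points and their CP437 byte values. */
static const struct {
	int32_t cp;
	uint8_t cp437;
} cp437_excerpt[] = {
	{ 0x00D6, 0x99 },	/* LATIN CAPITAL LETTER O WITH DIAERESIS */
	{ 0x00DF, 0xE1 },	/* LATIN SMALL LETTER SHARP S */
	{ 0x2500, 0xC4 },	/* BOX DRAWINGS LIGHT HORIZONTAL */
};

/* Hypothetical analogue of console_utf_to_cp437(). */
static uint8_t sketch_to_cp437(int32_t cp)
{
	size_t i;

	if (cp >= 0 && cp < 0x80)
		return (uint8_t)cp;	/* ASCII needs no lookup */
	for (i = 0; i < sizeof(cp437_excerpt) / sizeof(cp437_excerpt[0]); i++)
		if (cp437_excerpt[i].cp == cp)
			return cp437_excerpt[i].cp437;
	return '?';			/* no glyph in the bitmap font */
}

/*
 * Byte offset of the glyph for @cp in a 256-glyph bitmap font. The u8
 * intermediate can never be negative, unlike a plain (possibly signed) char.
 */
size_t sketch_glyph_offset(int32_t cp, size_t char_pixel_bytes)
{
	return (size_t)sketch_to_cp437(cp) * char_pixel_bytes;
}
```

Keeping the conversion inside each bitmap driver, rather than in the uclass, lets the truetype console receive the unconverted code point.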

From: Janne Grunau j@jannau.net
Code page 437 uses code points 1-31 for glyphs instead of control characters. Map the corresponding Unicode code points to these code points. This fixes rendering of grub2's menu when grub2 runs as an EFI application using the EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL on a console with bitmap fonts.
Signed-off-by: Janne Grunau j@jannau.net
---
 include/charset.h                      |  2 +-
 include/cp1250.h                       | 12 ++++++++++--
 include/cp437.h                        | 12 ++++++++++--
 lib/charset.c                          |  9 ++++++---
 lib/efi_loader/efi_unicode_collation.c |  2 +-
 5 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/include/charset.h b/include/charset.h
index f1050c903d..348bad5883 100644
--- a/include/charset.h
+++ b/include/charset.h
@@ -16,7 +16,7 @@
 /*
  * codepage_437 - Unicode to codepage 437 translation table
  */
-extern const u16 codepage_437[128];
+extern const u16 codepage_437[160];

 /**
  * console_read_unicode() - read Unicode code point from console
diff --git a/include/cp1250.h b/include/cp1250.h
index adacf8a958..b762c78d9f 100644
--- a/include/cp1250.h
+++ b/include/cp1250.h
@@ -1,10 +1,18 @@
 /* SPDX-License-Identifier: GPL-2.0+ */

 /*
- * Constant CP1250 contains the Unicode code points for characters 0x80 - 0xff
- * of the code page 1250.
+ * Constant CP1250 contains the Unicode code points for characters 0x00 - 0x1f
+ * and 0x80 - 0xff of the code page 1250.
  */
 #define CP1250 { \
+	0x0000, 0x0000, 0x0000, 0x0000, \
+	0x0000, 0x0000, 0x0000, 0x0000, \
+	0x0000, 0x0000, 0x0000, 0x0000, \
+	0x0000, 0x0000, 0x0000, 0x0000, \
+	0x0000, 0x0000, 0x0000, 0x0000, \
+	0x0000, 0x0000, 0x0000, 0x0000, \
+	0x0000, 0x0000, 0x0000, 0x0000, \
+	0x0000, 0x0000, 0x0000, 0x0000, \
 	0x20ac, 0x0000, 0x201a, 0x0000, \
 	0x201e, 0x2026, 0x2020, 0x2021, \
 	0x0000, 0x2030, 0x0160, 0x2039, \
diff --git a/include/cp437.h b/include/cp437.h
index 0b2b97132e..5093130f5e 100644
--- a/include/cp437.h
+++ b/include/cp437.h
@@ -1,10 +1,18 @@
 /* SPDX-License-Identifier: GPL-2.0+ */

 /*
- * Constant CP437 contains the Unicode code points for characters 0x80 - 0xff
- * of the code page 437.
+ * Constant CP437 contains the Unicode code points for characters 0x00 - 0x1f
+ * and 0x80 - 0xff of the code page 437.
  */
 #define CP437 { \
+	0x0000, 0x263a, 0x263b, 0x2665, \
+	0x2666, 0x2663, 0x2660, 0x2022, \
+	0x25d8, 0x25cb, 0x25d9, 0x2642, \
+	0x2640, 0x266a, 0x266b, 0x263c, \
+	0x25ba, 0x25c4, 0x2195, 0x203c, \
+	0x00b6, 0x00a7, 0x25ac, 0x21a8, \
+	0x2191, 0x2193, 0x2192, 0x2190, \
+	0x221f, 0x2194, 0x25b2, 0x25bc, \
 	0x00c7, 0x00fc, 0x00e9, 0x00e2, \
 	0x00e4, 0x00e0, 0x00e5, 0x00e7, \
 	0x00ea, 0x00eb, 0x00e8, 0x00ef, \
diff --git a/lib/charset.c b/lib/charset.c
index 5e4c4f948a..1f8480150a 100644
--- a/lib/charset.c
+++ b/lib/charset.c
@@ -16,7 +16,7 @@
 /**
  * codepage_437 - Unicode to codepage 437 translation table
  */
-const u16 codepage_437[128] = CP437;
+const u16 codepage_437[160] = CP437;

 static struct capitalization_table capitalization_table[] =
 #ifdef CONFIG_EFI_UNICODE_CAPITALIZATION
@@ -517,9 +517,12 @@ int utf_to_cp(s32 *c, const u16 *codepage)
 		int j;

 		/* Look up codepage translation */
-		for (j = 0; j < 0x80; ++j) {
+		for (j = 0; j < 0xA0; ++j) {
 			if (*c == codepage[j]) {
-				*c = j + 0x80;
+				if (j < 0x20)
+					*c = j;
+				else
+					*c = j + 0x60;
 				return 0;
 			}
 		}
diff --git a/lib/efi_loader/efi_unicode_collation.c b/lib/efi_loader/efi_unicode_collation.c
index c4c7572063..4b2c52918a 100644
--- a/lib/efi_loader/efi_unicode_collation.c
+++ b/lib/efi_loader/efi_unicode_collation.c
@@ -257,7 +257,7 @@ static void EFIAPI efi_fat_to_str(struct efi_unicode_collation_protocol *this,
 	for (i = 0; i < fat_size; ++i) {
 		c = (unsigned char)fat[i];
 		if (c > 0x80)
-			c = codepage[c - 0x80];
+			c = codepage[c - 0x60];
 		string[i] = c;
 		if (!c)
 			break;
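After this change each table has 160 entries: indices 0x00-0x1f describe CP437 bytes 0x00-0x1f and indices 0x20-0x9f describe bytes 0x80-0xff, which is why the CP1250 table was padded with 32 zero entries to keep the same layout. The sketch below spells out the index arithmetic from the patched `utf_to_cp()` (j < 0x20 gives j, otherwise j + 0x60) and its inverse as used in `efi_fat_to_str()` (c - 0x60 for high bytes). Both helper names are hypothetical.

```c
#include <stdint.h>

/* Table index (0x00-0x9f) -> CP437 byte, mirroring the patched utf_to_cp(). */
uint8_t sketch_index_to_cp437(int j)
{
	return j < 0x20 ? (uint8_t)j : (uint8_t)(j + 0x60);
}

/*
 * CP437 byte -> table index. Bytes 0x20-0x7f are ASCII and have no table
 * entry, so -1 is returned for them.
 */
int sketch_cp437_to_index(uint8_t c)
{
	if (c < 0x20)
		return c;
	if (c >= 0x80)
		return c - 0x60;
	return -1;
}
```

For example, table index 0x21 (U+00FC) yields byte 0x81, and `codepage[0x81 - 0x60]` reads the same entry back.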

From: Andre Przywara andre.przywara@arm.com
UEFI relies entirely on Unicode output, which the fonts actually displayed on the screen might not support.
Add a test displaying some international characters to reveal missing glyphs, especially in our built-in fonts. The output needs to be checked manually on the screen for correctness.
Signed-off-by: Andre Przywara andre.przywara@arm.com
Suggested-by: Heinrich Schuchardt xypron.glpk@gmx.de
Signed-off-by: Janne Grunau j@jannau.net
---
 lib/efi_selftest/efi_selftest_textoutput.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/lib/efi_selftest/efi_selftest_textoutput.c b/lib/efi_selftest/efi_selftest_textoutput.c
index cc44b38bc2..917903473d 100644
--- a/lib/efi_selftest/efi_selftest_textoutput.c
+++ b/lib/efi_selftest/efi_selftest_textoutput.c
@@ -31,6 +31,21 @@ static int execute(void)
 			 0xD804, 0xDC22,
 			 0};

+	const u16 text[] =
+u"This should render international characters as described\n"
+u"U+00D6 \u00D6 - Latin capital letter O with diaresis\n"
+u"U+00DF \u00DF - Latin small letter sharp s\n"
+u"U+00E5 \u00E5 - Latin small letter a with ring above\n"
+u"U+00E9 \u00E9 - Latin small letter e with acute\n"
+u"U+00F1 \u00F1 - Latin small letter n with tilde\n"
+u"U+00F6 \u00F6 - Latin small letter o with diaresis\n"
+u"The following characters will render as '?' with bitmap fonts\n"
+u"U+00F8 \u00F8 - Latin small letter o with stroke\n"
+u"U+03AC \u03AC - Greek small letter alpha with tonus\n"
+u"U+03BB \u03BB - Greek small letter lambda\n"
+u"U+03C2 \u03C2 - Greek small letter final sigma\n"
+u"U+1F19 \u1F19 - Greek capital letter epsilon with dasia\n";
+
 	/* SetAttribute */
 	efi_st_printf("\nColor palette\n");
 	for (foreground = 0; foreground < 0x10; ++foreground) {
@@ -119,6 +134,12 @@ static int execute(void)
 		return EFI_ST_FAILURE;
 	}
 	efi_st_printf("\n");
+	ret = con_out->output_string(con_out, text);
+	if (ret != EFI_ST_SUCCESS) {
+		efi_st_error("OutputString failed for international chars\n");
+		return EFI_ST_FAILURE;
+	}
+	efi_st_printf("\n");

 	return EFI_ST_SUCCESS;
 }
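The test hands UCS-2 strings to OutputString, while the video console patch in this series notes that the text reaches vidconsole as UTF-8 sequences. As a hedged illustration of what those bytes look like, the sketch below encodes a single BMP code unit to UTF-8. `sketch_ucs2_to_utf8()` is hypothetical, is not U-Boot's conversion routine, and deliberately does not handle surrogate pairs.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical: encode one UCS-2 code unit (U+0000-U+FFFF, surrogates not
 * handled) into @out. Returns the number of bytes written (1-3).
 */
size_t sketch_ucs2_to_utf8(uint16_t cu, uint8_t out[3])
{
	if (cu < 0x80) {		/* ASCII: one byte */
		out[0] = (uint8_t)cu;
		return 1;
	}
	if (cu < 0x800) {		/* 110xxxxx 10xxxxxx */
		out[0] = (uint8_t)(0xc0 | (cu >> 6));
		out[1] = (uint8_t)(0x80 | (cu & 0x3f));
		return 2;
	}
	/* 1110xxxx 10xxxxxx 10xxxxxx */
	out[0] = (uint8_t)(0xe0 | (cu >> 12));
	out[1] = (uint8_t)(0x80 | ((cu >> 6) & 0x3f));
	out[2] = (uint8_t)(0x80 | (cu & 0x3f));
	return 3;
}
```

So U+00D6 becomes the two bytes C3 96 and U+1F19 the three bytes E1 BC 99, which the vidconsole side must reassemble before it can pick a glyph. The existing test string ending in 0xD804, 0xDC22 is a surrogate pair; a real converter has to combine both units into one code point first, which this sketch leaves out.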

On 3/16/24 22:50, Janne Grunau via B4 Relay wrote:
> +	const u16 text[] =
> +u"This should render international characters as described\n"
> +u"U+00D6 \u00D6 - Latin capital letter O with diaresis\n"
> +u"U+00DF \u00DF - Latin small letter sharp s\n"
> +u"U+00E5 \u00E5 - Latin small letter a with ring above\n"
> +u"U+00E9 \u00E9 - Latin small letter e with acute\n"
> +u"U+00F1 \u00F1 - Latin small letter n with tilde\n"
> +u"U+00F6 \u00F6 - Latin small letter o with diaresis\n"
> +u"The following characters will render as '?' with bitmap fonts\n"
> +u"U+00F8 \u00F8 - Latin small letter o with stroke\n"
> +u"U+03AC \u03AC - Greek small letter alpha with tonus\n"
> +u"U+03BB \u03BB - Greek small letter lambda\n"
> +u"U+03C2 \u03C2 - Greek small letter final sigma\n"
> +u"U+1F19 \u1F19 - Greek capital letter epsilon with dasia\n";
The strings should be indented by two tabs.
Otherwise:
Reviewed-by: Heinrich Schuchardt xypron.glpk@gmx.de

On Sun, Mar 17, 2024 at 10:24:13AM +0100, Heinrich Schuchardt wrote:
> The strings should be indented by two tabs.
Done locally for all selftests. I'm waiting for feedback on patches 2 and 3 before resending.
thanks
Janne

From: Andre Przywara andre.przywara@arm.com
UEFI applications rely on Unicode output and might use it to draw pseudo-graphical interfaces with the box drawing characters defined by Unicode.
Add a simple test to display the most basic box drawing characters. The output needs to be checked manually on the screen for correctness.
Signed-off-by: Andre Przywara andre.przywara@arm.com
Suggested-by: Heinrich Schuchardt xypron.glpk@gmx.de
Signed-off-by: Janne Grunau j@jannau.net
---
 lib/efi_selftest/efi_selftest_textoutput.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/lib/efi_selftest/efi_selftest_textoutput.c b/lib/efi_selftest/efi_selftest_textoutput.c
index 917903473d..b56fd2ab76 100644
--- a/lib/efi_selftest/efi_selftest_textoutput.c
+++ b/lib/efi_selftest/efi_selftest_textoutput.c
@@ -46,6 +46,20 @@ u"U+03BB \u03BB - Greek small letter lambda\n"
 u"U+03C2 \u03C2 - Greek small letter final sigma\n"
 u"U+1F19 \u1F19 - Greek capital letter epsilon with dasia\n";

+	const u16 boxes[] =
+u"This should render as four boxes with text\n"
+u"\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500"
+u"\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500"
+u"\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502"
+u" left top \u2502 right top \u2502\n\u251c\u2500"
+u"\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500"
+u"\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500"
+u"\u2500\u2500\u2500\u2500\u2500\u2500\u2524\n\u2502 "
+u"left bottom \u2502 right bottom \u2502\n\u2514\u2500\u2500\u2500"
+u"\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534"
+u"\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500"
+u"\u2500\u2500\u2500\u2500\u2518\n";
+
 	/* SetAttribute */
 	efi_st_printf("\nColor palette\n");
 	for (foreground = 0; foreground < 0x10; ++foreground) {
@@ -140,6 +154,12 @@ u"U+1F19 \u1F19 - Greek capital letter epsilon with dasia\n";
 		return EFI_ST_FAILURE;
 	}
 	efi_st_printf("\n");
+	ret = con_out->output_string(con_out, boxes);
+	if (ret != EFI_ST_SUCCESS) {
+		efi_st_error("OutputString failed for box drawing chars\n");
+		return EFI_ST_FAILURE;
+	}
+	efi_st_printf("\n");

 	return EFI_ST_SUCCESS;
 }
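Whether the boxes render on a bitmap console depends on every box drawing code point in the test string having a CP437 counterpart. As a hedged cross-check, the table below lists the eleven characters the test uses together with their standard CP437 byte values (taken from the well-known CP437 assignments, not copied from U-Boot's `cp437.h`); `sketch_box_to_cp437()` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Box drawing characters used by the selftest and their CP437 bytes. */
static const struct {
	uint16_t cp;
	uint8_t cp437;
} box_chars[] = {
	{ 0x2500, 0xC4 },	/* light horizontal */
	{ 0x2502, 0xB3 },	/* light vertical */
	{ 0x250C, 0xDA },	/* light down and right */
	{ 0x2510, 0xBF },	/* light down and left */
	{ 0x2514, 0xC0 },	/* light up and right */
	{ 0x2518, 0xD9 },	/* light up and left */
	{ 0x251C, 0xC3 },	/* light vertical and right */
	{ 0x2524, 0xB4 },	/* light vertical and left */
	{ 0x252C, 0xC2 },	/* light down and horizontal */
	{ 0x2534, 0xC1 },	/* light up and horizontal */
	{ 0x253C, 0xC5 },	/* light vertical and horizontal */
};

/* Returns the CP437 byte for a listed box drawing code point, 0 otherwise. */
uint8_t sketch_box_to_cp437(uint16_t cp)
{
	size_t i;

	for (i = 0; i < sizeof(box_chars) / sizeof(box_chars[0]); i++)
		if (box_chars[i].cp == cp)
			return box_chars[i].cp437;
	return 0;
}
```

All eleven bytes are at or above 0x80, so this test only exercises the original 128-entry part of the table and does not depend on the 0x01-0x1f extension.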

On 3/16/24 22:50, Janne Grunau via B4 Relay wrote:
> +	const u16 boxes[] =
> +u"This should render as four boxes with text\n"
> +u"\u250c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500"
> +u"\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500"
> +u"\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2510\n\u2502"
> +u" left top \u2502 right top \u2502\n\u251c\u2500"
> +u"\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500"
> +u"\u2500\u253c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500"
> +u"\u2500\u2500\u2500\u2500\u2500\u2500\u2524\n\u2502 "
> +u"left bottom \u2502 right bottom \u2502\n\u2514\u2500\u2500\u2500"
> +u"\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534"
> +u"\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500"
> +u"\u2500\u2500\u2500\u2500\u2518\n";
Please, indent the strings by two tabs.
Otherwise:
Reviewed-by: Heinrich Schuchardt xypron.glpk@gmx.de

From: Janne Grunau j@jannau.net
Draw symbols from Unicode's "Geometric shapes" page which translate to code page 437 code points 1-31. These are used by UEFI applications to draw user interfaces using EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL. The output has to be checked manually on the screen for correctness.
Signed-off-by: Janne Grunau j@jannau.net
---
 lib/efi_selftest/efi_selftest_textoutput.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/lib/efi_selftest/efi_selftest_textoutput.c b/lib/efi_selftest/efi_selftest_textoutput.c
index b56fd2ab76..2208a9877a 100644
--- a/lib/efi_selftest/efi_selftest_textoutput.c
+++ b/lib/efi_selftest/efi_selftest_textoutput.c
@@ -60,6 +60,13 @@ u"\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534"
 u"\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500"
 u"\u2500\u2500\u2500\u2500\u2518\n";
+	const u16 shapes[] =
+u"Geometric shapes as described\n"
+u"U+25B2 \u25B2 - Black up-pointing triangle\n"
+u"U+25BA \u25BA - Black right-pointing pointer\n"
+u"U+25BC \u25BC - Black down-pointing triangle\n"
+u"U+25C4 \u25C4 - Black left-pointing pointer\n";
+
 	/* SetAttribute */
 	efi_st_printf("\nColor palette\n");
 	for (foreground = 0; foreground < 0x10; ++foreground) {
@@ -160,6 +167,12 @@ u"\u2500\u2500\u2500\u2500\u2518\n";
 		return EFI_ST_FAILURE;
 	}
 	efi_st_printf("\n");
+	ret = con_out->output_string(con_out, shapes);
+	if (ret != EFI_ST_SUCCESS) {
+		efi_st_error("OutputString failed for geometric shapes\n");
+		return EFI_ST_FAILURE;
+	}
+	efi_st_printf("\n");
 	return EFI_ST_SUCCESS;
 }
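The commit message says these shapes translate to code page 437 code points 1-31, and that the output has to be checked on screen by hand. As a reference for that check, here is a minimal C sketch of the correspondence. The `cp437_low_glyphs` table and the `unicode_to_cp437_low()` helper are hypothetical illustrations, not the table in U-Boot's lib/charset.c; the values are the standard CP437 glyph assignments for bytes 0x01-0x1f.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Unicode code points that IBM code page 437 displays at bytes
 * 0x01-0x1f; entry i is the glyph for byte i + 1.
 * Illustrative table only, not the one in U-Boot's lib/charset.c.
 */
static const uint16_t cp437_low_glyphs[] = {
	0x263a, 0x263b, 0x2665, 0x2666, 0x2663, 0x2660, 0x2022, 0x25d8, /* 01-08 */
	0x25cb, 0x25d9, 0x2642, 0x2640, 0x266a, 0x266b, 0x263c, 0x25ba, /* 09-10 */
	0x25c4, 0x2195, 0x203c, 0x00b6, 0x00a7, 0x25ac, 0x21a8, 0x2191, /* 11-18 */
	0x2193, 0x2192, 0x2190, 0x221f, 0x2194, 0x25b2, 0x25bc,         /* 19-1f */
};

#define CP437_LOW_COUNT (sizeof(cp437_low_glyphs) / sizeof(cp437_low_glyphs[0]))

/* Guard against a miscounted table: bytes 0x01-0x1f are 31 glyphs. */
static_assert(CP437_LOW_COUNT == 31, "CP437 has 31 glyphs at 0x01-0x1f");

/* Hypothetical helper: CP437 byte (0x01-0x1f) showing @cp, or 0 if none. */
static uint8_t unicode_to_cp437_low(uint32_t cp)
{
	for (size_t i = 0; i < CP437_LOW_COUNT; i++)
		if (cp437_low_glyphs[i] == cp)
			return (uint8_t)(i + 1);
	return 0;
}
```

With this table the four selftest shapes come out as 0x1e (U+25B2), 0x10 (U+25BA), 0x1f (U+25BC) and 0x11 (U+25C4), so a console that renders CP437 bytes correctly should show all four arrows.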

On 3/16/24 22:50, Janne Grunau via B4 Relay wrote:
From: Janne Grunau j@jannau.net
Draw symbols from Unicode's "Geometric shapes" page which translate to code page 437 code points 1-31. These are used by UEFI applications to draw user interfaces using EFI_SIMPLE_TEXT_OUTPUT_PROTOCOL. The output has to be checked manually on the screen for correctness.
Signed-off-by: Janne Grunau j@jannau.net
 lib/efi_selftest/efi_selftest_textoutput.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/lib/efi_selftest/efi_selftest_textoutput.c b/lib/efi_selftest/efi_selftest_textoutput.c
index b56fd2ab76..2208a9877a 100644
--- a/lib/efi_selftest/efi_selftest_textoutput.c
+++ b/lib/efi_selftest/efi_selftest_textoutput.c
@@ -60,6 +60,13 @@ u"\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2534"
 u"\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500"
 u"\u2500\u2500\u2500\u2500\u2518\n";
+	const u16 shapes[] =
+u"Geometric shapes as described\n"
+u"U+25B2 \u25B2 - Black up-pointing triangle\n"
+u"U+25BA \u25BA - Black right-pointing pointer\n"
+u"U+25BC \u25BC - Black down-pointing triangle\n"
+u"U+25C4 \u25C4 - Black left-pointing pointer\n";
Please indent the strings by two tabs.
Otherwise
Reviewed-by: Heinrich Schuchardt xypron.glpk@gmx.de
+
 	/* SetAttribute */
 	efi_st_printf("\nColor palette\n");
 	for (foreground = 0; foreground < 0x10; ++foreground) {
@@ -160,6 +167,12 @@ u"\u2500\u2500\u2500\u2500\u2518\n";
 		return EFI_ST_FAILURE;
 	}
 	efi_st_printf("\n");
+	ret = con_out->output_string(con_out, shapes);
+	if (ret != EFI_ST_SUCCESS) {
+		efi_st_error("OutputString failed for geometric shapes\n");
+		return EFI_ST_FAILURE;
+	}
+	efi_st_printf("\n");
 	return EFI_ST_SUCCESS;
 }
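The indentation Heinrich asks for could look like the following sketch, which keeps the strings from the patch unchanged but indents each continuation line by two tabs. The `shapes_unit()` wrapper and the `u16` typedef are assumptions added only so the fragment builds on its own; they are not part of the patch, and a later revision of the series may lay the code out differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for U-Boot's u16 so this sketch compiles standalone. */
typedef uint16_t u16;

/*
 * Hypothetical wrapper: declares the selftest string with its
 * continuation lines indented by two tabs and returns the UTF-16
 * unit at @index.
 */
static u16 shapes_unit(size_t index)
{
	const u16 shapes[] =
		u"Geometric shapes as described\n"
		u"U+25B2 \u25B2 - Black up-pointing triangle\n"
		u"U+25BA \u25BA - Black right-pointing pointer\n"
		u"U+25BC \u25BC - Black down-pointing triangle\n"
		u"U+25C4 \u25C4 - Black left-pointing pointer\n";

	/* sizeof() includes the terminating NUL unit. */
	assert(index < sizeof(shapes) / sizeof(shapes[0]));
	return shapes[index];
}
```

Adjacent u"" literals are concatenated by the compiler, so the indentation changes only the source layout, not the string passed to OutputString().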

From: Janne Grunau j@jannau.net
Test that Unicode code points which map to CP437 code points 1-31 are converted to '_'. This ensures that FAT file names do not contain characters which are control characters in other code pages (CP 1250, for example).
Signed-off-by: Janne Grunau j@jannau.net
---
 lib/efi_selftest/efi_selftest_unicode_collation.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/lib/efi_selftest/efi_selftest_unicode_collation.c b/lib/efi_selftest/efi_selftest_unicode_collation.c
index 32c99caf35..ad7dfa9fb9 100644
--- a/lib/efi_selftest/efi_selftest_unicode_collation.c
+++ b/lib/efi_selftest/efi_selftest_unicode_collation.c
@@ -220,6 +220,18 @@ static int test_str_to_fat(void)
 		return EFI_ST_FAILURE;
 	}
 
+	/*
+	 * Test unicode code points which map to CP 437 0x01 - 0x1f are
+	 * converted to '_'.
+	 */
+	boottime->set_mem(fat, 16, 0);
+	ret = unicode_collation_protocol->str_to_fat(unicode_collation_protocol,
+		u"\u263a\u2666\u2022\u25d8\u2642\u2194\u00b6\u203c", 8, fat);
+	if (!ret || efi_st_strcmp_16_8(u"________", fat)) {
+		efi_st_error("str_to_fat returned %u, \"%s\"\n", ret, fat);
+		return EFI_ST_FAILURE;
+	}
+
 	return EFI_ST_SUCCESS;
 }
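The test expects `str_to_fat()` both to report a substitution (the `!ret` check) and to produce eight underscores. A minimal sketch of that rule, under stated assumptions: `str_to_fat_sketch()` and `to_cp437_sketch()` are hypothetical helpers, the table holds only the eight code points the test uses (a real table covers all of CP437), and any other non-ASCII unit is treated as unmappable. It also leaves out FAT-specific rules such as characters that are illegal in short names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Only the eight code points used by the selftest, with their CP437 bytes. */
static const struct {
	uint16_t unicode;
	uint8_t cp437;
} low_map[] = {
	{ 0x263a, 0x01 }, { 0x2666, 0x04 }, { 0x2022, 0x07 }, { 0x25d8, 0x08 },
	{ 0x2642, 0x0b }, { 0x2194, 0x1d }, { 0x00b6, 0x14 }, { 0x203c, 0x13 },
};

/*
 * Hypothetical converter: ASCII passes through, table hits yield a byte
 * in 0x01-0x1f, anything else returns -1 (unmappable in this sketch).
 */
static int to_cp437_sketch(uint16_t c)
{
	if (c < 0x80)
		return c;
	for (size_t i = 0; i < sizeof(low_map) / sizeof(low_map[0]); i++)
		if (low_map[i].unicode == c)
			return low_map[i].cp437;
	return -1;
}

/*
 * Hypothetical, simplified StrToFat(): converts at most @fat_size units
 * of @src into @fat, which must hold @fat_size + 1 bytes. Results below
 * 0x20 (control characters in other code pages) and unmappable units
 * become '_'. Returns true if any substitution happened.
 */
static bool str_to_fat_sketch(const uint16_t *src, size_t fat_size, char *fat)
{
	bool substituted = false;
	size_t i;

	assert(src && fat);
	for (i = 0; i < fat_size && src[i]; i++) {
		int c = to_cp437_sketch(src[i]);

		if (c < 0x20) {	/* also catches -1 (unmappable) */
			c = '_';
			substituted = true;
		}
		fat[i] = (char)c;
	}
	fat[i] = '\0';
	return substituted;
}
```

Checking the CP437 result rather than the Unicode input means every code point that the extended map places at 0x01-0x1f is caught by the same `c < 0x20` test, which is the property the commit message describes.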

On 3/16/24 22:50, Janne Grunau via B4 Relay wrote:
From: Janne Grunau j@jannau.net
Test that Unicode code points which map to CP437 code points 1-31 are converted to '_'. This ensures that FAT file names do not contain characters which are control characters in other code pages (CP 1250, for example).
Signed-off-by: Janne Grunau j@jannau.net
 lib/efi_selftest/efi_selftest_unicode_collation.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/lib/efi_selftest/efi_selftest_unicode_collation.c b/lib/efi_selftest/efi_selftest_unicode_collation.c
index 32c99caf35..ad7dfa9fb9 100644
--- a/lib/efi_selftest/efi_selftest_unicode_collation.c
+++ b/lib/efi_selftest/efi_selftest_unicode_collation.c
@@ -220,6 +220,18 @@ static int test_str_to_fat(void)
 		return EFI_ST_FAILURE;
 	}
 
+	/*
+	 * Test unicode code points which map to CP 437 0x01 - 0x1f are
+	 * converted to '_'.
+	 */
+	boottime->set_mem(fat, 16, 0);
+	ret = unicode_collation_protocol->str_to_fat(unicode_collation_protocol,
+		u"\u263a\u2666\u2022\u25d8\u2642\u2194\u00b6\u203c", 8, fat);
+	if (!ret || efi_st_strcmp_16_8(u"________", fat)) {
+		efi_st_error("str_to_fat returned %u, \"%s\"\n", ret, fat);
+		return EFI_ST_FAILURE;
+	}
Reviewed-by: Heinrich Schuchardt xypron.glpk@gmx.de
 	return EFI_ST_SUCCESS;
 }