Re: [U-Boot] [PATCH 11/15] efi_loader: capitalization table

27 Aug 2018

Alexander Graf agraf@suse.de wrote:
...
On 26.08.18 21:00, Heinrich Schuchardt wrote:
...
On 08/26/2018 08:22 PM, Alexander Graf wrote:
...
On 11.08.18 17:28, Heinrich Schuchardt wrote:
...
This patch provides a define to initialize a table that maps lowercase to
capital letters for Unicode code points 0x0000 - 0xffff.
Signed-off-by: Heinrich Schuchardt xypron.glpk@gmx.de
 MAINTAINERS              |    1 +
 include/capitalization.h | 1909 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 1910 insertions(+)
 create mode 100644 include/capitalization.h

diff --git a/MAINTAINERS b/MAINTAINERS
index a324139471..0a543309f2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -368,6 +368,7 @@ F:	doc/DocBook/efi.tmpl
 F:	doc/README.uefi
 F:	doc/README.iscsi
 F:	Documentation/efi.rst
+F:	include/capitalization.h
 F:	include/efi*
 F:	include/pe.h
 F:	include/asm-generic/pe.h
diff --git a/include/capitalization.h b/include/capitalization.h
new file mode 100644
index 0000000000..50d5108f98
--- /dev/null
+++ b/include/capitalization.h
@@ -0,0 +1,1909 @@
+/* SPDX-License-Identifier: Unicode-DFS-2016 */
+/*
+ * Correspondence table for small and capital Unicode letters in the range of
+ * 0x0000 - 0xffff based on http://www.unicode.org/Public/UCA/11.0.0/allkeys.txt
+ */
+
+struct capitalization_table {
+	u16 upper;
+	u16 lower;
+};
+
+#define UNICODE_CAPITALIZATION_TABLE { \
Ugh, that is a *lot* of data. How much does the binary size grow with
the table compiled in?
That data is also in glibc. I don't know whether you use glibc, though.
...
Is there any slightly more sophisticated pattern in the table maybe that
we could just express as code? Would that turn out smaller maybe?
This is 3792 bytes of data. Unicode arranges lowercase and uppercase
letters quite irregularly.
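As a rough illustration of the question above (whether part of the table
could be expressed as code): a few blocks of the BMP are regular enough to
describe as ranges with a fixed offset. The sketch below is not from the
patch; struct case_range, to_upper_ranges and range_to_upper() are
hypothetical names, and only a handful of well-known regular blocks are
covered. Irregular pairs such as U+00FF -> U+0178 would still need explicit
entries, which is what makes the mapping hard to shrink this way.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical rule (not part of the patch): every 'step'-th code point
 * in first..last maps to its capital letter by adding 'delta'.
 */
struct case_range {
	uint16_t first;
	uint16_t last;
	uint16_t step;	/* 1: contiguous run, 2: alternating upper/lower pairs */
	int16_t delta;
};

/* Illustrative subset only; this does not cover the whole BMP. */
static const struct case_range to_upper_ranges[] = {
	{ 0x0061, 0x007a, 1, -0x20 },	/* Basic Latin a-z */
	{ 0x00e0, 0x00f6, 1, -0x20 },	/* Latin-1, before the division sign */
	{ 0x00f8, 0x00fe, 1, -0x20 },	/* Latin-1, after the division sign */
	{ 0x0101, 0x012f, 2, -1 },	/* Latin Extended-A, odd code points */
	{ 0x03b1, 0x03c1, 1, -0x20 },	/* Greek alpha-rho */
	{ 0x03c3, 0x03cb, 1, -0x20 },	/* Greek sigma-upsilon with dialytika */
	{ 0x0430, 0x044f, 1, -0x20 },	/* Cyrillic, basic alphabet */
	{ 0x0450, 0x045f, 1, -0x50 },	/* Cyrillic, extensions */
};

/* Return the capital letter for cp, or cp itself if no rule matches. */
static uint16_t range_to_upper(uint16_t cp)
{
	size_t i;
	size_t n = sizeof(to_upper_ranges) / sizeof(to_upper_ranges[0]);

	for (i = 0; i < n; ++i) {
		const struct case_range *r = &to_upper_ranges[i];

		if (cp < r->first || cp > r->last)
			continue;
		/* Skip code points between the steps, e.g. the capitals in a pair run. */
		if ((cp - r->first) % r->step)
			continue;
		return (uint16_t)(cp + r->delta);
	}
	return cp;
}
```

Each rule costs 8 bytes here, so this only pays off for blocks where one
rule replaces many 4-byte table entries.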
We could resort to zlib or gzip. But these libraries are not built by
default.
Yeah, and that only adds more overhead.
...
Most urgently we will need the capitalization table for generating and
checking short FAT filenames, so we could create a configuration switch
that would reduce this table to codepage 437 or codepage 1250 letters
depending on the chosen native character set.
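A minimal sketch of how such a switch might look, assuming a made-up option
name CONFIG_FULL_CAPITALIZATION (the thread does not name one). With the
option unset, only a small table is compiled in; the entries shown are an
illustrative subset of the letter pairs that exist in codepage 437, not a
complete list. cap_table and cp_to_upper() are likewise hypothetical names.
A linear search is used because nothing in the thread says how the full
table is ordered.

```c
#include <stddef.h>
#include <stdint.h>

#ifdef CONFIG_FULL_CAPITALIZATION	/* hypothetical option name */
/* Full table: struct and UNICODE_CAPITALIZATION_TABLE come from the patch. */
#include <capitalization.h>
static const struct capitalization_table cap_table[] =
	UNICODE_CAPITALIZATION_TABLE;
#else
/* Reduced build: same layout as in the patch, but far fewer entries. */
struct capitalization_table {
	uint16_t upper;
	uint16_t lower;
};

/* Illustrative subset of codepage 437 letter pairs (Unicode code points). */
static const struct capitalization_table cap_table[] = {
	{ 0x00c4, 0x00e4 },	/* A with diaeresis */
	{ 0x00c5, 0x00e5 },	/* A with ring above */
	{ 0x00c6, 0x00e6 },	/* AE */
	{ 0x00c7, 0x00e7 },	/* C with cedilla */
	{ 0x00c9, 0x00e9 },	/* E with acute */
	{ 0x00d1, 0x00f1 },	/* N with tilde */
	{ 0x00d6, 0x00f6 },	/* O with diaeresis */
	{ 0x00dc, 0x00fc },	/* U with diaeresis */
};
#endif

/* Return the capital letter for cp, or cp itself if it is not in the table. */
static uint16_t cp_to_upper(uint16_t cp)
{
	size_t i;
	size_t n = sizeof(cap_table) / sizeof(cap_table[0]);

	/* ASCII is handled directly so the reduced table can omit it. */
	if (cp >= 'a' && cp <= 'z')
		return (uint16_t)(cp - ('a' - 'A'));

	for (i = 0; i < n; ++i) {
		if (cap_table[i].lower == cp)
			return cap_table[i].upper;
	}
	return cp;
}
```

With the option unset, the table in this sketch has 8 entries of two u16
values each, compared with the 3792 bytes quoted above for the full table.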
I think that's a great idea. There probably is a lot of overlap even
between the two, so maybe just make it a config option for "non-latin
upper/lower case conversion".
...
In EDK2 I only found code for codepage 1250.
Yeah, I'd be surprised if people really needed more. In fact, how about
you just default the config option to =n?
Alex
-- 
📧 Mike FABIAN   mike.fabian@gmx.de
Lack of sleep is the enemy of good work.