17 Internationalisation
***********************

17.1 Charset
============

GRUB uses UTF-8 internally, except in rendering, where an appropriate
GRUB-specific representation is used. All text files (including
config) are assumed to be encoded in UTF-8.

17.2 Filesystems
================

NTFS, JFS, UDF, HFS+, exFAT, long filenames in FAT and the Joliet part
of ISO9660 are treated as UTF-16, as per specification. AFS and BFS are
read as UTF-8, again according to specification. BtrFS, cpio, tar,
squash4, minix, minix2, minix3, ROMFS, ReiserFS, XFS, ext2, ext3, ext4,
FAT (short names), F2FS, the RockRidge part of ISO9660, nilfs2, UFS1,
UFS2 and ZFS are assumed to be UTF-8. This might be false on systems
configured with a legacy charset, but as long as the charset used is a
superset of ASCII you should be able to access ASCII-named files. It
is recommended to configure your system to use UTF-8 to access the
filesystem; convmv may help with the migration.

ISO9660 (plain) filenames are specified as being ASCII or being
described with unspecified escape sequences. GRUB assumes that ISO9660
names are UTF-8 (since any ASCII is valid UTF-8). There are some old
CD-ROMs which use CP437 in a non-compliant way. You are still able to
access files with names containing only ASCII characters on such
filesystems, though. You are also able to access any file if the
filesystem contains valid Joliet (UTF-16) or RockRidge (UTF-8) data.
AFFS, SFS and HFS never use Unicode, and GRUB assumes them to be in
Latin1, Latin1 and MacRoman respectively.

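The effect of a wrong charset assumption can be sketched outside GRUB.
The Python snippet below is only an illustration and not GRUB code; the
filenames and the helper is_valid_utf8 are made up for the example. It
shows that an ASCII-only name has identical bytes in UTF-8, Latin1 and
CP437, whereas a non-ASCII name stored in a legacy charset is not valid
UTF-8.

```python
# Illustration only (not GRUB code): why ASCII-named files stay reachable
# even when the filesystem was written with a legacy charset.

def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string can be decoded as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Hypothetical filenames chosen for the example.
ascii_name = "vmlinuz"
accented_name = "caf\u00e9.cfg"

# An ASCII-only name has the same bytes in any ASCII superset.
same_bytes = (
    ascii_name.encode("utf-8")
    == ascii_name.encode("latin-1")
    == ascii_name.encode("cp437")
)

# The accented name stored as Latin1 or CP437 is not valid UTF-8,
# while its UTF-8 encoding naturally is.
latin1_ok = is_valid_utf8(accented_name.encode("latin-1"))
cp437_ok = is_valid_utf8(accented_name.encode("cp437"))
utf8_ok = is_valid_utf8(accented_name.encode("utf-8"))
```

Here same_bytes and utf8_ok are true, while latin1_ok and cp437_ok are
false, which is why only the ASCII-named files remain accessible on a
filesystem written with a legacy charset.
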
GRUB handles filesystem case-insensitivity; however, no attempt is
made at case conversion of international characters, so e.g. a file
named with a lowercase Greek alpha is treated as different from one
named with an uppercase alpha. The filesystems in question are NTFS
(except POSIX namespace), HFS+ (configurable at mkfs time, default
insensitive), SFS (configurable at mkfs time, default insensitive),
JFS (configurable at mkfs time, default sensitive), HFS, AFFS, FAT,
exFAT and ZFS (configurable on a per-subvolume basis by the property
"casesensitivity", default sensitive). On ZFS subvolumes marked as
case-insensitive, files containing lowercase international characters
are inaccessible.

Also, like all supported filesystems except HFS+ and ZFS (configurable
on a per-subvolume basis by the property "normalization", default
none), GRUB makes no attempt to check canonical equivalence, so a file
name containing u-diaeresis is treated as distinct from one containing
u followed by a combining diaeresis. This means, however, that in
order to access a file on HFS+ its name must be specified in
normalisation form D. On normalised ZFS subvolumes, filenames that are
not in the configured normalisation form are inaccessible.

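The two limitations above can be sketched in Python. This is an
illustration of the described behaviour, not GRUB's implementation; the
helpers ascii_casefold and names_match are hypothetical. Note also that
unicodedata.normalize("NFD", ...) produces standard form D, which is
assumed here to be close enough to what HFS+ stores for simple names.

```python
import unicodedata

def ascii_casefold(name: str) -> str:
    """Lowercase only A-Z; leave every other character untouched."""
    return "".join(
        chr(ord(c) + 32) if "A" <= c <= "Z" else c
        for c in name
    )

def names_match(a: str, b: str) -> bool:
    """Compare two names, ignoring case for ASCII letters only."""
    return ascii_casefold(a) == ascii_casefold(b)

# ASCII letters are folded, so these two names are considered equal.
ascii_match = names_match("GRUB.CFG", "grub.cfg")

# GREEK CAPITAL LETTER ALPHA vs GREEK SMALL LETTER ALPHA: not folded.
greek_match = names_match("\u0391", "\u03b1")

# Canonical equivalence is not checked: the two spellings differ.
precomposed = "\u00fc"    # LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "u\u0308"    # u followed by COMBINING DIAERESIS
raw_equal = precomposed == decomposed

# Converting a name to form D before using it in a path.
nfd_name = unicodedata.normalize("NFD", precomposed)
```

In this sketch ascii_match is true, greek_match and raw_equal are
false, and nfd_name equals the decomposed spelling, i.e. the form that
would have to be typed to reach such a file on HFS+.
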
17.3 Output terminal
====================

The firmware output console "console" on ARC and IEEE1275 is limited to
ASCII.

The BIOS firmware console and VGA text are limited to ASCII and some
pseudographics.

None of the above is appropriate for displaying international text,
and any unsupported character is replaced with a question mark, except
pseudographics, which we attempt to approximate with ASCII.

The EFI console, on the other hand, nominally supports UTF-16, but
actual language coverage depends on the firmware and may be very
limited.

The encoding used on serial can be chosen with 'terminfo' as either
ASCII, UTF-8 or "visual UTF-8". The last one is against the
specification but results in correct rendering of right-to-left text
on some readers which don't have their own bidi implementation.

On emu, GRUB checks whether the charset is UTF-8 and uses it if so;
otherwise it uses ASCII.

When using gfxterm or gfxmenu, GRUB itself is responsible for
rendering the text. In this case GRUB is limited by the loaded fonts.
If the fonts contain all required characters, then bidirectional text,
cursive variants and combining marks other than enclosing, half (e.g.
left half tilde or combining overline) and double ones are supported.
Ligatures aren't supported, though. This should cover European, Middle
Eastern (if you don't mind the lack of the lam-alif ligature in Arabic)
and East Asian scripts. Notable unsupported scripts are the Brahmic
family and its derivatives, as well as Mongolian, Tifinagh, Korean Jamo
(precomposed characters have no problem) and tonal writing
(U+02E5-U+02E9). GRUB also ignores deprecated (as specified in Unicode)
characters (e.g. tags). GRUB also doesn't handle so-called "annotation
characters". If you can complete either of the two lists or, better,
propose a patch to improve rendering, please contact the developer
team.

17.4 Input terminal
===================

The firmware console on BIOS, IEEE1275 and ARC doesn't allow you to
enter non-ASCII characters. The EFI specification allows for such
input, but the author is unaware of any actual implementations. Serial
input is currently limited to Latin1 (unlikely to change). GRUB's own
keyboard implementations (at_keyboard and usb_keyboard) support any
key but work on a one-char-per-keystroke basis, so there are no dead
keys or advanced input methods. There is also no keymap change hotkey.
In practice this makes it difficult to enter any text using a
non-Latin alphabet. Moreover, all current input consumers are limited
to ASCII.

17.5 Gettext
============

GRUB supports being translated. For this you need to have language
*.mo files in $prefix/locale, load the gettext module and set the
"lang" variable.

17.6 Regexp
===========

Regexps work on Unicode characters; however, no attempt at checking
canonical equivalence has been made. Moreover, classes like
[:alpha:] match only the ASCII subset.

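Both restrictions can be imitated with Python's re module. This is an
analogy rather than GRUB's regexp engine: Python has no POSIX classes,
so the pattern [A-Za-z]+ is used here as an assumed ASCII-only stand-in
for [:alpha:], and the sample words are made up.

```python
import re

# Assumed ASCII-only stand-in for the POSIX class [:alpha:].
ASCII_ALPHA = re.compile(r"[A-Za-z]+")

# A purely ASCII word matches; a word with e-acute does not.
ascii_word = ASCII_ALPHA.fullmatch("kernel") is not None
accented_word = ASCII_ALPHA.fullmatch("\u00e9t\u00e9") is not None

# No canonical equivalence: a pattern written with precomposed e-acute
# does not match text spelled as e + COMBINING ACUTE ACCENT.
pattern = re.compile("caf\u00e9")
matches_precomposed = pattern.fullmatch("caf\u00e9") is not None
matches_decomposed = pattern.fullmatch("cafe\u0301") is not None
```

Only ascii_word and matches_precomposed are true here, so a pattern
and the text it is applied to need to use the same spelling of
accented characters.
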
17.7 Other
==========

Currently GRUB always uses the YEAR-MONTH-DAY HOUR:MINUTE:SECOND
[WEEKDAY] 24-hour datetime format, but weekdays are translated. GRUB
always uses the decimal number format with [0-9] as digits, . as the
decimal separator and no group separator. IEEE1275 aliases are matched
case-insensitively, except non-ASCII characters, which are matched as
binary. Matching of OSBundleRequired behaves similarly. Since IEEE1275
aliases and OSBundleRequired don't contain any non-ASCII characters,
this should never be a problem in practice. Case-sensitive identifiers
are matched as raw strings; no canonical equivalence check is
performed. Case-insensitive identifiers are matched as raw strings,
but additionally [a-z] is equivalent to [A-Z]. GRUB-defined
identifiers use only ASCII, and so should user-defined ones.
Identifiers containing non-ASCII characters may work but aren't
supported. Only the ASCII space characters (space U+0020, tab U+0009,
CR U+000d and LF U+000a) are recognised. Other Unicode space
characters aren't valid field separators. The 'test' (*note test::)
tests <, >, <=, >=, -pgt and -plt compare strings in the
lexicographical order of Unicode codepoints, replicating the behaviour
of test from coreutils. Environment variables and commands are listed
in the same order.

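Codepoint ordering and ASCII-only field separation can be sketched as
follows. The snippet is an illustration in Python, not GRUB code;
the sample strings and the helper split_fields are hypothetical.
Python compares strings code point by code point, which is the
ordering described above.

```python
import re

# Sorting str values in Python orders them by Unicode code point:
# uppercase ASCII first, then lowercase ASCII, then non-ASCII letters.
names = ["zeta", "Zeta", "\u00e9cole", "alpha"]
ordered = sorted(names)

# Only space, tab, CR and LF are treated as field separators.
ASCII_SPACE = re.compile(r"[ \t\r\n]+")

def split_fields(line: str) -> list[str]:
    """Split on the four ASCII space characters only."""
    return [field for field in ASCII_SPACE.split(line) if field]

# A tab separates two fields; a NO-BREAK SPACE (U+00A0) does not.
ascii_fields = split_fields("set\tlang=de\n")
nbsp_fields = split_fields("set\u00a0lang=de")
```

Here ordered is "Zeta", "alpha", "zeta", then the name starting with
e-acute; ascii_fields has two entries, and nbsp_fields remains a single
field because U+00A0 is not an ASCII space character.
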