17 Internationalisation
***********************

17.1 Charset
============

GRUB uses UTF-8 internally, except during rendering, where a
GRUB-specific representation is used. All text files (including the
configuration file) are assumed to be encoded in UTF-8.

17.2 Filesystems
================

NTFS, JFS, UDF, HFS+, exFAT, long filenames in FAT and the Joliet part
of ISO9660 are treated as UTF-16, as per specification. AFS and BFS are
read as UTF-8, again according to specification. BtrFS, cpio, tar,
squash4, minix, minix2, minix3, ROMFS, ReiserFS, XFS, ext2, ext3, ext4,
FAT (short names), F2FS, the RockRidge part of ISO9660, nilfs2, UFS1,
UFS2 and ZFS are assumed to be UTF-8. This assumption may be false on
systems configured with a legacy charset, but as long as the charset in
use is a superset of ASCII you should still be able to access
ASCII-named files. It is recommended to configure your system to use
UTF-8 for filesystem access; convmv may help with the migration.

Plain ISO9660 filenames are specified as being either ASCII or
described by unspecified escape sequences. GRUB assumes that ISO9660
names are UTF-8 (since any ASCII is valid UTF-8). Some old CD-ROMs use
CP437 in a non-compliant way. You can still access files whose names
contain only ASCII characters on such filesystems, and you can access
any file if the filesystem contains valid Joliet (UTF-16) or RockRidge
(UTF-8) data. AFFS, SFS and HFS never use Unicode; GRUB assumes them to
be in Latin1, Latin1 and MacRoman respectively.

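The effect of these charset assumptions can be illustrated outside
GRUB. The following Python sketch is not GRUB code; the helper name
decode_name and the sample names are made up for illustration. It
decodes raw name bytes under several of the charsets mentioned above
and shows that a pure-ASCII name reads the same under any ASCII
superset, while a non-ASCII name written in a legacy charset is not
valid UTF-8.

```python
# Illustrative sketch only; this is not how GRUB is implemented.

def decode_name(raw: bytes, charset: str) -> str:
    """Decode on-disk name bytes, substituting U+FFFD for invalid input."""
    return raw.decode(charset, errors="replace")

# A pure-ASCII name is byte-identical in every ASCII superset.
ascii_name = b"vmlinuz"
ascii_views = {
    cs: decode_name(ascii_name, cs)
    for cs in ("utf-8", "latin-1", "cp437", "mac_roman")
}

# A name starting with u-diaeresis, written by a Latin1 system (0xFC).
legacy_name = b"\xfcber"
as_latin1 = decode_name(legacy_name, "latin-1")  # the intended name
as_utf8 = decode_name(legacy_name, "utf-8")      # what a UTF-8 reader sees

# UTF-16 storage uses two bytes per character for the same name.
utf16_name = "\u00fcber".encode("utf-16-be")

if __name__ == "__main__":
    print(ascii_views)
    print(repr(as_latin1), repr(as_utf8))
    print(utf16_name)
```

The ASCII name decodes identically in all four charsets, which is why
ASCII-named files stay reachable even when the UTF-8 assumption is
wrong.
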
GRUB handles filesystem case-insensitivity, but makes no attempt at
case conversion of international characters, so e.g. a file named with
a lowercase Greek alpha is treated as different from one named with an
uppercase alpha. The filesystems in question are NTFS (except the POSIX
namespace), HFS+ (configurable at mkfs time, default insensitive), SFS
(configurable at mkfs time, default insensitive), JFS (configurable at
mkfs time, default sensitive), HFS, AFFS, FAT, exFAT and ZFS
(configurable on a per-subvolume basis by the property
"casesensitivity", default sensitive). On ZFS subvolumes marked as
case-insensitive, files whose names contain lowercase international
characters are inaccessible.

Likewise, GRUB, like all supported filesystems except HFS+ and ZFS
(configurable on a per-subvolume basis by the property "normalization",
default none), makes no attempt to check canonical equivalence, so a
file name containing u-diaeresis is treated as distinct from one
containing u followed by a combining diaeresis. This means, however,
that in order to access a file on HFS+ its name must be specified in
normalisation form D. On normalized ZFS subvolumes, filenames that are
not in the configured normalisation form are inaccessible.

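The two limitations above (case folding restricted to ASCII, and no
canonical-equivalence check) can be sketched in Python. This is an
illustration under stated assumptions, not GRUB's implementation: the
helper names ascii_fold and names_equal_insensitive are hypothetical,
and the standard unicodedata module stands in for whatever tool you use
to produce normalisation form D.

```python
import unicodedata

def ascii_fold(name: str) -> str:
    """Lowercase only A-Z; every other character is left untouched."""
    return "".join(
        chr(ord(c) + 32) if "A" <= c <= "Z" else c
        for c in name
    )

def names_equal_insensitive(a: str, b: str) -> bool:
    """Case-insensitive comparison that only knows about ASCII letters."""
    return ascii_fold(a) == ascii_fold(b)

# ASCII letters fold as expected.
ascii_match = names_equal_insensitive("GRUB.CFG", "grub.cfg")

# Greek alpha: U+0391 (uppercase) and U+03B1 (lowercase) stay distinct.
greek_match = names_equal_insensitive("\u0391", "\u03b1")

# u-diaeresis: precomposed vs. "u" plus combining diaeresis (U+0308).
precomposed = "\u00fc"
decomposed = "u\u0308"
raw_equal = precomposed == decomposed  # compared without normalisation

# Form D of the precomposed character is the decomposed sequence.
nfd_name = unicodedata.normalize("NFD", precomposed)

if __name__ == "__main__":
    print(ascii_match, greek_match, raw_equal, nfd_name == decomposed)
```

Converting a name to form D before passing it on, as in the last step,
is one way to prepare a path for a filesystem that stores names
decomposed.
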
17.3 Output terminal
====================

The firmware output console ("console") on ARC and IEEE1275 is limited
to ASCII.

The BIOS firmware console and VGA text are limited to ASCII and some
pseudographics.

None of the above is appropriate for displaying international text:
any unsupported character is replaced with a question mark, except for
pseudographics, which GRUB attempts to approximate with ASCII.

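The replacement rule for ASCII-only consoles can be sketched as
follows. This is an illustrative Python sketch rather than GRUB's code:
the function name to_ascii_console and the small approximation table
are assumptions made for the example, and the set of pseudographics
that GRUB actually approximates may differ.

```python
# Hypothetical approximation table for a few box-drawing characters.
PSEUDOGRAPHICS = {
    "\u2500": "-",  # BOX DRAWINGS LIGHT HORIZONTAL
    "\u2502": "|",  # BOX DRAWINGS LIGHT VERTICAL
    "\u250c": "+",  # BOX DRAWINGS LIGHT DOWN AND RIGHT
    "\u2510": "+",  # BOX DRAWINGS LIGHT DOWN AND LEFT
    "\u2514": "+",  # BOX DRAWINGS LIGHT UP AND RIGHT
    "\u2518": "+",  # BOX DRAWINGS LIGHT UP AND LEFT
}

def to_ascii_console(text: str) -> str:
    """Map text to what an ASCII-only console could show."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)                  # plain ASCII passes through
        elif ch in PSEUDOGRAPHICS:
            out.append(PSEUDOGRAPHICS[ch])  # approximate pseudographics
        else:
            out.append("?")                 # anything else is unsupported
    return "".join(out)

if __name__ == "__main__":
    print(to_ascii_console("\u250c\u2500\u2510 Gr\u00fc\u00dfe"))
```
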
The EFI console, on the other hand, nominally supports UTF-16, but
actual language coverage depends on the firmware and may be very
limited.

The encoding used on serial can be chosen with 'terminfo' as either
ASCII, UTF-8 or "visual UTF-8". The last one is against the
specification but results in correct rendering of right-to-left text on
some readers that do not have their own bidi implementation.

On emu, GRUB checks whether the charset is UTF-8 and uses it if so;
otherwise it uses ASCII.

When using gfxterm or gfxmenu, GRUB itself is responsible for
rendering the text. In this case GRUB is limited by the loaded fonts.
If the fonts contain all required characters, then bidirectional text,
cursive variants and combining marks other than enclosing, half (e.g.
left half tilde or combining overline) and double ones are supported.
Ligatures aren't supported, though. This should cover European, Middle
Eastern (if you don't mind the lack of the lam-alif ligature in Arabic)
and East Asian scripts. Notable unsupported scripts are the Brahmic
family and its derivatives, as well as Mongolian, Tifinagh, Korean Jamo
(precomposed characters are no problem) and tonal writing
(U+02E5-U+02E9). GRUB also ignores deprecated (as specified in Unicode)
characters (e.g. tags). GRUB also doesn't handle so-called "annotation
characters". If you can complete either of these two lists or, better,
propose a patch to improve rendering, please contact the developer
team.

17.4 Input terminal
===================

The firmware console on BIOS, IEEE1275 and ARC doesn't allow you to
enter non-ASCII characters. The EFI specification allows for this, but
the author is unaware of any actual implementations. Serial input is
currently limited to Latin1 (unlikely to change). GRUB's own keyboard
implementations (at_keyboard and usb_keyboard) support any key but work
on a one-character-per-keystroke basis, so there are no dead keys or
advanced input methods. There is also no hotkey for changing the
keymap. In practice this makes it difficult to enter any text in a
non-Latin alphabet. Moreover, all current input consumers are limited
to ASCII.

17.5 Gettext
============

GRUB supports being translated. For this you need to have the language
*.mo files in $prefix/locale, load the gettext module and set the
"lang" variable.

17.6 Regexp
===========

Regexps work on Unicode characters; however, no attempt at checking
canonical equivalence has been made. Moreover, classes like [:alpha:]
match only the ASCII subset.

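Both limitations can be illustrated with Python's re module. This is
only an analogy: Python's engine is not the one GRUB uses and does not
support POSIX classes such as [:alpha:], so the ASCII-only class is
spelled [A-Za-z] here, and the sample strings are made up.

```python
import re
import unicodedata

# Stand-in for an alphabetic class that covers only the ASCII subset.
ASCII_ALPHA = re.compile(r"[A-Za-z]+")

# The match stops at the first non-ASCII letter (u-diaeresis).
m = ASCII_ALPHA.match("gr\u00fcn")
ascii_prefix = m.group(0)

# No canonical equivalence: a precomposed pattern does not match the
# decomposed spelling of the same text unless the text is normalized.
pattern = re.compile("\u00fc")   # precomposed u-diaeresis
decomposed_text = "u\u0308ber"   # u + combining diaeresis
raw_hit = pattern.search(decomposed_text) is not None
nfc_text = unicodedata.normalize("NFC", decomposed_text)
nfc_hit = pattern.search(nfc_text) is not None

if __name__ == "__main__":
    print(ascii_prefix, raw_hit, nfc_hit)
```

In practice this means a pattern and the text it is matched against
should use the same composed or decomposed spelling.
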
17.7 Other
==========

Currently GRUB always uses the YEAR-MONTH-DAY HOUR:MINUTE:SECOND
[WEEKDAY] 24-hour datetime format, but weekdays are translated. GRUB
always uses the decimal number format with [0-9] as digits, . as the
decimal separator and no group separator.

IEEE1275 aliases are matched case-insensitively, except for non-ASCII
characters, which are matched as binary. Matching of OSBundleRequired
behaves similarly. Since IEEE1275 aliases and OSBundleRequired don't
contain any non-ASCII characters, this should never be a problem in
practice. Case-sensitive identifiers are matched as raw strings; no
canonical equivalence check is performed. Case-insensitive identifiers
are matched as raw strings, but additionally [a-z] is equivalent to
[A-Z]. GRUB-defined identifiers use only ASCII, and so should
user-defined ones. Identifiers containing non-ASCII characters may work
but aren't supported.

Only the ASCII space characters (space U+0020, tab U+0009, CR U+000d
and LF U+000a) are recognised. Other Unicode space characters aren't
valid field separators. The 'test' (*note test::) operators <, >, <=,
>=, -pgt and -plt compare strings in the lexicographical order of
Unicode codepoints, replicating the behaviour of test from coreutils.
Environment variables and commands are listed in the same order.
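
Two of the rules above, splitting fields on ASCII whitespace only and
ordering strings by Unicode codepoint, can be sketched in Python. This
is an illustration, not GRUB's parser: the helper name split_fields and
the sample strings are hypothetical. Python compares str values
codepoint by codepoint, which is the ordering described here.

```python
import re

# Only space, tab, CR and LF separate fields.
ASCII_SPACE = re.compile(r"[ \t\r\n]+")

def split_fields(line: str) -> list:
    """Split on runs of ASCII whitespace and drop empty fields."""
    return [f for f in ASCII_SPACE.split(line) if f]

# U+00A0 (no-break space) is not a separator, so it stays in the field.
fields = split_fields("set\tlang=de foo\u00a0bar\n")

# Codepoint order: uppercase ASCII, then lowercase ASCII, then non-ASCII.
names = ["zeta", "Alpha", "\u00e9t\u00e9", "alpha"]
ordered = sorted(names)

if __name__ == "__main__":
    print(fields)
    print(ordered)
```

Note that codepoint order is not a language-aware collation: an
accented name sorts after every unaccented ASCII name.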