
17 Internationalisation
***********************

17.1 Charset
============

GRUB uses UTF-8 internally, except in rendering, where an appropriate
GRUB-specific representation is used.  All text files (including the
configuration file) are assumed to be encoded in UTF-8.
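
   Because every text file is assumed to be UTF-8, it can be worth
checking a configuration file on the host before installing it.  The
following is a minimal Python sketch of such a check; it is not part of
GRUB, and the helper name `is_valid_utf8` is hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Pure ASCII is always valid UTF-8.
print(is_valid_utf8(b"menuentry 'Linux' {}"))                        # True
# Non-ASCII text is fine as long as it is encoded as UTF-8 ...
print(is_valid_utf8("menuentry 'Gr\u00fc\u00dfe' {}".encode("utf-8")))    # True
# ... but the same text in a legacy charset such as Latin1 is rejected.
print(is_valid_utf8("menuentry 'Gr\u00fc\u00dfe' {}".encode("latin-1")))  # False
```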

17.2 Filesystems
================

NTFS, JFS, UDF, HFS+, exFAT, long filenames in FAT and the Joliet part
of ISO9660 are treated as UTF-16, as per specification.  AFS and BFS are
read as UTF-8, again according to specification.  BtrFS, cpio, tar,
squash4, minix, minix2, minix3, ROMFS, ReiserFS, XFS, ext2, ext3, ext4,
FAT (short names), F2FS, the RockRidge part of ISO9660, nilfs2, UFS1,
UFS2 and ZFS are assumed to be UTF-8.  This might be false on systems
configured with a legacy charset, but as long as the charset used is a
superset of ASCII you should be able to access ASCII-named files.  It is
recommended to configure your system to use UTF-8 to access the
filesystem; convmv may help with migration.

   ISO9660 (plain) filenames
are specified as being ASCII or as being described with unspecified
escape sequences.  GRUB assumes that ISO9660 names are UTF-8 (since any
ASCII is valid UTF-8).  Some old CD-ROMs use CP437 in a non-compliant
way.  You are still able to access files whose names contain only ASCII
characters on such filesystems.  You are also able to access any file
if the filesystem contains valid Joliet (UTF-16) or RockRidge (UTF-8)
data.

   AFFS, SFS and HFS never use Unicode, and GRUB assumes them to be in
Latin1, Latin1 and MacRoman respectively.

   GRUB handles filesystem case-insensitivity; however, no attempt is
performed at case conversion of international characters, so e.g. a
file whose name is a lowercase Greek alpha is treated as different from
one whose name is an uppercase alpha.  The filesystems in question are
NTFS (except the POSIX namespace), HFS+ (configurable at mkfs time,
default insensitive), SFS (configurable at mkfs time, default
insensitive), JFS (configurable at mkfs time, default sensitive), HFS,
AFFS, FAT, exFAT and ZFS (configurable on a per-subvolume basis by the
property "casesensitivity", default sensitive).  On ZFS subvolumes
marked as case insensitive, files containing lowercase international
characters are inaccessible.

   Also, like all supported filesystems except HFS+ and ZFS
(configurable on a per-subvolume basis by the property "normalization",
default none), GRUB makes no attempt to check canonical equivalence, so
a file named u-diaeresis is treated as distinct from one named
u+combining diaeresis.  This however means that in order to access a
file on HFS+ its name must be specified in normalisation form D.  On
normalised ZFS subvolumes, filenames that are out of normalisation are
inaccessible.
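
   The canonical-equivalence point can be illustrated on the host with
Python's standard `unicodedata` module.  This is only a sketch: the
helper name `to_nfd` is hypothetical, and standard Unicode NFD is used
as an approximation, since HFS+ defines its own decomposition rules,
which may differ from standard NFD for some characters.

```python
import unicodedata

precomposed = "\u00fc"   # LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "u\u0308"   # "u" followed by COMBINING DIAERESIS

# Compared as raw code point sequences, the two spellings are different
# names, which is how GRUB treats them.
print(precomposed == decomposed)            # False


def to_nfd(name: str) -> str:
    """Convert a name to Unicode normalisation form D (decomposed)."""
    return unicodedata.normalize("NFD", name)


# After normalising to form D, the precomposed spelling becomes the
# decomposed one, the form needed to look the file up on HFS+.
print(to_nfd(precomposed) == decomposed)    # True
```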

17.3 Output terminal
====================

The firmware output console ("console") on ARC and IEEE1275 is limited
to ASCII.

   BIOS firmware console and VGA text are limited to ASCII and some
pseudographics.

   None of the above is appropriate for displaying international text:
any unsupported character is replaced with a question mark, except
pseudographics, which GRUB attempts to approximate with ASCII.
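
   The fallback described above can be sketched in Python as follows.
This is an illustration only, not GRUB's implementation: the function
name `to_ascii_terminal` and the small approximation table are
hypothetical, and GRUB's actual table of pseudographics is not given
here.

```python
# Hypothetical approximation table for a few box-drawing characters.
PSEUDOGRAPHICS = {
    "\u2500": "-",  # BOX DRAWINGS LIGHT HORIZONTAL
    "\u2502": "|",  # BOX DRAWINGS LIGHT VERTICAL
    "\u250c": "+",  # BOX DRAWINGS LIGHT DOWN AND RIGHT
    "\u2510": "+",  # BOX DRAWINGS LIGHT DOWN AND LEFT
    "\u2514": "+",  # BOX DRAWINGS LIGHT UP AND RIGHT
    "\u2518": "+",  # BOX DRAWINGS LIGHT UP AND LEFT
}


def to_ascii_terminal(text: str) -> str:
    """Render text for an ASCII-only terminal.

    ASCII passes through, known pseudographics are approximated,
    and every other character becomes a question mark.
    """
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.append(PSEUDOGRAPHICS.get(ch, "?"))
    return "".join(out)


print(to_ascii_terminal("\u250c\u2500\u2500\u2510"))  # +--+
print(to_ascii_terminal("Gr\u00fc\u00dfe"))           # Gr??e
```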

   EFI console on the other hand nominally supports UTF-16 but actual
language coverage depends on firmware and may be very limited.

   The encoding used on a serial line can be chosen with 'terminfo' as
either ASCII, UTF-8 or "visual UTF-8".  The last one is against the
specification, but results in correct rendering of right-to-left text
on some readers which don't have their own bidi implementation.

   On emu, GRUB checks whether the charset is UTF-8; if so it uses
UTF-8, and otherwise it uses ASCII.
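
   The same decision can be sketched on a host system in Python.  This
is an analogy rather than GRUB's code: the function name
`pick_terminal_encoding` is hypothetical, and how emu obtains the
charset is not described here, so the host locale is used as an
assumption.

```python
import codecs
import locale


def pick_terminal_encoding(charset: str) -> str:
    """Return "utf-8" if `charset` names UTF-8, else fall back to "ascii"."""
    try:
        # codecs.lookup() normalises aliases such as "UTF8" or "utf_8".
        name = codecs.lookup(charset).name
    except LookupError:
        # Unknown charset names are treated like any non-UTF-8 charset.
        return "ascii"
    return "utf-8" if name == "utf-8" else "ascii"


# Assumption: the host locale's preferred encoding stands in for "the charset".
print(pick_terminal_encoding(locale.getpreferredencoding(False)))
```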

   When using gfxterm or gfxmenu, GRUB itself is responsible for
rendering the text.  In this case GRUB is limited by the loaded fonts.
If the fonts contain all required characters, then bidirectional text,
cursive variants and combining marks are supported, other than
enclosing, half (e.g. left half tilde or combining overline) and double
combining marks.  Ligatures aren't supported, though.  This should cover
European, Middle Eastern (if you don't mind the lack of the lam-alif
ligature in Arabic) and East Asian scripts.  Notable unsupported scripts
are the Brahmic family and its derivatives, as well as Mongolian,
Tifinagh, Korean Jamo (precomposed characters have no problem) and tonal
writing (U+02E5-U+02E9).  GRUB also ignores deprecated (as specified in
Unicode) characters (e.g. tags).  GRUB also doesn't handle so-called
"annotation characters".  If you can complete either of these two lists
or, better, propose a patch to improve rendering, please contact the
developer team.

17.4 Input terminal
===================

The firmware console on BIOS, IEEE1275 and ARC doesn't allow you to
enter non-ASCII characters.  The EFI specification allows for such
input, but the author is unaware of any actual implementations.  Serial
input is currently limited to Latin1 (unlikely to change).  GRUB's own
keyboard implementations (at_keyboard and usb_keyboard) support any key
but work on a one-char-per-keystroke basis, so there are no dead keys
and no advanced input methods.  There is also no hotkey for changing the
keymap.  In practice this makes it difficult to enter any text using a
non-Latin alphabet.  Moreover, all current input consumers are limited
to ASCII.

17.5 Gettext
============

GRUB supports being translated.  For this you need to have the language
*.mo files in $prefix/locale, load the gettext module and set the "lang"
variable.

17.6 Regexp
===========

Regexps work on Unicode characters; however, no attempt at checking
canonical equivalence has been made.  Moreover, classes like [:alpha:]
match only the ASCII subset.
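
   Both limitations can be illustrated with Python's `re` module.  This
is a sketch of the behaviour, not of GRUB's regexp engine: Python's
`re` has no POSIX bracket classes, so the explicit class `[A-Za-z]`
stands in for an ASCII-only [:alpha:], and the helper name
`ascii_alpha_only` is hypothetical.

```python
import re

# Explicit ASCII letters, standing in for an ASCII-only [:alpha:].
ASCII_ALPHA = re.compile(r"[A-Za-z]+")


def ascii_alpha_only(s: str) -> bool:
    """Return True if `s` consists solely of ASCII letters."""
    return ASCII_ALPHA.fullmatch(s) is not None


print(ascii_alpha_only("linux"))              # True
print(ascii_alpha_only("gr\u00fc\u00dfe"))    # False: u-diaeresis and sharp s are not ASCII

# No canonical equivalence: a precomposed u-diaeresis in the pattern
# does not match "u" followed by a combining diaeresis in the text.
print(re.fullmatch("\u00fc", "u\u0308") is None)  # True
```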

17.7 Other
==========

Currently GRUB always uses the YEAR-MONTH-DAY HOUR:MINUTE:SECOND
[WEEKDAY] 24-hour datetime format, but weekdays are translated.  GRUB
always uses the decimal number format with [0-9] as digits, "." as the
decimal separator and no group separator.

   IEEE1275 aliases are matched case-insensitively, except for non-ASCII
characters, which are matched as binary.  Matching of OSBundleRequired
behaves similarly.  Since IEEE1275 aliases and OSBundleRequired don't
contain any non-ASCII characters, this should never be a problem in
practice.

   Case-sensitive identifiers are matched as raw strings; no canonical
equivalence check is performed.  Case-insensitive identifiers are
matched as raw strings, but additionally [a-z] is equivalent to [A-Z].
GRUB-defined identifiers use only ASCII, and so should user-defined
ones.  Identifiers containing non-ASCII characters may work but aren't
supported.

   Only the ASCII space characters (space U+0020, tab U+0009, CR U+000d
and LF U+000a) are recognised.  Other Unicode space characters aren't
valid field separators.

   The 'test' (*note test::) tests <, >, <=, >=, -pgt and -plt compare
strings in the lexicographical order of Unicode codepoints, replicating
the behaviour of test from coreutils.  Environment variables and
commands are listed in the same order.
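
   Lexicographical ordering by Unicode codepoint can be sketched in
Python, whose built-in string comparison already uses this order.  The
helper name `codepoint_less` and the sample names are hypothetical and
only illustrate the ordering; this is not GRUB's implementation.

```python
def codepoint_less(a: str, b: str) -> bool:
    """Compare two strings lexicographically by Unicode codepoint."""
    return [ord(c) for c in a] < [ord(c) for c in b]


# Uppercase ASCII sorts before lowercase ASCII, and non-ASCII letters
# sort after both, regardless of any locale's alphabetical order.
names = ["zebra", "Zebra", "\u00e9clair", "apple"]
ordered = sorted(names)
print(ordered)  # ['Zebra', 'apple', 'zebra', 'éclair']

print(codepoint_less("Z", "a"))        # True:  U+005A < U+0061
print(codepoint_less("\u00e9", "z"))   # False: U+00E9 > U+007A
```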