17 Internationalisation
***********************

17.1 Charset
============

GRUB uses UTF-8 internally, except in rendering, where a GRUB-specific
representation is used.  All text files (including the configuration
file) are assumed to be encoded in UTF-8.
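
   Because text files are assumed to be UTF-8, it can be useful to
check a configuration file on the host before installing it.  The
following is a minimal sketch in Python, not part of GRUB; the function
name 'is_valid_utf8' is hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Pure ASCII is always valid UTF-8.
assert is_valid_utf8(b"menuentry 'Linux' {}")
# The same text saved in a legacy charset (here Latin1) is rejected.
assert not is_valid_utf8("t\u00edtulo".encode("latin-1"))
```

   A file that fails this check was probably saved in a legacy charset
and should be converted to UTF-8 first.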

17.2 Filesystems
================

NTFS, JFS, UDF, HFS+, exFAT, long filenames in FAT and the Joliet part
of ISO9660 are treated as UTF-16, as per specification.  AFS and BFS are
read as UTF-8, again according to specification.  BtrFS, cpio, tar,
squash4, minix, minix2, minix3, ROMFS, ReiserFS, XFS, ext2, ext3, ext4,
FAT (short names), F2FS, the RockRidge part of ISO9660, nilfs2, UFS1,
UFS2 and ZFS are assumed to be UTF-8.  This assumption may be false on
systems configured with a legacy charset, but as long as the charset
used is a superset of ASCII you should be able to access ASCII-named
files.  It is recommended to configure your system to use UTF-8 to
access the filesystem; convmv may help with migration.

   ISO9660 (plain) filenames are specified as being ASCII or being
described with unspecified escape sequences.  GRUB assumes that ISO9660
names are UTF-8 (since any ASCII is valid UTF-8).  Some old CD-ROMs use
CP437 in a non-compliant way.  You are still able to access files whose
names contain only ASCII characters on such filesystems, and you can
access any file if the filesystem contains valid Joliet (UTF-16) or
RockRidge (UTF-8) names.  AFFS, SFS and HFS never use Unicode; GRUB
assumes them to be in Latin1, Latin1 and MacRoman respectively.

   GRUB handles filesystem case-insensitivity, but makes no attempt at
case conversion of international characters, so e.g. a file named
lowercase Greek alpha is treated as different from one named uppercase
alpha.  The filesystems in question are NTFS (except the POSIX
namespace), HFS+ (configurable at mkfs time, default insensitive), SFS
(configurable at mkfs time, default insensitive), JFS (configurable at
mkfs time, default sensitive), HFS, AFFS, FAT, exFAT and ZFS
(configurable on a per-subvolume basis by the property
"casesensitivity", default sensitive).  On ZFS subvolumes marked as
case-insensitive, files containing lowercase international characters
are inaccessible.

   Also, like all supported filesystems except HFS+ and ZFS
(configurable on a per-subvolume basis by the property "normalization",
default none), GRUB makes no attempt at checking canonical equivalence,
so a file named u-diaeresis is treated as distinct from one named
u + combining diaeresis.  This however means that in order to access a
file on HFS+ its name must be specified in normalisation form D.  On
normalized ZFS subvolumes, filenames out of normalisation are
inaccessible.
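
   The two limitations above (no canonical equivalence check, and case
folding restricted to ASCII) can be illustrated with a short Python
sketch.  This is not GRUB code: the helper names 'to_form_d' and
'ascii_casefold' are hypothetical, and Python's NFD is used here only as
an approximation, since a filesystem may apply its own decomposition
rules that differ for some characters.

```python
import unicodedata


def to_form_d(name: str) -> str:
    """Return the name in Unicode normalisation form D (decomposed)."""
    return unicodedata.normalize("NFD", name)


def ascii_casefold(name: str) -> str:
    """Lowercase only A-Z; every other character is left untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)


precomposed = "\u00fc"   # u-diaeresis as a single codepoint
decomposed = "u\u0308"   # u followed by combining diaeresis

# Compared as raw strings, the two spellings are different names.
assert precomposed != decomposed
# Form D turns the precomposed spelling into the decomposed one.
assert to_form_d(precomposed) == decomposed

# ASCII letters fold together, Greek alpha does not.
assert ascii_casefold("GRUB.CFG") == ascii_casefold("grub.cfg")
assert ascii_casefold("\u0391") != ascii_casefold("\u03b1")
```

   In practice this means a name typed in precomposed form may need to
be converted to form D before it matches a file on HFS+.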

17.3 Output terminal
====================

The firmware output console "console" on ARC and IEEE1275 is limited to
ASCII.

   The BIOS firmware console and VGA text are limited to ASCII and some
pseudographics.

   None of the above is appropriate for displaying international text:
any unsupported character is replaced with a question mark, except
pseudographics, which GRUB attempts to approximate with ASCII.
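
   The fallback described above can be sketched in Python as follows.
This is an illustration only, not GRUB's implementation: the function
name 'to_ascii_console' and the small 'PSEUDOGRAPHICS' table are
assumptions made for the example.

```python
# Hypothetical approximation table for a few box-drawing characters.
PSEUDOGRAPHICS = {
    "\u2500": "-",  # light horizontal line
    "\u2502": "|",  # light vertical line
    "\u250c": "+",  # top-left corner
    "\u2510": "+",  # top-right corner
    "\u2514": "+",  # bottom-left corner
    "\u2518": "+",  # bottom-right corner
}


def to_ascii_console(text: str) -> str:
    """Approximate pseudographics with ASCII, replace the rest with '?'."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # plain ASCII passes through unchanged
        else:
            out.append(PSEUDOGRAPHICS.get(ch, "?"))
    return "".join(out)


print(to_ascii_console("\u250c\u2500\u2510 caf\u00e9"))  # prints "+-+ caf?"
```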

   The EFI console, on the other hand, nominally supports UTF-16, but
actual language coverage depends on the firmware and may be very
limited.

   The encoding used on serial can be chosen with 'terminfo' as either
ASCII, UTF-8 or "visual UTF-8".  The last one is against the
specification but results in correct rendering of right-to-left text on
some readers which don't have their own bidi implementation.

   On emu, GRUB checks whether the charset is UTF-8 and uses it if so;
otherwise it uses ASCII.

   When using gfxterm or gfxmenu, GRUB itself is responsible for
rendering the text.  In this case GRUB is limited by the loaded fonts.
If the fonts contain all required characters, then bidirectional text,
cursive variants and combining marks other than enclosing, half (e.g.
left half tilde or combining overline) and double ones are supported.
Ligatures aren't supported though.  This should cover European, Middle
Eastern (if you don't mind the lack of the lam-alif ligature in Arabic)
and East Asian scripts.  Notable unsupported scripts are the Brahmic
family and derived scripts, as well as Mongolian, Tifinagh, Korean Jamo
(precomposed characters have no problem) and tonal writing
(U+02E5-U+02E9).  GRUB also ignores deprecated (as specified in
Unicode) characters (e.g. tags).  GRUB also doesn't handle so-called
"annotation characters".  If you can complete either of the two lists
or, better, propose a patch to improve rendering, please contact the
developer team.

17.4 Input terminal
===================

The firmware console on BIOS, IEEE1275 and ARC doesn't allow you to
enter non-ASCII characters.  The EFI specification allows for such
input, but the author is unaware of any actual implementations.  Serial
input is currently limited to Latin1 (unlikely to change).  GRUB's own
keyboard implementations (at_keyboard and usb_keyboard) support any key
but work on a one-char-per-keystroke basis, so there are no dead keys
or advanced input methods.  There is also no keymap change hotkey.  In
practice this makes it difficult to enter any text using a non-Latin
alphabet.  Moreover, all current input consumers are limited to ASCII.

17.5 Gettext
============

GRUB supports being translated.  For this you need to have language
*.mo files in $prefix/locale, load the gettext module and set the
"lang" variable.

17.6 Regexp
===========

Regexps work on Unicode characters; however, no attempt at checking
canonical equivalence has been made.  Moreover, classes like [:alpha:]
match only the ASCII subset.
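
   The practical effect of these two restrictions can be emulated in
Python.  This is only a sketch under stated assumptions: Python's 're'
module has no POSIX classes, so the ASCII-only [:alpha:] is imitated
with an explicit [A-Za-z] range, and the helper name 'is_ascii_alpha' is
hypothetical.

```python
import re

# Stand-in for an [:alpha:] class that covers only the ASCII subset.
ASCII_ALPHA = re.compile(r"[A-Za-z]+")


def is_ascii_alpha(s: str) -> bool:
    """Return True if s consists solely of ASCII letters."""
    return ASCII_ALPHA.fullmatch(s) is not None


assert is_ascii_alpha("linux")
# A non-ASCII letter falls outside the class.
assert not is_ascii_alpha("caf\u00e9")

# No canonical equivalence: a precomposed pattern does not match the
# decomposed spelling of the same character.
assert re.fullmatch("\u00fc", "u\u0308") is None
```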

17.7 Other
==========

Currently GRUB always uses the YEAR-MONTH-DAY HOUR:MINUTE:SECOND
[WEEKDAY] 24-hour datetime format, but weekdays are translated.  GRUB
always uses the decimal number format with [0-9] as digits, '.' as the
decimal separator and no group separator.

   IEEE1275 aliases are matched case-insensitively, except non-ASCII
characters, which are matched as binary.  OSBundleRequired is matched
in a similar way.  Since IEEE1275 aliases and OSBundleRequired don't
contain any non-ASCII characters, this should never be a problem in
practice.

   Case-sensitive identifiers are matched as raw strings; no canonical
equivalence check is performed.  Case-insensitive identifiers are also
matched as raw strings, but additionally [a-z] is equivalent to [A-Z].
GRUB-defined identifiers use only ASCII, and so should user-defined
ones.  Identifiers containing non-ASCII characters may work but aren't
supported.

   Only the ASCII space characters (space U+0020, tab U+0009, CR U+000d
and LF U+000a) are recognised.  Other Unicode space characters aren't
valid field separators.

   The 'test' (*note test::) tests <, >, <=, >=, -pgt and -plt compare
strings in the lexicographical order of Unicode codepoints, replicating
the behaviour of test from coreutils.  Environment variables and
commands are listed in the same order.
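
   The field separator and ordering rules above can be illustrated in
Python.  This is a sketch rather than GRUB's own code: the helper names
'split_fields' and 'codepoint_sorted' are hypothetical.  It relies on
the fact that Python compares 'str' values codepoint by codepoint.

```python
import re

# Only space, tab, CR and LF separate fields.
ASCII_SPACE = re.compile(r"[ \t\r\n]+")


def split_fields(line):
    """Split a line on ASCII space characters only."""
    return [field for field in ASCII_SPACE.split(line) if field]


def codepoint_sorted(names):
    """Sort strings in lexicographical order of Unicode codepoints."""
    return sorted(names)


# A no-break space (U+00A0) is not treated as a separator.
assert split_fields("set\tlang=de\n") == ["set", "lang=de"]
assert split_fields("a\u00a0b") == ["a\u00a0b"]

# Uppercase ASCII sorts before lowercase, non-ASCII letters come last.
assert codepoint_sorted(["b", "\u00e9", "a", "Z"]) == ["Z", "a", "b", "\u00e9"]
```

   Note that this ordering is not locale-aware, so it can differ from
what a dictionary in a given language would use.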