runtime/doc/mbyte.txt @ 7:3fc0f57ecb91 v7.0001
updated for version 7.0001
author: vimboss
date: Sun, 13 Jun 2004 20:20:40 +0000
children: 4102fb4ea781

*mbyte.txt*     For Vim version 7.0aa.  Last change: 2004 Jun 07


                  VIM REFERENCE MANUAL    by Bram Moolenaar et al.


Multi-byte support                              *multibyte* *multi-byte*
                                         *Chinese* *Japanese* *Korean*
This is about editing text in languages which have many characters that can
not be represented using one byte (one octet).  Examples are Chinese, Japanese
and Korean.  Unicode is also covered here.

For an introduction to the most common features, see |usr_45.txt| in the user
manual.
For changing the language of messages and menus see |mlang.txt|.

{not available when compiled without the +multi_byte feature}


1.  Getting started             |mbyte-first|
2.  Locale                      |mbyte-locale|
3.  Encoding                    |mbyte-encoding|
4.  Using a terminal            |mbyte-terminal|
5.  Fonts on X11                |mbyte-fonts-X11|
6.  Fonts on MS-Windows         |mbyte-fonts-MSwin|
7.  Input on X11                |mbyte-XIM|
8.  Input on MS-Windows         |mbyte-IME|
9.  Input with a keymap         |mbyte-keymap|
10. Using UTF-8                 |mbyte-utf8|
11. Overview of options         |mbyte-options|

NOTE: This file contains UTF-8 characters.  These may show up as strange
characters or boxes when using another encoding.

==============================================================================
1. Getting started                                      *mbyte-first*

This is a summary of the multi-byte features in Vim.  If you are lucky it
works as described and you can start using Vim without much trouble.  If
something doesn't work you will have to read the rest.  Don't be surprised if
it takes quite a bit of work and experimenting to make Vim use all the
multi-byte features.  Unfortunately, every system has its own way to deal
with multi-byte languages and it is quite complicated.


COMPILING

If you already have a compiled Vim program, check if the |+multi_byte| feature
is included.  The |:version| command can be used for this.

If +multi_byte is not included, you should compile Vim with "big" features.
You can further tune what features are included.  See the INSTALL files in the
source directory.


LOCALE

First of all, you must make sure your current locale is set correctly.  If
your system has been installed to use the language, it probably works right
away.  If not, you can often make it work by setting the $LANG environment
variable in your shell (csh syntax; for sh see |mbyte-locale|): >

        setenv LANG ja_JP.EUC

Unfortunately, the name of the locale depends on your system.  Japanese might
also be called "ja_JP.EUCjp" or just "ja".  To see what is currently used: >

        :language

To change the locale inside Vim use: >

        :language ja_JP.EUC

Vim will give an error message if this doesn't work.  This is a good way to
experiment and find the locale name you want to use.  But it's always better
to set the locale in the shell, so that it is used right from the start.

See |mbyte-locale| for details.


ENCODING

If your locale works properly, Vim will try to set the 'encoding' option
accordingly.  If this doesn't work you can override its value: >

        :set encoding=utf-8

See |encoding-values| for a list of acceptable values.

The result is that all the text that is used inside Vim will be in this
encoding: not only the text in the buffers, but also in registers, variables,
etc.  This also means that changing the value of 'encoding' makes the existing
text invalid!  The text doesn't change, but it will be displayed wrong.
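
As an illustration only (Python's codecs stand in for Vim's 'encoding'
values), the sketch below keeps the bytes of a word typed under "utf-8" and
reads them back as "latin1".  Nothing in the data changes; only its
interpretation does, which is why the text shows up wrong.

```python
# Illustration only: Python codecs stand in for Vim's 'encoding' values.
text = "café"                       # typed while 'encoding' was "utf-8"
stored = text.encode("utf-8")       # the bytes kept in the buffer

# After ":set encoding=latin1" the same bytes are read differently:
shown = stored.decode("latin-1")    # now every byte is one character

print(stored)   # b'caf\xc3\xa9'
print(shown)    # cafÃ©
```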

You can edit files in an encoding other than the one 'encoding' is set to.
Vim will convert the file when you read it and convert it back when you write
it.  See 'fileencoding', 'fileencodings' and |++enc|.
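
How a 'fileencoding' gets picked from a 'fileencodings' list can be
approximated as "try each entry in order and keep the first one that decodes
without an error".  The Python sketch below models only that rule; it is a
simplification under stated assumptions (Python codec names instead of Vim
names, no special entries, no fallback), and the function name
detect_fileencoding is hypothetical.  Because "latin1" accepts any byte
sequence, it only makes sense as the last entry.

```python
def detect_fileencoding(data, fileencodings):
    """Return (encoding, text) for the first entry that decodes cleanly.

    Hypothetical, simplified model of 'fileencodings': entries are tried in
    order; special entries and Vim's own fallback are not modelled.
    """
    for enc in fileencodings:
        try:
            return enc, data.decode(enc)
        except UnicodeDecodeError:
            continue            # decoding error: try the next entry
    # This sketch simply gives up when nothing in the list works.
    raise ValueError("no entry in the list could decode the data")


order = ["utf-8", "euc-jp", "latin1"]   # Python codec names, in Vim order
print(detect_fileencoding("日本".encode("euc-jp"), order))
print(detect_fileencoding(b"caf\xe9", order))
```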


DISPLAY AND FONTS

If you are working in a terminal (emulator) you must make sure it uses the
same encoding that Vim is working with.  If this is not the case, you can use
the 'termencoding' option to make Vim convert text automatically.

For the GUI you must select fonts that work with the current 'encoding'.  This
is the difficult part.  It depends on the system you are using, the locale and
a few other things.  See the chapters on fonts: |mbyte-fonts-X11| for
X-Windows and |mbyte-fonts-MSwin| for MS-Windows.

For GTK+ 2, you can skip most of this section.  The option 'guifontset' no
longer exists.  You only need to set 'guifont' and everything should "just
work".  If your system comes with Xft2 and fontconfig and the current font
does not contain a certain glyph, a different font will be used automatically
if available.  The 'guifontwide' option is still supported but usually you do
not need to set it.  It is only necessary if the automatic font selection does
not suit your needs.

For X11 you can set the 'guifontset' option to a list of fonts that together
cover the characters that are used.  Example for Korean: >

        :set guifontset=k12,r12

Alternatively, you can set 'guifont' and 'guifontwide'.  'guifont' is used for
the single-width characters, 'guifontwide' for the double-width characters.
Thus the 'guifontwide' font must be exactly twice as wide as 'guifont'.
Example for UTF-8: >

        :set guifont=-misc-fixed-medium-r-normal-*-18-120-100-100-c-90-iso10646-1
        :set guifontwide=-misc-fixed-medium-r-normal-*-18-120-100-100-c-180-iso10646-1

You can also set 'guifont' alone; Vim will try to find a matching
'guifontwide' for you.


INPUT

There are several ways to enter multi-byte characters:
- For X11 XIM can be used.  See |XIM|.
- For MS-Windows IME can be used.  See |IME|.
- For all systems keymaps can be used.  See |mbyte-keymap|.

The options 'iminsert', 'imsearch' and 'imcmdline' can be used to choose
between the different input methods or to disable them temporarily.

==============================================================================
2. Locale                                               *mbyte-locale*

The easiest setup is when your whole system uses the locale you want to work
in.  But it's also possible to set the locale for one shell you are working
in, or just use a certain locale inside Vim.


WHAT IS A LOCALE?                                       *locale*

There are many languages in the world, and at least as many cultures and
environments.  A linguistic environment corresponding to an area is called a
"locale".  This includes information about the language used, the charset,
the collating order for sorting, the date format, the currency format and so
on.  For Vim only the language and charset really matter.

You can only use a locale if your system has support for it.  Some systems
have only a few locales, especially in the USA.  The language which you want
to use may not be on your system.  In that case you might be able to install
it as an extra package.  Check your system documentation for how to do that.

The location in which the locales are installed varies from system to system.
For example, "/usr/share/locale" or "/usr/lib/locale".  See your system's
setlocale() man page.

Looking in these directories will show you the exact name of each locale.
Usually case matters, thus "ja_JP.EUC" and "ja_jp.euc" are different.  Some
systems have a locale.alias file, which allows translation from a short name
like "nl" to the full name "nl_NL.ISO_8859-1".

Note that X-Windows has its own locale handling, and unfortunately it uses
locale names different from those used elsewhere.  This is confusing!  For Vim
it matters what the setlocale() function uses, which is generally NOT the
X-Windows setting.  You might have to do some experiments to find out what
really works.

                                                        *locale-name*
The (simplified) format of a |locale| name is:

        language
or      language_territory
or      language_territory.codeset

Territory means the country (or part of it), codeset means the |charset|.  For
example, the locale name "ja_JP.eucJP" means:
        ja      the language is Japanese
        JP      the country is Japan
        eucJP   the codeset is EUC-JP
But it also could be "ja", "ja_JP.EUC", "ja_JP.ujis", etc.  And unfortunately,
the locale name for a specific language, territory and codeset is not unified
and depends on your system.

Examples of locale names:
    charset      language                 locale name ~
    GB2312       Chinese (simplified)     zh_CN.EUC, zh_CN.GB2312
    Big5         Chinese (traditional)    zh_TW.BIG5, zh_TW.Big5
    CNS-11643    Chinese (traditional)    zh_TW
    EUC-JP       Japanese                 ja, ja_JP.EUC, ja_JP.ujis, ja_JP.eucJP
    Shift_JIS    Japanese                 ja_JP.SJIS, ja_JP.Shift_JIS
    EUC-KR       Korean                   ko, ko_KR.EUC
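
The simplified locale-name format above can be illustrated with a small
Python parser.  This is a sketch only: the name parse_locale_name is
hypothetical, it handles just the three forms listed (no other suffixes), and
it does not normalize case, since case usually matters in locale names.

```python
from typing import NamedTuple, Optional


class LocaleName(NamedTuple):
    language: str
    territory: Optional[str]
    codeset: Optional[str]


def parse_locale_name(name: str) -> LocaleName:
    """Split "language[_territory[.codeset]]" into its parts."""
    base, _, codeset = name.partition(".")       # codeset follows the dot
    language, _, territory = base.partition("_")  # territory follows "_"
    return LocaleName(language, territory or None, codeset or None)


for name in ("ja_JP.eucJP", "zh_TW", "ko"):
    print(parse_locale_name(name))
```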


USING A LOCALE

To start using a locale for the whole system, see the documentation of your
system.  Mostly you need to set it in a configuration file in "/etc".

To use a locale in a shell, set the $LANG environment variable.  When you want
to use Korean and the |locale| name is "ko", do this:

        sh:    export LANG=ko
        csh:   setenv LANG ko

You can put this in your ~/.profile or ~/.cshrc file to always use it.

To use a locale in Vim only, use the |:language| command: >

        :language ko

Put this in your ~/.vimrc file to use it always.

Or specify $LANG when starting Vim:

        sh:    LANG=ko vim {vim-arguments}
        csh:   env LANG=ko vim {vim-arguments}

You could make a small shell script for this.
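
Such a wrapper could also be written in Python (used here only to keep all
added examples in one language).  The sketch below is hypothetical: the
helper names are made up, it assumes a "vim" executable on $PATH, and it sets
$LANG only for the Vim process it starts, leaving the calling shell untouched.

```python
import os


def env_with_locale(locale_name, base=None):
    """Return a copy of the environment with $LANG set to locale_name."""
    env = dict(os.environ if base is None else base)
    env["LANG"] = locale_name
    return env


def exec_vim_with_locale(locale_name, vim_args):
    """Replace the current process with Vim running under locale_name.

    Same effect as "LANG=ko vim {vim-arguments}" in sh; assumes "vim" is
    on $PATH.
    """
    os.execvpe("vim", ["vim", *vim_args], env_with_locale(locale_name))
```

A script would call exec_vim_with_locale("ko", sys.argv[1:]) after importing
sys.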

==============================================================================
3. Encoding                                             *mbyte-encoding*

Vim uses the 'encoding' option to specify how characters are identified and
encoded when they are used inside Vim.  This applies to all the places where
text is used, including buffers (files loaded into memory), registers and
variables.

                                                        *charset* *codeset*
Charset is another name for encoding.  There are subtle differences, but these
don't matter when using Vim.  "codeset" is another similar name.

Each character is encoded as one or more bytes.  When all characters are
encoded with one byte, we call this a single-byte encoding.  The most often
used one is called "latin1".  This limits the number of characters to 256.
Some of these are control characters, thus even fewer can be used for text.

When some characters use two or more bytes, we call this a multi-byte
encoding.  This allows using many more than 256 characters, which is required
for most East Asian languages.
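
To make the difference concrete, this Python sketch encodes characters with a
single-byte and with multi-byte encodings and counts the bytes.  Python's
codec names ("latin-1", "euc-kr", "utf-8") are used; they are close to, but
not always identical with, Vim's 'encoding' names.

```python
samples = [
    ("é", "latin-1"),   # single-byte encoding: one byte
    ("한", "euc-kr"),   # double-byte encoding (Korean)
    ("한", "utf-8"),    # Unicode stored as UTF-8 (U+D55C)
    ("A", "utf-8"),     # ASCII stays one byte in UTF-8
]

sizes = {}
for char, codec in samples:
    data = char.encode(codec)
    sizes[(char, codec)] = len(data)
    print(f"{char} in {codec}: {len(data)} byte(s): {data.hex(' ')}")
```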

Most multi-byte encodings use one byte for the first 128 characters (codes 0
to 127).  These are equal to ASCII, which makes it easy to exchange
plain-ASCII text, no matter what language is used.  Thus you might see the
right text even when the encoding was set wrong.
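
A quick way to see this ASCII compatibility is to encode the same plain-ASCII
string with several of the encodings from this section and compare the bytes.
The sketch uses Python's codec names, which are assumed here to correspond to
the Vim names in the tables below.

```python
# Python codec names for some encodings from the tables in this section.
names = ["latin-1", "euc-jp", "shift_jis", "euc-kr", "gb2312", "big5", "utf-8"]

ascii_text = "Hello, Vim! 0123456789"
reference = ascii_text.encode("ascii")

# Plain ASCII text is encoded to identical bytes by every codec listed.
identical = {name: ascii_text.encode(name) == reference for name in names}
print(identical)
```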

                                                        *encoding-names*
Vim can use many different character encodings.  There are three major groups:

1   8bit        Single-byte encodings, 256 different characters.  Mostly used
                in USA and Europe.  Example: ISO-8859-1 (Latin1).  All
                characters occupy one screen cell only.

2   2byte       Double-byte encodings, over 10000 different characters.
                Mostly used in Asian countries.  Example: euc-kr (Korean).
                The number of screen cells is equal to the number of bytes
                (except for euc-jp when the first byte is 0x8e).

u   Unicode     Universal encoding, can replace all others.  ISO 10646.
                Millions of different characters.  Example: UTF-8.  The
                relation between bytes and screen cells is complex.
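
The 2byte rule above ("screen cells equal bytes, except for euc-jp when the
first byte is 0x8e") can be sketched in Python.  This illustrates the rule as
stated and is not Vim's implementation; the name euc_jp_cells is
hypothetical, and EUC-JP's three-byte sequences starting with 0x8f, which the
text does not cover, are rejected rather than guessed at.

```python
def euc_jp_cells(data: bytes) -> int:
    """Count the screen cells EUC-JP text occupies (hypothetical helper)."""
    cells = 0
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:             # ASCII: 1 byte, 1 cell
            i, cells = i + 1, cells + 1
        elif lead == 0x8E:          # 0x8e + half-width katakana: 2 bytes, 1 cell
            i, cells = i + 2, cells + 1
        elif lead == 0x8F:          # three-byte sequence: outside this sketch
            raise ValueError("0x8f sequences are not handled here")
        else:                       # other double-byte characters: 2 bytes, 2 cells
            i, cells = i + 2, cells + 2
    return cells


kanji = "日本".encode("euc-jp")          # two full-width characters
half_kana = "\uff71".encode("euc-jp")    # HALFWIDTH KATAKANA LETTER A
print(len(kanji), euc_jp_cells(kanji))
print(len(half_kana), euc_jp_cells(half_kana))
```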

Other encodings cannot be used by Vim internally.  But files in other
encodings can be edited by using conversion, see 'fileencoding'.
Note that all encodings must use ASCII for the characters below 128 (except
when compiled for EBCDIC).

Supported 'encoding' values are:                        *encoding-values*
1   latin1          8-bit characters (ISO 8859-1)
1   iso-8859-n      ISO_8859 variant (n = 2 to 15)
1   koi8-r          Russian
1   koi8-u          Ukrainian
1   macroman        MacRoman (Macintosh encoding)
1   8bit-{name}     any 8-bit encoding (Vim specific name)
1   cp{number}      MS-Windows: any installed single-byte codepage
2   cp932           Japanese (Windows only)
2   euc-jp          Japanese (Unix only)
2   sjis            Japanese (Unix only)
2   cp949           Korean (Unix and Windows)
2   euc-kr          Korean (Unix only)
2   cp936           simplified Chinese (Windows only)
2   euc-cn          simplified Chinese (Unix only)
2   cp950           traditional Chinese (on Unix alias for big5)
2   big5            traditional Chinese (on Windows alias for cp950)
2   euc-tw          traditional Chinese (Unix only)
2   2byte-{name}    Unix: any double-byte encoding (Vim specific name)
2   cp{number}      MS-Windows: any installed double-byte codepage
u   utf-8           32 bit UTF-8 encoded Unicode (ISO/IEC 10646-1)
u   ucs-2           16 bit UCS-2 encoded Unicode (ISO/IEC 10646-1)
u   ucs-2le         like ucs-2, little endian
u   utf-16          ucs-2 extended with double-words for more characters
u   utf-16le        like utf-16, little endian
u   ucs-4           32 bit UCS-4 encoded Unicode (ISO/IEC 10646-1)
u   ucs-4le         like ucs-4, little endian

The {name} can be any encoding name that your system supports.  It is passed
to iconv() to convert between the encoding of the file and the current locale.
For MS-Windows "cp{number}" means using codepage {number}.
Examples: >
                :set encoding=8bit-cp1252
                :set encoding=2byte-cp932
<
Several aliases can be used; they are translated to one of the names above.
An incomplete list:

1   ansi        same as latin1 (obsolete, for backward compatibility)
2   japan       Japanese: on Unix "euc-jp", on MS-Windows cp932
2   korea       Korean: on Unix "euc-kr", on MS-Windows cp949
2   prc         simplified Chinese: on Unix "euc-cn", on MS-Windows cp936
2   chinese     same as "prc"
2   taiwan      traditional Chinese: on Unix "euc-tw", on MS-Windows cp950
u   utf8        same as utf-8
u   unicode     same as ucs-2
u   ucs2be      same as ucs-2 (big endian)
u   ucs-2be     same as ucs-2 (big endian)
u   ucs-4be     same as ucs-4 (big endian)

For the UCS codes the byte order matters.  This is tricky, use UTF-8 whenever
you can.  The default is to use big-endian (most significant byte comes
first):
            name        bytes           char ~
            ucs-2             11 22         1122
            ucs-2le           22 11         1122
            ucs-4       11 22 33 44     11223344
            ucs-4le     44 33 22 11     11223344
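
The byte-order table above can be reproduced with Python's struct module,
where ">" packs big-endian and "<" little-endian values, "H" is 16 bits and
"I" is 32 bits.  The values 0x1122 and 0x11223344 are the table's
placeholders rather than real characters (0x11223344 lies outside the Unicode
range), which is why this sketch packs plain integers instead of using a text
codec.

```python
import struct

# (Vim 'encoding' name, struct format, placeholder value from the table)
rows = [
    ("ucs-2", ">H", 0x1122),
    ("ucs-2le", "<H", 0x1122),
    ("ucs-4", ">I", 0x11223344),
    ("ucs-4le", "<I", 0x11223344),
]

packed = {}
for name, fmt, value in rows:
    packed[name] = struct.pack(fmt, value)
    print(f"{name:8} {packed[name].hex(' '):12} {value:X}")

# For a real character in the 16-bit range, Python's UTF-16 codecs agree.
print("\u1122".encode("utf-16-be") == packed["ucs-2"])
print("\u1122".encode("utf-16-le") == packed["ucs-2le"])
```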

On MS-Windows systems you often want to use "ucs-2le", because it uses little
endian UCS-2.

There are a few encodings which are similar, but not exactly the same.  Vim
treats them as if they were different encodings, so that conversion will be
done when needed.  You might want to use the similar name to avoid conversion
or when conversion is not possible:

        cp932, shift-jis, sjis
        cp936, euc-cn

                                                        *encoding-table*
Normally 'encoding' is equal to your current locale and 'termencoding' is
empty.  This means that your keyboard and display work with characters encoded
in your current locale, and Vim uses the same characters internally.

You can make Vim use characters in a different encoding by setting the
'encoding' option to a different value.  Since the keyboard and display still
use the current locale, conversion needs to be done.  The 'termencoding' then
takes over the value of the current locale, so Vim converts between 'encoding'
and 'termencoding'.  Example: >
        :let &termencoding = &encoding
        :set encoding=utf-8

However, not all combinations of values are possible.  The table below tells
you how each of the nine combinations works.  This is further restricted by
not all conversions being possible, iconv() being present, etc.  Since this
depends on the system used, no detailed list can be given.

('tenc' is the short name for 'termencoding' and 'enc' short for 'encoding')

'tenc'      'enc'       remark ~

 8bit       8bit        Works.  When 'termencoding' is different from
                        'encoding' typing and displaying may be wrong for some
                        characters; Vim does NOT perform conversion (set
                        'encoding' to "utf-8" to get this).
 8bit       2byte       MS-Windows: works for all codepages installed on your
                        system; you can only type 8bit characters;
                        Other systems: does NOT work.
 8bit       Unicode     Works, but you can only type 8bit characters; in a
                        terminal you can only see 8bit characters; the GUI can
                        show all characters that the 'guifont' supports.

 2byte      8bit        Works, but typing non-ASCII characters might
                        be a problem.
 2byte      2byte       MS-Windows: works for all codepages installed on your
                        system; typing characters might be a problem when
                        locale is different from 'encoding'.
                        Other systems: Only works when 'termencoding' is equal
                        to 'encoding', you might as well leave it empty.
 2byte      Unicode     works, Vim will translate typed characters.

 Unicode    8bit        works (unusual)
 Unicode    2byte       does NOT work
 Unicode    Unicode     works very well (leaving 'termencoding' empty works
                        the same way, because all Unicode is handled
                        internally as UTF-8)

CONVERSION                                              *charset-conversion*

Vim will automatically convert from one encoding to another in several places:
- When reading a file and 'fileencoding' is different from 'encoding'
- When writing a file and 'fileencoding' is different from 'encoding'
- When displaying characters and 'termencoding' is different from 'encoding'
- When reading input and 'termencoding' is different from 'encoding'
- When displaying messages and the encoding used for LC_MESSAGES differs from
  'encoding' (requires a gettext version that supports this).
- When reading a Vim script where |:scriptencoding| is different from
  'encoding'.
- When reading or writing a |viminfo| file.
Most of these require the |+iconv| feature.  Conversion for reading and
writing files may also be specified with the 'charconvert' option.

Useful utilities for converting the charset:
    All:            iconv
        GNU iconv can convert most encodings.  Unicode is used as the
        intermediate encoding, which allows conversion from and to all other
        encodings.  See http://www.gnu.org/directory/libiconv.html.

    Japanese:       nkf
        Nkf is "Network Kanji code conversion Filter".  A distinctive feature
        of nkf is that it guesses the Kanji code of its input, so you don't
        need to know the |charset| of the input file.  To convert to EUC-JP
        from ISO-2022-JP or Shift_JIS, simply use the following command in
        Vim:
                :%!nkf -e
        Nkf can be found at:
        http://www.sfc.wide.ad.jp/~max/FreeBSD/ports/distfiles/nkf-1.62.tar.gz

    Chinese:        hc
        Hc is "Hanzi Converter".  Hc converts a GB file to a Big5 file, or a
        Big5 file to a GB file.  Hc can be found at:
        ftp://ftp.cuhk.hk/pub/chinese/ifcss/software/unix/convert/hc-30.tar.gz

    Korean:         hmconv
        Hmconv is a Korean code conversion utility, especially for e-mail.
        It can convert between EUC-KR and ISO-2022-KR.  Hmconv can be found
        at:
        ftp://ftp.kaist.ac.kr/pub/hangul/code/hmconv/

    Multilingual:   lv
        Lv is a powerful multilingual file viewer.  It can also be used as a
        |charset| converter.  Supported |charset|: ISO-2022-CN, ISO-2022-JP,
        ISO-2022-KR, EUC-CN, EUC-JP, EUC-KR, EUC-TW, UTF-7, UTF-8, ISO-8859
        series, Shift_JIS, Big5 and HZ.  Lv can be found at:
        http://www.ff.iij4u.or.jp/~nrt/freeware/lv4495.tar.gz
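
The iconv description above uses Unicode as the intermediate encoding.  The
same idea in Python: decode from the source encoding into a Unicode string,
then encode into the target encoding.  This is an analogy using Python's
codecs, not the iconv library itself; the helper name convert_via_unicode is
hypothetical.

```python
def convert_via_unicode(data: bytes, from_enc: str, to_enc: str) -> bytes:
    """Convert bytes between two encodings, using Unicode text as the pivot."""
    text = data.decode(from_enc)   # from_enc -> Unicode
    return text.encode(to_enc)     # Unicode -> to_enc


euc = "日本語".encode("euc-jp")
sjis = convert_via_unicode(euc, "euc-jp", "shift_jis")
print(euc.hex(" "), "->", sjis.hex(" "))
```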


                                                        *mbyte-conversion*
When reading and writing files in an encoding different from 'encoding',
conversion needs to be done.  These conversions are supported:
- All conversions between Latin-1 (ISO-8859-1), UTF-8, UCS-2 and UCS-4 are
  handled internally.
- For MS-Windows, when 'encoding' is a Unicode encoding, conversion from and
  to any codepage should work.
- Conversion specified with 'charconvert'
- Conversion with the iconv library, if it is available.
  Old versions of GNU iconv() may cause the conversion to fail (they
  request a very large buffer, more than Vim is willing to provide).
  Try getting another iconv() implementation.
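
Conversion between Latin-1 and UTF-8 is simple enough to do without iconv,
because every Latin-1 byte value is also the Unicode code point of that
character.  The following Python sketch shows the bit manipulation involved;
it illustrates the idea and is not Vim's actual C code.  The name
latin1_to_utf8 is hypothetical.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Encode Latin-1 bytes as UTF-8 by hand.

    Bytes 0x00-0x7f are ASCII and are copied unchanged.  Bytes 0x80-0xff
    become two bytes, 110xxxxx 10xxxxxx, carrying the top two bits and the
    low six bits of the value.
    """
    out = bytearray()
    for b in data:
        if b < 0x80:
            out.append(b)
        else:
            out.append(0xC0 | (b >> 6))     # lead byte: top two bits
            out.append(0x80 | (b & 0x3F))   # continuation: low six bits
    return bytes(out)


print(latin1_to_utf8(b"caf\xe9"))
```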

==============================================================================
4. Using a terminal                                     *mbyte-terminal*

The GUI fully supports multi-byte characters.  This is also possible in a
terminal, but only if the terminal supports the same encoding that Vim uses,
which makes it less flexible.

For example, you can run Vim in an xterm with added multi-byte support and/or
|XIM|.  Examples are kterm (Kanji term) and hanterm (for Korean), Eterm
(Enlightened terminal) and rxvt.

If your terminal does not support the right encoding, you can set the
'termencoding' option.  Vim will then convert the typed characters from
'termencoding' to 'encoding', and displayed text will be converted from
'encoding' to 'termencoding'.  If the encoding supported by the terminal
doesn't include all the characters that Vim uses, those characters are lost,
which may mess up the display.  If you use a terminal that supports Unicode,
such as the xterm mentioned below, it should work just fine, since nearly
every character set can be converted to Unicode without loss of information.
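
A small Python model of the two directions of 'termencoding' conversion,
assuming 'encoding' is "utf-8" and the terminal only understands Latin-1.
Python's errors="replace" stands in for whatever happens to characters the
terminal cannot show; it makes the loss visible and is not meant to mimic
Vim's exact behavior.  The function names are hypothetical.

```python
TERMENCODING = "latin-1"   # assumption: what the terminal understands
ENCODING = "utf-8"         # assumption: Vim's internal 'encoding'


def typed_to_internal(keys: bytes) -> bytes:
    """Typed bytes: 'termencoding' -> 'encoding' (never lossy here)."""
    return keys.decode(TERMENCODING).encode(ENCODING)


def internal_to_display(text: bytes) -> bytes:
    """Displayed text: 'encoding' -> 'termencoding'; unknown chars become '?'."""
    return text.decode(ENCODING).encode(TERMENCODING, errors="replace")


print(typed_to_internal(b"na\xefve"))
print(internal_to_display("naïve 日本".encode("utf-8")))
```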


UTF-8 IN XFREE86 XTERM                                  *UTF8-xterm*

This is a short explanation of how to use UTF-8 character encoding in the
xterm that comes with XFree86 by Thomas Dickey (text by Markus Kuhn).

Get the latest xterm version, which now has UTF-8 support:

        http://invisible-island.net/xterm/xterm.html

Compile it with "./configure --enable-wide-chars ; make"

Also get the ISO 10646-1 version of various fonts, which is available on

        http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz

and install the font as described in the README file.

Now start xterm with >

  xterm -u8 -fn -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1
or, for bigger characters: >
  xterm -u8 -fn -misc-fixed-medium-r-normal--15-140-75-75-c-90-iso10646-1

and you will have a working UTF-8 terminal emulator.  Try both >

   cat utf-8-demo.txt
   vim utf-8-demo.txt

with the demo text that comes with ucs-fonts.tar.gz in order to see
whether there are any problems with UTF-8 in your xterm.

For Vim you may need to set 'encoding' to "utf-8".

==============================================================================
5. Fonts on X11                                         *mbyte-fonts-X11*

Unfortunately, using fonts in X11 is complicated.  The name of a single-byte
font is a long string.  For multi-byte fonts we need several of these...

Note: Most of this is no longer relevant for GTK+ 2.  Selecting a font via
its XLFD is not supported anymore; see 'guifont' for an example of how to
set the font.  Do yourself a favor and ignore the |XLFD| and |xfontset|
sections below.

First of all, Vim only accepts fixed-width fonts for displaying text.  You
cannot use proportionally spaced fonts.  This excludes many of the available
(and nicer looking) fonts.  However, for menus and tooltips any font can be
used.

Note that display and input are independent.  It is possible to see your
language even though you have no input method for it.

You should get a default font for menus and tooltips that works, but it might
be ugly.  Read the following to find out how to select a better font.


X LOGICAL FONT DESCRIPTION (XLFD)
                                                        *XLFD*
XLFD is the X font name and contains the information about the font size,
charset, etc.  The name starts with a "-" and is in this format:

-FOUNDRY-FAMILY-WEIGHT-SLANT-WIDTH-STYLE-PIXEL-POINT-X-Y-SPACE-AVE-CR-CE

Each field means:

- FOUNDRY:  FOUNDRY field.  The company that created the font.
- FAMILY:   FAMILY_NAME field.  Basic font family name.  (helvetica, gothic,
            times, etc)
- WEIGHT:   WEIGHT_NAME field.  How thick the letters are.  (light, medium,
            bold, etc)
- SLANT:    SLANT field.
                r:      Roman (no slant)
                i:      Italic
                o:      Oblique
                ri:     Reverse Italic
                ro:     Reverse Oblique
                ot:     Other
                number: Scaled font
- WIDTH:    SETWIDTH_NAME field.  Width of characters.  (normal, condensed,
            narrow, double wide)
- STYLE:    ADD_STYLE_NAME field.  Extra info to describe font.  (Serif, Sans
            Serif, Informal, Decorated, etc)
- PIXEL:    PIXEL_SIZE field.  Height, in pixels, of characters.
- POINT:    POINT_SIZE field.  Ten times height of characters in points.
- X:        RESOLUTION_X field.  X resolution (dots per inch).
- Y:        RESOLUTION_Y field.  Y resolution (dots per inch).
- SPACE:    SPACING field.
                p:      Proportional
                m:      Monospaced
                c:      CharCell
- AVE:      AVERAGE_WIDTH field.  Ten times average width in pixels.
- CR:       CHARSET_REGISTRY field.  The name of the charset group.
- CE:       CHARSET_ENCODING field.  The rest of the charset name.  For some
            charsets, such as JIS X 0208, if this field is 0, code points have
            the same value as GL, and GR if 1.

For example, in case of a 16 dots font corresponding to JIS X 0208, it is
written like:
    -misc-fixed-medium-r-normal--16-110-100-100-c-160-jisx0208.1990-0
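
Splitting an XLFD on "-" recovers the fields listed above.  The Python sketch
below (the helper name parse_xlfd is hypothetical) relies on the name
starting with "-" and on no field containing a "-" itself, which holds for
the full names in this section; empty fields, like STYLE in the example, come
back as empty strings.  Wildcard base names such as "...--14-*" used further
down are not complete XLFDs and are rejected.

```python
XLFD_FIELDS = [
    "FOUNDRY", "FAMILY", "WEIGHT", "SLANT", "WIDTH", "STYLE", "PIXEL",
    "POINT", "X", "Y", "SPACE", "AVE", "CR", "CE",
]


def parse_xlfd(name: str) -> dict:
    """Map each XLFD field name to its value (strings, possibly empty)."""
    if not name.startswith("-"):
        raise ValueError("an XLFD starts with '-'")
    parts = name[1:].split("-")
    if len(parts) != len(XLFD_FIELDS):
        raise ValueError(f"expected {len(XLFD_FIELDS)} fields, got {len(parts)}")
    return dict(zip(XLFD_FIELDS, parts))


font = parse_xlfd("-misc-fixed-medium-r-normal--16-110-100-100-c-160-jisx0208.1990-0")
# AVE is in tenths of a pixel, so 160 means 16 pixels: a square 16x16 cell.
print(font["PIXEL"], int(font["AVE"]) / 10, font["CR"], font["CE"])
```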


X FONTSET
                                                *fontset* *xfontset*
A single-byte charset is typically associated with one font.  For multi-byte
charsets a combination of fonts is often used.  This means that one group of
characters is used from one font and another group from another font (which
might be double wide).  This collection of fonts is called a fontset.

Which fonts are required in a fontset depends on the current locale.  X
Windows maintains a table of which groups of characters are required for a
locale.  You have to specify all the fonts that a locale requires in the
'guifontset' option.

NOTE: The fontset always uses the current locale, even though 'encoding' may
be set to use a different charset.  In that situation you might want to use
'guifont' and 'guifontwide' instead of 'guifontset'.

Example:
    |charset|    language                 "groups of characters" ~
    GB2312       Chinese (simplified)     ISO-8859-1 and GB 2312
    Big5         Chinese (traditional)    ISO-8859-1 and Big5
    CNS-11643    Chinese (traditional)    ISO-8859-1, CNS 11643-1 and CNS 11643-2
    EUC-JP       Japanese                 JIS X 0201 and JIS X 0208
    EUC-KR       Korean                   ISO-8859-1 and KS C 5601 (KS X 1001)

You can search for fonts using the xlsfonts command.  For example, when you're
searching for a font for KS C 5601: >
        xlsfonts | grep ksc5601

This is complicated and confusing.  You might want to consult the X-Windows
documentation if there is something you don't understand.

                                                *base_font_name_list*
When you have found the names of the fonts you want to use, you need to set
the 'guifontset' option.  You specify the list by concatenating the font names
and putting a comma in between them.

For example, when you use the ja_JP.eucJP locale, this requires JIS X 0201
and JIS X 0208.  You could supply a list of fonts that explicitly specifies
the charsets, like: >

 :set guifontset=-misc-fixed-medium-r-normal--14-130-75-75-c-140-jisx0208.1983-0,
        \-misc-fixed-medium-r-normal--14-130-75-75-c-70-jisx0201.1976-0

Alternatively, you can supply a base font name list that omits the charset
name, letting X-Windows select font characters required for the locale.  For
example: >

 :set guifontset=-misc-fixed-medium-r-normal--14-130-75-75-c-140,
        \-misc-fixed-medium-r-normal--14-130-75-75-c-70

Alternatively, you can supply a single base font name that allows X-Windows to
select from all available fonts.  For example: >

 :set guifontset=-misc-fixed-medium-r-normal--14-*

Alternatively, you can specify alias names.  See the fonts.alias file in the
fonts directory (e.g., /usr/X11R6/lib/X11/fonts/).  For example: >

 :set guifontset=k14,r14
<
                                                        *E253*
Note that in East Asian fonts, the standard character cell is square.  When
mixing a Latin font and an East Asian font, the East Asian font width should
be twice the Latin font width.
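
Using the AVE (average width) field of the fontset example above, a quick
Python check of the "twice as wide" rule might look like this.  The helper
names are hypothetical, and comparing AVERAGE_WIDTH is an assumption: it is a
reasonable proxy for the cell width of the fixed-width ("c" or "m" spacing)
fonts discussed here, not a check that Vim itself performs this way.

```python
def xlfd_average_width(name: str) -> int:
    """Return the AVERAGE_WIDTH field (tenths of a pixel) of a full XLFD."""
    fields = name[1:].split("-")   # drop the leading "-", split the 14 fields
    return int(fields[11])         # FOUNDRY is index 0, AVE is index 11


def is_double_width(narrow_font: str, wide_font: str) -> bool:
    """True if the wide font's cells are exactly twice the narrow font's."""
    return xlfd_average_width(wide_font) == 2 * xlfd_average_width(narrow_font)


latin = "-misc-fixed-medium-r-normal--14-130-75-75-c-70-jisx0201.1976-0"
kanji = "-misc-fixed-medium-r-normal--14-130-75-75-c-140-jisx0208.1983-0"
print(is_double_width(latin, kanji))
```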
649 | |
If 'guifontset' is not empty, the "font" argument of the |:highlight| command
is also interpreted as a fontset. For example, use this for highlighting: >
        :hi Comment font=english_font,your_font
If you use a wrong "font" argument you will get an error message.
Also make sure that you set 'guifontset' before setting fonts for highlight
groups.


USING RESOURCE FILES

Instead of specifying 'guifontset', you can set X11 resources and Vim will
pick them up. This is only for people who know how X resource files work.

For Motif and Athena insert these three lines in your $HOME/.Xdefaults file:

        Vim.font: |base_font_name_list|
        Vim*fontSet: |base_font_name_list|
        Vim*fontList: your_language_font

Note: Vim.font is for the text area.
      Vim*fontSet is for the menu.
      Vim*fontList is for the menu (for the Motif GUI).

For example, when you are using Japanese and a 14-dot font: >

        Vim.font: -misc-fixed-medium-r-normal--14-*
        Vim*fontSet: -misc-fixed-medium-r-normal--14-*
        Vim*fontList: -misc-fixed-medium-r-normal--14-*
<
or: >

        Vim*font: k14,r14
        Vim*fontSet: k14,r14
        Vim*fontList: k14,r14
<
To have them take effect immediately you will have to do: >

        xrdb -merge ~/.Xdefaults

Otherwise you will have to stop and restart the X server before the changes
take effect.


The GTK+ version of GUI Vim does not use .Xdefaults; use ~/.gtkrc instead.
The default mostly works OK, but for the menus you might have to change it.
Example: >

        style "default"
        {
            fontset="-*-*-medium-r-normal--14-*-*-*-c-*-*-*"
        }
        widget_class "*" style "default"

==============================================================================
6. Fonts on MS-Windows                                     *mbyte-fonts-MSwin*

The simplest way is to use the font dialog to select fonts and try them out.
You can find this at the "Edit/Select Font..." menu. Once you find a font name
that works well you can use this command to see its name: >

        :set guifont

Then add a command to your |gvimrc| file to set 'guifont': >

        :set guifont=courier_new:h12

==============================================================================
7. Input on X11                                                    *mbyte-XIM*

X INPUT METHOD (XIM) BACKGROUND                 *XIM* *xim* *x-input-method*

XIM is an international input module for X. There are two kinds of
structures: the Xlib unit type and the |IM-server| (Input-Method server)
type. The |IM-server| type is suitable for complex input, such as CJK.

- IM-server
                                                                 *IM-server*
  In |IM-server| type input structures, input events are handled in one of
  two ways: the FrontEnd system or the BackEnd system. In the FrontEnd
  system, input events are intercepted by the |IM-server| first, and the
  |IM-server| then gives the result of the input to the application. The
  BackEnd system works in the reverse order. MS-Windows uses the BackEnd
  system; in X, most |IM-server|s use the FrontEnd system. The disadvantage
  of the BackEnd system is its large communication overhead, but it provides
  safe synchronization with no restrictions on applications.

  For example, xwnmo and kinput2 are Japanese |IM-server|s; both use the
  FrontEnd system. Xwnmo is distributed with Wnn (see below); kinput2 can be
  found at: ftp://ftp.sra.co.jp/pub/x11/kinput2/

  For Chinese, there is a great XIM server named "xcin", with which you can
  input both Traditional and Simplified Chinese characters. It can also
  accept other locales if you create a correct input table. Xcin can be
  found at: http://xcin.linux.org.tw/

- Conversion Server
                                                         *conversion-server*
  Some systems need an additional server: a conversion server. Most Japanese
  |IM-server|s need a Kana-Kanji conversion server. For Chinese input it
  depends on the input method: some methods need a PinYin or ZhuYin to HanZi
  conversion server. For Korean input, a Hangul-Hanja conversion server is
  needed if you want to input Hanja.

  For example, Japanese input is divided into two steps: first you pre-input
  Hira-gana, then Kana-Kanji conversion takes place. There are very many
  Kanji characters (6349 Kanji characters are defined in JIS X 0208), while
  there are only 76 Hira-gana characters. So first we pre-input the text as
  it is pronounced, in Hira-gana, and then we convert the Hira-gana to Kanji
  or Kata-Kana, if needed. There are several Kana-Kanji conversion servers:
  jserver (distributed with Wnn, see below) and canna. Canna could be found
  at: ftp://ftp.nec.co.jp/pub/Canna/ (no longer works).

  Wnn 4.2 is a good input system. It contains:
        xwnmo (|IM-server|)
        jserver (Japanese Kana-Kanji conversion server)
        cserver (Chinese PinYin or ZhuYin to simplified HanZi conversion server)
        tserver (Chinese PinYin or ZhuYin to traditional HanZi conversion server)
        kserver (Hangul-Hanja conversion server)
  Wnn 4.2 for several systems can be found at various places on the internet.
  Use the RPM or port for your system.


- Input Style
                                                           *xim-input-style*
  When inputting CJK, there are four areas:
      1. The area to display the input while it is being composed.
      2. The area to display the currently active input mode.
      3. The area to display the next candidate for the selection.
      4. The area to display other tools.

  The third area is needed when converting. For example, in Japanese input
  several Kanji characters can have the same pronunciation, so one sequence
  of Hira-gana characters can map to several distinct sequences of Kanji
  characters, from which the user has to choose.

  The first and second areas are defined in international input of X with
  the names "Preedit Area" and "Status Area" respectively. The third and
  fourth areas are not defined and are left to be managed by the
  |IM-server|. In international input, four input styles have been defined
  using combinations of the Preedit Area and the Status Area: |OnTheSpot|,
  |OffTheSpot|, |OverTheSpot| and |Root|.

  Currently, GUI Vim supports three styles: |OverTheSpot|, |OffTheSpot| and
  |Root|.

*.  on-the-spot                                                  *OnTheSpot*
    The Preedit Area and Status Area are displayed by the client application
    in its own area. The |IM-server| directs the client application to
    display all pre-edit data at the location of text insertion. The client
    registers callbacks that the input method invokes during pre-editing.
*.  over-the-spot                                              *OverTheSpot*
    The Status Area is created at a fixed position within the application's
    area; in the case of Vim, this is the additional status line. The
    Preedit Area is created at the current input position of the
    application. The input method displays pre-edit data in a window which
    it brings up directly over the text insertion position.
*.  off-the-spot                                                *OffTheSpot*
    The Preedit Area and Status Area are displayed in the application's
    area; in the case of Vim, this is the additional status line. The client
    application provides display windows for the pre-edit data to the input
    method, which displays into them directly.
*.  root-window                                                       *Root*
    The Preedit Area and Status Area are outside of the application. The
    input method displays all pre-edit data in a separate area of the screen
    in a window specific to the input method.


USING XIM                       *multibyte-input* *E284* *E286* *E287* *E288*
                                *E285* *E291* *E292* *E290* *E289*

Note that Display and Input are independent. It is possible to see your
language even though you have no input method for it. But when your Display
method doesn't match your Input method, the text will be displayed
incorrectly.

Note: You can not use IM unless you specify 'guifontset'.
      Therefore Latin users also have to set 'guifontset'
      if they use IM.

To input your language you should run the |IM-server| which supports your
language and the |conversion-server| if needed.

The next 3 lines should be put in your ~/.Xdefaults file. They are common for
all X applications which use |XIM|. If you already use |XIM|, you can skip
this. >

        *international: True
        *.inputMethod: your_input_server_name
        *.preeditType: your_input_style
<
your_input_server_name  is your |IM-server| name (check your |IM-server|
                        manual).
your_input_style        is one of |OverTheSpot|, |OffTheSpot|, |Root|. See
                        also |xim-input-style|.

*international may not be necessary if you use X11R6.
*.inputMethod and *.preeditType are optional if you use X11R6.

For example, when you are using kinput2 as |IM-server|: >

        *international: True
        *.inputMethod: kinput2
        *.preeditType: OverTheSpot
<
When using |OverTheSpot|, GUI Vim always connects to the IM Server even in
Normal mode, so you can input your language with commands like "f" and "r".
But when using one of the other two methods, GUI Vim connects to the IM Server
only if it is not in Normal mode.

If your IM Server does not support |OverTheSpot|, and if you want to use your
language with some Normal mode command like "f" or "r", then you should use a
localized xterm or an xterm which supports |XIM|.

If needed, you can set the XMODIFIERS environment variable:

        sh:  export XMODIFIERS="@im=input_server_name"
        csh: setenv XMODIFIERS "@im=input_server_name"

For example, when you are using kinput2 as |IM-server| and sh: >

        export XMODIFIERS="@im=kinput2"
<

FULLY CONTROLLED XIM

You can fully control XIM, like the IME of MS-Windows (see |multibyte-ime|).
This is currently only available for the GTK GUI.

Before using fully controlled XIM, one setting is required. Set the
'imactivatekey' option to the key that is used for the activation of the input
method. For example, when you are using kinput2 + canna as IM Server, the
activation key is probably Shift+Space: >

        :set imactivatekey=S-space

See 'imactivatekey' for the format.

==============================================================================
8. Input on MS-Windows                                             *mbyte-IME*

(Windows IME support)                                     *multibyte-ime* *IME*

{only works in the Windows GUI compiled with the |+multi_byte_ime| feature}

To input multibyte characters on Windows, you have to use an Input Method
Editor (IME). While editing text you must switch the IME status (on/off)
very many times, because while the IME is on it intercepts all of your key
input, so you cannot type 'j', 'k' or almost any other key to Vim directly.

The |+multi_byte_ime| feature helps with this by reducing how often you have
to switch the IME status manually. In Normal mode there is hardly ever a need
for the IME, even when editing multibyte text. So when you leave Insert mode
with <Esc>, Vim remembers the last status of the IME and forces the IME off.
When you enter Insert mode again, Vim automatically restores the remembered
IME status.

This works not only for Insert and Normal mode, but also for search-command
input and Replace mode.
The options 'iminsert', 'imsearch' and 'imcmdline' can be used to choose
the different input methods or disable them temporarily.

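The Insert/Normal mode behavior described above amounts to a small state machine. The Python class below is a hypothetical model written only to illustrate the rule (remember the status on <Esc>, force the IME off, restore the status when Insert mode is entered again); it is not how Vim implements the feature.

```python
class ImeStatusModel:
    """Hypothetical model of the +multi_byte_ime behavior; not Vim code."""

    def __init__(self) -> None:
        self.ime_on = False      # current IME status
        self.remembered = False  # status saved when Insert mode was left
        self.mode = "normal"

    def toggle_ime(self) -> None:
        """The user switches the IME on or off by hand."""
        self.ime_on = not self.ime_on

    def enter_insert(self) -> None:
        self.mode = "insert"
        self.ime_on = self.remembered  # restore the remembered status

    def escape(self) -> None:
        self.remembered = self.ime_on  # remember the last status...
        self.ime_on = False            # ...and force the IME off
        self.mode = "normal"


vim = ImeStatusModel()
vim.enter_insert()
vim.toggle_ime()      # turn the IME on to type Japanese
vim.escape()
print(vim.ime_on)     # False: keys like 'j' and 'k' reach Vim directly
vim.enter_insert()
print(vim.ime_on)     # True: the previous status is restored
```
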
WHAT IS IME
    IME is a part of East Asian versions of Windows. It helps you to input
    multibyte characters. English and other language versions of Windows do
    not have an IME (usually there is no need for one). But there is one
    called Microsoft Global IME. Global IME is a part of Internet Explorer
    4.0 or above. You can get more information about Global IME at the URLs
    below.

WHAT IS GLOBAL IME                                              *global-ime*
    Global IME makes it possible to input Chinese, Japanese, and Korean text
    into a Vim buffer on any language version of Windows 98, Windows 95, and
    Windows NT 4.0.
    On Windows 2000 and XP it should work as well (without downloading). On
    Windows 2000 Professional, Global IME is built in, and the Input Locales
    can be added through Control Panel/Regional Options/Input Locales.
    Please see the URLs below for details of Global IME. You can also find
    various language versions of Global IME at the same place.

    - Global IME detailed information.
        http://www.microsoft.com/windows/ie/features/ime.asp

    - Active Input Method Manager (Global IME)
        http://msdn.microsoft.com/workshop/misc/AIMM/aimm.asp

    Support for Global IME is an experimental feature.

NOTE: For IME to work you must make sure the input locales of your language
are added to your system. The exact location of this setting depends on the
version of Windows you use. For example, on my Windows 2000 Professional box:
1. Control Panel
2. Regional Options
3. Input Locales Tab
4. Add Installed input locales -> Chinese (PRC)
   The default is still English (United States)


Cursor color when IME or XIM is on                                *CursorIM*
    There is a small but handy feature for IME: the cursor can indicate the
    status of the IME by changing its color. Usually the status of the IME is
    indicated by a little icon in a corner of the desktop (or taskbar), which
    makes it hard to check. This feature helps with that.
    This works in the same way when using XIM.

    You can select the cursor color used when the status is on with the
    highlight group CursorIM. For example, add these lines to your _gvimrc: >

        if has('multi_byte_ime')
          highlight Cursor guifg=NONE guibg=Green
          highlight CursorIM guifg=NONE guibg=Purple
        endif
<
    The cursor is green when the IME is off and purple when it is on.

==============================================================================
9. Input with a keymap                                          *mbyte-keymap*

When the keyboard doesn't produce the characters you want to enter in your
text, you can use the 'keymap' option. This will translate one or more
(English) characters to another (non-English) character. This only happens
when typing text, not when typing Vim commands. This avoids having to switch
between two keyboard settings.

The value of the 'keymap' option specifies a keymap file to use. The name of
this file is one of these two:

        keymap/{keymap}_{encoding}.vim
        keymap/{keymap}.vim

Here {keymap} is the value of the 'keymap' option and {encoding} is the value
of the 'encoding' option. The file name with the {encoding} included is tried
first.

'runtimepath' is used to find these files. To see an overview of all
available keymap files, use this: >
        :echo globpath(&rtp, "keymap/*.vim")

In Insert and Command-line mode you can use CTRL-^ to toggle between using the
keyboard map or not. |i_CTRL-^| |c_CTRL-^|
This flag is remembered for Insert mode with the 'iminsert' option. When
leaving and entering Insert mode the previous value is used. The same value
is also used for commands that take a single character argument, like |f| and
|r|.
For Command-line mode the flag is NOT remembered. You are expected to type an
Ex command first, which is ASCII.
For typing search patterns the 'imsearch' option is used. It can be set to
use the same value as for 'iminsert'.

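The lookup order can be sketched as follows, assuming that the name including the encoding is searched for in every 'runtimepath' directory before the plain name is tried. find_keymap() is a hypothetical Python helper, not part of Vim, and the "hebrew" file names are only used for illustration.

```python
import os
import tempfile


def find_keymap(runtimepath, keymap, encoding):
    """Return the path of the keymap file that would be used, or None."""
    for name in (f"{keymap}_{encoding}.vim", f"{keymap}.vim"):
        for directory in runtimepath:  # search every directory for this name
            path = os.path.join(directory, "keymap", name)
            if os.path.isfile(path):
                return path
    return None


with tempfile.TemporaryDirectory() as rtp:
    os.makedirs(os.path.join(rtp, "keymap"))
    for name in ("hebrew.vim", "hebrew_utf-8.vim"):
        open(os.path.join(rtp, "keymap", name), "w").close()
    found = find_keymap([rtp], "hebrew", "utf-8")
    print(os.path.basename(found))  # hebrew_utf-8.vim
```

With this ordering an encoding-specific file in a later 'runtimepath' directory still wins over a plain file in an earlier one.
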
It is possible to give the GUI cursor another color when the language mappings
are being used. This is disabled by default, to avoid the cursor becoming
invisible when you use a non-standard background color. Here is an example to
use a brightly colored cursor: >
        :highlight Cursor guifg=NONE guibg=Green
        :highlight lCursor guifg=NONE guibg=Cyan
<
                        *keymap-file-format* *:loadk* *:loadkeymap* *E105*
The keymap file looks something like this: >

        " Maintainer:   name <email@address>
        " Last Changed: 2001 Jan 1

        let b:keymap_name = "short"

        loadkeymap
        a       A
        b       B       comment

The lines starting with a " are comments and will be ignored. Blank lines are
also ignored. The lines with the mappings may have a comment after the useful
text.

The "b:keymap_name" can be set to a short name, which will be shown in the
status line. The idea is that this takes less room than the value of
'keymap', which might be long to distinguish between different languages,
keyboards and encodings.

The actual mappings are in the lines below "loadkeymap". In the example "a"
is mapped to "A" and "b" to "B". Thus the first item is mapped to the second
item. This is done for each line, until the end of the file.
These items are exactly the same as what can be used in a |:lnoremap| command,
using "<buffer>" to make the mappings local to the buffer.
You can check the result with this command: >
        :lmap
The two items must be separated by white space. You cannot include white
space inside an item; use the special names "<Tab>" and "<Space>" instead.
The length of the two items together must not exceed 200 bytes.

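The rules for the lines after "loadkeymap" can be illustrated with a simplified parser. parse_keymap() is a hypothetical Python sketch: it skips blank lines and " comments, takes the first two white-space separated items of each mapping line, treats the rest as a comment and enforces the 200 byte limit (counting bytes as UTF-8, assuming 'encoding' is utf-8). It does not handle the backslash rules or the <char-> and <Space> notations, and it is not how Vim itself reads keymap files.

```python
def parse_keymap(text: str) -> dict:
    """Return {lhs: rhs} for the mapping lines after "loadkeymap"."""
    mappings = {}
    in_maps = False
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith('"'):
            continue                          # blank line or comment
        if not in_maps:
            in_maps = stripped == "loadkeymap"
            continue                          # header, e.g. "let b:keymap_name"
        items = stripped.split()
        if len(items) < 2:
            raise ValueError(f"missing second item: {line!r}")
        lhs, rhs = items[0], items[1]         # anything after is a comment
        if len((lhs + rhs).encode("utf-8")) > 200:
            raise ValueError(f"mapping longer than 200 bytes: {lhs!r}")
        mappings[lhs] = rhs
    return mappings


sample = '''" Maintainer: name <email@address>
let b:keymap_name = "short"

loadkeymap
a A
b B comment
'''
print(parse_keymap(sample))  # {'a': 'A', 'b': 'B'}
```
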
It's possible to have more than one character in the first column. This works
like a dead key. Example: >
        'a      á
Since Vim doesn't know if the next character after a quote is really an "a",
it will wait for the next character. To be able to insert a single quote,
also add this line: >
        ''      '
Since the mapping is defined with |:lnoremap| the resulting quote will not be
used for the start of another character.

Although it's possible to have more than one character in the second column,
this is unusual. But you can use various ways to specify the character: >
        A       a               literal character
        A       <char-97>       decimal value
        A       <char-0x61>     hexadecimal value
        A       <char-0141>     octal value
        x       <Space>         special key name

The characters are assumed to be encoded for the current value of 'encoding'.
It's possible to use ":scriptencoding" when all characters are given
literally. That doesn't work when using the <char-> construct, because the
conversion is done on the keymap file, not on the resulting character.

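The <char-> forms in the table above follow the usual C conventions: a "0x" prefix means hexadecimal, a leading zero means octal, and anything else is decimal. The hypothetical Python helper below decodes an item under that assumption, so that all four "A" lines produce the same character; it is a sketch, not Vim's own parser.

```python
import re


def char_notation(item: str) -> str:
    """Decode a <char-N> keymap item; return any other item unchanged."""
    m = re.fullmatch(r"<char-(\w+)>", item)
    if m is None:
        return item                           # literal character or key name
    digits = m.group(1)
    if digits.lower().startswith("0x"):
        value = int(digits[2:], 16)           # hexadecimal
    elif len(digits) > 1 and digits.startswith("0"):
        value = int(digits[1:], 8)            # octal
    else:
        value = int(digits, 10)               # decimal
    return chr(value)


for item in ("a", "<char-97>", "<char-0x61>", "<char-0141>"):
    print(item, "->", char_notation(item))    # each line ends in "-> a"
```
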
The lines after "loadkeymap" are interpreted with 'cpoptions' set to "C".
This means that continuation lines are not used and a backslash has a special
meaning in the mappings. Examples: >

        " a comment line
        \"      x       maps " to x
        \\      y       maps \ to y

If you write a keymap file that will be useful for others, consider submitting
it to the Vim maintainer for inclusion in the distribution:
<maintainer@vim.org>


HEBREW KEYMAP                                                 *keymap-hebrew*

This section explains what characters are available in the UTF-8 and CP1255
encodings, and what the keymaps are to get those characters:

glyph   encoding        keymap ~
Char    utf-8   cp1255  hebrew  hebrewp name ~
א       0x5d0   0xe0    t       a       'alef
ב       0x5d1   0xe1    c       b       bet
ג       0x5d2   0xe2    d       g       gimel
ד       0x5d3   0xe3    s       d       dalet
ה       0x5d4   0xe4    v       h       he
ו       0x5d5   0xe5    u       v       vav
ז       0x5d6   0xe6    z       z       zayin
ח       0x5d7   0xe7    j       j       het
ט       0x5d8   0xe8    y       T       tet
י       0x5d9   0xe9    h       y       yod
ך       0x5da   0xea    l       K       kaf sofit
כ       0x5db   0xeb    f       k       kaf
ל       0x5dc   0xec    k       l       lamed
ם       0x5dd   0xed    o       M       mem sofit
מ       0x5de   0xee    n       m       mem
ן       0x5df   0xef    i       N       nun sofit
נ       0x5e0   0xf0    b       n       nun
ס       0x5e1   0xf1    x       s       samech
ע       0x5e2   0xf2    g       u       `ayin
ף       0x5e3   0xf3    ;       P       pe sofit
פ       0x5e4   0xf4    p       p       pe
ץ       0x5e5   0xf5    .       X       tsadi sofit
צ       0x5e6   0xf6    m       x       tsadi
ק       0x5e7   0xf7    e       q       qof
ר       0x5e8   0xf8    r       r       resh
ש       0x5e9   0xf9    a       w       shin
ת       0x5ea   0xfa    ,       t       tav

Vowel marks and special punctuation:
הְ       0x5b0   0xc0    A:      A:      sheva
הֱ       0x5b1   0xc1    HE      HE      hataf segol
הֲ       0x5b2   0xc2    HA      HA      hataf patah
הֳ       0x5b3   0xc3    HO      HO      hataf qamats
הִ       0x5b4   0xc4    I       I       hiriq
הֵ       0x5b5   0xc5    AY      AY      tsere
הֶ       0x5b6   0xc6    E       E       segol
הַ       0x5b7   0xc7    AA      AA      patah
הָ       0x5b8   0xc8    AO      AO      qamats
הֹ       0x5b9   0xc9    O       O       holam
הֻ       0x5bb   0xcb    U       U       qubuts
כּ       0x5bc   0xcc    D       D       dagesh
הֽ       0x5bd   0xcd    ]T      ]T      meteg
ה־      0x5be   0xce    ]Q      ]Q      maqaf
בֿ       0x5bf   0xcf    ]R      ]R      rafe
ב׀      0x5c0   0xd0    ]p      ]p      paseq
שׁ       0x5c1   0xd1    SR      SR      shin-dot
שׂ       0x5c2   0xd2    SL      SL      sin-dot
׃       0x5c3   0xd3    ]P      ]P      sof-pasuq
װ       0x5f0   0xd4    VV      VV      double-vav
ױ       0x5f1   0xd5    VY      VY      vav-yod
ײ       0x5f2   0xd6    YY      YY      yod-yod

The following are only available in UTF-8:

Cantillation marks:
glyph
Char    utf-8   hebrew  name
ב֑       0x591   C:      etnahta
ב֒       0x592   Cs      segol
ב֓       0x593   CS      shalshelet
ב֔       0x594   Cz      zaqef qatan
ב֕       0x595   CZ      zaqef gadol
ב֖       0x596   Ct      tipeha
ב֗       0x597   Cr      revia
ב֘       0x598   Cq      zarqa
ב֙       0x599   Cp      pashta
ב֚       0x59a   C!      yetiv
ב֛       0x59b   Cv      tevir
ב֜       0x59c   Cg      geresh
ב֝       0x59d   C*      geresh qadim
ב֞       0x59e   CG      gershayim
ב֟       0x59f   CP      qarnei-parah
ב֪       0x5aa   Cy      yerach-ben-yomo
ב֫       0x5ab   Co      ole
ב֬       0x5ac   Ci      iluy
ב֭       0x5ad   Cd      dehi
ב֮       0x5ae   Cn      zinor
ב֯       0x5af   CC      masora circle

Combining forms:
ﬠ       0xfb20  X`      Alternative `ayin
ﬡ       0xfb21  X'      Alternative 'alef
ﬢ       0xfb22  X-d     Alternative dalet
ﬣ       0xfb23  X-h     Alternative he
ﬤ       0xfb24  X-k     Alternative kaf
ﬥ       0xfb25  X-l     Alternative lamed
ﬦ       0xfb26  X-m     Alternative mem-sofit
ﬧ       0xfb27  X-r     Alternative resh
ﬨ       0xfb28  X-t     Alternative tav
﬩       0xfb29  X-+     Alternative plus
שׁ       0xfb2a  XW      shin+shin-dot
שׂ       0xfb2b  Xw      shin+sin-dot
שּׁ       0xfb2c  X..W    shin+shin-dot+dagesh
שּׂ       0xfb2d  X..w    shin+sin-dot+dagesh
אַ       0xfb2e  XA      alef+patah
אָ       0xfb2f  XO      alef+qamats
אּ       0xfb30  XI      alef+hiriq (mapiq)
בּ       0xfb31  X.b     bet+dagesh
גּ       0xfb32  X.g     gimel+dagesh
דּ       0xfb33  X.d     dalet+dagesh
הּ       0xfb34  X.h     he+dagesh
וּ       0xfb35  Xu      vav+dagesh
זּ       0xfb36  X.z     zayin+dagesh
טּ       0xfb38  X.T     tet+dagesh
יּ       0xfb39  X.y     yod+dagesh
ךּ       0xfb3a  X.K     kaf sofit+dagesh
כּ       0xfb3b  X.k     kaf+dagesh
לּ       0xfb3c  X.l     lamed+dagesh
מּ       0xfb3e  X.m     mem+dagesh
נּ       0xfb40  X.n     nun+dagesh
סּ       0xfb41  X.s     samech+dagesh
ףּ       0xfb43  X.P     pe sofit+dagesh
פּ       0xfb44  X.p     pe+dagesh
צּ       0xfb46  X.x     tsadi+dagesh
קּ       0xfb47  X.q     qof+dagesh
רּ       0xfb48  X.r     resh+dagesh
שּ       0xfb49  X.w     shin+dagesh
תּ       0xfb4a  X.t     tav+dagesh
וֹ       0xfb4b  Xo      vav+holam
בֿ       0xfb4c  XRb     bet+rafe
כֿ       0xfb4d  XRk     kaf+rafe
פֿ       0xfb4e  XRp     pe+rafe
ﭏ       0xfb4f  Xal     alef-lamed

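For the 27 letters in the first table the two encodings differ only by a constant offset: the CP1255 byte is the Unicode code point minus 0x5d0 plus 0xe0. The short Python check below confirms this with Python's built-in "cp1255" codec and shows the two-byte UTF-8 form of 'alef. It covers only the letters, not the vowel marks; cp1255_byte() is a hypothetical helper.

```python
def cp1255_byte(letter: str) -> int:
    """The single CP1255 byte for a Hebrew character, via Python's codec."""
    return letter.encode("cp1255")[0]


# 'alef (0x5d0) up to and including tav (0x5ea)
for code_point in range(0x5D0, 0x5EB):
    assert cp1255_byte(chr(code_point)) == code_point - 0x5D0 + 0xE0

alef = "\u05d0"
print(hex(cp1255_byte(alef)))         # 0xe0
print(alef.encode("utf-8").hex(" "))  # d7 90
```
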
==============================================================================
10. Using UTF-8                         *mbyte-utf8* *UTF-8* *utf-8* *utf8*
                                                        *Unicode* *unicode*
The Unicode character set was designed to include all characters from other
character sets. Therefore it is possible to write text in any language using
Unicode (with a few rarely used languages excluded). It is also mostly
possible to mix these languages in one file, which is impossible with other
encodings.

Unicode can be encoded in several ways. The two most popular ones are UCS-2,
which uses 16-bit words, and UTF-8, which uses one or more bytes for each
character. Vim can support all of these encodings, but always uses UTF-8
internally.

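The difference between the two encodings is easiest to see by encoding a few characters. The Python sketch below writes out the UTF-8 bit layout by hand (utf8_encode() is a hypothetical helper; it stops at U+10FFFF and leaves out the older five and six byte forms) and compares the result with the 16-bit UCS-2 form, obtained here from Python's "utf-16-be" codec, which gives the same bytes as UCS-2 for these characters.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point (up to U+10FFFF) as UTF-8, bit by bit."""
    if cp < 0x80:                          # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                         # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:                       # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:                     # 11110xxx + three continuation bytes
        return bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    raise ValueError("code point out of range")


for ch in "a\u00e9\u20ac":                 # "a", e-acute, euro sign
    print(f"U+{ord(ch):04X}", utf8_encode(ord(ch)).hex(" "),
          "| UCS-2:", ch.encode("utf-16-be").hex(" "))
# U+0061 61 | UCS-2: 00 61
# U+00E9 c3 a9 | UCS-2: 00 e9
# U+20AC e2 82 ac | UCS-2: 20 ac
```
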
Vim has comprehensive UTF-8 support. It appears to work in:
- xterm with UTF-8 support enabled
- Athena, Motif and GTK GUI
- MS-Windows GUI

Double-width characters are supported. This works best with 'guifontwide' or
'guifontset'. When using only 'guifont' the wide characters are drawn in the
normal width and a space is added to fill the gap. Note that the 'guifontset'
option is no longer relevant in the GTK+ 2 GUI.

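Whether a character takes one or two screen cells can be approximated from the Unicode East Asian Width property. The Python sketch below (display_cells() is a hypothetical helper) counts "W" and "F" characters as two cells, composing characters as zero and everything else as one. Vim uses its own tables, so the result may differ for some characters.

```python
import unicodedata


def display_cells(text: str) -> int:
    """Approximate the number of screen cells needed for text."""
    cells = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue                      # drawn on top of the previous char
        wide = unicodedata.east_asian_width(ch) in ("W", "F")
        cells += 2 if wide else 1
    return cells


print(display_cells("abc"))                  # 3
print(display_cells("\u65e5\u672c\u8a9e"))   # 6 (three CJK ideographs)
```
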
Up to two combining characters can be used. The combining character is drawn
on top of the preceding character. When editing text a composing character is
mostly considered part of the preceding character. For example "x" will
delete a character and its following composing characters by default. If the
'delcombine' option is on, then pressing 'x' will delete the combining
characters, one at a time, then the base character. But when inserting, you
type the first character and the following composing characters separately,
after which they will be joined. The "r" command will not allow you to type a
combining character, because it doesn't know one is coming. Use "R" instead.

Bytes which are not part of a valid UTF-8 byte sequence are handled like a
single character and displayed as <xx>, where "xx" is the hex value of the
byte.

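The <xx> notation can be imitated with a custom error handler for Python's UTF-8 decoder. The handler name "vim_hex" is an arbitrary choice for this sketch. One difference from Vim: Python's decoder also rejects overlong sequences, which Vim displays as a character, so those would come out as <xx> here too.

```python
import codecs


def _hex_bytes(err: UnicodeDecodeError):
    """Replace every byte in the invalid range with "<xx>" and continue."""
    bad = err.object[err.start:err.end]
    return "".join(f"<{b:02x}>" for b in bad), err.end


codecs.register_error("vim_hex", _hex_bytes)

print(b"abc \xff\xfe".decode("utf-8", errors="vim_hex"))  # abc <ff><fe>
```
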
Overlong sequences are not handled specially and are displayed like a valid
character. However, search patterns may not match on an overlong sequence.
(An overlong sequence is one that uses more bytes than required for the
character.) An exception is NUL (zero), which is displayed as "<00>".

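Why a search can miss an overlong sequence is easiest to see at the byte level. The Python sketch below uses two hypothetical helpers: decode_two_byte() combines the payload bits of a two-byte sequence without rejecting overlong forms, and is_overlong() flags values that would have fit in a single byte. The overlong form of "A" decodes to 0x41 but does not contain the byte 0x41, so a byte-wise search for "A" does not find it.

```python
def decode_two_byte(seq: bytes) -> int:
    """Combine the bits of a two-byte UTF-8 sequence (110xxxxx 10xxxxxx)."""
    lead, cont = seq
    if (lead & 0xE0) != 0xC0 or (cont & 0xC0) != 0x80:
        raise ValueError("not a two-byte UTF-8 sequence")
    return ((lead & 0x1F) << 6) | (cont & 0x3F)


def is_overlong(seq: bytes) -> bool:
    # Two bytes are only needed for code points 0x80 .. 0x7FF.
    return decode_two_byte(seq) < 0x80


overlong_a = b"\xc1\x81"                  # "A" (0x41) spread over two bytes
print(hex(decode_two_byte(overlong_a)))   # 0x41
print(is_overlong(overlong_a))            # True
print(b"A" in overlong_a)                 # False
print(decode_two_byte(b"\xc0\x80"))       # 0: the overlong NUL
```
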
In the file and buffer the full range of Unicode characters can be used (31
bits). However, displaying only works for 16-bit characters, and only for the
characters present in the selected font.

Useful commands:
- "ga" shows the decimal, hexadecimal and octal value of the character under
  the cursor. If there are composing characters these are shown too. (If the
  message is truncated, use ":messages".)
- "g8" shows the bytes used in a UTF-8 character, also the composing
  characters, as hex numbers.
- ":set encoding=utf-8 fileencodings=" forces using UTF-8 for all files. The
  default is to use the current locale for 'encoding' and to set
  'fileencodings' to automatically detect the encoding of a file.

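The values reported by "ga" and "g8" for a character with a composing character can be approximated in Python. ga_values() and g8_bytes() are hypothetical helpers; they compute the same numbers but do not reproduce the exact layout of Vim's messages.

```python
def ga_values(cluster: str):
    """(decimal, hex, octal) for the base character and each composing one."""
    return [(str(ord(c)), f"{ord(c):x}", f"{ord(c):o}") for c in cluster]


def g8_bytes(cluster: str) -> str:
    """The UTF-8 bytes of the character and its composing characters."""
    return cluster.encode("utf-8").hex(" ")


e_acute = "e\u0301"          # "e" followed by COMBINING ACUTE ACCENT
print(ga_values(e_acute))    # [('101', '65', '145'), ('769', '301', '1401')]
print(g8_bytes(e_acute))     # 65 cc 81
```
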

STARTING VIM

If your current locale uses a UTF-8 encoding, Vim will automatically start
in UTF-8 mode.

If you are using another locale: >

        set encoding=utf-8

You might also want to select the font used for the menus. Unfortunately this
doesn't always work. See the system specific remarks below, and 'langmenu'.


USING UTF-8 IN X-Windows                                   *utf-8-in-xwindows*

Note: This section does not apply to the GTK+ 2 GUI.

You need to specify a font to be used. For double-wide characters another
font is required, which is exactly twice as wide. There are three ways to do
this:

1. Set 'guifont' and let Vim find a matching 'guifontwide'
2. Set 'guifont' and 'guifontwide'
3. Set 'guifontset'

See the documentation for each option for details. Example: >

        :set guifont=-misc-fixed-medium-r-normal--15-140-75-75-c-90-iso10646-1

You might also want to set the font used for the menus. This only works for
Motif. Use the ":hi Menu font={fontname}" command for this. |:highlight|


TYPING UTF-8                                                    *utf-8-typing*

If you are using X-Windows, you should find an input method that supports
UTF-8.

If your system does not provide support for typing UTF-8, you can use the
'keymap' feature. This allows writing a keymap file, which defines a UTF-8
character as a sequence of ASCII characters. See |mbyte-keymap|.

Another method is to set the current locale to the language you want to use
and for which you have an XIM available. Then set 'termencoding' to the
encoding of that locale and Vim will convert the typed characters to
'encoding' for you.

If everything else fails, you can type any character as four hex digits: >

        CTRL-V u 1234

"1234" is interpreted as a hex number. You must type four characters;
prepend a zero if necessary.

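The CTRL-V u rule amounts to reading exactly four hexadecimal digits as a code point. The Python helper below is a hypothetical sketch of that rule only; it is not how Vim reads the keys.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def ctrl_v_u(digits: str) -> str:
    """Return the character for CTRL-V u followed by four hex digits."""
    if len(digits) != 4 or any(c not in HEX_DIGITS for c in digits):
        raise ValueError("type exactly four hex digits, e.g. 00e9")
    return chr(int(digits, 16))


print(ctrl_v_u("0041"))             # A
print(hex(ord(ctrl_v_u("20ac"))))   # 0x20ac (euro sign)
```
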

COMMAND ARGUMENTS                                             *utf-8-char-arg*

Commands like |f|, |F|, |t| and |r| take an argument of one character. For
UTF-8 this argument may include one or two composing characters. These need
to be produced together with the base character; Vim doesn't wait for the
next character to be typed to find out if it is a composing character or not.
Using 'keymap' or |:lmap| is a nice way to type these characters.

The commands that search for a character in a line handle composing characters
as follows. When searching for a character without a composing character,
this will find matches in the text with or without composing characters. When
searching for a character with a composing character, this will only find
matches with that composing character. It was implemented this way because
not everybody is able to type a composing character.

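The matching rule above can be sketched in Python. clusters() groups each base character with the composing characters that follow it, using Unicode combining classes (an assumption of this sketch), and find_char() is a hypothetical stand-in for a command like |f|: a target without composing characters matches on the base character alone, while a target with composing characters must match exactly.

```python
import unicodedata


def clusters(text: str):
    """Split text into base characters plus their composing characters."""
    result = []
    for ch in text:
        if result and unicodedata.combining(ch):
            result[-1] += ch              # attach to the preceding character
        else:
            result.append(ch)
    return result


def find_char(line: str, target: str) -> int:
    """Index (in clusters) of the first match of target, or -1."""
    for i, cluster in enumerate(clusters(line)):
        if len(target) == 1 and cluster[0] == target:
            return i                      # plain target: base char suffices
        if cluster == target:
            return i                      # composed target: exact match
    return -1


line = "cafe\u0301 cafe"                  # first "e" carries a combining acute
print(find_char(line, "e"))               # 3
print(find_char(line, "e\u0301"))         # 3
print(find_char("cafe", "e\u0301"))       # -1
```
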

==============================================================================
11. Overview of options                                        *mbyte-options*

These options are relevant for editing multi-byte files. Check the help in
options.txt for detailed information.

'encoding'      Encoding used for the keyboard and display. It is also the
                default encoding for files.

'fileencoding'  Encoding of a file. When it's different from 'encoding'
                conversion is done when reading or writing the file.

'fileencodings' List of possible encodings of a file. When opening a file
                these will be tried and the first one that doesn't cause an
                error is used for 'fileencoding'.

'charconvert'   Expression used to convert files from one encoding to another.

'formatoptions' The 'm' flag can be included to have formatting break a line
                at a multibyte character of 256 or higher. This is useful for
                languages where a sequence of characters can be broken
                anywhere.

'guifontset'    The list of font names used for a multi-byte encoding. When
                this option is not empty, it replaces 'guifont'.

'keymap'        Specify the name of a keyboard mapping.

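The way 'fileencodings' is used can be sketched as a loop that tries each encoding in turn. detect_fileencoding() is a hypothetical Python helper, and the names in the list are Python codec names, not necessarily the names Vim accepts. Because latin1 accepts any byte sequence, in this sketch it only makes sense at the end of the list.

```python
def detect_fileencoding(data: bytes, fileencodings):
    """Return the first encoding that decodes data without error, or None."""
    for enc in fileencodings:
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue                      # this encoding fails; try the next
        return enc
    return None


text = "\u00e9t\u00e9"                    # "ete" with two e-acutes
print(detect_fileencoding(text.encode("utf-8"), ["utf-8", "latin1"]))   # utf-8
print(detect_fileencoding(text.encode("latin1"), ["utf-8", "latin1"]))  # latin1
```
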
==============================================================================

Contributions specifically for the multi-byte features by:
        Chi-Deok Hwang <hwang@mizi.co.kr>
        Nam SungHyun <namsh@lge.com>
        K.Nagano <nagano@atese.advantest.co.jp>
        Taro Muraoka <koron@tka.att.ne.jp>
        Yasuhiro Matsumoto <mattn@mail.goo.ne.jp>

 vim:tw=78:ts=8:ft=help:norl: