Mercurial > vim
annotate runtime/doc/mbyte.txt @ 16993:0f93d3f72217
Added tag v8.1.1496 for changeset 0f2663c087cd9a8534f63d1aa1a78f4975305a53
author | Bram Moolenaar <Bram@vim.org> |
---|---|
date | Sat, 08 Jun 2019 17:30:06 +0200 |
parents | 0e473e9e70c2 |
children | b9bc47742df6 |
*mbyte.txt*     For Vim version 8.1. Last change: 2019 Apr 28


                  VIM REFERENCE MANUAL    by Bram Moolenaar et al.


Multi-byte support                              *multibyte* *multi-byte*
                                                *Chinese* *Japanese* *Korean*
This is about editing text in languages which have many characters that can
not be represented using one byte (one octet). Examples are Chinese, Japanese
and Korean. Unicode is also covered here.

For an introduction to the most common features, see |usr_45.txt| in the user
manual.
For changing the language of messages and menus see |mlang.txt|.

1.  Getting started                     |mbyte-first|
2.  Locale                              |mbyte-locale|
3.  Encoding                            |mbyte-encoding|
4.  Using a terminal                    |mbyte-terminal|
5.  Fonts on X11                        |mbyte-fonts-X11|
6.  Fonts on MS-Windows                 |mbyte-fonts-MSwin|
7.  Input on X11                        |mbyte-XIM|
8.  Input on MS-Windows                 |mbyte-IME|
9.  Input with a keymap                 |mbyte-keymap|
10. Input with imactivatefunc()         |mbyte-func|
11. Using UTF-8                         |mbyte-utf8|
12. Overview of options                 |mbyte-options|

NOTE: This file contains UTF-8 characters. These may show up as strange
characters or boxes when using another encoding.

==============================================================================
1. Getting started                                      *mbyte-first*

This is a summary of the multibyte features in Vim. If you are lucky it works
as described and you can start using Vim without much trouble. If something
doesn't work you will have to read the rest. Don't be surprised if it takes
quite a bit of work and experimenting to make Vim use all the multi-byte
features. Unfortunately, every system has its own way to deal with multibyte
languages and it is quite complicated.


LOCALE

First of all, you must make sure your current locale is set correctly. If
your system has been installed to use the language, it probably works right
away. If not, you can often make it work by setting the $LANG environment
variable in your shell: >

        setenv LANG ja_JP.EUC

Unfortunately, the name of the locale depends on your system. Japanese might
also be called "ja_JP.EUCjp" or just "ja". To see what is currently used: >

        :language

To change the locale inside Vim use: >

        :language ja_JP.EUC

Vim will give an error message if this doesn't work. This is a good way to
experiment and find the locale name you want to use. But it's always better
to set the locale in the shell, so that it is used right from the start.

See |mbyte-locale| for details.


ENCODING

If your locale works properly, Vim will try to set the 'encoding' option
accordingly. If this doesn't work you can overrule its value: >

        :set encoding=utf-8

See |encoding-values| for a list of acceptable values.

The result is that all the text that is used inside Vim will be in this
encoding. Not only the text in the buffers, but also in registers, variables,
etc. This also means that changing the value of 'encoding' makes the existing
text invalid! The text doesn't change, but it will be displayed wrong.

You can edit files in another encoding than what 'encoding' is set to. Vim
will convert the file when you read it and convert it back when you write it.
See 'fileencoding', 'fileencodings' and |++enc|.


DISPLAY AND FONTS

If you are working in a terminal (emulator) you must make sure it accepts the
same encoding that Vim is working with. If this is not the case, you can
use the 'termencoding' option to make Vim convert text automatically.

For the GUI you must select fonts that work with the current 'encoding'. This
is the difficult part. It depends on the system you are using, the locale and
a few other things. See the chapters on fonts: |mbyte-fonts-X11| for
X-Windows and |mbyte-fonts-MSwin| for MS-Windows.

For GTK+ 2, you can skip most of this section. The option 'guifontset' no
longer exists. You only need to set 'guifont' and everything should "just
work". If your system comes with Xft2 and fontconfig and the current font
does not contain a certain glyph, a different font will be used automatically
if available. The 'guifontwide' option is still supported but usually you do
not need to set it. It is only necessary if the automatic font selection does
not suit your needs.

For X11 you can set the 'guifontset' option to a list of fonts that together
cover the characters that are used. Example for Korean: >

        :set guifontset=k12,r12

Alternatively, you can set 'guifont' and 'guifontwide'. 'guifont' is used for
the single-width characters, 'guifontwide' for the double-width characters.
Thus the 'guifontwide' font must be exactly twice as wide as 'guifont'.
Example for UTF-8: >

        :set guifont=-misc-fixed-medium-r-normal-*-18-120-100-100-c-90-iso10646-1
        :set guifontwide=-misc-fixed-medium-r-normal-*-18-120-100-100-c-180-iso10646-1

You can also set 'guifont' alone; Vim will then try to find a matching
'guifontwide' for you.


INPUT

There are several ways to enter multi-byte characters:
- For X11 XIM can be used. See |XIM|.
- For MS-Windows IME can be used. See |IME|.
- For all systems keymaps can be used. See |mbyte-keymap|.

The options 'iminsert', 'imsearch' and 'imcmdline' can be used to choose
the different input methods or disable them temporarily.

==============================================================================
2. Locale                                               *mbyte-locale*

The easiest setup is when your whole system uses the locale you want to work
in. But it's also possible to set the locale for one shell you are working
in, or just use a certain locale inside Vim.


WHAT IS A LOCALE?                                       *locale*

There are many languages in the world, and at least as many different
cultures and environments. A linguistic environment corresponding to an area
is called a "locale". This includes information about the used language, the
charset, collating order for sorting, date format, currency format and so on.
For Vim only the language and charset really matter.

You can only use a locale if your system has support for it. Some systems
have only a few locales, especially in the USA. The language which you want
to use may not be on your system. In that case you might be able to install
it as an extra package. Check your system documentation for how to do that.

The location in which the locales are installed varies from system to system.
For example, "/usr/share/locale" or "/usr/lib/locale". See your system's
setlocale() man page.

Looking in these directories will show you the exact name of each locale.
Mostly upper/lowercase matters, thus "ja_JP.EUC" and "ja_jp.euc" are
different. Some systems have a locale.alias file, which allows translation
from a short name like "nl" to the full name "nl_NL.ISO_8859-1".
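
On many systems the installed locale names can also be listed from the shell
with the POSIX "locale -a" command, which avoids guessing the directory. The
following is a minimal sketch; the helper name "find_locales" is made up for
this example, and the output depends entirely on what is installed:

```sh
# find_locales PREFIX: list installed locale names that start with PREFIX,
# ignoring case. "find_locales" is a hypothetical helper, not a system command.
find_locales() {
    locale -a 2>/dev/null | grep -i "^$1"
}

# Example use: find_locales ja
```

If nothing is printed, the system probably has no locale installed for that
language.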

Note that X-windows has its own locale stuff, and unfortunately uses locale
names different from what is used elsewhere. This is confusing! For Vim it
matters what the setlocale() function uses, which is generally NOT the
X-windows stuff. You might have to do some experiments to find out what
really works.

                                                        *locale-name*
The (simplified) format of a |locale| name is:

        language
or      language_territory
or      language_territory.codeset

Territory means the country (or part of it), codeset means the |charset|. For
example, the locale name "ja_JP.eucJP" means:
        ja      the language is Japanese
        JP      the country is Japan
        eucJP   the codeset is EUC-JP
But it also could be "ja", "ja_JP.EUC", "ja_JP.ujis", etc. And unfortunately,
the locale name for a specific language, territory and codeset is not unified
and depends on your system.
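
The simplified name format above can be taken apart with plain shell
parameter expansion. This is only a sketch: "parse_locale" is a hypothetical
helper name, it handles just the three forms listed above, and it sets the
global shell variables name, territory and codeset as a side effect:

```sh
# parse_locale NAME: split language[_territory][.codeset] into its parts.
# Hypothetical helper for illustration; not part of Vim or the system.
parse_locale() {
    name=$1
    territory=
    codeset=
    case $name in
        *.*) codeset=${name#*.}; name=${name%%.*} ;;
    esac
    case $name in
        *_*) territory=${name#*_}; name=${name%%_*} ;;
    esac
    printf 'language=%s territory=%s codeset=%s\n' "$name" "$territory" "$codeset"
}

parse_locale ja_JP.eucJP    # prints: language=ja territory=JP codeset=eucJP
parse_locale ja             # prints: language=ja territory= codeset=
```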

Examples of locale name:
    charset         language                locale name ~
    GB2312          Chinese (simplified)    zh_CN.EUC, zh_CN.GB2312
    Big5            Chinese (traditional)   zh_TW.BIG5, zh_TW.Big5
    CNS-11643       Chinese (traditional)   zh_TW
    EUC-JP          Japanese                ja, ja_JP.EUC, ja_JP.ujis, ja_JP.eucJP
    Shift_JIS       Japanese                ja_JP.SJIS, ja_JP.Shift_JIS
    EUC-KR          Korean                  ko, ko_KR.EUC


USING A LOCALE

To start using a locale for the whole system, see the documentation of your
system. Mostly you need to set it in a configuration file in "/etc".

To use a locale in a shell, set the $LANG environment value. When you want to
use Korean and the |locale| name is "ko", do this:

    sh:    export LANG=ko
    csh:   setenv LANG ko

You can put this in your ~/.profile or ~/.cshrc file to always use it.

To use a locale in Vim only, use the |:language| command: >

        :language ko

Put this in your ~/.vimrc file to use it always.

Or specify $LANG when starting Vim:

    sh:    LANG=ko vim {vim-arguments}
    csh:   env LANG=ko vim {vim-arguments}

You could make a small shell script for this.
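
One possible form of such a helper, written as an sh function that could be
placed in ~/.profile, is sketched below. The name "kvim" and the VIM_PROG
variable are assumptions of this example; "ko" is the example locale name
used above and must be replaced by a name that exists on your system:

```sh
# kvim: start Vim with $LANG set to "ko" for this one invocation only.
# "kvim" is a hypothetical name. VIM_PROG is an assumption of this sketch:
# it lets you select another executable, such as gvim.
kvim() {
    LANG=ko "${VIM_PROG:-vim}" "$@"
}

# Example use: kvim file.txt
```

The locale of the calling shell is left unchanged, because the assignment
applies only to the command that is started.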

==============================================================================
3. Encoding                                             *mbyte-encoding*

Vim uses the 'encoding' option to specify how characters are identified and
encoded when they are used inside Vim. This applies to all the places where
text is used, including buffers (files loaded into memory), registers and
variables.

                                                        *charset* *codeset*
Charset is another name for encoding. There are subtle differences, but these
don't matter when using Vim. "codeset" is another similar name.

Each character is encoded as one or more bytes. When all characters are
encoded with one byte, we call this a single-byte encoding. The most often
used one is called "latin1". This limits the number of characters to 256.
Some of these are control characters, thus even fewer can be used for text.

When some characters use two or more bytes, we call this a multi-byte
encoding. This allows using many more than 256 characters, which is required
for most East Asian languages.

Most multi-byte encodings use one byte for the first 128 characters. These
are equal to ASCII, which makes it easy to exchange plain-ASCII text, no
matter what language is used. Thus you might see the right text even when the
encoding was set wrong.
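
To make the difference between single-byte and multi-byte encodings concrete,
the bytes of a character can be dumped in the shell. This is a small sketch
under the assumption that the standard "od", "tr" and "sed" utilities are
available; "hexbytes" is a made-up helper name, and the bytes are written
with octal escapes so that the example does not depend on your locale:

```sh
# hexbytes STRING: print the bytes of STRING as space-separated hex pairs.
# Hypothetical helper for illustration only.
hexbytes() {
    bytes=$(printf '%s' "$1" | od -An -tx1 | tr -d '\n' |
        sed 's/^ *//; s/ *$//; s/  */ /g')
    printf '%s\n' "$bytes"
}

hexbytes A                         # ASCII "A", same byte in latin1 and UTF-8: 41
hexbytes "$(printf '\351')"        # e-acute in latin1, one byte:  e9
hexbytes "$(printf '\303\251')"    # e-acute in UTF-8, two bytes:  c3 a9
```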

                                                        *encoding-names*
Vim can use many different character encodings. There are three major groups:

1   8bit        Single-byte encodings, 256 different characters. Mostly used
                in USA and Europe. Example: ISO-8859-1 (Latin1). All
                characters occupy one screen cell only.

2   2byte       Double-byte encodings, over 10000 different characters.
                Mostly used in Asian countries. Example: euc-kr (Korean).
                The number of screen cells is equal to the number of bytes
                (except for euc-jp when the first byte is 0x8e).

u   Unicode     Universal encoding, can replace all others. ISO 10646.
                Millions of different characters. Example: UTF-8. The
                relation between bytes and screen cells is complex.

Other encodings cannot be used by Vim internally. But files in other
encodings can be edited by using conversion, see 'fileencoding'.
Note that all encodings must use ASCII for the characters up to 128 (except
when compiled for EBCDIC).

Supported 'encoding' values are:                        *encoding-values*
1   latin1        8-bit characters (ISO 8859-1, also used for cp1252)
1   iso-8859-n    ISO_8859 variant (n = 2 to 15)
1   koi8-r        Russian
1   koi8-u        Ukrainian
1   macroman      MacRoman (Macintosh encoding)
1   8bit-{name}   any 8-bit encoding (Vim specific name)
1   cp437         similar to iso-8859-1
1   cp737         similar to iso-8859-7
1   cp775         Baltic
1   cp850         similar to iso-8859-4
1   cp852         similar to iso-8859-1
1   cp855         similar to iso-8859-2
1   cp857         similar to iso-8859-5
1   cp860         similar to iso-8859-9
1   cp861         similar to iso-8859-1
1   cp862         similar to iso-8859-1
1   cp863         similar to iso-8859-8
1   cp865         similar to iso-8859-1
1   cp866         similar to iso-8859-5
1   cp869         similar to iso-8859-7
1   cp874         Thai
1   cp1250        Czech, Polish, etc.
1   cp1251        Cyrillic
1   cp1253        Greek
1   cp1254        Turkish
1   cp1255        Hebrew
1   cp1256        Arabic
1   cp1257        Baltic
1   cp1258        Vietnamese
1   cp{number}    MS-Windows: any installed single-byte codepage
2   cp932         Japanese (Windows only)
2   euc-jp        Japanese (Unix only)
2   sjis          Japanese (Unix only)
2   cp949         Korean (Unix and Windows)
2   euc-kr        Korean (Unix only)
2   cp936         simplified Chinese (Windows only)
2   euc-cn        simplified Chinese (Unix only)
2   cp950         traditional Chinese (on Unix alias for big5)
2   big5          traditional Chinese (on Windows alias for cp950)
2   euc-tw        traditional Chinese (Unix only)
2   2byte-{name}  Unix: any double-byte encoding (Vim specific name)
2   cp{number}    MS-Windows: any installed double-byte codepage
u   utf-8         32 bit UTF-8 encoded Unicode (ISO/IEC 10646-1)
u   ucs-2         16 bit UCS-2 encoded Unicode (ISO/IEC 10646-1)
u   ucs-2le       like ucs-2, little endian
u   utf-16        ucs-2 extended with double-words for more characters
u   utf-16le      like utf-16, little endian
u   ucs-4         32 bit UCS-4 encoded Unicode (ISO/IEC 10646-1)
u   ucs-4le       like ucs-4, little endian

The {name} can be any encoding name that your system supports. It is passed
to iconv() to convert between the encoding of the file and the current locale.
For MS-Windows "cp{number}" means using codepage {number}.
Examples: >
                :set encoding=8bit-cp1252
                :set encoding=2byte-cp932

The MS-Windows codepage 1252 is very similar to latin1. For practical reasons
the same encoding is used and it's called latin1. 'isprint' can be used to
display the characters 0x80 - 0xA0 or not.

Several aliases can be used, they are translated to one of the names above.
An incomplete list:

1   ansi        same as latin1 (obsolete, for backward compatibility)
2   japan       Japanese: on Unix "euc-jp", on MS-Windows cp932
2   korea       Korean: on Unix "euc-kr", on MS-Windows cp949
2   prc         simplified Chinese: on Unix "euc-cn", on MS-Windows cp936
2   chinese     same as "prc"
2   taiwan      traditional Chinese: on Unix "euc-tw", on MS-Windows cp950
u   utf8        same as utf-8
u   unicode     same as ucs-2
u   ucs2be      same as ucs-2 (big endian)
u   ucs-2be     same as ucs-2 (big endian)
u   ucs-4be     same as ucs-4 (big endian)
u   utf-32      same as ucs-4
u   utf-32le    same as ucs-4le
    default     stands for the default value of 'encoding', depends on the
                environment

For the UCS codes the byte order matters. This is tricky, use UTF-8 whenever
you can. The default is to use big-endian (most significant byte comes
first):
            name        bytes           char ~
            ucs-2             11 22         1122
            ucs-2le           22 11         1122
            ucs-4       11 22 33 44     11223344
            ucs-4le     44 33 22 11     11223344

On MS-Windows systems you often want to use "ucs-2le", because it uses little
endian UCS-2.
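
The byte-order table above can be reproduced with a few lines of shell. This
is an illustration only, not how Vim does it: "ucs_value" is a hypothetical
helper that joins hex byte pairs into the character value, reading them most
significant first for "be" and least significant first for "le":

```sh
# ucs_value ORDER BYTE...: print the character value for a byte sequence.
# ORDER is "be" (big endian) or "le" (little endian); bytes are hex pairs.
# Hypothetical helper for illustration only.
ucs_value() {
    order=$1
    shift
    value=
    for byte in "$@"; do
        case $order in
            be) value=$value$byte ;;    # first byte is most significant
            le) value=$byte$value ;;    # first byte is least significant
        esac
    done
    printf '%s\n' "$value"
}

ucs_value be 11 22          # ucs-2:   prints 1122
ucs_value le 22 11          # ucs-2le: prints 1122
ucs_value le 44 33 22 11    # ucs-4le: prints 11223344
```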

There are a few encodings which are similar, but not exactly the same. Vim
treats them as if they were different encodings, so that conversion will be
done when needed. You might want to use the similar name to avoid conversion
or when conversion is not possible:

        cp932, shift-jis, sjis
        cp936, euc-cn

                                                        *encoding-table*
Normally 'encoding' is equal to your current locale and 'termencoding' is
empty. This means that your keyboard and display work with characters encoded
in your current locale, and Vim uses the same characters internally.

You can make Vim use characters in a different encoding by setting the
'encoding' option to a different value. Since the keyboard and display still
use the current locale, conversion needs to be done. The 'termencoding' then
takes over the value of the current locale, so Vim converts between 'encoding'
and 'termencoding'. Example: >
        :let &termencoding = &encoding
        :set encoding=utf-8

However, not all combinations of values are possible. The table below tells
you how each of the nine combinations works. This is further restricted by
not all conversions being possible, iconv() being present, etc. Since this
depends on the system used, no detailed list can be given.

('tenc' is the short name for 'termencoding' and 'enc' short for 'encoding')

'tenc'      'enc'       remark ~

 8bit        8bit       Works. When 'termencoding' is different from
                        'encoding' typing and displaying may be wrong for some
                        characters, Vim does NOT perform conversion (set
                        'encoding' to "utf-8" to get this).
 8bit        2byte      MS-Windows: works for all codepages installed on your
                        system; you can only type 8bit characters;
                        Other systems: does NOT work.
 8bit        Unicode    Works, but only 8bit characters can be typed directly
                        (others through digraphs, keymaps, etc.); in a
                        terminal you can only see 8bit characters; the GUI can
                        show all characters that the 'guifont' supports.

 2byte       8bit       Works, but typing non-ASCII characters might
                        be a problem.
 2byte       2byte      MS-Windows: works for all codepages installed on your
                        system; typing characters might be a problem when
                        locale is different from 'encoding'.
                        Other systems: Only works when 'termencoding' is equal
                        to 'encoding', you might as well leave it empty.
 2byte       Unicode    works, Vim will translate typed characters.

 Unicode     8bit       works (unusual)
 Unicode     2byte      does NOT work
 Unicode     Unicode    works very well (leaving 'termencoding' empty works
                        the same way, because all Unicode is handled
                        internally as UTF-8)

CONVERSION                                              *charset-conversion*

Vim will automatically convert from one to another encoding in several places:
- When reading a file and 'fileencoding' is different from 'encoding'
- When writing a file and 'fileencoding' is different from 'encoding'
- When displaying characters and 'termencoding' is different from 'encoding'
- When reading input and 'termencoding' is different from 'encoding'
- When displaying messages and the encoding used for LC_MESSAGES differs from
  'encoding' (requires a gettext version that supports this).
- When reading a Vim script where |:scriptencoding| is different from
  'encoding'.
- When reading or writing a |viminfo| file.
Most of these require the |+iconv| feature. Conversion for reading and
writing files may also be specified with the 'charconvert' option.

Useful utilities for converting the charset:
    All:            iconv
        GNU iconv can convert most encodings. Unicode is used as the
        intermediate encoding, which allows conversion from and to all other
        encodings. See http://www.gnu.org/directory/libiconv.html.

    Japanese:       nkf
        Nkf is "Network Kanji code conversion Filter". One of its most
        distinctive features is that it guesses the input Kanji code, so you
        don't need to know the |charset| of the input file. To convert from
        ISO-2022-JP or Shift_JIS to EUC-JP, simply run the following command
        in Vim:
            :%!nkf -e
        Nkf can be found at:
        http://www.sfc.wide.ad.jp/~max/FreeBSD/ports/distfiles/nkf-1.62.tar.gz

    Chinese:        hc
        Hc is "Hanzi Converter". Hc converts a GB file to a Big5 file, or a
        Big5 file to a GB file. Hc can be found at:
        ftp://ftp.cuhk.hk/pub/chinese/ifcss/software/unix/convert/hc-30.tar.gz

    Korean:         hmconv
        Hmconv is a Korean code conversion utility especially for E-mail. It
        can convert between EUC-KR and ISO-2022-KR. Hmconv can be found at:
        ftp://ftp.kaist.ac.kr/pub/hangul/code/hmconv/

    Multilingual:   lv
        Lv is a Powerful Multilingual File Viewer. It can also be used as a
        |charset| converter. Supported |charset| values: ISO-2022-CN,
        ISO-2022-JP, ISO-2022-KR, EUC-CN, EUC-JP, EUC-KR, EUC-TW, UTF-7,
        UTF-8, ISO-8859 series, Shift_JIS, Big5 and HZ. Lv can be found at:
        http://www.ff.iij4u.or.jp/~nrt/lv/index.html
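
As an illustration of the first utility in the list, the iconv command can be
wrapped in a small shell function. This is a sketch that assumes an iconv
command with the standard "-f" and "-t" options is installed; "to_utf8" is a
hypothetical name, and which encoding names are accepted depends on your
iconv implementation:

```sh
# to_utf8 FROM FILE: write FILE, converted from encoding FROM to UTF-8, to
# standard output. Hypothetical helper; FROM is any name your iconv knows.
to_utf8() {
    iconv -f "$1" -t UTF-8 "$2"
}

# Example use: to_utf8 EUC-JP old.txt > new.txt
```

The original file is left untouched, which makes it easy to check the result
before replacing anything.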


                                                        *mbyte-conversion*
When reading and writing files in an encoding different from 'encoding',
conversion needs to be done. These conversions are supported:
- All conversions between Latin-1 (ISO-8859-1), UTF-8, UCS-2 and UCS-4 are
  handled internally.
- For MS-Windows, when 'encoding' is a Unicode encoding, conversion from and
  to any codepage should work.
- Conversion specified with 'charconvert'
- Conversion with the iconv library, if it is available.
        Old versions of GNU iconv() may cause the conversion to fail (they
        request a very large buffer, more than Vim is willing to provide).
        Try getting another iconv() implementation.

                                                        *iconv-dynamic*
On MS-Windows Vim can be compiled with the |+iconv/dyn| feature. This means
Vim will search for the "iconv.dll" and "libiconv.dll" libraries. When
neither of them can be found Vim will still work but some conversions won't be
possible.

==============================================================================
4. Using a terminal                                     *mbyte-terminal*

The GUI fully supports multi-byte characters. It is also possible in a
terminal, if the terminal supports the same encoding that Vim uses. Thus this
is less flexible.

For example, you can run Vim in an xterm with added multi-byte support and/or
|XIM|. Examples are kterm (Kanji term) and hanterm (for Korean), Eterm
(Enlightened terminal) and rxvt.

If your terminal does not support the right encoding, you can set the
'termencoding' option. Vim will then convert the typed characters from
'termencoding' to 'encoding'. And displayed text will be converted from
'encoding' to 'termencoding'. If the encoding supported by the terminal
doesn't include all the characters that Vim uses, this leads to lost
characters. This may mess up the display. If you use a terminal that
supports Unicode, such as the xterm mentioned below, it should work just fine,
since nearly every character set can be converted to Unicode without loss of
information.


UTF-8 IN XFREE86 XTERM                                  *UTF8-xterm*

This is a short explanation of how to use UTF-8 character encoding in the
xterm that comes with XFree86 by Thomas Dickey (text by Markus Kuhn).

Get the latest xterm version, which now has UTF-8 support:

        http://invisible-island.net/xterm/xterm.html

Compile it with "./configure --enable-wide-chars ; make"

Also get the ISO 10646-1 version of various fonts, which is available at

        http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz

and install the font as described in the README file.

Now start xterm with >

  xterm -u8 -fn -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1
or, for bigger characters: >
  xterm -u8 -fn -misc-fixed-medium-r-normal--15-140-75-75-c-90-iso10646-1

and you will have a working UTF-8 terminal emulator. Try both >

   cat utf-8-demo.txt
   vim utf-8-demo.txt

with the demo text that comes with ucs-fonts.tar.gz in order to see
whether there are any problems with UTF-8 in your xterm.

For Vim you may need to set 'encoding' to "utf-8".

==============================================================================
5. Fonts on X11                                         *mbyte-fonts-X11*

Unfortunately, using fonts in X11 is complicated. The name of a single-byte
font is a long string. For multi-byte fonts we need several of these...

Note: Most of this is no longer relevant for GTK+ 2. Selecting a font via
its XLFD is not supported; see 'guifont' for an example of how to
set the font. Do yourself a favor and ignore the |XLFD| and |xfontset|
sections below.

First of all, Vim only accepts fixed-width fonts for displaying text. You
cannot use proportionally spaced fonts. This excludes many of the available
(and nicer looking) fonts. However, for menus and tooltips any font can be
used.

Note that Display and Input are independent. It is possible to see your
language even though you have no input method for it.

You should get a default font for menus and tooltips that works, but it might
be ugly. Read the following to find out how to select a better font.


X LOGICAL FONT DESCRIPTION (XLFD)
                                                        *XLFD*
XLFD is the X font name and contains the information about the font size,
charset, etc. The name is in this format:

FOUNDRY-FAMILY-WEIGHT-SLANT-WIDTH-STYLE-PIXEL-POINT-X-Y-SPACE-AVE-CR-CE

Each field means:

- FOUNDRY:  FOUNDRY field. The company that created the font.
- FAMILY:   FAMILY_NAME field. Basic font family name. (helvetica, gothic,
            times, etc)
- WEIGHT:   WEIGHT_NAME field. How thick the letters are. (light, medium,
            bold, etc)
- SLANT:    SLANT field.
                r:  Roman (no slant)
                i:  Italic
                o:  Oblique
                ri: Reverse Italic
                ro: Reverse Oblique
                ot: Other
                number: Scaled font
- WIDTH:    SETWIDTH_NAME field. Width of characters. (normal, condensed,
            narrow, double wide)
- STYLE:    ADD_STYLE_NAME field. Extra info to describe font. (Serif, Sans
            Serif, Informal, Decorated, etc)
- PIXEL:    PIXEL_SIZE field. Height, in pixels, of characters.
- POINT:    POINT_SIZE field. Ten times height of characters in points.
- X:        RESOLUTION_X field. X resolution (dots per inch).
- Y:        RESOLUTION_Y field. Y resolution (dots per inch).
- SPACE:    SPACING field.
                p:  Proportional
                m:  Monospaced
                c:  CharCell
- AVE:      AVERAGE_WIDTH field. Ten times average width in pixels.
- CR:       CHARSET_REGISTRY field. The name of the charset group.
- CE:       CHARSET_ENCODING field. The rest of the charset name. For some
            charsets, such as JIS X 0208, if this field is 0, code points
            have the same value as GL, and GR if 1.

For example, a 16-dot font corresponding to JIS X 0208 is written like:
    -misc-fixed-medium-r-normal--16-110-100-100-c-160-jisx0208.1990-0
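
Because the fields are separated by "-" and the name starts with a "-", a
single field can be picked out of an XLFD with the standard "cut" utility.
The sketch below uses a made-up helper name, "xlfd_field", and numbers the
fields from 1 (FOUNDRY) to 14 (CE) in the order listed above; it assumes that
no field itself contains a "-":

```sh
# xlfd_field XLFD N: print field N of an XLFD (1 = FOUNDRY ... 14 = CE).
# The name starts with "-", so XLFD field N is the (N+1)th "-"-separated
# piece. Hypothetical helper for illustration only.
xlfd_field() {
    printf '%s\n' "$1" | cut -d- -f"$(($2 + 1))"
}

font=-misc-fixed-medium-r-normal--16-110-100-100-c-160-jisx0208.1990-0
xlfd_field "$font" 7     # PIXEL: prints 16
xlfd_field "$font" 13    # CR:    prints jisx0208.1990
xlfd_field "$font" 14    # CE:    prints 0
```

Field 6 (STYLE) is empty in this example, which is why two "-" characters
appear next to each other in the name.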


X FONTSET
                                                *fontset* *xfontset*
A single-byte charset is typically associated with one font. For multi-byte
charsets a combination of fonts is often used. This means that one group of
characters is used from one font and another group from another font (which
might be double wide). This collection of fonts is called a fontset.

Which fonts are required in a fontset depends on the current locale. X
windows maintains a table of which groups of characters are required for a
locale. You have to specify all the fonts that a locale requires in the
'guifontset' option.

Setting the 'guifontset' option also means that all font names will be handled
as a fontset name.  Also the ones used for the "font" argument of the
|:highlight| command.

Note the difference between 'guifont' and 'guifontset': In 'guifont'
the comma-separated names are alternative names, one of which will be
used.  In 'guifontset' the whole string is one fontset name,
including the commas.  It is not possible to specify alternative
fontset names.
This example works on many X11 systems: >
	:set guifontset=-*-*-medium-r-normal--16-*-*-*-c-*-*-*
<
The fonts must match with the current locale.  If fonts for the character sets
that the current locale uses are not included, setting 'guifontset' will fail.

NOTE: The fontset always uses the current locale, even though 'encoding' may
639 be set to use a different charset. In that situation you might want to use | |
640 'guifont' and 'guifontwide' instead of 'guifontset'. | |
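
As a minimal sketch of that alternative, assuming 'encoding' is "utf-8" and
that these two ISO 10646 fonts are installed (the font names are only
illustrative; use xlsfonts to see what your system offers): >

	" Leave 'guifontset' empty so that 'guifont' and 'guifontwide' are used.
	set guifontset=
	" Font for normal width characters.
	set guifont=-misc-fixed-medium-r-normal-*-18-120-100-100-c-90-iso10646-1
	" Font for double-width characters, twice as wide as 'guifont'.
	set guifontwide=-misc-fixed-medium-r-normal-*-18-120-100-100-c-180-iso10646-1
<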
641 | |
642 Example: | |
643 |charset| language "groups of characters" ~ | |
644 GB2312 Chinese (simplified) ISO-8859-1 and GB 2312 | |
645 Big5 Chinese (traditional) ISO-8859-1 and Big5 | |
646 CNS-11643 Chinese (traditional) ISO-8859-1, CNS 11643-1 and CNS 11643-2 | |
647 EUC-JP Japanese JIS X 0201 and JIS X 0208 | |
648 EUC-KR Korean ISO-8859-1 and KS C 5601 (KS X 1001) | |
649 | |
650 You can search for fonts using the xlsfonts command. For example, when you're | |
651 searching for a font for KS C 5601: > | |
652 xlsfonts | grep ksc5601 | |
653 | |
654 This is complicated and confusing. You might want to consult the X-Windows | |
655 documentation if there is something you don't understand. | |
656 | |
657 *base_font_name_list* | |
658 When you have found the names of the fonts you want to use, you need to set | |
659 the 'guifontset' option. You specify the list by concatenating the font names | |
660 and putting a comma in between them. | |
661 | |
662 For example, when you use the ja_JP.eucJP locale, this requires JIS X 0201 | |
663 and JIS X 0208. You could supply a list of fonts that explicitly specifies | |
664 the charsets, like: > | |
665 | |
666 :set guifontset=-misc-fixed-medium-r-normal--14-130-75-75-c-140-jisx0208.1983-0, | |
667 \-misc-fixed-medium-r-normal--14-130-75-75-c-70-jisx0201.1976-0 | |
668 | |
669 Alternatively, you can supply a base font name list that omits the charset | |
670 name, letting X-Windows select font characters required for the locale. For | |
671 example: > | |
672 | |
673 :set guifontset=-misc-fixed-medium-r-normal--14-130-75-75-c-140, | |
674 \-misc-fixed-medium-r-normal--14-130-75-75-c-70 | |
675 | |
676 Alternatively, you can supply a single base font name that allows X-Windows to | |
677 select from all available fonts. For example: > | |
678 | |
679 :set guifontset=-misc-fixed-medium-r-normal--14-* | |
680 | |
681 Alternatively, you can specify alias names. See the fonts.alias file in the | |
682 fonts directory (e.g., /usr/X11R6/lib/X11/fonts/). For example: > | |
683 | |
684 :set guifontset=k14,r14 | |
685 < | |
686 *E253* | |
687 Note that in East Asian fonts, the standard character cell is square. When | |
688 mixing a Latin font and an East Asian font, the East Asian font width should | |
689 be twice the Latin font width. | |
690 | |
691 If 'guifontset' is not empty, the "font" argument of the |:highlight| command | |
692 is also interpreted as a fontset. For example, you should use for | |
693 highlighting: > | |
694 :hi Comment font=english_font,your_font | |
695 If you use a wrong "font" argument you will get an error message. | |
696 Also make sure that you set 'guifontset' before setting fonts for highlight | |
697 groups. | |
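
A minimal sketch of that ordering for a |gvimrc|, assuming the X11 GUI with
the |+xfontset| feature.  The fontset name is only an illustration and must be
replaced with one that matches your locale: >

	if has('gui_running') && has('xfontset')
	  try
	    " Set the fontset first ...
	    set guifontset=-misc-fixed-medium-r-normal--14-*
	    " ... so that this "font" argument is interpreted as a fontset.
	    highlight Comment font=-misc-fixed-medium-r-normal--14-*
	  catch /^Vim\%((\a\+)\)\=:E\d\+/
	    " A wrong fontset or "font" argument ends up here.
	    echomsg 'Cannot use fontset: ' . v:exception
	  endtry
	endif
<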
698 | |
699 | |
700 USING RESOURCE FILES | |
701 | |
702 Instead of specifying 'guifontset', you can set X11 resources and Vim will | |
703 pick them up. This is only for people who know how X resource files work. | |
704 | |
705 For Motif and Athena insert these three lines in your $HOME/.Xdefaults file: | |
706 | |
707 Vim.font: |base_font_name_list| | |
708 Vim*fontSet: |base_font_name_list| | |
709 Vim*fontList: your_language_font | |
710 | |
711 Note: Vim.font is for text area. | |
712 Vim*fontSet is for menu. | |
713 Vim*fontList is for menu (for Motif GUI) | |
714 | |
715 For example, when you are using Japanese and a 14 dots font, > | |
716 | |
717 Vim.font: -misc-fixed-medium-r-normal--14-* | |
718 Vim*fontSet: -misc-fixed-medium-r-normal--14-* | |
719 Vim*fontList: -misc-fixed-medium-r-normal--14-* | |
720 < | |
721 or: > | |
722 | |
723 Vim*font: k14,r14 | |
724 Vim*fontSet: k14,r14 | |
725 Vim*fontList: k14,r14 | |
726 < | |
727 To have them take effect immediately you will have to do > | |
728 | |
729 xrdb -merge ~/.Xdefaults | |
730 | |
731 Otherwise you will have to stop and restart the X server before the changes | |
732 take effect. | |
733 | |
734 | |
735 The GTK+ version of GUI Vim does not use .Xdefaults, use ~/.gtkrc instead. | |
736 The default mostly works OK. But for the menus you might have to change | |
737 it. Example: > | |
738 | |
739 style "default" | |
740 { | |
741 fontset="-*-*-medium-r-normal--14-*-*-*-c-*-*-*" | |
742 } | |
743 widget_class "*" style "default" | |
744 | |
745 ============================================================================== | |
746 6. Fonts on MS-Windows *mbyte-fonts-MSwin* | |
747 | |
748 The simplest is to use the font dialog to select fonts and try them out. You | |
749 can find this at the "Edit/Select Font..." menu. Once you find a font name | |
750 that works well you can use this command to see its name: > | |
751 | |
752 :set guifont | |
753 | |
754 Then add a command to your |gvimrc| file to set 'guifont': > | |
755 | |
756 :set guifont=courier_new:h12 | |
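
For multi-byte text the font also has to cover the character set you use.  A
minimal sketch for a |gvimrc|, assuming a Japanese font called "MS Gothic" is
installed (the font name and size are only illustrative): >

	if has('gui_win32')
	  " Spaces in the font name are written as underscores; the "c" item
	  " selects the character set.
	  set guifont=MS_Gothic:h12:cSHIFTJIS
	endif
<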
757 | |
758 ============================================================================== | |
759 7. Input on X11 *mbyte-XIM* | |
760 | |
761 X INPUT METHOD (XIM) BACKGROUND *XIM* *xim* *x-input-method* | |
762 | |
XIM is an international input module for X.  There are two kinds of structures:
Xlib unit type and |IM-server| (Input-Method server) type.  |IM-server| type
is suitable for complex input, such as CJK.

- IM-server
							*IM-server*
  In |IM-server| type input structures, the input event is handled in one of
  two ways: the FrontEnd system or the BackEnd system.  In the FrontEnd
  system, input events are snatched by the |IM-server| first, then the
  |IM-server| gives the application the result of the input.  The BackEnd
  system works in the reverse order.  MS-Windows adopts the BackEnd system.
  In X, most |IM-server|s adopt the FrontEnd system.  The demerit of the
  BackEnd system is the large overhead in communication, but it provides safe
  synchronization with no restrictions on applications.

  For example, xwnmo and kinput2 are Japanese |IM-server|s, both using the
  FrontEnd system.  Xwnmo is distributed with Wnn (see below), kinput2 can be
  found at: ftp://ftp.sra.co.jp/pub/x11/kinput2/

  For Chinese, there's a great XIM server named "xcin", with which you can
  input both Traditional and Simplified Chinese characters.  It can also
  accept other locales if you make a correct input table.  Xcin can be found
  at: http://cle.linux.org.tw/xcin/
  Others are scim: http://scim.freedesktop.org/ and fcitx:
  http://www.fcitx.org/

789 - Conversion Server | |
790 *conversion-server* | |
  Some systems need an additional server: a conversion server.  Most Japanese
  |IM-server|s need a Kana-Kanji conversion server.  For Chinese input it
  depends on the input method; some methods need a PinYin or ZhuYin to HanZi
  conversion server.  For Korean input, a Hangul-Hanja conversion server is
  needed if you want to input Hanja.

  For example, the Japanese input process is divided into two steps: first
  pre-input in Hira-gana, then Kana-Kanji conversion.  There are very many
  Kanji characters (6349 Kanji characters are defined in JIS X 0208), while
  there are only 76 Hira-gana characters.  So, first, we pre-input text as
  pronounced in Hira-gana, second, we convert Hira-gana to Kanji or Kata-Kana,
  if needed.  There are several Kana-Kanji conversion servers: jserver
  (distributed with Wnn, see below) and canna.  Canna can be found at:
  http://canna.sourceforge.jp/

  There is a good input system: Wnn 4.2.  Wnn 4.2 contains:
      xwnmo (|IM-server|)
      jserver (Japanese Kana-Kanji conversion server)
      cserver (Chinese PinYin or ZhuYin to simplified HanZi conversion server)
      tserver (Chinese PinYin or ZhuYin to traditional HanZi conversion server)
      kserver (Hangul-Hanja conversion server)
  Wnn 4.2 for several systems can be found at various places on the internet.
  Use the RPM or port for your system.
814 | |
815 | |
816 - Input Style | |
817 *xim-input-style* | |
  When inputting CJK, there are four areas:
      1. The area to display the input while it is being composed.
      2. The area to display the currently active input mode.
      3. The area to display the next candidate for the selection.
      4. The area to display other tools.

  The third area is needed when converting.  For example, in Japanese
  inputting, multiple Kanji characters could have the same pronunciation, so
  a sequence of Hira-gana characters could map to several distinct sequences
  of Kanji characters.
828 | |
829 The first and second areas are defined in international input of X with the | |
830 names of "Preedit Area", "Status Area" respectively. The third and fourth | |
831 areas are not defined and are left to be managed by the |IM-server|. In the | |
832 international input, four input styles have been defined using combinations | |
833 of Preedit Area and Status Area: |OnTheSpot|, |OffTheSpot|, |OverTheSpot| | |
834 and |Root|. | |
835 | |
  Currently, GUI Vim supports three styles, |OverTheSpot|, |OffTheSpot| and
  |Root|.
  When compiled with the |+GUI_GTK| feature, GUI Vim supports two styles,
  |OnTheSpot| and |OverTheSpot|.  You can select the style with the 'imstyle'
  option.

  *. on-the-spot					*OnTheSpot*
     Preedit Area and Status Area are drawn by the client application in
     the area of the application.  The client application is directed by the
     |IM-server| to display all pre-edit data at the location of text
     insertion.  The client registers callbacks invoked by the input method
     during pre-editing.
  *. over-the-spot					*OverTheSpot*
     Status Area is created in a fixed position within the area of the
     application; in the case of Vim, the position is the additional status
     line.  Preedit Area is made at the present input position of the
     application.  The input method displays pre-edit data in a window which
     it brings up directly over the text insertion position.
  *. off-the-spot					*OffTheSpot*
     Preedit Area and Status Area are displayed in the area of the
     application; in the case of Vim, the area is the additional status line.
     The client application provides display windows for the pre-edit data to
     the input method, which displays into them directly.
  *. root-window					*Root*
     Preedit Area and Status Area are outside of the application.  The input
     method displays all pre-edit data in a separate area of the screen in a
     window specific to the input method.
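
  A minimal sketch of selecting the style, assuming a Vim that has the
  'imstyle' option (the exists() check skips the setting otherwise): >

	if exists('+imstyle')
	  " 0: on-the-spot, 1: over-the-spot
	  set imstyle=1
	endif
<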
863 | |
864 | |
USING XIM			*multibyte-input* *E284* *E286* *E287* *E288*
				*E285* *E289*

868 Note that Display and Input are independent. It is possible to see your | |
869 language even though you have no input method for it. But when your Display | |
870 method doesn't match your Input method, the text will be displayed wrong. | |
871 | |
872 Note: You can not use IM unless you specify 'guifontset'. | |
873 Therefore, Latin users, you have to also use 'guifontset' | |
874 if you use IM. | |
875 | |
876 To input your language you should run the |IM-server| which supports your | |
877 language and |conversion-server| if needed. | |
878 | |
The next 3 lines should be put in your ~/.Xdefaults file.  They are common for
all X applications which use |XIM|.  If you already use |XIM|, you can skip
this. >

	*international: True
	*.inputMethod: your_input_server_name
	*.preeditType: your_input_style
<
your_input_server_name	is your |IM-server| name (check your |IM-server|
			manual).
your_input_style	is one of |OverTheSpot|, |OffTheSpot|, |Root|.  See
			also |xim-input-style|.

*international may not be necessary if you use X11R6.
*.inputMethod and *.preeditType are optional if you use X11R6.
894 | |
895 For example, when you are using kinput2 as |IM-server|, > | |
896 | |
897 *international: True | |
898 *.inputMethod: kinput2 | |
899 *.preeditType: OverTheSpot | |
900 < | |
901 When using |OverTheSpot|, GUI Vim always connects to the IM Server even in | |
902 Normal mode, so you can input your language with commands like "f" and "r". | |
903 But when using one of the other two methods, GUI Vim connects to the IM Server | |
904 only if it is not in Normal mode. | |
905 | |
If your IM Server does not support |OverTheSpot|, and if you want to use your
language with some Normal mode command like "f" or "r", then you should use a
localized xterm or an xterm which supports |XIM|.
909 | |
910 If needed, you can set the XMODIFIERS environment variable: | |
911 | |
912 sh: export XMODIFIERS="@im=input_server_name" | |
913 csh: setenv XMODIFIERS "@im=input_server_name" | |
914 | |
915 For example, when you are using kinput2 as |IM-server| and sh, > | |
916 | |
917 export XMODIFIERS="@im=kinput2" | |
918 < | |
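To check from within Vim whether the variable reached the editor, a minimal
sketch (the wording of the messages is made up for this example): >

	if has('xim')
	  if empty($XMODIFIERS)
	    echomsg 'XMODIFIERS is not set'
	  else
	    echomsg 'XMODIFIERS is ' . $XMODIFIERS
	  endif
	endif
<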
919 | |
920 FULLY CONTROLLED XIM | |
921 | |
922 You can fully control XIM, like with IME of MS-Windows (see |multibyte-ime|). | |
923 This is currently only available for the GTK GUI. | |
924 | |
925 Before using fully controlled XIM, one setting is required. Set the | |
926 'imactivatekey' option to the key that is used for the activation of the input | |
927 method. For example, when you are using kinput2 + canna as IM Server, the | |
928 activation key is probably Shift+Space: > | |
929 | |
930 :set imactivatekey=S-space | |
931 | |
932 See 'imactivatekey' for the format. | |
933 | |
934 ============================================================================== | |
935 8. Input on MS-Windows *mbyte-IME* | |
936 | |
937 (Windows IME support) *multibyte-ime* *IME* | |
938 | |
{only works in the Windows GUI and when compiled with the |+multi_byte_ime|
feature}

To input multibyte characters on Windows, you can use an Input Method Editor
(IME).  While editing text you must switch the IME status (on/off) many, many
times.  Because an active IME hooks all of your key input, you cannot type
'j', 'k', or almost any other key to Vim directly.

The |+multi_byte_ime| feature helps with this.  It reduces the number of
times you have to switch the IME status manually.  In Normal mode there is
almost no need for the IME, even when editing multibyte text.  So when you
exit Insert mode with <Esc>, Vim remembers the last IME status and forces the
IME off.  When you re-enter Insert mode, Vim automatically restores the
remembered IME status.

This works not only for switching between Insert and Normal mode, but also for
search command input and Replace mode.
The options 'iminsert', 'imsearch' and 'imcmdline' can be used to choose
the different input methods or disable them temporarily.

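A minimal sketch of such settings for a |vimrc|, assuming you want the input
method to be off until you switch it on yourself (the chosen values are one
possible preference, not a recommendation): >

	" Initially the input method is off in Insert mode.
	set iminsert=0
	" Use the value of 'iminsert' for search patterns as well.
	set imsearch=-1
	" Do not turn the input method on when starting a command line.
	set noimcmdline
<
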
WHAT IS IME
    IME is a part of the East Asian versions of Windows.  It helps you to
    input multibyte characters.  English and other language versions of
    Windows do not have an IME.  (Usually there is no need for one.)  But
    there is one called Microsoft Global IME.  Global IME is a part of
    Internet Explorer 4.0 or above.  You can get more information about
    Global IME at the URLs below.
964 | |
WHAT IS GLOBAL IME					*global-ime*
    Global IME makes it possible to input Chinese, Japanese, and Korean text
    into a Vim buffer on any language version of Windows 98, Windows 95, and
    Windows NT 4.0.
    On Windows 2000 and XP it should work as well (without downloading).  On
    Windows 2000 Professional, Global IME is built in, and the Input Locales
    can be added through Control Panel/Regional Options/Input Locales.
    Please see the URLs below for details of Global IME.  You can also find
    various language versions of Global IME at the same place.

    - Global IME detailed information.
	http://search.microsoft.com/results.aspx?q=global+ime

    - Active Input Method Manager (Global IME)
	http://msdn.microsoft.com/en-us/library/aa741221(v=VS.85).aspx

    Support for Global IME is an experimental feature.

NOTE: For IME to work you must make sure the input locales of your language
are added to your system.  The exact location of this depends on the version
of Windows you use.  For example, on my Windows 2000 box:
1. Control Panel
2. Regional Options
3. Input Locales Tab
4. Add Installed input locales -> Chinese(PRC)
   The default is still English (United States)
991 | |
992 | |
Cursor color when IME or XIM is on				*CursorIM*
    There is a cute little feature for IME.  The cursor can indicate the
    status of IME by changing its color.  Usually the status of IME is
    indicated by a little icon at a corner of the desktop (or taskbar), which
    makes it hard to verify.  This feature helps with that.
    This works in the same way when using XIM.

    You can select the cursor color when the status is on by using the
    highlight group CursorIM.  For example, add these lines to your |gvimrc|: >

	if has('multi_byte_ime')
	    highlight Cursor guifg=NONE guibg=Green
	    highlight CursorIM guifg=NONE guibg=Purple
	endif
<
    The cursor color with IME off is green.  A purple cursor indicates that
    the status is on.
1010 | |
1011 ============================================================================== | |
1012 9. Input with a keymap *mbyte-keymap* | |
1013 | |
1014 When the keyboard doesn't produce the characters you want to enter in your | |
1015 text, you can use the 'keymap' option. This will translate one or more | |
1016 (English) characters to another (non-English) character. This only happens | |
1017 when typing text, not when typing Vim commands. This avoids having to switch | |
1018 between two keyboard settings. | |
{only available when compiled with the |+keymap| feature}

1021 The value of the 'keymap' option specifies a keymap file to use. The name of | |
1022 this file is one of these two: | |
1023 | |
1024 keymap/{keymap}_{encoding}.vim | |
1025 keymap/{keymap}.vim | |
1026 | |
1027 Here {keymap} is the value of the 'keymap' option and {encoding} of the | |
1028 'encoding' option. The file name with the {encoding} included is tried first. | |
1029 | |
1030 'runtimepath' is used to find these files. To see an overview of all | |
1031 available keymap files, use this: > | |
1032 :echo globpath(&rtp, "keymap/*.vim") | |
1033 | |
1034 In Insert and Command-line mode you can use CTRL-^ to toggle between using the | |
1035 keyboard map or not. |i_CTRL-^| |c_CTRL-^| | |
1036 This flag is remembered for Insert mode with the 'iminsert' option. When | |
1037 leaving and entering Insert mode the previous value is used. The same value | |
1038 is also used for commands that take a single character argument, like |f| and | |
1039 |r|. | |
1040 For Command-line mode the flag is NOT remembered. You are expected to type an | |
1041 Ex command first, which is ASCII. | |
1042 For typing search patterns the 'imsearch' option is used. It can be set to | |
1043 use the same value as for 'iminsert'. | |
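For example, a minimal sketch that loads the "accents" keymap but leaves it
switched off until CTRL-^ is typed: >

	if has('keymap')
	  " Setting 'keymap' also sets 'iminsert' to one ...
	  set keymap=accents
	  " ... so reset it to start with the keymap off.
	  set iminsert=0
	  " Use the value of 'iminsert' for search patterns as well.
	  set imsearch=-1
	endif
<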
								*lCursor*
It is possible to give the GUI cursor another color when the language mappings
1046 are being used. This is disabled by default, to avoid that the cursor becomes | |
1047 invisible when you use a non-standard background color. Here is an example to | |
1048 use a brightly colored cursor: > | |
1049 :highlight Cursor guifg=NONE guibg=Green | |
1050 :highlight lCursor guifg=NONE guibg=Cyan | |
1051 < | |
		*keymap-file-format* *:loadk* *:loadkeymap* *E105* *E791*
The keymap file looks something like this: >
1054 | |
1055 " Maintainer: name <email@address> | |
1056 " Last Changed: 2001 Jan 1 | |
1057 | |
1058 let b:keymap_name = "short" | |
1059 | |
1060 loadkeymap | |
1061 a A | |
1062 b B comment | |
1063 | |
1064 The lines starting with a " are comments and will be ignored. Blank lines are | |
1065 also ignored. The lines with the mappings may have a comment after the useful | |
1066 text. | |
1067 | |
1068 The "b:keymap_name" can be set to a short name, which will be shown in the | |
1069 status line. The idea is that this takes less room than the value of | |
1070 'keymap', which might be long to distinguish between different languages, | |
1071 keyboards and encodings. | |
1072 | |
1073 The actual mappings are in the lines below "loadkeymap". In the example "a" | |
1074 is mapped to "A" and "b" to "B". Thus the first item is mapped to the second | |
1075 item. This is done for each line, until the end of the file. | |
1076 These items are exactly the same as what can be used in a |:lnoremap| command, | |
using "<buffer>" to make the mappings local to the buffer.
You can check the result with this command: >
1079 :lmap | |
1080 The two items must be separated by white space. You cannot include white | |
1081 space inside an item, use the special names "<Tab>" and "<Space>" instead. | |
1082 The length of the two items together must not exceed 200 bytes. | |
1083 | |
1084 It's possible to have more than one character in the first column. This works | |
1085 like a dead key. Example: > | |
1086 'a á | |
1087 Since Vim doesn't know if the next character after a quote is really an "a", | |
1088 it will wait for the next character. To be able to insert a single quote, | |
1089 also add this line: > | |
1090 '' ' | |
1091 Since the mapping is defined with |:lnoremap| the resulting quote will not be | |
1092 used for the start of another character. | |
The "accents" keymap uses this.				*keymap-accents*

The first column can also be in |<>| form:
1096 <C-c> Ctrl-C | |
1097 <A-c> Alt-c | |
1098 <A-C> Alt-C | |
1099 Note that the Alt mappings may not work, depending on your keyboard and | |
1100 terminal. | |
1101 | |
Although it's possible to have more than one character in the second column,
1103 this is unusual. But you can use various ways to specify the character: > | |
1104 A a literal character | |
1105 A <char-97> decimal value | |
1106 A <char-0x61> hexadecimal value | |
1107 A <char-0141> octal value | |
1108 x <Space> special key name | |
1109 | |
1110 The characters are assumed to be encoded for the current value of 'encoding'. | |
1111 It's possible to use ":scriptencoding" when all characters are given | |
1112 literally. That doesn't work when using the <char-> construct, because the | |
1113 conversion is done on the keymap file, not on the resulting character. | |
1114 | |
1115 The lines after "loadkeymap" are interpreted with 'cpoptions' set to "C". | |
1116 This means that continuation lines are not used and a backslash has a special | |
1117 meaning in the mappings. Examples: > | |
1118 | |
1119 " a comment line | |
1120 \" x maps " to x | |
1121 \\ y maps \ to y | |
1122 | |
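Putting these pieces together, a minimal sketch of a complete keymap file.
The file name keymap/demo_utf-8.vim and the short name "demo" are made up for
this example, and the file is assumed to be saved in UTF-8: >

	" Hypothetical keymap file: keymap/demo_utf-8.vim
	scriptencoding utf-8

	let b:keymap_name = "demo"

	loadkeymap
	'a	á		dead key: quote followed by a
	''	'		typing the quote twice inserts one quote
	e:	ë		literal character
	\"	<Space>		maps a double quote to a space
<
With this file in a "keymap" directory in 'runtimepath', it would be selected
with: >
	:set keymap=demo
<
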
1123 If you write a keymap file that will be useful for others, consider submitting | |
1124 it to the Vim maintainer for inclusion in the distribution: | |
1125 <maintainer@vim.org> | |
1126 | |
1127 | |
1128 HEBREW KEYMAP *keymap-hebrew* | |
1129 | |
1130 This file explains what characters are available in UTF-8 and CP1255 encodings, | |
1131 and what the keymaps are to get those characters: | |
1132 | |
1133 glyph encoding keymap ~ | |
1134 Char utf-8 cp1255 hebrew hebrewp name ~ | |
1135 א 0x5d0 0xe0 t a 'alef | |
1136 ב 0x5d1 0xe1 c b bet | |
1137 ג 0x5d2 0xe2 d g gimel | |
1138 ד 0x5d3 0xe3 s d dalet | |
1139 ה 0x5d4 0xe4 v h he | |
1140 ו 0x5d5 0xe5 u v vav | |
1141 ז 0x5d6 0xe6 z z zayin | |
1142 ח 0x5d7 0xe7 j j het | |
1143 ט 0x5d8 0xe8 y T tet | |
1144 י 0x5d9 0xe9 h y yod | |
1145 ך 0x5da 0xea l K kaf sofit | |
1146 כ 0x5db 0xeb f k kaf | |
1147 ל 0x5dc 0xec k l lamed | |
1148 ם 0x5dd 0xed o M mem sofit | |
1149 מ 0x5de 0xee n m mem | |
1150 ן 0x5df 0xef i N nun sofit | |
1151 נ 0x5e0 0xf0 b n nun | |
1152 ס 0x5e1 0xf1 x s samech | |
1153 ע 0x5e2 0xf2 g u `ayin | |
1154 ף 0x5e3 0xf3 ; P pe sofit | |
1155 פ 0x5e4 0xf4 p p pe | |
1156 ץ 0x5e5 0xf5 . X tsadi sofit | |
1157 צ 0x5e6 0xf6 m x tsadi | |
1158 ק 0x5e7 0xf7 e q qof | |
1159 ר 0x5e8 0xf8 r r resh | |
1160 ש 0x5e9 0xf9 a w shin | |
1161 ת 0x5ea 0xfa , t tav | |
1162 | |
1163 Vowel marks and special punctuation: | |
1164 הְ 0x5b0 0xc0 A: A: sheva | |
1165 הֱ 0x5b1 0xc1 HE HE hataf segol | |
1166 הֲ 0x5b2 0xc2 HA HA hataf patah | |
1167 הֳ 0x5b3 0xc3 HO HO hataf qamats | |
1168 הִ 0x5b4 0xc4 I I hiriq | |
1169 הֵ 0x5b5 0xc5 AY AY tsere | |
1170 הֶ 0x5b6 0xc6 E E segol | |
1171 הַ 0x5b7 0xc7 AA AA patah | |
1172 הָ 0x5b8 0xc8 AO AO qamats | |
1173 הֹ 0x5b9 0xc9 O O holam | |
1174 הֻ 0x5bb 0xcb U U qubuts | |
1175 כּ 0x5bc 0xcc D D dagesh | |
1176 הֽ 0x5bd 0xcd ]T ]T meteg | |
1177 ה־ 0x5be 0xce ]Q ]Q maqaf | |
1178 בֿ 0x5bf 0xcf ]R ]R rafe | |
1179 ב׀ 0x5c0 0xd0 ]p ]p paseq | |
1180 שׁ 0x5c1 0xd1 SR SR shin-dot | |
1181 שׂ 0x5c2 0xd2 SL SL sin-dot | |
1182 ׃ 0x5c3 0xd3 ]P ]P sof-pasuq | |
1183 װ 0x5f0 0xd4 VV VV double-vav | |
1184 ױ 0x5f1 0xd5 VY VY vav-yod | |
1185 ײ 0x5f2 0xd6 YY YY yod-yod | |
1186 | |
1187 The following are only available in utf-8 | |
1188 | |
1189 Cantillation marks: | |
1190 glyph | |
1191 Char utf-8 hebrew name | |
1192 ב֑ 0x591 C: etnahta | |
1193 ב֒ 0x592 Cs segol | |
1194 ב֓ 0x593 CS shalshelet | |
1195 ב֔ 0x594 Cz zaqef qatan | |
1196 ב֕ 0x595 CZ zaqef gadol | |
1197 ב֖ 0x596 Ct tipeha | |
1198 ב֗ 0x597 Cr revia | |
1199 ב֘ 0x598 Cq zarqa | |
1200 ב֙ 0x599 Cp pashta | |
1201 ב֚ 0x59a C! yetiv | |
1202 ב֛ 0x59b Cv tevir | |
1203 ב֜ 0x59c Cg geresh | |
1204 ב֝ 0x59d C* geresh qadim | |
1205 ב֞ 0x59e CG gershayim | |
1206 ב֟ 0x59f CP qarnei-parah | |
1207 ב֪ 0x5aa Cy yerach-ben-yomo | |
1208 ב֫ 0x5ab Co ole | |
1209 ב֬ 0x5ac Ci iluy | |
1210 ב֭ 0x5ad Cd dehi | |
1211 ב֮ 0x5ae Cn zinor | |
1212 ב֯ 0x5af CC masora circle | |
1213 | |
1214 Combining forms: | |
1215 ﬠ 0xfb20 X` Alternative `ayin | |
1216 ﬡ 0xfb21 X' Alternative 'alef | |
1217 ﬢ 0xfb22 X-d Alternative dalet | |
1218 ﬣ 0xfb23 X-h Alternative he | |
1219 ﬤ 0xfb24 X-k Alternative kaf | |
1220 ﬥ 0xfb25 X-l Alternative lamed | |
1221 ﬦ 0xfb26 X-m Alternative mem-sofit | |
1222 ﬧ 0xfb27 X-r Alternative resh | |
1223 ﬨ 0xfb28 X-t Alternative tav | |
1224 ﬩ 0xfb29 X-+ Alternative plus | |
1225 שׁ 0xfb2a XW shin+shin-dot | |
1226 שׂ 0xfb2b Xw shin+sin-dot | |
1227 שּׁ 0xfb2c X..W shin+shin-dot+dagesh | |
1228 שּׂ 0xfb2d X..w shin+sin-dot+dagesh | |
1229 אַ 0xfb2e XA alef+patah | |
1230 אָ 0xfb2f XO alef+qamats | |
1231 אּ 0xfb30 XI alef+hiriq (mapiq) | |
1232 בּ 0xfb31 X.b bet+dagesh | |
1233 גּ 0xfb32 X.g gimel+dagesh | |
1234 דּ 0xfb33 X.d dalet+dagesh | |
1235 הּ 0xfb34 X.h he+dagesh | |
1236 וּ 0xfb35 Xu vav+dagesh | |
1237 זּ 0xfb36 X.z zayin+dagesh | |
1238 טּ 0xfb38 X.T tet+dagesh | |
1239 יּ 0xfb39 X.y yud+dagesh | |
1240 ךּ 0xfb3a X.K kaf sofit+dagesh | |
1241 כּ 0xfb3b X.k kaf+dagesh | |
1242 לּ 0xfb3c X.l lamed+dagesh | |
1243 מּ 0xfb3e X.m mem+dagesh | |
1244 נּ 0xfb40 X.n nun+dagesh | |
1245 סּ 0xfb41 X.s samech+dagesh | |
1246 ףּ 0xfb43 X.P pe sofit+dagesh | |
1247 פּ 0xfb44 X.p pe+dagesh | |
1248 צּ 0xfb46 X.x tsadi+dagesh | |
1249 קּ 0xfb47 X.q qof+dagesh | |
1250 רּ 0xfb48 X.r resh+dagesh | |
1251 שּ 0xfb49 X.w shin+dagesh | |
1252 תּ 0xfb4a X.t tav+dagesh | |
1253 וֹ 0xfb4b Xo vav+holam | |
1254 בֿ 0xfb4c XRb bet+rafe | |
1255 כֿ 0xfb4d XRk kaf+rafe | |
1256 פֿ 0xfb4e XRp pe+rafe | |
1257 ﭏ 0xfb4f Xal alef-lamed | |

==============================================================================
10. Input with imactivatefunc()                         *mbyte-func*

Vim has the 'imactivatefunc' and 'imstatusfunc' options.  These are useful to
activate/deactivate the input method from Vim in any way you like, including
with an external command.  For example, fcitx provides the fcitx-remote
command: >

        " Use the input method in Insert mode, for search patterns and on the
        " command line.
        set iminsert=2
        set imsearch=2
        set imcmdline

        " Called by Vim to switch the input method on (a:active is non-zero)
        " or off.
        set imactivatefunc=ImActivate
        function! ImActivate(active)
          if a:active
            call system('fcitx-remote -o')
          else
            call system('fcitx-remote -c')
          endif
        endfunction

        " Called by Vim to find out whether the input method is active.
        set imstatusfunc=ImStatus
        function! ImStatus()
          return system('fcitx-remote')[0] is# '2'
        endfunction

Using this script, you can activate/deactivate XIM via Vim even when it is not
compiled with |+xim|.
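
The example above assumes that fcitx-remote is installed.  A minimal sketch of
a more defensive variant, which only sets the options when the command can be
found.  The function names FcitxActivate() and FcitxStatus() are made up for
this example; the fcitx-remote arguments are the ones used above: >

        if executable('fcitx-remote')
          function! FcitxActivate(active)
            " Non-zero means "activate", zero means "deactivate".
            call system(a:active ? 'fcitx-remote -o' : 'fcitx-remote -c')
          endfunction

          function! FcitxStatus()
            " True when fcitx-remote reports the state "2".
            return system('fcitx-remote')[0] is# '2'
          endfunction

          set imactivatefunc=FcitxActivate
          set imstatusfunc=FcitxStatus
        endif

When fcitx-remote is missing the two options are left untouched, so no external
command is run on every mode switch.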

==============================================================================
11. Using UTF-8                         *mbyte-utf8* *UTF-8* *utf-8* *utf8*
                                                        *Unicode* *unicode*
The Unicode character set was designed to include all characters from other
character sets.  Therefore it is possible to write text in any language using
Unicode (with a few rarely used languages excluded).  And it's mostly possible
to mix these languages in one file, which is impossible with other encodings.

Unicode can be encoded in several ways.  The most popular one is UTF-8, which
uses one or more bytes for each character and is backwards compatible with
ASCII.  On MS-Windows UTF-16 is also used (previously UCS-2), which uses
16-bit words.  Vim can support all of these encodings, but always uses UTF-8
internally.
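
The difference between bytes and characters can be seen from a script.  A small
sketch, assuming 'encoding' is set to "utf-8"; the variable name is only for
illustration: >

        " U+20AC is the euro sign.
        let euro = nr2char(0x20ac)

        " An ASCII character takes one byte.
        echo strlen('a')

        " The euro sign takes three bytes in UTF-8 (E2 82 AC) ...
        echo strlen(euro)

        " ... but it still counts as one character.
        echo strchars(euro)

strlen() counts bytes and strchars() counts characters, which is why the two
results differ for anything outside of ASCII.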

Vim has comprehensive UTF-8 support.  It works well in:
- xterm with utf-8 support enabled
- Athena, Motif and GTK GUI
- MS-Windows GUI
- several other platforms

Double-width characters are supported.  This works best with 'guifontwide' or
'guifontset'.  When using only 'guifont' the wide characters are drawn in the
normal width, followed by a space to fill the gap.  Note that the 'guifontset'
option is no longer relevant in the GTK+ 2 GUI.

                                                        *bom-bytes*
When reading a file a BOM (Byte Order Mark) can be used to recognize the
Unicode encoding:
        EF BB BF        utf-8
        FE FF           utf-16 big endian
        FF FE           utf-16 little endian
        00 00 FE FF     utf-32 big endian
        FF FE 00 00     utf-32 little endian

Utf-8 is the recommended encoding.  Note that it's difficult to tell utf-16
and utf-32 apart: the little endian utf-32 BOM starts with the same two bytes
as the little endian utf-16 BOM.  Utf-16 is often used on MS-Windows, utf-32
is not widespread as a file format.
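
In practice the BOM is handled through options rather than by hand.  A short
sketch using the 'fileencodings' and 'bomb' options; the list of encodings is
only an illustration and should be adjusted to the files you edit: >

        " Check for a BOM first when detecting the encoding of a file.
        set fileencodings=ucs-bom,utf-8,default,latin1

        " Show whether the current file was read with a BOM.
        setlocal bomb?

        " Leave out the BOM the next time this file is written.
        setlocal nobomb
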


                                        *mbyte-combining* *mbyte-composing*
A composing or combining character is used to change the meaning of the
character before it.  The combining characters are drawn on top of the
preceding character.
Up to two combining characters can be used by default.  This can be changed
with the 'maxcombine' option.
When editing text a composing character is mostly considered part of the
preceding character.  For example "x" will delete a character and its
following composing characters by default.
If the 'delcombine' option is on, then pressing "x" will delete the combining
characters, one at a time, then the base character.  But when inserting, you
type the first character and the following composing characters separately,
after which they will be joined.  The "r" command will not allow you to type a
combining character, because it doesn't know one is coming.  Use "R" instead.
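
How composing characters are counted can be checked from a script.  A small
sketch, assuming 'encoding' is "utf-8"; the string is "e" followed by U+0301
(combining acute accent), and the option values are examples only: >

        let s = "e\u0301"

        " Three bytes: one for "e" and two for U+0301.
        echo strlen(s)

        " Two: the base and the composing character are counted separately.
        echo strchars(s)

        " One: with the second argument composing characters are skipped.
        echo strchars(s, 1)

        " Allow up to four composing characters per base character.
        set maxcombine=4

        " Make "x" delete composing characters one at a time.
        set delcombine
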

Bytes which are not part of a valid UTF-8 byte sequence are handled like a
single character and displayed as <xx>, where "xx" is the hex value of the
byte.

Overlong sequences are not handled specially and are displayed like a valid
character.  However, search patterns may not match on an overlong sequence.
(An overlong sequence is where more bytes are used than required for the
character.)  An exception is NUL (zero) which is displayed as "<00>".

In the file and buffer the full range of Unicode characters can be used (31
bits).  However, displaying only works for the characters present in the
selected font.

Useful commands:
- "ga" shows the decimal, hexadecimal and octal value of the character under
  the cursor.  If there are composing characters these are shown too.  (If the
  message is truncated, use ":messages").
- "g8" shows the bytes used in a UTF-8 character, also the composing
  characters, as hex numbers.
- ":set encoding=utf-8 fileencodings=" forces using UTF-8 for all files.  The
  default is to use the current locale for 'encoding' and set 'fileencodings'
  to automatically detect the encoding of a file.
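
Similar information to what "ga" and "g8" report can be computed in a script.
A minimal sketch, assuming 'encoding' is "utf-8".  CharInfo() is a hypothetical
helper written for this example, it is not part of Vim: >

        function! CharInfo(char)
          let bytes = []
          " Indexing a String yields single bytes.
          for i in range(strlen(a:char))
            call add(bytes, printf('%02x', char2nr(a:char[i])))
          endfor
          " char2nr() on the whole string gives the code point of the first
          " character.
          return printf('U+%04X, bytes %s', char2nr(a:char), join(bytes, ' '))
        endfunction

        " Gives "U+20AC, bytes e2 82 ac" for the euro sign.
        echo CharInfo(nr2char(0x20ac))

        " The same for the character under the cursor.
        echo CharInfo(matchstr(getline('.'), '\%' . col('.') . 'c.'))
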


STARTING VIM

If your current locale is in a utf-8 encoding, Vim will automatically start
in utf-8 mode.

If you are using another locale, set 'encoding' yourself: >

        set encoding=utf-8

You might also want to select the font used for the menus.  Unfortunately this
doesn't always work.  See the system specific remarks below, and 'langmenu'.


USING UTF-8 IN X-Windows                                *utf-8-in-xwindows*

Note: This section does not apply to the GTK+ 2 GUI.

You need to specify a font to be used.  For double-wide characters another
font is required, which is exactly twice as wide.  There are three ways to do
this:

1. Set 'guifont' and let Vim find a matching 'guifontwide'
2. Set 'guifont' and 'guifontwide'
3. Set 'guifontset'

See the documentation for each option for details.  Example: >

        :set guifont=-misc-fixed-medium-r-normal--15-140-75-75-c-90-iso10646-1

You might also want to set the font used for the menus.  This only works for
Motif.  Use the ":hi Menu font={fontname}" command for this. |:highlight|


TYPING UTF-8                                            *utf-8-typing*

If you are using X-Windows, you should find an input method that supports
utf-8.

If your system does not provide support for typing utf-8, you can use the
'keymap' feature.  This allows writing a keymap file, which defines a utf-8
character as a sequence of ASCII characters.  See |mbyte-keymap|.

Another method is to set the current locale to the language you want to use
and for which you have an XIM available.  Then set 'termencoding' to the
encoding of that locale and Vim will convert the typed characters to
'encoding' for you.

If everything else fails, you can type a character as four hex digits: >

        CTRL-V u 1234

"1234" is interpreted as a hex number.  You must type four characters, prepend
zeros if necessary.
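
The same can be done from a script.  A small sketch that appends U+20AC (the
euro sign) after the cursor in two ways, assuming 'encoding' is "utf-8": >

        " Type CTRL-V u 20ac in Insert mode, as described above.
        execute "normal! a\<C-V>u20ac"

        " Or build the character from its code point.
        execute "normal! a" . nr2char(0x20ac)
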


COMMAND ARGUMENTS                                       *utf-8-char-arg*

Commands like |f|, |F|, |t| and |r| take an argument of one character.  For
UTF-8 this argument may include one or two composing characters.  These need
to be produced together with the base character: Vim doesn't wait for the next
character to be typed to find out if it is a composing character or not.
Using 'keymap' or |:lmap| is a nice way to type these characters.

The commands that search for a character in a line handle composing characters
as follows.  When searching for a character without a composing character,
this will find matches in the text with or without composing characters.  When
searching for a character with a composing character, this will only find
matches with that composing character.  It was implemented this way, because
not everybody is able to type a composing character.
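
A sketch of the |:lmap| approach.  The key sequence e' is made up for this
example; it produces "e" followed by U+0301 (combining acute accent) in one go.
Language mappings are only used when 'iminsert' is 1: >

        " e' gives the base character 0x65 plus the composing character 0x301.
        lnoremap e' <Char-0x65><Char-0x301>
        set iminsert=1

With this in place, typing fe' should jump to the next "e" that carries the
acute accent, while a plain fe still finds an "e" with or without it.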


==============================================================================
12. Overview of options                                 *mbyte-options*

These options are relevant for editing multi-byte files.  Check the help in
options.txt for detailed information.

'encoding'      Encoding used for the keyboard and display.  It is also the
                default encoding for files.

'fileencoding'  Encoding of a file.  When it's different from 'encoding'
                conversion is done when reading or writing the file.

'fileencodings' List of possible encodings of a file.  When opening a file
                these will be tried and the first one that doesn't cause an
                error is used for 'fileencoding'.

'charconvert'   Expression used to convert files from one encoding to another.

'formatoptions' The 'm' flag can be included to have formatting break a line
                at a multibyte character of 256 or higher.  This is useful for
                languages where a sequence of characters can be broken
                anywhere.

'guifontset'    The list of font names used for a multi-byte encoding.  When
                this option is not empty, it replaces 'guifont'.

'keymap'        Specify the name of a keyboard mapping.
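
As an illustration, some of these options could be combined as follows.  The
values are assumptions made for the sake of the example, not recommendations: >

        " Use UTF-8 inside Vim and allow line breaks at multibyte characters.
        set encoding=utf-8
        set formatoptions+=m

        " Re-read the current file as latin1, then convert it to utf-8 the
        " next time it is written.
        edit ++enc=latin1
        setlocal fileencoding=utf-8
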

==============================================================================

Contributions specifically for the multi-byte features by:
        Chi-Deok Hwang <hwang@mizi.co.kr>
        SungHyun Nam <goweol@gmail.com>
        K.Nagano <nagano@atese.advantest.co.jp>
        Taro Muraoka <koron@tka.att.ne.jp>
        Yasuhiro Matsumoto <mattn@mail.goo.ne.jp>

 vim:tw=78:ts=8:noet:ft=help:norl: