annotate runtime/doc/mbyte.txt @ 15247:336728a577f5 v8.1.0632
patch 8.1.0632: using sign group names is inefficient
commit https://github.com/vim/vim/commit/7a2d9892b7158edf8dc48e9bcaaae70a40787b37
Author: Bram Moolenaar <Bram@vim.org>
Date: Mon Dec 24 20:23:49 2018 +0100
Problem: Using sign group names is inefficient.
Solution: Store group names in a hash table and use a reference to them.
Also remove unnecessary use of ":exe" from the tests. (Yegappan
Lakshmanan, closes #3715)
author | Bram Moolenaar <Bram@vim.org> |
---|---|
date | Mon, 24 Dec 2018 20:30:04 +0100 |
parents | 2f7e67dd088c |
children | 314694a2e74a |
rev | line source |
---|---|
13963 | 1 *mbyte.txt* For Vim version 8.1. Last change: 2018 Jan 21 |
7 | 2 |
3 | |
4 VIM REFERENCE MANUAL by Bram Moolenaar et al. | |
5 | |
6 | |
7 Multi-byte support *multibyte* *multi-byte* | |
8 *Chinese* *Japanese* *Korean* | |
9 This is about editing text in languages which have many characters that can | |
10 not be represented using one byte (one octet). Examples are Chinese, Japanese | |
11 and Korean. Unicode is also covered here. | |
12 | |
13 For an introduction to the most common features, see |usr_45.txt| in the user | |
14 manual. | |
15 For changing the language of messages and menus see |mlang.txt|. | |
16 | |
17 {not available when compiled without the |+multi_byte| feature} |
7 | 18 |
19 | |
20 1. Getting started |mbyte-first| | |
21 2. Locale |mbyte-locale| | |
22 3. Encoding |mbyte-encoding| | |
23 4. Using a terminal |mbyte-terminal| | |
24 5. Fonts on X11 |mbyte-fonts-X11| | |
25 6. Fonts on MS-Windows |mbyte-fonts-MSwin| | |
26 7. Input on X11 |mbyte-XIM| | |
27 8. Input on MS-Windows |mbyte-IME| | |
28 9. Input with a keymap |mbyte-keymap| | |
29 10. Input with imactivatefunc() |mbyte-func| |
30 11. Using UTF-8 |mbyte-utf8| |
31 12. Overview of options |mbyte-options| |
7 | 32 |
33 NOTE: This file contains UTF-8 characters. These may show up as strange | |
34 characters or boxes when using another encoding. | |
35 | |
36 ============================================================================== | |
37 1. Getting started *mbyte-first* | |
38 | |
39 This is a summary of the multibyte features in Vim. If you are lucky it works | |
40 as described and you can start using Vim without much trouble. If something | |
41 doesn't work you will have to read the rest. Don't be surprised if it takes | |
42 quite a bit of work and experimenting to make Vim use all the multi-byte | |
43 features. Unfortunately, every system has its own way to deal with multibyte | |
44 languages and it is quite complicated. | |
45 | |
46 | |
47 COMPILING | |
48 | |
49 If you already have a compiled Vim program, check if the |+multi_byte| feature | |
50 is included. The |:version| command can be used for this. | |
51 | |
52 If +multi_byte is not included, you should compile Vim with "normal", "big" or |
53 "huge" features. You can further tune what features are included. See the |
54 INSTALL files in the source directory. |
7 | 55 |
56 | |
57 LOCALE | |
58 | |
59 First of all, you must make sure your current locale is set correctly. If | |
60 your system has been installed to use the language, it probably works right | |
61 away. If not, you can often make it work by setting the $LANG environment | |
62 variable in your shell: > | |
63 | |
64 setenv LANG ja_JP.EUC | |
65 | |
66 Unfortunately, the name of the locale depends on your system. Japanese might | |
67 also be called "ja_JP.EUCjp" or just "ja". To see what is currently used: > | |
68 | |
69 :language | |
70 | |
71 To change the locale inside Vim use: > | |
72 | |
73 :language ja_JP.EUC | |
74 | |
75 Vim will give an error message if this doesn't work. This is a good way to | |
76 experiment and find the locale name you want to use. But it's always better | |
77 to set the locale in the shell, so that it is used right from the start. | |
78 | |
79 See |mbyte-locale| for details. | |
80 | |
81 | |
82 ENCODING | |
83 | |
84 If your locale works properly, Vim will try to set the 'encoding' option | |
85 accordingly. If this doesn't work you can overrule its value: > | |
86 | |
87 :set encoding=utf-8 | |
88 | |
89 See |encoding-values| for a list of acceptable values. | |
90 | |
91 The result is that all the text that is used inside Vim will be in this | |
92 encoding. Not only the text in the buffers, but also in registers, variables, | |
93 etc. This also means that changing the value of 'encoding' makes the existing | |
94 text invalid! The text doesn't change, but it will be displayed wrong. | |
95 | |
96 You can edit files in another encoding than what 'encoding' is set to. Vim | |
97 will convert the file when you read it and convert it back when you write it. | |
98 See 'fileencoding', 'fileencodings' and |++enc|. | |
99 | |
100 | |
101 DISPLAY AND FONTS | |
102 | |
103 If you are working in a terminal (emulator) you must make sure it accepts the | |
104 same encoding that Vim is working with. If this is not the case, you can | |
105 use the 'termencoding' option to make Vim convert text automatically. | |
106 | |
107 For the GUI you must select fonts that work with the current 'encoding'. This | |
108 is the difficult part. It depends on the system you are using, the locale and | |
109 a few other things. See the chapters on fonts: |mbyte-fonts-X11| for | |
110 X-Windows and |mbyte-fonts-MSwin| for MS-Windows. | |
111 | |
112 For GTK+ 2, you can skip most of this section. The option 'guifontset' no | |
113 longer exists. You only need to set 'guifont' and everything should "just | |
114 work". If your system comes with Xft2 and fontconfig and the current font | |
115 does not contain a certain glyph, a different font will be used automatically | |
116 if available. The 'guifontwide' option is still supported but usually you do | |
117 not need to set it. It is only necessary if the automatic font selection does | |
118 not suit your needs. | |
119 | |
120 For X11 you can set the 'guifontset' option to a list of fonts that together | |
121 cover the characters that are used. Example for Korean: > | |
122 | |
123 :set guifontset=k12,r12 | |
124 | |
125 Alternatively, you can set 'guifont' and 'guifontwide'. 'guifont' is used for | |
126 the single-width characters, 'guifontwide' for the double-width characters. | |
127 Thus the 'guifontwide' font must be exactly twice as wide as 'guifont'. | |
128 Example for UTF-8: > | |
129 | |
130 :set guifont=-misc-fixed-medium-r-normal-*-18-120-100-100-c-90-iso10646-1 | |
131 :set guifontwide=-misc-fixed-medium-r-normal-*-18-120-100-100-c-180-iso10646-1 | |
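The "exactly twice as wide" requirement can be checked from the font names themselves. The Python sketch below is only an illustration: the helper name average_width is hypothetical, and it assumes well-formed X font names in which the twelfth field is the average width in tenths of a pixel (see |XLFD|) and no field contains a "-".

```python
def average_width(xlfd):
    """Return the AVERAGE_WIDTH field (tenths of a pixel) of an X font name.

    Hypothetical helper: assumes 14 fields and no "-" inside any field.
    """
    fields = xlfd.split("-")
    # fields[0] is the empty string before the leading "-".
    return int(fields[12])


guifont = "-misc-fixed-medium-r-normal-*-18-120-100-100-c-90-iso10646-1"
guifontwide = "-misc-fixed-medium-r-normal-*-18-120-100-100-c-180-iso10646-1"

is_double_width = average_width(guifontwide) == 2 * average_width(guifont)
print(average_width(guifont), average_width(guifontwide), is_double_width)
```

For the two fonts in the example above, the widths are 90 and 180, so the pair qualifies.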
132 | |
133 You can also set 'guifont' alone, Vim will try to find a matching | |
134 'guifontwide' for you. | |
135 | |
136 | |
137 INPUT | |
138 | |
139 There are several ways to enter multi-byte characters: | |
140 - For X11 XIM can be used. See |XIM|. | |
141 - For MS-Windows IME can be used. See |IME|. | |
142 - For all systems keymaps can be used. See |mbyte-keymap|. | |
143 | |
144 The options 'iminsert', 'imsearch' and 'imcmdline' can be used to choose | |
9 | 145 the different input methods or disable them temporarily. |
7 | 146 |
147 ============================================================================== | |
148 2. Locale *mbyte-locale* | |
149 | |
150 The easiest setup is when your whole system uses the locale you want to work | |
151 in. But it's also possible to set the locale for one shell you are working | |
152 in, or just use a certain locale inside Vim. | |
153 | |
154 | |
155 WHAT IS A LOCALE? *locale* | |
156 | |
157 There are many languages in the world. And there are at least as many | |
158 different cultures and environments as there are languages. A linguistic | |
159 environment corresponding to an area is called "locale". This includes | |
160 information about the used language, the charset, collating order for sorting, | |
161 date format, currency format and so on. For Vim only the language and charset | |
162 really matter. | |
163 | |
164 You can only use a locale if your system has support for it. Some systems | |
165 have only a few locales, especially in the USA. The language which you want | |
166 to use may not be on your system. In that case you might be able to install | |
167 it as an extra package. Check your system documentation for how to do that. | |
168 | |
169 The location in which the locales are installed varies from system to system. | |
170 For example, "/usr/share/locale" or "/usr/lib/locale". See your system's | |
171 setlocale() man page. | |
172 | |
173 Looking in these directories will show you the exact name of each locale. | |
174 Mostly upper/lowercase matters, thus "ja_JP.EUC" and "ja_jp.euc" are | |
175 different. Some systems have a locale.alias file, which allows translation | |
176 from a short name like "nl" to the full name "nl_NL.ISO_8859-1". | |
177 | |
178 Note that X-windows has its own locale stuff. And unfortunately uses locale | |
179 names different from what is used elsewhere. This is confusing! For Vim it | |
180 matters what the setlocale() function uses, which is generally NOT the | |
181 X-windows stuff. You might have to do some experiments to find out what | |
182 really works. | |
183 | |
184 *locale-name* | |
185 The (simplified) format of a |locale| name is: | |
186 | |
187 language | |
188 or language_territory | |
189 or language_territory.codeset | |
190 | |
191 Territory means the country (or part of it), codeset means the |charset|. For | |
192 example, the locale name "ja_JP.eucJP" means: | |
193 ja the language is Japanese | |
194 JP the country is Japan | |
195 eucJP the codeset is EUC-JP | |
196 But it also could be "ja", "ja_JP.EUC", "ja_JP.ujis", etc. And unfortunately, | |
197 the locale name for a specific language, territory and codeset is not unified | |
198 and depends on your system. | |
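As a rough illustration of the simplified format, the following Python sketch splits a locale name into its three parts. The function name split_locale_name is hypothetical; it is not part of Vim or of any system library, and it ignores modifiers such as "@euro" that some systems accept.

```python
def split_locale_name(name):
    """Split 'language[_territory][.codeset]' into a 3-tuple.

    Hypothetical helper for illustration; missing parts become None.
    """
    rest, _, codeset = name.partition(".")
    language, _, territory = rest.partition("_")
    return (language, territory or None, codeset or None)


for name in ["ja_JP.eucJP", "zh_TW", "ko"]:
    print(name, "->", split_locale_name(name))
```

Note that this only takes the name apart. Whether a given name is valid still depends on which locales your system has installed.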
199 | |
200 Examples of locale names: | |
201 charset language locale name ~ | |
202 GB2312 Chinese (simplified) zh_CN.EUC, zh_CN.GB2312 | |
203 Big5 Chinese (traditional) zh_TW.BIG5, zh_TW.Big5 | |
204 CNS-11643 Chinese (traditional) zh_TW | |
205 EUC-JP Japanese ja, ja_JP.EUC, ja_JP.ujis, ja_JP.eucJP | |
206 Shift_JIS Japanese ja_JP.SJIS, ja_JP.Shift_JIS | |
207 EUC-KR Korean ko, ko_KR.EUC | |
208 | |
209 | |
210 USING A LOCALE | |
211 | |
212 To start using a locale for the whole system, see the documentation of your | |
213 system. Mostly you need to set it in a configuration file in "/etc". | |
214 | |
215 To use a locale in a shell, set the $LANG environment value. When you want to | |
216 use Korean and the |locale| name is "ko", do this: | |
217 | |
218 sh: export LANG=ko | |
219 csh: setenv LANG ko | |
220 | |
221 You can put this in your ~/.profile or ~/.cshrc file to always use it. | |
222 | |
223 To use a locale in Vim only, use the |:language| command: > | |
224 | |
225 :language ko | |
226 | |
227 Put this in your ~/.vimrc file to use it always. | |
228 | |
229 Or specify $LANG when starting Vim: | |
230 | |
231 sh: LANG=ko vim {vim-arguments} | |
232 csh: env LANG=ko vim {vim-arguments} | |
233 | |
234 You could make a small shell script for this. | |
235 | |
236 ============================================================================== | |
237 3. Encoding *mbyte-encoding* | |
238 | |
1621 | 239 Vim uses the 'encoding' option to specify how characters are identified and |
7 | 240 encoded when they are used inside Vim. This applies to all the places where |
241 text is used, including buffers (files loaded into memory), registers and | |
242 variables. | |
243 | |
244 *charset* *codeset* | |
245 Charset is another name for encoding. There are subtle differences, but these | |
246 don't matter when using Vim. "codeset" is another similar name. | |
247 | |
248 Each character is encoded as one or more bytes. When all characters are | |
249 encoded with one byte, we call this a single-byte encoding. The most often | |
250 used one is called "latin1". This limits the number of characters to 256. | |
251 Some of these are control characters, thus even fewer can be used for text. | |
252 | |
253 When some characters use two or more bytes, we call this a multi-byte | |
254 encoding. This allows using much more than 256 characters, which is required | |
255 for most East Asian languages. | |
256 | |
257 Most multi-byte encodings use one byte for the first 128 characters. These | |
258 are equal to ASCII, which makes it easy to exchange plain-ASCII text, no | |
259 matter what language is used. Thus you might see the right text even when the | |
260 encoding was set wrong. | |
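The difference between single-byte and multi-byte encodings can be seen by counting bytes. This Python sketch uses Python's own codec names ("latin-1", "euc_kr", "utf-8"), which are not Vim 'encoding' values, and a hypothetical helper encoded_length. It shows that ASCII text is byte-for-byte identical in all three encodings, while a Korean character needs two or three bytes, or cannot be represented at all.

```python
ENCODINGS = ["latin-1", "euc_kr", "utf-8"]


def encoded_length(char, encoding):
    """Return the number of bytes for char, or None if not representable."""
    try:
        return len(char.encode(encoding))
    except UnicodeEncodeError:
        return None


# Plain ASCII text gives the same bytes in every encoding listed.
ascii_is_shared = all("Vim".encode(enc) == b"Vim" for enc in ENCODINGS)
print("ASCII identical:", ascii_is_shared)

for char in ["A", "한"]:
    print(char, [encoded_length(char, enc) for enc in ENCODINGS])
```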
261 | |
262 *encoding-names* | |
263 Vim can use many different character encodings. There are three major groups: | |
264 | |
265 1 8bit Single-byte encodings, 256 different characters. Mostly used | |
266 in USA and Europe. Example: ISO-8859-1 (Latin1). All | |
267 characters occupy one screen cell only. | |
268 | |
269 2 2byte Double-byte encodings, over 10000 different characters. | |
270 Mostly used in Asian countries. Example: euc-kr (Korean) | |
271 The number of screen cells is equal to the number of bytes | |
272 (except for euc-jp when the first byte is 0x8e). | |
273 | |
274 u Unicode Universal encoding, can replace all others. ISO 10646. | |
275 Millions of different characters. Example: UTF-8. The | |
276 relation between bytes and screen cells is complex. | |
277 | |
278 Other encodings cannot be used by Vim internally. But files in other | |
279 encodings can be edited by using conversion, see 'fileencoding'. | |
280 Note that all encodings must use ASCII for the characters up to 128 (except | |
281 when compiled for EBCDIC). | |
282 | |
283 Supported 'encoding' values are: *encoding-values* | |
284 1 latin1 8-bit characters (ISO 8859-1, also used for cp1252) |
7 | 285 1 iso-8859-n ISO_8859 variant (n = 2 to 15) |
286 1 koi8-r Russian | |
287 1 koi8-u Ukrainian | |
288 1 macroman MacRoman (Macintosh encoding) | |
289 1 8bit-{name} any 8-bit encoding (Vim specific name) | |
407 | 290 1 cp437 similar to iso-8859-1 |
291 1 cp737 similar to iso-8859-7 | |
292 1 cp775 Baltic | |
293 1 cp850 similar to iso-8859-4 | |
294 1 cp852 similar to iso-8859-1 | |
295 1 cp855 similar to iso-8859-2 | |
296 1 cp857 similar to iso-8859-5 | |
297 1 cp860 similar to iso-8859-9 | |
298 1 cp861 similar to iso-8859-1 | |
299 1 cp862 similar to iso-8859-1 | |
300 1 cp863 similar to iso-8859-8 | |
301 1 cp865 similar to iso-8859-1 | |
302 1 cp866 similar to iso-8859-5 | |
303 1 cp869 similar to iso-8859-7 | |
304 1 cp874 Thai | |
305 1 cp1250 Czech, Polish, etc. | |
306 1 cp1251 Cyrillic | |
307 1 cp1253 Greek | |
308 1 cp1254 Turkish | |
309 1 cp1255 Hebrew | |
310 1 cp1256 Arabic | |
311 1 cp1257 Baltic | |
312 1 cp1258 Vietnamese | |
7 | 313 1 cp{number} MS-Windows: any installed single-byte codepage |
314 2 cp932 Japanese (Windows only) | |
315 2 euc-jp Japanese (Unix only) | |
316 2 sjis Japanese (Unix only) | |
317 2 cp949 Korean (Unix and Windows) | |
318 2 euc-kr Korean (Unix only) | |
319 2 cp936 simplified Chinese (Windows only) | |
320 2 euc-cn simplified Chinese (Unix only) | |
321 2 cp950 traditional Chinese (on Unix alias for big5) | |
322 2 big5 traditional Chinese (on Windows alias for cp950) | |
323 2 euc-tw traditional Chinese (Unix only) | |
324 2 2byte-{name} Unix: any double-byte encoding (Vim specific name) | |
325 2 cp{number} MS-Windows: any installed double-byte codepage | |
326 u utf-8 32 bit UTF-8 encoded Unicode (ISO/IEC 10646-1) | |
327 u ucs-2 16 bit UCS-2 encoded Unicode (ISO/IEC 10646-1) | |
328 u ucs-2le like ucs-2, little endian | |
329 u utf-16 ucs-2 extended with double-words for more characters | |
330 u utf-16le like utf-16, little endian | |
331 u ucs-4 32 bit UCS-4 encoded Unicode (ISO/IEC 10646-1) | |
332 u ucs-4le like ucs-4, little endian | |
333 | |
334 The {name} can be any encoding name that your system supports. It is passed | |
335 to iconv() to convert between the encoding of the file and the current locale. | |
336 For MS-Windows "cp{number}" means using codepage {number}. | |
337 Examples: > | |
338 :set encoding=8bit-cp1252 | |
339 :set encoding=2byte-cp932 | |
340 |
341 The MS-Windows codepage 1252 is very similar to latin1. For practical reasons |
342 the same encoding is used and it's called latin1. 'isprint' can be used to |
343 display the characters 0x80 - 0xA0 or not. |
344 |
7 | 345 Several aliases can be used, they are translated to one of the names above. |
346 An incomplete list: | |
347 | |
348 1 ansi same as latin1 (obsolete, for backward compatibility) | |
349 2 japan Japanese: on Unix "euc-jp", on MS-Windows cp932 | |
350 2 korea Korean: on Unix "euc-kr", on MS-Windows cp949 | |
351 2 prc simplified Chinese: on Unix "euc-cn", on MS-Windows cp936 | |
352 2 chinese same as "prc" | |
353 2 taiwan traditional Chinese: on Unix "euc-tw", on MS-Windows cp950 | |
354 u utf8 same as utf-8 | |
355 u unicode same as ucs-2 | |
356 u ucs2be same as ucs-2 (big endian) | |
357 u ucs-2be same as ucs-2 (big endian) | |
358 u ucs-4be same as ucs-4 (big endian) | |
1621 | 359 u utf-32 same as ucs-4 |
360 u utf-32le same as ucs-4le | |
39 | 361 default stands for the default value of 'encoding', depends on the |
856 | 362 environment |
7 | 363 |
364 For the UCS codes the byte order matters. This is tricky, use UTF-8 whenever | |
365 you can. The default is to use big-endian (most significant byte comes | |
366 first): | |
367 name bytes char ~ | |
368 ucs-2 11 22 1122 | |
369 ucs-2le 22 11 1122 | |
370 ucs-4 11 22 33 44 11223344 | |
371 ucs-4le 44 33 22 11 11223344 | |
372 | |
373 On MS-Windows systems you often want to use "ucs-2le", because it uses little | |
374 endian UCS-2. | |
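The byte orders in the table above can be reproduced with a few lines of Python. This is only a sketch of what "big endian" and "little endian" mean for a fixed-width code unit; the helper name code_unit_bytes is hypothetical, and 1122 and 11223344 are the placeholder values from the table, not real characters.

```python
def code_unit_bytes(value, width, byteorder):
    """Return the bytes of one code unit as space-separated hex digits."""
    raw = value.to_bytes(width, byteorder)
    return " ".join(f"{byte:02x}" for byte in raw)


rows = [
    ("ucs-2", 0x1122, 2, "big"),
    ("ucs-2le", 0x1122, 2, "little"),
    ("ucs-4", 0x11223344, 4, "big"),
    ("ucs-4le", 0x11223344, 4, "little"),
]
for name, value, width, byteorder in rows:
    print(f"{name:8} {code_unit_bytes(value, width, byteorder)}")
```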
375 | |
376 There are a few encodings which are similar, but not exactly the same. Vim | |
377 treats them as if they were different encodings, so that conversion will be | |
378 done when needed. You might want to use the similar name to avoid conversion | |
379 or when conversion is not possible: | |
380 | |
381 cp932, shift-jis, sjis | |
382 cp936, euc-cn | |
383 | |
384 *encoding-table* | |
385 Normally 'encoding' is equal to your current locale and 'termencoding' is | |
386 empty. This means that your keyboard and display work with characters encoded | |
387 in your current locale, and Vim uses the same characters internally. | |
388 | |
389 You can make Vim use characters in a different encoding by setting the | |
390 'encoding' option to a different value. Since the keyboard and display still | |
391 use the current locale, conversion needs to be done. The 'termencoding' then | |
392 takes over the value of the current locale, so Vim converts between 'encoding' | |
393 and 'termencoding'. Example: > | |
394 :let &termencoding = &encoding | |
395 :set encoding=utf-8 | |
396 | |
397 However, not all combinations of values are possible. The table below tells | |
398 you how each of the nine combinations works. This is further restricted by | |
399 not all conversions being possible, iconv() being present, etc. Since this | |
400 depends on the system used, no detailed list can be given. | |
401 | |
402 ('tenc' is the short name for 'termencoding' and 'enc' short for 'encoding') | |
403 | |
404 'tenc' 'enc' remark ~ | |
405 | |
406 8bit 8bit Works. When 'termencoding' is different from | |
407 'encoding' typing and displaying may be wrong for some | |
408 characters, Vim does NOT perform conversion (set | |
409 'encoding' to "utf-8" to get this). | |
410 8bit 2byte MS-Windows: works for all codepages installed on your | |
411 system; you can only type 8bit characters; | |
412 Other systems: does NOT work. | |
1121 | 413 8bit Unicode Works, but only 8bit characters can be typed directly |
414 (others through digraphs, keymaps, etc.); in a | |
7 | 415 terminal you can only see 8bit characters; the GUI can |
416 show all characters that the 'guifont' supports. | |
417 | |
418 2byte 8bit Works, but typing non-ASCII characters might | |
419 be a problem. | |
420 2byte 2byte MS-Windows: works for all codepages installed on your | |
421 system; typing characters might be a problem when | |
422 locale is different from 'encoding'. | |
423 Other systems: Only works when 'termencoding' is equal | |
424 to 'encoding', you might as well leave it empty. | |
425 2byte Unicode works, Vim will translate typed characters. | |
426 | |
427 Unicode 8bit works (unusual) | |
428 Unicode 2byte does NOT work | |
429 Unicode Unicode works very well (leaving 'termencoding' empty works | |
430 the same way, because all Unicode is handled | |
431 internally as UTF-8) | |
432 | |
433 CONVERSION *charset-conversion* | |
434 | |
435 Vim will automatically convert from one to another encoding in several places: | |
436 - When reading a file and 'fileencoding' is different from 'encoding' | |
437 - When writing a file and 'fileencoding' is different from 'encoding' | |
438 - When displaying characters and 'termencoding' is different from 'encoding' | |
439 - When reading input and 'termencoding' is different from 'encoding' | |
440 - When displaying messages and the encoding used for LC_MESSAGES differs from | |
441 'encoding' (requires a gettext version that supports this). | |
442 - When reading a Vim script where |:scriptencoding| is different from | |
443 'encoding'. | |
444 - When reading or writing a |viminfo| file. | |
445 Most of these require the |+iconv| feature. Conversion for reading and | |
446 writing files may also be specified with the 'charconvert' option. | |
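What happens when reading and writing a file whose encoding differs from 'encoding' can be sketched in Python. This is not how Vim implements it; it only shows the idea of converting on read and converting back on write. It assumes a file stored as EUC-JP and an internal encoding of UTF-8, and the helper name convert is hypothetical.

```python
def convert(data, from_encoding, to_encoding):
    """Re-encode bytes from one encoding to another via Unicode text."""
    return data.decode(from_encoding).encode(to_encoding)


# Bytes as they would be stored in a file with 'fileencoding' euc-jp.
on_disk = "日本語".encode("euc_jp")

# Reading: convert to the internal encoding (here UTF-8).
in_memory = convert(on_disk, "euc_jp", "utf-8")

# Writing: convert back to the encoding of the file.
written_back = convert(in_memory, "utf-8", "euc_jp")

print(len(on_disk), len(in_memory), written_back == on_disk)
```

The text is the same in both forms. Only the number of bytes differs, and the round trip gives back the original file contents.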
447 | |
448 Useful utilities for converting the charset: | |
449 All: iconv | |
450 GNU iconv can convert most encodings. Unicode is used as the | |
451 intermediate encoding, which allows conversion from and to all other | |
452 encodings. See http://www.gnu.org/directory/libiconv.html. | |
453 | |
454 Japanese: nkf | |
455 Nkf is "Network Kanji code conversion Filter". Its most distinctive | |
456 feature is that it guesses the input Kanji code, so you don't need | |
457 to know what the input file's |charset| is. To convert to EUC-JP | |
458 from ISO-2022-JP or Shift_JIS, simply run the following command | |
459 in Vim: | |
460 :%!nkf -e | |
461 Nkf can be found at: | |
462 http://www.sfc.wide.ad.jp/~max/FreeBSD/ports/distfiles/nkf-1.62.tar.gz | |
463 | |
464 Chinese: hc | |
465 Hc is "Hanzi Converter". Hc converts a GB file to a Big5 file, or a Big5 | |
466 file to a GB file. Hc can be found at: | |
467 ftp://ftp.cuhk.hk/pub/chinese/ifcss/software/unix/convert/hc-30.tar.gz | |
468 | |
469 Korean: hmconv | |
236 | 470 Hmconv is a Korean code conversion utility especially for E-mail. It can |
7 | 471 convert between EUC-KR and ISO-2022-KR. Hmconv can be found at: |
472 ftp://ftp.kaist.ac.kr/pub/hangul/code/hmconv/ | |
473 | |
474 Multilingual: lv | |
475 Lv is a Powerful Multilingual File Viewer. It can also be used as a | |
476 |charset| converter. Supported |charset| values: ISO-2022-CN, ISO-2022-JP, | |
477 ISO-2022-KR, EUC-CN, EUC-JP, EUC-KR, EUC-TW, UTF-7, UTF-8, ISO-8859 | |
236 | 478 series, Shift_JIS, Big5 and HZ. Lv can be found at: |
3682 | 479 http://www.ff.iij4u.or.jp/~nrt/lv/index.html |
7 | 480 |
481 | |
482 *mbyte-conversion* | |
483 When reading and writing files in an encoding different from 'encoding', | |
484 conversion needs to be done. These conversions are supported: | |
485 - All conversions between Latin-1 (ISO-8859-1), UTF-8, UCS-2 and UCS-4 are | |
486 handled internally. | |
487 - For MS-Windows, when 'encoding' is a Unicode encoding, conversion from and | |
488 to any codepage should work. | |
489 - Conversion specified with 'charconvert' | |
490 - Conversion with the iconv library, if it is available. | |
491 Old versions of GNU iconv() may cause the conversion to fail (they | |
492 request a very large buffer, more than Vim is willing to provide). | |
493 Try getting another iconv() implementation. | |
494 | |
557 | 495 *iconv-dynamic* |
496 On MS-Windows Vim can be compiled with the |+iconv/dyn| feature. This means | |
497 Vim will search for the "iconv.dll" and "libiconv.dll" libraries. When | |
498 neither of them can be found Vim will still work but some conversions won't be | |
499 possible. | |
500 | |
7 | 501 ============================================================================== |
502 4. Using a terminal *mbyte-terminal* | |
503 | |
504 The GUI fully supports multi-byte characters. It is also possible in a | |
505 terminal, if the terminal supports the same encoding that Vim uses. Thus this | |
506 is less flexible. | |
507 | |
508 For example, you can run Vim in an xterm with added multi-byte support and/or | |
509 |XIM|. Examples are kterm (Kanji term) and hanterm (for Korean), Eterm | |
510 (Enlightened terminal) and rxvt. | |
511 | |
512 If your terminal does not support the right encoding, you can set the | |
513 'termencoding' option. Vim will then convert the typed characters from | |
514 'termencoding' to 'encoding'. And displayed text will be converted from | |
515 'encoding' to 'termencoding'. If the encoding supported by the terminal | |
516 doesn't include all the characters that Vim uses, this leads to lost | |
517 characters. This may mess up the display. If you use a terminal that | |
518 supports Unicode, such as the xterm mentioned below, it should work just fine, | |
519 since nearly every character set can be converted to Unicode without loss of | |
520 information. | |
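The loss of characters described above can be illustrated in Python. This sketch assumes that a character missing from the target encoding is replaced by "?", which is Python's behavior with errors="replace"; what Vim actually displays in that case may differ. The helper name convert_for_terminal is hypothetical.

```python
def convert_for_terminal(text, terminal_encoding):
    """Encode text for a terminal, replacing unsupported characters."""
    return text.encode(terminal_encoding, errors="replace")


text = "Vim 한글"

# A Latin-1 terminal cannot show the Korean characters: they are lost.
latin1_bytes = convert_for_terminal(text, "latin-1")
print(latin1_bytes)

# A Unicode terminal keeps everything, so the text survives a round trip.
utf8_bytes = convert_for_terminal(text, "utf-8")
print(utf8_bytes.decode("utf-8") == text)
```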
521 | |
522 | |
523 UTF-8 IN XFREE86 XTERM *UTF8-xterm* | |
524 | |
525 This is a short explanation of how to use UTF-8 character encoding in the | |
526 xterm that comes with XFree86 by Thomas Dickey (text by Markus Kuhn). | |
527 | |
528 Get the latest xterm version, which now has UTF-8 support: | |
529 | |
530 http://invisible-island.net/xterm/xterm.html | |
531 | |
532 Compile it with "./configure --enable-wide-chars ; make" | |
533 | |
534 Also get the ISO 10646-1 version of various fonts, which is available on | |
535 | |
536 http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz | |
537 | |
538 and install the font as described in the README file. | |
539 | |
540 Now start xterm with > | |
541 | |
542 xterm -u8 -fn -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1 | |
543 or, for bigger characters: > | |
544 xterm -u8 -fn -misc-fixed-medium-r-normal--15-140-75-75-c-90-iso10646-1 | |
545 | |
236 | 546 and you will have a working UTF-8 terminal emulator. Try both > |
7 | 547 |
548 cat utf-8-demo.txt | |
549 vim utf-8-demo.txt | |
550 | |
551 with the demo text that comes with ucs-fonts.tar.gz in order to see | |
552 whether there are any problems with UTF-8 in your xterm. | |
553 | |
554 For Vim you may need to set 'encoding' to "utf-8". | |
555 | |
556 ============================================================================== | |
557 5. Fonts on X11 *mbyte-fonts-X11* | |
558 | |
559 Unfortunately, using fonts in X11 is complicated. The name of a single-byte | |
560 font is a long string. For multi-byte fonts we need several of these... | |
561 | |
562 Note: Most of this is no longer relevant for GTK+ 2. Selecting a font via | |
563 its XLFD is not supported; see 'guifont' for an example of how to |
7 | 564 set the font. Do yourself a favor and ignore the |XLFD| and |xfontset| |
565 sections below. | |
566 | |
567 First of all, Vim only accepts fixed-width fonts for displaying text. You | |
568 cannot use proportionally spaced fonts. This excludes many of the available | |
569 (and nicer looking) fonts. However, for menus and tooltips any font can be | |
570 used. | |
571 | |
572 Note that Display and Input are independent. It is possible to see your | |
573 language even though you have no input method for it. | |
574 | |
575 You should get a default font for menus and tooltips that works, but it might | |
576 be ugly. Read the following to find out how to select a better font. | |
577 | |
578 | |
579 X LOGICAL FONT DESCRIPTION (XLFD) | |
580 *XLFD* | |
581 XLFD is the X font name and contains the information about the font size, | |
582 charset, etc. The name is in this format: | |
583 | |
584 FOUNDRY-FAMILY-WEIGHT-SLANT-WIDTH-STYLE-PIXEL-POINT-X-Y-SPACE-AVE-CR-CE | |
585 | |
586 Each field means: | |
587 | |
588 - FOUNDRY: FOUNDRY field. The company that created the font. | |
589 - FAMILY: FAMILY_NAME field. Basic font family name. (helvetica, gothic, | |
590 times, etc) | |
591 - WEIGHT: WEIGHT_NAME field. How thick the letters are. (light, medium, | |
592 bold, etc) | |
593 - SLANT: SLANT field. | |
594 r: Roman (no slant) | |
595 i: Italic | |
596 o: Oblique | |
597 ri: Reverse Italic | |
598 ro: Reverse Oblique | |
599 ot: Other | |
600 number: Scaled font | |
601 - WIDTH: SETWIDTH_NAME field. Width of characters. (normal, condensed, | |
602 narrow, double wide) | |
603 - STYLE: ADD_STYLE_NAME field. Extra info to describe font. (Serif, Sans | |
604 Serif, Informal, Decorated, etc) | |
605 - PIXEL: PIXEL_SIZE field. Height, in pixels, of characters. | |
606 - POINT: POINT_SIZE field. Ten times height of characters in points. | |
607 - X: RESOLUTION_X field. X resolution (dots per inch). | |
608 - Y: RESOLUTION_Y field. Y resolution (dots per inch). | |
609 - SPACE: SPACING field. | |
610 p: Proportional | |
611 m: Monospaced | |
612 c: CharCell | |
613 - AVE: AVERAGE_WIDTH field. Ten times average width in pixels. | |
614 - CR: CHARSET_REGISTRY field. The name of the charset group. | |
615 - CE: CHARSET_ENCODING field. The rest of the charset name. For some | |
616 charsets, such as JIS X 0208, if this field is 0, code points have | |
617 the same value as GL, and GR if 1. | |
618 | |
3682 | 619 For example, a 16-dot font corresponding to JIS X 0208 is |
7 | 620 written like: |
621 -misc-fixed-medium-r-normal--16-110-100-100-c-160-jisx0208.1990-0 | |
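To see how the fields line up with a real font name, the name can be split on "-". The Python sketch below uses the short field names from the format line above. The function parse_xlfd is hypothetical, and it assumes that no field contains a "-" itself, which is true for the example but is not guaranteed for every font.

```python
XLFD_FIELDS = [
    "FOUNDRY", "FAMILY", "WEIGHT", "SLANT", "WIDTH", "STYLE", "PIXEL",
    "POINT", "X", "Y", "SPACE", "AVE", "CR", "CE",
]


def parse_xlfd(name):
    """Map each XLFD field name to its value in an X font name."""
    parts = name.split("-")
    # A well-formed name starts with "-", so parts[0] must be empty.
    if parts[0] != "" or len(parts) != len(XLFD_FIELDS) + 1:
        raise ValueError("not a 14-field XLFD name: " + name)
    return dict(zip(XLFD_FIELDS, parts[1:]))


font = parse_xlfd(
    "-misc-fixed-medium-r-normal--16-110-100-100-c-160-jisx0208.1990-0")
for field in XLFD_FIELDS:
    print(f"{field:8} {font[field]!r}")
```

For this font the STYLE field is empty, PIXEL is 16, and AVE is 160, which means an average width of 16 pixels.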
622 | |
623 | |
624 X FONTSET | |
625 *fontset* *xfontset* | |
626 A single-byte charset is typically associated with one font. For multi-byte | |
627 charsets a combination of fonts is often used. This means that one group of | |
628 characters is taken from one font and another group from another font (which | |
629 might be double wide). This collection of fonts is called a fontset. | |
630 | |
631 Which fonts are required in a fontset depends on the current locale. X | |
632 windows maintains a table of which groups of characters are required for a | |
633 locale. You have to specify all the fonts that a locale requires in the | |
634 'guifontset' option. | |
635 | |
636 NOTE: The fontset always uses the current locale, even though 'encoding' may | |
637 be set to use a different charset. In that situation you might want to use | |
638 'guifont' and 'guifontwide' instead of 'guifontset'. | |
639 | |
640 Example: | |
641 |charset| language "groups of characters" ~ | |
642 GB2312 Chinese (simplified) ISO-8859-1 and GB 2312 | |
643 Big5 Chinese (traditional) ISO-8859-1 and Big5 | |
644 CNS-11643 Chinese (traditional) ISO-8859-1, CNS 11643-1 and CNS 11643-2 | |
645 EUC-JP Japanese JIS X 0201 and JIS X 0208 | |
646 EUC-KR Korean ISO-8859-1 and KS C 5601 (KS X 1001) | |
647 | |
648 You can search for fonts using the xlsfonts command. For example, when you're | |
649 searching for a font for KS C 5601: > | |
650 xlsfonts | grep ksc5601 | |
651 | |
652 This is complicated and confusing. You might want to consult the X-Windows | |
653 documentation if there is something you don't understand. | |
654 | |
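The same search can be run from inside Vim. A minimal sketch, assuming the
xlsfonts program is installed and an X display is available; FindXFonts() is a
made-up helper name:

```vim
" Hypothetical helper: list the X fonts whose name contains a:charset,
" ignoring case. Returns an empty list when xlsfonts fails.
function! FindXFonts(charset) abort
  let fonts = systemlist('xlsfonts')
  if v:shell_error
    return []
  endif
  return filter(fonts, 'stridx(tolower(v:val), tolower(a:charset)) >= 0')
endfunction

echo FindXFonts('ksc5601')
```

The result is a List of full font names, which can be pasted into the font
list described next.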
655 *base_font_name_list* | |
656 When you have found the names of the fonts you want to use, you need to set | |
657 the 'guifontset' option. You specify the list by concatenating the font names | |
658 and putting a comma in between them. | |
659 | |
660 For example, when you use the ja_JP.eucJP locale, this requires JIS X 0201 | |
661 and JIS X 0208. You could supply a list of fonts that explicitly specifies | |
662 the charsets, like: > | |
663 | |
664 :set guifontset=-misc-fixed-medium-r-normal--14-130-75-75-c-140-jisx0208.1983-0, | |
665 \-misc-fixed-medium-r-normal--14-130-75-75-c-70-jisx0201.1976-0 | |
666 | |
667 Alternatively, you can supply a base font name list that omits the charset | |
668 name, letting X-Windows select font characters required for the locale. For | |
669 example: > | |
670 | |
671 :set guifontset=-misc-fixed-medium-r-normal--14-130-75-75-c-140, | |
672 \-misc-fixed-medium-r-normal--14-130-75-75-c-70 | |
673 | |
674 Alternatively, you can supply a single base font name that allows X-Windows to | |
675 select from all available fonts. For example: > | |
676 | |
677 :set guifontset=-misc-fixed-medium-r-normal--14-* | |
678 | |
679 Alternatively, you can specify alias names. See the fonts.alias file in the | |
680 fonts directory (e.g., /usr/X11R6/lib/X11/fonts/). For example: > | |
681 | |
682 :set guifontset=k14,r14 | |
683 < | |
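In a |gvimrc| the list can also be built step by step and joined with commas.
This is a sketch under assumptions: the two font names are the ones from the
ja_JP.eucJP example above and must exist on your X server, and the error
handling is only one possible way to report a fontset that X rejects.

```vim
" Sketch: build 'guifontset' from a List of font names.
if has('xfontset') && has('gui_running')
  let fonts = []
  call add(fonts, '-misc-fixed-medium-r-normal--14-130-75-75-c-140-jisx0208.1983-0')
  call add(fonts, '-misc-fixed-medium-r-normal--14-130-75-75-c-70-jisx0201.1976-0')
  try
    " The names are concatenated with a comma in between, as described above.
    let &guifontset = join(fonts, ',')
  catch /^Vim\%((\a\+)\)\=:E/
    echomsg 'Could not set guifontset: ' . v:exception
  endtry
endif
```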
684 *E253* | |
685 Note that in East Asian fonts, the standard character cell is square. When | |
686 mixing a Latin font and an East Asian font, the East Asian font width should | |
687 be twice the Latin font width. | |
688 | |
689 If 'guifontset' is not empty, the "font" argument of the |:highlight| command | |
690 is also interpreted as a fontset. For example, you should use for | |
691 highlighting: > | |
692 :hi Comment font=english_font,your_font | |
693 If you use a wrong "font" argument you will get an error message. | |
694 Also make sure that you set 'guifontset' before setting fonts for highlight | |
695 groups. | |
696 | |
697 | |
698 USING RESOURCE FILES | |
699 | |
700 Instead of specifying 'guifontset', you can set X11 resources and Vim will | |
701 pick them up. This is only for people who know how X resource files work. | |
702 | |
703 For Motif and Athena insert these three lines in your $HOME/.Xdefaults file: | |
704 | |
705 Vim.font: |base_font_name_list| | |
706 Vim*fontSet: |base_font_name_list| | |
707 Vim*fontList: your_language_font | |
708 | |
709 Note: Vim.font is for the text area. | |
710 Vim*fontSet is for the menus. | |
711 Vim*fontList is for the menus (for the Motif GUI). | |
712 | |
713 For example, when you are using Japanese and a 14-dot font: > | |
714 | |
715 Vim.font: -misc-fixed-medium-r-normal--14-* | |
716 Vim*fontSet: -misc-fixed-medium-r-normal--14-* | |
717 Vim*fontList: -misc-fixed-medium-r-normal--14-* | |
718 < | |
719 or: > | |
720 | |
721 Vim*font: k14,r14 | |
722 Vim*fontSet: k14,r14 | |
723 Vim*fontList: k14,r14 | |
724 < | |
725 To have them take effect immediately you will have to do > | |
726 | |
727 xrdb -merge ~/.Xdefaults | |
728 | |
729 Otherwise you will have to stop and restart the X server before the changes | |
730 take effect. | |
731 | |
732 | |
733 The GTK+ version of GUI Vim does not use .Xdefaults; use ~/.gtkrc instead. | |
734 The default mostly works OK. But for the menus you might have to change | |
735 it. Example: > | |
736 | |
737 style "default" | |
738 { | |
739 fontset="-*-*-medium-r-normal--14-*-*-*-c-*-*-*" | |
740 } | |
741 widget_class "*" style "default" | |
742 | |
743 ============================================================================== | |
744 6. Fonts on MS-Windows *mbyte-fonts-MSwin* | |
745 | |
746 The simplest way is to use the font dialog to select fonts and try them out. | |
747 You can find this at the "Edit/Select Font..." menu. Once you find a font | |
748 that works well you can use this command to see its name: > | |
749 | |
750 :set guifont | |
751 | |
752 Then add a command to your |gvimrc| file to set 'guifont': > | |
753 | |
754 :set guifont=courier_new:h12 | |
755 | |
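If the same |gvimrc| is shared between systems, the setting can be guarded so
that it only applies in the MS-Windows GUI. A sketch; the commented-out font
name MS_Gothic is an assumption about which fonts are installed:

```vim
" Sketch for a gvimrc: only set the font in the MS-Windows GUI.
if has('gui_win32')
  set guifont=courier_new:h12
  " A character set can be added with the "cXX" suffix of 'guifont', e.g.
  " for a Japanese font (assumed to be installed):
  " set guifont=MS_Gothic:h12:cSHIFTJIS
endif
```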
756 ============================================================================== | |
757 7. Input on X11 *mbyte-XIM* | |
758 | |
759 X INPUT METHOD (XIM) BACKGROUND *XIM* *xim* *x-input-method* | |
760 | |
761 XIM is an international input module for X. There are two kinds of structure: |
7 | 762 the Xlib unit type and the |IM-server| (Input-Method server) type. The |
763 |IM-server| type is suitable for complex input, such as CJK. | |
764 | |
765 - IM-server | |
766 *IM-server* | |
767 In |IM-server| type input structures, input events are handled in one of | |
768 two ways: by a FrontEnd system or by a BackEnd system. In a FrontEnd | |
769 system, input events are captured by the |IM-server| first, and the | |
770 |IM-server| then passes the input result to the application. A BackEnd | |
771 system works in the reverse order. MS-Windows adopts the BackEnd system. | |
772 In X, most |IM-server|s adopt the FrontEnd system. The drawback of the | |
773 BackEnd system is the large communication overhead, but it provides safe | |
774 synchronization with no restrictions on applications. | |
775 | |
776 For example, xwnmo and kinput2 are Japanese |IM-server|s; both are | |
777 FrontEnd systems. Xwnmo is distributed with Wnn (see below); kinput2 can be | |
778 found at: ftp://ftp.sra.co.jp/pub/x11/kinput2/ | |
779 | |
780 For Chinese, there is a good XIM server named "xcin", with which you can | |
781 input both Traditional and Simplified Chinese characters. It can also accept | |
782 other locales if you make a correct input table. Xcin can be found at: | |
783 http://cle.linux.org.tw/xcin/ |
15 | 784 Others are scim: http://scim.freedesktop.org/ and fcitx: |
856 | 785 http://www.fcitx.org/ |
7 | 786 |
787 - Conversion Server | |
788 *conversion-server* | |
789 Some systems need an additional server: a conversion server. Most Japanese | |
790 |IM-server|s need a Kana-Kanji conversion server. For Chinese input it | |
791 depends on the input method; some methods need a PinYin or ZhuYin to | |
792 HanZi conversion server. For Korean input, a Hangul-Hanja conversion | |
793 server is needed if you want to input Hanja. | |
794 | |
795 For example, the Japanese input process is divided into two steps: first | |
796 pre-input in Hira-gana, then Kana-Kanji conversion. There are many Kanji | |
797 characters (6349 Kanji characters are defined in JIS X 0208) but only 76 | |
798 Hira-gana characters. So we first pre-input the text as pronounced, in | |
799 Hira-gana, and then convert the Hira-gana to Kanji or Kata-Kana, if | |
800 needed. There are several Kana-Kanji conversion servers, such as jserver | |
3153 | 801 (distributed with Wnn, see below) and canna. Canna can be found at: |
802 http://canna.sourceforge.jp/ | |
7 | 803 |
804 There is a good input system: Wnn 4.2. Wnn 4.2 contains: | |
805 xwnmo (|IM-server|) | |
806 jserver (Japanese Kana-Kanji conversion server) | |
807 cserver (Chinese PinYin or ZhuYin to simplified HanZi conversion server) | |
808 tserver (Chinese PinYin or ZhuYin to traditional HanZi conversion server) | |
809 kserver (Hangul-Hanja conversion server) | |
810 Wnn 4.2 for several systems can be found at various places on the internet. | |
811 Use the RPM or port for your system. | |
812 | |
813 | |
814 - Input Style | |
815 *xim-input-style* | |
816 When inputting CJK, there are four areas: | |
817 1. The area to display the input while it is being composed. | |
818 2. The area to display the currently active input mode. | |
819 3. The area to display the next candidate for the selection. | |
820 4. The area to display other tools. | |
821 | |
822 The third area is needed when converting. For example, in Japanese | |
823 input, multiple Kanji characters can have the same pronunciation, so a | |
824 sequence of Hira-gana characters can map to several distinct sequences of | |
825 Kanji characters. | |
826 | |
827 The first and second areas are defined in the international input of X with | |
828 the names "Preedit Area" and "Status Area", respectively. The third and fourth | |
829 areas are not defined and are left to be managed by the |IM-server|. In the | |
830 international input, four input styles have been defined using combinations | |
831 of Preedit Area and Status Area: |OnTheSpot|, |OffTheSpot|, |OverTheSpot| | |
832 and |Root|. | |
833 | |
834 Currently, GUI Vim supports three styles, |OverTheSpot|, |OffTheSpot| and |
7 | 835 |Root|. |
836 When compiled with the |+GUI_GTK| feature, GUI Vim supports two styles, |
837 |OnTheSpot| and |OverTheSpot|. You can select the style with the 'imstyle' |
838 option. |
7 | 839 |
840 *. on-the-spot *OnTheSpot* | |
841 Preedit Area and Status Area are handled by the client application in | |
842 the area of the application. The client application is directed by the | |
843 |IM-server| to display all pre-edit data at the location of text | |
236 | 844 insertion. The client registers callbacks invoked by the input method |
7 | 845 during pre-editing. |
846 *. over-the-spot *OverTheSpot* | |
847 Status Area is created in a fixed position within the application's area; | |
848 in the case of Vim, this is the additional status line. Preedit Area is | |
849 made at the application's current input position. The input method | |
850 displays pre-edit data in a window which it brings up directly over the | |
851 text insertion position. | |
852 *. off-the-spot *OffTheSpot* | |
853 Preedit Area and Status Area are placed in the application's area; in the | |
854 case of Vim, this is the additional status line. The client application | |
855 provides display windows for the pre-edit data to the input method which | |
856 displays into them directly. | |
857 *. root-window *Root* | |
858 Preedit Area and Status Area are outside of the application. The input | |
859 method displays all pre-edit data in a separate area of the screen in a | |
860 window specific to the input method. | |
861 | |
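In the GTK GUI the style is chosen with the 'imstyle' option. A minimal
sketch, assuming a Vim where the option exists; exists('+imstyle') is false
when it does not:

```vim
" Sketch: pick the input style in the GTK GUI.
"   0  on-the-spot
"   1  over-the-spot
if exists('+imstyle')
  set imstyle=1
endif
```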
862 | |
863 USING XIM *multibyte-input* *E284* *E286* *E287* *E288* | |
864 *E285* *E289* |
7 | 865 |
866 Note that Display and Input are independent. It is possible to see your | |
867 language even though you have no input method for it. But when your Display | |
868 method doesn't match your Input method, the text will be displayed incorrectly. | |
869 | |
870 Note: You cannot use IM unless you specify 'guifontset'. | |
871 Therefore, users of Latin scripts also have to set 'guifontset' | |
872 if they use IM. | |
873 | |
874 To input your language you should run the |IM-server| which supports your | |
875 language, and a |conversion-server| if needed. | |
876 | |
877 The next 3 lines should be put in your ~/.Xdefaults file. They are common for | |
878 all X applications which use |XIM|. If you already use |XIM|, you can skip | |
879 this. > | |
880 | |
881 *international: True | |
882 *.inputMethod: your_input_server_name | |
883 *.preeditType: your_input_style | |
884 < | |
885 your_input_server_name is your |IM-server| name (check your |IM-server| | |
886 manual). | |
887 your_input_style is one of |OverTheSpot|, |OffTheSpot|, |Root|. See | |
888 also |xim-input-style|. | |
889 | |
890 *international may not be necessary if you use X11R6. | |
891 *.inputMethod and *.preeditType are optional if you use X11R6. | |
892 | |
893 For example, when you are using kinput2 as |IM-server|, > | |
894 | |
895 *international: True | |
896 *.inputMethod: kinput2 | |
897 *.preeditType: OverTheSpot | |
898 < | |
899 When using |OverTheSpot|, GUI Vim always connects to the IM Server even in | |
900 Normal mode, so you can input your language with commands like "f" and "r". | |
901 But when using one of the other two methods, GUI Vim connects to the IM Server | |
902 only if it is not in Normal mode. | |
903 | |
904 If your IM Server does not support |OverTheSpot|, and if you want to use your | |
905 language with some Normal mode command like "f" or "r", then you should use a | |
906 localized xterm or an xterm which supports |XIM|. | |
907 | |
908 If needed, you can set the XMODIFIERS environment variable: | |
909 | |
910 sh: export XMODIFIERS="@im=input_server_name" | |
911 csh: setenv XMODIFIERS "@im=input_server_name" | |
912 | |
913 For example, when you are using kinput2 as |IM-server| and sh, > | |
914 | |
915 export XMODIFIERS="@im=kinput2" | |
916 < | |
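To check from inside Vim which input server the environment selects, the
variable can be inspected with a few lines of Vim script. This is a sketch
that only looks at the "@im=" part of $XMODIFIERS; it does not test whether
that server is actually running.

```vim
" Sketch: report which input server XMODIFIERS selects, if any.
let im = matchstr($XMODIFIERS, '@im=\zs[^@]*')
if empty(im)
  echo 'XMODIFIERS does not name an input server'
else
  echo 'XIM server from XMODIFIERS: ' . im
endif
```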
917 | |
918 FULLY CONTROLLED XIM | |
919 | |
920 You can fully control XIM, like with the IME of MS-Windows (see |multibyte-ime|). | |
921 This is currently only available for the GTK GUI. | |
922 | |
923 Before using fully controlled XIM, one setting is required. Set the | |
924 'imactivatekey' option to the key that is used for the activation of the input | |
925 method. For example, when you are using kinput2 + canna as IM Server, the | |
926 activation key is probably Shift+Space: > | |
927 | |
928 :set imactivatekey=S-space | |
929 | |
930 See 'imactivatekey' for the format. | |
931 | |
932 ============================================================================== | |
933 8. Input on MS-Windows *mbyte-IME* | |
934 | |
935 (Windows IME support) *multibyte-ime* *IME* | |
936 | |
937 {only works in the Windows GUI and when compiled with the |+multi_byte_ime| feature} | |
938 | |
2415 | 939 To input multibyte characters on Windows, you can use an Input Method Editor |
7 | 940 (IME). While editing text you have to switch the IME status (on/off) very |
941 often. Because an IME that is switched on hooks all of your key input, you | |
942 cannot type 'j', 'k', or almost any other key to Vim directly. | |
943 | |
944 The |+multi_byte_ime| feature helps with this. It reduces the number of | |
945 times you have to switch the IME status manually. In Normal mode the IME is | |
946 hardly ever needed, even when editing multibyte text. So when you exit Insert | |
947 mode with ESC, Vim remembers the last IME status and forces the IME off. When | |
948 you re-enter Insert mode, Vim automatically restores the remembered status. | |
949 | |
950 This works not only when switching between Insert and Normal mode, but also | |
951 for search-command input and Replace mode. | |
952 The options 'iminsert', 'imsearch' and 'imcmdline' can be used to choose | |
9 | 953 the different input methods or disable them temporarily. |
7 | 954 |
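As an illustration of the three options, the following sketch for a |gvimrc|
starts Insert mode with the IME on and the command line with the IME off.
The chosen values are only one possible preference:

```vim
" Sketch for the MS-Windows GUI with IME support.
if has('multi_byte_ime')
  set iminsert=2    " Insert mode: start with the IME on
  set imsearch=-1   " search patterns: use the value of 'iminsert'
  set noimcmdline   " Command-line: start with the IME off
endif
```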
955 WHAT IS IME | |
956 IME is a part of the East Asian versions of Windows. It helps you to input | |
957 multibyte characters. English and other language versions of Windows do not | |
958 have any IME. (Usually there is no need for one.) But there is one |
7 | 959 called Microsoft Global IME. Global IME is a part of Internet Explorer |
960 4.0 or above. You can get more information about Global IME at the URLs | |
961 below. | |
962 | |
963 WHAT IS GLOBAL IME *global-ime* | |
964 Global IME makes it possible to input Chinese, Japanese, and Korean text | |
965 into a Vim buffer on any language version of Windows 98, Windows 95, and | |
966 Windows NT 4.0. | |
967 On Windows 2000 and XP it should work as well (without downloading). On | |
968 Windows 2000 Professional, Global IME is built in, and the Input Locales | |
969 can be added through Control Panel/Regional Options/Input Locales. | |
970 Please see the URLs below for details of Global IME. You can also find the | |
971 various language versions of Global IME at the same place. | |
972 | |
973 - Global IME detailed information. | |
974 http://search.microsoft.com/results.aspx?q=global+ime |
7 | 975 |
976 - Active Input Method Manager (Global IME) | |
977 http://msdn.microsoft.com/en-us/library/aa741221(v=VS.85).aspx |
7 | 978 |
1621 | 979 Support for Global IME is an experimental feature. |
7 | 980 |
981 NOTE: For IME to work you must make sure the input locales of your language | |
982 are added to your system. The exact location of this depends on the version | |
1621 | 983 of Windows you use. For example, on my Windows 2000 box: |
7 | 984 1. Control Panel |
985 2. Regional Options | |
986 3. Input Locales Tab | |
987 4. Add Installed input locales -> Chinese(PRC) | |
988 The default is still English (United States). | |
989 | |
990 | |
991 Cursor color when IME or XIM is on *CursorIM* | |
992 There is a nice little feature for IME: the cursor can indicate the IME | |
993 status by changing its color. Usually the IME status is indicated by a small | |
994 icon in a corner of the desktop (or taskbar), where it is not easy to check. | |
995 This feature makes it easier. | |
996 This works in the same way when using XIM. | |
997 | |
998 You can select the cursor color used when the IME is on with the highlight group | |
819 | 999 CursorIM. For example, add these lines to your |gvimrc|: > |
7 | 1000 |
1001 if has('multi_byte_ime') | |
1002 highlight Cursor guifg=NONE guibg=Green | |
1003 highlight CursorIM guifg=NONE guibg=Purple | |
1004 endif | |
1005 < | |
1006 With the IME off the cursor is green. A purple cursor indicates that the | |
1007 IME is on. | |
1008 | |
1009 ============================================================================== | |
1010 9. Input with a keymap *mbyte-keymap* | |
1011 | |
1012 When the keyboard doesn't produce the characters you want to enter in your | |
1013 text, you can use the 'keymap' option. This will translate one or more | |
1014 (English) characters to another (non-English) character. This only happens | |
1015 when typing text, not when typing Vim commands. This avoids having to switch | |
1016 between two keyboard settings. | |
1017 {only available when compiled with the |+keymap| feature} |
7 | 1018 |
1019 The value of the 'keymap' option specifies a keymap file to use. The name of | |
1020 this file is one of these two: | |
1021 | |
1022 keymap/{keymap}_{encoding}.vim | |
1023 keymap/{keymap}.vim | |
1024 | |
1025 Here {keymap} is the value of the 'keymap' option and {encoding} of the | |
1026 'encoding' option. The file name with the {encoding} included is tried first. | |
1027 | |
1028 'runtimepath' is used to find these files. To see an overview of all | |
1029 available keymap files, use this: > | |
1030 :echo globpath(&rtp, "keymap/*.vim") | |
1031 | |
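The lookup order described above can be imitated in Vim script to see which
file would be used for a given keymap name. This is a sketch, not the code
Vim itself runs; FindKeymapFile() is a made-up helper name.

```vim
" Hypothetical helper: return the first keymap file found in 'runtimepath'
" for a:keymap, trying the name with the encoding first, or '' if none.
function! FindKeymapFile(keymap) abort
  for name in [a:keymap . '_' . &encoding, a:keymap]
    " The fourth argument makes globpath() return a List.
    let found = globpath(&runtimepath, 'keymap/' . name . '.vim', 0, 1)
    if !empty(found)
      return found[0]
    endif
  endfor
  return ''
endfunction

echo FindKeymapFile('hebrew')
```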
1032 In Insert and Command-line mode you can use CTRL-^ to toggle between using the | |
1033 keyboard map or not. |i_CTRL-^| |c_CTRL-^| | |
1034 This flag is remembered for Insert mode with the 'iminsert' option. When | |
1035 leaving and entering Insert mode the previous value is used. The same value | |
1036 is also used for commands that take a single character argument, like |f| and | |
1037 |r|. | |
1038 For Command-line mode the flag is NOT remembered. You are expected to type an | |
1039 Ex command first, which is ASCII. | |
1040 For typing search patterns the 'imsearch' option is used. It can be set to | |
1041 use the same value as for 'iminsert'. | |
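For example, to load a keymap but start with it switched off, something like
the following can be put in a vimrc. A sketch; "accents" is used only because
it is a keymap mentioned in this file:

```vim
" Sketch: load a keymap but start with it switched off.
if has('keymap')
  set keymap=accents   " setting 'keymap' also sets 'iminsert' to 1
  set iminsert=0       " 0: keymap off; CTRL-^ in Insert mode toggles it
  set imsearch=-1      " -1: search patterns follow the value of 'iminsert'
endif
```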
1042 *lCursor* |
7 | 1043 It is possible to give the GUI cursor another color when the language mappings |
1044 are being used. This is disabled by default, to avoid that the cursor becomes | |
1045 invisible when you use a non-standard background color. Here is an example to | |
1046 use a brightly colored cursor: > | |
1047 :highlight Cursor guifg=NONE guibg=Green | |
1048 :highlight lCursor guifg=NONE guibg=Cyan | |
1049 < | |
839 | 1050 *keymap-file-format* *:loadk* *:loadkeymap* *E105* *E791* |
7 | 1051 The keymap file looks something like this: > |
1052 | |
1053 " Maintainer: name <email@address> | |
1054 " Last Changed: 2001 Jan 1 | |
1055 | |
1056 let b:keymap_name = "short" | |
1057 | |
1058 loadkeymap | |
1059 a A | |
1060 b B comment | |
1061 | |
1062 The lines starting with a " are comments and will be ignored. Blank lines are | |
1063 also ignored. The lines with the mappings may have a comment after the useful | |
1064 text. | |
1065 | |
1066 The "b:keymap_name" can be set to a short name, which will be shown in the | |
1067 status line. The idea is that this takes less room than the value of | |
1068 'keymap', which might be long to distinguish between different languages, | |
1069 keyboards and encodings. | |
1070 | |
1071 The actual mappings are in the lines below "loadkeymap". In the example "a" | |
1072 is mapped to "A" and "b" to "B". Thus the first item is mapped to the second | |
1073 item. This is done for each line, until the end of the file. | |
1074 These items are exactly the same as what can be used in a |:lnoremap| command, | |
4186 | 1075 using "<buffer>" to make the mappings local to the buffer. |
7 | 1076 You can check the result with this command: > |
1077 :lmap | |
1078 The two items must be separated by white space. You cannot include white | |
1079 space inside an item, use the special names "<Tab>" and "<Space>" instead. | |
1080 The length of the two items together must not exceed 200 bytes. | |
1081 | |
1082 It's possible to have more than one character in the first column. This works | |
1083 like a dead key. Example: > | |
1084 'a á | |
1085 Since Vim doesn't know if the next character after a quote is really an "a", | |
1086 it will wait for the next character. To be able to insert a single quote, | |
1087 also add this line: > | |
1088 '' ' | |
1089 Since the mapping is defined with |:lnoremap| the resulting quote will not be | |
1090 used for the start of another character. | |
818 | 1091 The "accents" keymap uses this. *keymap-accents* |
7 | 1092 |
3893 | 1093 The first column can also be in |<>| form: |
1094 <C-c> Ctrl-C | |
1095 <A-c> Alt-c | |
1096 <A-C> Alt-C | |
1097 Note that the Alt mappings may not work, depending on your keyboard and | |
1098 terminal. | |
1099 | |
7 | 1100 Although it's possible to have more than one character in the second column, |
1101 this is unusual. But you can use various ways to specify the character: > | |
1102 A a literal character | |
1103 A <char-97> decimal value | |
1104 A <char-0x61> hexadecimal value | |
1105 A <char-0141> octal value | |
1106 x <Space> special key name | |
1107 | |
1108 The characters are assumed to be encoded for the current value of 'encoding'. | |
1109 It's possible to use ":scriptencoding" when all characters are given | |
1110 literally. That doesn't work when using the <char-> construct, because the | |
1111 conversion is done on the keymap file, not on the resulting character. | |
1112 | |
1113 The lines after "loadkeymap" are interpreted with 'cpoptions' set to "C". | |
1114 This means that continuation lines are not used and a backslash has a special | |
1115 meaning in the mappings. Examples: > | |
1116 | |
1117 " a comment line | |
1118 \" x maps " to x | |
1119 \\ y maps \ to y | |
1120 | |
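Putting the pieces together, a complete keymap file could look like the
sketch below. The file name "demo.vim" and the short name "demo" are made up
for this example. Only the <char-> form is used for the accented characters,
so the file itself is plain ASCII and does not need ":scriptencoding".

```vim
" Hypothetical keymap file, e.g. ~/.vim/keymap/demo.vim
" Use it with:  :set keymap=demo

let b:keymap_name = "demo"

loadkeymap
'a  <char-0xe1>   quote followed by a gives a with acute accent
'e  <char-0xe9>   quote followed by e gives e with acute accent
''  '             typing the quote twice gives a single quote
```

After ":set keymap=demo" the resulting language mappings can be listed with
":lmap".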
1121 If you write a keymap file that will be useful for others, consider submitting | |
1122 it to the Vim maintainer for inclusion in the distribution: | |
1123 <maintainer@vim.org> | |
1124 | |
1125 | |
1126 HEBREW KEYMAP *keymap-hebrew* | |
1127 | |
1128 This section explains what characters are available in UTF-8 and CP1255 encodings, | |
1129 and what the keymaps are to get those characters: | |
1130 | |
1131 glyph encoding keymap ~ | |
1132 Char utf-8 cp1255 hebrew hebrewp name ~ | |
1133 א 0x5d0 0xe0 t a 'alef | |
1134 ב 0x5d1 0xe1 c b bet | |
1135 ג 0x5d2 0xe2 d g gimel | |
1136 ד 0x5d3 0xe3 s d dalet | |
1137 ה 0x5d4 0xe4 v h he | |
1138 ו 0x5d5 0xe5 u v vav | |
1139 ז 0x5d6 0xe6 z z zayin | |
1140 ח 0x5d7 0xe7 j j het | |
1141 ט 0x5d8 0xe8 y T tet | |
1142 י 0x5d9 0xe9 h y yod | |
1143 ך 0x5da 0xea l K kaf sofit | |
1144 כ 0x5db 0xeb f k kaf | |
1145 ל 0x5dc 0xec k l lamed | |
1146 ם 0x5dd 0xed o M mem sofit | |
1147 מ 0x5de 0xee n m mem | |
1148 ן 0x5df 0xef i N nun sofit | |
1149 נ 0x5e0 0xf0 b n nun | |
1150 ס 0x5e1 0xf1 x s samech | |
1151 ע 0x5e2 0xf2 g u `ayin | |
1152 ף 0x5e3 0xf3 ; P pe sofit | |
1153 פ 0x5e4 0xf4 p p pe | |
1154 ץ 0x5e5 0xf5 . X tsadi sofit | |
1155 צ 0x5e6 0xf6 m x tsadi | |
1156 ק 0x5e7 0xf7 e q qof | |
1157 ר 0x5e8 0xf8 r r resh | |
1158 ש 0x5e9 0xf9 a w shin | |
1159 ת 0x5ea 0xfa , t tav | |
1160 | |
1161 Vowel marks and special punctuation: | |
1162 הְ 0x5b0 0xc0 A: A: sheva | |
1163 הֱ 0x5b1 0xc1 HE HE hataf segol | |
1164 הֲ 0x5b2 0xc2 HA HA hataf patah | |
1165 הֳ 0x5b3 0xc3 HO HO hataf qamats | |
1166 הִ 0x5b4 0xc4 I I hiriq | |
1167 הֵ 0x5b5 0xc5 AY AY tsere | |
1168 הֶ 0x5b6 0xc6 E E segol | |
1169 הַ 0x5b7 0xc7 AA AA patah | |
1170 הָ 0x5b8 0xc8 AO AO qamats | |
1171 הֹ 0x5b9 0xc9 O O holam | |
1172 הֻ 0x5bb 0xcb U U qubuts | |
1173 כּ 0x5bc 0xcc D D dagesh | |
1174 הֽ 0x5bd 0xcd ]T ]T meteg | |
1175 ה־ 0x5be 0xce ]Q ]Q maqaf | |
1176 בֿ 0x5bf 0xcf ]R ]R rafe | |
1177 ב׀ 0x5c0 0xd0 ]p ]p paseq | |
1178 שׁ 0x5c1 0xd1 SR SR shin-dot | |
1179 שׂ 0x5c2 0xd2 SL SL sin-dot | |
1180 ׃ 0x5c3 0xd3 ]P ]P sof-pasuq | |
1181 װ 0x5f0 0xd4 VV VV double-vav | |
1182 ױ 0x5f1 0xd5 VY VY vav-yod | |
1183 ײ 0x5f2 0xd6 YY YY yod-yod | |
1184 | |
1185 The following are only available in utf-8 | |
1186 | |
1187 Cantillation marks: | |
1188 glyph | |
1189 Char utf-8 hebrew name | |
1190 ב֑ 0x591 C: etnahta | |
1191 ב֒ 0x592 Cs segol | |
1192 ב֓ 0x593 CS shalshelet | |
1193 ב֔ 0x594 Cz zaqef qatan | |
1194 ב֕ 0x595 CZ zaqef gadol | |
1195 ב֖ 0x596 Ct tipeha | |
1196 ב֗ 0x597 Cr revia | |
1197 ב֘ 0x598 Cq zarqa | |
1198 ב֙ 0x599 Cp pashta | |
1199 ב֚ 0x59a C! yetiv | |
1200 ב֛ 0x59b Cv tevir | |
1201 ב֜ 0x59c Cg geresh | |
1202 ב֝ 0x59d C* geresh qadim | |
1203 ב֞ 0x59e CG gershayim | |
1204 ב֟ 0x59f CP qarnei-parah | |
1205 ב֪ 0x5aa Cy yerach-ben-yomo | |
1206 ב֫ 0x5ab Co ole | |
1207 ב֬ 0x5ac Ci iluy | |
1208 ב֭ 0x5ad Cd dehi | |
1209 ב֮ 0x5ae Cn zinor | |
1210 ב֯ 0x5af CC masora circle | |
1211 | |
1212 Combining forms: | |
1213 ﬠ 0xfb20 X` Alternative `ayin | |
1214 ﬡ 0xfb21 X' Alternative 'alef | |
1215 ﬢ 0xfb22 X-d Alternative dalet | |
1216 ﬣ 0xfb23 X-h Alternative he | |
1217 ﬤ 0xfb24 X-k Alternative kaf | |
1218 ﬥ 0xfb25 X-l Alternative lamed | |
1219 ﬦ 0xfb26 X-m Alternative mem-sofit | |
1220 ﬧ 0xfb27 X-r Alternative resh | |
1221 ﬨ 0xfb28 X-t Alternative tav | |
1222 ﬩ 0xfb29 X-+ Alternative plus | |
1223 שׁ 0xfb2a XW shin+shin-dot | |
1224 שׂ 0xfb2b Xw shin+sin-dot | |
1225 שּׁ 0xfb2c X..W shin+shin-dot+dagesh | |
1226 שּׂ 0xfb2d X..w shin+sin-dot+dagesh | |
1227 אַ 0xfb2e XA alef+patah | |
1228 אָ 0xfb2f XO alef+qamats | |
1229 אּ 0xfb30 XI alef+hiriq (mapiq) | |
1230 בּ 0xfb31 X.b bet+dagesh | |
1231 גּ 0xfb32 X.g gimel+dagesh | |
1232 דּ 0xfb33 X.d dalet+dagesh | |
1233 הּ 0xfb34 X.h he+dagesh | |
1234 וּ 0xfb35 Xu vav+dagesh | |
1235 זּ 0xfb36 X.z zayin+dagesh | |
1236 טּ 0xfb38 X.T tet+dagesh | |
1237 יּ 0xfb39 X.y yud+dagesh | |
1238 ךּ 0xfb3a X.K kaf sofit+dagesh | |
1239 כּ 0xfb3b X.k kaf+dagesh | |
1240 לּ 0xfb3c X.l lamed+dagesh | |
1241 מּ 0xfb3e X.m mem+dagesh | |
1242 נּ 0xfb40 X.n nun+dagesh | |
1243 סּ 0xfb41 X.s samech+dagesh | |
1244 ףּ 0xfb43 X.P pe sofit+dagesh | |
1245 פּ 0xfb44 X.p pe+dagesh | |
1246 צּ 0xfb46 X.x tsadi+dagesh | |
1247 קּ 0xfb47 X.q qof+dagesh | |
1248 רּ 0xfb48 X.r resh+dagesh | |
1249 שּ 0xfb49 X.w shin+dagesh | |
1250 תּ 0xfb4a X.t tav+dagesh | |
1251 וֹ 0xfb4b Xo vav+holam | |
1252 בֿ 0xfb4c XRb bet+rafe | |
1253 כֿ 0xfb4d XRk kaf+rafe | |
1254 פֿ 0xfb4e XRp pe+rafe | |
1255 ﭏ 0xfb4f Xal alef-lamed | |
1256 | |
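The values in the "utf-8" column are Unicode code points, so they can be
passed to nr2char() when 'encoding' is "utf-8". The sketch below prints the
27 letters of the first table; the cp1255 value is computed with the fixed
offset 0x4f0 that the table shows for these letters (0x5d0 - 0xe0).

```vim
" Sketch, assuming 'encoding' is "utf-8": print code point, cp1255 value
" and character for 'alef (0x5d0) up to and including tav (0x5ea).
if &encoding ==# 'utf-8'
  for n in range(0x5d0, 0x5ea)
    echo printf('0x%x  0x%x  %s', n, n - 0x4f0, nr2char(n))
  endfor
endif
```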
1257 ============================================================================== | |
1258 10. Input with imactivatefunc() *mbyte-func* |
1259 |
12968 | 1260 Vim has the 'imactivatefunc' and 'imstatusfunc' options. These are useful to |
13125 | 1261 activate/deactivate the input method from Vim in any way, also with an external |
1262 command. For example, fcitx provides the fcitx-remote command: > |
1263 |
1264 set iminsert=2 |
1265 set imsearch=2 |
1266 set imcmdline |
1267 |
1268 set imactivatefunc=ImActivate |
1269 function! ImActivate(active) |
1270   if a:active |
1271     call system('fcitx-remote -o') |
1272   else |
1273     call system('fcitx-remote -c') |
1274   endif |
1275 endfunction |
1276 |
1277 set imstatusfunc=ImStatus |
1278 function! ImStatus() |
1279   return system('fcitx-remote')[0] is# '2' |
1280 endfunction |
Christian Brabandt <cb@256bit.org>
parents:
12293
diff
changeset
|
1281 |
327e1264b9bf
patch 8.0.1336: cannot use imactivatefunc() unless compiled with +xim
Christian Brabandt <cb@256bit.org>
parents:
12293
diff
changeset
|
1282 Using this script, you can activate/deactivate XIM via Vim even when it is not |
327e1264b9bf
patch 8.0.1336: cannot use imactivatefunc() unless compiled with +xim
Christian Brabandt <cb@256bit.org>
parents:
12293
diff
changeset
|
1283 compiled with |+xim|. |
327e1264b9bf
patch 8.0.1336: cannot use imactivatefunc() unless compiled with +xim
Christian Brabandt <cb@256bit.org>
parents:
12293
diff
changeset
|
1284 |
==============================================================================
11. Using UTF-8                         *mbyte-utf8* *UTF-8* *utf-8* *utf8*
                                                        *Unicode* *unicode*
The Unicode character set was designed to include all characters from other
character sets. Therefore it is possible to write text in any language using
Unicode (with a few rarely used languages excluded). And it's mostly possible
to mix these languages in one file, which is impossible with other encodings.

Unicode can be encoded in several ways. The most popular one is UTF-8, which
uses one or more bytes for each character and is backwards compatible with
ASCII. On MS-Windows UTF-16 is also used (previously UCS-2), which uses
16-bit words. Vim can support all of these encodings, but always uses UTF-8
internally.

Vim has comprehensive UTF-8 support. It works well in:
- xterm with utf-8 support enabled
- Athena, Motif and GTK GUI
- MS-Windows GUI
- several other platforms

Double-width characters are supported. This works best with 'guifontwide' or
'guifontset'. When using only 'guifont' the wide characters are drawn in the
normal width, followed by a space to fill the gap. Note that the 'guifontset'
option is no longer relevant in the GTK+ 2 GUI.

                                                        *bom-bytes*
When reading a file a BOM (Byte Order Mark) can be used to recognize the
Unicode encoding:
        EF BB BF        utf-8
        FE FF           utf-16 big endian
        FF FE           utf-16 little endian
        00 00 FE FF     utf-32 big endian
        FF FE 00 00     utf-32 little endian

Utf-8 is the recommended encoding. Note that it's difficult to tell utf-16
and utf-32 apart. Utf-16 is often used on MS-Windows; utf-32 is not
widespread as a file format.

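
The BOM table can be turned into a small lookup, sketched here under a few
assumptions: BomEncoding() is a hypothetical helper, not a Vim function, it
takes the first bytes of a file as a List of numbers, and it returns the
names used in the table rather than 'fileencoding' values. The four-byte
marks are tested first, because the utf-32 little endian BOM starts with the
same two bytes as the utf-16 little endian BOM: >

        function! BomEncoding(bytes)
          " Longest marks first: FF FE 00 00 also starts with FF FE.
          let marks = [
                \ [[0x00, 0x00, 0xFE, 0xFF], 'utf-32 big endian'],
                \ [[0xFF, 0xFE, 0x00, 0x00], 'utf-32 little endian'],
                \ [[0xEF, 0xBB, 0xBF], 'utf-8'],
                \ [[0xFE, 0xFF], 'utf-16 big endian'],
                \ [[0xFF, 0xFE], 'utf-16 little endian']]
          for [mark, name] in marks
            " A slice of a too short List is just shorter, never an error.
            if a:bytes[0 : len(mark) - 1] == mark
              return name
            endif
          endfor
          " No BOM recognized.
          return ''
        endfunction

For example, ":echo BomEncoding([0xEF, 0xBB, 0xBF, 0x41])" shows "utf-8". A
utf-16 little endian file whose first character is NUL starts with the same
four bytes as a utf-32 little endian file; the sketch reports utf-32 for it,
which illustrates why the two are difficult to tell apart.
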
                                        *mbyte-combining* *mbyte-composing*
A composing or combining character is used to change the meaning of the
character before it. The combining characters are drawn on top of the
preceding character.
Up to two combining characters can be used by default. This can be changed
with the 'maxcombine' option.
When editing text a composing character is mostly considered part of the
preceding character. For example "x" will delete a character and its
following composing characters by default.
If the 'delcombine' option is on, then pressing "x" will delete the combining
characters, one at a time, then the base character. But when inserting, you
type the first character and the following composing characters separately,
after which they will be joined. The "r" command will not allow you to type a
combining character, because it doesn't know one is coming. Use "R" instead.

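The difference between counting with and without composing characters can be
checked from a script. This is a sketch that assumes 'encoding' is "utf-8";
the variable name is only for illustration: >

        " "e" followed by the combining acute accent U+0301
        let text = "e\u0301"
        " Shows 2: the composing character is counted separately.
        echo strchars(text)
        " Shows 1: composing characters are skipped.
        echo strchars(text, 1)
        " Shows 3: one byte for "e" and two bytes for U+0301.
        echo strlen(text)

The second argument of strchars() makes it count the way "x" deletes when
'delcombine' is off: a base character together with its composing characters.
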
Bytes which are not part of a valid UTF-8 byte sequence are handled like a
single character and displayed as <xx>, where "xx" is the hex value of the
byte.

Overlong sequences are not handled specially and displayed like a valid
character. However, search patterns may not match on an overlong sequence.
(An overlong sequence is one where more bytes are used than required for the
character.) An exception is NUL (zero) which is displayed as "<00>".

In the file and buffer the full range of Unicode characters can be used (31
bits). However, displaying only works for the characters present in the
selected font.

Useful commands:
- "ga" shows the decimal, hexadecimal and octal value of the character under
  the cursor. If there are composing characters these are shown too. (If the
  message is truncated, use ":messages").
- "g8" shows the bytes used in a UTF-8 character, also the composing
  characters, as hex numbers.
- ":set encoding=utf-8 fileencodings=" forces using UTF-8 for all files. The
  default is to use the current locale for 'encoding' and set 'fileencodings'
  to automatically detect the encoding of a file.


STARTING VIM

If your current locale uses a utf-8 encoding, Vim will automatically start
in utf-8 mode.

If you are using another locale: >

        set encoding=utf-8

You might also want to select the font used for the menus. Unfortunately this
doesn't always work. See the system specific remarks below, and 'langmenu'.


USING UTF-8 IN X-Windows                                *utf-8-in-xwindows*

Note: This section does not apply to the GTK+ 2 GUI.

You need to specify a font to be used. For double-wide characters another
font is required, which is exactly twice as wide. There are three ways to do
this:

1. Set 'guifont' and let Vim find a matching 'guifontwide'
2. Set 'guifont' and 'guifontwide'
3. Set 'guifontset'

See the documentation for each option for details. Example: >

   :set guifont=-misc-fixed-medium-r-normal--15-140-75-75-c-90-iso10646-1

You might also want to set the font used for the menus. This only works for
Motif. Use the ":hi Menu font={fontname}" command for this. |:highlight|


TYPING UTF-8                                            *utf-8-typing*

If you are using X-Windows, you should find an input method that supports
utf-8.

If your system does not provide support for typing utf-8, you can use the
'keymap' feature. This allows writing a keymap file, which defines a utf-8
character as a sequence of ASCII characters. See |mbyte-keymap|.

Another method is to set the current locale to the language you want to use
and for which you have an XIM available. Then set 'termencoding' to that
language and Vim will convert the typed characters to 'encoding' for you.

If everything else fails, you can type any character as four hex digits: >

        CTRL-V u 1234

"1234" is interpreted as a hex number. You must type four characters; prepend
a zero if necessary.

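
A character can also be produced from its hex value in a script or mapping.
A short sketch, assuming 'encoding' is "utf-8" (the variable names are only
for illustration): >

        " Both give the character with hex value 1234.
        let from_number = nr2char(0x1234)
        let from_escape = "\u1234"
        " Shows 1: both are the same character and char2nr() returns 4660,
        " which is 1234 hex.
        echo from_number ==# from_escape && char2nr(from_number) == 4660

In Insert mode the result of such an expression can be put in the text with
CTRL-R =, for example by typing CTRL-R =nr2char(0x1234) followed by <Enter>.
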
COMMAND ARGUMENTS                                       *utf-8-char-arg*

Commands like |f|, |F|, |t| and |r| take an argument of one character. For
UTF-8 this argument may include one or two composing characters. These need
to be produced together with the base character: Vim doesn't wait for the
next character to be typed to find out if it is a composing character or not.
Using 'keymap' or |:lmap| is a nice way to type these characters.

The commands that search for a character in a line handle composing characters
as follows. When searching for a character without a composing character,
this will find matches in the text with or without composing characters. When
searching for a character with a composing character, this will only find
matches with that composing character. It was implemented this way, because
not everybody is able to type a composing character.


==============================================================================
12. Overview of options                                 *mbyte-options*

These options are relevant for editing multi-byte files. Check the help in
options.txt for detailed information.

'encoding'      Encoding used for the keyboard and display. It is also the
                default encoding for files.

'fileencoding'  Encoding of a file. When it's different from 'encoding'
                conversion is done when reading or writing the file.

'fileencodings' List of possible encodings of a file. When opening a file
                these will be tried and the first one that doesn't cause an
                error is used for 'fileencoding'.

'charconvert'   Expression used to convert files from one encoding to another.

'formatoptions' The 'm' flag can be included to have formatting break a line
                at a multibyte character of 256 or higher. This is useful for
                languages where a sequence of characters can be broken
                anywhere.

'guifontset'    The list of font names used for a multi-byte encoding. When
                this option is not empty, it replaces 'guifont'.

'keymap'        Specify the name of a keyboard mapping.

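As an illustration of how the encoding options combine, the following is a
sketch of a vimrc fragment, not a recommended default; the list of encodings
is an assumption and should be adapted to the files you edit: >

        set encoding=utf-8
        set fileencodings=ucs-bom,utf-8,latin1

With this, a file that starts with a BOM is read in the encoding that the BOM
indicates, a file that is valid utf-8 is read as utf-8, and any other file is
read as latin1. ":set fileencoding?" shows what was chosen for the current
buffer.
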
==============================================================================

Contributions specifically for the multi-byte features by:
        Chi-Deok Hwang <hwang@mizi.co.kr>
        SungHyun Nam <goweol@gmail.com>
        K.Nagano <nagano@atese.advantest.co.jp>
        Taro Muraoka <koron@tka.att.ne.jp>
        Yasuhiro Matsumoto <mattn@mail.goo.ne.jp>

 vim:tw=78:ts=8:noet:ft=help:norl: