comparison runtime/doc/eval.txt @ 32307:8d6f53a07ffd v9.0.1485
patch 9.0.1485: no functions for converting from/to UTF-16 index
Commit: https://github.com/vim/vim/commit/67672ef097dd708244ff042a8364994da2b91e75
Author: Christian Brabandt <cb@256bit.org>
Date: Mon Apr 24 21:09:54 2023 +0100
Problem: no functions for converting from/to UTF-16 index.
Solution: Add UTF-16 flag to existing functions and add strutf16len() and
utf16idx(). (Yegappan Lakshmanan, closes #12216)
author | Bram Moolenaar <Bram@vim.org> |
---|---|
date | Mon, 24 Apr 2023 22:15:05 +0200 |
parents | b2e8663e6dcc |
children | 2a17771529af |
32306:6d5e523b5b6a | 32307:8d6f53a07ffd |
---|---|
1578 < Hello, Peter! ~ | 1578 < Hello, Peter! ~ |
1579 > | 1579 > |
1580 echo $"The square root of {{9}} is {sqrt(9)}" | 1580 echo $"The square root of {{9}} is {sqrt(9)}" |
1581 < The square root of {9} is 3.0 ~ | 1581 < The square root of {9} is 3.0 ~ |
1582 | 1582 |
1583 *string-offset-encoding* | |
1584 A string consists of multiple characters. How the characters are stored | |
1585 depends on 'encoding'. Most common is UTF-8, which uses one byte for ASCII | |
1586 characters, two bytes for other Latin characters and more bytes for other | |
1587 characters. | |
1588 | |
1589 A string offset can count characters or bytes. Other programs may use | |
1590 UTF-16 encoding (16-bit words) and an offset of UTF-16 words. Some functions | |
1591 use byte offsets, usually for UTF-8 encoding. Other functions use character | |
1592 offsets, in which case the encoding doesn't matter. | |
1593 | |
1594 The different offsets for the string "a©😊" are below: | |
1595 | |
1596 UTF-8 offsets: | |
1597 [0]: 61, [1]: C2, [2]: A9, [3]: F0, [4]: 9F, [5]: 98, [6]: 8A | |
1598 UTF-16 offsets: | |
1599 [0]: 0061, [1]: 00A9, [2]: D83D, [3]: DE0A | |
1600 UTF-32 (character) offsets: | |
1601 [0]: 00000061, [1]: 000000A9, [2]: 0001F60A | |
1602 | |
1603 You can use the "g8" and "ga" commands on a character to see the | |
1604 decimal/hex/octal values. | |
1605 | |
1606 The functions |byteidx()|, |utf16idx()| and |charidx()| can be used to convert | |
1607 between these indices. The functions |strlen()|, |strutf16len()| and | |
1608 |strcharlen()| return the number of bytes, UTF-16 code units and characters in | |
1609 a string respectively. | |
1583 | 1610 |
1584 option *expr-option* *E112* *E113* | 1611 option *expr-option* *E112* *E113* |
1585 ------ | 1612 ------ |
1586 &option option value, local value if possible | 1613 &option option value, local value if possible |
1587 &g:option global option value | 1614 &g:option global option value |
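
The offset table that this patch adds for the string "a©😊" can be reproduced outside Vim. The following is a minimal Python sketch, not Vim's implementation: the helper names `code_units` and `byte_to_utf16_index` are hypothetical and exist only for this illustration. It lists the code units under each encoding and converts a UTF-8 byte offset into a UTF-16 offset. That is roughly the kind of conversion the new documentation attributes to |utf16idx()|, here assuming the byte offset falls on a character boundary.

```python
def code_units(s: str) -> dict:
    """Return the UTF-8, UTF-16 and UTF-32 code units of s as hex strings."""
    utf8 = s.encode("utf-8")
    # Big-endian without a BOM, so each 2-byte slice reads as one 16-bit unit.
    utf16 = s.encode("utf-16-be")
    return {
        "utf-8": [f"{b:02X}" for b in utf8],
        "utf-16": [utf16[i:i + 2].hex().upper() for i in range(0, len(utf16), 2)],
        # One entry per code point (character).
        "utf-32": [f"{ord(c):08X}" for c in s],
    }


def byte_to_utf16_index(s: str, byte_idx: int) -> int:
    """Convert a UTF-8 byte offset into a UTF-16 code unit offset.

    Assumption: byte_idx lies on a character boundary. An offset inside a
    multibyte sequence raises UnicodeDecodeError in this sketch.
    """
    prefix = s.encode("utf-8")[:byte_idx].decode("utf-8")
    return len(prefix.encode("utf-16-le")) // 2


text = "a©😊"
units = code_units(text)
for name, values in units.items():
    # The list index of each value is its offset in that encoding.
    print(f"{name}: {len(values)} units: {values}")

# Byte offset 3 is where the emoji starts; in UTF-16 that is offset 2.
print(byte_to_utf16_index(text, 3))
```

The three list lengths (7, 4 and 3) are the byte, UTF-16 code unit and character counts that the help text associates with |strlen()|, |strutf16len()| and |strcharlen()| for this string. The emoji occupies two UTF-16 code units (a surrogate pair), which is why UTF-16 offsets and character offsets diverge after it.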