JavaScript: String.fromCharCode vs String.fromCodePoint

neotam Avatar

String.fromCharCode vs String.fromCodePoint
Posted on :

Tags :

String.fromCodePoint was introduced in ES6 to address shortcomings of String.fromCharCode to deal with UNICODE. Discussing String.fromCharCode vs String.fromCodePoint, both methods receive sequence of code points but String.fromCharCode require surrogate pair for supplementary characters while String.fromCodePoint also can receive directly code points (equivalent UTF-32 code unit) as arguments instead of surrogates

Both String.fromCharCode and String.fromCodePoint are static methods. Which are called directly on String rather than String Object.

String.fromCharCode Method

String.fromCharCode uses the UTF-16 by which most unicode characters can be represented by a single 16-bit value called code unit. Using 16-bits only unicode characters till code point 0xFFFF can be represented. All unicode characters beyond 0xFFFF, from 0x010000 (65536) to 0x10FFFF (1114111) are called supplementary characters. Sine code points of these supplementary characters require more than 16 bits, in UTF-16 encoding supplementary characters are represented by two 16-bit code units referred as surrogates. A surrogates pair (pair of 16-bit units) is a unique pair of code units that represent the supplementary character.

String.fromCharCode cannot return the UNICODE supplementary characters by specifying their code point directly which ranges from 0x10000 0x10FFFF. Instead, it takes two surrogates (16-bit code units) to return a supplementary character.

For example take unicode character “Sunrise Over Mountains” which is a supplementary character and it’s code point is “U+1F304”. As said String.fromCharCode cannot return a supplementary character given it’s code point

> String.fromCharCode(0x1F304)  // Given code point of supplementary char, returns invalid character
"๏Œ„"

Here is the valid way to pass supplementary characters using surrogate pair to String.fromCharCode as follows

> String.fromCharCode(0xD83C, 0xDF04) 
๐ŸŒ„

Since String.fromCharcode only works with UTF-16 encoding, to get supplementary characters it is required to pass surrogate pair. Given character “Sunrise Over Mountains” (code point “U+1F304” ) surrogates are 0xD83C and 0xDF04. This surrogate pair is passed as String.fromCharCode(0xD83C, 0xDF04) .

Unicode escape sequence characters \u also require surrogates for supplementary characters. i.e, Character U+1F304 is represented as “\uD83C\uDF04”

String.fromCodePoint Method

The static method String.fromCodePoint take sequence of Unicode code points and returns the string. Unlike String.fromCharCode, the method String.fromCodePoint can return 4 byte Unicode characters by taking code point which is UTF-32 code unit for supplementary characters. This method also works with surrogate pairs. As it can take both code points and surrogates as arguments, it is smart enough to differentiate surrogate pair and code point.

Here is simple example that shows String.fromCodePoint returns character by receiving coding point as an argument which is equivalent to UTF-32 code unit

> String.fromCodePoint(0x1F304) 
๐ŸŒ„

String.fromCodePoint can take both surrogates and code point as arguments for supplementary characters

> String.fromCodePoint(0xD83C, 0xDF04, 0x1F308)  // 0xD83C and 0xDF04 are surrogate pair of ๐ŸŒ„ while 0x1F308 is code point of ๐ŸŒˆ
"๐ŸŒ„๐ŸŒˆ"

Using String.fromCharCode for supplementary characters require extra step to lookup the surrogate pair as a result it is efficient to use String.fromCodePoint

In Unicode “code points” are numbers that represent the characters while “code units” are numbers that encode these code points so that characters can be stored or transmitted. At least one code unit is required to encode code point. Depends on encoding, two or more code units are used to encode single code point. Well know encoding of Unicode are UTF-8, UTF-16 and UTF-32

String.fromCharCode vs String.fromCodePoint
String.fromCharCode vs String.fromCodePoint

There are other related string methods String.prototype.charCodeAt() and String.prototype.codePointAt() that also follow the same convention.

Leave a Reply

Your email address will not be published. Required fields are marked *