Char  Code Pt  UTF-8        UTF-16
────  ───────  ───────────  ─────────
H     U+0048   48           0048
e     U+0065   65           0065
l     U+006C   6C           006C
l     U+006C   6C           006C
o     U+006F   6F           006F
SP    U+0020   20           0020
世    U+4E16   E4 B8 96     4E16
界    U+754C   E7 95 8C     754C
SP    U+0020   20           0020
👋    U+1F44B  F0 9F 91 8B  D83D DC4B

10 code points · 17 UTF-8 bytes · 11 UTF-16 units
About UTF-8 Byte Inspector — Code Point & Byte Viewer
The UTF-8 Byte Inspector breaks any string into individual code points and shows each one alongside its exact encoding in UTF-8 (bytes) and UTF-16 (code units). It's built for developers debugging encoding issues: mojibake, mismatched byte counts, unexpected emoji widths, or surrogate pairs that throw off string length.
Paste a word, an emoji, or a tricky combining sequence and see precisely how it is stored: which code points it contains, how many UTF-8 bytes each takes, and how characters above U+FFFF split into UTF-16 surrogate pairs.
Everything runs entirely in your browser using the built-in TextEncoder and string APIs. Nothing you type is ever uploaded — the inspection happens offline on your device.
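The breakdown described above can be sketched with those same built-in APIs. This is a minimal illustration, not the tool's actual source: the `inspect` function, its row shape, and the `totals` object are hypothetical names chosen for this example.

```javascript
// Hypothetical helper: the page does not show the tool's real implementation.
function inspect(text) {
  const encoder = new TextEncoder(); // TextEncoder always produces UTF-8
  const hex = (n, width) => n.toString(16).toUpperCase().padStart(width, "0");
  const rows = [];
  // for...of walks a string by code point, so surrogate pairs stay together.
  for (const ch of text) {
    const utf8 = Array.from(encoder.encode(ch), (b) => hex(b, 2));
    const utf16 = [];
    // ch.length is 1 for BMP characters and 2 for supplementary ones.
    for (let i = 0; i < ch.length; i++) {
      utf16.push(hex(ch.charCodeAt(i), 4));
    }
    rows.push({ char: ch, codePoint: "U+" + hex(ch.codePointAt(0), 4), utf8, utf16 });
  }
  return rows;
}

const rows = inspect("Hello 世界 👋");
const totals = {
  codePoints: rows.length,
  utf8Bytes: rows.reduce((n, r) => n + r.utf8.length, 0),
  utf16Units: rows.reduce((n, r) => n + r.utf16.length, 0),
};
console.log(rows[9].codePoint, rows[9].utf8.join(" "), rows[9].utf16.join(" "));
// U+1F44B F0 9F 91 8B D83D DC4B
console.log(totals); // { codePoints: 10, utf8Bytes: 17, utf16Units: 11 }
```

One caveat with this sketch: a lone (unpaired) surrogate has no valid UTF-8 form, and TextEncoder substitutes U+FFFD (EF BF BD) for it, so the UTF-8 column would show the replacement character's bytes rather than the original unit.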
Features
- Per-character breakdown: code point, UTF-8 bytes, and UTF-16 code units
- Correct handling of emoji and supplementary characters (surrogate pairs)
- Totals for code points and UTF-8 byte length at a glance
- Readable labels for whitespace and control characters
- Works fully offline
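As a rough idea of how readable labels for invisible characters might work, here is a small sketch. Only the "SP" label appears in the sample output; the other labels, the `LABELS` map, and the `displayLabel` function are assumptions made for illustration.

```javascript
// Hypothetical label table: only "SP" is confirmed by the sample output.
const LABELS = new Map([
  [0x09, "TAB"],
  [0x0a, "LF"],
  [0x0d, "CR"],
  [0x20, "SP"],
  [0xa0, "NBSP"],
]);

function displayLabel(ch) {
  const cp = ch.codePointAt(0);
  if (LABELS.has(cp)) return LABELS.get(cp);
  // Remaining C0 controls, DEL, and C1 controls have no visible glyph,
  // so fall back to the code point notation.
  if (cp < 0x20 || (cp >= 0x7f && cp <= 0x9f)) {
    return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
  }
  return ch; // printable characters are shown as-is
}

console.log(displayLabel(" "));      // SP
console.log(displayLabel("\u0000")); // U+0000
console.log(displayLabel("H"));      // H
```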
How to use
- Type or paste any text into the input pane.
- Read the per-character table: each row shows the character, its code point, its UTF-8 bytes, and its UTF-16 code units.
- Check the totals line for the code-point count and overall UTF-8 byte length.
- Copy the breakdown from the output pane to share or paste into a bug report.
Frequently asked questions
Why does an emoji count as more than one byte?
Most emoji have code points above U+FFFF, so each takes 4 bytes in UTF-8 and a surrogate pair (two 16-bit code units) in UTF-16. JavaScript's string length counts UTF-16 code units, which is why a single emoji can report a length of 2. The inspector shows exactly how it splits.
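You can confirm these counts in any browser console. The snippet below is a small sketch using standard string APIs; the variable names are just for illustration. It also includes one emoji below U+FFFF to show why the answer says "most" rather than "all".

```javascript
const wave = "👋"; // U+1F44B, above U+FFFF
console.log(wave.length);                           // 2 (UTF-16 code units)
console.log([...wave].length);                      // 1 (code points)
console.log(new TextEncoder().encode(wave).length); // 4 (UTF-8 bytes)
console.log(wave.charCodeAt(0).toString(16));       // d83d (high surrogate)
console.log(wave.charCodeAt(1).toString(16));       // dc4b (low surrogate)
console.log(wave.codePointAt(0).toString(16));      // 1f44b

const smile = "\u263A"; // U+263A sits below U+FFFF
console.log(smile.length);                           // 1
console.log(new TextEncoder().encode(smile).length); // 3
```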
What is the difference between a code point and a byte?
A code point is the abstract Unicode number for a character (e.g. U+00E9 for é). Bytes are how that code point is physically stored, and the number of bytes depends on the encoding. UTF-8 uses 1 to 4 bytes per code point; UTF-16 uses one or two 16-bit code units (2 or 4 bytes). One character can therefore map to several bytes.
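The sizes follow directly from the code point's numeric range. The sketch below spells out the standard UTF-8 and UTF-16 thresholds and cross-checks them against TextEncoder; `utf8Length` and `utf16Length` are hypothetical helper names, not part of the tool.

```javascript
// Number of UTF-8 bytes needed for a code point (surrogates D800-DFFF excluded,
// since they are not valid on their own).
function utf8Length(codePoint) {
  if (codePoint <= 0x7f) return 1;   // ASCII
  if (codePoint <= 0x7ff) return 2;  // e.g. U+00E9 (é)
  if (codePoint <= 0xffff) return 3; // rest of the Basic Multilingual Plane
  return 4;                          // U+10000 to U+10FFFF
}

// Number of 16-bit code units needed in UTF-16.
function utf16Length(codePoint) {
  return codePoint <= 0xffff ? 1 : 2;
}

const encoder = new TextEncoder();
for (const cp of [0x48, 0xe9, 0x4e16, 0x1f44b]) {
  const ch = String.fromCodePoint(cp);
  // The two numbers on each line should agree with the helper functions.
  console.log(
    "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
    utf8Length(cp), encoder.encode(ch).length,
    utf16Length(cp), ch.length
  );
}
```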
Why does "é" sometimes show up as two entries?
There are two ways to write é: a single precomposed code point (U+00E9) or the letter e followed by a combining acute accent (U+0301). The inspector iterates code point by code point, so the combining form appears as two separate rows.
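The two forms can be compared, and converted into each other, with the standard `String.prototype.normalize` method. This is a short sketch; the variable names are illustrative only.

```javascript
const precomposed = "\u00E9"; // é as a single code point
const decomposed = "e\u0301"; // e followed by a combining acute accent

console.log(precomposed === decomposed); // false: different code points
console.log([...precomposed].length);    // 1
console.log([...decomposed].length);     // 2

const encoder = new TextEncoder();
console.log(encoder.encode(precomposed).length); // 2 (C3 A9)
console.log(encoder.encode(decomposed).length);  // 3 (65 CC 81)

// NFC composes where possible; NFD decomposes.
console.log(decomposed.normalize("NFC") === precomposed); // true
console.log(precomposed.normalize("NFD") === decomposed); // true
```

Normalizing both sides to the same form before comparing or measuring strings is a common way to avoid this kind of mismatch.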
Is my text sent to a server?
No. All inspection happens locally in your browser using the built-in TextEncoder and string APIs. Your input never leaves your device.