What is it?
Base64 encoding is a common way to represent binary data in environments that only support text.
It is often used to include images inline in an HTML page:
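<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABkAAAAZCAMAAADzN3VRAAAA+VBMVEVmRQBoRgFpRwGCXg6EYA+GYRCIYxGLZxOPaRWTbBaTbReUbheXcBmYcRmZchq9kiy+kizGmjDImzHJnTLMnzPrukPwv0Xxv0bywEbzwUbzwUf1wkf8yUv9ykz+ykz+y0z/zE3/zE7/zVD/zVL/zVP/zlX/zlb/z1f/0Fv/0WD/0mH/0mL/0mP/02b/02j/1Gj/1W3/2Xv/2n3/2n//2oD/24L/3Yj/3Yn/3oz/4Zb/4Zf/4pr/453/6K3/6K7/6bP/7L3/7L7/8c7/8c//9d3/9t7/9t//+ej/+u7/++//+/H/+/L//PT//PX//fb//vv///3///7////BnZiOAAABB0lEQVQoz3WS61LCMBCF17vUewUVRVhQFESkoCgKSlUu9Vai7/8wzslCJ+mU86PnbL7MNpMN/c0TGVl9BCqBDHuNInOx0RvaJOhwpE5gkEGVDVUHERmV2VJ5NCWfNY7p+kvII4p8VhazeXyfNBmXkFPkwlxyYKUxyDNihmgFvkqUgb+ANGPkAN4EudL9N2hPum3LGUDOdCycyAmOC9rOQS44QZcg9SRyA3IvXWZrEh5AfB131tOHzEfptV1dvoKEFcScQ7SwSOTkUFVCfTt9vevU3Vxa3tqXo/Xl3tRt/P93ajqFb88G3k80ubBtgnZoTPvXj2ZU85X9Qibv3Zbntbpvk4RXZesf0tV5+8YLzGEAAAAASUVORK5CYII=" />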
or include a profile picture or similar in a JSON API:
{
"username": "smiling_face",
"picture": "iVBORw0KGgoAAAANSUhEUgAAABkAAAAZCAMAAADzN3VRAAAA+VBMVEVmRQBoRgFpRwGCXg6EYA+GYRCIYxGLZxOPaRWTbBaTbReUbheXcBmYcRmZchq9kiy+kizGmjDImzHJnTLMnzPrukPwv0Xxv0bywEbzwUbzwUf1wkf8yUv9ykz+ykz+y0z/zE3/zE7/zVD/zVL/zVP/zlX/zlb/z1f/0Fv/0WD/0mH/0mL/0mP/02b/02j/1Gj/1W3/2Xv/2n3/2n//2oD/24L/3Yj/3Yn/3oz/4Zb/4Zf/4pr/453/6K3/6K7/6bP/7L3/7L7/8c7/8c//9d3/9t7/9t//+ej/+u7/++//+/H/+/L//PT//PX//fb//vv///3///7////BnZiOAAABB0lEQVQoz3WS61LCMBCF17vUewUVRVhQFESkoCgKSlUu9Vai7/8wzslCJ+mU86PnbL7MNpMN/c0TGVl9BCqBDHuNInOx0RvaJOhwpE5gkEGVDVUHERmV2VJ5NCWfNY7p+kvII4p8VhazeXyfNBmXkFPkwlxyYKUxyDNihmgFvkqUgb+ANGPkAN4EudL9N2hPum3LGUDOdCycyAmOC9rOQS44QZcg9SRyA3IvXWZrEh5AfB131tOHzEfptV1dvoKEFcScQ7SwSOTkUFVCfTt9vevU3Vxa3tqXo/Xl3tRt/P93ajqFb88G3k80ubBtgnZoTPvXj2ZU85X9Qibv3Zbntbpvk4RXZesf0tV5+8YLzGEAAAAASUVORK5CYII="
}
or to attach files to MIME emails (remove the newlines before decoding this with the visual decoder below):
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=boundary

--boundary
Content-Type: image/png
Content-Transfer-Encoding: base64

iVBORw0KGgoAAAANSUhEUgAAABkAAAAZCAMAAADzN3VRAAAA+VBMVEVmRQBoRgFpRwGCXg6E
YA+GYRCIYxGLZxOPaRWTbBaTbReUbheXcBmYcRmZchq9kiy+kizGmjDImzHJnTLMnzPrukPw
v0Xxv0bywEbzwUbzwUf1wkf8yUv9ykz+ykz+y0z/zE3/zE7/zVD/zVL/zVP/zlX/zlb/z1f/
0Fv/0WD/0mH/0mL/0mP/02b/02j/1Gj/1W3/2Xv/2n3/2n//2oD/24L/3Yj/3Yn/3oz/4Zb/
4Zf/4pr/453/6K3/6K7/6bP/7L3/7L7/8c7/8c//9d3/9t7/9t//+ej/+u7/++//+/H/+/L/
/PT//PX//fb//vv///3///7////BnZiOAAABB0lEQVQoz3WS61LCMBCF17vUewUVRVhQFESk
oCgKSlUu9Vai7/8wzslCJ+mU86PnbL7MNpMN/c0TGVl9BCqBDHuNInOx0RvaJOhwpE5gkEGV
DVUHERmV2VJ5NCWfNY7p+kvII4p8VhazeXyfNBmXkFPkwlxyYKUxyDNihmgFvkqUgb+ANGPk
AN4EudL9N2hPum3LGUDOdCycyAmOC9rOQS44QZcg9SRyA3IvXWZrEh5AfB131tOHzEfptV1d
voKEFcScQ7SwSOTkUFVCfTt9vevU3Vxa3tqXo/Xl3tRt/P93ajqFb88G3k80ubBtgnZoTPvX
j2ZU85X9Qibv3Zbntbpvk4RXZesf0tV5+8YLzGEAAAAASUVORK5CYII=
--boundary--
Short origin story
It appears that the first standard using Base64 was RFC 989 in 1987, which defined what is now MIME Base64. That RFC was eventually superseded by RFC 1421.
Since then it has appeared in various forms: with explicit padding (=), without padding, and using different characters.
Advantages
The 64 in base64 is a power of 2 (2^6), so each character encodes exactly 6 bits. That makes it easy to write fast encoding and decoding libraries using only bit shifts and masks.
Compared with alternatives like base62 (which doesn’t include punctuation) or base58 (which additionally doesn’t include 0, I, O or l), it is more efficient and much more common.
Compared with base122, it is less efficient but much more common, less likely to break, and suitable for environments that don’t support UTF-8.
Disadvantages
Being able to represent binary data safely in text documents is very useful, but it comes with some disadvantages.
A layer of obfuscation
It can enable content to hide from security tools like firewalls. While most can decode base64 data, it is yet another obstacle for them. Security tools often don’t have the luxury of knowing the data format an application uses, so they must guess or use heuristics, which in turn leads to imperfections.
You may notice that in the example above the base64 data almost looks like a strange URL (3Yn/3oz/4Zb/4Zf/4pr/453). All of those characters are valid in a URL path. A firewall may check this section both as base64 and as a URL path, but not every firewall does.
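To make the ambiguity concrete, here is a small Python check of that fragment under both interpretations, using only the standard library. Note that `urllib.parse.quote` is conservative, so “needs no percent-encoding” is a slightly stronger test than “valid in a URL path”; the decoded bytes themselves are meaningless here.

```python
import base64
from urllib.parse import quote

fragment = "3Yn/3oz/4Zb/4Zf/4pr/453"

# Interpreted as a URL path: no character needs percent-encoding.
assert quote(fragment) == fragment

# Interpreted as unpadded base64: restore the padding and it decodes without complaint.
raw = base64.b64decode(fragment + "=" * (-len(fragment) % 4))
print(len(raw), raw.hex())
```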
Additional work
The data has to be decoded, and while the algorithm is simple, it still costs additional CPU time.
Uses more space
Base64 encoding has an overhead of ~33%: every 3 bytes of unencoded data become 4 bytes after base64 encoding (plus padding, which rounds the output up to a multiple of 4 characters). Each output character can only take 64 values (6 bits), so 2 of the 8 bits in each output byte are effectively wasted. Try the visual tools below to better understand this.
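The length arithmetic can be sketched in a few lines of Python. The helper name `encoded_length` is made up for illustration; the padded result is cross-checked against the standard library’s `base64.b64encode`.

```python
import base64


def encoded_length(n: int, padded: bool = True) -> int:
    """Number of base64 characters produced for n input bytes (hypothetical helper)."""
    if padded:
        # Every started 3-byte group becomes a full 4-character group.
        return 4 * ((n + 2) // 3)
    # Without padding: 8n bits, rounded up to whole 6-bit characters.
    return (4 * n + 2) // 3


for n in (1, 2, 3, 30, 300):
    # Cross-check against the standard library's padded encoder.
    assert encoded_length(n) == len(base64.b64encode(bytes(n)))
    print(f"{n} bytes -> {encoded_length(n)} chars ({encoded_length(n) / n:.2f}x)")
```

For large inputs the ratio approaches 4/3; for very short inputs padding makes it worse (a single byte becomes 4 characters).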
Which base64?
It isn’t enough to tell someone that a field is base64 encoded. Really, we need to specify which alphabet we are using (common choices being known as “standard” and “URL safe”) and whether we include padding.
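As an illustration in Python’s standard library, the two common alphabets correspond to `base64.b64encode` and `base64.urlsafe_b64encode`, while stripping or restoring padding is left to you. The two input bytes below are picked so that the output needs the last two characters of the alphabet, which is where the variants differ.

```python
import base64
import binascii

data = b"\xfb\xff"  # chosen so the output needs the last two alphabet characters

standard = base64.b64encode(data)          # b'+/8='  ("standard" alphabet, padded)
url_safe = base64.urlsafe_b64encode(data)  # b'-_8='  ("URL safe": "-" and "_" replace "+" and "/")
unpadded = url_safe.rstrip(b"=")           # b'-_8'   (URL safe, padding removed)

# In strict mode the standard decoder rejects the URL-safe characters.
try:
    base64.b64decode(url_safe, validate=True)
except binascii.Error:
    print("standard decoder rejected the URL-safe string")

# Python's decoders expect padding, so restore it before decoding unpadded input.
restored = unpadded + b"=" * (-len(unpadded) % 4)
assert base64.urlsafe_b64decode(restored) == data
```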
How does it work?
Let’s assume “standard” base64 encoding with padding. The alphabet used is:
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
Each encoded character represents 6 bits according to its index in the alphabet.
The encoded character A represents an input of 000000, B represents 000001, C represents 000010, and so on.
Encoding takes the first 6 bits of the input and outputs the character at that index in the alphabet, then repeats with the next 6 bits until the input runs out.
We can simplify our thinking by realising that base64 encoding works in groups of 3 input bytes / 4 output characters, i.e. 24 bits.
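The 24-bit grouping translates directly into code. Below is a from-scratch Python sketch for illustration only (the name `encode` is hypothetical, and real code should simply call `base64.b64encode`, which it is compared against). Splitting each group into 6-bit indices needs nothing more than shifts and masks.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to 3 bytes into a 24-bit group, zero-filling missing bytes.
        group = int.from_bytes(chunk.ljust(3, b"\0"), "big")
        # Split the 24 bits into four 6-bit indices into the alphabet.
        chars = [ALPHABET[(group >> shift) & 0b111111] for shift in (18, 12, 6, 0)]
        # 1 input byte -> 2 real chars, 2 bytes -> 3, 3 bytes -> 4; pad the rest with "=".
        used = len(chunk) + 1
        out.append("".join(chars[:used]) + "=" * (4 - used))
    return "".join(out)


print(encode(b"Man"))  # TWFu
assert encode(bytes(range(10))) == base64.b64encode(bytes(range(10))).decode("ascii")
```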
Visual encoder
Here is a visual explanation of base64 encoding.
To give yourself more space, you may open the tools in a new tab.
Type anything you like into the input box, or try an emoji (e.g. 🌴).
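Base64 works on bytes, not characters, so typed text has to be turned into bytes first. Assuming the text is encoded as UTF-8 (the usual choice, but an assumption about how the tool behaves), a single emoji becomes 4 bytes and therefore 8 base64 characters including padding:

```python
import base64

text = "🌴"
raw = text.encode("utf-8")  # base64 works on bytes, so encode the text first
print(raw.hex(" "))                           # f0 9f 8c b4  (4 bytes for one emoji)
print(base64.b64encode(raw).decode("ascii"))  # 8J+MtA==
```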
Visual decoder
Here is a visual explanation of base64 decoding. In it, I have deliberately arranged the steps in a “reverse decoding” order to illustrate the process better.
Try copying the output from above into the input here. Then try changing an input character or using one of the examples. You could also try copying the picture value (inside the quotes) from the JSON example.
This decoder accepts input both with and without padding. Unlike most decoders, it tolerates many errors; if you hover your mouse over a red highlighted box, it will display the error.
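For comparison, here is a matching from-scratch decoder sketch in Python (`decode` is a hypothetical name, not the code behind the visual decoder). Like the visual decoder it accepts input with or without padding, but it is far stricter about everything else: any character outside the standard alphabet raises `KeyError`. It also ignores, rather than checks, the leftover low bits of a short final group, which RFC 4648 allows decoders to reject.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def decode(text: str) -> bytes:
    text = text.rstrip("=")  # padding is optional here: it carries no data
    if len(text) % 4 == 1:
        raise ValueError("a single leftover character cannot encode a whole byte")
    out = bytearray()
    for i in range(0, len(text), 4):
        chunk = text[i:i + 4]
        group = 0
        for ch in chunk:
            # Append 6 bits per character; raises KeyError outside the alphabet.
            group = (group << 6) | INDEX[ch]
        # Left-align a short final chunk so it fills 24 bits.
        group <<= 6 * (4 - len(chunk))
        # 4 chars -> 3 bytes, 3 chars -> 2 bytes, 2 chars -> 1 byte.
        out += group.to_bytes(3, "big")[: len(chunk) - 1]
    return bytes(out)


# Padded and unpadded input decode to the same bytes.
assert decode("8J+MtA==") == decode("8J+MtA") == "🌴".encode("utf-8")
assert decode("TWFu") == base64.b64decode("TWFu")
```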
Why did I build these tools?
I was working with base64-encoded data from a security standpoint. We could not use standard encoding/decoding functions because of our unusual requirements. When deciding on a design, I wanted to consider some edge cases and run some tests without writing any code. I’m a visual person, so I drew diagrams like those in the tools and did the base64 decoding by hand.
After completing the project, I decided to build the tools I wish I’d had before. My hope is that they can help others better understand base64 encoding, maybe even my future self, or someone else who also has unusual decoding requirements.