Base64 encoding is used to represent binary data in an ASCII string format.

The term Base64 originates from a specific MIME content transfer encoding.

Text content        B         a         s         e
ASCII               66        97        115       101
Binary (8 bits)     01000010  01100001  01110011  01100101
Binary (6 bits)     010000  100110  000101  110011  011001  010000
Index (see below)   16      38      5       51      25      16
base64              Q       m       F       z       Z       Q       =       =

In the above table, the encoded value of Base is QmFzZQ==. Encoded in ASCII, the characters B, a, s and e are stored as the bytes 66, 97, 115 and 101, which are the 8-bit binary values 01000010, 01100001, 01110011 and 01100101. These binary values are joined together into a single binary string, which is then split into groups of 6 bits (the last group is filled with zeroes if necessary to make it 6 bits long). Each 6-bit group is converted into a number, which is in turn converted into its corresponding Base64 character using the following index (the output is then padded with = characters until its length is a multiple of 4):

Index 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
Letter A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h
Index 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
Letter i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 + /

In malware that uses base64, this index usually appears as the following string: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/.
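The encoding steps described above can be sketched in Python. This is a minimal illustration rather than a reference implementation: the function name encode_steps is hypothetical, and the input is assumed to be plain ASCII.

```python
INDEX = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_steps(text):
    """Return (6-bit groups, index values, encoded string) for an ASCII string."""
    # Join the 8-bit binary values of every byte into one binary string
    bits = "".join(format(byte, "08b") for byte in text.encode("ascii"))
    # Fill the last group with zeroes so the length is a multiple of 6
    bits += "0" * (-len(bits) % 6)
    groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
    # Convert each 6-bit group into a number, then into its index character
    indexes = [int(group, 2) for group in groups]
    encoded = "".join(INDEX[i] for i in indexes)
    # Pad with "=" so the output length is a multiple of 4
    encoded += "=" * (-len(encoded) % 4)
    return groups, indexes, encoded

groups, indexes, encoded = encode_steps("Base")
print(groups)    # ['010000', '100110', '000101', '110011', '011001', '010000']
print(indexes)   # [16, 38, 5, 51, 25, 16]
print(encoded)   # QmFzZQ==
```

The intermediate values match the rows of the table above, which makes the sketch convenient for checking other strings by hand.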

Custom index


Malware can modify the base64 index to make the standard base64 decoding function ineffective. Below is an example:

$ echo -n "RFUsbF8gU29ybFP=" | base64 -d
DU,l_ SorlS

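Yet the string RFUsbF8gU29ybFP= looks like it is base64 encoded. And it is, but with a subtle difference in the index: the lowercase a has been moved to the front. Here is the index used to encode the string:

aABCDEFGHIJKLMNOPQRSTUVWXYZbcdefghijklmnopqrstuvwxyz0123456789+/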
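Once a custom index is known, one possible way to decode such a string is to translate it back to the standard index and reuse a regular decoder. The sketch below assumes the custom index is the one defined as ind in the Python script of this article; decode_custom is a hypothetical helper name.

```python
import base64

STANDARD_INDEX = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
# Assumption: custom index copied from the "ind" variable of the script in this article
CUSTOM_INDEX = "aABCDEFGHIJKLMNOPQRSTUVWXYZbcdefghijklmnopqrstuvwxyz0123456789+/"

def decode_custom(encoded, custom_index=CUSTOM_INDEX):
    """Decode a string that was base64 encoded with a custom index."""
    # Map each character of the custom index to the character found
    # at the same position in the standard index
    table = str.maketrans(custom_index, STANDARD_INDEX)
    # The padding character is not part of the index, so it is left untouched
    return base64.b64decode(encoded.translate(table))

print(decode_custom("RFUsbF8gU29ybFP="))  # b'Hello World'
```

This approach avoids reimplementing the bit manipulation, at the cost of requiring the custom index to be recovered from the sample first.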


Python script

Below is a Python script I've written to encode and decode a string with a custom index and/or padding character.

#!/usr/bin/env python3
import argparse

# Base64 index and padding character (custom index: "a" moved to the front)
ind = "aABCDEFGHIJKLMNOPQRSTUVWXYZbcdefghijklmnopqrstuvwxyz0123456789+/"
padding_char = "="

def encode(s):
    # Convert input into a binary string, 8 bits per character
    # (each character is expected to fit in a single byte)
    b = ""
    for i in s:
        b += "{0:b}".format(ord(i)).zfill(8)

    # Pad with zeroes until the length of the binary string is a multiple of 6;
    # each pair of zero bits added corresponds to one padding character
    padding = ""
    while len(b) % 6 != 0:
        b += "00"
        padding += padding_char

    # Map each group of 6 bits to its character in the index
    return "%s%s" % (''.join(ind[int(b[i*6:i*6+6], 2)] for i in range(len(b)//6)), padding)

def decode(s):
    # Build binary string from base64 characters, skipping the padding
    b = ""
    for i in s:
        if i != padding_char:
            b += "{0:b}".format(ind.index(i)).zfill(6)

    # Convert binary by groups of 8 bits into letters (leftover bits are dropped)
    return ''.join(chr(int(b[i*8:i*8+8], 2)) for i in range(len(b)//8))

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-e", "--encode", help="Encode string in base64")
    group.add_argument("-d", "--decode", help="Decode string in base64")
    args = parser.parse_args()
    if args.encode:
        print(encode(args.encode))
    elif args.decode:
        print(decode(args.decode))

Here is how I've encoded the string:

Operation   Standard base64 index                     Custom base64 index
Encode      $ echo -n "Hello World" | base64          $ ./ -e "Hello World"
            SGVsbG8gV29ybGQ=                          RFUsbF8gU29ybFP=
Decode      $ echo "SGVsbG8gV29ybGQ=" | base64 -d     $ ./ -d "RFUsbF8gU29ybFP="
            Hello World                               Hello World

Detecting base64

Yara signature

You can use the following Yara signature in combination with pescanner to detect the presence of base64 (with a standard index) in the code:

rule base64_encoding
{
    meta:
        description = "Base64 detected"

    strings:
        $a = {41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F 50 51 52 53 54 55 56 57 58 59 5A 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 73 74 75 76 77 78 79 7A 30 31 32 33 34 35 36 37 38 39 2B 2F}

    condition:
        any of them
}

The KANAL plugin in PEiD is also able to detect the presence of base64.



IDA Pro's entropy plugin can identify base64, even with a non-standard index.

Base64 pattern

sub_4011C9  (function that calls the base64 function to encode strings)
    sub_4010B1  (base64 encode function)
        sub_401000  (base64 index function)
            aAbcdefghijklmn db 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/',0  (index string)

Index string

The index string is easily identifiable once an ASCII transformation is applied (press the "a" key):

Before applying ASCII transformation
After applying ASCII transformation

Base64 index

The index string has 4 cross-references, all within the same function: the base64 index function. An additional hint that helps identify the purpose of this function is the presence of the base64 padding character: 0x3D ("=").


Malware can also keep the standard base64 index string but use a custom padding character.
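When only the padding differs, one possible approach is to swap the trailing custom padding characters for the standard "=" before calling a regular decoder. In the sketch below, the helper name decode_custom_padding and the "*" padding character are purely hypothetical, and the padding character is assumed not to be part of the index.

```python
import base64

def decode_custom_padding(encoded, padding_char):
    """Decode standard-index base64 that uses a non-standard padding character."""
    # Only trailing occurrences are treated as padding
    stripped = encoded.rstrip(padding_char)
    pad_len = len(encoded) - len(stripped)
    # Restore the standard padding so that the regular decoder accepts the string
    return base64.b64decode(stripped + "=" * pad_len)

# Hypothetical sample: "Hello World" encoded with "*" as the padding character
print(decode_custom_padding("SGVsbG8gV29ybGQ*", "*"))  # b'Hello World'
```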

Base64 encode

Below is the assembly code of the real base64 encode function.

.text:004010B1 ; int __cdecl base64encode(char *, int)
.text:004010B1 base64encode    proc near               ; CODE XREF: sub_4011C9+61↓p
.text:004010B1 var_1C          = dword ptr -1Ch
.text:004010B1 var_18          = dword ptr -18h
.text:004010B1 var_14          = dword ptr -14h
.text:004010B1 var_10          = byte ptr -10h
.text:004010B1 var_C           = byte ptr -0Ch
.text:004010B1 var_8           = dword ptr -8
.text:004010B1 var_4           = dword ptr -4
.text:004010B1 arg_0           = dword ptr  8
.text:004010B1 arg_4           = dword ptr  0Ch
.text:004010B1                 push    ebp
.text:004010B2                 mov     ebp, esp
.text:004010B4                 sub     esp, 1Ch
.text:004010B7                 mov     [ebp+var_1C], 0
.text:004010BE                 mov     eax, [ebp+arg_0]
.text:004010C1                 push    eax             ; char *
.text:004010C2                 call    _strlen         ; get length of the string
.text:004010C7                 add     esp, 4
.text:004010CA                 mov     [ebp+var_8], eax
.text:004010CD                 mov     [ebp+var_4], 0
.text:004010D4                 mov     [ebp+var_18], 0
.text:004010DB loc_4010DB:                             ; CODE XREF: base64encode:loc_401187↓j
.text:004010DB                 mov     ecx, [ebp+var_4]
.text:004010DE                 cmp     ecx, [ebp+var_8]
.text:004010E1                 jge     loc_40118C
.text:004010E7                 mov     [ebp+var_1C], 0
.text:004010EE                 mov     [ebp+var_14], 0
.text:004010F5                 jmp     short loc_401100
.text:004010F7 ; ---------------------------------------------------------------------------
.text:004010F7 loc_4010F7:                             ; CODE XREF: base64encode:loc_401139↓j
.text:004010F7                 mov     edx, [ebp+var_14]
.text:004010FA                 add     edx, 1
.text:004010FD                 mov     [ebp+var_14], edx
.text:00401100 loc_401100:                             ; CODE XREF: base64encode+44↑j
.text:00401100                 cmp     [ebp+var_14], 3
.text:00401104                 jge     short loc_40113B
.text:00401106                 mov     eax, [ebp+arg_0]
.text:00401109                 add     eax, [ebp+var_4]
.text:0040110C                 mov     ecx, [ebp+var_14]
.text:0040110F                 mov     dl, [eax]
.text:00401111                 mov     [ebp+ecx+var_10], dl
.text:00401115                 mov     eax, [ebp+var_4]
.text:00401118                 cmp     eax, [ebp+var_8]
.text:0040111B                 jge     short loc_401131
.text:0040111D                 mov     ecx, [ebp+var_1C]
.text:00401120                 add     ecx, 1
.text:00401123                 mov     [ebp+var_1C], ecx
.text:00401126                 mov     edx, [ebp+var_4]
.text:00401129                 add     edx, 1
.text:0040112C                 mov     [ebp+var_4], edx
.text:0040112F                 jmp     short loc_401139
.text:00401131 ; ---------------------------------------------------------------------------
.text:00401131 loc_401131:                             ; CODE XREF: base64encode+6A↑j
.text:00401131                 mov     eax, [ebp+var_14]
.text:00401134                 mov     [ebp+eax+var_10], 0
.text:00401139 loc_401139:                             ; CODE XREF: base64encode+7E↑j
.text:00401139                 jmp     short loc_4010F7
.text:0040113B ; ---------------------------------------------------------------------------
.text:0040113B loc_40113B:                             ; CODE XREF: base64encode+53↑j
.text:0040113B                 cmp     [ebp+var_1C], 0
.text:0040113F                 jz      short loc_401187
.text:00401141                 mov     ecx, [ebp+var_1C]
.text:00401144                 push    ecx
.text:00401145                 lea     edx, [ebp+var_C]
.text:00401148                 push    edx
.text:00401149                 lea     eax, [ebp+var_10]
.text:0040114C                 push    eax
.text:0040114D                 call    base64index
.text:00401152                 add     esp, 0Ch
.text:00401155                 mov     [ebp+var_14], 0
.text:0040115C                 jmp     short loc_401167
.text:0040115E ; ---------------------------------------------------------------------------
.text:0040115E loc_40115E:                             ; CODE XREF: base64encode+D4↓j
.text:0040115E                 mov     ecx, [ebp+var_14]
.text:00401161                 add     ecx, 1
.text:00401164                 mov     [ebp+var_14], ecx
.text:00401167 loc_401167:                             ; CODE XREF: base64encode+AB↑j
.text:00401167                 cmp     [ebp+var_14], 4
.text:0040116B                 jge     short loc_401187
.text:0040116D                 mov     edx, [ebp+arg_4]
.text:00401170                 add     edx, [ebp+var_18]
.text:00401173                 mov     eax, [ebp+var_14]
.text:00401176                 mov     cl, [ebp+eax+var_C]
.text:0040117A                 mov     [edx], cl
.text:0040117C                 mov     edx, [ebp+var_18]
.text:0040117F                 add     edx, 1
.text:00401182                 mov     [ebp+var_18], edx
.text:00401185                 jmp     short loc_40115E
.text:00401187 ; ---------------------------------------------------------------------------
.text:00401187 loc_401187:                             ; CODE XREF: base64encode+8E↑j
.text:00401187                                         ; base64encode+BA↑j
.text:00401187                 jmp     loc_4010DB
.text:0040118C ; ---------------------------------------------------------------------------
.text:0040118C loc_40118C:                             ; CODE XREF: base64encode+30↑j
.text:0040118C                 mov     esp, ebp
.text:0040118E                 pop     ebp
.text:0040118F                 retn
.text:0040118F base64encode    endp

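Several elements help identify this function as a base64 encoder:

  • the call to the _strlen function at offset 0x4010C2, which computes the length of the input string
  • the comparison cmp [ebp+var_14], 3 at offset 0x401100, at the start of the loop that reads the input in blocks of 3 bytes
  • the comparison cmp [ebp+var_14], 4 at offset 0x401167, at the start of the loop that writes 4 encoded characters for each block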
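To make the control flow of the listing easier to follow, here is a rough Python transcription of base64encode. It is a reading aid under stated assumptions, not decompiler output: variable names are mapped to the stack variables in comments, and since the body of base64index (sub_401000) is not shown in this article, the version below is a guess based on standard base64 behaviour.

```python
STANDARD_INDEX = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
PADDING = "="  # 0x3D

def base64index(block, count):
    """Assumed behaviour of sub_401000: turn one 3-byte block into 4 characters."""
    n = (block[0] << 16) | (block[1] << 8) | block[2]
    chars = [STANDARD_INDEX[(n >> 18) & 63], STANDARD_INDEX[(n >> 12) & 63]]
    # Positions that carry no input data are replaced by the padding character
    chars.append(STANDARD_INDEX[(n >> 6) & 63] if count > 1 else PADDING)
    chars.append(STANDARD_INDEX[n & 63] if count > 2 else PADDING)
    return chars

def base64encode(data):
    length = len(data)                  # var_8, result of _strlen
    pos = 0                             # var_4, read position in the input
    out = []                            # arg_4 indexed by var_18
    while pos < length:                 # loc_4010DB
        count = 0                       # var_1C, bytes read in this block
        block = [0, 0, 0]               # var_10
        for i in range(3):              # cmp [ebp+var_14], 3
            if pos < length:
                block[i] = data[pos]
                count += 1
                pos += 1
            else:
                block[i] = 0            # zero-fill an incomplete block
        if count:                       # cmp [ebp+var_1C], 0
            encoded = base64index(block, count)   # var_C
            for i in range(4):          # cmp [ebp+var_14], 4
                out.append(encoded[i])
    return "".join(out)

print(base64encode(b"Hello World"))  # SGVsbG8gV29ybGQ=
```

The constants 3 and 4 in the two loops correspond to the 3-byte input blocks and 4-character output blocks that characterise base64.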

