Category:Encryption/base64
Description
Base64 encoding is used to represent binary data in an ASCII string format.
The term Base64 originates from a specific MIME content transfer encoding.
Text content | B | a | s | e |
---|---|---|---|---|
ASCII | 66 | 97 | 115 | 101 |
Binary (8-bit) | 01000010 | 01100001 | 01110011 | 01100101 |

6-bit group | 010000 | 100110 | 000101 | 110011 | 011001 | 010000 | | |
---|---|---|---|---|---|---|---|---|
Index (see below) | 16 | 38 | 5 | 51 | 25 | 16 | | |
base64 | Q | m | F | z | Z | Q | = | = |
In the above table, the encoded value of Base is QmFzZQ==. Encoded in ASCII, the characters B, a, s, and e are stored as the bytes 66, 97, 115 and 101, which are the 8-bit binary values 01000010, 01100001, 01110011 and 01100101. These binary values are joined together into a single 32-bit string, which is then split into groups of 6 bits (the last group is filled with zeroes if necessary to make it 6 bits long). The output is padded with = characters to a multiple of 4 characters, which is why the 6 characters QmFzZQ are followed by ==. Each 6-bit group is converted into a number, and each number is converted into its corresponding Base64 character using the following index:
Index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Letter | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | a | b | c | d | e | f | g | h |
Index | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Letter | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | + | / |
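The steps described above can be sketched in Python as follows. This is a minimal illustration rather than an optimized implementation; the function name b64_encode_steps is hypothetical, and the standard library's base64.b64encode is only used to cross-check the result.

```python
import base64

# Standard Base64 index (A-Z, a-z, 0-9, +, /)
INDEX = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64_encode_steps(data: bytes) -> str:
    # 1. Join the 8-bit binary values of all bytes
    bits = "".join(format(byte, "08b") for byte in data)
    # 2. Fill the last group with zeroes so the length is a multiple of 6
    bits += "0" * (-len(bits) % 6)
    # 3. Convert each 6-bit group to a number, then to its index character
    chars = [INDEX[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # 4. Pad the output with "=" to a multiple of 4 characters
    return "".join(chars) + "=" * (-len(chars) % 4)

encoded = b64_encode_steps(b"Base")
print(encoded)  # QmFzZQ==
# Cross-check against the standard library
assert encoded == base64.b64encode(b"Base").decode()
```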
Custom index
Description
Malware can modify the base64 index to make the standard base64 decoding function ineffective. Below is an example:
$ echo -n "RFUsbF8gU29ybFP=" | base64 -d
DU,l_ SorlS
The string RFUsbF8gU29ybFP= looks Base64-encoded, and it is, but with a subtle difference in the index: the lowercase a has been moved to the front. Here is the index used to encode the string:
aABCDEFGHIJKLMNOPQRSTUVWXYZbcdefghijklmnopqrstuvwxyz0123456789+/
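When the custom index is known, one way to decode such a string is to translate it back to the standard index and reuse the standard decoder. The sketch below assumes the custom index is a reordering of 64 distinct characters and that the padding character is still "="; the function name decode_custom is hypothetical.

```python
import base64

STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
CUSTOM = "aABCDEFGHIJKLMNOPQRSTUVWXYZbcdefghijklmnopqrstuvwxyz0123456789+/"

def decode_custom(encoded: str, custom_index: str = CUSTOM) -> bytes:
    # Map each character of the custom index to the character at the same
    # position in the standard index; "=" is not in the table and is kept.
    table = str.maketrans(custom_index, STANDARD)
    return base64.b64decode(encoded.translate(table))

print(decode_custom("RFUsbF8gU29ybFP="))  # b'Hello World'
```

The translation approach avoids reimplementing the bit manipulation: only the character mapping changes between the standard and the custom variant.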
Python script
Below is a Python script I've written to encode and decode a string with a custom index and/or padding character.
#!/usr/bin/env python3
import argparse

# Base64 index and padding character
ind = "aABCDEFGHIJKLMNOPQRSTUVWXYZbcdefghijklmnopqrstuvwxyz0123456789+/"
padding_char = "="

def encode(s):
    # Convert the input into a binary string, 8 bits per character
    b = ""
    for c in s:
        b += "{0:b}".format(ord(c)).zfill(8)
    # Pad with zeroes until the length is a multiple of 6;
    # each added pair of bits corresponds to one padding character
    padding = ""
    while len(b) % 6 != 0:
        b += "00"
        padding += padding_char
    # Map each 6-bit group to its character in the index
    return "".join(ind[int(b[i:i+6], 2)] for i in range(0, len(b), 6)) + padding

def decode(s):
    # Build the binary string from the base64 characters, skipping padding
    b = ""
    for c in s:
        if c != padding_char:
            b += "{0:b}".format(ind.index(c)).zfill(6)
    # Convert each full group of 8 bits into a character;
    # leftover fill bits at the end are dropped
    return "".join(chr(int(b[i*8:i*8+8], 2)) for i in range(len(b) // 8))

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-e", "--encode", help="Encode string in base64")
    group.add_argument("-d", "--decode", help="Decode string in base64")
    args = parser.parse_args()
    if args.encode is not None:
        print(encode(args.encode))
    else:
        print(decode(args.decode))
Here is how I've encoded the string, compared with the standard index:
Encode with the standard base64 index:
$ echo -n "Hello World" | base64
SGVsbG8gV29ybGQ=
Encode with the custom base64 index:
$ ./base64.py -e "Hello World"
RFUsbF8gU29ybFP=
Decode with the standard base64 index:
$ echo "SGVsbG8gV29ybGQ=" | base64 -d
Hello World
Decode with the custom base64 index:
$ ./base64.py -d "RFUsbF8gU29ybFP="
Hello World
Detecting base64
Yara signature
You can use the following Yara signature in combination with pescanner to detect the presence of base64 (with a standard index) in the code:
rule base64_encoding
{
    meta:
        description = "Base64 detected"
    strings:
        $a = {41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F 50 51 52 53 54 55 56 57 58 59 5A 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 73 74 75 76 77 78 79 7A 30 31 32 33 34 35 36 37 38 39 2B 2F}
    condition:
        any of them
}
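The signature above only matches the standard index. As a complementary sketch (it is not part of Yara or pescanner), a short Python scan can flag any 64-byte window that is a permutation of the standard alphabet, which would also catch a reordered index such as the one shown in the Custom index section. The function name find_b64_index is hypothetical, and an index built from characters outside the standard alphabet would not be caught.

```python
STANDARD = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
ALPHABET = frozenset(STANDARD)

def find_b64_index(data: bytes):
    """Yield (offset, index_bytes) for each 64-byte permutation of the alphabet."""
    for offset in range(len(data) - 63):
        window = data[offset:offset + 64]
        # 64 bytes whose set equals the 64-character alphabet form a permutation
        if set(window) == ALPHABET:
            yield offset, window

# Hypothetical sample: two junk bytes, a reordered index, one trailing byte
sample = (b"\x00\x01"
          + b"aABCDEFGHIJKLMNOPQRSTUVWXYZbcdefghijklmnopqrstuvwxyz0123456789+/"
          + b"\x00")
for offset, index in find_b64_index(sample):
    kind = "standard" if index == STANDARD else "custom"
    print(offset, kind, index.decode())
```

To scan a real sample, read the file in binary mode and pass its contents to find_b64_index.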
KANAL
The KANAL plugin in PEiD will be able to detect the presence of Base64:
IDA Pro
Entropy
IDA Pro's entropy plugin can identify base64 (even with a non-standard index), as shown in the screenshot below:
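How the plugin computes its score is not described here. As an assumption for illustration only, the sketch below uses plain Shannon entropy in bits per byte. Under that measure, any index made of 64 distinct characters scores log2(64) = 6 bits per byte regardless of their order, so the standard and the reordered index are indistinguishable to an entropy-based check. The function name shannon_entropy is hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of data, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data).values()
    return -sum(n / total * math.log2(n / total) for n in counts)

standard = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
custom = b"aABCDEFGHIJKLMNOPQRSTUVWXYZbcdefghijklmnopqrstuvwxyz0123456789+/"

# Both indexes contain 64 distinct bytes, so both score about 6 bits per byte
print(shannon_entropy(standard), shannon_entropy(custom))
```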
Base64 pattern
┌────────────┐
│ sub_4011C9 │ (Function that calls the base64 function to encode strings)
└────────────┘
      │
      ▼
┌────────────┐
│ sub_4010B1 │ (Base64 encode function)
└────────────┘
      │
      ▼
┌────────────┐
│ sub_401000 │ (Base64 index function)
└────────────┘
      │
      ▼
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ aAbcdefghijklmn db 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/',0 │ (Index string)
└─────────────────────────────────────────────────────────────────────────────────────────┘
Index string
As you can see below, the index string is easily identifiable. Apply an ASCII transformation (press "a" key):
Base64 index
The string has 4 cross-references, all within the same function: the base64 index function. An additional hint for identifying the purpose of this function is the presence of the base64 padding character: 0x3D ("=").
Base64 encode
Below is the assembly code of the real base64 encode function.
.text:004010B1 ; int __cdecl base64encode(char *, int)
.text:004010B1 base64encode proc near ; CODE XREF: sub_4011C9+61↓p
.text:004010B1
.text:004010B1 var_1C = dword ptr -1Ch
.text:004010B1 var_18 = dword ptr -18h
.text:004010B1 var_14 = dword ptr -14h
.text:004010B1 var_10 = byte ptr -10h
.text:004010B1 var_C = byte ptr -0Ch
.text:004010B1 var_8 = dword ptr -8
.text:004010B1 var_4 = dword ptr -4
.text:004010B1 arg_0 = dword ptr 8
.text:004010B1 arg_4 = dword ptr 0Ch
.text:004010B1
.text:004010B1 push ebp
.text:004010B2 mov ebp, esp
.text:004010B4 sub esp, 1Ch
.text:004010B7 mov [ebp+var_1C], 0
.text:004010BE mov eax, [ebp+arg_0]
.text:004010C1 push eax ; char *
.text:004010C2 call _strlen ; get length of the string
.text:004010C7 add esp, 4
.text:004010CA mov [ebp+var_8], eax
.text:004010CD mov [ebp+var_4], 0
.text:004010D4 mov [ebp+var_18], 0
.text:004010DB
.text:004010DB loc_4010DB: ; CODE XREF: base64encode:loc_401187↓j
.text:004010DB mov ecx, [ebp+var_4]
.text:004010DE cmp ecx, [ebp+var_8]
.text:004010E1 jge loc_40118C
.text:004010E7 mov [ebp+var_1C], 0
.text:004010EE mov [ebp+var_14], 0
.text:004010F5 jmp short loc_401100
.text:004010F7 ; ---------------------------------------------------------------------------
.text:004010F7
.text:004010F7 loc_4010F7: ; CODE XREF: base64encode:loc_401139↓j
.text:004010F7 mov edx, [ebp+var_14]
.text:004010FA add edx, 1
.text:004010FD mov [ebp+var_14], edx
.text:00401100
.text:00401100 loc_401100: ; CODE XREF: base64encode+44↑j
.text:00401100 cmp [ebp+var_14], 3
.text:00401104 jge short loc_40113B
.text:00401106 mov eax, [ebp+arg_0]
.text:00401109 add eax, [ebp+var_4]
.text:0040110C mov ecx, [ebp+var_14]
.text:0040110F mov dl, [eax]
.text:00401111 mov [ebp+ecx+var_10], dl
.text:00401115 mov eax, [ebp+var_4]
.text:00401118 cmp eax, [ebp+var_8]
.text:0040111B jge short loc_401131
.text:0040111D mov ecx, [ebp+var_1C]
.text:00401120 add ecx, 1
.text:00401123 mov [ebp+var_1C], ecx
.text:00401126 mov edx, [ebp+var_4]
.text:00401129 add edx, 1
.text:0040112C mov [ebp+var_4], edx
.text:0040112F jmp short loc_401139
.text:00401131 ; ---------------------------------------------------------------------------
.text:00401131
.text:00401131 loc_401131: ; CODE XREF: base64encode+6A↑j
.text:00401131 mov eax, [ebp+var_14]
.text:00401134 mov [ebp+eax+var_10], 0
.text:00401139
.text:00401139 loc_401139: ; CODE XREF: base64encode+7E↑j
.text:00401139 jmp short loc_4010F7
.text:0040113B ; ---------------------------------------------------------------------------
.text:0040113B
.text:0040113B loc_40113B: ; CODE XREF: base64encode+53↑j
.text:0040113B cmp [ebp+var_1C], 0
.text:0040113F jz short loc_401187
.text:00401141 mov ecx, [ebp+var_1C]
.text:00401144 push ecx
.text:00401145 lea edx, [ebp+var_C]
.text:00401148 push edx
.text:00401149 lea eax, [ebp+var_10]
.text:0040114C push eax
.text:0040114D call base64index
.text:00401152 add esp, 0Ch
.text:00401155 mov [ebp+var_14], 0
.text:0040115C jmp short loc_401167
.text:0040115E ; ---------------------------------------------------------------------------
.text:0040115E
.text:0040115E loc_40115E: ; CODE XREF: base64encode+D4↓j
.text:0040115E mov ecx, [ebp+var_14]
.text:00401161 add ecx, 1
.text:00401164 mov [ebp+var_14], ecx
.text:00401167
.text:00401167 loc_401167: ; CODE XREF: base64encode+AB↑j
.text:00401167 cmp [ebp+var_14], 4
.text:0040116B jge short loc_401187
.text:0040116D mov edx, [ebp+arg_4]
.text:00401170 add edx, [ebp+var_18]
.text:00401173 mov eax, [ebp+var_14]
.text:00401176 mov cl, [ebp+eax+var_C]
.text:0040117A mov [edx], cl
.text:0040117C mov edx, [ebp+var_18]
.text:0040117F add edx, 1
.text:00401182 mov [ebp+var_18], edx
.text:00401185 jmp short loc_40115E
.text:00401187 ; ---------------------------------------------------------------------------
.text:00401187
.text:00401187 loc_401187: ; CODE XREF: base64encode+8E↑j
.text:00401187 ; base64encode+BA↑j
.text:00401187 jmp loc_4010DB
.text:0040118C ; ---------------------------------------------------------------------------
.text:0040118C
.text:0040118C loc_40118C: ; CODE XREF: base64encode+30↑j
.text:0040118C mov esp, ebp
.text:0040118E pop ebp
.text:0040118F retn
.text:0040118F base64encode endp
The following hints help identify the Base64 encode function:
- the call to the _strlen function at offset 0x4010C2 that computes the length of the string
- the comparison cmp [ebp+var_14], 3 at offset 0x401100, which bounds the loop that collects 3 input bytes
- the comparison cmp [ebp+var_14], 4 at offset 0x401167, which bounds the loop that writes 4 output characters
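To make the structure of the listing easier to follow, here is a rough Python rendering of its control flow. The variable mapping in the comments is my reading of the disassembly. The body of base64index is an assumption, since sub_401000 is not shown here: it is sketched as a standard 3-byte to 4-character conversion that emits "=" when fewer than 3 input bytes are present.

```python
INDEX = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def base64index(block, count):
    """Assumed behavior of sub_401000 (not shown in the listing): encode one
    3-byte block into 4 characters, padding with "=" when count < 3."""
    n = (block[0] << 16) | (block[1] << 8) | block[2]
    out = [INDEX[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
    for i in range(count + 1, 4):
        out[i] = "="
    return out

def base64encode(src: bytes) -> str:
    length = len(src)          # var_8: result of _strlen
    pos = 0                    # var_4: read position in the input
    dst = []                   # arg_4 + var_18: output buffer and write position
    while pos < length:        # outer loop at loc_4010DB
        count = 0              # var_1C: number of real input bytes in this block
        block = [0, 0, 0]      # var_10: 3-byte input block
        for i in range(3):     # loop bounded by cmp [ebp+var_14], 3
            if pos < length:
                block[i] = src[pos]
                count += 1
                pos += 1
            # otherwise the slot stays 0, as at loc_401131
        if count:
            quad = base64index(block, count)  # var_C: 4-character output block
            for i in range(4):  # loop bounded by cmp [ebp+var_14], 4
                dst.append(quad[i])
    return "".join(dst)

print(base64encode(b"Base"))  # QmFzZQ==
```

The constants 3 and 4 in the two loops reflect the 3-bytes-in, 4-characters-out ratio of Base64, which is why they are useful markers when no index string is visible.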