See also: pdfid

Description

pdf-parser is a Python script written by Didier Stevens that parses a PDF document to identify the fundamental elements used in the analyzed file.

Installation

$ cd /data/src/
$ wget http://didierstevens.com/files/software/pdf-parser_V0_4_3.zip
$ unzip pdf-parser_V0_4_3.zip
$ chmod +x pdf-parser.py

Usage

Syntax

Usage: pdf-parser.py [options] pdf-file

Options

--version: show program's version number and exit
-h, --help: show this help message and exit
-s SEARCH, --search=SEARCH: string to search in indirect objects (except streams)
-f, --filter: pass stream object through filters (FlateDecode, ASCIIHexDecode, ASCII85Decode, LZWDecode and RunLengthDecode only)
-o OBJECT, --object=OBJECT: id of indirect object to select (version independent)
-r REFERENCE, --reference=REFERENCE: id of indirect object being referenced (version independent)
-e ELEMENTS, --elements=ELEMENTS: type of elements to select (cxtsi)
-w, --raw: raw output for data and filters
-a, --stats: display stats for pdf document
-t TYPE, --type=TYPE: type of indirect object to select
-v, --verbose: display malformed PDF elements
-x EXTRACT, --extract=EXTRACT: filename to extract to
-H, --hash: display hash of objects
-n, --nocanonicalizedoutput: do not canonicalize the output
-d DUMP, --dump=DUMP: filename to dump stream content to
-D, --debug: display debug info
-c, --content: display the content for objects without streams or with streams without filters
--searchstream=SEARCHSTREAM: string to search in streams
--unfiltered: search in unfiltered streams
--casesensitive: case sensitive search in streams
--regex: use regex to search in streams
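
When pdf-parser is driven from another script, it can help to assemble the command line programmatically. The sketch below is only an illustration: the function name build_pdf_parser_command and its parameters are hypothetical, and it covers just four of the options listed above (--search, --object, --filter, --raw).

```python
import shlex


def build_pdf_parser_command(pdf_file, search=None, obj=None,
                             apply_filters=False, raw=False,
                             script="./pdf-parser.py"):
    """Return an argv list for pdf-parser (hypothetical helper, not part of the tool)."""
    argv = [script]
    if search is not None:
        # -s SEARCH, --search=SEARCH
        argv.append(f"--search={search}")
    if obj is not None:
        # -o OBJECT, --object=OBJECT
        argv.append(f"--object={obj}")
    if apply_filters:
        # -f, --filter
        argv.append("--filter")
    if raw:
        # -w, --raw
        argv.append("--raw")
    # The PDF file is the last positional argument.
    argv.append(pdf_file)
    return argv


if __name__ == "__main__":
    # "sample.pdf" is a placeholder file name.
    cmd = build_pdf_parser_command("sample.pdf", obj=111611,
                                   apply_filters=True, raw=True)
    print(shlex.join(cmd))
```

The returned list can be passed as-is to subprocess.run, which avoids shell quoting problems with unusual file names.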

Example

Confirm presence of JavaScript

pdfid has already detected the presence of JavaScript in the PDF file.

Highlight links between objects

Using pdf-parser

Let's use pdf-parser to dig deeper into this PDF file.

$ ./pdf-parser.py --search=javascript jsunpack-n-read-only/samples/pdf-thisCreator.file
obj 3 0
 Type: 
 Referencing: 5 0 R

  <<
    /JavaScript 5 0 R
  >>

obj 6 0
 Type: 
 Referencing: 111611 0 R

  <<
    /JS 111611 0 R
    /S /JavaScript
  >>

The output shows that object 3 references object 5 and that object 6 references object 111611. Let's see whether object 5 is linked to other objects:

$ ./pdf-parser.py --object=5 jsunpack-n-read-only/samples/pdf-thisCreator.file
obj 5 0
 Type: 
 Referencing: 6 0 R

  <<
    /Names [(A)6 0 R ]
  >>

Object 5 references object 6, which completes the map: 3 -> 5 -> 6 -> 111611.
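
The manual steps above (read each "obj" header, note its "Referencing:" line, then follow the links) can also be sketched in code. This is a minimal illustration that assumes the text layout shown in the outputs above. The names parse_links and follow_chain are hypothetical, and other pdf-parser versions may format their output differently.

```python
import re

# "obj 3 0" -> object id 3, generation 0
OBJ_RE = re.compile(r"^obj (\d+) (\d+)$")
# "5 0 R" -> reference to object id 5
REF_RE = re.compile(r"(\d+) \d+ R")


def parse_links(parser_output):
    """Return (source_id, target_id) pairs found in pdf-parser text output."""
    links = []
    current = None
    for line in parser_output.splitlines():
        stripped = line.strip()
        header = OBJ_RE.match(stripped)
        if header:
            current = int(header.group(1))
            continue
        if current is not None and stripped.startswith("Referencing:"):
            for target in REF_RE.findall(stripped):
                links.append((current, int(target)))
    return links


def follow_chain(links, start):
    """Follow the first outgoing link of each object until a dead end or a cycle."""
    first_target = {}
    for src, dst in links:
        first_target.setdefault(src, dst)
    chain = [start]
    seen = {start}
    while chain[-1] in first_target and first_target[chain[-1]] not in seen:
        chain.append(first_target[chain[-1]])
        seen.add(chain[-1])
    return chain


# Headers and "Referencing:" lines copied from the outputs shown above.
SAMPLE = """\
obj 3 0
 Type:
 Referencing: 5 0 R

obj 6 0
 Type:
 Referencing: 111611 0 R

obj 5 0
 Type:
 Referencing: 6 0 R
"""

if __name__ == "__main__":
    chain = follow_chain(parse_links(SAMPLE), 3)
    print(" -> ".join(str(obj_id) for obj_id in chain))
```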

Using pdfobjflow

pdfobjflow offers a quicker way to generate the map:

$ ./pdf-parser.py /data/tools/jsunpack-n-read-only/samples/pdf-thisCreator.file | ./pdfobjflow.py
$ eog pdfobjflow.png
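
As the commands above show, pdfobjflow reads pdf-parser's output on standard input and the result is viewed as pdfobjflow.png. As a rough illustration of the idea only (this is not pdfobjflow's actual code), the sketch below turns a list of object links into Graphviz DOT text. The function name links_to_dot is hypothetical, and the links are the ones found in the previous section.

```python
def links_to_dot(links, graph_name="pdfobjflow"):
    """Render (source_id, target_id) pairs as a Graphviz DOT digraph."""
    lines = [f"digraph {graph_name} {{"]
    for src, dst in links:
        # Quote node names so that numeric ids are valid DOT identifiers.
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Links between objects 3, 5, 6 and 111611, as established earlier.
    print(links_to_dot([(3, 5), (5, 6), (6, 111611)]))
```

The printed text can be saved to a file and rendered with Graphviz, for example with dot -Tpng.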

The resulting image, pdfobjflow.png, shows the same map as a graph.

Decompress JavaScript

Now, let's decompress the JavaScript contained in object 111611 with the --filter and --raw options:

$ ./pdf-parser.py --object=111611 --filter --raw jsunpack-n-read-only/samples/pdf-thisCreator.file > out.js
$ cat out.js
obj 111611 0
 Type: 
 Referencing: 
 Contains stream

  <<
    /Filter /FlateDecode
    /Length 142
  >>

 /*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/var b/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/=/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/
this.creator;/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/var a/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/=/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/
unescape(/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/b/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/);/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/eval(
/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/unescape(/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/this.creator.replace(/z/igm,'%')
/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/)/*fjudfs4FSf4ZX <POFRNFSdfnjrfnc> SaKsonifbdh*/);
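
The /Filter /FlateDecode entry in the dictionary above means that the stream is compressed with zlib/deflate, which is what --filter undoes. The sketch below shows that single step in isolation. It is not pdf-parser's implementation: the function name flate_decode is hypothetical, and real PDF streams can carry extra decode parameters (such as predictors) that are not handled here.

```python
import zlib


def flate_decode(stream_bytes):
    """Decompress the raw bytes of a /FlateDecode stream (no decode parameters)."""
    return zlib.decompress(stream_bytes)


if __name__ == "__main__":
    # Round trip on a made-up payload, standing in for a real stream.
    original = b"var b = this.creator;"
    compressed = zlib.compress(original)
    print(flate_decode(compressed).decode("ascii"))
```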

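The pdf-parser output reveals obfuscated JavaScript code. Piping it through a few commands helps decode it. Note that indent, which is a C formatter, splits the regular expression literal /z/igm into /z / igm in its output: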

$ tail -n +11 out.js | js_beautify - | grep -v "^\/\*" | indent 
var b = this.creator;
var a = unescape (b);
eval (unescape (this.creator.replace (/z / igm, '%')));
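
The two tricks visible above are comment padding and a payload hidden in the Creator field, where every "z" (in either case) stands for "%" before unescape is applied. A minimal Python sketch of both steps follows. The function names are hypothetical, the comment stripping is naive (it would also remove /* ... */ sequences inside string literals), and urllib's unquote only approximates JavaScript's unescape: %uXXXX sequences are not decoded.

```python
import re
from urllib.parse import unquote

# Non-greedy match of /* ... */ block comments, including across newlines.
COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)


def strip_block_comments(js_source):
    """Remove /* ... */ comments used as padding in the obfuscated script."""
    return COMMENT_RE.sub("", js_source)


def decode_creator(creator):
    """Emulate unescape(this.creator.replace(/z/igm, '%')) for %XX sequences."""
    percent_encoded = re.sub("z", "%", creator, flags=re.IGNORECASE)
    # latin-1 maps each %XX byte to the code point with the same value.
    return unquote(percent_encoded, encoding="latin-1")


if __name__ == "__main__":
    padded = "/*pad*/var b/*pad*/=/*pad*/this.creator;"
    print(strip_block_comments(padded))
    # "z41Z42z43" is a made-up Creator value, not taken from the sample file.
    print(decode_creator("z41Z42z43"))
```

In the real sample, the decoded Creator string is what eval ends up executing, so that field is the next thing to extract and inspect.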
