Description
Keras Directory Traversal Vulnerability
Summary
The keras.utils.get_file() function in Keras is vulnerable to directory traversal attacks despite implementing filter_safe_paths(). The vulnerability exists because extract_archive() calls Python's tarfile.extractall() method without the security-critical filter="data" parameter. A PATH_MAX symlink resolution bug triggers during extraction, after path filtering has already run, allowing malicious tar archives to bypass the security checks and write files outside the intended extraction directory.
Details
Root Cause Analysis
Current Keras Implementation
The Critical Flaw
Keras attempts to filter unsafe paths using filter_safe_paths(), but this filtering runs after the tar archive members are parsed and before actual extraction. The PATH_MAX symlink resolution bug occurs during extraction, not during member enumeration, so the filter never observes it.
Exploitation Flow:
- Archive parsing: filter_safe_paths() sees symlink paths that appear safe
- Extraction begins: extractall() processes the filtered members
- PATH_MAX bug triggers: Symlink resolution fails due to path length limits
- Security bypass: Failed resolution causes literal path interpretation
- Directory traversal: Files written outside intended directory
Technical Details
The vulnerability exploits a known issue in Python's tarfile module where excessively long symlink paths can cause resolution failures, leading to the symlink being treated as a literal path. This bypasses Keras's path filtering because:
- filter_safe_paths() operates on the parsed tar member information
- The PATH_MAX bug occurs during actual file system operations in extractall()
- Failed symlink resolution falls back to literal path interpretation
- This allows traversal paths like ../../../../etc/passwd to be written
Affected Code Location
File: keras/src/utils/file_utils.py
Function: extract_archive() around line 121
Issue: Missing filter="data" parameter in tarfile.extractall()
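Whether the filter argument can be passed at all depends on the interpreter build. A minimal sketch of a feature check, based on the hasattr(tarfile, "data_filter") probe suggested in the Python tarfile documentation; the helper name supports_extraction_filters is hypothetical and is not part of Keras:

```python
import tarfile


def supports_extraction_filters() -> bool:
    """Return True if this interpreter's tarfile module has extraction filters.

    Hypothetical helper for illustration only.
    """
    # Probing for `data_filter` is more reliable than comparing version
    # numbers, because the feature was also backported to older branches.
    return hasattr(tarfile, "data_filter")


if __name__ == "__main__":
    print("extraction filters available:", supports_extraction_filters())
```

On a build where this returns False, passing filter="data" to extractall() would fail, so a caller would need a different safeguard there.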
Proof of Concept
Environment Setup
- Python: 3.8+ (tested on multiple versions)
- Keras: Standalone Keras or TensorFlow.Keras
- Platform: Linux, macOS, Windows (path handling varies)
Exploitation Steps
- Create malicious tar archive with PATH_MAX symlink chain
- Host archive on accessible HTTP server
- Call keras.utils.get_file() with extract=True
- Observe directory traversal: files are written outside the cache directory
Key Exploit Components
- Deep symlink chain: 16+ nested symlinks with long directory names
- PATH_MAX overflow: Final symlink path exceeding system limits
- Traversal payload: Relative path traversal (../../../target/file)
- Legitimate disguise: Archive contains valid-looking dataset files
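From a defender's point of view, the components listed above suggest a few cheap signals to look at before extracting an untrusted archive. The following is a rough triage sketch, not a detection rule: summarize_links is a hypothetical helper, and what counts as "too many" links or "too long" a path is left to the reader, since legitimate archives can also contain links.

```python
import tarfile


def summarize_links(archive: tarfile.TarFile) -> dict:
    """Collect simple statistics about link members in an opened tar archive.

    Hypothetical helper for illustration; it only reads member headers and
    never touches the filesystem.
    """
    members = archive.getmembers()
    links = [m for m in members if m.issym() or m.islnk()]
    return {
        # Number of symlink and hard link members in the archive.
        "link_count": len(links),
        # Longest member path, to compare against the platform path limit.
        "longest_name": max((len(m.name) for m in members), default=0),
        # Longest link target among the link members.
        "longest_link_target": max((len(m.linkname) for m in links), default=0),
    }
```

A dataset archive that reports many links or unusually long paths may deserve manual inspection before it is extracted.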
Demonstration Results
Vulnerable behavior:
- Files extracted outside intended cache_dir/datasets/ location
- Security filtering bypassed completely
- No error or warning messages generated
Expected secure behavior:
- Extraction blocked or confined to cache directory
- Security warnings for suspicious archive contents
Impact
Vulnerability Classification
- Type: Directory Traversal / Path Traversal (CWE-22)
- Severity: High
- CVSS Components: Network accessible, no authentication required, impacts confidentiality and integrity
Who Is Impacted
Direct Impact:
- Applications using keras.utils.get_file() with extract=True
- Machine learning pipelines downloading and extracting datasets
- Automated ML training systems processing external archives
Attack Scenarios:
- Malicious datasets: Attacker hosts compromised ML dataset
- Supply chain: Legitimate dataset repositories compromised
- Model poisoning: Extraction writes malicious files alongside training data
- System compromise: Configuration files, executables written to system directories
Affected Environments:
- Research environments downloading public datasets
- Production ML systems with automated dataset fetching
- Educational platforms using Keras for tutorials
- CI/CD pipelines training models with external data
Risk Assessment
High Risk Factors:
- Common usage pattern in ML workflows
- No user awareness of extraction security
- Silent failure mode (no warnings)
- Cross-platform vulnerability
Potential Consequences:
- Arbitrary file write on target system
- Configuration file tampering
- Code injection via overwritten scripts
- Data exfiltration through planted files
- System compromise in containerized environments
Recommended Fix
Immediate Mitigation
Replace the vulnerable extraction code with:
Long-term Solution
- Add filter="data" parameter to all tarfile.extractall() calls
- Implement comprehensive path validation before extraction
- Add extraction logging for security monitoring
- Consider sandboxed extraction for untrusted archives
- Update documentation to warn about archive security risks
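The path validation and logging items above could be combined along the following lines. This is a deliberately conservative sketch under stated assumptions, not the Keras implementation: find_unsafe_members is a hypothetical helper, it rejects every symlink and hard link outright so that it never depends on resolving link targets, and it does not cover device files or other special member types.

```python
import logging
import os
import tarfile

logger = logging.getLogger(__name__)


def find_unsafe_members(archive: tarfile.TarFile, dest_dir: str) -> list[str]:
    """Return names of members that should not be extracted into dest_dir.

    Hypothetical helper for illustration only.
    """
    dest_root = os.path.abspath(dest_dir)
    unsafe = []
    for member in archive.getmembers():
        # Lexical normalisation only; no filesystem access is involved.
        target = os.path.abspath(os.path.join(dest_root, member.name))
        inside = os.path.commonpath([dest_root, target]) == dest_root
        if member.issym() or member.islnk() or not inside:
            # Log each rejection so that monitoring can pick it up.
            logger.warning("Rejecting archive member %r", member.name)
            unsafe.append(member.name)
    return unsafe
```

A caller could refuse to extract whenever the returned list is non-empty. Such a check is best treated as an extra layer on top of filter="data", not a substitute for it.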
Backward Compatibility
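The fix is expected to remain backward compatible: filter="data" is the extraction filter that the Python documentation recommends for untrusted archives, it is available from Python 3.12 (and in security releases of earlier branches), and it becomes the default in Python 3.14.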
References
- [Python tarfile security documentation](https://docs.python.org/3/library/tarfile.html#extraction-filters)
- [CVE-2007-4559](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2007-4559) - Related tarfile vulnerability
- [OWASP Path Traversal](https://owasp.org/www-community/attacks/Path_Traversal)
Note: This issue was also reported on Huntr, but no response was received: https://huntr.com/bounties/f94f5beb-54d8-4e6a-8bac-86d9aee103f4
Links
- https://github.com/keras-team/keras/security/advisories/GHSA-hjqc-jx6g-rwp9
- https://nvd.nist.gov/vuln/detail/CVE-2025-12060
- https://nvd.nist.gov/vuln/detail/CVE-2025-12638
- https://github.com/keras-team/keras/pull/21760
- https://github.com/keras-team/keras/commit/47fcb397ee4caffd5a75efd1fa3067559594e951
- https://huntr.com/bounties/f94f5beb-54d8-4e6a-8bac-86d9aee103f4
Packages
keras
- Affected versions: <= 3.11.3
- Patched version: 3.12.0
Related vulnerabilities
The keras.utils.get_file API in Keras, when used with the extract=True option for tar archives, is vulnerable to a path traversal attack. The utility uses Python's tarfile.extractall function without the filter="data" feature. A remote attacker can craft a malicious tar archive containing special symlinks, which, when extracted, allows them to write arbitrary files to any location on the filesystem outside of the intended destination folder. This vulnerability is linked to the underlying Python tarfile weakness, identified as CVE-2025-4517. Note that upgrading Python to one of the versions that fix CVE-2025-4517 (e.g. Python 3.13.4) is not enough. One additionally needs to upgrade Keras to a version with the fix (Keras 3.12).