Getting Started with CIRCL hashlookup


When to use this API

Use this API when a file hash lands in front of you during incident response, malware triage, or software forensics and you need to know whether it belongs to known-good software. CIRCL hashlookup is a public lookup service backed by the NIST National Software Reference Library (NSRL) and supplementary datasets including Snap packages and Android RDS releases, covering over 6 billion indexed hashes. The forensically meaningful result is the negative: a hash that returns HTTP 404 is not in the known-good corpus and warrants investigation. For real-time threat intelligence or malware signatures, use VirusTotal or MISP instead; hashlookup is a file-identity oracle, not a reputation engine.

Verifying a file against the known-good software corpus

"Is this SHA1 from a legitimate software install, or should I be worried?" The SHA1 lookup is the primary call. Use it when you have a SHA1 from a file system scan, a forensic image, or a log entry and need to know whether the file is documented in any known software distribution.

SHA1 A9993E364706816ABA3E25717850C26C9CD0D89D is the canonical hash of the three-byte string "abc", and it appears in the corpus as a Python 2.7 demo file (usr/share/doc/python2.7/examples/Demo/md5test/foo) shipped with multiple Linux distributions.
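
The digests quoted for "abc" can be reproduced locally before any query is sent. The following is a minimal sketch using Python's standard hashlib module. The helper name file_digests is illustrative only, and the uppercase hex form is chosen to mirror how hashes are written in this guide.

```python
import hashlib


def file_digests(data: bytes) -> dict:
    """Return uppercase hex MD5, SHA-1 and SHA-256 digests for a byte string."""
    return {
        "MD5": hashlib.md5(data).hexdigest().upper(),
        "SHA-1": hashlib.sha1(data).hexdigest().upper(),
        "SHA-256": hashlib.sha256(data).hexdigest().upper(),
    }


digests = file_digests(b"abc")
print(digests["SHA-1"])  # A9993E364706816ABA3E25717850C26C9CD0D89D
```

For a file on disk, the same helper can be fed the bytes read from it in binary mode.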

curl "https://hashlookup.circl.lu/lookup/sha1/A9993E364706816ABA3E25717850C26C9CD0D89D" | head -c 10000
{
  "CRC32": "352441C2",
  "FileName": "./usr/share/doc/python2.7/examples/Demo/md5test/foo",
  "FileSize": "3",
  "MD5": "900150983CD24FB0D6963F7D28E17F72",
  "SHA-1": "A9993E364706816ABA3E25717850C26C9CD0D89D",
  "SHA-256": "BA7816BF8F01CFEA414140DE5DAE2223B00361A396177A9CB410FF61F20015AD",
  "ProductCode": {
    "ProductName": "OpenLinux eServer 2.3",
    "ProductVersion": "2.3",
    "ApplicationType": "Server"
  },
  "db": "nsrl_legacy",
  "source": "RDS_2025.03.1_android.db",
  "mimetype": "text/plain"
}

The source field names the RDS release that confirmed the match, here the March 2025 Android dataset. That's not a contradiction: Android's reference library includes standard Linux-derived file trees, and this Python demo file has traveled across distributions for decades. The ProductCode.ProductName ("OpenLinux eServer 2.3") is the NSRL's historical product attribution, not a description of where you'd find this file today; treat it as provenance context. The db: "nsrl_legacy" flag means this record came from the original NSRL historical corpus rather than a recent RDS update.

This SHA1 matches a known file from the Python 2.7 demo suite (Demo/md5test/foo), confirmed in NSRL's March 2025 Android dataset. It's a 3-byte test file containing "abc": a known-good, documented file, not suspicious.
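
The lookup call and its 404 semantics can be wrapped in a small function. This is a sketch under stated assumptions, not an official client: the /lookup/{algo}/{digest} path is taken from the curl commands in this guide, while the names http_get and lookup_hash, the 10-second timeout, and the choice to raise on any status other than 200 or 404 are illustrative. The fetch parameter exists only so the logic can be exercised without network access.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "https://hashlookup.circl.lu"


def http_get(url):
    """Fetch url and return (status_code, body_bytes); HTTP errors are returned, not raised."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


def lookup_hash(algo, digest, fetch=http_get):
    """Return the parsed record for a hash, or None when the service answers 404."""
    status, body = fetch(f"{BASE_URL}/lookup/{algo}/{digest}")
    if status == 404:
        # Not in the known-good corpus: the result that warrants a closer look.
        return None
    if status != 200:
        # Assumption: any other status is treated as an error rather than a verdict.
        raise RuntimeError(f"unexpected HTTP status {status}")
    return json.loads(body)
```

Calling lookup_hash("sha1", "A9993E364706816ABA3E25717850C26C9CD0D89D") would issue the same request as the curl command above; a return value of None marks the hash as unknown to the corpus.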

Tracing a file's distribution across the supply chain

"How many packages ship this exact file, and which ones?" The lookup response includes hashlookup:parent-total, which counts how many parent packages in the corpus contain the file. A high count indicates a widely distributed file — useful for understanding blast radius when a vulnerability is found in a shared component.

MD5 5F4DCC3B5AA765D61D8327DEB882CF99 is the hash of the literal eight-byte string "password" stored as a binary file, a standard test fixture in cryptographic libraries.

curl "https://hashlookup.circl.lu/lookup/md5/5F4DCC3B5AA765D61D8327DEB882CF99" | head -c 10000
{
  "FileName": "./usr/share/cargo/registry/pbkdf2-0.3.0/tests/data/3.password.bin",
  "FileSize": "8",
  "MD5": "5F4DCC3B5AA765D61D8327DEB882CF99",
  "SHA-1": "5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8",
  "SHA-256": "5E884898DA28047151D0E56F8DC6292773603D0D6AABBDD62A11EF721D1542D8",
  "ProductCode": {
    "ProductName": "iMovie Toolkit",
    "ProductVersion": "2003",
    "ApplicationType": "Demonstration"
  },
  "RDS:package_id": "9122",
  "db": "nsrl_legacy",
  "hashlookup:parent-total": 13,
  "mimetype": "text/plain",
  "source": "snap:8qJy8uoPtOYy36n9qDw7tSAA1s16cl7S_324"
}

The hashlookup:parent-total: 13 tells you 13 distinct packages in the corpus include this exact file. To retrieve the full parent list, follow up with /parents/{sha1}/{count}/{cursor}. The parents endpoint takes only SHA1, so use the SHA-1 value from this response: GET /parents/5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8/10/0. The disconnect between ProductCode ("iMovie Toolkit", version 2003) and source (a Snap package, with a FileName inside the Rust pbkdf2 crate) illustrates a real data quality pattern in nsrl_legacy records: the product attribution reflects the first historical record that indexed the hash, not the actual source of the file you're looking at. Trust source and FileName when they conflict with ProductCode.
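
Both rules from the paragraph above (use the SHA-1 from the record for the parents call, and prefer source and FileName over ProductCode) can be sketched as two small helpers. The function names, the default count of 10, and the output wording of provenance are assumptions made for illustration. The shape of the /parents response is not shown in this guide, so the sketch stops at building the URL.

```python
BASE_URL = "https://hashlookup.circl.lu"


def parents_url(record: dict, count: int = 10, cursor: str = "0") -> str:
    """Build the /parents URL from a lookup record, using its SHA-1 field."""
    sha1 = record["SHA-1"]  # the parents endpoint takes only SHA1
    return f"{BASE_URL}/parents/{sha1}/{count}/{cursor}"


def provenance(record: dict) -> str:
    """Describe a record, preferring source and FileName over ProductCode."""
    source = record.get("source")
    name = record.get("FileName")
    if source or name:
        return f"{name or 'unknown file'} (source: {source or 'unknown'})"
    # Fall back to the historical product attribution only when nothing else exists.
    product = record.get("ProductCode", {})
    return product.get("ProductName", "no provenance recorded")


record = {
    "FileName": "./usr/share/cargo/registry/pbkdf2-0.3.0/tests/data/3.password.bin",
    "SHA-1": "5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8",
    "ProductCode": {"ProductName": "iMovie Toolkit"},
    "source": "snap:8qJy8uoPtOYy36n9qDw7tSAA1s16cl7S_324",
}
print(parents_url(record))
print(provenance(record))
```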

The file with MD5 5F4DCC3B5AA765D61D8327DEB882CF99 is an 8-byte cryptographic test fixture (the literal string "password") found in 13 different packages in the corpus. It's a known-good file; its wide distribution reflects how crypto test vectors propagate through open-source dependency trees.

Checking database coverage before a forensics session

"How current is the known-good database, and will it cover software from this year?" Before running a batch of hashes through the service, verify the NSRL version and corpus size. Hashes from software released after the last NSRL update won't be found, which generates false-positive "unknown file" flags that waste investigation time.

curl "https://hashlookup.circl.lu/info" | head -c 10000
{
  "nsrl-version": "2023.09.2",
  "stat:hashlookup_total_keys": 6323900088
}

The service reports over 6.3 billion indexed keys, but the NSRL base version, 2023.09.2, dates to September 2023. CIRCL supplements this with Snap packages and Android RDS releases that run into 2025, but commercial software and recent Linux distributions released after late 2023 may not have full coverage. Check this at the start of an investigation so you know which "not found" results are worth escalating and which are plausibly just post-NSRL releases.
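
The coverage check can be turned into a quick triage helper. The sketch below assumes that nsrl-version always follows the "YYYY.MM.N" pattern seen in the single response above, which this guide does not confirm. The function names are hypothetical, and the comparison covers only the NSRL base, not the supplementary Snap and Android datasets.

```python
def nsrl_base_month(info: dict) -> tuple:
    """Parse 'nsrl-version' (assumed to look like 'YYYY.MM.N') into (year, month)."""
    year, month = info["nsrl-version"].split(".")[:2]
    return int(year), int(month)


def may_postdate_nsrl(info: dict, release_year: int, release_month: int) -> bool:
    """True when a software release date falls after the NSRL base month."""
    return (release_year, release_month) > nsrl_base_month(info)


info = {"nsrl-version": "2023.09.2", "stat:hashlookup_total_keys": 6323900088}
print(nsrl_base_month(info))              # (2023, 9)
print(may_postdate_nsrl(info, 2024, 3))   # True
print(may_postdate_nsrl(info, 2023, 6))   # False
```

A True result suggests a "not found" answer for that software may be a coverage gap rather than a finding worth escalating on its own.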

The hashlookup database covers over 6.3 billion file hashes, backed by NSRL version 2023.09.2 plus supplementary datasets through 2025. Files from commercial software released after September 2023 may not be present even if they're completely legitimate.

Pitfalls

One-line summary for the user

I can check any MD5, SHA1, or SHA256 against CIRCL's public known-good file corpus (6+ billion NSRL-backed hashes); a 404 means the hash is not in that corpus, which is the result that matters for triage.