
Malware Analysis Pipeline
CINDR Triage Engine
Automated static & behavioral malware analysis pipeline
Submit a file. Get a report. No sandboxes to configure, no RE tooling to maintain. The Triage Engine runs a full analysis stack — triage, strings, YARA, PE, macros, archives, obfuscation — in a serverless Azure pipeline built for speed and repeatability.
Pipeline Architecture
From upload to report in seconds.
Four stages. Fully automated. Every file that enters the pipeline is triaged, analyzed, and reported without operator intervention.
Intake
Files are submitted via a secure REST API endpoint. The pipeline validates, fingerprints, and deduplicates each submission before staging it in Azure Blob Storage and queuing it for analysis.
- Multipart HTTP upload — single or batch submissions
- Per-file SHA-256 fingerprinting with submission-level dedup
- Filename sanitization and size validation (configurable limits)
- Structured intake form (submitter, context, archive password)
- Automatic rollback on partial failures — no orphaned data
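The validation, fingerprinting, and dedup steps can be sketched in plain Python. This is a minimal illustration, not the engine's code. It assumes each upload arrives as a `(filename, bytes)` pair. The names `intake` and `sanitize_filename`, the 50 MB default, and the safe-character set are all hypothetical. Blob staging, queuing, and rollback are left out.

```python
import hashlib
import re
from dataclasses import dataclass, field

# Hypothetical default; the real limit is configurable.
MAX_FILE_BYTES = 50 * 1024 * 1024

# Anything outside this set is replaced with "_" (an assumed policy).
_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9._-]")


@dataclass
class StagedFile:
    filename: str
    sha256: str
    size: int


@dataclass
class IntakeResult:
    staged: list = field(default_factory=list)      # unique files ready for staging
    duplicates: list = field(default_factory=list)  # same SHA-256 seen earlier in this submission
    rejected: list = field(default_factory=list)    # empty or over the size limit


def sanitize_filename(name: str) -> str:
    # Drop directory components (either separator), then neutralize unsafe characters.
    base = re.split(r"[\\/]", name)[-1]
    cleaned = _UNSAFE_CHARS.sub("_", base).lstrip(".")
    return cleaned or "unnamed"


def intake(files, max_bytes: int = MAX_FILE_BYTES) -> IntakeResult:
    """Validate, fingerprint, and dedup one submission of (filename, bytes) pairs."""
    result = IntakeResult()
    seen = set()  # dedup scope is the submission, so the set lives for one call
    for name, data in files:
        safe_name = sanitize_filename(name)
        if not data or len(data) > max_bytes:
            result.rejected.append(safe_name)
            continue
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            result.duplicates.append(safe_name)
            continue
        seen.add(digest)
        result.staged.append(StagedFile(safe_name, digest, len(data)))
    return result
```

Keying dedup on the content hash rather than the filename means two differently named copies of the same sample are analyzed once.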
Triage
Each file is classified by magic-byte inspection before any analysis runs. Type detection drives routing decisions — what analyses to run, what tools to dispatch, and how to recurse into extracted content.
- 25+ format signatures: PE, ELF, Mach-O, PDF, OLE2, ZIP, RAR, 7z, gzip, and more
- Platform routing: Windows, Linux, macOS
- Extension fallback for interpreted code (JS, PS1, py, sh, bat, vbs, hta)
- Category classification: executable, archive, document, interpreted_code, media, unknown
- Triage result embedded in every report for full traceability
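A magic-byte classifier with an extension fallback can be sketched as an ordered prefix table. The table is a small illustrative subset of well-known signatures, not the engine's 25+ list. The `triage` function and `Triage` record are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Optional


@dataclass(frozen=True)
class Triage:
    format: str
    category: str
    platform: Optional[str]


# (magic prefix, format, category, platform), checked in order.
SIGNATURES = [
    (b"MZ", "pe", "executable", "windows"),  # DOS stub; a real check also validates the PE header
    (b"\x7fELF", "elf", "executable", "linux"),
    (b"\xfe\xed\xfa\xce", "macho", "executable", "macos"),  # 32-bit, big-endian
    (b"\xfe\xed\xfa\xcf", "macho", "executable", "macos"),  # 64-bit, big-endian
    (b"\xce\xfa\xed\xfe", "macho", "executable", "macos"),  # 32-bit, little-endian
    (b"\xcf\xfa\xed\xfe", "macho", "executable", "macos"),  # 64-bit, little-endian
    (b"%PDF-", "pdf", "document", None),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2", "document", None),
    (b"PK\x03\x04", "zip", "archive", None),  # OOXML documents also start this way
    (b"Rar!\x1a\x07", "rar", "archive", None),
    (b"7z\xbc\xaf\x27\x1c", "7z", "archive", None),
    (b"\x1f\x8b", "gzip", "archive", None),
    (b"\x89PNG\r\n\x1a\n", "png", "media", None),
]

# Script types have no reliable magic bytes, so the extension decides.
SCRIPT_EXTENSIONS = {
    ".js": "javascript", ".ps1": "powershell", ".py": "python", ".sh": "shell",
    ".bat": "batch", ".vbs": "vbscript", ".hta": "hta",
}


def triage(filename: str, data: bytes) -> Triage:
    # Magic bytes win; the extension is consulted only when no signature matches.
    for magic, fmt, category, platform in SIGNATURES:
        if data.startswith(magic):
            return Triage(fmt, category, platform)
    ext = PurePath(filename).suffix.lower()
    if ext in SCRIPT_EXTENSIONS:
        return Triage(SCRIPT_EXTENSIONS[ext], "interpreted_code", None)
    return Triage("unknown", "unknown", None)
```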
Analyze
Analysis modules run in parallel — inline Python modules execute immediately and heavy native tools are dispatched asynchronously. Derived files are re-triaged and recursed up to three levels deep.
- Inline modules: strings, YARA, PE, PDF, macros, obfuscation, base64, JS beautify, archive unpack
- Heavy tools dispatched async: capa, speakeasy, pestudio, detect-it-easy, trid, xorsearch, binwalk
- Recursive derived-file analysis — up to 3 levels of nesting
- VirusTotal hash reputation lookup
- Full error isolation — one failed module never blocks the rest
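The fan-out, per-module error isolation, and three-level recursion limit can be sketched with a thread pool. The module callables and the `extract_derived` hook are hypothetical stand-ins. In a real deployment heavy tools would be enqueued rather than called inline. CPU-bound pure-Python modules would also benefit more from a process pool, because threads share the GIL.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_DEPTH = 3  # derived files are recursed at most three levels below the submitted file


def run_module(name, fn, data):
    """Run one module and capture any exception instead of propagating it."""
    try:
        return name, {"status": "ok", "result": fn(data)}
    except Exception as exc:  # error isolation: one failure never stops the others
        return name, {"status": "error", "error": f"{type(exc).__name__}: {exc}"}


def analyze(data, modules, extract_derived, depth=0):
    """modules: {name: callable(bytes)}; extract_derived: callable(bytes) -> list of bytes."""
    with ThreadPoolExecutor(max_workers=max(1, len(modules))) as pool:
        futures = [pool.submit(run_module, name, fn, data) for name, fn in modules.items()]
        results = dict(fut.result() for fut in futures)
    report = {"depth": depth, "modules": results, "derived": []}
    if depth < MAX_DEPTH:
        # Each derived file re-enters the same analysis, one level deeper.
        for child in extract_derived(data):
            report["derived"].append(analyze(child, modules, extract_derived, depth + 1))
    return report
```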
Report
Analysis results are written as structured JSON — one report per file, one summary per submission. Reports are stored in Azure Blob Storage and indexed in Table Storage for fast retrieval.
- Per-file reports: triage, hashes, all analysis module outputs, derived files
- Submission summary: aggregate status, file count, error log, timestamp
- Status tracking: complete, partial, error — queryable from Table Storage
- Pending analyses listed for files awaiting heavy-tool results
- Fully structured JSON — import directly into your SIEM, SOAR, or case platform
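One possible shape for the per-file report, the submission summary, and the Table Storage index entry is sketched here. The field names, the status rule, and the choice of submission ID as `PartitionKey` and SHA-256 as `RowKey` are assumptions for illustration. Only the complete/partial/error vocabulary comes from the description above.

```python
import json
from datetime import datetime, timezone


def file_status(modules: dict, pending: list) -> str:
    # Assumed rule: every module failed -> error; some failed or heavy tools pending -> partial.
    errors = sum(1 for m in modules.values() if m["status"] == "error")
    if modules and errors == len(modules):
        return "error"
    if errors or pending:
        return "partial"
    return "complete"


def build_reports(submission_id: str, files: list):
    """files: dicts with sha256, triage, modules, and optional pending (heavy tools not yet reported)."""
    reports = []
    for f in files:
        pending = f.get("pending", [])
        reports.append({
            "submission_id": submission_id,
            "sha256": f["sha256"],
            "triage": f["triage"],
            "analysis": f["modules"],
            "pending_analyses": pending,
            "status": file_status(f["modules"], pending),
        })
    statuses = {r["status"] for r in reports}
    if statuses <= {"complete"}:
        overall = "complete"
    elif statuses == {"error"}:
        overall = "error"
    else:
        overall = "partial"
    summary = {
        "submission_id": submission_id,
        "file_count": len(reports),
        "status": overall,
        "errors": [
            {"sha256": r["sha256"], "module": name, "error": m["error"]}
            for r in reports for name, m in r["analysis"].items() if m["status"] == "error"
        ],
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return reports, summary


def index_entity(report: dict) -> dict:
    # Hypothetical key choice: one partition per submission, one row per file.
    return {"PartitionKey": report["submission_id"], "RowKey": report["sha256"], "status": report["status"]}
```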
Analysis Modules
Six modules. Parallel execution.
Every module runs in isolation. A failure in one never stops the others. Results are merged into a single structured report per file.
String & IOC Extraction
- ASCII and UTF-16LE string extraction
- IOC parsing: URLs, IPv4, email addresses, registry keys
- Suspicious Win32 API detection (60+ signatures)
- Packer signature matching: UPX, ASPack, Themida, VMProtect, MPRESS
- Crypto artifact detection (PEM keys, AES markers)
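ASCII and UTF-16LE extraction plus IOC matching can be sketched with the standard `re` module. The patterns are simplified assumptions: they will miss defanged indicators and over-match some strings. `SUSPICIOUS_APIS` is a four-entry sample standing in for the 60+ signature list.

```python
import re

# Hypothetical, deliberately simple IOC patterns; production extractors are stricter.
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4_RE = re.compile(rf"\b{OCTET}(?:\.{OCTET}){{3}}\b")
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
REGKEY_RE = re.compile(r"\b(?:HKEY_LOCAL_MACHINE|HKEY_CURRENT_USER|HKLM|HKCU)\\[^\s\"']+", re.IGNORECASE)
# A four-entry sample standing in for the 60+ signature list.
SUSPICIOUS_APIS = {"VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread", "IsDebuggerPresent"}


def extract_strings(data: bytes, min_len: int = 4) -> list:
    """ASCII runs first, then UTF-16LE runs (printable byte followed by a NUL byte)."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    wide_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    found = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return found


def extract_iocs(strings: list) -> dict:
    text = "\n".join(strings)  # newline-joined so no IOC spans two strings
    return {
        "urls": sorted(set(URL_RE.findall(text))),
        "ipv4": sorted(set(IPV4_RE.findall(text))),
        "emails": sorted(set(EMAIL_RE.findall(text))),
        "registry_keys": sorted(set(REGKEY_RE.findall(text))),
        "suspicious_apis": sorted({api for api in SUSPICIOUS_APIS if any(api in s for s in strings)}),
    }
```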
YARA Scanning
- Bulk and per-file rule compilation with broken-rule filtering
- Namespace-based loading — community rulesets alongside custom rules
- External variable injection: filename, extension, filetype per scan
- Configurable per-scan timeout to prevent runaway matches
- Match output: rule name, namespace, tags, metadata, string hit count
PE / Executable Analysis
- Machine type, compile timestamp, image base, entry point, subsystem
- Section entropy analysis — high-entropy sections flagged automatically
- Full import and export table parsing with imphash
- Resource directory enumeration
- TLS callback and overlay detection
- Packer section name matching
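Section entropy flagging and packer section-name matching take only a few lines once section names and bytes are available. One source is the `pefile` library's `PE.sections`, whose entries expose `Name` and `get_data()`. The 7.0 bits-per-byte threshold and the `PACKER_SECTIONS` table are assumptions. The names listed are commonly reported defaults for these packers, but they are trivially renamed.

```python
import math
from collections import Counter

HIGH_ENTROPY_THRESHOLD = 7.0  # assumed cut-off; random or encrypted data approaches 8.0

# Commonly reported default section names; packers and operators can rename them.
PACKER_SECTIONS = {
    "UPX0": "UPX", "UPX1": "UPX", "UPX2": "UPX",
    ".aspack": "ASPack",
    ".themida": "Themida",
    ".vmp0": "VMProtect", ".vmp1": "VMProtect", ".vmp2": "VMProtect",
    ".MPRESS1": "MPRESS", ".MPRESS2": "MPRESS",
}


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def assess_sections(sections: dict) -> list:
    """sections: {section name: raw bytes}. Returns one finding per section."""
    findings = []
    for name, body in sections.items():
        entropy = shannon_entropy(body)
        findings.append({
            "name": name,
            "entropy": round(entropy, 3),
            "high_entropy": entropy >= HIGH_ENTROPY_THRESHOLD,
            "packer": PACKER_SECTIONS.get(name.rstrip("\x00")),  # PE names are NUL-padded
        })
    return findings
```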
Document & Macro Analysis
- PDF: metadata, URI extraction, JavaScript detection, embedded file enumeration
- PDF JavaScript extracted as derived files for further analysis
- OLE2 and OOXML VBA module extraction via olevba
- XLM macro enumeration
- olevba indicator scan: AutoExec triggers, IOCs, and hex-, base64-, and Dridex-encoded strings
- Each extracted macro module re-enters the analysis pipeline
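A naive PDF keyword counter, similar in spirit to pdfid-style triage, shows what URI and JavaScript detection look for. It scans raw bytes only, which makes it an assumption-laden sketch. It misses content inside compressed object streams, hex-escaped names such as `/J#61vaScript`, and URI strings containing escaped parentheses. A real module would parse the object tree with a PDF library.

```python
import re

# Name objects commonly associated with active or embedded content.
PDF_KEYWORDS = [b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch", b"/EmbeddedFile"]
URI_RE = re.compile(rb"/URI\s*\(([^)]*)\)")  # literal-string URIs only


def scan_pdf(data: bytes) -> dict:
    counts = {}
    for kw in PDF_KEYWORDS:
        # The lookahead stops a keyword from matching the start of a longer name.
        pattern = re.escape(kw) + rb"(?![A-Za-z0-9])"
        counts[kw.decode()] = len(re.findall(pattern, data))
    uris = [m.group(1).decode("latin-1") for m in URI_RE.finditer(data)]
    return {
        "keywords": counts,
        "uris": uris,
        "has_javascript": counts["/JavaScript"] > 0 or counts["/JS"] > 0,
    }
```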
Archive Unpacking
- ZIP: encrypted member support, per-member size and total size limits
- tar, gzip, bzip2, xz stream decompression
- Nested archive recursion — gzip wrapping tar re-enters the pipeline automatically
- RAR and 7z detected and flagged for heavy-tool dispatch
- All unpacked members triaged and analyzed as first-class files
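Size-limited ZIP extraction can be sketched with the standard `zipfile` module. The limits and the function name are hypothetical. `zipfile` decrypts only legacy ZipCrypto members. AES-encrypted members are not supported and surface as exceptions, which this sketch records as skipped. tar, gzip, bzip2, and xz streams can be handled along the same lines with `tarfile.open(fileobj=..., mode="r:*")` and the `gzip`, `bz2`, and `lzma` modules.

```python
import io
import zipfile
import zlib
from typing import Optional

MAX_MEMBER_BYTES = 20 * 1024 * 1024  # hypothetical per-member limit
MAX_TOTAL_BYTES = 100 * 1024 * 1024  # hypothetical limit for the whole archive


def unpack_zip(data: bytes, password: Optional[str] = None,
               max_member: int = MAX_MEMBER_BYTES, max_total: int = MAX_TOTAL_BYTES):
    """Return (members, skipped): extracted (name, bytes) pairs and (name, reason) pairs."""
    members, skipped = [], []
    total = 0
    pwd = password.encode() if password else None
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            try:
                with zf.open(info, pwd=pwd) as fh:
                    # Read one byte past the limit: declared header sizes can lie (zip bombs).
                    body = fh.read(max_member + 1)
            except (RuntimeError, NotImplementedError, zipfile.BadZipFile, zlib.error) as exc:
                # Missing or wrong password, unsupported method (e.g. AES), or a corrupt member.
                skipped.append((info.filename, str(exc)))
                continue
            if len(body) > max_member:
                skipped.append((info.filename, "member size limit exceeded"))
                continue
            if total + len(body) > max_total:
                skipped.append((info.filename, "total size limit exceeded"))
                break
            total += len(body)
            members.append((info.filename, body))
    return members, skipped
```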
Obfuscation Scoring
- Language-specific heuristics: JavaScript, PowerShell, Python, shell scripts
- Generic metrics: entropy, line length, whitespace ratio, hex escape density
- JS: eval, new Function, String.fromCharCode, atob, hex-prefixed identifiers
- PowerShell: -EncodedCommand, FromBase64String, IEX, -bxor, char casts
- Verdicts: clean / minified / likely_obfuscated / highly_obfuscated
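The scoring idea can be sketched as generic text metrics plus per-language indicator regexes. Only JavaScript and PowerShell indicators are shown here. Every threshold and weight is a hypothetical choice, not the engine's calibration.

```python
import math
import re
from collections import Counter

INDICATORS = {
    "javascript": [
        re.compile(r"\beval\s*\("),
        re.compile(r"\bnew\s+Function\b"),
        re.compile(r"String\.fromCharCode"),
        re.compile(r"\batob\s*\("),
        re.compile(r"\b_0x[0-9a-fA-F]{4,}\b"),  # hex-prefixed identifiers typical of JS obfuscators
    ],
    "powershell": [
        re.compile(r"-enc(odedcommand)?\b", re.IGNORECASE),
        re.compile(r"FromBase64String", re.IGNORECASE),
        re.compile(r"\b(iex|invoke-expression)\b", re.IGNORECASE),
        re.compile(r"-bxor\b", re.IGNORECASE),
        re.compile(r"\[char\]", re.IGNORECASE),
    ],
}
HEX_ESCAPE = re.compile(r"\\x[0-9a-fA-F]{2}")


def text_metrics(text: str) -> dict:
    n = len(text)
    return {
        "entropy": -sum((c / n) * math.log2(c / n) for c in Counter(text).values()),
        "max_line_length": max(len(line) for line in text.splitlines()),
        "whitespace_ratio": sum(ch.isspace() for ch in text) / n,
        "hex_escape_density": len(HEX_ESCAPE.findall(text)) / n,
    }


def score_script(text: str, language: str) -> dict:
    if not text.strip():
        return {"verdict": "clean", "score": 0, "indicators": [], "metrics": {}}
    m = text_metrics(text)
    hits = [p.pattern for p in INDICATORS.get(language, []) if p.search(text)]
    score = 2 * len(hits)  # assumed weight per distinct language indicator
    if m["hex_escape_density"] > 0.02:
        score += 2
    if m["entropy"] > 5.0:
        score += 1
    # Long lines or almost no whitespace suggest minification even without indicators.
    dense_layout = m["max_line_length"] > 500 or m["whitespace_ratio"] < 0.05
    if score == 0:
        verdict = "minified" if dense_layout else "clean"
    elif score + int(dense_layout) >= 5:
        verdict = "highly_obfuscated"
    else:
        verdict = "likely_obfuscated"
    return {"verdict": verdict, "score": score, "indicators": hits, "metrics": m}
```

Separating layout signals from behavioral indicators keeps legitimately minified libraries out of the obfuscated buckets unless they also use constructs like `eval` or `atob`.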
Behavioral Analysis — Pro
Heavy-tool dispatch. Async. Non-blocking.
When a file warrants deeper inspection, the Triage Engine dispatches it to a second analysis tier that runs industry-standard native tools inside a containerized environment — without blocking the fast static analysis path.
Tool dispatch is routed by file category and platform: Windows PE files receive the full stack; Linux/macOS executables receive a subset; media and unknown files receive detect-it-easy, trid, and xorsearch.
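The routing rule can be expressed as a small lookup. The page does not say which tools form the Linux/macOS subset, so `UNIX_SUBSET` is an assumption. The full stack mirrors the heavy-tool list given earlier, and the media/unknown baseline is taken directly from the text. The job dictionaries are hypothetical stand-ins for whatever queue message format the pipeline actually uses.

```python
FULL_STACK = ["capa", "speakeasy", "pestudio", "detect-it-easy", "trid", "xorsearch", "binwalk"]
BASELINE = ["detect-it-easy", "trid", "xorsearch"]
# Hypothetical subset for non-Windows executables.
UNIX_SUBSET = ["detect-it-easy", "trid", "xorsearch", "binwalk"]


def route_heavy_tools(category: str, platform) -> list:
    if category == "executable" and platform == "windows":
        return list(FULL_STACK)
    if category == "executable" and platform in ("linux", "macos"):
        return list(UNIX_SUBSET)
    if category in ("media", "unknown"):
        return list(BASELINE)
    # Routing for other categories (e.g. RAR/7z archives) is not described; dispatch nothing here.
    return []


def build_jobs(sha256: str, category: str, platform) -> list:
    """One queue message per tool; the message shape is an assumption."""
    return [{"sha256": sha256, "tool": tool, "category": category, "platform": platform}
            for tool in route_heavy_tools(category, platform)]
```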
Supported File Types
25+ format signatures. Magic-byte detection.
Extension-based fallback handles interpreted code and script files where magic bytes are absent.
Built for Real-World Operations
Four operational use cases.
Incident Response Triage
During an active incident, analysts need answers fast. Submit suspicious files from an endpoint or network capture and receive structured reports in seconds — no sandbox queue, no manual RE required to get initial signal.
Phishing & Email Analysis
Drop suspected phishing attachments directly into the pipeline. The engine unpacks archives, extracts Office macros, analyzes embedded PDFs, and scores scripts for obfuscation — surfacing what a document actually does without opening it.
Threat Intelligence Enrichment
Enrich IOC feeds and threat reports with static analysis context. YARA matches tie samples to known threat actor tooling, PE hashes correlate against VirusTotal, and string extraction pulls embedded infrastructure at scale.
SOC Automation
Integrate the intake API with your SOAR platform or SIEM alert pipeline. Every alert involving a suspicious file can trigger an automated analysis job, with results delivered via webhook and indexed for case management.
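A SOAR playbook step that submits a file could build the multipart request with the standard library alone. The endpoint URL and the form field names (`file`, `submitter`, `context`) are hypothetical placeholders, because the API's actual schema is not documented here. Sending is a single `urllib.request.urlopen(req)` call once a real endpoint and any authentication headers are known.

```python
import urllib.request
import uuid


def build_submission(url: str, files: list, fields: dict) -> urllib.request.Request:
    """files: list of (filename, bytes); fields: hypothetical form fields such as submitter or context."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    for filename, data in files:
        # Quotes or line breaks would corrupt the part header, so reject them outright.
        if any(ch in filename for ch in '"\r\n'):
            raise ValueError(f"unsafe filename: {filename!r}")
        header = (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
                  "Content-Type: application/octet-stream\r\n\r\n").encode()
        parts.append(header + data + b"\r\n")
    body = b"".join(parts) + f"--{boundary}--\r\n".encode()
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```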
Azure-Native Architecture
Serverless. No VMs. No clusters.
The Triage Engine is built entirely on Azure serverless primitives. It scales to zero when idle and handles burst submissions without pre-provisioning. No infrastructure to maintain, no clusters to size.
Get the Triage Engine.
Available as a self-hosted open-source deployment or as a managed cloud service with SLA, custom YARA management, and dedicated support.