ZeroFalse: Improving Precision in Static Analysis with LLMs
ZeroFalse is a multi-stage LLM pipeline that triages raw static-analyzer alerts through contextual reasoning and structured evidence validation, reducing false positives without sacrificing recall. We evaluated 10 frontier LLMs across 6 model families (Gemini, GPT, Grok, Mistral, DeepSeek, Qwen) on the OWASP Java Benchmark (1,974 cases across 10 CWE categories) and on CWE-bench, a real-world dataset of 755 CodeQL alerts spanning 56 project–CVE pairs from 37 open-source Java repositories. CWE-specialized prompting improved F1 by up to +0.26 on real-world code; the best configuration reached an F1 of 0.912 on OWASP and 0.837 on CWE-bench.
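The two-stage triage idea can be sketched as follows. This is a minimal illustrative sketch, not the actual ZeroFalse implementation: the prompt templates, the `validate_evidence` check, and the stub model standing in for a frontier LLM are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    cwe: str           # e.g. "CWE-89" (SQL injection)
    snippet: str       # flagged source code
    analyzer_msg: str  # raw static-analyzer message

# Hypothetical CWE-specialized prompt templates (illustrative only).
CWE_PROMPTS = {
    "CWE-89": "Does user input reach the SQL sink without sanitization?",
    "default": "Is the flagged flow exploitable in this context?",
}

def build_prompt(alert: Alert) -> str:
    """Stage 1: contextual-reasoning prompt, specialized per CWE."""
    question = CWE_PROMPTS.get(alert.cwe, CWE_PROMPTS["default"])
    return f"[{alert.cwe}] {question}\n---\n{alert.snippet}\n---\n{alert.analyzer_msg}"

def validate_evidence(response: dict) -> bool:
    """Stage 2: structured evidence validation -- accept a true-positive
    verdict only if the model names a concrete source and sink."""
    return bool(response.get("source")) and bool(response.get("sink"))

def triage(alert: Alert, llm) -> str:
    """Return 'TP' (keep the alert) or 'FP' (suppress it)."""
    response = llm(build_prompt(alert))  # stage 1: contextual reasoning
    if response["verdict"] == "TP" and validate_evidence(response):
        return "TP"  # evidence-backed true positive survives triage
    return "FP"      # everything else is filtered as a false positive

# Stub model: flags only tainted string concatenation into a SQL sink.
def stub_llm(prompt: str) -> dict:
    if 'executeQuery("SELECT' in prompt and "+ userInput" in prompt:
        return {"verdict": "TP", "source": "userInput", "sink": "executeQuery"}
    return {"verdict": "FP", "source": None, "sink": None}

tainted = Alert("CWE-89", 'stmt.executeQuery("SELECT * FROM t WHERE id=" + userInput);', "SQL injection")
safe = Alert("CWE-89", "stmt.executeQuery(ps);  // parameterized", "SQL injection")
print(triage(tainted, stub_llm), triage(safe, stub_llm))  # TP FP
```

In the real pipeline the stub is replaced by a frontier-model call and the structured response is parsed from the model's output; the key design point illustrated here is that a "true positive" verdict alone is not trusted until the evidence-validation stage confirms it.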
First-author submission, currently under review at RAID 2026.