2025 SAST Accuracy Report

See how traditional SAST tools compare to Contextual Security Analysis


Executive Summary

Context, not pattern chasing, is what separates modern AppSec from legacy SAST. Using publicly available, intentionally vulnerable applications in Ruby on Rails, Python/Django, C#/ASP.NET Core, and Java/Spring Boot, we ran each scanner—DryRun Security, Semgrep, Snyk Code, GitHub Advanced Security (CodeQL), and SonarQube/SonarCloud—straight out of the box, with no custom rules or tuning. We ignored cosmetic linting and focused on what actually compromises production systems: SQL injection, XSS, SSRF, and hard‑to‑detect logic flaws such as BOLA and IDOR. 

The DryRun Security Contextual Security Analysis engine identified 88 percent of all seeded vulnerabilities and the vast majority of logic‑level flaws, while the nearest competitor caught less than half, and most missed logic flaws entirely. 

The takeaway is stark: many organizations relying on pattern‑based scanners operate with a false sense of security, leaving it trivially easy for developers to introduce critical bugs without triggering any alarms. Because DryRun Security achieves high accuracy immediately and allows new checks to be written via Natural Language Code Policies, security teams can finally enforce real‑world protections without rule‑writing marathons or developer fatigue. We encourage readers to try the vulnerability samples in the public test repos against their current SAST. The results will be eye‑opening.

Introduction

Static Application Security Testing (SAST) is a must-have for AppSec teams. By scanning source code for vulnerabilities early, you reduce risk and prevent costly breaches. However, not all SAST tools are created equal—accuracy (i.e., finding true vulnerabilities without swamping teams with false positives) is the key differentiator.

This report compares the accuracy of five prominent tools—DryRun Security, Semgrep, Snyk Code, GitHub Advanced Security (CodeQL), and SonarQube / SonarCloud. The comparison is based on head-to-head public tests in Ruby, Python, C#, and Java.

Our findings show exactly how these solutions handle both classic application security issues and trickier logic flaws such as IDOR, BOLA, and business-logic errors. All of the tools tested market AI-augmented capabilities, but only DryRun Security offers an AI-native approach to detection.

How We Measured Accuracy

To fairly compare the accuracy of leading SAST tools, we evaluated performance across two categories, which we dubbed classic vulnerabilities and logic flaws. Because the scoring was not weighted, the category groupings do not affect the overall scores; they simply organize the findings.

Classic Vulnerabilities

Samples included:

  • SQL Injection (SQLi)
  • Cross-Site Scripting (XSS)
  • Server-Side Request Forgery (SSRF)

These often appear in OWASP Top 10 lists and are common targets for both scanners and attackers.
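To make the category concrete, here is a minimal sketch of a classic SQLi in a Django-style view. It is a hypothetical illustration in the spirit of the test apps, not code from the test repositories:

```python
# Hypothetical Django view used to illustrate a classic SQL injection (SQLi).
# Names are illustrative; this is not code from the test repositories.
from django.db import connection
from django.http import JsonResponse

def search_tasks(request):
    term = request.GET.get("q", "")
    with connection.cursor() as cursor:
        # VULNERABLE: user input is interpolated straight into the SQL string,
        # so ?q=' OR '1'='1 dumps every row. This string-building shape is a
        # well-known signature, which is why pattern-based scanners usually
        # catch it.
        cursor.execute(
            f"SELECT id, title FROM tasks WHERE title LIKE '%{term}%'"
        )
        # Safe variant: parameterization keeps data out of the SQL grammar.
        # cursor.execute("SELECT id, title FROM tasks WHERE title LIKE %s",
        #                [f"%{term}%"])
        rows = cursor.fetchall()
    return JsonResponse({"results": rows})
```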

Logic Flaws

Samples included:

  • Insecure Direct Object Reference (IDOR)
  • Resource & User Enumeration
  • Authn/Authz Flaws & Misuse

These require a deeper understanding of application behavior and context. Historically, they have been difficult for pattern-matching tools to detect.
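For contrast, here is a minimal sketch of an IDOR in the same Django style. The view and the owner-scoped Task model are hypothetical, not taken from the test repositories; the point is that the vulnerable lookup is syntactically indistinguishable from a safe one:

```python
# Hypothetical Django view used to illustrate an IDOR/BOLA logic flaw.
# The Task model (with an `owner` foreign key) is assumed, not taken from
# the test repositories.
from django.http import JsonResponse
from django.shortcuts import get_object_or_404

from myapp.models import Task  # assumed owner-scoped model

def get_task(request, task_id):
    # VULNERABLE: the record is fetched by id alone, so any authenticated
    # user can read any other user's task just by changing the id. The code
    # is syntactically clean; there is no "bad pattern" to match. A scanner
    # has to understand that Task is owner-scoped to notice the missing
    # authorization check.
    task = get_object_or_404(Task, pk=task_id)
    return JsonResponse({"id": task.id, "title": task.title})

def get_task_fixed(request, task_id):
    # Scoping the lookup to the requesting user closes the hole.
    task = get_object_or_404(Task, pk=task_id, owner=request.user)
    return JsonResponse({"id": task.id, "title": task.title})
```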

How Testing Was Performed

The goal of this research was to assess each tool’s accuracy out of the box, without weeks of tuning or rule-writing. That’s the reality most AppSec teams face—and that’s where accuracy matters most. So, the basic rules for the tests:

  • Stock Install Only: All tools were run with default configurations. No custom rules, tuning, or additional settings were applied to any tool—including DryRun Security.
  • No Prior Knowledge: Tools were tested blind to the seeded vulnerabilities.
  • Expert Validation: Each tool’s results were manually verified for accuracy.

Each test application contained a fixed set of known vulnerabilities, including a mix of classic vulnerabilities and logic flaws. A total of 26 vulnerabilities were seeded across four apps (Ruby: 6, Python: 5, C#: 6, Java: 9). To ensure a level playing field, all testing was performed using public, intentionally vulnerable applications written in one of four major languages and frameworks.

The code is publicly available and can be validated via the GitHub pull request links below. The vulnerability-by-vulnerability analysis reports are also publicly available at the links below. We’ve explicitly documented the testing for transparency and reproducibility, and as of publication, no vendor has successfully disputed the findings.

Individual Test Results

This table shows the languages and frameworks we tested in early 2025 with a link to the actual GitHub pull request and our breakdown of the results for that language/framework.

Language/Framework | Raw Results in Public GitHub Pull Requests | Full Analysis w/ Code Breakdown
Ruby on Rails | railsgoat/pull/9 | Report (link)
Python with Django | vtm/pull/5 | Report (link)
C# with ASP.NET Core | dvcsharp-api/pull/7 | Report (link)
Java with Spring Boot | javaspringvulny/pull/3 | Report (link)

Overall Results

The chart below shows the total results of all four individual tests. DryRun Security accurately identified 23 out of 26 vulnerabilities, nearly twice as many as the closest traditional SAST tool.

Results for each SAST tool are summarized below.

Across the four public head‑to‑head trials (RailsGoat for Ruby on Rails, Vulnerable Task Manager for Python/Django, Damn Vulnerable C# API for ASP.NET Core, and a Spring Boot demo app), DryRun Security’s Contextual Security Analysis engine surfaced 23 of the 26 deliberately injected vulnerabilities straight out of the box.

It posted perfect scores in the Rails (6/6), Django (5/5), and C# (6/6) tests. In the Java exercise it missed only two edge‑case logic flaws that every scanner missed, plus one XSS variant. Competitors rarely cleared even a third of the same flaw set, and none of them detected the IDOR, broken‑auth, or user‑enumeration weaknesses that DryRun Security flagged in three of the four languages.

DryRun Security emerged as the clear leader in our accuracy tests, identifying over 88 percent of the total vulnerability samples—the most coverage of any tool assessed, and almost twice as accurate as the next closest tool. It was also the sole platform that consistently nailed the toughest category of issues: complex, business‑logic flaws that typically slip past pattern‑matching scanners.

Beyond excelling at logic‑level attacks, DryRun Security detected more standard application‑security bugs (SQLi, XSS, SSRF, etc.) than its peers, thanks to its AI‑native Contextual Security Analysis engine. That same engine powers Natural Language Code Policies (NLCP), enabling teams to articulate custom security rules in plain English and extend detection to emerging, organization‑specific risks.

Semgrep’s out‑of‑the‑box scans surfaced 12 of 26, roughly 46 percent of the deliberately injected bugs across the four language tests. It was good at spotting the more obvious SQLi, XSS, and SSRF patterns but left most business‑logic flaws on the table. In the RailsGoat benchmark it earned 2 of 6 findings; it missed one SQLi there, and the Semgrep team contacted us after the testing to let us know they had released a new core rule to catch it. The C# showdown told a similar story: Semgrep flagged three classic issues yet missed every IDOR and broken‑auth sample we planted.

The gap stems from its design philosophy—Semgrep relies on pattern‑based rule files, which the company and a vibrant community maintain. That model makes it fast, transparent, and (in its OSS form) free, which is why many ASPM and SAST vendors embed Opengrep, a fork of Semgrep, inside their own products. For teams with staff to curate rules, Semgrep can be a solid first layer of defense; however, our tests show it struggles with accuracy and lacks code context.

Snyk Code’s results place it third. Across the four public benchmarks it surfaced 10 of 26, only about 38 percent of the planted vulnerabilities. It caught a handful of SQL injection, XSS, and SSRF instances but missed every logic‑level flaw, such as IDOR, broken authentication, and user enumeration. In RailsGoat it flagged two of six issues; in the C# showdown it managed three of six classic bugs yet still missed all of the context‑dependent ones; in the Python/Django test it recorded no findings at all. It performed best in the Java test, with five of nine possible detections, including one XSS that DryRun Security missed.

Snyk’s scan output is relatively low‑noise and wrapped in a polished developer UX, which many teams appreciate. However, those strengths do not offset its accuracy gaps outside of Java or its heavy reliance on pattern‑based signatures. When a vulnerability does not match a predefined rule, Snyk stays silent, leaving organizations with a dangerously partial view of risk.

GitHub Advanced Security (CodeQL) ranked fourth of the five tools we tested, surfacing only 8 of 26, about 31 percent of the seeded vulnerabilities. Despite its reputation for surgical precision, the default query set let numerous well‑known SQL injection and XSS issues slip through and failed to catch any logic‑level flaws. Bridging those gaps typically demands hand‑writing custom CodeQL queries—a powerful capability, but a decidedly steeper lift than crafting Semgrep rules.

Scans also ran noticeably slower than the other contenders, though tight native integration with GitHub and generally low false‑positive noise remain clear advantages. In practice, teams willing to trade speed and simplicity for depth and ecosystem convenience may find value, but out‑of‑the‑box accuracy is too low for CodeQL to stand alone as a primary safeguard.

SonarQube finished a distant fifth. Across the entire test suite it surfaced only 2 of 26, about 8 percent of the planted vulnerabilities. It overlooked almost every classic SQLi, XSS, and SSRF sample and all of the logic‑level flaws. That outcome reflects its design: SonarQube is fundamentally a robust code‑quality linter with a security “hotspot” overlay.

It can flag stylistic problems, minor misconfigurations, and typo‑level mistakes, but it lacks the depth needed to catch real‑world attacks. For developer hygiene and light security checks it’s fine, yet teams that lean on it as their primary SAST risk shipping critical bugs without a whisper of warning.

Key takeaway: DryRun Security identified over 88% of seeded vulnerabilities across both classic vulnerabilities and logic flaws. The leading competitors found only a fraction of the issues, with the nearest catching less than half. Most of the tools besides DryRun Security missed logic flaws entirely.

Why DryRun Security Outperforms

  1. Contextual Analysis vs. Pattern Matching
    DryRun Security built Contextual Security Analysis, an AI-native approach that “understands” what the code does, enabling detection of advanced flaws that pattern-matching tools simply can’t identify.
  2. Real-Time Developer Experience
    Alerts appear quickly in pull requests, consolidated into a single, clear summary—no messy dashboards or dozens of repeated alerts that clutter the developer’s experience.
  3. No Tuning Required
    DryRun Security provides high accuracy immediately and can be extended using Natural Language Code Policies (NLCPs), which let users enforce security policies (e.g., “don’t allow logging of PII or customer data”) without a specialized DSL; see the sketch after this list.
  4. Minimal False Positives
    Developers enjoy using DryRun Security because it reliably flags real issues, not hypothetical “hotspots.” This improves adoption and reduces time to remediation.
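As a sketch of what the example policy quoted above (“don’t allow logging of PII or customer data”) would need to catch, consider the hypothetical snippet below. This is not DryRun Security’s policy syntax or engine output, just an illustration of why such a check requires context rather than a pattern:

```python
# Hypothetical code that the example policy "don't allow logging of PII or
# customer data" would need to flag. Illustrative only; this is not DryRun
# Security's policy syntax or engine output.
import logging

logger = logging.getLogger(__name__)

def update_email(user, new_email):
    # Policy violation: the log line records the customer's email address.
    # No generic SAST rule fires here, because logger.info() is harmless in
    # itself; flagging this requires knowing that new_email is PII.
    logger.info("Updated email for user %s to %s", user.id, new_email)

    # Compliant alternative: log the event without the PII.
    # logger.info("Updated email for user %s", user.id)
    user.email = new_email
    user.save()
```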

Final Thoughts

We realize that testing is always difficult, which is why we set up a laboratory environment to run these tests. Unfortunately, in real-world scenarios the tools tested will likely perform even worse on accuracy because they lack context. We recommend trying DryRun Security in your environment against your current SAST tooling to see how accuracy makes all the difference, and to help your team solve application security issues that have felt unsolvable until now.

About DryRun Security

DryRun Security created Contextual Security Analysis to detect vulnerabilities ranging from OWASP Top 10 to application-specific logic flaws—all with minimal configuration. Founded by developers who saw the need for practical security, DryRun integrates seamlessly into modern CI/CD pipelines, providing clear, actionable insights that don’t bog down teams.

Want to see more for yourself?
We’d love to show you how you can have better coverage and fewer false positives. Reach out to DryRun Security to set up a demo and we’ll walk you through how it can revolutionize your AppSec program.

© 2025 DryRun Security. All rights reserved.
