**Ideal scenarios:** - Quick security scans (minutes, not hours) - Pattern-based bug detection
# pip python3 -m pip install semgrep # Homebrew brew install semgrep # Docker docker run --rm -v "${PWD}:/src" returntocorp/semgrep semgrep --config auto /src # Update pip install --upgrade semgrep
semgrep --config auto . # Auto-detect rules semgrep --config auto --metrics=off . # Disable telemetry for proprietary code `### 2\. Use Rulesets` semgrep --config p/<RULESET> . # Single ruleset semgrep --config p/security-audit --config p/trailofbits . # Multiple
p/defaultp/security-auditp/owasp-top-tenp/cwe-top-25p/r2c-security-auditp/trailofbitsp/pythonp/javascriptp/golangsemgrep --config p/security-audit --sarif -o results.sarif . # SARIF semgrep --config p/security-audit --json -o results.json . # JSON semgrep --config p/security-audit --dataflow-traces . # Show data flow `### 4\. Scan Specific Paths` semgrep --config p/python app.py # Single file semgrep --config p/javascript src/ # Directory semgrep --config auto --include='**/test/**' . # Include tests (excluded by default)
rules: - id: hardcoded-password languages: [python] message: "Hardcoded password detected: $PASSWORD" severity: ERROR pattern: password = "$PASSWORD"
...func(...)$VAR$FUNC($INPUT)<... ...><... user_input ...>patternpatternspattern-eitherpattern-notpattern-insidepattern-not-insidepattern-regexmetavariable-regexmetavariable-comparisonrules: - id: sql-injection languages: [python] message: "Potential SQL injection" severity: ERROR patterns: - pattern-either: - pattern: cursor.execute($QUERY) - pattern: db.execute($QUERY) - pattern-not: - pattern: cursor.execute("...", (...)) - metavariable-regex: metavariable: $QUERY regex: .*\+.*|.*\.format\(.*|.*%.*
# Pattern `os.system($CMD)` catches this: os.system(user_input) # Found `But misses indirect flows:` # Same pattern misses this: cmd = user_input processed = cmd.strip() os.system(processed) # Missed - no direct match
user_input)cmd = ..., processed = ...)shlex.quote())os.system())rules: - id: command-injection languages: [python] message: "User input flows to command execution" severity: ERROR mode: taint pattern-sources: - pattern: request.args.get(...) - pattern: request.form[...] - pattern: request.json pattern-sinks: - pattern: os.system($SINK) - pattern: subprocess.call($SINK, shell=True) - pattern: subprocess.run($SINK, shell=True, ...) pattern-sanitizers: - pattern: shlex.quote(...) - pattern: int(...) `### Full Rule with Metadata` rules: - id: flask-sql-injection languages: [python] message: "SQL injection: user input flows to query without parameterization" severity: ERROR metadata: cwe: "CWE-89: SQL Injection" owasp: "A03:2021 - Injection" confidence: HIGH mode: taint pattern-sources: - pattern: request.args.get(...) - pattern: request.form[...] - pattern: request.json pattern-sinks: - pattern: cursor.execute($QUERY) - pattern: db.execute($QUERY) pattern-sanitizers: - pattern: int(...) fix: cursor.execute($QUERY, (params,))
# test_rule.py def test_vulnerable(): user_input = request.args.get("id") # ruleid: flask-sql-injection cursor.execute("SELECT * FROM users WHERE id = " + user_input) def test_safe(): user_input = request.args.get("id") # ok: flask-sql-injection cursor.execute("SELECT * FROM users WHERE id = ?", (user_input,))
semgrep --test rules/name: Semgrep on: push: branches: [main] pull_request: schedule: - cron: '0 0 1 * *' # Monthly jobs: semgrep: runs-on: ubuntu-latest container: image: returntocorp/semgrep steps: - uses: actions/checkout@v4 with: fetch-depth: 0 # Required for diff-aware scanning - name: Run Semgrep run: | if [ "${{ github.event_name }}" = "pull_request" ]; then semgrep ci --baseline-commit ${{ github.event.pull_request.base.sha }} else semgrep ci fi env: SEMGREP_RULES: >- p/security-audit p/owasp-top-ten p/trailofbits
tests/fixtures/ **/testdata/ generated/ vendor/ node_modules/ `### Suppress False Positives` password = get_from_vault() # nosemgrep: hardcoded-password dangerous_but_safe() # nosemgrep `## Performance` semgrep --config rules/ --time . # Check rule performance ulimit -n 4096 # Increase file descriptors for large codebases `### Path Filtering in Rules` rules: - id: my-rule paths: include: [src/] exclude: [src/generated/] `## Third-Party Rules` pip install semgrep-rules-manager semgrep-rules-manager --dir ~/semgrep-rules download semgrep -f ~/semgrep-rules .
semgrep --test; false negatives are silent