Static Application Security Testing using “semgrep”

n00🔑
3 min read · Jul 1, 2024


Static Application Security Testing (SAST) is an essential part of modern software development: it helps developers identify vulnerabilities in their code before it reaches production. One powerful and versatile SAST tool is Semgrep, a static-analysis tool that parses many languages and matches patterns against the parsed code rather than raw text. Developers can write custom rules to detect vulnerabilities, which makes a scan specific to the codebase at hand. Semgrep can publish results to GitLab’s SAST ecosystem and is available in GitLab Ultimate. In this post, we’ll explore how to use Semgrep for SAST.

Symbols in semgrep:

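  1. Ellipsis (...)

Matches a sequence of zero or more items, such as arguments, statements, or the contents of a string. It plays a role similar to “.*” in regex.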

Pattern

1. pattern-either

Works like a logical OR across patterns: the rule matches if any one of the listed patterns matches.

rules:
  - id: use-string-equals
    message: In Java, do not use == with strings. Use String.equals() instead.
    pattern-either:
      - pattern: if ($X == "...") ...
      - pattern: if ("..." == $Y) ...

2. pattern-not

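Filters out matches we do not want. In the rule below, calls to subprocess.call whose first argument is a string literal are excluded, so only calls with a non-literal first argument are reported.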

rules:
  - id: subprocess-call
    patterns:
      - pattern: subprocess.call(...)
      - pattern-not: subprocess.call("...", ...)

3. pattern-inside

rules:
  - id: http-responsewriter-write
    patterns:
      - pattern-inside: |
          func $FUNC(..., $WRITER http.ResponseWriter, ...) {
            ...
          }
      - pattern: $WRITER.Write(...)

pattern-inside restricts matches to a surrounding region of code. In this case, we are looking for the pattern “$WRITER.Write(...)”, but only inside functions that take a parameter of type “http.ResponseWriter”.

https://semgrep.dev/learn/composition/3

4. pattern-not-inside (defines a region in which matches are ignored)

rules:
  - id: secure-flag-not-set
    patterns:
      - pattern: $RESPONSE.addCookie($COOKIE);
      - pattern-not-inside: |
          $COOKIE.setSecure(true);
          ...

Here, pattern-not-inside filters out addCookie calls that come after $COOKIE.setSecure(true) has already been called, so only cookies added without the secure flag are reported.

Another pattern-not-inside example: finding files that are opened but never closed.

rules:
  - id: open-never-closed
    patterns:
      - pattern: $F = open(...)
      - pattern-not-inside: |
          $F = open(...)
          ...
          $F.close()
    message: file object opened without corresponding close
    languages:
      - python
    severity: ERROR

https://semgrep.dev/docs/writing-rules/rule-syntax/#pattern-not-inside
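To make the open-never-closed rule more concrete, here is a small Python sketch of target code. The function names and the temporary file are made up for illustration, and the comments describe what I would expect the rule to do with each shape, not verified scanner output.

```python
import os
import tempfile


def read_leaky(path):
    # Expected to be reported: `f = open(...)` is never followed by `f.close()`.
    f = open(path)
    return f.read()


def read_closed(path):
    # Expected to be skipped: the assignment sits inside the
    # `$F = open(...) ... $F.close()` region named by pattern-not-inside.
    f = open(path)
    data = f.read()
    f.close()
    return data


def read_with(path):
    # Expected to be skipped: a `with` block is not an assignment of the
    # form `$F = open(...)`, so the first pattern does not match at all.
    with open(path) as f:
        return f.read()


# Exercise all three helpers on a throwaway file. They behave the same at
# runtime; only their shape differs from the rule's point of view.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")
    with open(sample, "w") as out:
        out.write("hello")
    results = [read_leaky(sample), read_closed(sample), read_with(sample)]
```

All three functions return the same data, which is the point: the rule reasons about code structure, not runtime behavior.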

5. metavariable-regex

rules:
  - id: use-decimalfield-for-money
    patterns:
      - pattern-inside: |
          class $M(...):
            ...
      - pattern: $F = django.db.models.FloatField(...)
      - metavariable-regex:
          metavariable: '$F'
          regex: '.*(fee|salary|price).*'
    message: Found a FloatField used for variable $F. Use DecimalField for currency fields to avoid float-rounding errors.
    languages: [python]
    severity: ERROR

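metavariable-regex keeps a match only if the code bound to a metavariable matches a regular expression. In this example, the rule fires only when the field name bound to $F contains one of the keywords fee, salary, or price.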
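As a rough illustration of that filter, the sketch below applies the same regex to a few field names with Python's re module. The names in the list are hypothetical, and Python's regex engine stands in for the one Semgrep uses; re.match is chosen on the assumption that Semgrep anchors the regex at the start of the metavariable's text.

```python
import re

# The regex from the use-decimalfield-for-money rule.
FIELD_NAME_REGEX = re.compile(r".*(fee|salary|price).*")


def would_pass_filter(field_name: str) -> bool:
    """Return True if a name bound to $F would satisfy the regex.

    re.match anchors at the start of the string (assumed to mirror
    Semgrep's left-anchored matching); the leading `.*` allows prefixes.
    """
    return FIELD_NAME_REGEX.match(field_name) is not None


# Hypothetical Django field names.
names = ["price", "monthly_fee", "base_salary", "height", "PRICE"]
flagged = [name for name in names if would_pass_filter(name)]
```

Note that the match is case-sensitive, so an upper-case name such as PRICE falls through the filter in this sketch.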

Metavariables

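Metavariables match a piece of code (an expression, an identifier, and so on) and capture it so it can be referred to elsewhere in the rule, as $F, $WRITER, and $COOKIE do in the examples above. A metavariable name must begin with a $ symbol and can only include uppercase letters, numbers, and the underscore character.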
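The naming rule can be written down as a small check. This is only a sketch: the helper name is made up, and the regex additionally assumes that the first character after the $ is a letter or underscore, not a digit.

```python
import re

# Assumption: "$", then an uppercase letter or underscore, then any mix of
# uppercase letters, digits, and underscores.
METAVARIABLE_NAME = re.compile(r"\$[A-Z_][A-Z0-9_]*")


def is_valid_metavariable(name: str) -> bool:
    """Return True if `name` follows the metavariable naming rule."""
    return METAVARIABLE_NAME.fullmatch(name) is not None


# Names on the left follow the rule; names on the right do not.
valid = [is_valid_metavariable(n) for n in ("$X", "$WRITER", "$USERS_2")]
invalid = [is_valid_metavariable(n) for n in ("$x", "$some_value", "X")]
```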

Rules Syntax

When writing a custom rule in Semgrep, there are several required fields that must be present at the top level of the rule, immediately under the rules key. These fields include:

  • id: A unique, descriptive identifier for the rule, such as no-unused-variable.
  • message: A message that explains why Semgrep matched this pattern and how to fix the issue. You can find more information about rule messages in the Semgrep documentation.
  • severity: The severity of the issues that the rule potentially detects. This can be one of three values: INFO (low severity), WARNING (medium severity), or ERROR (high severity). Note that Semgrep Supply Chain rules use CVE assignments for severity instead. You can find more information about this in the Filters section of the Semgrep Supply Chain documentation.
  • languages: An array of languages that the rule applies to. You can find a list of supported language extensions and tags in the Semgrep documentation.
  • pattern, patterns, pattern-either, or pattern-regex: These fields specify the pattern or patterns that Semgrep should search for in your code.

Of the pattern fields in the last bullet, you only need to include one. The pattern field specifies a single expression to search for, while the patterns field lists multiple patterns that must all match (logical AND). The pattern-either field lists multiple patterns of which at least one must match (logical OR), and the pattern-regex field specifies a PCRE-compatible regular expression to search for.
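The required fields listed above can be expressed as a quick sanity check over a rule that has already been loaded into a Python dict. This is a sketch of the checklist in this section, not Semgrep's own validation; the helper name and both sample rules are illustrative.

```python
from typing import List

# Top-level fields this section lists as required.
REQUIRED_FIELDS = ("id", "message", "severity", "languages")
# At least one of these pattern fields must also be present.
PATTERN_FIELDS = ("pattern", "patterns", "pattern-either", "pattern-regex")


def missing_fields(rule: dict) -> List[str]:
    """Return the required fields that `rule` lacks, in checklist order."""
    missing = [field for field in REQUIRED_FIELDS if field not in rule]
    if not any(field in rule for field in PATTERN_FIELDS):
        missing.append("one of: " + ", ".join(PATTERN_FIELDS))
    return missing


# A complete rule, mirroring open-never-closed from earlier in the post.
complete_rule = {
    "id": "open-never-closed",
    "message": "file object opened without corresponding close",
    "severity": "ERROR",
    "languages": ["python"],
    "patterns": [{"pattern": "$F = open(...)"}],
}

# A fragment like the shorter snippets above, with only id and patterns.
fragment = {
    "id": "subprocess-call",
    "patterns": [{"pattern": "subprocess.call(...)"}],
}

complete_gaps = missing_fields(complete_rule)
fragment_gaps = missing_fields(fragment)
```

Running the check on the fragment shows why the shorter snippets in this post need a message, severity, and languages added before they can be used as standalone rule files.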

Scanning with Semgrep:

Local repo (current directory):

semgrep scan --config auto -q

A bit more automation (saving the results to a file, as plain text or JSON):

semgrep scan --config auto -q | tee default_semgrep.out

semgrep scan --config auto -q --json > semgrep_report.json

semgrep scan --config auto -q --json | jq . | tee readable_semgrep_report.json
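Once the results are in JSON, they are easy to post-process. The sketch below summarizes findings by severity in Python. It assumes the report has a top-level "results" list whose entries carry "check_id", "path", "start", and "extra" keys; the sample report is invented for illustration, so check the field names against your own semgrep_report.json.

```python
import json
from collections import Counter

# Made-up sample in the assumed shape of a Semgrep JSON report.
SAMPLE_REPORT = """
{
  "results": [
    {"check_id": "open-never-closed", "path": "app/io.py",
     "start": {"line": 12, "col": 5},
     "extra": {"severity": "ERROR", "message": "file object opened without corresponding close"}},
    {"check_id": "subprocess-call", "path": "app/tasks.py",
     "start": {"line": 40, "col": 9},
     "extra": {"severity": "WARNING", "message": "subprocess.call with non-literal argument"}},
    {"check_id": "open-never-closed", "path": "app/export.py",
     "start": {"line": 7, "col": 5},
     "extra": {"severity": "ERROR", "message": "file object opened without corresponding close"}}
  ],
  "errors": []
}
"""


def summarize(report: dict) -> Counter:
    """Count findings per severity level."""
    return Counter(r["extra"]["severity"] for r in report.get("results", []))


def format_finding(result: dict) -> str:
    """Render one finding as 'path:line [SEVERITY] rule-id'."""
    return "{}:{} [{}] {}".format(
        result["path"],
        result["start"]["line"],
        result["extra"]["severity"],
        result["check_id"],
    )


report = json.loads(SAMPLE_REPORT)
counts = summarize(report)
lines = [format_finding(r) for r in report["results"]]
```

To use it on a real scan, replace the sample string with the contents of semgrep_report.json, for example via json.load on the opened file.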
