Skip to content

CBMC May Struggle with sscanf and strcmp Handling in Symbolic Execution? #8602

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
zhoulaifu opened this issue Feb 25, 2025 · 8 comments
Open

Comments

@zhoulaifu
Copy link

zhoulaifu commented Feb 25, 2025

Dear CBMC Team,

Thank you for your incredible work on CBMC, which has been invaluable for our verification tasks. I am writing to kindly inquire if there might be a limitation or bug in CBMC 6.4.1 on Mac OS that prevents it from correctly handling sscanf and strcmp with symbolic strings. I’ve encountered unexpected counterexamples in my tests, and I’d greatly appreciate your guidance or confirmation if this is a known issue.

Test Program

Below is a minimal test program I’ve created to investigate this behavior:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>

#define MAX_BUFFER_SIZE 20

void harness_sscanf() {
    char buffer[MAX_BUFFER_SIZE];

    // Ensure the buffer is null-terminated within bounds
    __CPROVER_assume(buffer[MAX_BUFFER_SIZE - 1] == '\0');

    // Check if input starts with "Hello"
    if (sscanf(buffer, "Hello") == 0) {
        assert(0); // Force CBMC to find an input that starts with "Hello"
    }
}

void harness_strcmp() {
    char buffer[MAX_BUFFER_SIZE]; // Symbolic input buffer

    // Ensure the buffer is null-terminated within bounds
    __CPROVER_assume(buffer[MAX_BUFFER_SIZE - 1] == '\0');
    // Constrain buffer to have at least length 2 (for "Hello") and be reasonable
    __CPROVER_assume(strlen(buffer) >= 2 && strlen(buffer) < MAX_BUFFER_SIZE - 1);

    // Check if input exactly matches "Hello" using strcmp
    if (strcmp(buffer, "Hello") != 0) {
        assert(0); // Force CBMC to find an input that exactly matches "Hello"
    }
}

void harness_simple_check() {
    char buffer[MAX_BUFFER_SIZE]; // Symbolic input buffer

    // Ensure the buffer is null-terminated within bounds
    __CPROVER_assume(buffer[MAX_BUFFER_SIZE - 1] == '\0');
    // Optional: constrain length if needed (e.g., strlen(input) < 9)
    
    if (buffer[0] == 'H' && buffer[1] == 'e' && buffer[2] == 'l' && buffer[3] == 'l' && buffer[4] == 'o') {
        assert(0); // Fail if "HI......!" pattern matches
    }
}

Commands and Counterexamples

I ran the following commands on macOS ARM64 using CBMC 6.4.1:

For harness_sscanf

cbmc test_ssncf.c --function harness_sscanf --no-standard-checks --no-built-in-assertions --unwind 6 --trace

Counterexample Produced:

buffer={ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }

For harness_strcmp

cbmc test_ssncf.c --function harness_strcmp --no-standard-checks --no-built-in-assertions --unwind 6 --trace

Counterexample Produced:

buffer={ 'I', 'd', 0, 2, '`', 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }

For harness_simple_check

cbmc test_ssncf.c --function harness_simple_check --no-standard-checks --no-built-in-assertions --unwind 6 --trace

Counterexample Produced:

 buffer={ 'H', 'e', 'l', 'l', 'o', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }

Expected Behavior

For the first two harnesses, I expected CBMC to produce a counterexample where buffer = "Hello\0..." (e.g., {'H', 'e', 'l', 'l', 'o', 0, 0, ..., 0}). This would indicate that CBMC successfully found an input that:

  • For harness_sscanf, starts with "Hello" (so sscanf(buffer, "Hello") == 1).
  • For harness_strcmp, exactly matches "Hello" (so strcmp(buffer, "Hello") == 0).

Environment

  • Operating System: macOS (ARM64, Apple Silicon M3)
  • CBMC Version: 6.4.1

In sum, this issue looks like a limitation or bug in CBMC 6.4.1 on macOS ARM64 that affects its handling of sscanf and strcmp with symbolic strings. The correct behavior of harness_simple_check suggests CBMC handles direct character comparisons well, but sscanf and strcmp struggle with constraint propagation or symbolic string modeling.

Thank you very much for your time and support!

@zhoulaifu
Copy link
Author

Perhaps this sounds naive, but is it possible that in theory, CBMC it's not guaranteed to always give a valid counterexample that triggers the failure in the original program (assuming that failure is reachable) ? As a model checker, CBMC.should only offer a SAT/SMT solution for an over-approximated model of the safety property of the original program, which might not be a real counterexample? Yet, since CBMC is also a symbolic executor, shouldn't its counterexample be correct? Sorry for my lack of understanding.

@remi-delmas-3000
Copy link
Collaborator

remi-delmas-3000 commented Feb 28, 2025

Hi,
I think you are mistaken on the specification for sscanf. sscanf returns the number of succesfully matched items from the format string. The string "Hello" does not contain format specifiers so sscanf can only return 0 for that format string anyway, i.e. returning zero does not mean that the input string matched "Hello".

if the format string actually contains a format specifier you get the expected result:

#include <stdio.h>
int main() {
    char str[] = "hello world";
    char out[20];
    if(sscanf(str, "hello %s", out) == 1 ) {
      assert(0);
    }
    return 0;
}
scanf.c function main
[main.assertion.1] line 7 assertion 0: FAILURE

Trace for main.assertion.1:

State 12 file scanf.c function main line 4 thread 0
----------------------------------------------------
  str={ 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', 0 }

@remi-delmas-3000
Copy link
Collaborator

remi-delmas-3000 commented Feb 28, 2025

For the strcmp example, you are also misinterpreting the output of strcmp, which returns 0 when the strings are equal, and a value different from 0 otherwise.

void main_strcmp() {
  char buffer[MAX_BUFFER_SIZE]; // Symbolic input buffer
  // Check if input exactly matches "Hello" using strcmp
  if (strcmp(buffer, "Hello") == 0) {
      assert(0); // Force CBMC to find an input that exactly matches "Hello"
  }
}
Trace for main_strcmp.assertion.1:

State 11 file scanf.c function main_strcmp line 14 thread 0
----------------------------------------------------
  buffer={ 'H', 'e', 'l', 'l', 'o', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }

@remi-delmas-3000
Copy link
Collaborator

You should also make sure the --unwind N directive matches at least the longest string length your program is manipulating, otherwise theres a risk of not unwinding loops found inside of string functions deep enough. You should use --unwinding-assertions to check that loops were fully unwound.

@zhoulaifu
Copy link
Author

zhoulaifu commented Feb 28, 2025

Good point, Remi! I was using a previously used unwind number and forgot to update it. I'll let you know the results.

Your comments on strcmp and sscanf are great. Clearly, my C skills have rusted to the point where I need to restudy my old professor's course 😅

Meanwhile, could you confirm the following: Is the counterexample produced by CBMC guaranteed to trigger the assertion under analysis, or could it be a spurious one, similar to cases in CEGAR?

@remi-delmas-3000
Copy link
Collaborator

remi-delmas-3000 commented Feb 28, 2025

The symbolic execution engine of CBMC does not do over approximations on its own, but some models of the C standard library function may be over approximating. The source code of the CPROVER model of the C standard library is available under src/ansi-c/library/* in the CBMC source tree

@zhoulaifu
Copy link
Author

Hi Remi and cbmc developers.

I have double-checked my usage of sscanf, and the following updated code is still producing an incorrect counterexample with CBMC. I have verified that sscanf is now interpreted correctly, and I have also increased the unwinding limit and used the --unwinding-assertions option.

Command Line:

cbmc test_ssncf.c --function harness_sscanf --no-standard-checks --no-built-in-assertions --unwind 600 --unwinding-assertions --trace

Code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include <stdbool.h>

#define MAX_BUFFER_SIZE 10

void harness_sscanf() {
    char buffer[MAX_BUFFER_SIZE];
    char dummy[100];
    __CPROVER_assume(buffer[MAX_BUFFER_SIZE - 1] == '\0');

    // Check if input starts with "Hello" followed by at least something such as Hellooo
    if(sscanf(buffer, "Hello%s", dummy) == 1) {
        assert(0); 
    }
}

Unexpected Counterexample:

CBMC produces the following counterexample:

buffer={ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }

This does not seem to match the expected conditions for triggering the assertion failure.

Additional Output:

% cbmc test_ssncf.c --function harness_sscanf --no-standard-checks --no-built-in-assertions --unwind 600 --unwinding-assertions --trace

CBMC version 6.4.1 (cbmc-6.4.1) 64-bit arm64 macOS
...
test_ssncf.c function harness_sscanf
[harness_sscanf.assertion.1] line 16 assertion 0: FAILURE
...
Trace for harness_sscanf.assertion.1:
----------------------------------------------------
State 11 file test_ssncf.c function harness_sscanf line 9 thread 0
buffer={ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 } 

State 12 file test_ssncf.c function harness_sscanf line 10 thread 0
dummy={ 0, 0, 0, 0, ... } 

...
Violated property:
file test_ssncf.c function harness_sscanf line 16 thread 0
assertion 0
!((signed long int)(signed long int)!(0 != 0) != 0l)

** 1 of 7 failed (2 iterations)
VERIFICATION FAILED

Would appreciate any insights! Maybe this is because CBMC does not model sscanf well?

@remi-delmas-3000
Copy link
Collaborator

This is a case where the model of the standard library function is more abstract than a real implementationL

sscanf is modeled like this in this file src/ansi-c/library/stdio.c

int sscanf(const char *restrict s, const char *restrict format, ...)
{
__CPROVER_HIDE:;
  va_list list;
  va_start(list, format);
  int result=vsscanf(s, format, list);
  va_end(list);
  return result;
}

And vsscanf is an abstract model of the operation: it picks a nondet value to return to the caller and havocs the va_list given as arguments, meaning it does not attempt to parse the string and match it to the format, it gives you nondet results that over approximate possible behaviours (if your program is safe under that model of sscanf then it is also safe under a more precise version of it).

int vsscanf(const char *restrict s, const char *restrict format, va_list arg)
{
__CPROVER_HIDE:;
  int result = __VERIFIER_nondet_int();
  (void)*s;
  (void)*format;
#  if(defined(__aarch64__) || defined(_M_ARM64)) && !defined(__APPLE__)
  while((__CPROVER_size_t)__CPROVER_POINTER_OFFSET(arg.__stack) <
        __CPROVER_OBJECT_SIZE(arg.__stack))
  {
    void *a = va_arg(arg, void *);
    __CPROVER_havoc_object(a);
  }
#  else
  while((__CPROVER_size_t)__CPROVER_POINTER_OFFSET(arg) <
        __CPROVER_OBJECT_SIZE(arg))
  {
    void *a = va_arg(arg, void *);
    __CPROVER_havoc_object(a);
  }
#  endif

  return result;
}

Alternatively you could undef sscanf and use your own concrete version of sscanf:

#undef sscanf
int sscanf(const char *str, const char *format, ...)
{
  va_list args;
  va_start(args, format);

  int matched = 0; // Count of successfully matched items
  const char *s = str;
  const char *f = format;

  while(*f && *s)
  {
    // Skip whitespace in both strings
    while(isspace(*f))
      f++;
    while(isspace(*s))
      s++;

    if(*f == '%')
    {
      f++; // Move past '%'

      // Handle width specification
      int width = 0;
      while(isdigit(*f))
      {
        width = width * 10 + (*f - '0');
        f++;
      }
      if(width == 0)
        width = INT_MAX;

      switch(*f)
      {
      case 'd':
      { // Integer
        int *target = va_arg(args, int *);
        int value = 0;
        int sign = 1;

        // Handle sign
        if(*s == '-')
        {
          sign = -1;
          s++;
        }
        else if(*s == '+')
        {
          s++;
        }

        // Parse digits
        int digits_read = 0;
        while(isdigit(*s) && digits_read < width)
        {
          value = value * 10 + (*s - '0');
          s++;
          digits_read++;
        }

        if(digits_read > 0)
        {
          *target = value * sign;
          matched++;
        }
        else
        {
          goto end;
        }
        break;
      }

      case 's':
      { // String
        char *target = va_arg(args, char *);
        int chars_read = 0;

        while(!isspace(*s) && *s && chars_read < width)
        {
          *target++ = *s++;
          chars_read++;
        }
        *target = '\0';

        if(chars_read > 0)
        {
          matched++;
        }
        break;
      }

      case 'c':
      { // Single character
        char *target = va_arg(args, char *);
        if(*s)
        {
          *target = *s++;
          matched++;
        }
        break;
      }

      case 'x':
      { // Hexadecimal
        int *target = va_arg(args, int *);
        int value = 0;
        int digits_read = 0;

        while(digits_read < width)
        {
          if(isdigit(*s))
          {
            value = value * 16 + (*s - '0');
          }
          else if(*s >= 'a' && *s <= 'f')
          {
            value = value * 16 + (*s - 'a' + 10);
          }
          else if(*s >= 'A' && *s <= 'F')
          {
            value = value * 16 + (*s - 'A' + 10);
          }
          else
          {
            break;
          }
          s++;
          digits_read++;
        }

        if(digits_read > 0)
        {
          *target = value;
          matched++;
        }
        break;
      }

      case '[':
      { // Character set
        char *target = va_arg(args, char *);
        f++; // Move past '['

        // Build character set
        char charset[256] = {0};
        int invert = 0;

        if(*f == '^')
        {
          invert = 1;
          f++;
        }

        while(*f && *f != ']')
        {
          charset[(unsigned char)*f] = 1;
          f++;
        }

        // Read matching characters
        int chars_read = 0;
        while(*s && chars_read < width)
        {
          if(charset[(unsigned char)*s] ^ invert)
          {
            *target++ = *s++;
            chars_read++;
          }
          else
          {
            break;
          }
        }
        *target = '\0';

        if(chars_read > 0)
        {
          matched++;
        }
        break;
      }
      }
      f++;
    }
    else
    {
      // Match literal character
      if(*f == *s)
      {
        f++;
        s++;
      }
      else
      {
        goto end;
      }
    }
  }

end:
  va_end(args);
  return matched;
}

void main_sscanf()
{
  char buffer[MAX_BUFFER_SIZE];
  char dummy[MAX_BUFFER_SIZE];

  // Check if input starts with "Hello" followed by at least something such as Hellooo
  if(sscanf(buffer, "He%s", dummy) == 1)
  {
    assert(0);
  }
}

It runs and gives results, but because of all the complexity of the comcrete sscanf you'll have to stick with very short strings and be really patient with the SAT solver:e

Runtime Convert SSA: 17.2826s
Running propositional reduction
Post-processing
Runtime Post-process: 0.0266783s
Solving with External SAT solver
Writing temporary CNF
Invoking SAT solver
Solver returned code: 10
result:SATISFIABLE
Runtime Solver: 5.78116s
Runtime decision procedure: 23.07s
Building error trace
Running propositional reduction
Solving with External SAT solver
Writing temporary CNF
Invoking SAT solver
Solver returned code: 10
result:SATISFIABLE
Runtime Solver: 5.43024s
Runtime decision procedure: 5.43092s
Building error trace
Running propositional reduction
Solving with External SAT solver
Writing temporary CNF
Invoking SAT solver
Solver returned code: 10
result:SATISFIABLE
Runtime Solver: 45.3146s
Runtime decision procedure: 45.3151s
Building error trace
Running propositional reduction
Solving with External SAT solver
Writing temporary CNF
Invoking SAT solver
eSolver returned code: 10
result:SATISFIABLE
Runtime Solver: 50.4762s
Runtime decision procedure: 50.4766s
Building error trace
Running propositional reduction
Solving with External SAT solver
Writing temporary CNF
Invoking SAT solver
Solver returned code: 10
result:SATISFIABLE
Runtime Solver: 49.2203s
Runtime decision procedure: 49.2207s
Building error trace
Running propositional reduction
Solving with External SAT solver
Writing temporary CNF
Invoking SAT solver
...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants