Seeing as this is my first post, and I’m not entirely adjusted to writing blog-posts, I’ll begin with an introduction. I’m Unazed Spectaculum; I’ve been programming for six years with surplus, primarily in Python (as representative of the majority of the repositories on my GitHub), C (for a year) and x86{-64} (mainly studying documents like ABIs, learning about compiler theory and so forth in conjunction with assembly). This post will be of theme to the blog (if I see a reason for it to continue), that is, reverse engineering, as I’m genuinely interested in it, and other subsidiary topics such as binary exploitation.

Furthermore, I digress into the focal point of this post, i.e., reversing a simple crackme sourced here with the goal of recreating the source code (with the certain optimizations applied by GCC). Although it may seem slightly pointless to try and reverse engineer a binary whose source we have, even if we mayn’t explicitly refer back to it, we will still have some notion of what it does and hence have an obvious advantage during reverse engineering; however, my perspective is not to necessarily try and correlate bits of assembly to bits of C, rather understand the optimizations and transformations applied by the compiler.

Now, to begin, allow me to give the C source-code:

#include <stdio.h>
#include <string.h>

int main(int argc, char** argv) {

    if (argc != 2) {
        printf("Need exactly one argument.\n");
        return -1;
    }

    char* correct = "password1";

    if (strncmp(argv[1], correct, strlen(correct))) {
        printf("No, %s is not correct.\n", argv[1]);
        return 1;
    } else {
        printf("Yes, %s is correct!\n", argv[1]);
        return 0;
    }

}

As the purpose of this post is not to explain the C, I will move onto the disassembly produced on GNU/Linux Debian x86_64 krnl. 4.4.0, from the optimising GCC compiler with command gcc -O3 -o crackme crackme.c, and using the debugger gdb.

>gdb crackme
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
[...]
Reading symbols from crackme...(no debugging symbols found)...done.
(gdb) b main
Breakpoint 1 at 0x5c0
(gdb) r
Starting program: /home/unazed/crackme

Breakpoint 1, 0x00000000080005c0 in main ()

After running disassemble we are given the following output:

   0x00000000080005c0 <+0>:     cmp    $0x2,%edi
   0x00000000080005c3 <+3>:     push   %rbx
   0x00000000080005c4 <+4>:     jne    0x8000614 <main+84>
   0x00000000080005c6 <+6>:     mov    0x8(%rsi),%rax
   0x00000000080005ca <+10>:    lea    0x22e(%rip),%rdi        # 0x80007ff
   0x00000000080005d1 <+17>:    mov    $0x9,%ecx
   0x00000000080005d6 <+22>:    mov    %rax,%rsi
   0x00000000080005d9 <+25>:    repz cmpsb %es:(%rdi),%ds:(%rsi)
   0x00000000080005db <+27>:    mov    %rax,%rsi
   0x00000000080005de <+30>:    seta   %bl
   0x00000000080005e1 <+33>:    setb   %dl
   0x00000000080005e4 <+36>:    sub    %edx,%ebx
   0x00000000080005e6 <+38>:    movsbl %bl,%ebx
   0x00000000080005e9 <+41>:    test   %ebx,%ebx
   0x00000000080005eb <+43>:    jne    0x80005ff <main+63>
   0x00000000080005ed <+45>:    lea    0x22d(%rip),%rdi        # 0x8000821
   0x00000000080005f4 <+52>:    xor    %eax,%eax
   0x00000000080005f6 <+54>:    callq  0x80005a0 <printf@plt>
   0x00000000080005fb <+59>:    mov    %ebx,%eax
   0x00000000080005fd <+61>:    pop    %rbx
   0x00000000080005fe <+62>:    retq
   0x00000000080005ff <+63>:    lea    0x203(%rip),%rdi        # 0x8000809
   0x0000000008000606 <+70>:    xor    %eax,%eax
   0x0000000008000608 <+72>:    mov    $0x1,%ebx
   0x000000000800060d <+77>:    callq  0x80005a0 <printf@plt>
   0x0000000008000612 <+82>:    jmp    0x80005fb <main+59>
   0x0000000008000614 <+84>:    lea    0x1c9(%rip),%rdi        # 0x80007e4
   0x000000000800061b <+91>:    or     $0xffffffff,%ebx
   0x000000000800061e <+94>:    callq  0x8000590 <puts@plt>
   0x0000000008000623 <+99>:    jmp    0x80005fb <main+59>

Note that, for this and any proceeding reversing posts I will (most likely) continue to use GAS/AT&T-flavoured assembly, it is not that I have any general preference over either Intel-flavoured nor AT&T-flavoured, it is simply at the ease of myself to use the latter. Another thing to keep in mind whilst reading this is that the disassemblies will follow the LP64-model System V ABI conventions prescribed here (under x86-64 psABI), that is, the first three integer parameters are passed in %rdi, %rsi and %rbx (and that is really all the preknowledge that is necessary for reversing this program).

Firstly, I’d like to begin with the analysis by substituting the right-hand hexadecimal addresses to their correspondent datum, this can be achieved by the GDB command x/s <addr> (to read strings at some <addr>), and so, allow me to provide these:

|    address  |            string              |
|-------------|--------------------------------|
| `0x80007ff` | `"password1"`                  |
| `0x8000821` | `"Yes, %s is correct!\n"`      |
| `0x8000809` | `"No, %s is not correct.\n"`   |
| `0x80007e4` | `"Need exactly one argument."` |
   0x00000000080005c0 <+0>:     cmp    $0x2,%edi
   0x00000000080005c3 <+3>:     push   %rbx
   0x00000000080005c4 <+4>:     jne    0x8000614 <main+84>

To dive in, +0 compares argc by 0x2 (recall %[re]di is int argc as defined by the CRT), hereafter preserves %rbx and given that the comparison was not truthy, jumps to +84, allow us to write parallel C code which resembles the control flow of the assembly that we observe:

int
main (int argc, char **argv)
{
  if (argc != 2)
  {
    /* <...> */
  }
  /* <...> */
}

Now from here I’ll lead to decompile everything at +84 and to wherever it leads so that we can complete the if-clause, as we can intuitively presume that it will return from the program abruptly, as it is reached if argc is not equal to 2, i.e. there are not two arguments passed by the caller.

   0x0000000008000614 <+84>:    lea    0x1c9(%rip),%rdi        # "Need exactly one argument."
   0x000000000800061b <+91>:    or     $0xffffffff,%ebx
   0x000000000800061e <+94>:    callq  0x8000590 <puts@plt>
   0x0000000008000623 <+99>:    jmp    0x80005fb <main+59>

We load the address of the right-hand string into the first-parameter slot of the proceeding puts call, hence performing puts ("Need exactly one argument."); and thereafter unconditionally jumping to +59. Note that +91 simply places 0xffffffff into %ebx using bitwise operations.

   0x00000000080005fb <+59>:    mov    %ebx,%eax
   0x00000000080005fd <+61>:    pop    %rbx
   0x00000000080005fe <+62>:    retq

Conclusively, we move the 0xffffffff into %eax and return to the caller, effectively returning -1 in two’s complement (as the signature of the main function is defined int main (int argc, char **argv, char **envp); and int is a signed type. In total these operations lead to the following C:

int
main (int argc, char **argv)
{
  if (argc != 2)
  {
    puts ("Need exactly one argument.");
    return -1;
  }
  /* <...> */
}

Reverting back to the control flow, and passing the conditional jump with a falsey value we reach this:

   0x00000000080005c6 <+6>:     mov    0x8(%rsi),%rax
   0x00000000080005ca <+10>:    lea    0x22e(%rip),%rdi        # "password1"
   0x00000000080005d1 <+17>:    mov    $0x9,%ecx
   0x00000000080005d6 <+22>:    mov    %rax,%rsi
   0x00000000080005d9 <+25>:    repz cmpsb %es:(%rdi),%ds:(%rsi)
   0x00000000080005db <+27>:    mov    %rax,%rsi
   0x00000000080005de <+30>:    seta   %bl
   0x00000000080005e1 <+33>:    setb   %dl
   0x00000000080005e4 <+36>:    sub    %edx,%ebx
   0x00000000080005e6 <+38>:    movsbl %bl,%ebx
   0x00000000080005e9 <+41>:    test   %ebx,%ebx
   0x00000000080005eb <+43>:    jne    0x80005ff <main+63>

+6 moves argv[1] into %rax as %rsi contains char **argv, %rdi is populated with the address to the string "password1", presumably we move 0x9 into %ecx as the length of the string "password1" is 9 characters, hereafter we move the argv[1] into %rsi for subsequent usage in the +25 instruction which compares each byte of the respective strings until %rcx reaches 0. Finally, removing argv[1] into %rsi as it had been clobbered, and then setting the %bl and %dl registers with respect to the ZF/CF flags, to clarify the result we subtract the two bytes and test whether they are zero (equal), if not, jumping to +63, which follows:

EDITORIAL NOTE: %bl and %dl are only set if the strings are not equal at some part, that would mean that either one of them will be 1, however not both, in the case they are equal both will be zero. The purpose of +36 is actually unknown to me, it seems like the three instructions can be simplified down to test %ebx,%edx, jnz 0x80005ff <main+63>, refer to the end for a test on this.

   0x00000000080005ff <+63>:    lea    0x203(%rip),%rdi        # "No, %s is not correct!\n"
   0x0000000008000606 <+70>:    xor    %eax,%eax
   0x0000000008000608 <+72>:    mov    $0x1,%ebx
   0x000000000800060d <+77>:    callq  0x80005a0 <printf@plt>
   0x0000000008000612 <+82>:    jmp    0x80005fb <main+59>

Where we simply printf ("No, %s is not correct!\n", argv[1]); after clearing %eax for the vector-count to printf, and thence +72 for the near return-procedure branch to +59. To sum, we may glimpse at the original C source code and deduce that this is the compiler optimization done for the strncmp call (inlining), which is quite effective, nonetheless it would be pointless to reimplement the methodology used and instead we shall use the strncmp function:

EDITORIAL NOTE: mov $0x1,%ebx implies the return 1; directly after +82

int
main (int argc, char **argv)
{
  if (argc != 2)
  {
    puts ("Need exactly one argument.");
    return -1;
  }
  if (strncmp(argv[1], "password1", 0x9))
  {
    printf ("No, %s is not correct!\n", argv[1]);
    return 1;
  }
}

And so, it would be sensical to assume that the else clause will simply be a printf of the other string, with a return 0; value, as unprogrammatic as it is to make assumptions like this, we may glimpse over the remaining code:

   0x00000000080005ed <+45>:    lea    0x22d(%rip),%rdi        # "Yes, %s is correct!\n"
   0x00000000080005f4 <+52>:    xor    %eax,%eax
   0x00000000080005f6 <+54>:    callq  0x80005a0 <printf@plt>
   0x00000000080005fb <+59>:    mov    %ebx,%eax
   0x00000000080005fd <+61>:    pop    %rbx
   0x00000000080005fe <+62>:    retq

+59 equates to mov $0x0,%eax due to the preceding clause of jne <...> which implies %ebx is 0, and hence we return 0; and our final C code becomes:

int
main (int argc, char **argv)
{
  if (argc != 2)
  {
    puts ("Need exactly one argument.");
    return -1;
  }
  if (strncmp(argv[1], "password1", 0x9))
  {
    printf ("No, %s is not correct!\n", argv[1]);
    return 1;
  } else {
    printf ("Yes, %s is correct!\n", argv[1]);
    return 0;
  }
}

Now to test the optimization I suggested of test %ebx,%edx I will compile the program with gcc -O3 -S crackme.c and alter the assembly code, and recompile with gcc -o crackme crackme.s. … yeah, it didn’t work, it returns any input as correct, I leave it up to the reader to find out why.

Thanks for reading my first post.