How to Build an Automated Recon Pipeline with Python and Luigi - Part III (Port Scanning II)

Jan 22, 2020 | 12 minutes read

Tags: how-to, bug bounty, hack the box, python, recon, luigi

Welcome back! If you found your way here without reading the prior posts in this series, you may want to start with some of the links to previous posts (below). This post is part three of a multi-part series demonstrating how to build an automated pipeline for target reconnaissance. The target in question could be the target of a pentest, bug bounty, or capture the flag challenge (shout out to my HTB peoples!). By the end of the series, we’ll have built a functional recon pipeline that can be tailored to fit your own needs.

Previous posts:

Part III will:

  • Add a threaded version of nmap to our pipeline
  • Add searchsploit vulnerability check to our pipeline

Part III’s git tags:

  • stage-3
  • stage-4

To get the repository to the point at which we’ll start, we can run one of the following commands. Which command to use depends on whether the repository is already present.

git clone --branch stage-2 https://github.com/epi052/recon-pipeline.git
git checkout tags/stage-2

Roadmap:

  • Target scope
  • Port scanning I
  • Port scanning II <-- this post
  • Subdomain enumeration
  • Web scanning
    • Screenshots
    • Subdomain takeover
    • CORS misconfiguration
    • Forced browsing
    • Tech stack identification
  • Data storage
  • Visualization / reporting
  • Slack integration

Stage 3 - Threaded nmap Scanning

If you would like to skip to this point in the code, run the following git command from within the cloned repository: git checkout tags/stage-2

In this post, we’ll add nmap to our pipeline. Our nmap scan will read target information from the pickled dictionary created by the ParseMasscanOutput Task. After reading in the target information, we’ll generate nmap commands that only scan the open ports that belong to each host. To facilitate scans across multiple hosts, we’ll be adding threading to this particular module. Let’s begin.

nmap.ThreadedNmap - Boilerplate (mostly)

We’ll start by adding a new file to our recon module named nmap.py with the following contents.

 1  import luigi
 2  from luigi.util import inherits
 3
 4  from recon.masscan import ParseMasscanOutput
 5
 6  @inherits(ParseMasscanOutput)
 7  class ThreadedNmap(luigi.Task):
 8      threads = luigi.Parameter(default=10)
 9
10      def requires(self):
11          args = {
12              "rate": self.rate,
13              "target_file": self.target_file,
14              "top_ports": self.top_ports,
15              "interface": self.interface,
16              "ports": self.ports,
17          }
18          return ParseMasscanOutput(**args)
19
20      def output(self):
21          return luigi.LocalTarget(f"{self.target_file}-nmap-results")

If you’ve read Part I and Part II already, nothing in the code above should be surprising. We’re creating a new Task that depends on the ParseMasscanOutput Task. We’re adding a new Parameter and naming it threads and giving it a default of 10. The threads variable will correspond to the number of threads used to run multiple nmaps in parallel. In the output method, we specify the folder on the filesystem that this Task produces. Other than that, this is pretty standard fare.

Threading Made Simple

Next up, we have the run method. The run method is where our heavy lifting is done.

The code below is a simple sanity check for the value passed to the threads Parameter.

23      def run(self):
24          """ Parses pickled target info dictionary and runs targeted nmap scans against only open ports. """
25          try:
26              self.threads = abs(int(self.threads))
27          except (TypeError, ValueError):  # int() raises ValueError for non-numeric strings
28              return logging.error("The value supplied to --threads must be a non-negative integer.")
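
As a quick aside, here is a minimal sketch of how int() reacts to the kinds of values that could reach a check like this one. The helper name parse_threads is made up for illustration and is not part of the pipeline. A non-numeric string raises ValueError while None raises TypeError, so the sketch catches both.

```python
def parse_threads(value):
    """Hypothetical helper: coerce a --threads value to a non-negative int, or return None."""
    try:
        return abs(int(value))
    except (TypeError, ValueError):
        # int("ten") raises ValueError; int(None) raises TypeError
        return None


print(parse_threads("10"))   # 10
print(parse_threads("-4"))   # 4
print(parse_threads("ten"))  # None
print(parse_threads(None))   # None
```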

Next, we load the pickled dictionary of target information from disk and deserialize it.

30          ip_dict = pickle.load(open(self.input().path, "rb"))
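
If the pickle round trip is unfamiliar, it can be sketched on its own with nothing but the standard library. The sample data and the file name sample.pickle are made up; the shape follows the {ip: {protocol: set(ports)}} structure this series uses. The sketch also uses with blocks so the file handles get closed, which is a stylistic choice rather than what the Task itself does. As always, only unpickle files you created or otherwise trust.

```python
import pickle
import tempfile
from pathlib import Path

# made-up sample data shaped like {ip: {protocol: set(ports)}}
sample = {"10.10.10.155": {"tcp": {"22", "80"}, "udp": {"161"}}}

with tempfile.TemporaryDirectory() as tmpdir:
    pickle_path = Path(tmpdir) / "sample.pickle"  # hypothetical file name

    with open(pickle_path, "wb") as f:
        pickle.dump(sample, f)

    with open(pickle_path, "rb") as f:  # closed automatically when the block exits
        ip_dict = pickle.load(f)

print(ip_dict == sample)  # True
```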

After that, we build out a template for our nmap commands. The list below defines the structure we’ll use for each execution of nmap. There are two placeholders. The first placeholder is where we’ll specify the protocol, either -sT for tcp or -sU for udp. The second placeholder is where we’ll specify the ports to scan.

32          nmap_command = [
33              "nmap",
34              "--open",
35              "PLACEHOLDER-IDX-2",
36              "-sC",
37              "-T",
38              "4",
39              "-sV",
40              "-Pn",
41              "-p",
42              "PLACEHOLDER-IDX-9",
43              "-oA",
44          ]

As seen above, our nmap commands will resemble what’s below, after replacing the placeholders with meaningful data.

nmap --open -sT -sC -T 4 -sV -Pn -p 43,25,21,53,22 -oA 

Obviously what’s above is not a complete nmap command. There are a few more pieces we need to add to the list before we’re done. The last two pieces of the command are seen below on lines 55 and 57. On line 55, we add the argument to the -oA option, specifying the name of our three output files and the directory in which they’ll live. On line 57, we add the target as the last item in the list, ultimately making it the last part of the command itself. On line 61, we have a python equivalent of mkdir -p. This line won’t error out if the folder already exists and creates parent folders if they’re needed. The folder that’s created is the one we specified in the output method above.

46          commands = list()
47
48          for target, protocol_dict in ip_dict.items():
49              for protocol, ports in protocol_dict.items():
50                  tmp_cmd = nmap_command[:]
51                  tmp_cmd[2] = "-sT" if protocol == "tcp" else "-sU"
52
53                  tmp_cmd[9] = ",".join(ports)  # ports is a set; -p wants a comma-separated string
54                  # arg to -oA, will drop into subdir off curdir
55                  tmp_cmd.append(f"{self.output().path}/nmap.{target}-{protocol}")
56
57                  tmp_cmd.append(target)  # target as final arg to nmap
58
59                  commands.append(tmp_cmd)
60
61          Path(self.output().path).mkdir(parents=True, exist_ok=True)
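
To see the command-building loop in isolation, here is a small standalone sketch. It makes a few assumptions for the sake of illustration: the sample ip_dict is made up, the template is shortened, and the placeholders are located with list.index() rather than hard-coded positions, which is a swapped-in alternative and not what the Task does.

```python
# made-up sample data shaped like the ip_dict described in this post
ip_dict = {
    "10.10.10.155": {"tcp": {"22", "80"}, "udp": {"161"}},
}

# shortened template; SCAN-TYPE and PORTS are hypothetical placeholder names
template = ["nmap", "--open", "SCAN-TYPE", "-sV", "-Pn", "-p", "PORTS", "-oA"]
output_dir = "htb-targets-nmap-results"

commands = []
for target, protocol_dict in ip_dict.items():
    for protocol, ports in protocol_dict.items():
        cmd = template[:]  # shallow copy so the template itself is never modified
        cmd[template.index("SCAN-TYPE")] = "-sT" if protocol == "tcp" else "-sU"
        # sort numerically so the port list is stable between runs
        cmd[template.index("PORTS")] = ",".join(sorted(ports, key=int))
        cmd.append(f"{output_dir}/nmap.{target}-{protocol}")  # arg to -oA
        cmd.append(target)  # target goes last
        commands.append(cmd)

for cmd in commands:
    print(" ".join(cmd))
```

Looking placeholders up by name costs a little more typing, but the indices can’t drift if someone adds or removes an option from the template later.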

The astute reader may be wondering why we’re storing each command list in another list. The answer is that we need all the commands in an iterable in order to add threading to our Task. Python can make simple threading tasks like this incredibly easy (not every threaded program is this simple…). Check out the code below for the implementation.

63          with concurrent.futures.ThreadPoolExecutor(max_workers=self.threads) as executor:
64              executor.map(subprocess.run, commands)

In the code above, we use a ThreadPoolExecutor in order to execute parallel nmaps. The class creates a pool of threads for use. Imagine we have 17 nmap commands to execute. Assuming we passed 10 to the max_workers keyword argument, the first ten will spawn immediately, one per thread. The first nmap command to finish has its thread returned to the pool. Once returned to the pool, the thread will begin the eleventh task. This process repeats until all of the commands are executed.

The call to map on line 64 calls the subprocess.run function once for each command in commands, with at most max_workers of those calls running at the same time, one per worker (thread). Pretty neat right? Writing a threaded application doesn’t get much easier than that.
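
The pool behavior described above can be observed with a toy example. This is only a sketch: fake_scan is a made-up stand-in for an nmap run, and the small sleep simulates work. It tracks how many jobs are running at once and confirms that the number never exceeds max_workers, even though all 17 jobs complete.

```python
import concurrent.futures
import threading
import time

max_workers = 3
jobs = list(range(17))  # stand-ins for 17 nmap command lists

lock = threading.Lock()
state = {"running": 0, "peak": 0}


def fake_scan(job):
    """Pretend to run a scan while tracking how many run at the same time."""
    with lock:
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
    time.sleep(0.01)  # simulated work
    with lock:
        state["running"] -= 1
    return job * 2


with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
    # map() schedules one call per job; iterating over the results keeps input
    # order and re-raises any exception that was raised inside a worker
    results = list(executor.map(fake_scan, jobs))

print(len(results), state["peak"] <= max_workers)  # 17 True
```

One thing worth knowing: exceptions raised inside a worker only surface when the results of map() are iterated. If the results are never consumed, a failing call can go unnoticed.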

Finalized Code

Here we have the finalized code with comments.

import pickle
import logging
import subprocess
import concurrent.futures
from pathlib import Path

import luigi
from luigi.util import inherits

from recon.masscan import ParseMasscanOutput


@inherits(ParseMasscanOutput)
class ThreadedNmap(luigi.Task):
    """ Run nmap against specific targets and ports gained from the ParseMasscanOutput Task.

    nmap commands are structured like the example below.

    nmap --open -sT -sC -T 4 -sV -Pn -p 43,25,21,53,22 -oA htb-targets-nmap-results/nmap.10.10.10.155-tcp 10.10.10.155

    The corresponding luigi command is shown below.

    PYTHONPATH=$(pwd) luigi --local-scheduler --module recon.nmap ThreadedNmap --target-file htb-targets --top-ports 5000

    Args:
        threads: number of threads for parallel nmap command execution
        rate: desired rate for transmitting packets (packets per second) *--* Required by upstream Task
        interface: use the named raw network interface, such as "eth0" *--* Required by upstream Task
        top_ports: Scan top N most popular ports *--* Required by upstream Task
        ports: specifies the port(s) to be scanned *--* Required by upstream Task
        target_file: specifies the file on disk containing a list of ips or domains *--* Required by upstream Task
    """

    threads = luigi.Parameter(default=10)

    def requires(self):
        """ ThreadedNmap depends on ParseMasscanOutput to run.

        TargetList expects target_file as a parameter.
        Masscan expects rate, target_file, interface, and either ports or top_ports as parameters.

        Returns:
            luigi.Task - ParseMasscanOutput
        """
        args = {
            "rate": self.rate,
            "target_file": self.target_file,
            "top_ports": self.top_ports,
            "interface": self.interface,
            "ports": self.ports,
        }
        return ParseMasscanOutput(**args)

    def output(self):
        """ Returns the target output for this task.

        Naming convention for the output folder is TARGET_FILE-nmap-results.

        The output folder will be populated with all of the output files generated by
        any nmap commands run.  Because the nmap command uses -oA, there will be three
        files per target scanned: .xml, .nmap, .gnmap.

        Returns:
            luigi.local_target.LocalTarget
        """
        return luigi.LocalTarget(f"{self.target_file}-nmap-results")

    def run(self):
        """ Parses pickled target info dictionary and runs targeted nmap scans against only open ports. """
        try:
            self.threads = abs(int(self.threads))
        except (TypeError, ValueError):  # int() raises ValueError for non-numeric strings
            return logging.error("The value supplied to --threads must be a non-negative integer.")

        ip_dict = pickle.load(open(self.input().path, "rb"))

        nmap_command = [  # placeholders will be overwritten with appropriate info in loop below
            "nmap",
            "--open",
            "PLACEHOLDER-IDX-2",
            "-sC",
            "-T",
            "4",
            "-sV",
            "-Pn",
            "-p",
            "PLACEHOLDER-IDX-9",
            "-oA",
        ]

        commands = list()

        """
        ip_dict structure
        {
            "IP_ADDRESS":
                {'udp': {"161", "5000", ... },
                ...
                i.e. {protocol: set(ports) }
        }
        """
        for target, protocol_dict in ip_dict.items():
            for protocol, ports in protocol_dict.items():
                tmp_cmd = nmap_command[:]
                tmp_cmd[2] = "-sT" if protocol == "tcp" else "-sU"

                tmp_cmd[9] = ",".join(ports)  # ports is a set; -p wants a comma-separated string
                # arg to -oA, will drop into subdir off curdir
                tmp_cmd.append(f"{self.output().path}/nmap.{target}-{protocol}")

                tmp_cmd.append(target)  # target as final arg to nmap

                commands.append(tmp_cmd)

        # basically mkdir -p, won't error out if already there
        Path(self.output().path).mkdir(parents=True, exist_ok=True)

        with concurrent.futures.ThreadPoolExecutor(max_workers=self.threads) as executor:
            executor.map(subprocess.run, commands)

Stage 4 - Running Searchsploit

Now that we have our nmap scans complete, the first thing we’ll do with them is run searchsploit against the results to check for any low hanging fruit. Let’s go!

nmap.Searchsploit

As we saw above, there is a definite pattern to writing these Tasks now. Let’s start by getting the boilerplate out of the way. We’re still working in the nmap.py file for this Task.

67  @inherits(ThreadedNmap)
68  class Searchsploit(luigi.Task):
69      def requires(self):
70          args = {
71              "rate": self.rate,
72              "ports": self.ports,
73              "threads": self.threads,
74              "top_ports": self.top_ports,
75              "interface": self.interface,
76              "target_file": self.target_file,
77          }
78          return ThreadedNmap(**args)
79
80      def output(self):
81          return luigi.LocalTarget(f"{self.target_file}-searchsploit-results")

Just like earlier, this code is nothing new. We’re creating a new Task that depends on the ThreadedNmap Task that we just created. Similar to the ThreadedNmap Task, this Task will create a folder of results in which to store each run of searchsploit.

Low Hanging Fruit

Next, we’ll look at the run method, where we execute searchsploit and save the results. If you’re not aware, searchsploit’s --nmap option accepts nmap’s xml results as input. Therefore, we’re going to grab each xml file that we created in the ThreadedNmap task above and pass each one of them to searchsploit for processing.

83      def run(self):
84          for entry in Path(self.input().path).glob("nmap*.xml"):
85              proc = subprocess.run(["searchsploit", "--nmap", str(entry)], stderr=subprocess.PIPE)
86              if proc.stderr:
87                  Path(self.output().path).mkdir(parents=True, exist_ok=True)
88
89                  # grab the target specifier out of TGT-nmap-results/nmap.10.10.10.157-tcp -> i.e. 10.10.10.157
90                  target = entry.stem.replace("nmap.", "").replace("-tcp", "").replace("-udp", "")
91
92                  Path(
93                      f"{self.output().path}/searchsploit.{target}-{entry.stem[-3:]}.txt"
94                  ).write_bytes(proc.stderr)

There’s a lot going on with line 84, so let’s break it down. Based on the results of output from ThreadedNmap (which we access here as this Task’s input()), we look in that directory for any files that start with nmap and end with .xml. Once we have all of the xml files, we iterate over each one.
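
The globbing step can be tried out on its own. The sketch below creates a throwaway directory with a few made-up file names and shows which of them the nmap*.xml pattern picks up. Path.glob() makes no promise about ordering, so the results are sorted before printing.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    results_dir = Path(tmpdir)

    # made-up file names; only two of them match nmap*.xml
    for name in (
        "nmap.10.10.10.157-tcp.xml",
        "nmap.10.10.10.157-tcp.gnmap",
        "nmap.10.10.10.157-udp.xml",
        "notes.xml",
    ):
        (results_dir / name).touch()

    matches = sorted(entry.name for entry in results_dir.glob("nmap*.xml"))

print(matches)  # ['nmap.10.10.10.157-tcp.xml', 'nmap.10.10.10.157-udp.xml']
```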

Line 85 is where we execute searchsploit. The command structure is simple; an example is shown below.

searchsploit --nmap wall-nmap-results/nmap.10.10.10.157-tcp.xml

Line 86 simply checks whether or not our command generated output on STDERR. searchsploit prints to STDERR, so that’s the stream we need to capture to get any results.
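
Capturing STDERR with subprocess.run can be sketched without searchsploit installed. As an assumption made purely so the example runs anywhere, the child process here is the Python interpreter itself, writing one line to each stream. Captured output comes back as bytes, which is why the Task can hand it straight to write_bytes.

```python
import subprocess
import sys

# stand-in child process that writes one line to each stream
child = [
    sys.executable,
    "-c",
    "import sys; print('to stdout'); print('to stderr', file=sys.stderr)",
]

proc = subprocess.run(child, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

print(proc.stdout)  # bytes the child wrote to STDOUT
print(proc.stderr)  # bytes the child wrote to STDERR

if proc.stderr:  # empty bytes are falsy, so this only fires when something was written
    print("child wrote", len(proc.stderr), "bytes to STDERR")
```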

On line 87, we’re creating the directory where our searchsploit results will live. The path we’re using is specified above in the output method. Again, this is python’s equivalent to mkdir -p, so we don’t need to worry about exceptions here.
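
For anyone who hasn’t used this pathlib idiom before, here is a small sketch of the mkdir -p behavior using a temporary directory and made-up folder names. It also shows what happens without exist_ok=True, for contrast.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    nested = Path(tmpdir) / "parent" / "child" / "results"  # made-up folder names

    nested.mkdir(parents=True, exist_ok=True)  # creates parent/ and child/ as needed
    nested.mkdir(parents=True, exist_ok=True)  # second call is a no-op, no exception

    created = nested.is_dir()

    try:
        nested.mkdir()  # without exist_ok=True, an existing folder raises
        raised = False
    except FileExistsError:
        raised = True

print(created, raised)  # True True
```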

Next, on line 90, we’re doing some string formatting. The xml files in the directory containing nmap results all conform to the same pattern: nmap.TGT-PROTOCOL.xml. We want to grab the TGT portion of the filename to reuse it for naming our files for this Task. First, we grab the filename (entry.stem) and simply do a .replace() on all of the parts of the string we no longer want. Whether .replace() finds something to replace or not, it returns a string (either altered or unaltered, as appropriate). Knowing that, we can chain .replace() calls together without worrying about if any of the substrings exist or not.

Finally, on line 92, we create the output file where we’ll store our results and write what we saw on STDERR to that file. Hopefully the only potentially confusing part of this line is entry.stem[-3:]. entry.stem has already dropped the .xml extension, so slicing off its last three characters leaves us with the protocol, either tcp or udp. Our final naming convention resembles some of the earlier Tasks we created and can be seen below.

htb-targets-searchsploit-results/searchsploit.10.10.10.154-tcp.txt
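
The filename handling is easy to poke at in isolation. The sketch below walks a sample path through the same steps; the path is only an example and doesn’t need to exist on disk.

```python
from pathlib import Path

entry = Path("htb-targets-nmap-results/nmap.10.10.10.154-tcp.xml")  # example path

stem = entry.stem     # 'nmap.10.10.10.154-tcp' (the .xml suffix is already gone)
protocol = stem[-3:]  # 'tcp'

# each .replace() returns a string whether or not it found a match, so chaining is safe
target = stem.replace("nmap.", "").replace("-tcp", "").replace("-udp", "")

output_name = f"searchsploit.{target}-{protocol}.txt"
print(output_name)  # searchsploit.10.10.10.154-tcp.txt
```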

Finalized Code

Our finalized code can be seen below in all its commented and docstrung glory!

@inherits(ThreadedNmap)
class Searchsploit(luigi.Task):
    """ Run searchsploit against each nmap*.xml file in the TARGET-nmap-results directory and write results to disk.

    searchsploit commands are structured like the example below.

    searchsploit --nmap htb-targets-nmap-results/nmap.10.10.10.155-tcp.xml

    The corresponding luigi command is shown below.

    PYTHONPATH=$(pwd) luigi --local-scheduler --module recon.nmap Searchsploit --target-file htb-targets --top-ports 5000

    Args:
        threads: number of threads for parallel nmap command execution *--* Required by upstream Task
        rate: desired rate for transmitting packets (packets per second) *--* Required by upstream Task
        interface: use the named raw network interface, such as "eth0" *--* Required by upstream Task
        top_ports: Scan top N most popular ports *--* Required by upstream Task
        ports: specifies the port(s) to be scanned *--* Required by upstream Task
        target_file: specifies the file on disk containing a list of ips or domains *--* Required by upstream Task
    """

    def requires(self):
        """ Searchsploit depends on ThreadedNmap to run.

        TargetList expects target_file as a parameter.
        Masscan expects rate, target_file, interface, and either ports or top_ports as parameters.
        ThreadedNmap expects threads

        Returns:
            luigi.Task - ThreadedNmap
        """
        args = {
            "rate": self.rate,
            "ports": self.ports,
            "threads": self.threads,
            "top_ports": self.top_ports,
            "interface": self.interface,
            "target_file": self.target_file,
        }
        return ThreadedNmap(**args)

    def output(self):
        """ Returns the target output for this task.

        Naming convention for the output folder is TARGET_FILE-searchsploit-results.

        The output folder will be populated with all of the output files generated by
        any searchsploit commands run.

        Returns:
            luigi.local_target.LocalTarget
        """
        return luigi.LocalTarget(f"{self.target_file}-searchsploit-results")

    def run(self):
        """ Grabs the xml files created by ThreadedNmap and runs searchsploit --nmap on each one, saving the output. """
        for entry in Path(self.input().path).glob("nmap*.xml"):
            proc = subprocess.run(["searchsploit", "--nmap", str(entry)], stderr=subprocess.PIPE)
            if proc.stderr:
                Path(self.output().path).mkdir(parents=True, exist_ok=True)

                # grab the target specifier out of TGT-nmap-results/nmap.10.10.10.157-tcp -> i.e. 10.10.10.157
                target = entry.stem.replace("nmap.", "").replace("-tcp", "").replace("-udp", "")

                Path(
                    f"{self.output().path}/searchsploit.{target}-{entry.stem[-3:]}.txt"
                ).write_bytes(proc.stderr)

Before we finish up, we can test out our new addition to the pipeline with the following command.

PYTHONPATH=$(pwd) luigi --local-scheduler --module recon.nmap Searchsploit --target-file htb-targets --top-ports 1000

That wraps things up for this post. In the next installment, we’ll incorporate subdomain enumeration into our pipeline!

Additional Resources

  1. ThreadPoolExecutor
  2. SearchSploit – The Manual
