How to Build an Automated Recon Pipeline with Python and Luigi - Part VI (Wrapping Up)

Jan 22, 2020 | 15 minutes read

Tags: how-to, bug bounty, hack the box, python, recon, luigi

Welcome back! If you found your way here without reading the prior posts in this series, you may want to start with some of the links to previous posts (below). This post is part six of a multi-part series demonstrating how to build an automated pipeline for target reconnaissance. The target in question could be the target of a pentest, bug bounty, or capture the flag challenge (shout out to my HTB peoples!). By the end of the series, we’ll have built a functional recon pipeline that can be tailored to fit your own needs.

Previous posts:

Part VI will:

  • Add an interactive shell, improving user experience

Part VI’s git tags:

  • stage-13

To get the repository to the point at which we'll start, we can run one of the following commands. Which one to use depends on whether the repository is already present.

git clone --branch stage-12
git checkout tags/stage-12


  • Target scope
  • Port scanning I
  • Port scanning II
  • Subdomain enumeration
  • Web scanning
    • Screenshots
    • Subdomain takeover
    • CORS misconfiguration
    • Forced browsing
    • Tech stack identification
  • Uncharted Territory <– this post
  • Data storage
  • Visualization / reporting
  • Slack integration


Alright, if you've made it this far, you're pretty awesome (even if you came here without reading prior posts, you're still ok in my book!). Because you're so awesome, I'm going to level with you. Writing these blog posts in addition to the code has really slowed down my progress on this project. Writing a post after finishing the code, comments, etc. became a chore; such a chore, in fact, that my pace on the code slowed down along with the posts, which is truly bizarre for me. I genuinely enjoy sharing knowledge and content, but for some reason this series was draining.

For better or worse, this will be the last blog post about this tool. I plan to keep working on the tool and finish the roadmap, but I want to remove the mental barrier of having to write an accompanying post alongside each addition to the codebase. For the past week I've been strictly writing code without worrying about the post I'm writing now, and it was refreshing. We'll still step through some of the more interesting pieces of this week's work, but I don't plan on continuing the series after this one. Thanks for listening and understanding! Without further ado…

Stage 13 - Interactive Shell

I always knew that wrapping the unwieldy luigi commands and making them more friendly/manageable would need to be tackled at some point. I landed on what I think is a really nice solution that not only gets those commands under control but also drastically improves usability and quality of life: cmd2, which I luckily stumbled upon while working on a separate project. Let's take a look at what it brings to the table.


cmd2 is a Python package for building CLI programs. It extends Python's cmd package, which is included in the standard library. I've written more than a couple Python tools that use cmd and was blown away at just how much better cmd2 is than the original. cmd2 has a laundry list of features, but we're primarily concerned with the following:

  • Searchable command history (history command and Ctrl+r) - optionally persistent
  • Parsing commands with arguments using argparse, including support for subcommands
  • Good tab-completion of commands, subcommands, file system paths, and shell commands
  • Automatic tab-completion of argparse flags when using one of the cmd2 argparse decorators
  • Trivial to provide built-in help for all commands
  • Alerts that seamlessly print while user enters text at prompt
  • Colored and stylized output using ansi.style()

If you’re interested in what else cmd2 can do, check out the documentation.

Some of the things above we get for free; others take a little work up front. Just as with Python's cmd package, we begin by defining our own class that inherits from the package's Cmd class.

import cmd2


class ReconShell(cmd2.Cmd):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)  # let cmd2.Cmd initialize itself before we customize it
        self.prompt = "recon-pipeline> "

Note: we're going to skip over a decent amount of the code that makes up the final product in favor of covering what cmd2 does to really make the tool shine.

The install Command

After defining our class, any method whose name begins with do_ becomes one of our shell's commands. We'll start by creating an install command, which handles installation of the myriad tools that our pipeline executes under the hood.

Any command that our program knows about can be tab-completed while in the shell. For instance, hitting tab twice in our shell will show the available commands. Typing i and then hitting tab would complete out to install.
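cmd2 keeps the do_ convention from the standard library's cmd module, so the behavior described above can be tried without installing anything. The sketch below is a hypothetical MiniShell built on plain cmd (not cmd2, and not the pipeline's actual code): do_install becomes the install command, completenames() is the hook cmd uses to tab-complete command names, and the docstring doubles as the help text.

```python
import cmd
import io


class MiniShell(cmd.Cmd):
    """Hypothetical stand-in for ReconShell, built on the standard library's cmd module."""

    prompt = "recon-pipeline> "

    def do_install(self, arg):
        """Install any/all of the libraries/tools necessary to make the recon-pipeline function."""
        self.stdout.write(f"would install: {arg}\n")

    def do_exit(self, arg):
        """Exit the shell."""
        return True  # a truthy return value ends cmdloop()


if __name__ == "__main__":
    out = io.StringIO()
    shell = MiniShell(stdout=out)
    print(shell.completenames("i"))  # ['install']
    shell.onecmd("help install")  # writes the do_install docstring
    shell.onecmd("install masscan")  # writes "would install: masscan"
    print(out.getvalue(), end="")
    # shell.cmdloop() would start the interactive prompt instead
```

cmd2 layers history, argparse integration, and richer completion on top of this same structure.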

    def do_install(self, args):
        """ Install any/all of the libraries/tools necessary to make the recon-pipeline function. """

By default, a command's docstring becomes its help text. Ours will look slightly different, though, because we'll be using cmd2's extension of ArgumentParser named Cmd2ArgumentParser; help for install is then generated from the parser, which lists the command's usage and arguments.

install Options

Speaking of Cmd2ArgumentParser, here’s ours for install.

# options for ReconShell's 'install' command
install_parser = cmd2.Cmd2ArgumentParser()
install_parser.add_argument(
    "tool", help="which tool to install", choices=list(tools.keys()) + ["all"]
)

We're defining a parser that accepts a single positional argument whose allowed values are the keys of our tools dictionary plus "all" (passed in as the argument's list of choices). A few example entries from tools are shown below.

from pathlib import Path

tools = {
    "luigi-service": {
        "installed": False,
        "dependencies": ["luigi"],
        "commands": [
            f"cp {str(Path(__file__).parent.parent / 'luigid.service')} /lib/systemd/system/luigid.service",
            "cp $(which luigid) /usr/local/bin",
            "systemctl daemon-reload",
            "systemctl start luigid.service",
            "systemctl enable luigid.service",
        ],
        "shell": True,
    },
    "luigi": {"installed": False, "dependencies": ["pipenv"], "commands": ["pipenv install luigi"]},
    "pipenv": {
        "installed": False,
        "dependencies": None,
        "commands": ["apt-get install -y -q pipenv"],
    },
    # ... more tools ...
}

From the snippet above, we know we’ll have at least luigi-service, luigi, and pipenv as possibilities for install’s sole positional argument.
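Because Cmd2ArgumentParser builds on argparse's ArgumentParser, the choices list also drives validation, not just completion. A standard-library approximation (plain argparse and a trimmed copy of tools, both assumptions of this sketch) shows that anything outside the tools keys plus "all" is rejected before the command body ever runs; plain argparse reports the error and raises SystemExit.

```python
import argparse

# trimmed copy of the post's tools dictionary; only the keys matter here
tools = {
    "luigi-service": {"installed": False, "dependencies": ["luigi"]},
    "luigi": {"installed": False, "dependencies": ["pipenv"]},
    "pipenv": {"installed": False, "dependencies": None},
}

install_parser = argparse.ArgumentParser(prog="install")
install_parser.add_argument(
    "tool", help="which tool to install", choices=list(tools.keys()) + ["all"]
)

if __name__ == "__main__":
    print(install_parser.parse_args(["luigi"]).tool)  # luigi
    try:
        install_parser.parse_args(["nmapp"])  # typo: not a known tool
    except SystemExit:
        print("rejected: nmapp")  # argparse printed usage + error to stderr first
```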

With nothing more than the command's positional argument and the list of choices it accepts, install now tab-completes every tool it knows how to install. Pretty baller, no?


With that brief bit of background, we’re ready to see what makes install work.

    @cmd2.with_argparser(install_parser)
    def do_install(self, args):
        """ Install any/all of the libraries/tools necessary to make the recon-pipeline function. """

        # imported tools variable is in global scope, and we reassign over it later
        global tools

        # create .cache dir in the home directory, on the off chance it doesn't exist
        cachedir = Path.home() / ".cache/"
        cachedir.mkdir(parents=True, exist_ok=True)

        persistent_tool_dict = cachedir / ".tool-dict.pkl"

Above, we see a decorator that associates the parser we wrote earlier with this command. We also define the path on disk where we'll store the tools dictionary. Whenever a tool is installed, we'll update the dictionary and save the changes to disk as a pickled object.

Next, we’ll handle the case when a user runs install all.

        if args.tool == "all":
            # show all tools have been queued for installation
            for x in tools.keys():
                if not tools.get(x).get("installed"):
                    self.async_alert(style(f"[-] {x} queued", fg="bright_white"))

            for tool in tools.keys():
                self.do_install(tool)

            return

Note: self.async_alert displays an important message to the user while they're sitting at a command-line prompt. To the user, the alert appears to be printed above the prompt, while their current input text and cursor position are left alone.

The args variable is where Cmd2ArgumentParser stores the options/arguments passed in during command execution. First, we’ll print all of the tools we plan to install. Next, we simply call the same function for each of the tools in the tools dictionary. Finally, we return, because we don’t want the function to proceed beyond this point (everything is done installing).

Next up, if we’ve made it this far, we can attempt to load the pickled tools dictionary from disk.

        if persistent_tool_dict.exists():
            tools = pickle.loads(persistent_tool_dict.read_bytes())

After loading the tools dictionary, we’ll handle any dependencies defined for the tool we’re attempting to install (the example entries above show a nested dependency luigi-service -> luigi -> pipenv).

        if tools.get(args.tool).get("dependencies"):
            # get all of the requested tools dependencies

            for dependency in tools.get(args.tool).get("dependencies"):
                if tools.get(dependency).get("installed"):
                    # already installed, skip it
                    continue

                self.async_alert(
                    style(
                        f"[!] {args.tool} has an unmet dependency; installing {dependency}",
                        fg="yellow",
                        bold=True,
                    )
                )

                # install the dependency before continuing with installation
                self.do_install(dependency)

Similar to how we handled the all argument, we loop over each dependency, check whether it's already installed, and, if it isn't, call do_install recursively with the dependency.
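The recursive do_install call amounts to a depth-first walk that installs every dependency before the tool that needs it. Stripped of the subprocess and alert plumbing, the ordering can be sketched as a hypothetical install_order function over the same dictionary shape. The seen set is an addition of this sketch: it keeps a tool shared by several parents from being queued twice and stops a dependency cycle from recursing forever.

```python
def install_order(tools, name, seen=None):
    """Return tool names in the order they would be installed: dependencies first."""
    if seen is None:
        seen = set()
    order = []
    if name in seen or tools[name].get("installed"):
        return order  # already queued or already installed: nothing to do
    seen.add(name)
    for dependency in tools[name].get("dependencies") or []:
        order.extend(install_order(tools, dependency, seen))
    order.append(name)
    return order


tools = {
    "luigi-service": {"installed": False, "dependencies": ["luigi"]},
    "luigi": {"installed": False, "dependencies": ["pipenv"]},
    "pipenv": {"installed": False, "dependencies": None},
}

if __name__ == "__main__":
    print(install_order(tools, "luigi-service"))  # ['pipenv', 'luigi', 'luigi-service']
```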

With that out of the way, we can get to the logic that handles command execution!

222        if tools.get(args.tool).get("installed"):
223            return self.async_alert(style(f"[!] {args.tool} is already installed.", fg="yellow"))
224        else:
226            # list of return values from commands run during each tool installation
227            # used to determine whether the tool installed correctly or not
228            retvals = list()
230            self.async_alert(style(f"[*] Installing {args.tool}...", fg="bright_yellow"))
232            for command in tools.get(args.tool).get("commands"):
233                # run all commands required to install the tool
235                # print each command being run
236                self.async_alert(style(f"[=] {command}", fg="cyan"))
238                if tools.get(args.tool).get("shell"):
240                    # go tools use subshells (cmd1 && cmd2 && cmd3 ...) during install, so need shell=True
241                    proc = subprocess.Popen(
242                        command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE
243                    )
244                else:
246                    # "normal" command, split up the string as usual and run it
247                    proc = subprocess.Popen(
248                        shlex.split(command), stdout=subprocess.PIPE, stderr=subprocess.PIPE
249                    )
251                out, err = proc.communicate()
253                if err:
254                    self.async_alert(style(f"[!] {err.decode().strip()}", fg="bright_red"))
256                retvals.append(proc.returncode)

There's a lot of code above, but it's not too difficult to decipher. Once we determine that the tool isn't already installed, we drop into the else block (line 224). At this point we define the retvals list, which accumulates the return codes from each command run during the tool's installation. Consider masscan's list of installation commands:

tools = {
    # ...
    "masscan": {
        "installed": False,
        "dependencies": None,
        "commands": [
            "git clone /tmp/masscan",
            "make -s -j -C /tmp/masscan",
            f"mv /tmp/masscan/bin/masscan {tool_paths.get('masscan')}",
            "rm -rf /tmp/masscan",
        ],
    },
    # ...
}

The loop that begins on line 232 iterates over the given tool's list of commands. Some commands can be run via subprocess.Popen normally, while others require us to pass shell=True to the Popen constructor; that branch spans lines 238 to 249.
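The reason for the two branches is that shlex.split only tokenizes a string the way a shell would; it doesn't interpret operators. Without shell=True, && is just another argument handed to the first program. The command string below is illustrative (not an entry from the real tools dictionary), and the sketch assumes a POSIX system with echo and /bin/sh.

```python
import shlex
import subprocess

chained = "echo first && echo second"

# shlex.split tokenizes like a shell but does not interpret operators
print(shlex.split(chained))  # ['echo', 'first', '&&', 'echo', 'second']

# without a shell, '&&' is passed to echo as a plain argument...
no_shell = subprocess.run(shlex.split(chained), capture_output=True, text=True)
print(no_shell.stdout.strip())  # first && echo second

# ...while shell=True hands the whole string to /bin/sh, which runs both commands
with_shell = subprocess.run(chained, shell=True, capture_output=True, text=True)
print(with_shell.stdout.split())  # ['first', 'second']
```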

Finally, we collect the current command's output, report anything it wrote to STDERR, and append its return code to retvals.

In our example masscan entry above, any one of the four commands to be run could conceivably fail. Knowing that, we capture the return value for each one and ensure that all of them are zero. If they’re all zero, all steps of tool installation completed successfully.
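The all-or-nothing check is easy to exercise on its own. A minimal sketch, using a hypothetical run_all helper and throwaway commands that invoke the current Python interpreter in place of real install steps: each command's return code is collected, and the tool only counts as installed when every code is zero.

```python
import subprocess
import sys


def run_all(commands):
    """Run each argv list in order; return (installed?, list of return codes)."""
    retvals = []
    for argv in commands:
        proc = subprocess.run(argv, capture_output=True)
        retvals.append(proc.returncode)
    return all(code == 0 for code in retvals), retvals


ok_steps = [[sys.executable, "-c", "pass"], [sys.executable, "-c", "print('built')"]]
bad_steps = ok_steps + [[sys.executable, "-c", "raise SystemExit(3)"]]

if __name__ == "__main__":
    print(run_all(ok_steps))  # (True, [0, 0])
    print(run_all(bad_steps))  # (False, [0, 0, 3])
```

Like the loop above, this keeps running after a failure. Also note that all() of an empty list is True, so a tool with no commands would count as installed.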

        if all(x == 0 for x in retvals):
            # all return values in retvals are 0, i.e. all exec'd successfully; tool has been installed

            self.async_alert(style(f"[+] {args.tool} installed!", fg="bright_green"))

            tools[args.tool]["installed"] = True
        else:
            # unsuccessful tool install

            tools[args.tool]["installed"] = False

            self.async_alert(
                style(
                    f"[!!] one (or more) of {args.tool}'s commands failed and may have not installed properly; check output from the offending command above...",
                    fg="bright_red",
                    bold=True,
                )
            )

The last step is to store the tools dictionary back to disk.

        # store any tool installs/failures (back) to disk
        with persistent_tool_dict.open("wb") as f:
            pickle.dump(tools, f)
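Loading and saving the dictionary are mirror images of each other. The sketch below pulls them into two hypothetical helpers, load_tools and save_tools, and round-trips a trimmed dictionary through a temporary directory standing in for ~/.cache/.tool-dict.pkl. One caveat worth knowing: unpickling can execute arbitrary code, so the file should only ever be one your own shell wrote.

```python
import pickle
import tempfile
from pathlib import Path


def load_tools(path, defaults):
    """Return the pickled tools dict if it exists, otherwise the defaults."""
    if path.exists():
        return pickle.loads(path.read_bytes())
    return defaults


def save_tools(path, tools):
    """Persist the tools dict; write_bytes opens, writes, and closes the file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(pickle.dumps(tools))


if __name__ == "__main__":
    defaults = {"pipenv": {"installed": False, "dependencies": None}}
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / ".cache" / ".tool-dict.pkl"
        tools = load_tools(cache, defaults)  # no file yet, so the defaults come back
        tools["pipenv"]["installed"] = True
        save_tools(cache, tools)
        print(load_tools(cache, defaults))  # {'pipenv': {'installed': True, 'dependencies': None}}
```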

The scan Command

Next up, we'll look at the scan command, which runs those painful luigi commands on our behalf. Many of the steps are the same as for install, and our first task is to define a parser. scan's parser needs to handle every option/argument used by any of the scans we've written into the pipeline. We won't go into the tedium of looking at each option, but we will highlight a few interesting ways of doing tab-completion in cmd2.

Tab-Completion Strategies

When using one of the argparse-based decorators provided by cmd2 for argument processing, cmd2 provides automatic tab-completion of option names. This means that things like --wordlist can be tab-completed by typing --w and pressing tab. Additionally, we can easily see all of the available options for any command by typing -- and pressing tab twice.

In addition to flag tab-completion, cmd2 has a couple of different helpers for tab-completing argument values, the simplest of which is cmd2.Cmd.path_complete. For any option that expects a path on the file system as its argument, we can pass path_complete as part of the parser's add_argument call.

scan_parser.add_argument(
    "--results-dir",
    completer_method=cmd2.Cmd.path_complete,
    help="directory in which to save scan results",
)

The example above defines the --results-dir option for the scan command. Because completer_method is set to cmd2.Cmd.path_complete, whenever we specify --results-dir we'll be able to tab-complete the directory just as we would in bash (or whatever fancy shell you're using…).

Another potential way of getting tab-completion is to provide a function that returns a list of choices.

scan_parser.add_argument(
    "--interface",
    choices_function=lambda: [x[1] for x in socket.if_nameindex()],
    help="which interface masscan should use",
)

socket.if_nameindex() returns a list of (index, name) tuples similar to the following:

[(1, 'lo'), (2, 'eth0'), (3, 'mpqemubr0-dummy'), (4, 'mpqemubr0')]

So, assuming we’re working with the list of tuples above, the --interface option will tab-complete using lo, eth0, mpqemubr0-dummy, and mpqemubr0 by calling our lambda function.
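Conceptually, a choices function is called when tab is pressed, and its results are narrowed to whatever has been typed so far. Here's a rough standard-library approximation; complete is a hypothetical helper that only mimics the prefix matching (cmd2's own completer does more than this), and the demo uses the sample if_nameindex() output above so it doesn't depend on the local machine.

```python
import socket


def interface_names():
    """Same idea as the post's lambda: keep the name from each (index, name) pair."""
    return [name for _index, name in socket.if_nameindex()]


def complete(text, candidates):
    """Hypothetical helper: keep only the candidates that start with the typed text."""
    return [c for c in candidates if c.startswith(text)]


if __name__ == "__main__":
    sample = [(1, "lo"), (2, "eth0"), (3, "mpqemubr0-dummy"), (4, "mpqemubr0")]
    names = [name for _index, name in sample]
    print(complete("mp", names))  # ['mpqemubr0-dummy', 'mpqemubr0']
    print(complete("", names))  # empty text: every interface is offered
```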

Tab-completion of argument values can be configured by passing one of the following five parameters to add_argument() (choices is standard argparse; the other four are cmd2 additions):

  • choices
  • choices_function
  • choices_method
  • completer_function
  • completer_method


Now that we know a few tab-completion strategies, we can create a function that returns a list of possible scans for our scan command to use!

First, we'll create the get_scans function, which iterates over our recon package and its submodules to find every class whose name ends in Scan. By using part of the class name as a filter while dynamically building the list of scans, we're creating a contract with our future selves: the name of any scan must end in the word Scan, e.g. MasscanScan, AmassScan, etc.

import importlib
import inspect
import pkgutil
import sys
from collections import defaultdict

import recon


def get_scans():
    """ Iterates over the recon package and its modules to find all of the *Scan classes.

    *** A contract exists here that says any scans need to end with the word scan in order to be found by this function.

    Returns:
        dict() containing mapping of {classname: [modulename, ...]} for all potential recon-pipeline commands
        ex:  defaultdict(<class 'list'>, {'AmassScan': ['recon.amass'], 'MasscanScan': ['recon.masscan'], ... })
    """
    scans = defaultdict(list)

    # recursively walk packages; import each module in each package
    # walk_packages yields ModuleInfo objects for all modules recursively on path
    # prefix is a string to output on the front of every module name on output.
    for loader, module_name, is_pkg in pkgutil.walk_packages(path=recon.__path__, prefix="recon."):
        importlib.import_module(module_name)

    # walk all modules, grabbing classes that we've written and add them to the classlist defaultdict
    # getmembers returns all members of an object in a list of tuples (name, value)
    for name, obj in inspect.getmembers(sys.modules[__name__]):
        if inspect.ismodule(obj) and not name.startswith("_"):
            # we're only interested in modules whose names don't begin with _ (skips dunders like __builtins__)

            for subname, subobj in inspect.getmembers(obj):
                if inspect.isclass(subobj) and subname.lower().endswith("scan"):
                    # now we only care about classes that end in [Ss]can
                    scans[subname].append(name)

    return scans

The crux of the function above is that we use some of python’s built in introspection capabilities to dynamically build a list of our scans.

The astute reader may be wondering why we’re bothering to return the module name along with the class name. We’ll see why as we peruse the implementation of do_scan a little later in the post.
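The same introspection idea is easy to try against a standard-library package. The hypothetical find_classes below walks a package with pkgutil.walk_packages, imports each module, and keeps classes whose names end in a given suffix. Unlike get_scans, it inspects each imported module directly and uses __module__ to record only the module that defines the class, so names re-exported elsewhere aren't counted twice; that filter is this sketch's choice, not the pipeline's.

```python
import importlib
import inspect
import json
import pkgutil
from collections import defaultdict


def find_classes(package, suffix):
    """Map each class name ending in `suffix` to the module(s) that define it."""
    found = defaultdict(list)
    for _finder, module_name, _is_pkg in pkgutil.walk_packages(
        package.__path__, prefix=package.__name__ + "."
    ):
        module = importlib.import_module(module_name)
        for class_name, cls in inspect.getmembers(module, inspect.isclass):
            # only count classes defined in this module, not ones it merely imports
            if cls.__module__ == module_name and class_name.lower().endswith(suffix.lower()):
                found[class_name].append(module_name)
    return dict(found)


if __name__ == "__main__":
    print(find_classes(json, "Decoder"))  # {'JSONDecoder': ['json.decoder']}
```

Returning the module name alongside the class name is what lets a caller later build something like luigi --module json.decoder JSONDecoder.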

Finally, we’ll add the positional argument scantype to our parser.

# options for ReconShell's 'scan' command
scan_parser = cmd2.Cmd2ArgumentParser()
scan_parser.add_argument("scantype", choices_function=get_scans)

With that done, we have tab-completion of scan’s positional argument! That’s pretty effin neat.


We'll start with the function definition and its decorator, just like we did earlier with do_install. We'll call get_scans and use it to build out the beginning of our luigi command. The only thing below that's likely to be confusing/opaque is args.__statement__.arg_list. We already know that args is how we access the options and arguments passed to a given command. When processing a command line, cmd2 creates a Statement object and attaches it to args as the __statement__ attribute. An example args is shown below, followed by the beginning of our function.

Namespace(__statement__=Statement(args='WebanalyzeScan', raw='scan WebanalyzeScan ', command='scan', arg_list=['WebanalyzeScan --target-file tesla --top-ports 1000 --interface eth0'], multiline_command='', terminator='', suffix='', pipe_to='', output='', output_to=''), exempt_list=None, extensions=None, interface='eth0', local_scheduler=False, ports=None, proxy=None, rate=None, recursive=False, results_dir=None, scan_timeout=None, scantype='WebanalyzeScan', target_file=None, threads=None, top_ports='1000', verbose=False, wordlist=None)

    @cmd2.with_argparser(scan_parser)
    def do_scan(self, args):
        """ Scan something.

        Possible scans include
            AmassScan           CORScannerScan      GobusterScan        SearchsploitScan
            ThreadedNmapScan    WebanalyzeScan      AquatoneScan        FullScan
            MasscanScan         SubjackScan         TKOSubsScan         HTBScan
        """
        self.async_alert(
            style(
                "If anything goes wrong, rerun your command with --verbose to enable debug statements.",
                fg="cyan",
                dim=True,
            )
        )

        # get_scans() returns mapping of {classname: [modulename, ...]} in the recon module
        # each classname corresponds to a potential recon-pipeline command, i.e. AmassScan, CORScannerScan ...
        scans = get_scans()

        # command is a list that will end up looking something like what's below
        # luigi --module recon.web.webanalyze WebanalyzeScan --target-file tesla --top-ports 1000 --interface eth0
        command = ["luigi", "--module", scans.get(args.scantype)[0]]
        command.extend(args.__statement__.arg_list)

After we've snagged all the options/arguments passed to scan, we're almost ready to run the resulting luigi command. First, though, we need to drop --verbose, which is our option rather than luigi's.

        if args.verbose:
            # verbose is not a luigi option, need to remove it
            command.remove("--verbose")
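Pulled out of the shell, the command assembly boils down to a small pure function, which also makes it easy to test. A sketch with a hypothetical build_luigi_command helper, assuming the {classname: [modulename, ...]} mapping that get_scans returns and an argument list that's already split into tokens; actually launching luigi would then be a matter of handing the list to subprocess.

```python
def build_luigi_command(scans, arg_list):
    """Turn ['WebanalyzeScan', '--target-file', 'tesla', ...] into a luigi argv list."""
    scantype = arg_list[0]
    command = ["luigi", "--module", scans[scantype][0], *arg_list]
    if "--verbose" in command:
        # --verbose belongs to the shell, not luigi, so it must not be forwarded
        command.remove("--verbose")
    return command


if __name__ == "__main__":
    scans = {"WebanalyzeScan": ["recon.web.webanalyze"]}
    args = ["WebanalyzeScan", "--target-file", "tesla", "--top-ports", "1000", "--verbose"]
    print(" ".join(build_luigi_command(scans, args)))
    # luigi --module recon.web.webanalyze WebanalyzeScan --target-file tesla --top-ports 1000
    # subprocess.run(build_luigi_command(scans, args)) would then launch the scan
```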

What's been shown so far is the main logic of scan; there's some additional code that makes the output pretty, but we won't cover it in depth. Hopefully you enjoyed reading some of these posts and/or find the tool useful. Drop me a line in chat @ NetSec Focus or on Twitter anytime!
