Tags: how to, bug bounty, hack the box, python, recon, luigi
Welcome to part one of a multi-part series demonstrating how to build an automated pipeline for target reconnaissance. The target in question could be the target of a pentest, bug bounty, or capture the flag challenge (shout out to my HTB peoples!). Each post in this series has an associated git tag in the repository for readers’ ease of use. By the end of the series, we’ll have built a functional recon pipeline that can be tailored to fit whatever needs you have.
Part I will:
Part I’s git tags:
As this is a ‘how-to’ series, don’t be concerned if you don’t know about a particular topic to be covered. All of the steps are clearly laid out. The roadmap below outlines topics covered in future posts.
Roadmap:
All right, enough with the intro, let’s dive in!
Note to Readers: If you find yourself wanting to know more about classes and Object Oriented Programming (OOP) @0xghostwriter recommends this youtube series on the subject. Special thanks to ghostwriter for reaching out and sharing!
Luigi is a python library written by the folks at Spotify. Its purpose is to chain multiple tasks together and automate them. The tasks can be just about anything. According to the documentation:
Luigi is a Python (2.7, 3.6, 3.7 tested) package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, handling failures, command line integration, and much more.
Imagine you have a tool that needs to run to produce output. Another tool uses that output as its input (i.e., nmap scan produces xml; xml sent to the next tool as input). Consider the next logical step; a third tool uses the output from the second tool as its input. This is the type of scenario that Luigi was built to handle.
A naive approach to automating this sort of behavior is to write a wrapper script that executes each tool in turn, hoping that no tool in the chain runs into any errors. If one does, the script likely needs to be rerun from the beginning. Luigi, on the other hand, can recover from the last successful link in the pipeline. For instance, say that we've run masscan and nmap successfully, but the pipeline breaks while running the third tool, nikto. On the next run of the pipeline, Luigi picks up from where it left off, skipping the two successful scans.
Luigi also has a lot of pretty cool features, such as its task scheduler, dependency visualizer, process synchronization, error notifications, task status monitoring, admin web panel and a whole bunch of other stuff. We’ll be using some of these pieces naturally as they come up in development. In short, Luigi is pretty legit. Before we move past Luigi, we need to discuss a few fundamental ideas about how it works; let’s do that now.
There are two fundamental building blocks in Luigi: Tasks and Targets. Each Target corresponds to a file on disk or some other observable checkpoint (a row in a database, a file in an S3 bucket, remote target responsiveness, etc.). Targets are fairly straightforward.
Tasks are the more interesting of the two concepts. A Task is a single unit of work that defines what happens during its section of the pipeline. Tasks take Targets as input and (usually) create Targets as output. Additionally, a Task can declare its dependence on another Task. Here is a visualization of a simple Task dependency and the related Targets.
In the image, the Database Dump Task expects a DB Target as input. After successful execution, it produces the dump.txt Target. The Compute Toplist Task uses the dump.txt Target as its input and creates the toplist.txt Target. In other words, the Compute Toplist Task requires the Database Dump Task. We'll see many of these relationships written out in code as we progress.
A key idea to understand about Luigi is that you specify what you want built, and Luigi backtracks to figure out what is required to fulfill the request. If we were executing our example above, we would tell Luigi that we want to run the Compute Toplist Task. Luigi would then walk that Task's dependencies backward (including any other dependencies found along the way) until reaching the beginning of the pipeline. Once luigi finds the beginning Task, execution begins. If this sounds similar to how GNU's Make utility works, it should; Luigi's creator based Luigi's design on Make.
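That backward walk can be sketched in plain Python. This is a conceptual illustration only, not luigi's actual implementation, and the task names are hypothetical stand-ins for the example above:

```python
# Conceptual sketch of Make-style dependency resolution: walk backward
# from the requested task, then run anything incomplete, dependencies first.
graph = {
    "compute_toplist": ["dump_database"],  # compute_toplist requires dump_database
    "dump_database": [],
}

completed = set()  # tasks whose output Target already exists on disk

def run_order(task, graph):
    """Return the tasks to execute, dependencies first, skipping completed ones."""
    order = []

    def visit(name):
        for dep in graph[name]:
            visit(dep)  # recurse into dependencies before the task itself
        if name not in completed and name not in order:
            order.append(name)

    visit(task)
    return order

print(run_order("compute_toplist", graph))  # ['dump_database', 'compute_toplist']
```

If dump_database had already produced its Target (i.e., it's in the completed set), a second run would schedule only compute_toplist, which mirrors Luigi skipping the successful masscan and nmap scans in the earlier scenario.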
That’s enough background to get us started. We’ll be diving into code later that demonstrates some of what we’ve already discussed. Before we can get to the code, we need to set up our development environment; let’s begin!
This guide assumes a few things about your operating system/environment.
We won't cover how to install python (though on linux, it should just 'be there'), nor will we cover startup scripts for different init systems. If you don't meet one or more of these requirements, that's ok. Just understand that where you deviate from the requirements, you're on your own (you can @me on twitter if you're hard stuck and we'll work it out).
Our first step is to install luigi. We’ll do this inside of a python virtual environment. My virtual environment manager preference is pipenv. Let’s get pipenv installed.
apt install pipenv
After that, we’ll clone the git repository we’ll be working with throughout these posts. We’re going to be using git tags to track significant checkpoints within the code. As such, the command below is how we’ll grab the baseline repository.
git clone --branch pipenv-install https://github.com/epi052/recon-pipeline.git
git options used:
clone
Clone a repository into a new directory
--branch
checkout <branch> instead of the remote's HEAD (can be used for tags as well)
Now we have a place to work! Let’s use the Pipfiles included in our repository to install luigi.
cd recon-pipeline
pipenv install
If everything went well, we should see output similar to what’s below.
Creating a virtualenv for this project…
Pipfile: /opt/recon-pipeline/Pipfile
Using /usr/bin/python3.7m (3.7.3) to create virtualenv…
⠴ Creating virtual environment...Using base prefix '/usr'
New python executable in /home/epi/.local/share/virtualenvs/recon-pipeline-nDSyRWzr/bin/python3.7m
Also creating executable in /home/epi/.local/share/virtualenvs/recon-pipeline-nDSyRWzr/bin/python
Installing setuptools, pip, wheel...
done.
Running virtualenv with interpreter /usr/bin/python3.7m
✔ Successfully created virtual environment!
Virtualenv location: /home/epi/.local/share/virtualenvs/recon-pipeline-nDSyRWzr
Installing dependencies from Pipfile.lock (e32771)…
🐍 ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 7/7 — 00:00:01
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
To make use of our virtual environment, we use the command below (it’s also in the pipenv output).
pipenv shell
A simple test while in our virtual environment tells us if everything worked correctly.
python -c 'import luigi'
If there is no error output, we’ve successfully installed luigi!
Now that we have luigi installed, we can create our first Task. We need a way to feed input into our pipeline. Specifically, we'll want to define the scope of our target. According to hackerone, scope is defined as
A collection of assets that hackers are to hack on.
For us, this boils down to either a list of ip addresses or a list of domains. Instead of trying to automate some method of pulling in scope files from bugcrowd, hackerone, synack, or some other platform, we can instead manually create the scope file and place it on disk for luigi to ingest. This approach allows us to use the pipeline for any of the bug bounty platforms, pentest targets, hack the box/CTFs, etc…
There is, however, a contract of sorts that we'll place upon ourselves with the scope file. We'll eventually take different actions later in the pipeline based on whether the file contains ip addresses or domains. That means that, for the sake of simplicity, the scope file should contain either ip addresses or domain names, not both.
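To make that contract concrete, the standard library's ipaddress module can tell an ip/network apart from anything else. A quick sanity check over a scope file might look like this (a sketch for illustration; the classify function is not part of the pipeline's code):

```python
import ipaddress

def classify(lines):
    """Classify scope entries as 'ips' or 'domains'; raise if the file mixes both."""
    kinds = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            ipaddress.ip_interface(line)  # accepts 127.0.0.1, 10.0.0.0/8, etc.
            kinds.add("ips")
        except ValueError:
            kinds.add("domains")  # not a valid ip/network; assume a domain
    if len(kinds) > 1:
        raise ValueError("scope file mixes ip addresses and domains")
    return kinds.pop()

print(classify(["127.0.0.1", "10.0.0.0/8"]))       # ips
print(classify(["tesla.com", "shop.tesla.com"]))   # domains
```

The same ip_interface check shows up shortly in the real Task, where only the first line of the file is inspected.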
To structure our directory layout, we’ll begin by creating a python module inside of our repository.
mkdir recon
touch recon/__init__.py
After that, inside of our recon module, we'll create targets.py.
touch recon/targets.py
Now our directory structure should look like this:
recon-pipeline
├── LICENSE
├── Pipfile
├── Pipfile.lock
├── README.md
└── recon
├── __init__.py
└── targets.py
Let's spend a few minutes looking at the basics of the luigi Task class.
A Task is the base unit of work in luigi; it describes a single step of the pipeline. To create a luigi Task, we'll need to create a class that inherits from luigi.Task. We'll also need to override a few methods:
run() - contains the logic to be performed by this Task
output() - the output Target that this Task creates (e.g., a file, database entry, etc.)
requires() - the list of Tasks that this Task depends on
Each piece of functionality we add to the pipeline is some form of Task, so it's essential to cover the basics before continuing.
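To see how those three methods fit together before touching luigi itself, here is a bare-bones mimic of the interface in plain Python. This is a conceptual stand-in for illustration only; a real Task subclasses luigi.Task, and the file name hello.txt is hypothetical:

```python
import os

class FakeTask:
    """Mimics the methods a luigi Task overrides; conceptual illustration only."""

    def requires(self):
        return []  # the list of Tasks this Task depends on (none here)

    def output(self):
        return "hello.txt"  # stand-in for a luigi Target (a real one wraps a path)

    def run(self):
        # the actual work: produce the output Target
        with open(self.output(), "w") as f:
            f.write("hello from run()\n")

    def complete(self):
        # luigi considers a Task done when its output Target exists
        return os.path.exists(self.output())

task = FakeTask()
if not task.complete():
    task.run()
print(task.complete())  # True once hello.txt exists
```

The complete() method is how luigi implements the "skip what already succeeded" behavior discussed earlier: if the output Target exists, the Task never runs again.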
Now that we have a file to work in, and we’ve covered the bare-bones essentials of the Task class, let’s start taking a look at some code!
targets.py holds our Task class that handles our scope file. Recall that this file is generated manually by the user. Typically, luigi Tasks get their input from some source, so ours is a special case for which the luigi creators planned. In luigi, when we need to say that a source outside of luigi generates the Task's output, we use an ExternalTask. An ExternalTask is a subclass of the luigi.Task discussed above, and doesn't require overriding the run() method.
import shutil
import logging
import ipaddress

import luigi


class TargetList(luigi.ExternalTask):
    target_file = luigi.Parameter()
    -------------8<-------------
Each luigi Task can have Parameters. A Parameter handles creating the class’s constructor and a command-line parser option for that particular Task. We’ll see how to use Parameters from the command line shortly.
    def output(self):
        try:
            with open(self.target_file) as f:
                first_line = f.readline()
                ipaddress.ip_interface(first_line.strip())  # is it a valid ip/network?
        except OSError as e:
            # can't open file; log error / return nothing
            return logging.error(f"opening {self.target_file}: {e.strerror}")
        except ValueError as e:
            # exception thrown by ip_interface; domain name assumed
            logging.debug(e)
            with_suffix = f"{self.target_file}.domains"
        else:
            # no exception thrown; ip address found
            with_suffix = f"{self.target_file}.ips"

        shutil.copy(self.target_file, with_suffix)  # copy file with new extension
        return luigi.LocalTarget(with_suffix)
Parameters are how we'll pass user-controlled input to our class; in this case, it is the path to our scope file. A LocalTarget represents a local file on the file system. The LocalTarget here is what this particular Task produces and what it passes to Tasks further down the pipeline.
The high-level description of this Task is that it opens the file specified by the user in the --target-file command-line option (seen below). It reads the first line to determine whether the file contains ip addresses or domain names (remember our contract of only one or the other?). After making that determination, it copies the target_file with either .ips or .domains appended to the filename. That's it. The LocalTarget returned from this Task is available to the next Task in the pipeline by calling self.input().
We can update our local source code to what’s seen above (with docstrings/comments) by running the following command.
git checkout stage-0
To run the pipeline, we'll need to set our PYTHONPATH environment variable to the path of our project on disk. We can set the environment variable in a few ways; outlined below are two solutions.
Prepend PYTHONPATH=/path/to/recon-pipeline to any luigi pipeline command being run.
Add export PYTHONPATH=/path/to/recon-pipeline to your .bashrc.
We also need to specify --local-scheduler on the command line. While the --local-scheduler flag is useful for development purposes, it's not recommended for production usage. There is also a centralized scheduler that runs as a system service and serves two purposes: it makes sure two instances of the same task aren't running simultaneously, and it provides visualization of everything that's going on.
For now, we'll stick with --local-scheduler. As our pipeline becomes larger, we'll swap over to the central scheduler.
With our PYTHONPATH set up, luigi commands take on the following structure (prepend PYTHONPATH if not exported from .bashrc):
luigi --module PACKAGENAME.MODULENAME CLASSNAME *args
We can get options for each module by running luigi --module PACKAGENAME.MODULENAME CLASSNAME --help
An example help statement:
luigi --module recon.targets TargetList --help
══════════════════════════════════════════════
usage: luigi [--local-scheduler] [--module CORE_MODULE] [--help] [--help-all]
[--TargetList-target-file TARGETLIST_TARGET_FILE]
[--target-file TARGET_FILE]
[Required root task]
positional arguments:
Required root task Task family to run. Is not optional.
optional arguments:
--local-scheduler Use an in-memory central scheduler. Useful for
testing.
--module CORE_MODULE Used for dynamic loading of modules
--help Show most common flags and all task-specific flags
--help-all Show all command line flags
--TargetList-target-file TARGETLIST_TARGET_FILE
--target-file TARGET_FILE
Notice the --target-file option that we specified as a Parameter in our code above. Putting it all together, we can see an example scope-file command, where tesla is the name of the file, and it is located in the current directory (ensure you're in your python virtual environment).
echo 127.0.0.1 > tesla
PYTHONPATH=$(pwd) luigi --local-scheduler --module recon.targets TargetList --target-file tesla
═══════════════════════════════════════════════════════════════════════════════════════════════
DEBUG: Checking if TargetList(target_file=tesla) is complete
INFO: Informed scheduler that task TargetList_tesla_591d3b1ff1 has status DONE
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: Done
DEBUG: There are no more tasks to run at this time
INFO: Worker Worker(salt=092373507, workers=1, host=main, username=epi, pid=13645) was stopped. Shutting down Keep-Alive thread
INFO:
===== Luigi Execution Summary =====
Scheduled 1 tasks of which:
* 1 complete ones were encountered:
- 1 TargetList(target_file=tesla)
Did not run any tasks
This progress looks :) because there were no failed tasks or missing dependencies
===== Luigi Execution Summary =====
After running the command above, we see a new file in our current directory named tesla.ips.
cat tesla.ips
═════════════
127.0.0.1
Here we have the finalized code with comments.
import shutil
import logging
import ipaddress

import luigi


class TargetList(luigi.ExternalTask):
    """ External task. `TARGET_FILE` is generated manually by the user from target's scope. """

    target_file = luigi.Parameter()

    def output(self):
        """ Returns the target output for this task. target_file.ips || target_file.domains

        In this case, it expects a file to be present in the local filesystem.
        By convention, TARGET_NAME should be something like tesla or some other
        target identifier. The returned target output will either be target_file.ips
        or target_file.domains, depending on what is found on the first line of the file.

        Example: Given a TARGET_FILE of tesla where the first line is tesla.com; tesla.domains
        is written to disk.

        Returns:
            luigi.local_target.LocalTarget
        """
        try:
            with open(self.target_file) as f:
                first_line = f.readline()
                ipaddress.ip_interface(first_line.strip())  # is it a valid ip/network?
        except OSError as e:
            # can't open file; log error / return nothing
            return logging.error(f"opening {self.target_file}: {e.strerror}")
        except ValueError as e:
            # exception thrown by ip_interface; domain name assumed
            logging.debug(e)
            with_suffix = f"{self.target_file}.domains"
        else:
            # no exception thrown; ip address found
            with_suffix = f"{self.target_file}.ips"

        shutil.copy(self.target_file, with_suffix)  # copy file with new extension
        return luigi.LocalTarget(with_suffix)
That wraps things up for this post. In the next installment, we'll add masscan into our pipeline!