Tags: how-to, bug bounty, hack the box, python, recon, luigi
Welcome back! If you found your way here without reading the prior posts in this series, you may want to start with some of the links to previous posts (below). This post is part six of a multi-part series demonstrating how to build an automated pipeline for target reconnaissance. The target in question could be the target of a pentest, bug bounty, or capture the flag challenge (shout out to my HTB peoples!). By the end of the series, we’ll have built a functional recon pipeline that can be tailored to fit your own needs.
Previous posts:
Part VI will:
Part VI’s git tags:
To get the repository to the point at which we’ll start, run one of the following commands: use the first if you don’t have the repository yet, or the second if you already do.
git clone --branch stage-12 https://github.com/epi052/recon-pipeline.git
git checkout tags/stage-12
Roadmap:
Alright, if you’ve made it this far, you’re pretty awesome (even if you came here without reading prior posts, you’re still ok in my book!). Because you’re so awesome, I’m going to level with you. Writing these blog posts in addition to the code has really slowed down my progress on this project. It became a chore to write the posts after writing the code / comments / etc… It became such a chore that I slowed down writing the code along with the blog posts, which is truly bizarre for me. I genuinely enjoy sharing knowledge and content, but for some reason this series was draining.
For better or worse, this will be the last blog post about this tool. I plan on continuing to work on the tool and finish the roadmap, but I want to remove the mental barrier of having to do an accompanying post alongside each addition to the codebase. The past week I’ve been strictly writing code without worrying about the post I’m writing now and it was refreshing. We’ll still step through some of the more interesting pieces of the work done this week, but I don’t plan on continuing the series after this one. Thanks for listening and understanding! Without further ado…
I always knew that wrapping the unwieldy luigi commands and making them more friendly/manageable would need to be tackled at some point. I landed on what I think is a really nice solution that not only gets those commands under control, but also drastically improves usability and quality of life. That solution is cmd2, which I luckily stumbled upon while working on a separate project. Let’s take a look at what it brings to the table.
cmd2 is a python package for building CLI programs. It extends python’s cmd package, which is included in the standard library. I’ve written more than a couple of python tools that use cmd and was blown away by just how much better cmd2 is than the original. cmd2 has a laundry list of features, but we’re primarily concerned with the following:
If you’re interested in what else cmd2 can do, check out the documentation.
Some of the things above we get for free, others we’ll get with a little work up front. Just like python’s cmd package, we begin by defining our own class that inherits from the package’s Cmd class.
class ReconShell(cmd2.Cmd):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.prompt = "recon-pipeline> "
Note: we’re going to skip over a decent amount of code that makes up the final product in favor of covering what cmd2 does to really make the tool shine.
After defining our class, any methods that begin with do_ become our shell’s commands. We’ll start by creating an install command, which is going to handle installation of the myriad tools that our pipeline executes under the hood.
Any command that our program knows about can be tab-completed while in the shell. For instance, hitting tab twice in our shell will show the available commands. Typing i and then hitting tab would complete it to install.
    def do_install(self, args):
        """ Install any/all of the libraries/tools necessary to make the recon-pipeline function. """
By default, the docstring for each command becomes the help statement for its associated command. However, ours will differ slightly because we’ll be using cmd2’s extension of ArgumentParser named Cmd2ArgumentParser.
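Because cmd2 extends the standard library’s cmd module, the do_ convention, command-name completion, and docstring-based help can be sketched with plain cmd and nothing to install. In the sketch below, ToyShell is a hypothetical stand-in for ReconShell, and completenames is the stdlib hook that cmd calls when completing a command name. cmd2’s own completion machinery is richer, so treat this as an approximation of the behavior rather than the real thing.

```python
import cmd
import io


class ToyShell(cmd.Cmd):
    """Hypothetical stand-in for ReconShell, built on the standard library cmd module."""

    prompt = "recon-pipeline> "

    def __init__(self, stdout=None):
        super().__init__(stdout=stdout)
        self.requested = []  # remembers what `install` was asked for

    def do_install(self, arg):
        """Install any/all of the libraries/tools necessary to make the recon-pipeline function."""
        self.requested.append(arg)

    def do_scan(self, arg):
        """Scan something."""


out = io.StringIO()
shell = ToyShell(stdout=out)

shell.onecmd("install luigi")      # dispatched to do_install("luigi")
print(shell.requested)             # ['luigi']
print(shell.completenames("i"))    # ['install'], what tab-completing "i" offers
shell.onecmd("help install")       # help text is taken from do_install's docstring
print(out.getvalue().strip())
```

The last line prints the do_install docstring, which is exactly the text that help install shows in the shell.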
Speaking of Cmd2ArgumentParser, here’s ours for install.
# options for ReconShell's 'install' command
install_parser = cmd2.Cmd2ArgumentParser()
install_parser.add_argument(
    "tool", help="which tool to install", choices=list(tools.keys()) + ["all"]
)
We’re defining an instance of the parser that accepts a single positional argument whose allowed values are the keys of our tools dictionary plus "all" (passed in as the argument’s list of choices). A few example entries from tools are shown below.
tools = {
    "luigi-service": {
        "installed": False,
        "dependencies": ["luigi"],
        "commands": [
            f"cp {str(Path(__file__).parent.parent / 'luigid.service')} /lib/systemd/system/luigid.service",
            "cp $(which luigid) /usr/local/bin",
            "systemctl daemon-reload",
            "systemctl start luigid.service",
            "systemctl enable luigid.service",
        ],
        "shell": True,
    },
    "luigi": {"installed": False, "dependencies": ["pipenv"], "commands": ["pipenv install luigi"]},
    "pipenv": {
        "installed": False,
        "dependencies": None,
        "commands": ["apt-get install -y -q pipenv"],
    },
-------------8<-------------
From the snippet above, we know we’ll have at least luigi-service, luigi, and pipenv as possibilities for install’s sole positional argument.
With nothing more than the command’s positional argument and the list of choices it accepts, install now tab-completes all of the tools it knows how to install. Pretty baller, no?
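cmd2.Cmd2ArgumentParser subclasses argparse.ArgumentParser, so the choices list does double duty: cmd2 tab-completes from it, and argparse rejects anything outside it. The sketch below uses plain argparse with a trimmed, hypothetical copy of the tools dictionary to show the validation half.

```python
import argparse

# Hypothetical, trimmed copy of the tools dictionary; only the keys matter here
tools = {
    "luigi-service": {"installed": False, "dependencies": ["luigi"]},
    "luigi": {"installed": False, "dependencies": ["pipenv"]},
    "pipenv": {"installed": False, "dependencies": None},
}

# Plain argparse stand-in for cmd2.Cmd2ArgumentParser (which subclasses argparse.ArgumentParser)
install_parser = argparse.ArgumentParser(prog="install")
install_parser.add_argument(
    "tool", help="which tool to install", choices=list(tools.keys()) + ["all"]
)

args = install_parser.parse_args(["luigi"])
print(args.tool)  # luigi

rejected_code = None
try:
    install_parser.parse_args(["nmap"])  # not one of the choices
except SystemExit as exc:
    # argparse prints an "invalid choice" error to stderr and exits with status 2
    rejected_code = exc.code
print(rejected_code)  # 2
```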
With that brief bit of background, we’re ready to see what makes install work.
    @cmd2.with_argparser(install_parser)
    def do_install(self, args):
        """ Install any/all of the libraries/tools necessary to make the recon-pipeline function. """

        # imported tools variable is in global scope, and we reassign over it later
        global tools

        # create .cache dir in the home directory, on the off chance it doesn't exist
        cachedir = Path.home() / ".cache/"
        cachedir.mkdir(parents=True, exist_ok=True)

        persistent_tool_dict = cachedir / ".tool-dict.pkl"
Above, we see a decorator that associates the parser we wrote earlier with this command. We also define the path on disk where we’ll store the tools dictionary. We’ll update this dictionary whenever a tool is installed and save any changes to disk as a pickled object.
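The load/save round trip for the tools dictionary can be tried in isolation. This sketch assumes two hypothetical helpers, load_tools and save_tools, and points the cache at a temporary directory instead of Path.home() / ".cache" so nothing on the real system is touched.

```python
import pickle
import tempfile
from pathlib import Path


def load_tools(path, default):
    """Hypothetical helper: return the pickled tools dict if it exists, otherwise the default."""
    if path.exists():
        return pickle.loads(path.read_bytes())
    return default


def save_tools(path, tools):
    """Hypothetical helper: write_bytes opens, writes, and closes the file in one call."""
    path.write_bytes(pickle.dumps(tools))


with tempfile.TemporaryDirectory() as tmp:
    # a temporary directory stands in for Path.home() / ".cache"
    cachedir = Path(tmp) / ".cache"
    cachedir.mkdir(parents=True, exist_ok=True)
    persistent_tool_dict = cachedir / ".tool-dict.pkl"

    tools = load_tools(persistent_tool_dict, {"pipenv": {"installed": False}})
    existed_before = persistent_tool_dict.exists()  # False: nothing saved yet

    tools["pipenv"]["installed"] = True  # pretend pipenv was just installed
    save_tools(persistent_tool_dict, tools)

    reloaded = load_tools(persistent_tool_dict, {})  # read back from disk
    print(existed_before, reloaded)  # False {'pipenv': {'installed': True}}
```

One design note: pickle.loads will execute whatever a crafted pickle tells it to, so it should only read files you wrote yourself, as is the case for a file in your own cache directory.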
Next, we’ll handle the case when a user runs install all.
        if args.tool == "all":
            # show all tools have been queued for installation
            for x in tools.keys():
                if not tools.get(x).get("installed"):
                    self.async_alert(style(f"[-] {x} queued", fg="bright_white"))

            for tool in tools.keys():
                self.do_install(tool)

            return
Note: self.async_alert displays an important message to the user while they are at a command line prompt. To the user it appears as if an alert message is printed above the prompt, and their current input text and cursor location are left alone.
The args variable is where Cmd2ArgumentParser stores the options/arguments passed in during command execution. First, we print each tool that isn’t already installed as queued. Next, we simply call do_install for each of the tools in the tools dictionary. Finally, we return, because we don’t want the function to proceed beyond this point (everything is done installing).
Next up, if we’ve made it this far, we can attempt to load the pickled tools dictionary from disk.
        if persistent_tool_dict.exists():
            tools = pickle.loads(persistent_tool_dict.read_bytes())
After loading the tools dictionary, we’ll handle any dependencies defined for the tool we’re attempting to install (the example entries above show a nested dependency luigi-service -> luigi -> pipenv).
        if tools.get(args.tool).get("dependencies"):
            # get all of the requested tools dependencies

            for dependency in tools.get(args.tool).get("dependencies"):
                if tools.get(dependency).get("installed"):
                    # already installed, skip it
                    continue

                self.async_alert(
                    style(
                        f"[!] {args.tool} has an unmet dependency; installing {dependency}",
                        fg="yellow",
                        bold=True,
                    )
                )

                # install the dependency before continuing with installation
                self.do_install(dependency)
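Similar to how we handled the all argument, we loop over each dependency, check whether it’s already installed, and if it isn’t, call do_install recursively with the dependency. Because that call handles the dependency’s own dependencies in turn, a chain like luigi-service -> luigi -> pipenv installs pipenv first.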
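The recursive dependency handling can be exercised without running any real installs. In this sketch, install is a hypothetical, stripped-down stand-in for do_install that records the order tools would be installed in instead of running their commands.

```python
# Hypothetical, trimmed copy of the tools dict: only dependency information is kept
tools = {
    "luigi-service": {"installed": False, "dependencies": ["luigi"]},
    "luigi": {"installed": False, "dependencies": ["pipenv"]},
    "pipenv": {"installed": False, "dependencies": None},
}

install_order = []  # the order in which tools get "installed"


def install(tool):
    """Install unmet dependencies first (recursively), then the tool itself."""
    for dependency in tools[tool].get("dependencies") or []:
        if not tools[dependency]["installed"]:
            install(dependency)  # recurse, so nested dependencies are handled too

    if tools[tool]["installed"]:
        return  # already installed, nothing to do

    install_order.append(tool)  # stand-in for running the tool's install commands
    tools[tool]["installed"] = True


install("luigi-service")
print(install_order)  # ['pipenv', 'luigi', 'luigi-service']
```

Like do_install, this has no protection against circular dependencies: a cycle such as a -> b -> a would recurse until python raises RecursionError.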
With that out of the way, we can get to the logic that handles command execution!
222         if tools.get(args.tool).get("installed"):
223             return self.async_alert(style(f"[!] {args.tool} is already installed.", fg="yellow"))
224         else:
225 
226             # list of return values from commands run during each tool installation
227             # used to determine whether the tool installed correctly or not
228             retvals = list()
229 
230             self.async_alert(style(f"[*] Installing {args.tool}...", fg="bright_yellow"))
231 
232             for command in tools.get(args.tool).get("commands"):
233                 # run all commands required to install the tool
234 
235                 # print each command being run
236                 self.async_alert(style(f"[=] {command}", fg="cyan"))
237 
238                 if tools.get(args.tool).get("shell"):
239 
240                     # go tools use subshells (cmd1 && cmd2 && cmd3 ...) during install, so need shell=True
241                     proc = subprocess.Popen(
242                         command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE
243                     )
244                 else:
245 
246                     # "normal" command, split up the string as usual and run it
247                     proc = subprocess.Popen(
248                         shlex.split(command), stdout=subprocess.PIPE, stderr=subprocess.PIPE
249                     )
250 
251                 out, err = proc.communicate()
252 
253                 if err:
254                     self.async_alert(style(f"[!] {err.decode().strip()}", fg="bright_red"))
255 
256                 retvals.append(proc.returncode)
There’s a lot of code above, but it’s not too difficult to decipher. Once we determine that the tool isn’t already installed, we drop into the else block (line 224). At this point we define the retvals list, which accumulates the return codes from each command run during the tool’s installation. Consider masscan’s list of commands necessary for installation:
tools = {
    ...
    "masscan": {
        "installed": False,
        "dependencies": None,
        "commands": [
            "git clone https://github.com/robertdavidgraham/masscan /tmp/masscan",
            "make -s -j -C /tmp/masscan",
            f"mv /tmp/masscan/bin/masscan {tool_paths.get('masscan')}",
            "rm -rf /tmp/masscan",
        ],
    },
    ...
}
The loop that begins on line 232 iterates over the given tool’s list of commands. Most commands can be split with shlex.split and run via subprocess.Popen normally, while tools whose entry sets "shell": True need us to pass shell=True to the Popen constructor, because their commands rely on shell features such as $(...) or chained && commands. The resulting logic branch spans lines 238 to 249.
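The difference between the two Popen branches is easy to see with a command that relies on the shell. This sketch uses subprocess.run (a convenience wrapper around Popen) and echo in place of real install commands, and assumes a POSIX system where /bin/sh and echo are available.

```python
import shlex
import subprocess

command = "echo $(echo hi) && echo done"

# Without a shell: shlex.split only tokenizes on whitespace/quotes, so $(...) and && stay literal text
argv = shlex.split(command)
print(argv)  # ['echo', '$(echo', 'hi)', '&&', 'echo', 'done']
no_shell = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
print(no_shell.stdout.decode(), end="")  # $(echo hi) && echo done

# With shell=True: /bin/sh performs the command substitution and runs both commands
with_shell = subprocess.run(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
print(with_shell.stdout.decode(), end="")  # hi, then done, on separate lines
```

shell=True hands the whole string to /bin/sh, which is why it should only be used with trusted strings like the hard-coded commands in tools, never with user input.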
Finally, we capture the current command’s STDOUT and STDERR (showing any STDERR output as an alert) and append its return code to retvals.
In our example masscan entry above, any one of the four commands could conceivably fail. Knowing that, we capture the return code of each one and ensure that all of them are zero. If they’re all zero, every step of the tool’s installation completed successfully.
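The retvals bookkeeping boils down to collecting each command’s return code and checking them with all(). In this sketch, run_all is a hypothetical helper, and the current python interpreter (sys.executable) stands in for real install commands so the exit codes are predictable.

```python
import subprocess
import sys


def run_all(commands):
    """Run each argv list in turn and collect return codes, mirroring how do_install fills retvals."""
    retvals = []
    for argv in commands:
        proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        retvals.append(proc.returncode)
    return retvals


# stand-ins for install commands: one succeeds, one exits with status 3
ok = [sys.executable, "-c", "pass"]
fail = [sys.executable, "-c", "raise SystemExit(3)"]

good = run_all([ok, ok])
bad = run_all([ok, fail, ok])
print(good, all(x == 0 for x in good))  # [0, 0] True
print(bad, all(x == 0 for x in bad))    # [0, 3, 0] False
```

Like the loop in do_install, run_all keeps going after a failing command; stopping at the first non-zero code would be a reasonable alternative when later steps depend on earlier ones (the masscan mv step is pointless if make failed).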
            if all(x == 0 for x in retvals):
                # all return values in retvals are 0, i.e. all exec'd successfully; tool has been installed

                self.async_alert(style(f"[+] {args.tool} installed!", fg="bright_green"))

                tools[args.tool]["installed"] = True
            else:
                # unsuccessful tool install

                tools[args.tool]["installed"] = False

                self.async_alert(
                    style(
                        f"[!!] one (or more) of {args.tool}'s commands failed and may not have installed properly; check output from the offending command above...",
                        fg="bright_red",
                        bold=True,
                    )
                )
The last step is to store the tools dictionary back to disk.
        # store any tool installs/failures (back) to disk
        # write_bytes opens, writes, and closes the file, so the handle isn't left open
        persistent_tool_dict.write_bytes(pickle.dumps(tools))
Next up, we’ll look at the scan command. This is the command that will execute those painful luigi commands on our behalf, and it needs a lot of the same steps we used for install. Our first task is to define a parser. scan’s parser needs to handle every option/argument used by any of the scans we’ve written into the pipeline. We won’t go into the tedium of looking at each option, but we will highlight a few interesting ways of doing tab-completion in cmd2.
When using one of the argparse-based decorators provided by cmd2 for argument processing, cmd2 provides automatic tab-completion of option names. This means that options like --wordlist can be tab-completed by typing --w and pressing tab. Additionally, we can easily see all of the available options for any command by typing -- and pressing tab twice.
In addition to flag tab-completion, cmd2 has a few different helpers for tab-completing argument values; the simplest is cmd2.Cmd.path_complete. For any option that expects a path on the file system as its argument, we can pass path_complete as the completer_method in the parser’s add_argument call.
scan_parser.add_argument(
    "--results-dir",
    completer_method=cmd2.Cmd.path_complete,
    help="directory in which to save scan results",
)
The example above defines the --results-dir option for the scan command. Because completer_method is set to cmd2.Cmd.path_complete, whenever we specify --results-dir we’ll be able to tab-complete the directory just like we would in bash (or whatever fancy shell you’re using…).
Another potential way of getting tab-completion is to provide a function that returns a list of choices.
scan_parser.add_argument(
    "--interface",
    choices_function=lambda: [x[1] for x in socket.if_nameindex()],
    help="which interface masscan should use",
)
socket.if_nameindex() returns a list of (index, name) tuples similar to the following:
[(1, 'lo'), (2, 'eth0'), (3, 'mpqemubr0-dummy'), (4, 'mpqemubr0')]
So, assuming we’re working with the list of tuples above, the --interface option will tab-complete using lo, eth0, mpqemubr0-dummy, and mpqemubr0 by calling our lambda function.
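Here is the lambda’s logic pulled out into a function, plus a hypothetical complete helper that approximates the prefix matching a completer performs over a list of choices. cmd2’s actual completion code is more involved, so this only illustrates the idea.

```python
import socket


def interface_names():
    """Same logic as the lambda passed as choices_function: keep the name from each (index, name) tuple."""
    return [x[1] for x in socket.if_nameindex()]


def complete(text, choices):
    """Hypothetical helper: return the choices that start with what has been typed so far."""
    return [c for c in choices if c.startswith(text)]


names = interface_names()
print(names)                                  # depends on the machine, e.g. ['lo', 'eth0']
print(complete("e", ["lo", "eth0", "ens3"]))  # ['eth0', 'ens3']
```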
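Tab-completion of argument values can be configured by using one of five parameters that cmd2 adds to argparse.ArgumentParser.add_argument(): choices, choices_function, choices_method, completer_function, and completer_method.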
Now that we know a few tab-completion strategies, we can create a function that returns a list of possible scans for our scan command to use!
First, we’ll create the get_scans function. get_scans iterates over our recon package and its sub-modules to find all of the classes whose names end in Scan. By using part of the class name as a filter while dynamically building the list of scans, we’re creating a contract with our future selves: the name of any scan must end in the word Scan, i.e. MasscanScan, AmassScan, etc…
import importlib
import inspect
import pkgutil
import sys
from collections import defaultdict

import recon


def get_scans():
    """ Iterates over the recon package and its modules to find all of the *Scan classes.

    *** A contract exists here that says any scans need to end with the word scan in order to be found by this function.

    Returns:
        dict() containing mapping of {classname: [modulename, ...]} for all potential recon-pipeline commands
        ex: defaultdict(<class 'list'>, {'AmassScan': ['recon.amass'], 'MasscanScan': ['recon.masscan'], ... })
    """
    scans = defaultdict(list)

    # recursively walk packages; import each module in each package
    # walk_packages yields ModuleInfo objects for all modules recursively on path
    # prefix is a string to output on the front of every module name on output.
    for loader, module_name, is_pkg in pkgutil.walk_packages(path=recon.__path__, prefix="recon."):
        importlib.import_module(module_name)

    # walk all modules, grabbing classes that we've written and add them to the classlist defaultdict
    # getmembers returns all members of an object in a list of tuples (name, value)
    for name, obj in inspect.getmembers(sys.modules[__name__]):
        if inspect.ismodule(obj) and not name.startswith("_"):
            # we're only interested in modules that don't begin with _ i.e. magic methods __len__ etc...

            for subname, subobj in inspect.getmembers(obj):
                if inspect.isclass(subobj) and subname.lower().endswith("scan"):
                    # now we only care about classes that end in [Ss]can
                    scans[subname].append(name)

    return scans
The crux of the function above is that we use some of python’s built-in introspection capabilities (pkgutil, importlib, and inspect) to dynamically build a mapping of our scans.
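The same introspection idea can be tried on a throwaway package. This sketch is a variation rather than a copy of get_scans: instead of scanning the members of the calling module, the hypothetical find_scans inspects each imported module directly and keeps only classes defined in that module, so the module names it returns are fully qualified. The toyrecon package is hypothetical and is written to a temporary directory at run time.

```python
import importlib
import inspect
import pkgutil
import sys
import tempfile
from collections import defaultdict
from pathlib import Path


def find_scans(package):
    """Map each *Scan class name to the module(s) that define it."""
    scans = defaultdict(list)
    for _, module_name, _ in pkgutil.walk_packages(path=package.__path__, prefix=package.__name__ + "."):
        module = importlib.import_module(module_name)
        for class_name, cls in inspect.getmembers(module, inspect.isclass):
            # same [Ss]can contract; also skip classes a module merely imported
            if class_name.lower().endswith("scan") and cls.__module__ == module_name:
                scans[class_name].append(module_name)
    return scans


# Build a hypothetical package on disk so the example is self-contained (left in the temp dir)
tmp = tempfile.mkdtemp()
pkg = Path(tmp) / "toyrecon"
(pkg / "web").mkdir(parents=True)
(pkg / "__init__.py").write_text("")
(pkg / "amass.py").write_text("class AmassScan: pass\nclass Helper: pass\n")
(pkg / "web" / "__init__.py").write_text("")
(pkg / "web" / "webanalyze.py").write_text("class WebanalyzeScan: pass\n")

sys.path.insert(0, tmp)
importlib.invalidate_caches()  # make sure the import system sees the freshly written files

toyrecon = importlib.import_module("toyrecon")
scans = find_scans(toyrecon)
print(dict(scans))  # {'AmassScan': ['toyrecon.amass'], 'WebanalyzeScan': ['toyrecon.web.webanalyze']}
```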
The astute reader may be wondering why we’re bothering to return the module name along with the class name. We’ll see why as we peruse the implementation of do_scan a little later in the post.
Finally, we’ll add the positional argument scantype to our parser.
# options for ReconShell's 'scan' command
scan_parser = cmd2.Cmd2ArgumentParser()
scan_parser.add_argument("scantype", choices_function=get_scans)
With that done, we have tab-completion of scan’s positional argument! That’s pretty effin neat.
We’ll start with the function definition and its decorator, just like we did earlier with do_install. We’ll call get_scans and use it to build out the beginning of our luigi command. The only thing below that’s likely to be confusing/opaque is the reference to args.__statement__.arg_list. We already know that args is how we access the options and arguments passed to a given command. cmd2 creates a Statement object when parsing the command line, and args stores that object in its __statement__ attribute. An example args is shown below, followed by the beginning of our function.
Namespace(__statement__=Statement(args='WebanalyzeScan', raw='scan WebanalyzeScan ', command='scan', arg_list=['WebanalyzeScan --target-file tesla --top-ports 1000 --interface eth0'], multiline_command='', terminator='', suffix='', pipe_to='', output='', output_to=''), exempt_list=None, extensions=None, interface='eth0', local_scheduler=False, ports=None, proxy=None, rate=None, recursive=False, results_dir=None, scan_timeout=None, scantype='WebanalyzeScan', target_file=None, threads=None, top_ports='1000', verbose=False, wordlist=None)
    @cmd2.with_argparser(scan_parser)
    def do_scan(self, args):
        """ Scan something.

        Possible scans include
            AmassScan           CORScannerScan      GobusterScan        SearchsploitScan
            ThreadedNmapScan    WebanalyzeScan      AquatoneScan        FullScan
            MasscanScan         SubjackScan         TKOSubsScan         HTBScan
        """
        self.async_alert(
            style(
                "If anything goes wrong, rerun your command with --verbose to enable debug statements.",
                fg="cyan",
                dim=True,
            )
        )

        # get_scans() returns mapping of {classname: [modulename, ...]} in the recon module
        # each classname corresponds to a potential recon-pipeline command, i.e. AmassScan, CORScannerScan ...
        scans = get_scans()

        # command is a list that will end up looking something like what's below
        # luigi --module recon.web.webanalyze WebanalyzeScan --target-file tesla --top-ports 1000 --interface eth0
        command = ["luigi", "--module", scans.get(args.scantype)[0]]
        command.extend(args.__statement__.arg_list)
After we’ve snagged all the options/arguments passed as part of scan, we can run the resulting luigi command.
        if args.verbose:
            # verbose is not a luigi option, need to remove it
            command.remove("--verbose")

        subprocess.run(command)
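The command-building part of do_scan is plain list manipulation, so it can be checked without luigi installed. This sketch assumes arg_list holds one string per command-line token and that get_scans returned a mapping like the hypothetical scans dictionary below; build_luigi_command is a hypothetical helper, and the result is printed rather than passed to subprocess.run.

```python
# Hypothetical inputs standing in for get_scans() and args.__statement__.arg_list
scans = {"WebanalyzeScan": ["recon.web.webanalyze"], "AmassScan": ["recon.amass"]}
arg_list = ["WebanalyzeScan", "--target-file", "tesla", "--top-ports", "1000", "--interface", "eth0", "--verbose"]


def build_luigi_command(scans, scantype, arg_list):
    """Prefix the user's arguments with luigi's module selection, dropping recon-pipeline-only flags."""
    command = ["luigi", "--module", scans[scantype][0]]
    command.extend(arg_list)
    if "--verbose" in command:
        command.remove("--verbose")  # --verbose belongs to recon-pipeline, luigi doesn't know it
    return command


command = build_luigi_command(scans, "WebanalyzeScan", arg_list)
print(" ".join(command))
# luigi --module recon.web.webanalyze WebanalyzeScan --target-file tesla --top-ports 1000 --interface eth0
```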
That covers the main logic of scan. There is some additional code that handles making the output pretty, but we’re not going to cover it in depth. Hopefully you enjoyed reading some of these posts and/or you find the tool useful. Drop me a line in chat @ NetSec Focus or on Twitter anytime!