Tags: how-to, bug bounty, hack the box, python, recon, luigi
Welcome back! If you found your way here without reading the prior posts in this series, you may want to start with some of the links to previous posts (below). This post is part four of a multi-part series demonstrating how to build an automated pipeline for target reconnaissance. The target in question could be the target of a pentest, bug bounty, or capture the flag challenge (shout out to my HTB peoples!). By the end of the series, we’ll have built a functional recon pipeline that can be tailored to fit your own needs.
Previous posts:
Part IV will:
Part IV’s git tags:
To get the repository to the point at which we’ll start, we can run one of the following commands. Which one to use depends on whether the repository is already present: use git clone if you don’t have it yet, or git checkout if you do.
git clone --branch stage-4 https://github.com/epi052/recon-pipeline.git
git checkout tags/stage-4
Roadmap:
This post really marks the point at which I anticipate readers taking the pipeline and tweaking it to suit their needs. There are tons of methodologies that can be used to enumerate subdomains when given a top-level domain name (check out some at pentester.land’s compilation of recon workflows). This post will cover adding OWASP’s amass scanner to the pipeline, and I don’t plan on covering subdomain enumeration any further than that, mainly because this isn’t a series of posts about finding subdomains; it’s about building a pipeline. To that end, we’ll gather subdomains with amass and then, in later posts, move on to doing interesting things with the subdomains we identify. I invite you to use what you’ve learned so far to incorporate your own subdomain tactics into your own pipeline (or, if you’re feeling generous, submit them back in the form of a Pull Request!).
Before we get to the code, let’s get amass installed. I’m lazy, and snaps are pretty easy to manage, so we’ll install the amass snap.
If you want to install a different way than what’s shown, head over to the Installation Guide.
snap install amass
Once that command finishes, amass is installed. Nice, eh?
First, we’ll take a moment to figure out what we want our scans to do. There are a lot of options for amass, but we’re going to focus on active subdomain enumeration. A run of amass against tesla.com would look something like what’s below.
Side Note: If you’ve got the time to spend, the talk from BugCrowd’s LevelUp 0x04 shows a lot of different ways to integrate amass into your recon workflow and is likely to answer any questions you have about amass; check it out here
amass enum -active -ip -brute -min-for-recursive 3 -df tesla -json amass.tesla.json
amass options used:
enum
Perform DNS enumeration and network mapping of systems exposed to the Internet
-active
Enable active recon methods
-ip
Show the IP addresses for discovered names
-brute
Perform brute force subdomain enumeration
-min-for-recursive N
Number of labels in a subdomain before recursive brute forcing
-df
Path to a file providing root domain names
-json
Path to the JSON output file
Most of the options are self-explanatory. -min-for-recursive may lead to some confusion, so we’ll turn to the amass project leader, Jeff Foley, for a brief explanation.
Brute forcing will begin on example.com right away. Recursive brute forcing takes place on additional labels, such as the cs.example.com or careers.example.com subdomain names. What if you do not want to start recursive brute forcing on every new subdomain name you discover? What if you would like some evidence that careers.example.com is worth brute forcing?
If you specify the ‘-min-for-recursive 2’ flag, two labels need to show up on careers.example.com before recursive brute forcing will begin, such as the www.careers.example.com and support.careers.example.com subdomain names. The flag allows you to control when recursive brute forcing will be triggered.
So, the lower the number passed to -min-for-recursive, the more aggressive our recursion profile. Good to know.
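To make the threshold concrete, here’s a deliberately simplified model of the rule Jeff describes. It is not how amass implements recursion internally; recursion_candidates is a hypothetical helper that only counts discovered names sitting one label below each subdomain, and it treats the root domain as brute forced right away.

```python
from collections import Counter


def recursion_candidates(names, root, min_for_recursive):
    """Hypothetical, simplified model of amass's -min-for-recursive rule.

    Returns the discovered subdomains (below `root`) that have at least
    `min_for_recursive` names directly beneath them.
    """
    counts = Counter()
    for name in names:
        if not name.endswith("." + root):
            continue  # ignore anything outside the root domain
        parent = name.split(".", 1)[1]  # drop the left-most label
        if parent != root:  # the root itself is brute forced right away
            counts[parent] += 1
    return sorted(parent for parent, seen in counts.items() if seen >= min_for_recursive)


found = [
    "careers.example.com",
    "www.careers.example.com",
    "support.careers.example.com",
    "cs.example.com",
    "www.cs.example.com",
]

print(recursion_candidates(found, "example.com", 2))  # ['careers.example.com']
print(recursion_candidates(found, "example.com", 1))  # ['careers.example.com', 'cs.example.com']
```

Dropping the threshold from 2 to 1 pulls cs.example.com into recursion as well, which is the "lower is more aggressive" behavior in miniature.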
We’ll move forward with the command structure above; however, if it’s too aggressive for your particular use case, feel free to tweak it as you see fit. The amass user’s guide is a great resource if you want to change the command at all.
With our plan in place, let’s look at the code. For our AmassScan class, we’ll use the ExternalProgramTask class as our base, just like our Masscan class.
@inherits(TargetList)
class AmassScan(ExternalProgramTask):
    exempt_list = luigi.Parameter(default="")
There is an important thing to note in our code above, and that is how execution of the pipeline will flow. The @inherits(TargetList) decorator gives AmassScan the same Parameters as TargetList (i.e. target_file); together with the requires method we’ll write shortly, that places AmassScan hierarchically directly below targets.TargetList and makes it a sibling of masscan.Masscan (remember the first two posts? I know it’s been a minute).
This essentially creates a second branch in our pipeline: one branch handles domains while the other handles ip addresses.
For now, this is sufficient. Later on in this post we’ll cover how to tie the two branches together!
We’re including a new Parameter in AmassScan called exempt_list. The reason for this Parameter is that some bug bounty scopes have subdomains and/or top-level domains that are expressly verboten. At the time of this writing, the Xfinity program on Bugcrowd forbade any exploitation of login.xfinity.com (shown below).
When a program has out-of-scope domains/subdomains, we don’t want to waste time by including them in our pipeline. That’s where amass’s -blf option comes in! -blf accepts a path to a file providing blacklisted subdomains. Using our earlier amass example as a baseline, an amass run against xfinity may look something like what’s below.
amass enum -active -ip -brute -min-for-recursive 3 -df xfinity -json amass.xfinity.json -blf xfinity.blacklist
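If you’d rather generate the blacklist file than write it by hand, a small helper like the one below would do. write_exempt_list is a hypothetical helper, not part of the pipeline; it writes one name per line (the format the exempt_list docstring describes) and normalizes case and whitespace, since DNS names are case-insensitive.

```python
from pathlib import Path


def write_exempt_list(path, subdomains):
    """Hypothetical helper: write out-of-scope names to `path`, one per line.

    Names are stripped, lower-cased, de-duplicated, and sorted so the file
    is stable between runs. Blank entries are dropped.
    """
    cleaned = sorted({name.strip().lower() for name in subdomains if name.strip()})
    Path(path).write_text("\n".join(cleaned) + "\n")
    return cleaned


write_exempt_list("xfinity.blacklist", ["login.xfinity.com", " LOGIN.xfinity.com ", ""])
print(Path("xfinity.blacklist").read_text(), end="")  # login.xfinity.com
```

The resulting path is what you’d hand to AmassScan as its exempt_list Parameter (--exempt-list on the luigi command line).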
Next, let’s check out the standard methods that make up our Task.
    def requires(self):
        return TargetList(self.target_file)

    def output(self):
        return luigi.LocalTarget(f"amass.{self.target_file}.json")
Staying true to previous Tasks, we’ll let luigi know that AmassScan requires targets.TargetList, meaning a TARGET_FILE must be present, and that executing this Task will produce a file named amass.TARGET_FILE.json.
Normally, we’d now explore the .run method, which, as you know by now, constitutes the core logic of the Task. Recall, though, that when we inherit from ExternalProgramTask, all we need to do is return a list from the overridden .program_args method; ExternalProgramTask’s own .run method then passes that list to the subprocess module for execution.
    def program_args(self):
        command = [
            "amass",
            "enum",
            "-active",
            "-ip",
            "-brute",
            "-min-for-recursive",
            "3",
            "-df",
            self.input().path,
            "-json",
            f"amass.{self.target_file}.json",
        ]

        if self.exempt_list:
            command.append("-blf")  # Path to a file providing blacklisted subdomains
            command.append(self.exempt_list)

        return command
There’s not much going on here. The command is simply a list of strings, one per argument. The result of running targets.TargetList (self.input().path) is passed to the -df option, and we specify the output path of our JSON file. Lastly, if there are out-of-scope domains, the -blf option and its argument are appended to the list. That’s it; easy peasy lemon squeezy!
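If it helps to see what ExternalProgramTask is doing for us, here’s a stripped-down, stdlib-only sketch of the pattern. MiniExternalProgramTask and EchoTask are hypothetical names; as I understand it, the real luigi class does more (it uses subprocess.Popen, captures and logs output, and raises its own error type on a non-zero exit). This only shows the "override program_args, let the inherited run execute it" split.

```python
import subprocess
import sys


class MiniExternalProgramTask:
    """Hypothetical stand-in for luigi's ExternalProgramTask.

    Subclasses override program_args(); run() executes whatever list it returns.
    """

    def program_args(self):
        raise NotImplementedError

    def run(self):
        args = [str(arg) for arg in self.program_args()]
        # check=True raises CalledProcessError if the program exits non-zero
        return subprocess.run(args, check=True, capture_output=True, text=True)


class EchoTask(MiniExternalProgramTask):
    def program_args(self):
        # the current interpreter stands in for the amass binary here
        return [sys.executable, "-c", "print('hello from subprocess')"]


result = EchoTask().run()
print(result.stdout.strip())  # hello from subprocess
```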
As usual, here’s the finalized code.
import json
import ipaddress

import luigi
from luigi.util import inherits
from luigi.contrib.external_program import ExternalProgramTask

from recon.targets import TargetList


@inherits(TargetList)
class AmassScan(ExternalProgramTask):
    """ Run amass scan to perform subdomain enumeration of given domain(s).

    Expects TARGET_FILE.domains file to be a text file with one top-level domain per line.

    Commands are similar to the following

    amass enum -ip -brute -active -min-for-recursive 3 -df tesla -json amass.tesla.json

    Args:
        exempt_list: Path to a file providing blacklisted subdomains, one per line.
        target_file: specifies the file on disk containing a list of ips or domains *--* Required by upstream Task
    """

    exempt_list = luigi.Parameter(default="")

    def requires(self):
        """ AmassScan depends on TargetList to run.

        TargetList expects target_file as a parameter.

        Returns:
            luigi.ExternalTask - TargetList
        """
        return TargetList(self.target_file)

    def output(self):
        """ Returns the target output for this task.

        Naming convention for the output file is amass.TARGET_FILE.json.

        Returns:
            luigi.local_target.LocalTarget
        """
        return luigi.LocalTarget(f"amass.{self.target_file}.json")

    def program_args(self):
        """ Defines the options/arguments sent to amass after processing.

        Returns:
            list: list of options/arguments, beginning with the name of the executable to run
        """
        command = [
            "amass",
            "enum",
            "-active",
            "-ip",
            "-brute",
            "-min-for-recursive",
            "3",
            "-df",
            self.input().path,
            "-json",
            f"amass.{self.target_file}.json",
        ]

        if self.exempt_list:
            command.append("-blf")  # Path to a file providing blacklisted subdomains
            command.append(self.exempt_list)

        return command
With amass execution complete, we now need to process the results. Our goal in this section is to take amass’s JSON results and yank out each ip address (v4 and v6) as well as each subdomain. The reasoning is that tools further down the pipeline may expect one or the other, so we’ll be prepared in either case. Let’s goooooooo!
We’ll begin with more of the same standard code we’re used to.
@inherits(AmassScan)
class ParseAmassOutput(luigi.Task):
    def requires(self):
        args = {"target_file": self.target_file, "exempt_list": self.exempt_list}
        return AmassScan(**args)
Nothing out of the ordinary with the code above. However, we want this particular Task to produce three files: one for IPv4 addresses, one for IPv6 addresses, and a third for subdomains. We haven’t returned anything except single files or folders thus far, but luigi makes it simple to do exactly what we need: .output can return a dictionary of targets, as demonstrated below.
    def output(self):
        return {
            "target-ips": luigi.LocalTarget(f"{self.target_file}.ips"),
            "target-ip6s": luigi.LocalTarget(f"{self.target_file}.ip6s"),
            "target-subdomains": luigi.LocalTarget(f"{self.target_file}.subdomains"),
        }
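One consequence of returning a dictionary: as I understand luigi’s default completeness check, the Task only counts as complete when every target in that dictionary exists. The sketch below mimics that rule with plain pathlib paths; is_complete is a hypothetical stand-in for the check, not luigi’s API.

```python
import tempfile
from pathlib import Path


def is_complete(outputs):
    """Hypothetical stand-in for luigi's default completeness rule:
    a Task is complete only when every one of its output targets exists."""
    targets = outputs.values() if isinstance(outputs, dict) else [outputs]
    return all(Path(target).exists() for target in targets)


with tempfile.TemporaryDirectory() as tmp:
    # same naming convention as ParseAmassOutput.output, rooted in a temp dir
    outputs = {
        "target-ips": Path(tmp, "tesla.ips"),
        "target-ip6s": Path(tmp, "tesla.ip6s"),
        "target-subdomains": Path(tmp, "tesla.subdomains"),
    }
    outputs["target-ips"].touch()
    before = is_complete(outputs)  # False: two of the three files are missing
    for target in outputs.values():
        target.touch()
    after = is_complete(outputs)  # True: all three exist

print(before, after)  # False True
```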
To round out our ParseAmassOutput class, we have the .run method. Our job here is to parse the JSON file produced by AmassScan and categorize the results into ip address and subdomain files.
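Before we can start parsing the JSON, we need to take a look at the output file and see what we’re dealing with. Below is an example entry produced by amass, prettified for readability; in the actual file, each entry sits on its own line, which is why we’ll parse the file one line at a time.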
{
"Timestamp": "2019-09-22T19:20:13-05:00",
"name": "beta-partners.tesla.com",
"domain": "tesla.com",
"addresses": [
{
"ip": "209.133.79.58",
"cidr": "209.133.79.0/24",
"asn": 394161,
"desc": "TESLA - Tesla"
}
],
"tag": "ext",
"source": "Previous Enum"
}
As stated earlier, our goal is to strip out the subdomains and ip addresses from the JSON file. We’ll begin by creating a set for each collection of items. We use a set as our container because, by definition, a set is an unordered collection of unique values. So, if we see the same ip address or subdomain more than once while parsing (for example, several subdomains resolving to the same address), it won’t result in duplicate entries or additional overhead for the rest of the pipeline.
        unique_ips = set()
        unique_ip6s = set()
        unique_subs = set()
With the data structures selected and initialized, we can open up the JSON file for reading, along with one file per set to which we’ll write results.
        amass_json = self.input().open()
        ip_file = self.output().get("target-ips").open("w")
        ip6_file = self.output().get("target-ip6s").open("w")
        subdomain_file = self.output().get("target-subdomains").open("w")
Everything is in place now to iterate over the JSON entries and parse out the information we care about. Recall that ‘name’ is the subdomain returned by amass and ‘ip’ can contain either an IPv4 or an IPv6 address, so we check for each and add it to the appropriate set.
        with amass_json as aj, ip_file as ip_out, ip6_file as ip6_out, subdomain_file as subdomain_out:
            for line in aj:
                entry = json.loads(line)
                unique_subs.add(entry.get("name"))

                for address in entry.get("addresses"):
                    ipaddr = address.get("ip")
                    if isinstance(ipaddress.ip_address(ipaddr), ipaddress.IPv4Address):  # ipv4 addr
                        unique_ips.add(ipaddr)
                    elif isinstance(ipaddress.ip_address(ipaddr), ipaddress.IPv6Address):  # ipv6
                        unique_ip6s.add(ipaddr)
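Here’s a standalone version of that loop you can run without amass or luigi. categorize_amass_lines is a hypothetical helper, and the second sample entry (including the 2001:db8::1 documentation-range IPv6 address) is made-up data. Unlike the Task code, it also skips blank lines and entries without an addresses key, and it stores str(ip) so IPv6 addresses come out in their normalized, compressed form.

```python
import ipaddress
import json


def categorize_amass_lines(lines):
    """Hypothetical helper: split amass JSON Lines output into
    (IPv4 addresses, IPv6 addresses, subdomains), each as a set."""
    ips, ip6s, subs = set(), set(), set()
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        entry = json.loads(line)
        if entry.get("name"):
            subs.add(entry["name"])
        for address in entry.get("addresses", []):
            ip = ipaddress.ip_address(address["ip"])  # raises ValueError on junk
            if ip.version == 4:
                ips.add(str(ip))
            else:
                ip6s.add(str(ip))
    return ips, ip6s, subs


sample = [
    '{"name": "beta-partners.tesla.com", "domain": "tesla.com", "addresses": [{"ip": "209.133.79.58"}]}',
    '{"name": "ipv6-demo.example.com", "domain": "example.com", "addresses": [{"ip": "2001:db8::1"}, {"ip": "209.133.79.58"}]}',
    "",
]
ips, ip6s, subs = categorize_amass_lines(sample)
print(sorted(ips))   # ['209.133.79.58']
print(sorted(ip6s))  # ['2001:db8::1']
print(sorted(subs))  # ['beta-partners.tesla.com', 'ipv6-demo.example.com']
```

Note how 209.133.79.58 appears in both entries but only once in the result, courtesy of the set.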
Finally, we can send our results to their respective files.
            for ip in unique_ips:
                print(ip, file=ip_out)

            for sub in unique_subs:
                print(sub, file=subdomain_out)

            for ip6 in unique_ip6s:
                print(ip6, file=ip6_out)
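These writes need to happen inside the with block, since exiting it closes the files. As I understand it, luigi’s LocalTarget.open("w") writes to a temporary file and only moves it to the final name when the file is closed, so a crash halfway through doesn’t leave a partial output that luigi would mistake for a finished Task. The stdlib sketch below shows that write-to-temp-then-rename pattern; atomic_write_lines is hypothetical (not luigi’s implementation) and sorts values only to make the output stable.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_lines(path, values):
    """Hypothetical helper: write one value per line to `path` via temp file + rename.

    The destination only appears once every line has been written.
    """
    path = Path(path)
    # temp file in the same directory so the rename stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            for value in sorted(values):
                print(value, file=tmp)
        os.replace(tmp_path, path)  # atomically swap the finished file into place
    except BaseException:
        os.unlink(tmp_path)  # don't leave the partial temp file behind
        raise


atomic_write_lines("tesla.ips", {"209.133.79.58"})
print(Path("tesla.ips").read_text(), end="")  # 209.133.79.58
```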
Here we have the final code.
@inherits(AmassScan)
class ParseAmassOutput(luigi.Task):
    """ Read amass JSON results and create categorized entries into ip|subdomain files.

    Args:
        target_file: specifies the file on disk containing a list of ips or domains *--* Required by upstream Task
        exempt_list: Path to a file providing blacklisted subdomains, one per line. *--* Optional for upstream Task
    """

    def requires(self):
        """ ParseAmassOutput depends on AmassScan to run.

        TargetList expects target_file as a parameter.
        AmassScan accepts exempt_list as an optional parameter.

        Returns:
            luigi.Task - AmassScan
        """

        args = {"target_file": self.target_file, "exempt_list": self.exempt_list}
        return AmassScan(**args)

    def output(self):
        """ Returns the target output files for this task.

        Naming conventions for the output files are:
            TARGET_FILE.ips
            TARGET_FILE.ip6s
            TARGET_FILE.subdomains

        Returns:
            dict(str: luigi.local_target.LocalTarget)
        """
        return {
            "target-ips": luigi.LocalTarget(f"{self.target_file}.ips"),
            "target-ip6s": luigi.LocalTarget(f"{self.target_file}.ip6s"),
            "target-subdomains": luigi.LocalTarget(f"{self.target_file}.subdomains"),
        }

    def run(self):
        """ Parse the json file produced by AmassScan and categorize the results into ip|subdomain files.

        An example (prettified) entry from the json file is shown below
            {
              "Timestamp": "2019-09-22T19:20:13-05:00",
              "name": "beta-partners.tesla.com",
              "domain": "tesla.com",
              "addresses": [
                {
                  "ip": "209.133.79.58",
                  "cidr": "209.133.79.0/24",
                  "asn": 394161,
                  "desc": "TESLA - Tesla"
                }
              ],
              "tag": "ext",
              "source": "Previous Enum"
            }
        """
        unique_ips = set()
        unique_ip6s = set()
        unique_subs = set()

        amass_json = self.input().open()
        ip_file = self.output().get("target-ips").open("w")
        ip6_file = self.output().get("target-ip6s").open("w")
        subdomain_file = self.output().get("target-subdomains").open("w")

        with amass_json as aj, ip_file as ip_out, ip6_file as ip6_out, subdomain_file as subdomain_out:
            for line in aj:
                entry = json.loads(line)
                unique_subs.add(entry.get("name"))

                for address in entry.get("addresses"):
                    ipaddr = address.get("ip")
                    if isinstance(ipaddress.ip_address(ipaddr), ipaddress.IPv4Address):  # ipv4 addr
                        unique_ips.add(ipaddr)
                    elif isinstance(ipaddress.ip_address(ipaddr), ipaddress.IPv6Address):  # ipv6
                        unique_ip6s.add(ipaddr)

            # send gathered results to their appropriate destination
            for ip in unique_ips:
                print(ip, file=ip_out)

            for sub in unique_subs:
                print(sub, file=subdomain_out)

            for ip6 in unique_ip6s:
                print(ip6, file=ip6_out)
In this section, we’re covering the changes we need to make in order to link the two branches. It may be easier to follow while looking at the commit’s diff on GitHub.
As discussed earlier, we have two divergent paths that our pipeline execution can take. It would be much cooler if we could execute the subdomain path and then have it feed into the ip address path (assuming we started with a domain). Fortunately, we can make that dream a reality.
This time around, making our pipeline do what we want is much less intuitive than most of the other luigi code we’ve written. Fear not! Our answer lies in luigi’s handling of dynamic dependencies. Below is an excerpt from the luigi docs.
Sometimes you might not know exactly what other tasks to depend on until runtime. In that case, Luigi provides a mechanism to specify dynamic dependencies. If you yield another Task in the Task.run method, the current task will be suspended and the other task will be run. You can also yield a list of tasks.
So, all we’ll need to do is alter masscan.Masscan a bit to dynamically run the domain path if we receive a list of domains. Recall that our domain path turns subdomains into ips, which can then be fed into masscan.Masscan. Let’s see what that looks like in practice.
First off, we’ll need to import subprocess and our amass.ParseAmassOutput class. We need subprocess because we’re going to change masscan.Masscan to inherit from luigi.Task instead of ExternalProgramTask. That means that we’ll need to handle our own execution of the masscan binary in the .run method. Additionally, we can remove from luigi.contrib.external_program import ExternalProgramTask while we’re updating the import section.
Next, we’ll need to update our inherits decorator. We need to add ParseAmassOutput to our decorator in order to include that Task’s additional Parameter (exempt_list).
After that, we’ll need to change the class from which we’re inheriting. Dynamic dependencies are yielded from a Task’s .run method, so rather than relying on the .run method that ExternalProgramTask supplies, we inherit from luigi.Task and write our own .run method.
Because our dependencies will now be yielded from .run as needed, we can actually remove the entire requires method.
With that complete, we’ll rename the program_args method to run.
At last, we’re at the real meat of specifying our dynamic dependencies. We’ll begin by yielding (running) the targets.TargetList Task. The result of the yield statement is the same as if we had called self.input() from a normal Task. We can then use the result of running targets.TargetList to determine whether or not we should run amass.ParseAmassOutput!
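To see why the value of the yield behaves like self.input(), here’s a toy, stdlib-only scheduler that drives a generator-based run method. Everything here (MiniTask, execute, and the Fake* classes) is hypothetical and deliberately simplified: outputs are plain strings rather than LocalTargets (hence endswith instead of .path.endswith), and it recurses into dependencies directly. As I understand luigi’s worker, real luigi instead suspends the Task when a yielded dependency isn’t complete yet, schedules that dependency, and later calls run again from the top, at which point the yield returns the dependency’s output immediately.

```python
import inspect


class MiniTask:
    """Hypothetical stand-in for luigi.Task: run() may yield other tasks."""

    def run(self):
        pass

    def output(self):
        raise NotImplementedError


def execute(task, log):
    """Run `task`, running any task its run() yields before resuming it."""
    gen = task.run()
    if inspect.isgenerator(gen):
        try:
            dependency = next(gen)
            while True:
                # run the dynamic dependency, then hand its output back to the
                # suspended run() as the value of the yield expression
                dependency = gen.send(execute(dependency, log))
        except StopIteration:
            pass
    log.append(type(task).__name__)
    return task.output()


class FakeTargetList(MiniTask):
    def __init__(self, target_file, is_domain):
        self.target_file, self.is_domain = target_file, is_domain

    def output(self):
        return f"{self.target_file}.{'domains' if self.is_domain else 'ips'}"


class FakeParseAmassOutput(MiniTask):
    def __init__(self, target_file):
        self.target_file = target_file

    def output(self):
        return f"{self.target_file}.ips"


class FakeMasscan(MiniTask):
    def __init__(self, target_file, is_domain):
        self.target_file, self.is_domain = target_file, is_domain
        self.scan_input = None  # the file that would be handed to masscan -iL

    def run(self):
        target_list = yield FakeTargetList(self.target_file, self.is_domain)
        if target_list.endswith("domains"):
            yield FakeParseAmassOutput(self.target_file)
        self.scan_input = target_list.replace("domains", "ips")

    def output(self):
        return f"masscan.{self.target_file}.json"


log = []
scan = FakeMasscan("tesla", is_domain=True)
execute(scan, log)
print(log)              # ['FakeTargetList', 'FakeParseAmassOutput', 'FakeMasscan']
print(scan.scan_input)  # tesla.ips
```

Starting from ip addresses instead (is_domain=False) skips the amass branch entirely, and the replace call leaves the .ips path untouched.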
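We have two more small changes to make. The first of those is that we need to change the file that is passed to masscan’s -iL option. Currently, we pass it self.input().path, which corresponds to whatever targets.TargetList would have returned as a result of running the (now deleted) .requires method. Instead, we pass target_list.path with "domains" swapped for "ips", so that masscan reads the TARGET_FILE.ips file produced by amass.ParseAmassOutput when we started with domains.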
Additionally, we need to call subprocess.run ourselves, because we no longer inherit from ExternalProgramTask.
With all of those changes in place, we’re left with a dependency graph that looks something like this, huzzah!
Here we have the final code.
import json
import pickle
import logging
import subprocess
from collections import defaultdict

import luigi
from luigi.util import inherits

from recon.targets import TargetList
from recon.amass import ParseAmassOutput
from recon.config import top_tcp_ports, top_udp_ports, masscan_config


@inherits(TargetList, ParseAmassOutput)
class Masscan(luigi.Task):
    """ Run masscan against a target specified via the TargetList Task.
    Masscan commands are structured like the example below. When specified, --top_ports is processed and
    then ultimately passed to --ports.
    masscan -v --open-only --banners --rate 1000 -e tun0 -oJ masscan.tesla.json --ports 80,443,22,21 -iL tesla.ips
    The corresponding luigi command is shown below.
    PYTHONPATH=$(pwd) luigi --local-scheduler --module recon.masscan Masscan --target-file tesla --ports 80,443,22,21
    Args:
        rate: desired rate for transmitting packets (packets per second)
        interface: use the named raw network interface, such as "eth0"
        top_ports: Scan top N most popular ports
        ports: specifies the port(s) to be scanned
        target_file: specifies the file on disk containing a list of ips or domains *--* Required by upstream Task
        exempt_list: Path to a file providing blacklisted subdomains, one per line. *--* Optional for upstream Task
    """

    rate = luigi.Parameter(default=masscan_config.get("rate"))
    interface = luigi.Parameter(default=masscan_config.get("iface"))
    top_ports = luigi.IntParameter(default=0)  # IntParameter -> top_ports expected as int
    ports = luigi.Parameter(default="")

    def __init__(self, *args, **kwargs):
        super(Masscan, self).__init__(*args, **kwargs)
        self.masscan_output = f"masscan.{self.target_file}.json"

    def output(self):
        """ Returns the target output for this task.
        Naming convention for the output file is masscan.TARGET_FILE.json.
        Returns:
            luigi.local_target.LocalTarget
        """
        return luigi.LocalTarget(self.masscan_output)

    def run(self):
        """ Build the masscan command and run it.
        If TargetList produced a .domains file, ParseAmassOutput is yielded first (a dynamic
        dependency) so that masscan scans the ip addresses amass discovered.
        """
        if self.ports and self.top_ports:
            # can't have both
            logging.error("Only --ports or --top-ports is permitted, not both.")
            exit(1)

        if not self.ports and not self.top_ports:
            # need at least one
            logging.error("Must specify either --top-ports or --ports.")
            exit(2)

        if self.top_ports < 0:
            # sanity check
            logging.error("--top-ports must be greater than 0")
            exit(3)

        if self.top_ports:
            # if --top-ports used, format the top_*_ports lists as strings and then into a proper masscan --ports option
            top_tcp_ports_str = ",".join(str(x) for x in top_tcp_ports[: self.top_ports])
            top_udp_ports_str = ",".join(str(x) for x in top_udp_ports[: self.top_ports])

            self.ports = f"{top_tcp_ports_str},U:{top_udp_ports_str}"
            self.top_ports = 0

        target_list = yield TargetList(target_file=self.target_file)

        if target_list.path.endswith("domains"):
            yield ParseAmassOutput(target_file=self.target_file, exempt_list=self.exempt_list)

        command = [
            "masscan",
            "-v",
            "--open",
            "--banners",
            "--rate",
            self.rate,
            "-e",
            self.interface,
            "-oJ",
            self.masscan_output,
            "--ports",
            self.ports,
            "-iL",
            target_list.path.replace("domains", "ips"),
        ]

        subprocess.run(command)
That wraps things up for this post. In the next installment, we’ll get started with the web scanning portion of our pipeline!