Tags: how-to, bug bounty, hack the box, python, recon, luigi
Welcome back! If you found your way here without reading the prior posts in this series, you may want to start with some of the links to previous posts (below). This post is part five of a multi-part series demonstrating how to build an automated pipeline for target reconnaissance. The target in question could be the target of a pentest, bug bounty, or capture the flag challenge (shout out to my HTB peoples!). By the end of the series, we’ll have built a functional recon pipeline that can be tailored to fit your own needs.
Previous posts:
Part V will:
Part V’s git tags:
To get the repository to the point at which we'll start, run one of the following commands. Which one you use depends on whether the repository is already present on disk.
git clone --branch stage-6 https://github.com/epi052/recon-pipeline.git
git checkout tags/stage-6
Roadmap:
In this post, we’ll add quite a few web scanning modules. This post won’t go into the same level of code detail seen in prior posts, simply due to the number of things to get through. Additionally, many of the hurdles to implementation are ones we’ve seen already. I don’t see much value in rehashing what’s been covered already, though we will spend time where new problems arise.
Before we can scan any targets, we’ll need to figure out which targets have a web server up and listening. We’ll also need to define what constitutes a web server port as far as our pipeline is concerned. Let’s get to it.
To define web ports, we’ll simply add a line to config.py
that we can import later. I’ve chosen the following five ports to be considered “web” ports for scanning purposes. Feel free to tailor this to your needs, as it’s certainly not an exhaustive list.
web_ports = {'80', '443', '8080', '8000', '8443'}
With our web ports defined, let’s take a brief moment to consider project layout. Because we’re going to have quite a few web scanning modules, now’s a great time to organize them. Let’s add a submodule to keep things organized. While we’re at it, we can add our targets.py
file that will house the logic to gather up our web targets.
mkdir web
touch web/__init__.py
touch web/targets.py
Our project directory should now look like what’s shown below.
recon-pipeline/
├── LICENSE
├── Pipfile
├── Pipfile.lock
├── README.md
└── recon
├── amass.py
├── config.py
├── __init__.py
├── masscan.py
├── nmap.py
├── targets.py
└── web
├── __init__.py
└── targets.py
Nice, that was easy enough.
Now, we can start looking at how to gather all of our targets that have at least one of our chosen web ports listening.
@inherits(ParseMasscanOutput, ParseAmassOutput)
class GatherWebTargets(luigi.Task):
Our new class expects masscan
and amass
to have run already and will consume their output. Our new task will load the results from amass
and masscan
, parse each of them for targets that have one of the web ports open, and then add any matches to a set
.
    def run(self):
        """ Gather all potential web targets into a single file to pass farther down the pipeline. """
        targets = set()

        ip_dict = pickle.load(open(self.input().get("masscan-output").path, "rb"))

        """
        structure over which we're looping
        {
            "IP_ADDRESS":
                {'udp': {"161", "5000", ... },
                ...
            i.e. {protocol: set(ports) }
        }
        """
        for target, protocol_dict in ip_dict.items():
            for protocol, ports in protocol_dict.items():
                for port in ports:
                    if protocol == 'udp':
                        continue
                    if port == "80":
                        targets.add(target)
                    elif port in web_ports:
                        targets.add(f"{target}:{port}")

        for amass_result in self.input().get("amass-output").values():
            with amass_result.open() as f:
                for target in f:
                    # we care about all results returned from amass
                    targets.add(target.strip())
If we find port 80 in masscan’s results, we add the target by itself (80 is the default port for a lot of the tooling we’ll include later). On the other hand, if we find any other web port, we store the target along with the port. Other than the decision behind that bit of logic, there’s not much going on here. Once we’ve parsed all the valid web targets, we’ll write them to disk in a flat text file.
Here we have the complete code for gathering web targets.
import pickle

import luigi
from luigi.util import inherits

from recon.config import web_ports
from recon.amass import ParseAmassOutput
from recon.masscan import ParseMasscanOutput


@inherits(ParseMasscanOutput, ParseAmassOutput)
class GatherWebTargets(luigi.Task):
    """ Gather all subdomains as well as any ip addresses known to have a configured web port open.

    Args:
        exempt_list: Path to a file providing blacklisted subdomains, one per line. *--* Optional for upstream Task
        top_ports: Scan top N most popular ports *--* Required by upstream Task
        ports: specifies the port(s) to be scanned *--* Required by upstream Task
        interface: use the named raw network interface, such as "eth0" *--* Required by upstream Task
        rate: desired rate for transmitting packets (packets per second) *--* Required by upstream Task
        target_file: specifies the file on disk containing a list of ips or domains *--* Required by upstream Task
    """

    def requires(self):
        """ GatherWebTargets depends on ParseMasscanOutput and ParseAmassOutput to run.

        ParseMasscanOutput expects rate, target_file, interface, and either ports or top_ports as parameters.
        ParseAmassOutput accepts exempt_list and expects target_file

        Returns:
            dict(str: ParseMasscanOutput, str: ParseAmassOutput)
        """
        args = {
            "rate": self.rate,
            "target_file": self.target_file,
            "top_ports": self.top_ports,
            "interface": self.interface,
            "ports": self.ports,
        }
        return {
            "masscan-output": ParseMasscanOutput(**args),
            "amass-output": ParseAmassOutput(
                exempt_list=self.exempt_list, target_file=self.target_file
            ),
        }

    def output(self):
        """ Returns the target output for this task.

        Naming convention for the output file is webtargets.TARGET_FILE.txt.

        Returns:
            luigi.local_target.LocalTarget
        """
        return luigi.LocalTarget(f"webtargets.{self.target_file}.txt")

    def run(self):
        """ Gather all potential web targets into a single file to pass farther down the pipeline. """
        targets = set()

        ip_dict = pickle.load(open(self.input().get("masscan-output").path, "rb"))

        """
        structure over which we're looping
        {
            "IP_ADDRESS":
                {'udp': {"161", "5000", ... },
                ...
            i.e. {protocol: set(ports) }
        }
        """
        for target, protocol_dict in ip_dict.items():
            for protocol, ports in protocol_dict.items():
                for port in ports:
                    if protocol == "udp":
                        continue
                    if port == "80":
                        targets.add(target)
                    elif port in web_ports:
                        targets.add(f"{target}:{port}")

        for amass_result in self.input().get("amass-output").values():
            with amass_result.open() as f:
                for target in f:
                    # we care about all results returned from amass
                    targets.add(target.strip())

        with self.output().open("w") as f:
            for target in targets:
                f.write(f"{target}\n")
When scanning large swaths of target space, automating screenshots of each live site saves an incredible amount of time. My tool of choice for this task is Michael Henriksen’s aquatone. I really like how the screenshots are grouped together based on similarity; it makes it even easier to quickly parse targets and select what looks good. According to the repository’s README:
Aquatone is a tool for visual inspection of websites across a large amount of hosts and is convenient for quickly gaining an overview of HTTP-based attack surface.
If you don’t have it already, go over to aquatone’s github page and follow their install instructions.
First, we’ll take a moment to figure out what we want our scans to do. There are quite a few options for aquatone
, but we don’t need many in order to accomplish our goal. A run of aquatone
against tesla.com would look something like what’s below.
cat webtargets.tesla.txt | aquatone -scan-timeout 900 -threads 20
aquatone options used:
-scan-timeout
Timeout in milliseconds for port scans (default 100)
-threads
Number of concurrent threads (default number of logical CPUs)
Notice that aquatone takes input via STDIN; that’s something we’ll have to address. Also, you may be curious why we care about scan-timeout. I’ve experienced the same behavior described in this issue and adjusted accordingly. Since then, I haven’t seen any issues, so I’ve chosen to include the value by default.
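Before wiring this into a Task, it helps to see how a file can be redirected to a child process's STDIN with subprocess.run's stdin keyword. A minimal stdlib-only sketch, with `cat` standing in for aquatone (the targets are made up for illustration):

```python
import subprocess
import tempfile

# Write a couple of fake web targets to a file, then hand the open file
# object to subprocess.run via the stdin keyword argument. This mirrors
# `cat webtargets.txt | aquatone ...` without needing a shell pipe.
with tempfile.NamedTemporaryFile(mode="w+", suffix=".txt") as f:
    f.write("example.com\nexample.com:8443\n")
    f.seek(0)  # rewind (and flush) so the child reads from the beginning
    result = subprocess.run(["cat"], stdin=f, capture_output=True, text=True)

print(result.stdout)
```

This is the same shape our AquatoneScan will use: open the upstream target file, then pass the open file object as `stdin`.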
The first thing we’ll want to do when constructing our new class is to account for the options we expect to be able to send. Specifically, we want to include Parameters for threads and scan-timeout. It may seem like we should already have threads available to us, but we can reference the Task Dependency Graph below to see that none of the Tasks we inherit from use a threads Parameter.
We can also use this opportunity to begin storing values for some of the oft-used (and rarely changed) Parameters, creating a set of defaults. Additionally, we can add the paths to our tools, since many of them are going to live outside of our PATH. Let’s go ahead and modify config.py to include our changes.
defaults = {
    'threads': '10',
    'aquatone-scan-timeout': '900'
}

web_ports = {'80', '443', '8080', '8000', '8443'}

tool_paths = {
    'aquatone': '/opt/aquatone'
}
After that, we can define the start of our AquatoneScan
class.
@inherits(GatherWebTargets)
class AquatoneScan(luigi.Task):
    threads = luigi.Parameter(default=defaults.get("threads", ""))
    scan_timeout = luigi.Parameter(default=defaults.get("aquatone-scan-timeout", ""))
We should also go back and change the other classes that use a threads Parameter to make use of the new default at some point.
As seen above, we inherit from luigi.Task; managing the subprocess ourselves is what lets us redirect our list of targets to the process’s STDIN.
    def run(self):
        Path(self.output().path).mkdir(parents=True, exist_ok=True)

        command = [
            tool_paths.get("aquatone"),
            "-scan-timeout",
            self.scan_timeout,
            "-threads",
            self.threads,
            "-silent",
            "-out",
            self.output().path,
        ]

        with self.input().open() as target_list:
            subprocess.run(command, stdin=target_list)
Above is our first use of the tool_paths configuration dictionary. It saves us from hardcoding paths and adds some flexibility. We’re using subprocess.run’s stdin keyword argument to redirect our list of targets to the subprocess’s STDIN. We didn’t discuss -silent and -out, though their functions should be pretty clear.
After all is said and done, we’re left with the code below.
import subprocess
from pathlib import Path

import luigi
from luigi.util import inherits

from recon.config import tool_paths, defaults
from recon.web.targets import GatherWebTargets


@inherits(GatherWebTargets)
class AquatoneScan(luigi.Task):
    """ Screenshot all web targets and generate HTML report.

    aquatone commands are structured like the example below.

    cat webtargets.tesla.txt | /opt/aquatone -scan-timeout 900 -threads 20

    An example of the corresponding luigi command is shown below.

    PYTHONPATH=$(pwd) luigi --local-scheduler --module recon.web.aquatone AquatoneScan --target-file tesla --top-ports 1000

    Args:
        threads: number of threads for parallel aquatone command execution
        scan_timeout: timeout in milliseconds for aquatone port scans
        exempt_list: Path to a file providing blacklisted subdomains, one per line. *--* Optional for upstream Task
        top_ports: Scan top N most popular ports *--* Required by upstream Task
        ports: specifies the port(s) to be scanned *--* Required by upstream Task
        interface: use the named raw network interface, such as "eth0" *--* Required by upstream Task
        rate: desired rate for transmitting packets (packets per second) *--* Required by upstream Task
        target_file: specifies the file on disk containing a list of ips or domains *--* Required by upstream Task
    """

    threads = luigi.Parameter(default=defaults.get("threads", ""))
    scan_timeout = luigi.Parameter(default=defaults.get("aquatone-scan-timeout", ""))

    def requires(self):
        """ AquatoneScan depends on GatherWebTargets to run.

        GatherWebTargets accepts exempt_list and expects rate, target_file, interface,
        and either ports or top_ports as parameters

        Returns:
            luigi.Task - GatherWebTargets
        """
        args = {
            "rate": self.rate,
            "target_file": self.target_file,
            "top_ports": self.top_ports,
            "interface": self.interface,
            "ports": self.ports,
            "exempt_list": self.exempt_list,
        }
        return GatherWebTargets(**args)

    def output(self):
        """ Returns the target output for this task.

        Naming convention for the output directory is aquatone-TARGET_FILE-results.

        Returns:
            luigi.local_target.LocalTarget
        """
        return luigi.LocalTarget(f"aquatone-{self.target_file}-results")

    def run(self):
        """ Defines the options/arguments sent to aquatone and runs it.

        cat webtargets.tesla.txt | /opt/aquatone -scan-timeout 900 -threads 20
        """
        Path(self.output().path).mkdir(parents=True, exist_ok=True)

        command = [
            tool_paths.get("aquatone"),
            "-scan-timeout",
            self.scan_timeout,
            "-threads",
            self.threads,
            "-silent",
            "-out",
            self.output().path,
        ]

        with self.input().open() as target_list:
            subprocess.run(command, stdin=target_list)
In this section, we’ll add another low-hanging-fruit type of scan. To accomplish that, we’ll actually include two tools: tko-subs
and subjack
.
We’ll need to install both tools and add the paths to our config.py
. If you don’t have them already, go over to tko-subs’s and subjack’s github pages and follow their install instructions.
We’ll preemptively include tko-subs-dir and subjack-fingerprints because we’ll need those paths to properly run their associated tools later on.
tool_paths = {
    'aquatone': '/opt/aquatone',
    'tko-subs': '/root/go/bin/tko-subs',
    'tko-subs-dir': '/root/go/src/github.com/anshumanbh/tko-subs',
    'subjack': '/root/go/bin/subjack',
    'subjack-fingerprints': '/root/go/src/github.com/haccer/subjack/fingerprints.json'
}
As usual, let’s take a look at the options we’ll be using for tko-subs
.
tko-subs -domains=tesla.subdomains -data=/root/go/src/github.com/anshumanbh/tko-subs/providers-data.csv -output=tkosubs.tesla.csv
tko-subs options used:
-domains
path to the file containing domains to check
-data
path to the data csv
-output
path to output file (csv)
Because we’re simply running a command, all we need to do is return a list from TKOSubsScan.program_args
.
    def program_args(self):
        command = [
            tool_paths.get("tko-subs"),
            f"-domains={self.input().path}",
            f"-data={tool_paths.get('tko-subs-dir')}/providers-data.csv",
            f"-output={self.output().path}",
        ]

        return command
That’s it!
Again, we’ll start off with the shape of the command we intend to run.
subjack -w webtargets.tesla.txt -t 20 -timeout 30 -o subjack.tesla.txt -ssl -v -c /root/go/src/github.com/haccer/subjack/fingerprints.json -a
subjack options used:
-w
list of subdomains
-t
number of threads (default: 10)
-timeout
seconds to wait before timeout connection (default: 10)
-o
where to save results
-ssl
enforces HTTPS requests which may return a different set of results/increase accuracy
-v
display more information per each request
-c
path to configuration file
-a
skips CNAME check and sends requests to every URL
Just like tko-subs
, we’re simply adding a command to our pipeline. There’s nothing special in the code that we haven’t covered before.
And here we have the implementation of our subdomain_takeover module.
import luigi
from luigi.util import inherits
from luigi.contrib.external_program import ExternalProgramTask

from recon.config import tool_paths, defaults
from recon.web.targets import GatherWebTargets


@inherits(GatherWebTargets)
class TKOSubsScan(ExternalProgramTask):
    """ Use tko-subs to scan for potential subdomain takeovers.

    tko-subs commands are structured like the example below.

    tko-subs -domains=tesla.subdomains -data=/root/go/src/github.com/anshumanbh/tko-subs/providers-data.csv -output=tkosubs.tesla.csv

    An example of the corresponding luigi command is shown below.

    PYTHONPATH=$(pwd) luigi --local-scheduler --module recon.web.subdomain_takeover TKOSubsScan --target-file tesla --top-ports 1000 --interface eth0

    Args:
        exempt_list: Path to a file providing blacklisted subdomains, one per line. *--* Optional for upstream Task
        top_ports: Scan top N most popular ports *--* Required by upstream Task
        ports: specifies the port(s) to be scanned *--* Required by upstream Task
        interface: use the named raw network interface, such as "eth0" *--* Required by upstream Task
        rate: desired rate for transmitting packets (packets per second) *--* Required by upstream Task
        target_file: specifies the file on disk containing a list of ips or domains *--* Required by upstream Task
    """

    def requires(self):
        """ TKOSubsScan depends on GatherWebTargets to run.

        GatherWebTargets accepts exempt_list and expects rate, target_file, interface,
        and either ports or top_ports as parameters

        Returns:
            luigi.Task - GatherWebTargets
        """
        args = {
            "rate": self.rate,
            "target_file": self.target_file,
            "top_ports": self.top_ports,
            "interface": self.interface,
            "ports": self.ports,
            "exempt_list": self.exempt_list,
        }
        return GatherWebTargets(**args)

    def output(self):
        """ Returns the target output for this task.

        Naming convention for the output file is tkosubs.TARGET_FILE.csv.

        Returns:
            luigi.local_target.LocalTarget
        """
        return luigi.LocalTarget(f"tkosubs.{self.target_file}.csv")

    def program_args(self):
        """ Defines the options/arguments sent to tko-subs after processing.

        Returns:
            list: list of options/arguments, beginning with the name of the executable to run
        """

        command = [
            tool_paths.get("tko-subs"),
            f"-domains={self.input().path}",
            f"-data={tool_paths.get('tko-subs-dir')}/providers-data.csv",
            f"-output={self.output().path}",
        ]

        return command


@inherits(GatherWebTargets)
class SubjackScan(ExternalProgramTask):
    """ Use subjack to scan for potential subdomain takeovers.

    subjack commands are structured like the example below.

    subjack -w webtargets.tesla.txt -t 100 -timeout 30 -o subjack.tesla.txt -ssl

    An example of the corresponding luigi command is shown below.

    PYTHONPATH=$(pwd) luigi --local-scheduler --module recon.web.subdomain_takeover SubjackScan --target-file tesla --top-ports 1000 --interface eth0

    Args:
        threads: number of threads for parallel subjack command execution
        exempt_list: Path to a file providing blacklisted subdomains, one per line. *--* Optional for upstream Task
        top_ports: Scan top N most popular ports *--* Required by upstream Task
        ports: specifies the port(s) to be scanned *--* Required by upstream Task
        interface: use the named raw network interface, such as "eth0" *--* Required by upstream Task
        rate: desired rate for transmitting packets (packets per second) *--* Required by upstream Task
        target_file: specifies the file on disk containing a list of ips or domains *--* Required by upstream Task
    """

    threads = luigi.Parameter(default=defaults.get("threads", ""))

    def requires(self):
        """ SubjackScan depends on GatherWebTargets to run.

        GatherWebTargets accepts exempt_list and expects rate, target_file, interface,
        and either ports or top_ports as parameters

        Returns:
            luigi.Task - GatherWebTargets
        """
        args = {
            "rate": self.rate,
            "target_file": self.target_file,
            "top_ports": self.top_ports,
            "interface": self.interface,
            "ports": self.ports,
            "exempt_list": self.exempt_list,
        }
        return GatherWebTargets(**args)

    def output(self):
        """ Returns the target output for this task.

        Naming convention for the output file is subjack.TARGET_FILE.txt.

        Returns:
            luigi.local_target.LocalTarget
        """
        return luigi.LocalTarget(f"subjack.{self.target_file}.txt")

    def program_args(self):
        """ Defines the options/arguments sent to subjack after processing.

        Returns:
            list: list of options/arguments, beginning with the name of the executable to run
        """

        command = [
            tool_paths.get("subjack"),
            "-w",
            self.input().path,
            "-t",
            self.threads,
            "-a",
            "-timeout",
            "30",
            "-o",
            self.output().path,
            "-v",
            "-ssl",
            "-c",
            tool_paths.get("subjack-fingerprints"),
        ]

        return command
Continuing with our simple scans, we’ll add a CORS scanner to the mix. The CORS scanner we’ll use is CORScanner.
Similar to our other tools, we’ll add the path to CORScanner to our config.py (assuming it’s already been downloaded from GitHub).
tool_paths = {
    'aquatone': '/opt/aquatone',
    'tko-subs': '/root/go/bin/tko-subs',
    'tko-subs-dir': '/root/go/src/github.com/anshumanbh/tko-subs',
    'subjack': '/root/go/bin/subjack',
    'subjack-fingerprints': '/root/go/src/github.com/haccer/subjack/fingerprints.json',
    'CORScanner': '/opt/CORScanner/cors_scan.py',
}
Once again, we’ll figure out how we prefer to run the tool.
python3 cors_scan.py -i webtargets.tesla.txt -t 20 -o corscanner.tesla.json
cors_scan.py options used:
-i
URL/domain list file to check their CORS policy
-t
Number of threads to use for CORS scan
-o
Save the results to json file
Similar to many of the web scans, the integration into the pipeline is dead simple.
    def program_args(self):
        command = [
            "python3",
            tool_paths.get("CORScanner"),
            "-i",
            self.input().path,
            "-t",
            self.threads,
            "-o",
            self.output().path,
        ]

        return command
import luigi
from luigi.util import inherits
from luigi.contrib.external_program import ExternalProgramTask

from recon.config import tool_paths, defaults
from recon.web.targets import GatherWebTargets


@inherits(GatherWebTargets)
class CORScannerScan(ExternalProgramTask):
    """ Use CORScanner to scan for potential CORS misconfigurations.

    CORScanner commands are structured like the example below.

    python cors_scan.py -i webtargets.tesla.txt -t 100

    An example of the corresponding luigi command is shown below.

    PYTHONPATH=$(pwd) luigi --local-scheduler --module recon.web.corscanner CORScannerScan --target-file tesla --top-ports 1000 --interface eth0

    Install:
        git clone https://github.com/chenjj/CORScanner.git
        cd CORScanner
        pip install -r requirements.txt
        pip install future

    Args:
        threads: number of threads for parallel CORScanner command execution
        exempt_list: Path to a file providing blacklisted subdomains, one per line. *--* Optional for upstream Task
        top_ports: Scan top N most popular ports *--* Required by upstream Task
        ports: specifies the port(s) to be scanned *--* Required by upstream Task
        interface: use the named raw network interface, such as "eth0" *--* Required by upstream Task
        rate: desired rate for transmitting packets (packets per second) *--* Required by upstream Task
        target_file: specifies the file on disk containing a list of ips or domains *--* Required by upstream Task
    """

    threads = luigi.Parameter(default=defaults.get("threads", ""))

    def requires(self):
        """ CORScannerScan depends on GatherWebTargets to run.

        GatherWebTargets accepts exempt_list and expects rate, target_file, interface,
        and either ports or top_ports as parameters

        Returns:
            luigi.Task - GatherWebTargets
        """
        args = {
            "rate": self.rate,
            "target_file": self.target_file,
            "top_ports": self.top_ports,
            "interface": self.interface,
            "ports": self.ports,
            "exempt_list": self.exempt_list,
        }
        return GatherWebTargets(**args)

    def output(self):
        """ Returns the target output for this task.

        Naming convention for the output file is corscanner.TARGET_FILE.json.

        Returns:
            luigi.local_target.LocalTarget
        """
        return luigi.LocalTarget(f"corscanner.{self.target_file}.json")

    def program_args(self):
        """ Defines the options/arguments sent to CORScanner after processing.

        Returns:
            list: list of options/arguments, beginning with the name of the executable to run
        """

        command = [
            "python3",
            tool_paths.get("CORScanner"),
            "-i",
            self.input().path,
            "-t",
            self.threads,
            "-o",
            self.output().path,
        ]

        return command
Forced browsing is the next step in our pipeline. Everyone has their favorite tool for this task, and I’m no different. My favorite is easily gobuster. A while back, I wrote a recursive wrapper around gobuster as well. We’ll use both tools here to allow us the option of performing recursive forced browsing if we choose.
Don’t forget to update config.py
along with grabbing gobuster and recursive-gobuster.
tool_paths = {
    -------------8<-------------
    'gobuster': '/usr/local/go/bin/gobuster',
    'recursive-gobuster': '/usr/local/bin/recursive-gobuster.pyz',
}
Here’s how the default gobuster command will look when we run it.
gobuster dir -q -e -k -t 20 -u www.tesla.com -w /usr/share/seclists/Discovery/Web-Content/common.txt -o gobuster.tesla.txt
gobuster dir options used:
-q
Don't print the banner and other noise
-e
Expanded mode, print full URLs
-k
Skip SSL certificate verification
-t
Number of concurrent threads (default 10)
-u
The target URL
-w
Path to the wordlist
-o
Output file to write results to
Even though the default is outlined above, we’ll want a few additional options for certain situations. We also need to be able to configure the values sent to some of our default options. In order to allow that additional customization of commands, we’ll add some new Parameters.
@inherits(GatherWebTargets)
class GobusterScan(luigi.Task):
    proxy = luigi.Parameter(default=defaults.get("proxy", ""))
    threads = luigi.Parameter(default=defaults.get("threads", ""))
    wordlist = luigi.Parameter(default=defaults.get("gobuster-wordlist", ""))
    extensions = luigi.Parameter(default=defaults.get("gobuster-extensions", ""))
    recursive = luigi.BoolParameter(default=False)
There are a few issues we need to solve with running gobuster. First, we’ll need to handle IPv6 addresses, which can only be browsed to when they’re enclosed within square brackets. Here’s what that looks like in code.
                try:
                    if isinstance(ipaddress.ip_address(target), ipaddress.IPv6Address):  # ipv6
                        target = f"[{target}]"
                except ValueError:
                    # domain names raise ValueErrors, just assume we have a domain and keep on keepin on
                    pass
Next, we’ll have two different branches of logic to build the base command list. The branch depends upon whether we’ve specified --recursive
or not.
                if self.recursive:
                    command = [
                        tool_paths.get("recursive-gobuster"),
                        "-w",
                        self.wordlist,
                        f"{url_scheme}{target}",
                    ]
                else:
                    command = [
                        tool_paths.get("gobuster"),
                        "dir",
                        "-q",
                        "-e",
                        "-k",
                        "-u",
                        f"{url_scheme}{target}",
                        "-w",
                        self.wordlist,
                        "-o",
                        Path(self.output().path).joinpath(
                            f"gobuster.{url_scheme.replace('//', '_').replace(':', '')}{target}.txt"
                        ),
                    ]
The rest of the code is more or less straightforward and can be seen in the next section.
1import os
2import logging
3import ipaddress
4import subprocess
5from pathlib import Path
6from concurrent.futures import ThreadPoolExecutor
7
8import luigi
9from luigi.util import inherits
10
11from recon.config import tool_paths, defaults
12from recon.web.targets import GatherWebTargets
13
14
15@inherits(GatherWebTargets)
16class GobusterScan(luigi.Task):
17 """ Use gobuster to perform forced browsing.
18
19 gobuster commands are structured like the example below.
20
21 gobuster dir -q -e -k -t 20 -u www.tesla.com -w /usr/share/seclists/Discovery/Web-Content/common.txt -p http://127.0.0.1:8080 -o gobuster.tesla.txt -x php,html
22
23 An example of the corresponding luigi command is shown below.
24
25 PYTHONPATH=$(pwd) luigi --local-scheduler --module recon.web.gobuster GobusterScan --target-file tesla --top-ports 1000 \
26 --interface eth0 --proxy http://127.0.0.1:8080 --extensions php,html \
27 --wordlist /usr/share/seclists/Discovery/Web-Content/common.txt --threads 20
28
29 Install:
30 go get github.com/OJ/gobuster
31 git clone https://github.com/epi052/recursive-gobuster.git
32
33 Args:
34 threads: number of threads for parallel gobuster command execution
35 wordlist: wordlist used for forced browsing
36 extensions: additional extensions to apply to each item in the wordlist
37 recursive: whether or not to recursively gobust the target (may produce a LOT of traffic... quickly)
38 exempt_list: Path to a file providing blacklisted subdomains, one per line. *--* Optional for upstream Task
39 top_ports: Scan top N most popular ports *--* Required by upstream Task
40 ports: specifies the port(s) to be scanned *--* Required by upstream Task
41 interface: use the named raw network interface, such as "eth0" *--* Required by upstream Task
42 rate: desired rate for transmitting packets (packets per second) *--* Required by upstream Task
43 target_file: specifies the file on disk containing a list of ips or domains *--* Required by upstream Task
44 """
45
46 proxy = luigi.Parameter(default=defaults.get("proxy", ""))
47 threads = luigi.Parameter(default=defaults.get("threads", ""))
48 wordlist = luigi.Parameter(default=defaults.get("gobuster-wordlist", ""))
extensions = luigi.Parameter(default=defaults.get("gobuster-extensions", ""))
recursive = luigi.BoolParameter(default=False)

def requires(self):
    """ GobusterScan depends on GatherWebTargets to run.

    GatherWebTargets accepts exempt_list and expects rate, target_file, interface,
    and either ports or top_ports as parameters

    Returns:
        luigi.Task - GatherWebTargets
    """
    args = {
        "rate": self.rate,
        "target_file": self.target_file,
        "top_ports": self.top_ports,
        "interface": self.interface,
        "ports": self.ports,
        "exempt_list": self.exempt_list,
    }
    return GatherWebTargets(**args)

def output(self):
    """ Returns the target output for this task.

    If recursion is disabled, the naming convention for the output file is gobuster.TARGET_FILE.txt
    Otherwise the output file is recursive-gobuster_TARGET_FILE.log

    Results are stored in their own directory: gobuster-TARGET_FILE-results

    Returns:
        luigi.local_target.LocalTarget
    """
    return luigi.LocalTarget(f"gobuster-{self.target_file}-results")

def run(self):
    """ Defines the options/arguments sent to gobuster after processing, then executes the scans. """
    try:
        self.threads = abs(int(self.threads))
    except (TypeError, ValueError):
        # int() raises ValueError on non-numeric strings and TypeError on None
        return logging.error("The value supplied to --threads must be a non-negative integer.")

    commands = list()

    with self.input().open() as f:
        for target in f:
            target = target.strip()

            try:
                if isinstance(ipaddress.ip_address(target), ipaddress.IPv6Address):  # ipv6
                    target = f"[{target}]"
            except ValueError:
                # domain names raise ValueError, just assume we have a domain and keep on keepin on
                pass

            for url_scheme in ("https://", "http://"):
                if self.recursive:
                    command = [
                        tool_paths.get("recursive-gobuster"),
                        "-w",
                        self.wordlist,
                        f"{url_scheme}{target}",
                    ]
                else:
                    command = [
                        tool_paths.get("gobuster"),
                        "dir",
                        "-q",
                        "-e",
                        "-k",
                        "-u",
                        f"{url_scheme}{target}",
                        "-w",
                        self.wordlist,
                        "-o",
                        Path(self.output().path).joinpath(
                            f"gobuster.{url_scheme.replace('//', '_').replace(':', '')}{target}.txt"
                        ),
                    ]

                if self.extensions:
                    command.extend(["-x", self.extensions])

                if self.proxy:
                    command.extend(["-p", self.proxy])

                commands.append(command)

    Path(self.output().path).mkdir(parents=True, exist_ok=True)

    if self.recursive:
        # workaround for recursive-gobuster not accepting an output directory
        cwd = Path().cwd()
        os.chdir(self.output().path)

    with ThreadPoolExecutor(max_workers=self.threads) as executor:
        executor.map(subprocess.run, commands)

    if self.recursive:
        os.chdir(str(cwd))
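The target normalization at the top of the loop (wrapping bare IPv6 addresses in brackets so they're valid inside URLs) is worth seeing in isolation. A minimal sketch; `normalize_target` is a hypothetical helper name used only for illustration:

```python
import ipaddress


def normalize_target(target: str) -> str:
    """Wrap bare IPv6 addresses in brackets so they can be embedded in URLs."""
    try:
        if isinstance(ipaddress.ip_address(target), ipaddress.IPv6Address):
            return f"[{target}]"
    except ValueError:
        pass  # not an IP address at all; assume it's a domain name
    return target


print(normalize_target("::1"))        # IPv6 gets brackets
print(normalize_target("10.0.0.1"))   # IPv4 passes through
print(normalize_target("tesla.com"))  # domains pass through
```

Without the brackets, `https://::1/` is ambiguous to URL parsers, which is why gobuster (and later webanalyze) would choke on bare IPv6 targets.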
Our final scan will focus on identifying the technologies used within a webapp: the CMS, backend framework, and so on. I suspect most folks (myself included) use wappalyzer for this when manually browsing sites. Since webanalyze is a port of wappalyzer written in Go, it seems an obvious choice for the task.
We’ll update our config.py
one last time after installing webanalyze.
tool_paths = {
    'aquatone': '/opt/aquatone',
    'tko-subs': '/root/go/bin/tko-subs',
    'tko-subs-dir': '/root/go/src/github.com/anshumanbh/tko-subs',
    'subjack': '/root/go/bin/subjack',
    'subjack-fingerprints': '/root/go/src/github.com/haccer/subjack/fingerprints.json',
    'CORScanner': '/opt/CORScanner/cors_scan.py',
    'gobuster': '/usr/local/go/bin/gobuster',
    'recursive-gobuster': '/usr/local/bin/recursive-gobuster.pyz',
    'webanalyze': '/root/go/bin/webanalyze'
}
There are two webanalyze commands we’ll be running.
First, the command that updates apps.json
, which is the file that webanalyze uses for signatures and relationships.
webanalyze -update
webanalyze options used:
-update
downloads a current version of apps.json from the wappalyzer repository to the current folder.
Second, the webanalyze command itself.
webanalyze -host https://tesla.com
webanalyze options used:
-host
single host to test
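Because `-update` writes apps.json to the current working directory, we only want to run it when the signature file is actually missing from our results directory. A minimal sketch of that check, which our `run()` method will perform before scanning; `needs_update` is a hypothetical helper name:

```python
from pathlib import Path


def needs_update(results_dir: Path) -> bool:
    """Return True when apps.json is absent from the given directory."""
    return not (results_dir / "apps.json").exists()
```

This keeps repeated pipeline runs from re-downloading signatures on every invocation.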
A quirk of webanalyze is that it only prints results to stderr, so we'll need to capture the results and store them in a file manually. The code to do that is below; it also normalizes the URLs into easier-to-manage filenames.
def _wrapped_subprocess(self, cmd):
    with open(f"webanalyze.{cmd[2].replace('//', '_').replace(':', '')}.txt", "wb") as f:
        subprocess.run(cmd, stderr=f)
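The filename mangling above flattens a URL (`cmd[2]` is the `-host` value) into something the filesystem is happy with. Pulled out as a standalone sketch; `result_filename` is a hypothetical name used only for illustration:

```python
def result_filename(url: str) -> str:
    """Flatten a URL into a filesystem-friendly webanalyze result filename."""
    # 'https://tesla.com' -> 'https:_tesla.com' -> 'https_tesla.com'
    return f"webanalyze.{url.replace('//', '_').replace(':', '')}.txt"


print(result_filename("https://tesla.com"))
print(result_filename("http://tesla.com"))
```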
The wrapper for subprocess.run above is called using our standard template for adding threading to a scan (shown below).
with ThreadPoolExecutor(max_workers=self.threads) as executor:
    executor.map(self._wrapped_subprocess, commands)
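Stripped of the scan-specific details, the threading template is just mapping a wrapper function over a list of commands. A minimal runnable sketch, with `echo` standing in for the real scanner binary:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# placeholder commands; in the pipeline these are built from the target list
commands = [["echo", "one"], ["echo", "two"], ["echo", "three"]]


def wrapped(cmd):
    # run one command; here we capture stdout instead of redirecting to a file
    return subprocess.run(cmd, capture_output=True)


with ThreadPoolExecutor(max_workers=2) as executor:
    # map schedules every command across the worker threads;
    # results come back in the same order as the input list
    results = list(executor.map(wrapped, commands))
```

The `with` block doesn't exit until every command has finished, which is what lets luigi treat the whole batch as one task.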
Other than that, all of the issues with running this command are problems we’ve overcome already elsewhere. So, without further ado, the finalized code!
import os
import logging
import ipaddress
import subprocess
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor

import luigi
from luigi.util import inherits

from recon.config import tool_paths, defaults
from recon.web.targets import GatherWebTargets


@inherits(GatherWebTargets)
class WebanalyzeScan(luigi.Task):
    """ Use webanalyze to determine the technology stack on the given target(s).

    webanalyze commands are structured like the example below.

    webanalyze -host www.tesla.com -output json

    An example of the corresponding luigi command is shown below.

    PYTHONPATH=$(pwd) luigi --local-scheduler --module recon.web.webanalyze WebanalyzeScan --target-file tesla --top-ports 1000 --interface eth0

    Install:

        go get -u github.com/rverton/webanalyze

        # loads new apps.json file from wappalyzer project
        webanalyze -update

    Args:
        threads: number of threads for parallel webanalyze command execution
        exempt_list: Path to a file providing blacklisted subdomains, one per line. *--* Optional for upstream Task
        top_ports: Scan top N most popular ports *--* Required by upstream Task
        ports: specifies the port(s) to be scanned *--* Required by upstream Task
        interface: use the named raw network interface, such as "eth0" *--* Required by upstream Task
        rate: desired rate for transmitting packets (packets per second) *--* Required by upstream Task
        target_file: specifies the file on disk containing a list of ips or domains *--* Required by upstream Task
    """

    threads = luigi.Parameter(default=defaults.get("threads", ""))

    def requires(self):
        """ WebanalyzeScan depends on GatherWebTargets to run.

        GatherWebTargets accepts exempt_list and expects rate, target_file, interface,
        and either ports or top_ports as parameters

        Returns:
            luigi.Task - GatherWebTargets
        """
        args = {
            "rate": self.rate,
            "target_file": self.target_file,
            "top_ports": self.top_ports,
            "interface": self.interface,
            "ports": self.ports,
            "exempt_list": self.exempt_list,
        }
        return GatherWebTargets(**args)

    def output(self):
        """ Returns the target output for this task.

        The naming convention for the output file is webanalyze.TARGET_FILE.txt

        Results are stored in their own directory: webanalyze-TARGET_FILE-results

        Returns:
            luigi.local_target.LocalTarget
        """
        return luigi.LocalTarget(f"webanalyze-{self.target_file}-results")

    def _wrapped_subprocess(self, cmd):
        with open(f"webanalyze.{cmd[2].replace('//', '_').replace(':', '')}.txt", "wb") as f:
            subprocess.run(cmd, stderr=f)

    def run(self):
        """ Defines the options/arguments sent to webanalyze after processing, then executes the scans. """
        try:
            self.threads = abs(int(self.threads))
        except (TypeError, ValueError):
            # int() raises ValueError on non-numeric strings and TypeError on None
            return logging.error("The value supplied to --threads must be a non-negative integer.")

        commands = list()

        with self.input().open() as f:
            for target in f:
                target = target.strip()

                try:
                    if isinstance(ipaddress.ip_address(target), ipaddress.IPv6Address):  # ipv6
                        target = f"[{target}]"
                except ValueError:
                    # domain names raise ValueError, just assume we have a domain and keep on keepin on
                    pass

                for url_scheme in ("https://", "http://"):
                    command = [tool_paths.get("webanalyze"), "-host", f"{url_scheme}{target}"]

                    commands.append(command)

        Path(self.output().path).mkdir(parents=True, exist_ok=True)

        cwd = Path().cwd()
        os.chdir(self.output().path)

        if not Path("apps.json").exists():
            subprocess.run(f"{tool_paths.get('webanalyze')} -update".split())

        with ThreadPoolExecutor(max_workers=self.threads) as executor:
            executor.map(self._wrapped_subprocess, commands)

        os.chdir(str(cwd))