How to Build an Automated Recon Pipeline with Python and Luigi - Part V (Web Scanning)

Jan 22, 2020 | 25 minutes read

Tags: how-to, bug bounty, hack the box, python, recon, luigi

Welcome back! If you found your way here without reading the prior posts in this series, you may want to start with some of the links to previous posts (below). This post is part five of a multi-part series demonstrating how to build an automated pipeline for target reconnaissance. The target in question could be the target of a pentest, bug bounty, or capture the flag challenge (shout out to my HTB peoples!). By the end of the series, we’ll have built a functional recon pipeline that can be tailored to fit your own needs.

Previous posts:

Part V will:

  • Add screenshotting capability
  • Add subdomain takeover checks
  • Add CORS misconfiguration scanning
  • Add forced browsing / directory busting
  • Add enumeration of the tech stack in use

Part V’s git tags:

  • stage-7
  • stage-8
  • stage-9
  • stage-10
  • stage-11
  • stage-12

To get the repository to the point at which we’ll start, we can run one of the following commands. Which command you use depends on whether the repository is already present.

git clone --branch stage-6 https://github.com/epi052/recon-pipeline.git
git checkout tags/stage-6

Roadmap:

  • Target scope
  • Port scanning I
  • Port scanning II
  • Subdomain enumeration
  • Web scanning <– this post
    • Screenshots
    • Subdomain takeover
    • CORS misconfiguration
    • Forced browsing
    • Tech stack identification
  • Data storage
  • Visualization / reporting
  • Slack integration

In this post, we’ll add quite a few web scanning modules. This post won’t go into the same level of code detail seen in prior posts, due to the number of things to cover. Additionally, many of the hurdles to implementation are ones we’ve seen already. I don’t see much value in rehashing what’s been covered, though we will spend time where new problems arise.

Stage 7 - Gather Web Targets

Before we can scan any targets, we’ll need to figure out which targets have a web server up and listening. We’ll also need to define what constitutes a web server port as far as our pipeline is concerned. Let’s get to it.

Web Ports

To define web ports, we’ll simply add a line to config.py that we can import later. I’ve chosen the following five ports to be considered “web” ports for scanning purposes. Feel free to tailor this to your needs, as it’s certainly not an exhaustive list.

web_ports = {'80', '443', '8080', '8000', '8443'}

The web Submodule

With our web ports defined, let’s take a brief moment to consider project layout. Because we’re going to have quite a few web scanning modules, now’s a great time to organize them. Let’s add a submodule to keep things organized. While we’re at it, we can add our targets.py file that will house the logic to gather up our web targets.

mkdir web
touch web/__init__.py
touch web/targets.py

Our project directory should now look like what’s shown below.

recon-pipeline/
├── LICENSE
├── Pipfile
├── Pipfile.lock
├── README.md
└── recon
    ├── amass.py
    ├── config.py
    ├── __init__.py
    ├── masscan.py
    ├── nmap.py
    ├── targets.py
    └── web
        ├── __init__.py
        └── targets.py

Nice, that was easy enough.

web.targets.GatherWebTargets

Now, we can start looking at how to gather all of our targets that have at least one of our chosen web ports listening.

@inherits(ParseMasscanOutput, ParseAmassOutput)
class GatherWebTargets(luigi.Task):

Our new class expects masscan and amass to have run already and will consume their output. Our new task will load the results from amass and masscan, parse each of them for targets that have one of the web ports open, and then add any matches to a set.

    def run(self):
        """ Gather all potential web targets into a single file to pass farther down the pipeline. """
        targets = set()

        ip_dict = pickle.load(open(self.input().get("masscan-output").path, "rb"))

        """
        structure over which we're looping
        {
            "IP_ADDRESS":
                {'udp': {"161", "5000", ... },
                ...
                i.e. {protocol: set(ports) }
        }
        """
        for target, protocol_dict in ip_dict.items():
            for protocol, ports in protocol_dict.items():
                for port in ports:
                    if protocol == 'udp':
                        continue
                    if port == "80":
                        targets.add(target)
                    elif port in web_ports:
                        targets.add(f"{target}:{port}")

        for amass_result in self.input().get("amass-output").values():
            with amass_result.open() as f:
                for target in f:
                    # we care about all results returned from amass
                    targets.add(target.strip())

If we find port 80 in masscan’s results, we add the target itself (the default port for a lot of the tooling we’ll include later is 80). On the other hand, if we find any other web port, we store the target along with the port. Other than the decision behind that bit of logic, there’s not much going on here. Once we’ve parsed all the valid web targets, we write them to disk in a flat text file.
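
To make the filtering concrete, here’s a standalone sketch of the same loop run against a hand-built `ip_dict` (the sample IP and ports are fabricated for illustration):

```python
# Standalone sketch of the masscan-parsing logic; the ip_dict below is made up.
web_ports = {"80", "443", "8080", "8000", "8443"}

ip_dict = {
    "10.0.0.1": {
        "tcp": {"80", "8443", "22"},  # 22 isn't a web port; it's skipped
        "udp": {"161"},               # udp entries are skipped entirely
    }
}

targets = set()
for target, protocol_dict in ip_dict.items():
    for protocol, ports in protocol_dict.items():
        for port in ports:
            if protocol == "udp":
                continue
            if port == "80":
                targets.add(target)  # port 80 -> bare target, no port suffix
            elif port in web_ports:
                targets.add(f"{target}:{port}")

print(sorted(targets))  # ['10.0.0.1', '10.0.0.1:8443']
```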

Finalized Code

Here we have the complete code for gathering web targets.

import pickle

import luigi
from luigi.util import inherits

from recon.config import web_ports
from recon.amass import ParseAmassOutput
from recon.masscan import ParseMasscanOutput


@inherits(ParseMasscanOutput, ParseAmassOutput)
class GatherWebTargets(luigi.Task):
    """ Gather all subdomains as well as any ip addresses known to have a configured web port open.

    Args:
        exempt_list: Path to a file providing blacklisted subdomains, one per line. *--* Optional for upstream Task
        top_ports: Scan top N most popular ports *--* Required by upstream Task
        ports: specifies the port(s) to be scanned *--* Required by upstream Task
        interface: use the named raw network interface, such as "eth0" *--* Required by upstream Task
        rate: desired rate for transmitting packets (packets per second) *--* Required by upstream Task
        target_file: specifies the file on disk containing a list of ips or domains *--* Required by upstream Task
    """

    def requires(self):
        """ GatherWebTargets depends on ParseMasscanOutput and ParseAmassOutput to run.

        ParseMasscanOutput expects rate, target_file, interface, and either ports or top_ports as parameters.
        ParseAmassOutput accepts exempt_list and expects target_file

        Returns:
            dict(str: ParseMasscanOutput, str: ParseAmassOutput)
        """
        args = {
            "rate": self.rate,
            "target_file": self.target_file,
            "top_ports": self.top_ports,
            "interface": self.interface,
            "ports": self.ports,
        }
        return {
            "masscan-output": ParseMasscanOutput(**args),
            "amass-output": ParseAmassOutput(
                exempt_list=self.exempt_list, target_file=self.target_file
            ),
        }

    def output(self):
        """ Returns the target output for this task.

        Naming convention for the output file is webtargets.TARGET_FILE.txt.

        Returns:
            luigi.local_target.LocalTarget
        """
        return luigi.LocalTarget(f"webtargets.{self.target_file}.txt")

    def run(self):
        """ Gather all potential web targets into a single file to pass farther down the pipeline. """
        targets = set()

        ip_dict = pickle.load(open(self.input().get("masscan-output").path, "rb"))

        """
        structure over which we're looping
        {
            "IP_ADDRESS":
                {'udp': {"161", "5000", ... },
                ...
                i.e. {protocol: set(ports) }
        }
        """
        for target, protocol_dict in ip_dict.items():
            for protocol, ports in protocol_dict.items():
                for port in ports:
                    if protocol == "udp":
                        continue
                    if port == "80":
                        targets.add(target)
                    elif port in web_ports:
                        targets.add(f"{target}:{port}")

        for amass_result in self.input().get("amass-output").values():
            with amass_result.open() as f:
                for target in f:
                    # we care about all results returned from amass
                    targets.add(target.strip())

        with self.output().open("w") as f:
            for target in targets:
                f.write(f"{target}\n")

Stage 8 - Screenshots

When scanning large swaths of target space, automating screenshots of each live site saves an incredible amount of time. My tool of choice for this task is Michael Henriksen’s aquatone. I really like how the screenshots are grouped together based on similarity; it makes it easy to quickly parse targets and select what looks interesting. According to the repository’s README:

Aquatone is a tool for visual inspection of websites across a large amount of hosts and is convenient for quickly gaining an overview of HTTP-based attack surface.

Install aquatone

If you don’t have it already, go over to aquatone’s github page and follow their install instructions.

Scanning with aquatone

First, we’ll take a moment to figure out what we want our scans to do. There are quite a few options for aquatone, but we don’t need many in order to accomplish our goal. A run of aquatone against tesla.com would look something like what’s below.

cat webtargets.tesla.txt | aquatone -scan-timeout 900 -threads 20 
aquatone options used:

    -scan-timeout
        Timeout in milliseconds for port scans (default 100)
    -threads
        Number of concurrent threads (default number of logical CPUs)

Notice that aquatone takes input via STDIN; that’s something we’ll have to address. Also, you may be curious why we care about scan-timeout. I’ve experienced the behavior described in this issue and adjusted accordingly. Since then, I haven’t seen any issues, so I’ve chosen to include the value by default.
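
Feeding a file to a child process’s STDIN is straightforward with `subprocess.run`’s `stdin` keyword argument. Here’s a minimal stdlib-only sketch of the technique, with a throwaway Python one-liner standing in for aquatone:

```python
import subprocess
import sys
import tempfile

# Write a fake target list to disk, then feed it to a child process
# via STDIN -- the same technique we'll use to drive aquatone.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("example.com\nexample.com:8443\n")
    target_file = f.name

# Stand-in for aquatone: count the lines received on STDIN.
command = [sys.executable, "-c", "import sys; print(len(sys.stdin.readlines()))"]

with open(target_file) as target_list:
    result = subprocess.run(command, stdin=target_list, capture_output=True, text=True)

print(result.stdout.strip())  # 2
```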

web.aquatone.AquatoneScan

The first thing we’ll want to do when constructing our new class is to account for the options we expect to be able to send. Specifically, we want to include Parameters for threads and scan-timeout. It may seem like we should already have threads available to us, but we can reference the Task Dependency Graph below to see that none of the Tasks we inherit from use a threads Parameter.

task-dependency-2

We can also use this opportunity to begin storing values for some of the oft-used (and rarely changed) Parameters, creating a set of defaults. Additionally, we can add the paths to our tools, since many of them are going to live outside of our PATH. Let’s go ahead and modify config.py to include our changes.

defaults = {
    'threads': '10',
    'aquatone-scan-timeout': '900'
}

web_ports = {'80', '443', '8080', '8000', '8443'}

tool_paths = {
    'aquatone': '/opt/aquatone'
}

After that, we can define the start of our AquatoneScan class.

@inherits(GatherWebTargets)
class AquatoneScan(luigi.Task):
    threads = luigi.Parameter(default=defaults.get("threads", ""))
    scan_timeout = luigi.Parameter(default=defaults.get("aquatone-scan-timeout", ""))

We should also go back and change the other classes that use a threads Parameter to make use of the new default at some point.

As seen above, we inherit from luigi.Task; we do so in order to handle redirecting our list of targets to the process’s STDIN ourselves.

    def run(self):
        Path(self.output().path).mkdir(parents=True, exist_ok=True)

        command = [
            tool_paths.get("aquatone"),
            "-scan-timeout",
            self.scan_timeout,
            "-threads",
            self.threads,
            "-silent",
            "-out",
            self.output().path,
        ]

        with self.input().open() as target_list:
            subprocess.run(command, stdin=target_list)

Above is our first use of the tool_paths configuration dictionary. It saves us from hardcoding paths and adds some flexibility. We’re using subprocess.run’s stdin keyword argument to redirect our list of targets to the subprocess’s STDIN. We didn’t discuss -silent and -out, though their functions should be pretty clear.

Finalized Code

After all is said and done, we’re left with the code below.

import subprocess
from pathlib import Path

import luigi
from luigi.util import inherits

from recon.config import tool_paths, defaults
from recon.web.targets import GatherWebTargets


@inherits(GatherWebTargets)
class AquatoneScan(luigi.Task):
    """ Screenshot all web targets and generate HTML report.

    aquatone commands are structured like the example below.

    cat webtargets.tesla.txt | /opt/aquatone -scan-timeout 900 -threads 20

    An example of the corresponding luigi command is shown below.

    PYTHONPATH=$(pwd) luigi --local-scheduler --module recon.web.aquatone AquatoneScan --target-file tesla --top-ports 1000

    Args:
        threads: number of threads for parallel aquatone command execution
        scan_timeout: timeout in milliseconds for aquatone port scans
        exempt_list: Path to a file providing blacklisted subdomains, one per line. *--* Optional for upstream Task
        top_ports: Scan top N most popular ports *--* Required by upstream Task
        ports: specifies the port(s) to be scanned *--* Required by upstream Task
        interface: use the named raw network interface, such as "eth0" *--* Required by upstream Task
        rate: desired rate for transmitting packets (packets per second) *--* Required by upstream Task
        target_file: specifies the file on disk containing a list of ips or domains *--* Required by upstream Task
    """

    threads = luigi.Parameter(default=defaults.get("threads", ""))
    scan_timeout = luigi.Parameter(default=defaults.get("aquatone-scan-timeout", ""))

    def requires(self):
        """ AquatoneScan depends on GatherWebTargets to run.

        GatherWebTargets accepts exempt_list and expects rate, target_file, interface,
                         and either ports or top_ports as parameters

        Returns:
            luigi.Task - GatherWebTargets
        """
        args = {
            "rate": self.rate,
            "target_file": self.target_file,
            "top_ports": self.top_ports,
            "interface": self.interface,
            "ports": self.ports,
            "exempt_list": self.exempt_list,
        }
        return GatherWebTargets(**args)

    def output(self):
        """ Returns the target output for this task.

        Naming convention for the output folder is aquatone-TARGET_FILE-results.

        Returns:
            luigi.local_target.LocalTarget
        """
        return luigi.LocalTarget(f"aquatone-{self.target_file}-results")

    def run(self):
        """ Defines the options/arguments sent to aquatone after processing.

        cat webtargets.tesla.txt | /opt/aquatone -scan-timeout 900 -threads 20
        """
        Path(self.output().path).mkdir(parents=True, exist_ok=True)

        command = [
            tool_paths.get("aquatone"),
            "-scan-timeout",
            self.scan_timeout,
            "-threads",
            self.threads,
            "-silent",
            "-out",
            self.output().path,
        ]

        with self.input().open() as target_list:
            subprocess.run(command, stdin=target_list)

Stage 9 - Subdomain Takeover

In this section, we’ll add another low-hanging-fruit type of scan. To accomplish that, we’ll actually include two tools: tko-subs and subjack.

Installation

We’ll need to install both tools and add the paths to our config.py. If you don’t have them already, go over to tko-subs’s and subjack’s github pages and follow their install instructions.

We’ll preemptively include tko-subs-dir and subjack-fingerprints because we’ll need those paths to properly run their associated tools later on.

tool_paths = {
    'aquatone': '/opt/aquatone',
    'tko-subs': '/root/go/bin/tko-subs',
    'tko-subs-dir': '/root/go/src/github.com/anshumanbh/tko-subs',
    'subjack': '/root/go/bin/subjack',
    'subjack-fingerprints': '/root/go/src/github.com/haccer/subjack/fingerprints.json'
}

web.subdomain_takeover.TKOSubsScan

As usual, let’s take a look at the options we’ll be using for tko-subs.

tko-subs -domains=tesla.subdomains -data=/root/go/src/github.com/anshumanbh/tko-subs/providers-data.csv -output=tkosubs.tesla.csv
tko-subs options used:

    -domains
        path to the file containing domains to check
    -data
        path to the data csv
    -output
        path to output file (csv)

Because we’re simply running a command, all we need to do is return a list from TKOSubsScan.program_args.

    def program_args(self):
        command = [
            tool_paths.get("tko-subs"),
            f"-domains={self.input().path}",
            f"-data={tool_paths.get('tko-subs-dir')}/providers-data.csv",
            f"-output={self.output().path}",
        ]

        return command

That’s it!

web.subdomain_takeover.SubjackScan

Again, we’ll start off with the shape of the command we intend to run.

subjack -w webtargets.tesla.txt -t 20 -timeout 30 -o subjack.tesla.txt -ssl -v -c /root/go/src/github.com/haccer/subjack/fingerprints.json -a 
subjack options used:

    -w 
        list of subdomains
    -t
        number of threads (default: 10)
    -timeout
        seconds to wait before timeout connection (default: 10)
    -o
        where to save results
    -ssl
        enforces HTTPS requests which may return a different set of results/increase accuracy
    -v
        display more information per each request
    -c
        path to configuration file
    -a
        skips CNAME check and sends requests to every URL

Just like tko-subs, we’re simply adding a command to our pipeline. There’s nothing special in the code that we haven’t covered before.

Finalized Code

And here we have the implementation of our subdomain_takeover module.

import luigi
from luigi.util import inherits
from luigi.contrib.external_program import ExternalProgramTask

from recon.config import tool_paths, defaults
from recon.web.targets import GatherWebTargets


@inherits(GatherWebTargets)
class TKOSubsScan(ExternalProgramTask):
    """ Use tko-subs to scan for potential subdomain takeovers.

    tko-subs commands are structured like the example below.

    tko-subs -domains=tesla.subdomains -data=/root/go/src/github.com/anshumanbh/tko-subs/providers-data.csv -output=tkosubs.tesla.csv

    An example of the corresponding luigi command is shown below.

    PYTHONPATH=$(pwd) luigi --local-scheduler --module recon.web.subdomain_takeover TKOSubsScan --target-file tesla --top-ports 1000 --interface eth0

    Args:
        exempt_list: Path to a file providing blacklisted subdomains, one per line. *--* Optional for upstream Task
        top_ports: Scan top N most popular ports *--* Required by upstream Task
        ports: specifies the port(s) to be scanned *--* Required by upstream Task
        interface: use the named raw network interface, such as "eth0" *--* Required by upstream Task
        rate: desired rate for transmitting packets (packets per second) *--* Required by upstream Task
        target_file: specifies the file on disk containing a list of ips or domains *--* Required by upstream Task
    """

    def requires(self):
        """ TKOSubsScan depends on GatherWebTargets to run.

        GatherWebTargets accepts exempt_list and expects rate, target_file, interface,
                         and either ports or top_ports as parameters

        Returns:
            luigi.Task - GatherWebTargets
        """
        args = {
            "rate": self.rate,
            "target_file": self.target_file,
            "top_ports": self.top_ports,
            "interface": self.interface,
            "ports": self.ports,
            "exempt_list": self.exempt_list,
        }
        return GatherWebTargets(**args)

    def output(self):
        """ Returns the target output for this task.

        Naming convention for the output file is tkosubs.TARGET_FILE.csv.

        Returns:
            luigi.local_target.LocalTarget
        """
        return luigi.LocalTarget(f"tkosubs.{self.target_file}.csv")

    def program_args(self):
        """ Defines the options/arguments sent to tko-subs after processing.

        Returns:
            list: list of options/arguments, beginning with the name of the executable to run
        """

        command = [
            tool_paths.get("tko-subs"),
            f"-domains={self.input().path}",
            f"-data={tool_paths.get('tko-subs-dir')}/providers-data.csv",
            f"-output={self.output().path}",
        ]

        return command


@inherits(GatherWebTargets)
class SubjackScan(ExternalProgramTask):
    """ Use subjack to scan for potential subdomain takeovers.

    subjack commands are structured like the example below.

    subjack -w webtargets.tesla.txt -t 100 -timeout 30 -o subjack.tesla.txt -ssl

    An example of the corresponding luigi command is shown below.

    PYTHONPATH=$(pwd) luigi --local-scheduler --module recon.web.subdomain_takeover SubjackScan --target-file tesla --top-ports 1000 --interface eth0

    Args:
        threads: number of threads for parallel subjack command execution
        exempt_list: Path to a file providing blacklisted subdomains, one per line. *--* Optional for upstream Task
        top_ports: Scan top N most popular ports *--* Required by upstream Task
        ports: specifies the port(s) to be scanned *--* Required by upstream Task
        interface: use the named raw network interface, such as "eth0" *--* Required by upstream Task
        rate: desired rate for transmitting packets (packets per second) *--* Required by upstream Task
        target_file: specifies the file on disk containing a list of ips or domains *--* Required by upstream Task
    """

    threads = luigi.Parameter(default=defaults.get("threads", ""))

    def requires(self):
        """ SubjackScan depends on GatherWebTargets to run.

        GatherWebTargets accepts exempt_list and expects rate, target_file, interface,
                         and either ports or top_ports as parameters

        Returns:
            luigi.Task - GatherWebTargets
        """
        args = {
            "rate": self.rate,
            "target_file": self.target_file,
            "top_ports": self.top_ports,
            "interface": self.interface,
            "ports": self.ports,
            "exempt_list": self.exempt_list,
        }
        return GatherWebTargets(**args)

    def output(self):
        """ Returns the target output for this task.

        Naming convention for the output file is subjack.TARGET_FILE.txt.

        Returns:
            luigi.local_target.LocalTarget
        """
        return luigi.LocalTarget(f"subjack.{self.target_file}.txt")

    def program_args(self):
        """ Defines the options/arguments sent to subjack after processing.

        Returns:
            list: list of options/arguments, beginning with the name of the executable to run
        """

        command = [
            tool_paths.get("subjack"),
            "-w",
            self.input().path,
            "-t",
            self.threads,
            "-a",
            "-timeout",
            "30",
            "-o",
            self.output().path,
            "-v",
            "-ssl",
            "-c",
            tool_paths.get("subjack-fingerprints"),
        ]

        return command

Stage 10 - CORS Misconfiguration

Continuing with our simple scans, we’ll add a CORS scanner to the mix. The CORS scanner we’ll use is CORScanner.

Installation

Similar to our other tools, we’ll go ahead and add the path to our CORScanner (assuming it’s already been downloaded from GitHub).

tool_paths = {
    'aquatone': '/opt/aquatone',
    'tko-subs': '/root/go/bin/tko-subs',
    'tko-subs-dir': '/root/go/src/github.com/anshumanbh/tko-subs',
    'subjack': '/root/go/bin/subjack',
    'subjack-fingerprints': '/root/go/src/github.com/haccer/subjack/fingerprints.json',
    'CORScanner': '/opt/CORScanner/cors_scan.py',
}

web.corscanner.CORScannerScan

Once again, we’ll figure out how we prefer to run the tool.

python3 cors_scan.py -i webtargets.tesla.txt -t 20 -o corscanner.tesla.json
cors_scan.py options used:

    -i
        URL/domain list file to check their CORS policy
    -t
        Number of threads to use for CORS scan
    -o
        Save the results to json file

Similar to many of the web scans, the integration into the pipeline is dead simple.

    def program_args(self):
        command = [
            "python3",
            tool_paths.get("CORScanner"),
            "-i",
            self.input().path,
            "-t",
            self.threads,
            "-o",
            self.output().path,
        ]

        return command

Finalized Code

import luigi
from luigi.util import inherits
from luigi.contrib.external_program import ExternalProgramTask

from recon.config import tool_paths, defaults
from recon.web.targets import GatherWebTargets


@inherits(GatherWebTargets)
class CORScannerScan(ExternalProgramTask):
    """ Use CORScanner to scan for potential CORS misconfigurations.

    CORScanner commands are structured like the example below.

    python cors_scan.py -i webtargets.tesla.txt -t 100

    An example of the corresponding luigi command is shown below.

    PYTHONPATH=$(pwd) luigi --local-scheduler --module recon.web.corscanner CORScannerScan --target-file tesla --top-ports 1000 --interface eth0

    Install:
        git clone https://github.com/chenjj/CORScanner.git
        cd CORScanner
        pip install -r requirements.txt
        pip install future

    Args:
        threads: number of threads for parallel CORScanner command execution
        exempt_list: Path to a file providing blacklisted subdomains, one per line. *--* Optional for upstream Task
        top_ports: Scan top N most popular ports *--* Required by upstream Task
        ports: specifies the port(s) to be scanned *--* Required by upstream Task
        interface: use the named raw network interface, such as "eth0" *--* Required by upstream Task
        rate: desired rate for transmitting packets (packets per second) *--* Required by upstream Task
        target_file: specifies the file on disk containing a list of ips or domains *--* Required by upstream Task
    """

    threads = luigi.Parameter(default=defaults.get("threads", ""))

    def requires(self):
        """ CORScannerScan depends on GatherWebTargets to run.

        GatherWebTargets accepts exempt_list and expects rate, target_file, interface,
                         and either ports or top_ports as parameters

        Returns:
            luigi.Task - GatherWebTargets
        """
        args = {
            "rate": self.rate,
            "target_file": self.target_file,
            "top_ports": self.top_ports,
            "interface": self.interface,
            "ports": self.ports,
            "exempt_list": self.exempt_list,
        }
        return GatherWebTargets(**args)

    def output(self):
        """ Returns the target output for this task.

        Naming convention for the output file is corscanner.TARGET_FILE.json.

        Returns:
            luigi.local_target.LocalTarget
        """
        return luigi.LocalTarget(f"corscanner.{self.target_file}.json")

    def program_args(self):
        """ Defines the options/arguments sent to CORScanner after processing.

        Returns:
            list: list of options/arguments, beginning with the name of the executable to run
        """

        command = [
            "python3",
            tool_paths.get("CORScanner"),
            "-i",
            self.input().path,
            "-t",
            self.threads,
            "-o",
            self.output().path,
        ]

        return command

Stage 11 - Forced Browsing

Forced browsing is the next step in our pipeline. Everyone has their favorite tool for this task, and I’m no different. My favorite is easily gobuster. A while back, I wrote a recursive wrapper around gobuster as well. We’ll use both tools here, giving us the option of performing recursive forced browsing.

Installation

Don’t forget to update config.py along with grabbing gobuster and recursive-gobuster.

tool_paths = {
    -------------8<-------------
    'gobuster': '/usr/local/go/bin/gobuster',
    'recursive-gobuster': '/usr/local/bin/recursive-gobuster.pyz',
}

web.gobuster.GobusterScan

Here’s how the default gobuster command will look when we run it.

    gobuster dir -q -e -k -t 20 -u www.tesla.com -w /usr/share/seclists/Discovery/Web-Content/common.txt -o gobuster.tesla.txt
gobuster dir options used:

    -q
        Don't print the banner and other noise
    -e
        Expanded mode, print full URLs
    -k
        Skip SSL certificate verification
    -t
        Number of concurrent threads (default 10)
    -u
        The target URL
    -w
        Path to the wordlist
    -o
        Output file to write results to 

Even though the default is outlined above, we’ll want a few additional options for certain situations. We also need to be able to configure the values sent to some of our default options. In order to allow that additional customization of commands, we’ll add some new Parameters.

@inherits(GatherWebTargets)
class GobusterScan(luigi.Task):
    proxy = luigi.Parameter(default=defaults.get("proxy", ""))
    threads = luigi.Parameter(default=defaults.get("threads", ""))
    wordlist = luigi.Parameter(default=defaults.get("gobuster-wordlist", ""))
    extensions = luigi.Parameter(default=defaults.get("gobuster-extensions", ""))
    recursive = luigi.BoolParameter(default=False)

There are a few issues we need to solve with running gobuster. First, we’ll need to handle IPv6 addresses. IPv6 addresses can be browsed to when they’re enclosed within square brackets. Here’s what that looks like in code.

102                try:
103                    if isinstance(ipaddress.ip_address(target), ipaddress.IPv6Address):  # ipv6
104                        target = f"[{target}]"
105                except ValueError:
106                    # domain names raise ValueErrors, just assume we have a domain and keep on keepin on
107                    pass

Next, we’ll have two different branches of logic to build the base command list. The branch depends upon whether we’ve specified --recursive or not.

109                    if self.recursive:
110                        command = [
111                            tool_paths.get("recursive-gobuster"),
112                            "-w",
113                            self.wordlist,
114                            f"{url_scheme}{target}",
115                        ]
116                    else:
117                        command = [
118                            tool_paths.get("gobuster"),
119                            "dir",
120                            "-q",
121                            "-e",
122                            "-k",
123                            "-u",
124                            f"{url_scheme}{target}",
125                            "-w",
126                            self.wordlist,
127                            "-o",
128                            Path(self.output().path).joinpath(
129                                f"gobuster.{url_scheme.replace('//', '_').replace(':', '')}{target}.txt"
130                            ),
131                        ]

The rest of the code is more or less straightforward and can be seen in the next section.

Finalized Code

  1import os
  2import logging
  3import ipaddress
  4import subprocess
  5from pathlib import Path
  6from concurrent.futures import ThreadPoolExecutor
  7
  8import luigi
  9from luigi.util import inherits
 10
 11from recon.config import tool_paths, defaults
 12from recon.web.targets import GatherWebTargets
 13
 14
 15@inherits(GatherWebTargets)
 16class GobusterScan(luigi.Task):
 17    """ Use gobuster to perform forced browsing.
 18
 19    gobuster commands are structured like the example below.
 20
 21    gobuster dir -q -e -k -t 20 -u www.tesla.com -w /usr/share/seclists/Discovery/Web-Content/common.txt -p http://127.0.0.1:8080 -o gobuster.tesla.txt -x php,html
 22
 23    An example of the corresponding luigi command is shown below.
 24
 25    PYTHONPATH=$(pwd) luigi --local-scheduler --module recon.web.gobuster GobusterScan --target-file tesla --top-ports 1000 \
 26                            --interface eth0 --proxy http://127.0.0.1:8080 --extensions php,html \
 27                            --wordlist /usr/share/seclists/Discovery/Web-Content/common.txt --threads 20
 28
 29    Install:
 30        go get github.com/OJ/gobuster
 31        git clone https://github.com/epi052/recursive-gobuster.git
 32
 33    Args:
 34        threads: number of threads for parallel gobuster command execution
 35        wordlist: wordlist used for forced browsing
 36        extensions: additional extensions to apply to each item in the wordlist
 37        recursive: whether or not to recursively gobust the target (may produce a LOT of traffic... quickly)
 38        exempt_list: Path to a file providing blacklisted subdomains, one per line. *--* Optional for upstream Task
 39        top_ports: Scan top N most popular ports *--* Required by upstream Task
 40        ports: specifies the port(s) to be scanned *--* Required by upstream Task
 41        interface: use the named raw network interface, such as "eth0" *--* Required by upstream Task
 42        rate: desired rate for transmitting packets (packets per second) *--* Required by upstream Task
 43        target_file: specifies the file on disk containing a list of ips or domains *--* Required by upstream Task
 44    """
 45
 46    proxy = luigi.Parameter(default=defaults.get("proxy", ""))
 47    threads = luigi.Parameter(default=defaults.get("threads", ""))
 48    wordlist = luigi.Parameter(default=defaults.get("gobuster-wordlist", ""))
 49    extensions = luigi.Parameter(default=defaults.get("gobuster-extensions", ""))
 50    recursive = luigi.BoolParameter(default=False)
 51
 52    def requires(self):
 53        """ GobusterScan depends on GatherWebTargets to run.
 54
 55        GatherWebTargets accepts exempt_list and expects rate, target_file, interface,
 56                         and either ports or top_ports as parameters
 57
 58        Returns:
 59            luigi.Task - GatherWebTargets
 60        """
 61        args = {
 62            "rate": self.rate,
 63            "target_file": self.target_file,
 64            "top_ports": self.top_ports,
 65            "interface": self.interface,
 66            "ports": self.ports,
 67            "exempt_list": self.exempt_list,
 68        }
 69        return GatherWebTargets(**args)
 70
 71    def output(self):
 72        """ Returns the target output for this task.
 73
 74        If recursion is disabled, the naming convention for the output file is gobuster.TARGET_FILE.txt
 75        Otherwise the output file is recursive-gobuster_TARGET_FILE.log
 76
 77        Results are stored in their own directory: gobuster-TARGET_FILE-results
 78
 79        Returns:
 80            luigi.local_target.LocalTarget
 81        """
 82        return luigi.LocalTarget(f"gobuster-{self.target_file}-results")
 83
 84    def run(self):
 85        """ Defines the options/arguments sent to gobuster after processing.
 86
 87        Returns:
 88            list: list of options/arguments, beginning with the name of the executable to run
 89        """
 90        try:
 91            self.threads = abs(int(self.threads))
 92        except (TypeError, ValueError):
 93            return logging.error("The value supplied to --threads must be a non-negative integer.")
 94
 95        commands = list()
 96
 97        with self.input().open() as f:
 98            for target in f:
 99                target = target.strip()
100
101                try:
102                    if isinstance(ipaddress.ip_address(target), ipaddress.IPv6Address):  # ipv6
103                        target = f"[{target}]"
104                except ValueError:
105                    # domain names raise ValueErrors, just assume we have a domain and keep on keepin on
106                    pass
107
108                for url_scheme in ("https://", "http://"):
109                    if self.recursive:
110                        command = [
111                            tool_paths.get("recursive-gobuster"),
112                            "-w",
113                            self.wordlist,
114                            f"{url_scheme}{target}",
115                        ]
116                    else:
117                        command = [
118                            tool_paths.get("gobuster"),
119                            "dir",
120                            "-q",
121                            "-e",
122                            "-k",
123                            "-u",
124                            f"{url_scheme}{target}",
125                            "-w",
126                            self.wordlist,
127                            "-o",
128                            Path(self.output().path).joinpath(
129                                f"gobuster.{url_scheme.replace('//', '_').replace(':', '')}{target}.txt"
130                            ),
131                        ]
132
133                    if self.extensions:
134                        command.extend(["-x", self.extensions])
135
136                    if self.proxy:
137                        command.extend(["-p", self.proxy])
138
139                    commands.append(command)
140
141        Path(self.output().path).mkdir(parents=True, exist_ok=True)
142
143        if self.recursive:
144            # workaround for recursive gobuster not accepting output directory
145            cwd = Path().cwd()
146            os.chdir(self.output().path)
147
148        with ThreadPoolExecutor(max_workers=self.threads) as executor:
149            executor.map(subprocess.run, commands)
150
151        if self.recursive:
152            os.chdir(str(cwd))

Stage 12 - Tech Stack Identification

Our final scan focuses on identifying the technologies used within a webapp: the CMS, backend framework, server software, etc. I suspect most folks (myself included) use wappalyzer for this when manually browsing sites. Since webanalyze is a port of wappalyzer written in Go, it seems an obvious choice for the task.

Installation

We’ll update our config.py one last time after installing webanalyze.

16tool_paths = {
17    'aquatone': '/opt/aquatone',
18    'tko-subs': '/root/go/bin/tko-subs',
19    'tko-subs-dir': '/root/go/src/github.com/anshumanbh/tko-subs',
20    'subjack': '/root/go/bin/subjack',
21    'subjack-fingerprints': '/root/go/src/github.com/haccer/subjack/fingerprints.json',
22    'CORScanner': '/opt/CORScanner/cors_scan.py',
23    'gobuster': '/usr/local/go/bin/gobuster',
24    'recursive-gobuster': '/usr/local/bin/recursive-gobuster.pyz',
25    'webanalyze': '/root/go/bin/webanalyze'
26}

web.webanalyze.WebanalyzeScan

There are two webanalyze commands we’ll be running.

First, the command that updates apps.json, which is the file that webanalyze uses for signatures and relationships.

webanalyze -update

webanalyze options used:

    -update
        downloads a current version of apps.json from the wappalyzer repository to the current folder.

Second, the webanalyze command itself.

webanalyze -host https://tesla.com
webanalyze options used:

    -host
        single host to test

A quirk of webanalyze is that it only prints results to stderr, so we’ll need to capture the results and write them to a file ourselves. The code to do that is below. It also normalizes each URL into a filename that’s easier to manage.

77    def _wrapped_subprocess(self, cmd):
78        with open(f"webanalyze.{cmd[2].replace('//', '_').replace(':', '')}.txt", "wb") as f:
79            subprocess.run(cmd, stderr=f)
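The redirection trick is just `subprocess.run`’s `stderr` argument pointed at an open file handle. A standalone sketch, using a python one-liner as a stand-in for webanalyze (the filename and output text are made up for the demo):

```python
import sys
import subprocess

# child process that writes its "results" to stderr, like webanalyze does
cmd = [sys.executable, "-c", "import sys; sys.stderr.write('tech: nginx\\n')"]

# anything the child writes to stderr lands in the file instead of the terminal
with open("demo-stderr.txt", "wb") as f:
    subprocess.run(cmd, stderr=f)

print(open("demo-stderr.txt").read())  # tech: nginx
```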

The wrapper for subprocess.run above is called using our standard template for adding threading to a scan (shown below).

117        with ThreadPoolExecutor(max_workers=self.threads) as executor:
118            executor.map(self._wrapped_subprocess, commands)
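In isolation, the threading template looks like this: each entry in `commands` is an argv-style list, and `executor.map` fans them out across worker threads. The python one-liners below stand in for real scanner invocations.

```python
import sys
import subprocess
from concurrent.futures import ThreadPoolExecutor

# argv-style command lists, exactly like the ones the Tasks build
commands = [[sys.executable, "-c", f"print('scan {i} done')"] for i in range(4)]

# map applies subprocess.run to each command list, running up to
# max_workers commands concurrently; results come back in submit order
with ThreadPoolExecutor(max_workers=2) as executor:
    results = list(executor.map(subprocess.run, commands))

print([r.returncode for r in results])  # [0, 0, 0, 0]
```

`executor.map` blocks until all commands complete when the results are consumed, which is exactly what we want: the Task shouldn’t be marked done until every scan has finished.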

Other than that, all of the issues with running this command are problems we’ve overcome already elsewhere. So, without further ado, the finalized code!

Finalized Code

  1import os
  2import logging
  3import ipaddress
  4import subprocess
  5from pathlib import Path
  6from concurrent.futures import ThreadPoolExecutor
  7
  8import luigi
  9from luigi.util import inherits
 10
 11from recon.config import tool_paths, defaults
 12from recon.web.targets import GatherWebTargets
 13
 14
 15@inherits(GatherWebTargets)
 16class WebanalyzeScan(luigi.Task):
 17    """ Use webanalyze to determine the technology stack on the given target(s).
 18
 19    webanalyze commands are structured like the example below.
 20
 21    webanalyze -host www.tesla.com -output json
 22
 23    An example of the corresponding luigi command is shown below.
 24
 25    PYTHONPATH=$(pwd) luigi --local-scheduler --module recon.web.webanalyze WebanalyzeScan --target-file tesla --top-ports 1000 --interface eth0
 26
 27    Install:
 28
 29        go get -u github.com/rverton/webanalyze
 30
 31        # loads new apps.json file from wappalyzer project
 32        webanalyze -update
 33
 34    Args:
 35        threads: number of threads for parallel webanalyze command execution
 36        exempt_list: Path to a file providing blacklisted subdomains, one per line. *--* Optional for upstream Task
 37        top_ports: Scan top N most popular ports *--* Required by upstream Task
 38        ports: specifies the port(s) to be scanned *--* Required by upstream Task
 39        interface: use the named raw network interface, such as "eth0" *--* Required by upstream Task
 40        rate: desired rate for transmitting packets (packets per second) *--* Required by upstream Task
 41        target_file: specifies the file on disk containing a list of ips or domains *--* Required by upstream Task
 42    """
 43
 44    threads = luigi.Parameter(default=defaults.get("threads", ""))
 45
 46    def requires(self):
 47        """ WebanalyzeScan depends on GatherWebTargets to run.
 48
 49        GatherWebTargets accepts exempt_list and expects rate, target_file, interface,
 50                         and either ports or top_ports as parameters
 51
 52        Returns:
 53            luigi.Task - GatherWebTargets
 54        """
 55        args = {
 56            "rate": self.rate,
 57            "target_file": self.target_file,
 58            "top_ports": self.top_ports,
 59            "interface": self.interface,
 60            "ports": self.ports,
 61            "exempt_list": self.exempt_list,
 62        }
 63        return GatherWebTargets(**args)
 64
 65    def output(self):
 66        """ Returns the target output for this task.
 67
 68        The naming convention for the output file is webanalyze.TARGET_FILE.txt
 69
 70        Results are stored in their own directory: webanalyze-TARGET_FILE-results
 71
 72        Returns:
 73            luigi.local_target.LocalTarget
 74        """
 75        return luigi.LocalTarget(f"webanalyze-{self.target_file}-results")
 76
 77    def _wrapped_subprocess(self, cmd):
 78        with open(f"webanalyze.{cmd[2].replace('//', '_').replace(':', '')}.txt", "wb") as f:
 79            subprocess.run(cmd, stderr=f)
 80
 81    def run(self):
 82        """ Defines the options/arguments sent to webanalyze after processing.
 83
 84        Returns:
 85            list: list of options/arguments, beginning with the name of the executable to run
 86        """
 87        try:
 88            self.threads = abs(int(self.threads))
 89        except (TypeError, ValueError):
 90            return logging.error("The value supplied to --threads must be a non-negative integer.")
 91
 92        commands = list()
 93
 94        with self.input().open() as f:
 95            for target in f:
 96                target = target.strip()
 97
 98                try:
 99                    if isinstance(ipaddress.ip_address(target), ipaddress.IPv6Address):  # ipv6
100                        target = f"[{target}]"
101                except ValueError:
102                    # domain names raise ValueErrors, just assume we have a domain and keep on keepin on
103                    pass
104
105                for url_scheme in ("https://", "http://"):
106                    command = [tool_paths.get("webanalyze"), "-host", f"{url_scheme}{target}"]
107
108                    commands.append(command)
109
110        Path(self.output().path).mkdir(parents=True, exist_ok=True)
111
112        cwd = Path().cwd()
113        os.chdir(self.output().path)
114
115        if not Path("apps.json").exists():
116            subprocess.run(f"{tool_paths.get('webanalyze')} -update".split())
117
118        with ThreadPoolExecutor(max_workers=self.threads) as executor:
119            executor.map(self._wrapped_subprocess, commands)
120
121        os.chdir(str(cwd))
