Deepfuzz: Automatic Generation Of Syntax Valid C Programs For Fuzz Testing

/r/programming is a reddit for discussion and news about computer programming. Please keep submissions on topic and of high quality. Just because it has a computer in it doesn't make it programming. If there is no code in your link, it probably doesn't belong here.

About this Journal ·
Abstracting and Indexing ·
Aims and Scope ·
Article Processing Charges ·
Bibliographic Information ·
Editorial Board ·
Editorial Workflow ·
Publication Ethics ·
Reviewer Resources ·
Submit a Manuscript ·
Subscription Information ·

Open Special Issues ·
Published Special Issues ·

Automatic Test Pattern Generator for Fuzzing Based on Finite State Machine

Department of Electrical Engineering, National Taiwan University, Taipei City, Taiwan

Correspondence should be addressed to ; wt.ude.utn@iellc

Received 5 April 2017; Revised 14 August 2017; Accepted 10 October 2017; Published 13 November 2017

Academic Editor: Emanuele Maiorana

Copyright © 2017 Ming-Hung Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

With the rapid development of the Internet, several emerging technologies are adopted to construct fancy, interactive, and user-friendly websites. Among these technologies, HTML5 is a popular one and is widely used in establishing modern sites. However, the security issues in the new web technologies are also raised and are worthy of investigation. For vulnerability investigation, many previous studies used fuzzing and focused on generation-based approaches to produce test cases for fuzzing; however, these methods require a significant amount of knowledge and mental efforts to develop test patterns for generating test cases. To decrease the entry barrier of conducting fuzzing, in this study, we propose a test pattern generation algorithm based on the concept of finite state machines. We apply graph analysis techniques to extract paths from finite state machines and use these paths to construct test patterns automatically. According to the proposal, fuzzing can be completed through inputting a regular expression corresponding to the test target. To evaluate the performance of our proposal, we conduct an experiment in identifying vulnerabilities of the input attributes in HTML5. According to the results, our approach is not only efficient but also effective for identifying weak validators in HTML5.

1. Introduction

Nowadays, with the rapid development of modern web technologies, a large number of social, interactive, and commercial services and a great amount of information are provided and exchanged between websites and end users. Meanwhile, more and more sensitive data such as payment or personal information has been exchanged and stored online. For example, HTML5 is an emerging interactive syntax which has been adopted in website development for its up-to-date specifications and support of multimedia content. Compared with HTML4, HTML5 defines more specific input attributes such as telephone number, color, and email address. However, on the other hand, some studies have also presented the security issues affiliated with HTML5 [1–5], especially the injection attacks on the websites [6–9]. Even though in HTML5 regular users can only enter a valid value through the user interfaces supported by browsers, malicious users can skip the user interface and directly send malformed HTTP requests to the web server. These requests could contain strings for injection attacks. As the requests are directly sent to the web server and are not checked by the user interfaces, websites should handle the malicious string inputs to avoid injection attacks.

Current website developers rely on developing several regular expressions to validate the correctness of the input strings and filter malicious ones. However, while the validators are designed and constructed by mere brainstorming of engineers without a formal and systematical methodology, there may be some underneath threats based on the loopholes in these expressions due to the human limitations [10]. To address the issue, some studies adopt fuzzing [11, 12], a software testing technique that has been used in quality assurance and vulnerability detection [13, 14]. For its programmability and flexibility in conducting regularized and large-scale tests, fuzzing helps to reduce human efforts and provides quick test case generation [15–17].

In this paper, we design an automatic fuzzing framework that can be used to investigate potential injection vulnerabilities for modern HTML5 websites. We develop a systematic and automatic testing framework that is capable and efficient for engineers to identify vulnerabilities early. To generate test inputs extensively, we leverage the concept of finite state machines (FSM) and graphical analysis algorithms. Our fuzzing procedure starts by analyzing the regular expressions of the corresponding input attributes (e.g., a date format or a telephone number format in HTML5). We next transform the regular expression to a corresponding FSM and take the negation of the FSM, so that we can derive a FSM for generating incorrect input strings to imitate abnormal behaviors or malfunctions of websites. From the negated FSM, we treat it as graphs and perform path extraction according to different algorithms to create invalid input cases.

By incorporating the above algorithms with automation components in our fuzzing framework, we present three major advantages of our system. First, we enhance the accessibility of our system by accepting regular expressions to produce invalid test patterns for fuzzing. Second, we provide the automation in test case generation and injection toward the target server. Third, our framework aggregates and filters potential abnormal responses while fuzzing using a similarity comparison mechanism of HTTP responses. According to the above features, our system offloads the efforts for reviewing the result of testing manually, which is beneficial while conducting large-scale tests.

Our contributions in this study are threefold:(i)We design a framework adopting fuzzing to investigate potential vulnerabilities in websites containing HTML5 and automatically aggregate and summarize results into reports for manual review.(ii)We present a FSM-based pattern generation algorithm to regularly produce invalid input test cases for fuzzing, according to the grammar of input attributes. From the method, labor efforts and time spent on brainstorming invalid input strings can be almost waived when conducting large-scale generation-based fuzzing.(iii)We implement our design and evaluate the system by in-lab experiments and on a popular website. From the results, our system achieves high detection rates on misdesigned websites and completes website testing within several minutes.

The remainder of this paper is organized as follows. We discuss the related studies of fuzzing and the strength of our study in Section 2. In Section 3, we present our framework for fuzzing HTML5 websites. In Section 4, we propose test pattern generation method based on FSM and graphical analysis. We demonstrate evaluation results and discuss the finding of our study in Section 5. In Section 6, we present the results on fuzzing a popular website which is in operation. The conclusion and future issues of this study are depicted in Section 7.

2. Related Works

2.1. Fuzzing

Fuzzing was first proposed by Miller et al. [11, 18] as a software testing technique. The primary procedure of fuzzing is to develop programs to generate a series of test strings automatically and inject them into the target. By observing behaviors and responses of the target, engineers could investigate the abnormal behaviors or unexpected responses to identify potential vulnerabilities of the test target. As fuzzing usually relies on generating a large number of test cases, several fuzzing studies presented methods for creating test cases, which can be categorized into grammar-based fuzzing [12, 15, 16, 19, 20] and mutation-based fuzzing [19, 21].

In this study, we focus on grammar-based fuzzing, which is more suitable for generating strings according to the predefined rules in HTML5. From the past studies on grammar-based fuzzing, test input generation can be divided into two mechanisms: randomized generating [22–24] and exhaustive generating [25]. However, both algorithms require specific knowledge and investigation on the test target. Also, as the injection attacks are usually conducted for certain purposes, the injection strings are mostly in certain forms rather than in randomized formats. Thus, testing engineers often try to provide grammar rules based on brainstorming and self-innovation to concentrate on particular vulnerabilities. However, the procedure usually consumes an increasing amount of time and effort to achieve enough quality and quantity of cases. To address such labor-intensive problem with generating test patterns, past scholars worked on various creating methods [26, 27]. However, the proposals mentioned above are usually designed for a certain product or are not fully applicable to web injection issues so far.

2.2. FSM in Testing

FSM-based methodologies for software testing have been popularly utilized in hardware testing and protocol testing [28, 29]. The main idea of previous methods is to combine the state diagram and directed edges to build a graph, where the vertices denote machine states and the edges correspond to the transitions between machine states [30–32]. To traverse the graph, a number of algorithms have been presented by scholars [33, 34], including testing methods based on the shortest path, the states coverage, and the transition coverage. On the other hand, a number of scholars have also incorporated regular expressions for systemized test generation [35, 36].

Motivated by the vast number of studies using FSM, regular expressions, and graph traversal for program testing, we advantage these concepts in generating test sequences not for software verification but for fuzzing strings, where methods based on FSM and structural analysis are not well investigated. We observe that the input validators of websites are usually developed by regular expressions, and creating fuzzing strings systematically and regularly is a challenging problem. In this study, we propose combining regular expressions, FSM, and graph traversal techniques to construct a systematical test pattern generator. The main advantage of our proposal is to use both FSM and the negated FSM to generate both valid and invalid test patterns. In addition, the automated generating mechanism can mimic the misdesigned scenarios of engineers. Thus, our fuzzer can accept formal expressions of input strings and automatically generate fuzzing strings and a series of weak validators. Our framework also consists of a series of fuzzing procedures, including producing injection sequences, fuzzing targets, and filtering and identifying potential vulnerabilities. According to our methods, we assert that our proposal could help the state-of-the-art fuzzing to reduce both mental efforts of developers and time consumption of conducting large-scale webpage injection testing.

3. Fuzzing Framework for HTML5

In this section, we present our design of a fuzzing framework for HTML5 websites. We concentrate on the input attributes because they are the most common channels for users to communicate with web servers by using forms and input elements.

To construct a smart fuzzing framework for HTML5, we start by investigating the newly defined input attributes in HTML5. Even though more types of input attributes are supported in HTML5 and well implemented in popular browsers, potential injection could be conducted if the crackers do not follow the corresponding user interface to enter values. A practical attack is via sending HTTP requests.

To detect such weakness of these websites, our framework consists of five major modules: (1) a webpage traversal to visit all available pages of a website, (2) a webpage analyzer to investigate entries that would be channels for injecting websites, (3) a test pattern generator to automatically produce test inputs corresponding to the attribute of each input element, (4) an intelligent injector for conducting fuzzing using HTTP requests and collecting responses from servers, and (5) a result filtering module to aggregate and filter the large-scale results from fuzzing. The infrastructure of our system is shown in Figure 1. We describe each component in detail as follows.