Abstract
This research explains how two techniques associated with automated malware analysis, host identity-based encryption (HIE) and instruction set localization (ISL), can help make the analysis more effective and less time-consuming. They can help solve malware analysts' challenges, making automated analysis more effective than traditional methods such as the operational model. Because malware attacks have grown in both number and complexity, automating malware analysis improves efficiency, helping organizations avoid attacks and secure their networks. The background section introduces the two techniques, HIE and ISL, while the literature review section presents scholarly evidence on how they work and defends the push for an automated malware analysis model. The analysis section then examines the two techniques, explains why they are applied, and explains why they are more practical than traditional malware analysis methods. The subsequent sections summarize the research findings, present the limitations, suggest future work, and conclude.
Impeding Automated Malware Analysis
Malware analysis refers to the process of understanding how malicious programs behave (Gandotra et al., 2019). The information gathered through this process is useful in detecting similar malware, repairing damaged systems, and dismantling malicious infrastructure. However, as Kara (2019) reveals, these activities undermine cybercriminals' profit model, pushing attackers to develop techniques that prevent malware from being analyzed. Song et al. (2012) describe how defenders have developed various automated malware analysis techniques to solve the scalability problem resulting from the rapid growth of malware. These techniques focus on countering attackers' analysis resistance, although neither side has claimed an absolute advantage. This research paper examines how automated malware analysis can be made ineffective and unscalable by two distinct techniques, host identity-based encryption (HIE) and instruction set localization (ISL). One of the motivations of this research is that these techniques are not limited to existing analysis environments but rather aim to resist any analysis technique that may emerge in the future. As the literature review section shows, various studies have examined the HIE and ISL techniques, documenting their advantages and disadvantages and how they can be implemented.
Additionally, various adjustments can be made to the two techniques to help them overcome the challenges involving host IDs that combine host and network identifiers. The contribution of this research is to clarify how the techniques are implemented and how the information they rely on is used. One resource that can support this work is prototype design, which helps explain and test ideas about the two techniques at the center of the study (Lauff et al., 2019). Prototyping the HIE and ISL techniques in this way may make it easier to show how they impede automated malware analysis.
Background
Malware analysis is significant because it helps defenders understand how different malware functions and develop defenses that protect an organization's network. Recently, many malicious samples have been distributed, increasing the demand for automated malware analysis techniques (Kara, 2019; Mohaisen et al., 2015). Malware analysis is therefore critical in cybersecurity, as security analysts must scrutinize a suspicious file to confirm whether it is legitimate or malicious. The analysis also helps responders reduce false positives and gauge the full extent of a malware incident. As Song et al. (2012) note, early software obfuscation techniques such as packing make static analysis more complicated but leave unaddressed what a defender could learn by running a program (Sujyothi & Acharya, 2017). Later, virtual machine-based obfuscation was introduced to prevent static analysis of dumped memory images, such as unpacked code. Although these techniques have made static analysis nearly infeasible, Song et al. (2012) claim that they pose no substantial obstacle to dynamic analysis. Realizing this flaw, malware authors began integrating detection of dynamic analysis into their samples (Sujyothi & Acharya, 2017). For instance, some malware samples check whether they are being debugged, while others check whether execution occurs in a virtual environment. If a sample detects that it is being analyzed, it exhibits no malicious behavior (Sujyothi & Acharya, 2017). Although these techniques have succeeded in some instances, they remain open to mitigation, since analysts can develop new techniques to make the analysis environment look like a normal host.
The two techniques examined in this research, HIE and ISL, were designed to address two major challenges: (1) the difficulty of effectively distinguishing between an analysis environment and a production environment and (2) the challenge of hiding high-level behaviors (such as system calls and network activity) from dynamic analysis (Song et al., 2012). The motivation behind overcoming these two challenges is that the effectiveness of HIE and ISL should not be limited to existing analysis environments or mechanisms; the techniques should resist any analysis technique that may emerge in the future. As Song et al. (2012) argue, HIE and ISL focus on preventing analysts from effectively analyzing malware with automated means rather than on blocking them from understanding any particular sample. And since manually analyzing the new malware samples created every day is unsustainable, this research project focuses on defeating automated malware analysis.
Literature Review
Song et al.'s (2012) article explains why the techniques in question, HIE and ISL, can be applied effectively, and it is significant for understanding how malicious programs behave. The authors first explain the main problem with the operational model and how the two techniques exploit it. For instance, one of the problems the decade-old operational model faces is that malware is captured in one environment and analyzed in another (Song et al., 2012). Thus, rather than detecting the analysis environment itself, HIE and ISL assume that the analysis environment differs from the originally infected one (d'Antoine et al., 2017). Song et al. (2012) also explain how the ISL technique helps address major shortcomings such as fault tolerance and network identifier generation. Therefore, the authors' article plays a significant role in defending the use of the HIE and ISL techniques and explains why automated malware analysis is better than the traditional approach. It also explains why malware threats have increased, calling for immediate intervention.
On the same note, Mosli et al. (2016) explain how increased organizational digital activity has caused malware threats to grow continually. The authors also confirm that traditional malware detection depends on scanning samples for signatures held in databases. However, due to the rapid growth of malware today, traditional signature scanning has become ineffective, as extracting signatures from malicious samples has proven to be very labor-intensive (Mosli et al., 2016). The authors explain that the traditional approach is effective only for known malware, whose signatures have already been extracted. They also explain that malware can be detected according to behavior, where samples are run and their runtime actions observed. While this method helps detect unknown malware, it is often obstructed by samples that use virtual-machine evasion techniques (Mosli et al., 2016; d'Antoine et al., 2017). Moreover, running each sample is not sustainable, as it consumes considerable time and computing resources. Therefore, the authors argue that automated malware analysis is more accurate and faster in solving the problems associated with these two approaches.
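The limitation of signature scanning described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a "signature" is the SHA-256 hash of a whole sample; real scanners typically use richer byte patterns and rules, and the function names and placeholder samples below are hypothetical rather than taken from Mosli et al. (2016).

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a sample as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_signature_db(known_samples):
    """Build a set of signatures from samples that were already analyzed."""
    return {sha256_hex(sample) for sample in known_samples}


def is_known_malicious(sample: bytes, signature_db) -> bool:
    """A sample is flagged only if its signature is already in the database."""
    return sha256_hex(sample) in signature_db


# Placeholder byte strings stand in for previously analyzed samples.
signature_db = build_signature_db([b"placeholder-sample-A", b"placeholder-sample-B"])

print(is_known_malicious(b"placeholder-sample-A", signature_db))   # True
print(is_known_malicious(b"placeholder-sample-A!", signature_db))  # False
```

Changing a single byte yields a signature that is absent from the database, which is why this style of detection only covers samples whose signatures have already been extracted.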
Leach et al. (2019) explain how genetic improvement (GI) supports software improvement, especially automated bug repair and vulnerability analysis, as well as software refinement to boost performance. The authors argue that although GI-based approaches have succeeded in minimizing or eliminating vulnerabilities, making benign software more secure, the number and complexity of malicious programs have increased, underscoring the need for better analysis techniques. Therefore, Leach et al. (2019) propose that, instead of being used only to improve individual software artifacts, GI could be applied to understand the behavior of malicious code. Applied to automated malware analysis procedures, genetic improvement can improve their effectiveness and speed. Moreover, malware attacks have intensified in the recent past, eroding users' and organizations' privacy and trust in computer systems (Leach et al., 2019). Although manual and automated techniques can be used to understand the behavior of malware samples, more attention is needed, since malware usually employs evasion techniques to avoid automated analysis. Therefore, work on the HIE and ISL techniques needs to consider and counter malware evasion tactics in order to improve malware analysis outcomes.
Kaur et al. (2017) distinguish static malware analysis from dynamic analysis: the former focuses on analyzing code without running it to determine its functionality, whereas dynamic analysis executes the code in a sandbox environment. The authors claim that sandbox frameworks help analysts run malware files in isolated environments, ensuring they do not infect legitimate files (Kaur et al., 2017). The authors analyzed malware samples using a web-based automated framework and ran them through a sandbox environment to study malware behavior and functionality. Therefore, the article is relevant in understanding how automated malware analysis works and how malware behaves. Webb (2018) states that over 12 million malicious programs have been registered annually since 2014. Due to this growth, analysts have introduced automated malware analysis techniques, most of them developed by re-implementing analysis techniques rather than by automating the existing tools that use the same techniques (Webb, 2018). While creating new techniques takes more time and resources, using existing tools is more effective, as they have evolved alongside malware for years. Webb argues that using existing tools in automated malware analysis makes it more effective and efficient.
Analysis
Host Identity-based Encryption (HIE)
Before a malware instance is deployed to a system, enough information is collected to identify that system uniquely. This information is then used to derive a unique key, or host ID, with which parts of the malware instance are encrypted (Song et al., 2012). At runtime, the malware instance gathers the same information to derive the decryption key. Therefore, if the instance is executed in a different environment, decryption fails and the sample cannot exhibit its malicious behavior. According to Song et al. (2012), a simple form of HIE encrypts the whole malware binary; although this protects the entire program, it also benefits the defender (d'Antoine et al., 2017). For instance, once analysis systems know that a sample leverages HIE, analysts can use failed executions to check whether a candidate host ID is correct, which makes brute-forcing the decryption key easier.
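The key-binding idea, and the way a decryption failure can help the defender, can be illustrated with a deliberately simplified Python sketch. Everything in it is an assumption made for illustration and is not the construction used by Song et al. (2012): the host attributes are a hypothetical dictionary of strings, the protected data is a harmless text string, and the hash-based keystream is a toy stand-in for a real cipher.

```python
import hashlib
import hmac


def derive_key(host_attrs: dict) -> bytes:
    """Derive a key from host attributes (hypothetical identifiers)."""
    # Sort the attribute names so the same host always yields the same key.
    material = "|".join(f"{k}={host_attrs[k]}" for k in sorted(host_attrs))
    return hashlib.sha256(material.encode("utf-8")).digest()


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream from hashing key + counter; not production cryptography."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt the data and prepend a 32-byte tag that depends on the key."""
    body = bytes(a ^ b for a, b in zip(plaintext, _keystream(key, len(plaintext))))
    return hmac.new(key, body, hashlib.sha256).digest() + body


def unseal(key: bytes, blob: bytes):
    """Return the plaintext, or None when the key does not match."""
    tag, body = blob[:32], blob[32:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return bytes(a ^ b for a, b in zip(body, _keystream(key, len(body))))


original_host = {"hostname": "host-a", "disk_serial": "0001"}
analysis_host = {"hostname": "sandbox", "disk_serial": "9999"}

blob = seal(derive_key(original_host), b"harmless demo text")
print(unseal(derive_key(original_host), blob))  # b'harmless demo text'
print(unseal(derive_key(analysis_host), blob))  # None

# Defender's view: a failed unseal tells the analyst that a guess is wrong,
# so a list of candidate host IDs can be checked one by one.
candidates = [analysis_host, original_host]
match = next(c for c in candidates if unseal(derive_key(c), blob) is not None)
print(match == original_host)  # True
```

The last three lines mirror the defender-side observation in the text: because failure is observable, it acts as a check on each guessed host ID, so the protection is only as strong as the identifiers are hard to guess.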
According to Song et al. (2012), one advantage of the HIE technique is that it relies on modern cryptography, meaning that knowledge of how the key is derived does not compromise the protection. Defenders therefore cannot unlock a malware sample unless they can guess the decryption key (d'Antoine et al., 2017). The second advantage of the HIE technique is that every malware instance receives a different decryption key, so intelligence gained from analyzing one instance offers no advantage in analyzing another. Therefore, even if an analyst manages to run one HIE-protected sample in a particular environment, the analyst cannot expect another sample to run in that same environment.
Song et al. (2012) also argue that HIE does not make the same assumptions as DRM systems, which prevent protection bypass by assuming the highest privilege levels on a system or by using special hardware. HIE also pursues a different goal from DRM: it aims to prevent large volumes of malware from being analyzed within short periods (d'Antoine et al., 2017). However, since the HIE approach alone is insufficient to resist forgery, Song et al. (2012) propose combining HIE with network-based keys, which leads to the instruction set localization (ISL) technique.
Instruction Set Localization (ISL)
The ISL technique transforms software from source or native machine code into bytecode for an arbitrarily chosen instruction set architecture (Song et al., 2012). One advantage of this technique is that it is not vulnerable to memory dumps, since the code is represented as bytecode in an unknown machine language. On its own, however, the approach poses no obstacle to dynamic analysis, since researchers have already discovered methods to reverse bytecode execution automatically. To eliminate this weakness, Song et al. (2012) propose using a technique similar to HIE to bind the virtualized instructions to a specific environment. The authors suggest that the C&C server combine the HIE host ID with network identifiers and use that information to virtualize the native code representing the malicious commands. Therefore, the bytecode delivered to an infected host can run only on that specific host, as determined by the network identifiers and the forgery-resistant host ID.
Like the HIE approach, the ISL technique guarantees that tasks execute successfully only if the signature gathered at infection time matches the one gathered at runtime; a mismatch causes the bytecode to be interpreted incorrectly (Song et al., 2012). Also, because a task's interpretation is determined by the combination of host and network identifiers, analysts can fully understand the task only if they reproduce both. The authors argue that malware adopting the ISL technique is more extensible, as in the PaaS model (Song et al., 2012). Some malware instances may contain little to no information about the actual malicious tasks, which complicates behavior identification. Such instances therefore increase resistance to tracking and analysis, and researchers and security experts must forge both the host-level and network-level identifiers to analyze them.
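The binding of bytecode to host and network identifiers can be illustrated with a toy Python sketch. It is an assumption-laden illustration rather than the scheme of Song et al. (2012): the four-opcode stack machine, the way the opcode table is derived from a hash, and the identifier strings are all hypothetical, and the "task" is harmless arithmetic.

```python
import hashlib

# Logical opcodes of a toy stack machine (hypothetical, for illustration).
PUSH, ADD, MUL, HALT = 0, 1, 2, 3


def opcode_table(host_id: str, network_id: str) -> list:
    """Keyed permutation of byte values: index = logical opcode, value = encoded byte."""
    key = f"{host_id}|{network_id}".encode("utf-8")
    return sorted(range(256), key=lambda b: hashlib.sha256(key + bytes([b])).digest())


def encode(program, table) -> bytes:
    """Translate (opcode, operand) pairs into bytecode for one specific table."""
    out = bytearray()
    for op, arg in program:
        out.append(table[op])
        if op == PUSH:
            out.append(arg)  # operands are single bytes (0..255) and left unencoded
    return bytes(out)


def run(bytecode: bytes, table):
    """Interpret bytecode under the given table; return None on any fault."""
    decode = {encoded: op for op, encoded in enumerate(table)}
    stack, pc = [], 0
    while pc < len(bytecode):
        op = decode[bytecode[pc]]
        pc += 1
        if op == PUSH and pc < len(bytecode):
            stack.append(bytecode[pc])
            pc += 1
        elif op in (ADD, MUL) and len(stack) >= 2:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == ADD else a * b)
        elif op == HALT and stack:
            return stack[-1]
        else:
            return None  # unknown opcode, stack underflow, or truncated program
    return None  # program ended without HALT


task = [(PUSH, 6), (PUSH, 7), (MUL, None), (HALT, None)]

infected = opcode_table("host-a", "net-1")    # identifiers seen at infection time
elsewhere = opcode_table("sandbox", "net-9")  # identifiers seen in another environment

bytecode = encode(task, infected)
print(run(bytecode, infected))   # 42
print(run(bytecode, elsewhere))  # misinterpreted: expected to fault or differ
```

With matching identifiers the interpreter recovers the intended task; with different identifiers the same bytes decode to other opcodes, which is the mismatch behavior the text describes.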
Summary of the Findings
The analysis reveals how host identity-based encryption (HIE) and instruction set localization (ISL) work and how they address the major challenges identified in the background section. Based on the analysis, various adjustments can be made to the two techniques to help them overcome the challenges involving host IDs that combine host and network identifiers. In its simplest form, HIE encrypts the whole malware binary, an approach that protects the entire program but also benefits the defender. One advantage of the HIE technique is that it relies on modern cryptography, meaning that knowledge of how the key is derived does not compromise the protection. Unlike DRM systems, which prevent protection bypass by assuming the highest privilege levels on a system or by using special hardware, HIE makes no such assumptions. The ISL technique, on the other hand, transforms software from source or native machine code into bytecode for an arbitrarily chosen instruction set architecture. Similar to the HIE approach, the ISL technique guarantees that tasks execute successfully only if the infection-time signature matches the runtime signature; otherwise, the bytecode is interpreted incorrectly. Based on the analysis, malware that adopts the ISL technique is more extensible, as in the PaaS model. Therefore, the analysis finds both ISL and HIE to be very effective techniques in the context of automated malware analysis.
Limitations and Future Works
This research project takes a qualitative approach in the form of a literature review, an approach with known limitations such as limited outcomes, weak representation of the target population, and the absence of data analysis. Quantitative or mixed methodologies could be applied to measure the effectiveness of the two techniques against malware samples. Therefore, more data-driven research could be conducted to validate the recommendations made in this report. In the future, a multi-layer approach should be developed to enhance accuracy while minimizing false positives. Further research should also be conducted to make host identity-based encryption (HIE) and instruction set localization (ISL) resilient to future malware evolution.
Conclusion
Based on the literature review and the analysis, the HIE and ISL techniques are more effective than the traditional methods of malware analysis. The research explains how the two techniques can be applied and how they are practical in solving the challenges that malware experts face. Moreover, automated malware analysis techniques are more efficient than traditional models such as the operational model in understanding how malware behaves, and they are also less time-consuming. The research also explains the difference between static and dynamic analysis and the need to conduct malware analysis in different environments. The analysis further explains how various adjustments can be made to the HIE and ISL techniques to overcome the challenges involving host IDs that combine host and network identifiers. In its simplest form, HIE encrypts the whole malware binary, an approach that protects the entire program but also benefits the defender. The ISL technique transforms software from source or native machine code into bytecode for an arbitrarily chosen instruction set architecture. Like the HIE approach, the ISL technique guarantees that tasks execute successfully only if the infection-time signature matches the runtime signature; otherwise, the bytecode is interpreted incorrectly. Finally, HIE does not make the same assumptions as DRM systems, which prevent protection bypass by assuming the highest privilege levels on a system or by using special hardware.
References
d'Antoine, S., Blackthorne, J., & Yener, B. (2017, November). Out-of-order execution as a cross-VM side-channel and other applications. In Proceedings of the 1st Reversing and Offensive-oriented Trends Symposium (pp. 1-11).
Gandotra, E., Bansal, D., & Sofat, S. (2019). Malware intelligence: Beyond malware analysis. International Journal of Advanced Intelligence Paradigms, 13(1-2), 80-100.
Kara, I. (2019). A basic malware analysis method. Computer Fraud & Security, 2019(6), 11-19.
Kaur, G., Dhir, R., & Singh, M. (2017). A stress-testing web-based framework for automated malware analysis. Journal of Information and Optimization Sciences, 38(6), 937-944.
Lauff, C., Menold, J., & Wood, K. L. (2019, July). Prototyping canvas: Design tool for planning purposeful prototypes. In Proceedings of the Design Society: International Conference on Engineering Design (Vol. 1, No. 1, pp. 1563-1572). Cambridge University Press.
Leach, K., Dougherty, R., Spensky, C., Forrest, S., & Weimer, W. (2019, May). Evolutionary computation for improving malware analysis. In 2019 IEEE/ACM International Workshop on Genetic Improvement (GI) (pp. 18-19). IEEE.
Mohaisen, A., Alrawi, O., & Mohaisen, M. (2015). AMAL: High-fidelity, behavior-based automated malware analysis and classification. Computers & Security, 52, 251-266.
Mosli, R., Li, R., Yuan, B., & Pan, Y. (2016, May). Automated malware detection using artifacts in forensic memory images. In 2016 IEEE Symposium on Technologies for Homeland Security (HST) (pp. 1-6). IEEE.
Song, C., Royal, P., & Lee, W. (2012, August). Impeding automated malware analysis with environment-sensitive malware. In HotSec.
Sujyothi, A., & Acharya, S. (2017). Dynamic malware analysis and detection in virtual environment. International Journal of Modern Education & Computer Science, 9(3).
Webb, M. S. (2018). Evaluating tool-based automated malware analysis through persistence mechanism detection (Doctoral dissertation, Kansas State University).