Decoding Obfuscated Strings in Adwind

From the latter half of 2015 to 2016, there have been an increasing number of cyber attacks worldwide using Adwind, a Remote Access Tool [1]. JPCERT/CC also received incident reports about emails with this malware in its attachment.

Adwind is malware written in Java language, and it operates in Windows and other OS as well. It has a variety of functions: to download and execute arbitrary files, send infected machine information to C&C servers and extend functions using plug-ins.

One of the characteristics of Adwind is its frequent updates. In an extreme case, an update was released merely in a two-week interval. When investigating Adwind-related incidents, it is important to correctly examine the functions of the Adwind version in use.

The challenge is, however, the strings stored within Adwind, which count up to about 500, are artfully obfuscated, and they need to be decoded in order to analyse the malware’s function. JPCERT/CC created a tool “adwind_string_decoder.py”, which efficiently decodes such obfuscated strings. This blog article describes how this tool works.

Although Adwind has multiple generations [1], this blog article and the tool created will examine the new Adwind versions which have been used in recent attacks since the latter half of 2015.

Obfuscated strings

Most of the strings that Adwind has are obfuscated as in Figure 1. The number of such strings differs depending on the Adwind’s version, but there are about 500. These strings look totally different in each Adwind version.

Figure 1: Decompiled codes containing obfuscated strings
1_obfuscated
Figure 2: Decompiled codes in another Adwind version
2_obfuscated_old

Figure 1 and Figure 2 describe codes which correspond to the same process. Both figures have line feeds inserted and are indented so that the decompiled codes can be read easier. Figure 2 is the Adwind version which was seen in mid-August 2015 and Figure 1 in late November 2015. As previously mentioned, Adwind gets updated frequently, and furthermore, the codes are obfuscated using different keys for each version.

The red-marked sections in Figure 1 and 2 indicate part of the process for collecting information to be sent to C&C servers, and contain a process for obtaining infected usernames. However, when calling the process for collecting information, the object (which is the username) is specified in the obfuscated strings. This makes it impossible to tell from the codes that the malware intends to collect infected usernames, unless they are decoded. Furthermore, it is difficult to decode the strings since the keys used for obfuscation are scattered and different for each Adwind version (as described later).

In incident analysis, damage caused by the infection and the next attack sequence can sometimes be predicted by specifying the information sent to C&C servers. From the analysis perspectives, it is important to closely examine what kind of information the malware is targeting. For this purpose, static code analysis has to be effectively conducted, and about 500 obfuscated strings for various Adwind versions need to be quickly decoded.

Analysing the string-decoding process

Generally in stream cipher, in order to encrypt data m (with a random length), usually a pseudo-random number sequence k (with the same length as m) is created using its encryption key, and an encrypted string is generated by m XOR k. By combining the XOR for m XOR k and the pseudo-random number sequence k (which is described as (m XOR k) XOR k) using XOR operation, the string is decrypted to derive the plaintext m.

Obfuscation in Adwind is conducted in a method which is similar to the abovementioned stream cypher. However, Adwind creates k in a different method, not from an encryption key. In this article, k for Adwind is referred to as an “obfuscating key”.

Codes in Adwind contain some functions which take an obfuscated string as an argument, and returns its decoded string. These functions are referred to as Fi hereafter. One thing to note here is that Fi returns different results even for the same input, if the caller method is different. This means that, in order to do static analysis and obtain the obfuscating key corresponding to a certain string, it is necessary to understand which Fi processes the strings, as well as in which method the string exists to call for Fi. The following describes what kind of obfuscating key Fi generates to process decoding.

Fi takes its caller’s method name and class name as the basis, and generates an obfuscating key by giving them a transform process derived from the following factors. This process is repeated until it gains a certain length required for a key.

Factor 1: Which comes first when concatenating the method name and class name

Factor 2: The value used in the operation for transforming the basis string to a completely different string

All Fi consist mostly of the same codes, however, only those corresponding to the above factors have different codes in each Fi. Furthermore, Factor 2 does not exist in the code as an immediate constant, and is derived through obfuscated codes including bit-operations.

Adwind contains at least 5 varieties of Fi and about 60 methods including obfuscated strings, which means that it has a combination of about 100 obfuscating keys. Although Fi consists of relatively simple codes, this number makes it fairly difficult to remove obfuscation.

Additionally, the two factors mentioned above vary in each Adwind version. Therefore, even if we create a decoding tool for a certain version of Adwind, it cannot be applied to other versions.

On the other hand, Fi has the following characteristics in common:

- Has one argument of a string object

- Is a static function that returns a decoded string as a string object

- Contains a certain API call to obtain the caller’s information

- Has limited varieties of instructions within the function

Based on the features, JPCERT/CC created a tool to automatically decode obfuscated strings using a method which does not rely on the Adwind version as much as possible.

adwind_string_decoder.py

This tool is available on GitHub. Feel free to download for your use.

JPCERTCC/aa-tools - adwind_string_decoder.py

https://github.com/JPCERTCC/aa-tools

In order to use adwind_string_decoder.py, a disassembler, javap, is required which is included in JDK (Java Development Kit). Users are required to set a path to javap, or configure so that the environment variable JAVA_HOME is pointed to the JDK folder.

adwind_string_decoder.py basically processes in the following sequence:

  1. Open the selected jar file and call a disassembler
  2. Scan all the disassembled codes, and extract functions which seem to be decoding functions from the arguments and types
  3. Judge if it really is a decoding process from the kinds of instructions and sequences that appear in the function
  4. If it is a decoding process, derive Factor 1 and 2 to generate obfuscating keys
  5. Scan all the codes again and extract parts which call for the decoding process
  6. Derive each method name and class name, and use them as the basis for obfuscating keys
  7. Generate obfuscating keys and decode the strings

Before using adwind_string_decoder.py – Unpacking Adwind

Typically, Adwind is packed, and its main jar file is hidden in the artifact’s jar file. Since adwind_string_decoder.py does not have the function to unpack Adwind, users are required to run Adwind in an analysis environment beforehand, and extract the jar image that appears in its memory. The jar image tends to disappear easily from the memory, however, it could be easier to extract it if you set a breakpoint in the API which reads the jar file, by using a Java debugger (e.g. jdb).

Executing adwind_string_decoder.py

To decode obfuscated strings, select the unpacked jar file and output file, and execute as follows:

python adwind_string_decoder.py sample.jar output.jasm

Then it outputs disassembled codes which contain decoded strings as comments, as in Figure 3.

Figure 3: Disassembled codes with some decoded strings inserted
3_decoded_disassembly

Also, if you execute without any output files, the output of the disassembled codes will be omitted, and you can output decoded strings only to the standard output, as in Figure 4.

python adwind_string_decoder.py sample.jar
Figure 4: Output of decoded strings only
4_decoded_strings

It is also possible to scan the java codes (outputs of the decompiler), and replace the function call and argument with the decoded string. This option only supports codes in Fully Qualified Name (FQN) format. For example, you can obtain the output in Figure 6 from codes as in Figure 5. Since adwind_string_decoder.py does not have a decompiling function, you need to output a file with the decompiler and store it in a folder beforehand. After selecting that folder and a new folder for outputting the decoded file, execute as follows:

python adwind_string_decoder.py sample.jar source_folder output_folder
Figure 5: Decompiled codes before decoding
5_obfuscated_code
Figure 6: Decompiled codes after decoding
6_decoded_code

Using these decoded strings, it is easy to understand what kind of information the malware intends to collect and send to C&C servers. It is also possible to find out which OS can be infected by the malware, and how the Adwind functions may differ in each OS.

Summary

Since early February 2016, attacks using these new Adwind versions have been less and less seen. This may be good news, however, it seems that there are still new samples found at the end of February 2016. We hope that the tool we introduced here would be of your help in case if you see any new versions of Adwind.

- Kenichi Imamatsu

(Translated by Yukako Uchida)


Reference:

[1] Adwind: FAQ - Securelist

https://securelist.com/blog/research/73660/adwind-faq/

Appendix

SHA-256 Hash value of the sample

  • 033db051fc98b61dab4a290a5d802abe72930338c4a0dd4705c74eacd84578d3
  • f8f99b405c932adb0f8eb147233bfef1cf3547988be4d27efd1d6b05a8817d46

 

Back
Top
Next