Decompilation to Java

Note

Make sure to read the generic Decompilation section of the manual.

Java 18+

Using JDK 18+? Make sure to append -Djava.security.manager=allow to your jvmopt.txt file (more info in the FAQ).

Using JDK 21+? Make sure to append --add-opens java.base/java.lang=ALL-UNNAMED to your jvmopt.txt file (more info in the FAQ).

Decompiling classes

By default, the Decompile action in the UI menu triggers the decompilation of an entire class and its constituents (fields, methods, member classes, etc.).

  • Fresh decompilation:
    • With the current options set up in your Engines context: use Action, Decompile, or press the Tab key.
    • With custom options: use Action, Decompile with Options, or press CMD1+Tab.
  • Re-decompile (e.g., after changing options):
    • Execute a "Decompile with Options" action as described above. The existing decompilation of the class, if any, is discarded, and a new decompilation takes place.

To decompile a single item (e.g. a single method), perform a "Decompile with Options" action and untick the "Decompile top-level container class" option.

Generic decryption and deobfuscation

The decompiler attempts to automatically perform data decryption and code unreflection. This is done by several IR optimizers managed by the decompiler, with the help of code emulation in a built-in sandbox.

Sample malware code decompilation: light-blue methods have been unreflected; purple strings are the result of generic decryptions

The emulator and sandbox are available in the IR API offered by dexdec (refer to the IDState interface). Several key parameters of the emulator can be customized via the coreplugins/dexdec-emu.cfg file.

  • Copy the file dexdec-emu.cfg.TEMPLATE to dexdec-emu.cfg
  • Edit dexdec-emu.cfg
  • The changes will take effect at the next decompilation

For example, the configuration file allows users to specify:

  • maximum emulation times
  • emulation policy for external methods (by groups, restricted lists, whitelists, and blacklists; e.g., a user can forbid the emulation of any time/date-related method)

Note

Data decryption combined with unreflection yields very effective results against most classes of Dalvik obfuscators. You will find examples of this on our blog.

Java 18+ and the SecurityManager

The dexdec sandbox relies on Java's SecurityManager, a standard JDK component that was deprecated and disabled by default in Java 18. If you must use JDK 18+, make sure to append -Djava.security.manager=allow to your jvmopt.txt file.

Additionally, if you use JDK 21+, you will need to append --add-opens java.base/java.lang=ALL-UNNAMED to that file as well.

CF Unflattening

Control-flow flattening is an advanced code protection technique used to "de-structure" a method.

A flattened method consists of a dispatcher, which reads a virtual program counter and dispatches execution to the next group of instructions; those instructions perform some of the original method's work, update the virtual program counter, and loop back to the dispatcher. In effect, protected methods look like overly large switch statements. The original structures (loops, inner loops, conditionals, inner conditionals, etc.) are gone; the method appears to be flat.
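To make the pattern concrete, here is a hand-written sketch (not the output of any particular protector): a simple loop, and an equivalent flattened version in which a `state` variable plays the role of the virtual program counter and a `switch` acts as the dispatcher. The class and method names are illustrative only.

```java
public class FlatteningDemo {
    // Original, structured method: sum of the integers in [1, n].
    static int sumOriginal(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    // Flattened equivalent: the loop structure is gone. Each case is a block
    // that does some work, sets the next value of the virtual program counter
    // (state), and falls back to the dispatcher at the top of the while loop.
    static int sumFlattened(int n) {
        int sum = 0;
        int i = 0;
        int state = 0; // virtual program counter
        while (true) {
            switch (state) {
                case 0: // entry block
                    sum = 0;
                    i = 1;
                    state = 1;
                    break;
                case 1: // loop condition
                    state = (i <= n) ? 2 : 3;
                    break;
                case 2: // loop body and increment
                    sum += i;
                    i++;
                    state = 1;
                    break;
                case 3: // exit block
                    return sum;
                default:
                    throw new IllegalStateException("bad state " + state);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOriginal(10) + " " + sumFlattened(10)); // prints "55 55"
    }
}
```

Unflattening is essentially the reverse operation: recovering the `for` loop of `sumOriginal` from the state transitions of `sumFlattened`.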

JEB attempts to detect and restructure such methods. CF unflattening relies on many heuristics and has limitations; it is not a bullet-proof optimizer. If you encounter problems, you may disable that deobfuscator in the options, and/or report the problem.

Unvirtualization

Code virtualization is one of the most advanced code protection techniques currently available in protection software. JEB attempts to "de-virtualize" (regenerate and decompile) virtualized methods.

Please refer to this blog post for additional details: Reversing an Android app Protector, Part 3 - Code Virtualization.

Virtualized method (left), and the resulting unvirtualized output (right)
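The general idea can be illustrated with a toy example. The sketch below assumes a made-up stack-machine instruction set (the opcodes are hypothetical, not those of any real protector): the original method body is replaced by an array of custom opcodes plus an interpreter loop. Real virtualizers use far larger, obfuscated instruction sets, but de-virtualization faces the same task of turning `CODE` and `interpret` back into `original`.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class VmDemo {
    // Opcodes of a made-up stack machine (hypothetical, for illustration only).
    static final int PUSH_ARG = 0, PUSH_CONST = 1, MUL = 2, ADD = 3, RET = 4;

    // Original method.
    static int original(int x) {
        return x * x + 3;
    }

    // The same method after "virtualization": its body is now data.
    static final int[] CODE = { PUSH_ARG, PUSH_ARG, MUL, PUSH_CONST, 3, ADD, RET };

    static int virtualized(int x) {
        return interpret(CODE, x);
    }

    // Fetch-decode-execute loop shipped inside the protected app.
    static int interpret(int[] code, int arg) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (true) {
            int op = code[pc++];
            switch (op) {
                case PUSH_ARG -> stack.push(arg);
                case PUSH_CONST -> stack.push(code[pc++]); // operand follows the opcode
                case MUL -> stack.push(stack.pop() * stack.pop());
                case ADD -> stack.push(stack.pop() + stack.pop());
                case RET -> {
                    return stack.pop();
                }
                default -> throw new IllegalStateException("bad opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(virtualized(4)); // prints 19
    }
}
```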

Removal of side-effect-free and context-insensitive code

The DEX plugin attempts to identify methods that are side-effect-free (SEF: they do not write to the program state in any meaningful way) and context-insensitive (CI: they do not read the program state in any meaningful way). Invocations of such methods may be safely discarded by the decompiler. The presence of such code may be the result of obfuscation (insertion of calls to dummy methods used to complicate the control flow) or simply the result of a release build discarding pieces of code. The typical example is logging code, whose methods' bodies may have been removed during a release build process, while invocations remain.
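A minimal illustration of the logging case, assuming a hypothetical `Log.d` helper whose body was stripped by a release build: the call neither reads nor writes program state, so removing it does not change what `compute` returns. This is a hand-written sketch of the before/after code, not decompiler output.

```java
public class SefCiDemo {
    // Hypothetical logging helper whose body was removed by a release build.
    // It neither reads nor writes program state: it is both SEF and CI.
    static final class Log {
        static void d(String tag, String msg) {
            // body stripped
        }
    }

    static int compute(int x) {
        Log.d("SefCiDemo", "x=" + x); // dead call: may be discarded
        return x * 2;
    }

    // Equivalent method once the SEF/CI call has been removed.
    static int computeCleaned(int x) {
        return x * 2;
    }

    public static void main(String[] args) {
        System.out.println(compute(21) + " " + computeCleaned(21)); // prints "42 42"
    }
}
```

Note that the argument expression (`"x=" + x`) must itself be free of side effects for the whole call site to be dropped; here it only builds a string from an `int`.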

By default, SEF/CI calls may be discarded during the IR optimization phase. However, you may want to customize this process and force-keep or force-delete some calls, based on specific analysis needs for a given app. To do that, go to the Android menu, Context Information Database handler, and customize the entries as explained in the documentation. (This buffer maps to the ContextInfoDb property of the DEX plugin.)

Editor widget for the Context Information Database

Unsafe optimizers

The decompiler uses two types of optimizers: regular optimizers, deemed safe; and aggressive optimizers, which may be unsafe in some scenarios. All emulation-based optimizers are deemed unsafe. Unsafe optimizers can be globally disabled in the options. However, they are better left enabled in the general case, especially if the code has been protected. There may be cases where users would like to disable unsafe optimizers for a specific method only. To do that, add the following comment on the method declaration line: [no-unsafe-opt]. If the method has already been decompiled, you will have to force a re-decompilation.

Deobfuscators and deobfuscation score

Some IR optimizers are tagged as DEOBFUSCATOR. Those optimizers are specifically designed to clean protected code. When they perform successfully on a method, a score is calculated. A "deobfuscation rating" is generated from the final score. That rating is visible in the Java output, and can be a precious indicator to quickly spot methods that were more protected than others.

Deobfuscators are not necessarily unsafe optimizers (refer to the section above). However, they may generate code that differs radically from the corresponding low-level machine code. Deobfuscators can be globally disabled in the options. Some (e.g. CF Unflattener, Unvirtualizer, etc.) can be selectively disabled in the options.

Deobfuscation ratings

Exceptional control flow

The decompilation of code protected in try blocks (try/catch+/finally?) is enabled by default.

However, reconstruction of try-with-resources (also known as ARM, for Automatic Resource Management) is more limited. This very high-level Java construct translates into lengthy, complicated, compiler-generated code.

Note

Better support for try-with-resources reconstruction is a planned addition.

Recovering try-with-resources and try-finally

By default, JEB attempts to detect and rebuild try-finally and try-with-resources (ARM) constructs.

Those higher-level constructs can generate very complicated low-level code. At the IR level, JEB tries its best to sort and clean things out, to allow the generation of try-with-resources-finally constructs at the Java level. However, those optimizers are unsafe and can be disabled in the options.

Decompiled try-with-resources code with multi-catch clauses
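To show why the low-level form is hard to read, the sketch below puts a try-with-resources statement next to a hand-written expansion modeled on the translation given in the Java Language Specification (section 14.20.3.1, basic try-with-resources). Actual javac, dx, and d8 output differs (it is further optimized and duplicated across paths), so treat the expansion as an approximation of the shape, not as what a given compiler emits.

```java
import java.util.ArrayList;
import java.util.List;

public class TryWithResourcesDemo {
    static final List<String> events = new ArrayList<>();

    // A resource that records when it is closed and can fail on close.
    static final class Res implements AutoCloseable {
        private final boolean failOnClose;
        Res(boolean failOnClose) { this.failOnClose = failOnClose; }
        @Override public void close() throws Exception {
            events.add("close");
            if (failOnClose) throw new Exception("close failed");
        }
    }

    // High-level source form.
    static void sugared(boolean failBody, boolean failClose) throws Exception {
        try (Res r = new Res(failClose)) {
            events.add("body");
            if (failBody) throw new IllegalStateException("body failed");
        }
    }

    // Hand-written expansion, modeled on JLS 14.20.3.1.
    static void desugared(boolean failBody, boolean failClose) throws Exception {
        final Res r = new Res(failClose);
        Throwable primary = null;
        try {
            events.add("body");
            if (failBody) throw new IllegalStateException("body failed");
        } catch (Throwable t) {
            primary = t;
            throw t; // precise rethrow: only unchecked exceptions can reach here
        } finally {
            if (r != null) {
                if (primary != null) {
                    try {
                        r.close();
                    } catch (Throwable suppressed) {
                        primary.addSuppressed(suppressed); // close failure is attached, not thrown
                    }
                } else {
                    r.close();
                }
            }
        }
    }

    // Runs one variant and summarizes the events and the exception, if any.
    static String outcome(boolean useSugared, boolean failBody, boolean failClose) {
        events.clear();
        try {
            if (useSugared) sugared(failBody, failClose); else desugared(failBody, failClose);
            return events + " ok";
        } catch (Exception e) {
            return events + " " + e.getMessage() + " suppressed=" + e.getSuppressed().length;
        }
    }

    public static void main(String[] args) {
        for (boolean fb : new boolean[] { false, true }) {
            for (boolean fc : new boolean[] { false, true }) {
                System.out.println(outcome(true, fb, fc) + " | " + outcome(false, fb, fc));
            }
        }
    }
}
```

Both columns printed by `main` should match for every combination of body and close failures, which is the property a reconstruction must preserve.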

Recovering enums

Enumerations in Java are high-level constructs that translate into multiple classes and synthetic methods. JEB attempts to discover and re-sugar those enumeration artifacts into the original enum. On failure, regular classes extending java.lang.Enum will be generated.
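For reference, the sketch below approximates what a compiler emits for a simple enum: one static final field per constant, a synthetic array backing `values()`, and a `valueOf` lookup. Source code is not allowed to extend `java.lang.Enum` directly, so the lowered class here spells out the `name`/`ordinal` state that the real class inherits. It is an illustration of the artifacts a decompiler has to re-sugar, not a byte-exact rendering.

```java
public class EnumDemo {
    // The high-level construct.
    enum Color { RED, GREEN, BLUE }

    // Approximation of the compiled form of Color. The real class extends
    // java.lang.Enum<Color>, and its members carry ACC_ENUM / ACC_SYNTHETIC flags.
    static final class ColorLowered {
        public static final ColorLowered RED = new ColorLowered("RED", 0);
        public static final ColorLowered GREEN = new ColorLowered("GREEN", 1);
        public static final ColorLowered BLUE = new ColorLowered("BLUE", 2);
        // Synthetic array backing values(); protectors often rename or shuffle it.
        private static final ColorLowered[] $VALUES = { RED, GREEN, BLUE };

        private final String name;  // held by java.lang.Enum in the real class
        private final int ordinal;  // held by java.lang.Enum in the real class

        private ColorLowered(String name, int ordinal) {
            this.name = name;
            this.ordinal = ordinal;
        }

        public static ColorLowered[] values() {
            return $VALUES.clone();
        }

        // The real valueOf delegates to Enum.valueOf(Class, String).
        public static ColorLowered valueOf(String name) {
            for (ColorLowered c : $VALUES) {
                if (c.name.equals(name)) return c;
            }
            throw new IllegalArgumentException("No enum constant " + name);
        }

        public String name() { return name; }
        public int ordinal() { return ordinal; }
    }

    static String describeHigh() {
        StringBuilder sb = new StringBuilder();
        for (Color c : Color.values()) sb.append(c.name()).append(c.ordinal()).append(' ');
        return sb.toString().trim();
    }

    static String describeLowered() {
        StringBuilder sb = new StringBuilder();
        for (ColorLowered c : ColorLowered.values()) sb.append(c.name()).append(c.ordinal()).append(' ');
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(describeHigh());    // prints "RED0 GREEN1 BLUE2"
        System.out.println(describeLowered()); // prints "RED0 GREEN1 BLUE2"
    }
}
```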

Note

Enum reconstruction can be disabled in the Options.

Enums are great candidates for obfuscation, and most Android protectors do obfuscate them. That process destroys important synthetic fields and structures that would allow simple recovery heuristics to work. However, support should function reasonably well, even on enumeration data that was intentionally shuffled to generate decompilation errors.

Note that enumerated fields can be renamed. Renaming is done consistently over the code base, including over reconstructed switches making use of such enums.

Decompiled enums in android.arch.lifecycle. Renaming and cross-referencing enumerated constants is supported.

Custom enumerated constants should also be properly reconstructed, including:

  • Field annotations
  • Custom initializers (see below)
  • Additional methods and method overrides

In this complex enumeration, the red block shows a custom initializer. Other interesting bits are the use of overrides and custom methods, annotations, as well as default and non-default constructors.

Recovering switches

The detection and reconstruction of switch-on-enum and switch-on-string is supported.
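As background for switch-on-enum: javac typically lowers it through a synthetic helper class holding an `int` array (named along the lines of `$SwitchMap$<package>$<Enum>`) that maps each constant's `ordinal()` to a case index, and then switches on that index. The sketch below approximates that shape by hand; the helper's name is made up, and other toolchains (e.g. D8/R8) may generate different code.

```java
public class SwitchOnEnumDemo {
    enum Color { RED, GREEN, BLUE }

    // High-level form.
    static String describe(Color c) {
        switch (c) {
            case RED: return "warm";
            case BLUE: return "cold";
            default: return "other";
        }
    }

    // Approximation of the synthetic helper: ordinal() -> dense case index.
    // javac wraps each store in a try/catch for binary compatibility, in case
    // a constant has been removed from a recompiled enum.
    static final class SwitchMap {
        static final int[] COLOR = new int[Color.values().length];
        static {
            try { COLOR[Color.RED.ordinal()] = 1; } catch (NoSuchFieldError e) { }
            try { COLOR[Color.BLUE.ordinal()] = 2; } catch (NoSuchFieldError e) { }
        }
    }

    // Lowered form: the switch is performed on the mapped index.
    static String describeLowered(Color c) {
        switch (SwitchMap.COLOR[c.ordinal()]) {
            case 1: return "warm";
            case 2: return "cold";
            default: return "other";
        }
    }

    public static void main(String[] args) {
        for (Color c : Color.values()) {
            System.out.println(c + ": " + describe(c) + " / " + describeLowered(c));
        }
    }
}
```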

Reconstruction of switch-on-string can be very complicated depending on how the compiler has generated and optimized the code, and therefore, is limited to simple cases.

This successfully reconstructed switch-on-string is implemented as a double-switch idiom by dx (a sparse switch on hashCode/equals to generate custom indices i, followed by a packed-switch on i). Not all switches are implemented like this. Regular if-conditional trees may be strategically generated by optimizing compilers.
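The double-switch idiom described above can be written out by hand as follows. This is a sketch of the general shape, not dx output: the first switch uses precomputed `String.hashCode()` values (case labels must be constants, hence the literals) and `equals()` checks to compute an index `i`, and the second switch dispatches on `i`.

```java
public class SwitchOnStringDemo {
    // Source form.
    static int opcode(String s) {
        switch (s) {
            case "get": return 1;
            case "put": return 2;
            case "delete": return 3;
            default: return 0;
        }
    }

    // Double-switch form: a sparse switch on hashCode() plus equals() checks
    // computes a custom index i, then a dense switch dispatches on i.
    static int opcodeLowered(String s) {
        int i = -1;
        switch (s.hashCode()) {
            case 102230:      // "get".hashCode()
                if (s.equals("get")) i = 0;
                break;
            case 111375:      // "put".hashCode()
                if (s.equals("put")) i = 1;
                break;
            case -1335458389: // "delete".hashCode()
                if (s.equals("delete")) i = 2;
                break;
        }
        switch (i) {
            case 0: return 1;
            case 1: return 2;
            case 2: return 3;
            default: return 0;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] { "get", "put", "delete", "post" }) {
            System.out.println(s + " -> " + opcode(s) + " / " + opcodeLowered(s));
        }
    }
}
```

The `equals()` checks are what make the idiom correct in the presence of hash collisions; a compiler may also merge several strings under a single hash case.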

Note

Better support for switch-on-string reconstruction is a planned addition.

Member classes and arguments capturing

Properly rendering non-trivial member classes (particularly non-static named classes or anonymous classes) is made difficult by the fact that some of their arguments are captured from the outer class(es). Properly rendering anonymous constructors, with exact argument types and position, is also challenging.

In the example below, an anonymous class initializer is used to hide string decryption code:

  • The anonymous class extends Android’s OnActivityResultListener; the code instantiates an object of that class and discards it immediately.
  • Decryption code takes place in the initializer. Note the captured arguments from the outer container method __m: i, _b. Access to other private class fields is made via synthetic accessor calls that were re-sugared into seemingly direct field access (BA._b).

Pseudo-moot anonymous class with an instance initializer attempting to conceal string decryption code.
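The capture mechanism itself can be sketched in plain Java. In the hypothetical example below (names are illustrative), an anonymous class's instance initializer uses two captured locals and a private field of the outer instance. The lowered form approximates what the compiler generates: a named class whose constructor receives the outer instance and each captured value explicitly. Before Java 11 nestmates, the private field access would typically go through a synthetic accessor method, which is the kind of call JEB re-sugars into a direct field access.

```java
public class CaptureDemo {
    private int secret = 7;

    // High-level form: an anonymous class whose instance initializer uses the
    // captured locals i and key, plus a private field of the outer instance.
    int viaAnonymous(int i, int key) {
        final int[] out = new int[1];
        new Object() {
            {
                out[0] = (i ^ key) + secret;
            }
        };
        return out[0];
    }

    // Approximation of the compiled form: captured values become explicit
    // constructor parameters, and the outer instance is passed in as well.
    static final class Lowered {
        Lowered(CaptureDemo outer, int i, int key, int[] out) {
            out[0] = (i ^ key) + outer.secret;
        }
    }

    int viaLowered(int i, int key) {
        final int[] out = new int[1];
        new Lowered(this, i, key, out);
        return out[0];
    }

    public static void main(String[] args) {
        CaptureDemo d = new CaptureDemo();
        System.out.println(d.viaAnonymous(3, 5) + " " + d.viaLowered(3, 5)); // prints "13 13"
    }
}
```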

Lambdas generation

By default, JEB will try to recover and reconstruct lambdas.

Desugared Lambdas

Recovery and reconstruction do not rely on any type of metadata [1], such as the special -$$Lambda$ prefixes of classes and methods implementing desugared lambdas in DEX version 37 and below.

You may therefore see constructs like this:

This DEX file contains desugared, non-obfuscated lambdas.

This DEX file contains desugared, obfuscated lambdas.

API

In the above cases, the underlying Java AST may be an IJavaNew or IJavaStaticField node. Real (non-desugared) lambdas, on the other hand, map to an IJavaCall node.
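The two node types reflect how desugaring usually works, sketched by hand below (this approximates the shape, not any tool's exact output; D8-era classes carried `-$$Lambda$` names, and the singleton field name varies). A non-capturing lambda becomes a stateless class with a singleton instance, so the use site reads a static field. A capturing lambda receives its captured values as constructor arguments, so the use site allocates a new object. The body methods are deliberately not named `lambda$...$N`, since javac reserves such names for its own synthetic methods.

```java
import java.util.function.IntUnaryOperator;

public class DesugaredLambdaDemo {
    // Source form.
    static IntUnaryOperator doubler() { return x -> x * 2; }
    static IntUnaryOperator adder(int k) { return x -> x + k; }

    // Desugared form (approximation): lambda bodies become static methods...
    private static int doublerBody(int x) { return x * 2; }
    private static int adderBody(int k, int x) { return x + k; }

    // ...non-capturing lambda: stateless class with a singleton instance,
    // so the use site reads a static field.
    static final class DoublerImpl implements IntUnaryOperator {
        static final DoublerImpl INSTANCE = new DoublerImpl();
        @Override public int applyAsInt(int x) { return doublerBody(x); }
    }

    // ...capturing lambda: captured values are constructor arguments,
    // so the use site allocates a new object.
    static final class AdderImpl implements IntUnaryOperator {
        private final int k;
        AdderImpl(int k) { this.k = k; }
        @Override public int applyAsInt(int x) { return adderBody(k, x); }
    }

    static IntUnaryOperator doublerDesugared() { return DoublerImpl.INSTANCE; }
    static IntUnaryOperator adderDesugared(int k) { return new AdderImpl(k); }

    public static void main(String[] args) {
        System.out.println(doubler().applyAsInt(21) + " " + doublerDesugared().applyAsInt(21)); // prints "42 42"
        System.out.println(adder(1).applyAsInt(41) + " " + adderDesugared(1).applyAsInt(41));   // prints "42 42"
    }
}
```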

Lambda reconstruction can be disabled in the options. Lambda rendering can also be disabled in the options, as well as on-demand by right-clicking a decompiled view, Rendering Options.

Lambdas options

Real Lambdas

Lambda reconstruction also takes place when the code has not been desugared (which is rare!), i.e. code relying on the invoke-custom and invoke-polymorphic instructions introduced in DEX version 38.

This DEX file contains real lambdas implemented via invoke-custom

API

Such lambdas map to an IJavaCall node for which isLambdaCall() will return true.

Breaking opaque predicates

By default, JEB attempts to discover and simplify all predicates, including so-called opaque predicates: predicates that always resolve to true (or always to false), and are meant to obfuscate the control flow of the code by introducing code and branches that will never be executed.

JEB does some of this work on its own; other work relies on external SMT solvers, such as Microsoft's Z3. More information is available in this blog post.
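A classic opaque predicate, shown here as a hand-written example rather than the output of a specific obfuscator: `x * (x + 1)` is a product of two consecutive integers and therefore even. Java `int` arithmetic wraps modulo 2^32, which preserves parity, so the condition holds for every `int`. Proving this over 32-bit values is exactly the kind of query an SMT solver can answer, after which the dead branch can be removed.

```java
public class OpaquePredicateDemo {
    // Always true: the product of consecutive integers is even, and 32-bit
    // wrap-around (modulo 2^32, an even modulus) preserves parity.
    static boolean predicate(int x) {
        return (x * (x + 1)) % 2 == 0;
    }

    // Protected form: the else branch is dead code inserted to mislead.
    static int protectedMethod(int x) {
        if ((x * (x + 1)) % 2 == 0) {
            return x + 1;
        } else {
            return x ^ 0x5a5a5a5a; // never executed
        }
    }

    // What can be emitted once the predicate is proven always true.
    static int simplified(int x) {
        return x + 1;
    }

    public static void main(String[] args) {
        int[] samples = { Integer.MIN_VALUE, -3, -1, 0, 1, 2, 12345, Integer.MAX_VALUE };
        for (int x : samples) {
            System.out.println(x + ": " + predicate(x) + " " + (protectedMethod(x) == simplified(x)));
        }
    }
}
```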

This feature can be disabled in the options.

Dynamic invocation opcodes

The translation of invoke-custom whose bootstrap method is LambdaMetafactory.metafactory(...) allows the decompiler to generate proper Java code with lambda constructs.
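For readers unfamiliar with this bootstrap method, the snippet below performs by hand, through the public `java.lang.invoke` API, roughly what the runtime does when it links such a call site: `LambdaMetafactory.metafactory` returns a `CallSite` whose target is a factory producing instances of the functional interface. The class and method names are made up for the illustration; in real code, the runtime links the call site, not the app.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.LambdaMetafactory;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.util.function.IntBinaryOperator;

public class MetafactoryDemo {
    // Implementation method: what the lambda body (a, b) -> a + b compiles to.
    static int add(int a, int b) {
        return a + b;
    }

    // Links a lambda call site manually, as an invoke-custom with a
    // LambdaMetafactory.metafactory bootstrap would.
    static IntBinaryOperator makeAdder() throws Throwable {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodType intIntToInt = MethodType.methodType(int.class, int.class, int.class);
        MethodHandle impl = lookup.findStatic(MetafactoryDemo.class, "add", intIntToInt);
        CallSite site = LambdaMetafactory.metafactory(
                lookup,
                "applyAsInt",                                   // functional interface method name
                MethodType.methodType(IntBinaryOperator.class), // factory type: no captured values
                intIntToInt,                                    // interface method signature
                impl,                                           // implementation method
                intIntToInt);                                   // signature enforced at invocation
        return (IntBinaryOperator) site.getTarget().invoke();
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(makeAdder().applyAsInt(2, 3)); // prints 5
    }
}
```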

However, this is only one case of dynamic dispatch, albeit one of the most important. invoke-custom and related opcodes (const-method-handle, const-method-type) cannot be as "easily" translated into intermediate representations (and, later on, into an AST). For that reason, those opcodes are translated into regular invocations of artificial methods.

Artificial classes in jeb.synthetic

Classes in the jeb.synthetic package are generated automatically by the DEX decompiler:

  • InvokeCustoms contains static methods representing dynamic dispatch to a method handle's callsite done via an invoke-custom opcode: jeb.synthetic.InvokeCustoms.CallSite<INDEX>_<DynamicName>(DynamicPrototype)

  • PooledMethodHandles contains static getters of method handles stored in a DEX pool and retrieved via a const-method-handle opcode: jeb.synthetic.PooledMethodHandles.Entry<INDEX>_<MethodName|FieldName>() : java.lang.invoke.MethodHandle

  • PooledMethodTypes contains static getters of method types stored in a DEX pool and retrieved via a const-method-type opcode: jeb.synthetic.PooledMethodTypes.Entry<INDEX>() : java.lang.invoke.MethodType
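To clarify what the PooledMethodHandles and PooledMethodTypes getters stand for, the sketch below builds the equivalent constants with the standard `java.lang.invoke` API: a `MethodType` such as `(String)int`, and a `MethodHandle` to `String.length()`. The getter names are hypothetical stand-ins chosen for this illustration; they are not the code JEB generates.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class PooledConstantsDemo {
    // Hypothetical stand-in for a PooledMethodTypes getter: a const-method-type
    // instruction loads a MethodType constant, here (String)int.
    static MethodType entry0Type() {
        return MethodType.methodType(int.class, String.class);
    }

    // Hypothetical stand-in for a PooledMethodHandles getter: a const-method-handle
    // instruction loads a method handle, here to String.length().
    static MethodHandle entry0Length() throws ReflectiveOperationException {
        return MethodHandles.lookup().findVirtual(String.class, "length",
                MethodType.methodType(int.class));
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle h = entry0Length();
        // A virtual handle takes the receiver as its first parameter: (String)int.
        System.out.println(h.type().equals(entry0Type())); // prints true
        int n = (int) h.invokeExact("hello");
        System.out.println(n); // prints 5
    }
}
```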

Decompiling Java Bytecode

JEB supports JLS bytecode decompilation for *.class files and *.jar-like archives (jar, war, ear, etc.). By default, the Java bytecode is converted to Dalvik using Android's dx; if a problem occurs, JEB falls back to d8. Users may choose to use d8 first in the Options.

The resulting DEX file(s) are processed as usual.

You may use this to decompile Android Library files (*.aar files) in JEB.

Examining the android-arch-core-runtime library