Dex Bytecode#
Dex (or DEX, throughout this section), short for Dalvik Executable, is an object container for Dalvik code. It is Android's equivalent of ELF, PE, COFF, etc. containers for native code.
The primary dex file of an app is named classes.dex, located in the app's root folder. (This file is present in the vast majority of apps, although it is optional: apps can be purely native.)
DEX splitting#
Additional DEX files may be present: classes2.dex, classes3.dex, etc. The reason behind code splitting is a legacy Dalvik VM limitation called the "64K reference limit": many items present in a DEX file are referenced by an id stored as a 16-bit integer, as is the case for method and field references. To overcome the limitation, compilers such as d8 (or its predecessor dx) split the code over additional DEX files classesN.dex, where N>=2.
Additional references are created to reference definitions located in other DEX files. JEB merges all classesN.dex files into a single, virtual DEX unit. Note that in practice, such DEX units could not be converted back to a single DEX file. Also keep in mind that apps may artificially split their code over multiple DEX files; reaching the 64K reference limit is not a prerequisite for splitting.
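To see how close each DEX file of an app is to the 64K reference limit, the id counts can be read straight from the DEX headers. The sketch below is a minimal illustration, not part of JEB: the function name `dex_id_counts` is hypothetical, and it assumes the standard header layout in which `field_ids_size` and `method_ids_size` are little-endian 32-bit integers at offsets 0x50 and 0x58.

```python
import re
import struct
import zipfile

# Matches classes.dex and classesN.dex (N >= 2) at the root of the archive.
DEX_NAME = re.compile(r"^classes([2-9]|[1-9][0-9]+)?\.dex$")

DEX_HEADER_SIZE = 0x70  # size of a standard DEX header


def dex_id_counts(apk):
    """Return {dex_name: (field_ids_size, method_ids_size)} for an APK.

    `apk` is a path or a file-like object accepted by zipfile.ZipFile.
    """
    counts = {}
    with zipfile.ZipFile(apk) as zf:
        for name in zf.namelist():
            if not DEX_NAME.match(name):
                continue
            header = zf.read(name)[:DEX_HEADER_SIZE]
            if len(header) < DEX_HEADER_SIZE:
                continue  # too short to hold a DEX header
            (field_ids_size,) = struct.unpack_from("<I", header, 0x50)
            (method_ids_size,) = struct.unpack_from("<I", header, 0x58)
            counts[name] = (field_ids_size, method_ids_size)
    return counts
```

A DEX file whose method or field count approaches 65,536 was most likely split out of necessity; much smaller counts across several files suggest a deliberate split.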
This app was split over 7 dex files:
Warning
On pre-API 21 (Android 5) systems, it is the responsibility of the app to load additional files, i.e., the DEX splitting mechanism is not something baked into the Dalvik VM itself. Apps can extend the support class MultiDexApplication to avoid implementing their own DEX loader, as the vast majority of apps do. However, keep in mind that this is in no way mandatory. Malware or protected apps can implement multi-dex loading however they see fit.
On API 21 and above, with the advent of the ART runtime, files named classesN.dex are scanned and pre-compiled along with classes.dex. However, this mechanism does not preclude apps from using additional DEX loading facilities as well.
DEX execution#
There are two general types of dex files:
- regular dex files contain generic code and use standard Dalvik instructions; they are meant to run on all Android devices
- odex files (a generic term for "optimized DEX"), on the other hand, contain device-specific instructions.
Optimized dex (odex)#
The DEX files located in your app are not the code executed on the device, except when debugging. DEX code is executed by a runtime:
- The legacy runtime (pre-API 21 Lollipop) uses a JIT (just-in-time) compiler and generates odex files on first run.
- The current runtime is named ART (short for Android Runtime) and makes use of AOT (ahead-of-time) compilation of Dalvik to native code (x86 or ARM) at install time.
Note
The format of optimized DEX files has evolved over time ("dey" magic, OAT files with DEX or DEX-like entries, VDEX and CDEX, etc.). The process of reconstructing a DEX from an optimized DEX is systematic and implemented in several tools, such as baksmali deodex or vdexExtractor. Refer to additional references, such as LIEF's notes on OAT and ART, for more information on odex.
Link: List of odex instructions.
DEX format#
This section quickly summarizes important facts about the DEX format. Refer to the official specifications for additional details:
The format can be linearly represented as:
- In Header: DEX magic, DEX version: dex\n0NN\0 where NN=
  - 35: up to Android 7
  - 37: Android 7, invoke-virtual and invoke-super accept interface method ids (support for Java 8's default methods)
  - 38: Android 8, added invoke-polymorphic, invoke-custom, call sites and method handles entries (details)
  - 39: Android 9, added const-method-handle, const-method-type (details)
- Tables are ordered alphabetically and do not allow duplicates
- Note that the map section was purely redundant until DEX 38 and the introduction of method handles and call sites.
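As a quick illustration of the magic layout described above (`dex\n0NN\0`), the following sketch extracts the version number from the first 8 bytes of a file. The function name `dex_version` is an assumption made for this example; optimized formats with a different magic (such as "dey") are deliberately rejected.

```python
import re

# 'dex\n', then '0NN' as three ASCII digits, then a NUL byte.
DEX_MAGIC = re.compile(rb"^dex\n0(\d\d)\x00")


def dex_version(data: bytes) -> int:
    """Return the DEX version (e.g. 35, 37, 38, 39) from raw file bytes."""
    match = DEX_MAGIC.match(data)
    if match is None:
        raise ValueError("not a regular DEX file (unexpected magic)")
    return int(match.group(1))
```

For example, `dex_version(open("classes.dex", "rb").read(8))` would return 38 or higher for a file that may contain call sites and method handles.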
The three links above cannot be overlooked: any Android reverser should strive for Dex and Dalvik proficiency. That being said, below is a list of lesser-known or overlooked details about Dalvik:
- Strings are encoded using a variant of CESU-8 called MUTF-8 (modified UTF-8)
  - 1-, 2-, or 3-byte encoding (whereas UTF-8 allows up to 4-byte)
  - Surrogates: 2x3-byte for chars U+10000 to U+10FFFF (whereas canonical representations of UTF-8 do not use surrogates; UTF-16 does)
  - \u0000 is encoded as \xC0\x80 (whereas UTF-8 uses \x00)
  - Special byte \x00 indicates string end (there is no EOS concept with UTF-X)
- Some 32-bit integers are encoded using the variable-length encoding scheme LEB128 and its variants
- Types use Strings: type definition = index into string pool
- Prototypes use Strings and Types
  - Shorty definition = index into string pool
  - Full prototype definition = list of indices into type pool
- The Dalvik bytecode is stored in Code items
- Call Sites and Method Handles were introduced in DEX 38
  - The DEX header remained unchanged and does not directly reference those pools; instead, they are referenced in the Map area (which largely remained unused until those items were introduced)
  - Learn more about DEX 38 on our blog
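Two of the encodings listed above, MUTF-8 and unsigned LEB128, are simple enough to decode by hand. The sketch below is illustrative only: the helper names `decode_mutf8` and `read_uleb128` are hypothetical, input is assumed to be well-formed, and signed LEB128 and its other variants are not covered.

```python
def decode_mutf8(data: bytes, offset: int = 0) -> str:
    """Decode a NUL-terminated MUTF-8 string starting at `offset`."""
    units = []  # UTF-16 code units
    i = offset
    while data[i] != 0x00:  # a raw 0x00 byte marks the end of the string
        b0 = data[i]
        if b0 < 0x80:  # 1-byte form: 0xxxxxxx
            units.append(b0)
            i += 1
        elif (b0 & 0xE0) == 0xC0:  # 2-byte form (includes C0 80 for U+0000)
            units.append(((b0 & 0x1F) << 6) | (data[i + 1] & 0x3F))
            i += 2
        elif (b0 & 0xF0) == 0xE0:  # 3-byte form (includes surrogate halves)
            units.append(
                ((b0 & 0x0F) << 12)
                | ((data[i + 1] & 0x3F) << 6)
                | (data[i + 2] & 0x3F)
            )
            i += 3
        else:
            raise ValueError(f"invalid MUTF-8 lead byte {b0:#04x}")
    raw = b"".join(unit.to_bytes(2, "little") for unit in units)
    # Surrogate pairs are recombined into supplementary code points here.
    return raw.decode("utf-16-le", errors="surrogatepass")


def read_uleb128(data: bytes, offset: int = 0):
    """Read an unsigned LEB128 value; return (value, next_offset)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if (byte & 0x80) == 0:  # high bit clear: last byte
            return result, offset
        shift += 7
```

Decoding to UTF-16 code units first keeps the surrogate handling trivial, since MUTF-8 stores each surrogate half as its own 3-byte sequence.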
Dalvik#
Dalvik is the name of the low-level bytecode stored in DEX files. Dalvik bytecode is interpreted by a Virtual Machine (DVM).
Generation:
- Source language: smali (low-level), Java (high-level), Kotlin (very high-level)
- Java -> javac -> classfiles (Java bytecode) -> dx/d8 -> classes.dex (Dalvik bc)
Characteristics:
- Register-based machine:
  - 65,536 32-bit registers, numbered v0 to v65535
  - 65,535 64-bit registers, "emulated" by using consecutive 32-bit registers [v0,v1], [v1,v2], ..., [vN,vN+1]
  - No "special" register is accessible: no flag register, no PC register, no current-frame register, etc.
  - Fixed frames (stack is N/A, no stack pointer), size declared in Code items
  - Pointer = object reference ~= fits on a single register (32-bit)
- Regular instructions range from 2 to 10 bytes (= 1 to 5 words)
  - Instruction opcode encoded on a single byte; the second byte of the first word is generally used to encode register indices
  - Examples: nop (1w), const-wide v1, 0x1122334455667788L (5w)
The generally accepted convention is to represent Dalvik disassembly in smali or a variant of smali. By default, JEB uses a slightly less verbose variant of smali, which is more readable and better suited to being displayed and manipulated in an interactive UI.
- Method bodies live in isolation; the concept of "jump far" (unstructured dispatch) does not exist. Dispatching execution to other methods is done via invoke-xxx instructions only
- Jumps are always relative to the current PC
- Retrieving the returned value of a function is done via a move-result-xxx instruction, located right after the invoke-xxx instruction
- Arithmetic instructions have no side-effects / there is no flag register
- Data in bytecode is legal:
  - Immediates: Some instructions store literals inline (i.e., within the instruction code), e.g. const-xxx
  - N-way branching instruction: switch-xxx: the jump table is stored within the bytecode
  - Small array initialization: fill-array-data: array data is stored within the bytecode. (Note that array data payload may be used by more than one fill-array-data instruction.)
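The two instruction examples given above (nop on 1 word, const-wide on 5 words) can be decoded with a few lines of code. This is a minimal sketch under stated assumptions: it handles only those two opcodes (0x00 and 0x18 in the Dalvik opcode table), the function name `decode` is hypothetical, and the output format is merely smali-like.

```python
import struct

OP_NOP = 0x00         # 1 word
OP_CONST_WIDE = 0x18  # 5 words: opcode, register, 64-bit inline literal


def decode(code: bytes) -> str:
    """Decode a single nop or const-wide instruction from raw bytecode."""
    opcode = code[0]  # low byte of the first 16-bit word
    second = code[1]  # high byte: usually register indices
    if opcode == OP_NOP and second == 0:
        # A non-zero high byte with opcode 0x00 identifies a data payload
        # (switch table or array data), not a nop.
        return "nop"
    if opcode == OP_CONST_WIDE:
        # The immediate is stored inline, little-endian, after the first word.
        (literal,) = struct.unpack_from("<q", code, 2)
        return f"const-wide v{second}, {literal:#x}L"
    raise NotImplementedError(f"opcode {opcode:#04x} is not handled in this sketch")
```

For instance, `decode(bytes.fromhex("18018877665544332211"))` yields the const-wide example from the list, with the 64-bit literal read back from the eight bytes that follow the first word.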
Calling convention#
The DVM runs managed code and uses a "no side effect, no cleaning" calling convention: every function gets a clean register slate upon execution; the parameters are stored at the bottom of the declared frame.
Registers are 32-bit wide and noted vX, 0-indexed. The alternate notation pX is used to address the registers that store input method parameters: its indexing starts at frame_size - input_slot_count, i.e., p0 is the register v(frame_size - input_slot_count).
Example 1:
- Method: void foo(int a, char b, boolean c, Object d)
- The CodeItem declares a frame of size 5
v0
v1 <- parameter 0: p0 (a)
v2 <- parameter 1: p1 (b)
v3 <- parameter 2: p2 (c)
v4 <- parameter 3: p3 (d)
------- end of method frame
v5
v6
...
v65535
Example 2:
- Method: void bar(double a, long b, float c)
- The CodeItem declares a frame of size 8
v0
v1
v2
v3 <- parameter 0: p0 (a, lower part)
v4 <- parameter 0: p1 (a, higher part)
v5 <- parameter 1: p2 (b, lower part)
v6 <- parameter 1: p3 (b, higher part)
v7 <- parameter 2: p4 (c)
------- end of method frame
v8
v9
...
v65535
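The mapping between pX and vX follows mechanically from the frame size and the parameter types. The sketch below computes it under a few assumptions: the function name `parameter_registers` is hypothetical, types are given as plain Java type names, only long and double are treated as 64-bit (two consecutive registers), and non-static methods receive `this` as the first input slot. The method `baz` used for the demonstration is made up for this example.

```python
WIDE_TYPES = {"long", "double"}  # 64-bit types occupy two consecutive registers


def parameter_registers(frame_size, param_types, is_static=True):
    """Return a list of (vX, pX, label) tuples for a method's input slots."""
    slots = [] if is_static else [("this", 1)]
    for index, type_name in enumerate(param_types):
        width = 2 if type_name in WIDE_TYPES else 1
        slots.append((f"parameter {index}", width))

    input_slot_count = sum(width for _, width in slots)
    if input_slot_count > frame_size:
        raise ValueError("frame too small for the declared parameters")

    first = frame_size - input_slot_count  # vX index of p0
    mapping = []
    p = 0
    for label, width in slots:
        for _ in range(width):
            mapping.append((f"v{first + p}", f"p{p}", label))
            p += 1
    return mapping


# Hypothetical static method: void baz(long x, int y), frame of size 6.
for v, p, label in parameter_registers(6, ["long", "int"]):
    print(f"{v} <- {label}: {p}")
```

With a frame of size 6, the three input slots of `baz` land in v3 to v5: x takes p0 and p1, and y takes p2.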
The default settings instruct JEB to use the pX notation when rendering parameter registers:
This can be disabled (DEX plugin option, also controlled in the UI by right-clicking, Rendering Properties, untick 'Use p for parameters').
Smali and variants#
The JEB notation is made possible by the interactivity layer (as opposed to a dead listing). Two notable differences:
- For readability, the names are simple names, no longer fully-qualified
- Invoke opcodes place the arguments after the method: invoke-xxx callsite, args instead of invoke-xxx {args}, fully_qualified_callsite
Below, the default assembly code representation used by JEB (smali variant):
Official smali code can be generated, which is useful if it needs to be exported and later compiled using smali.jar. In that case, also make sure to disable the "Show Addresses" option.




