DSLuaDecompiler is an advanced Lua decompiler focused on decompiling Lua scripts found in various FromSoft games for modding purposes. A heavy emphasis is placed on decompilation accuracy, with the eventual goal of producing a matching decompilation for every script found in every modern FromSoft game released (which includes Dark Souls, Bloodborne, Dark Souls 3, Sekiro, Elden Ring, and Armored Core 6). The decompiler primarily targets Lua 5.0, which is used for AI scripts in almost all FromSoft games, and HavokScript, which is used for character control logic in Bloodborne, Dark Souls 3, Sekiro, and Elden Ring. Support for other Lua versions is generally out of scope for now, with the exception of early support for Lua 5.3, which is used to implement UI logic in Super Smash Bros. Ultimate.
Currently, there is no official release outside of some builds posted to the FromSoft modding Discord server. Rapid progress is being made at accurately decompiling all the Lua files in the games, and a release will be made once there is confidence that all Lua files are being decompiled in a functionally accurate way.
DSLuaDecompiler is written from scratch and aims to address many of the limitations of existing Lua decompilers, with the goal of perfectly decompiling extremely complex scripts that have many local variables and complex control flow. Unlike other decompilers, which tend to use a single stack-based transform pass that attempts to undo the compilation steps directly, DSLuaDecompiler uses a variety of generic passes and analyses that operate on an intermediate representation (IR) resembling an AST (abstract syntax tree) for Lua source code. Each Lua opcode in the bytecode is transformed into one or more "instructions" in the IR, and from there the IR is repeatedly transformed over multiple passes until it represents the AST of a Lua program and can be printed in source form. Using an intermediate representation allows for multiple frontends that process different bytecode formats (such as different versions of Lua and HavokScript) while reusing the vast majority of the passes.
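The frontend-plus-passes arrangement described above can be sketched roughly as follows. This is an illustrative toy in Python, not DSLuaDecompiler's actual code: the `Assign` and `Function` types, the `toy_frontend`, `remove_self_moves`, `run_pipeline`, and `print_source` functions, and the tuple encoding of the bytecode are all assumptions made for the example. The opcode names are borrowed from Lua, but their handling here is heavily simplified.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Assign:
    """Hypothetical IR instruction: dest register := source expression text."""
    dest: str
    src: str


@dataclass
class Function:
    """Hypothetical container for the IR of one function."""
    instrs: List[Assign] = field(default_factory=list)


def toy_frontend(bytecode: List[Tuple]) -> Function:
    """Translate toy opcodes into IR; one opcode may yield several instructions."""
    func = Function()
    for op, *args in bytecode:
        if op == "LOADK":        # R(a) := constant
            a, k = args
            func.instrs.append(Assign(f"r{a}", repr(k)))
        elif op == "MOVE":       # R(a) := R(b)
            a, b = args
            func.instrs.append(Assign(f"r{a}", f"r{b}"))
        elif op == "LOADNIL":    # R(a) .. R(b) := nil, expands to several assigns
            a, b = args
            for reg in range(a, b + 1):
                func.instrs.append(Assign(f"r{reg}", "nil"))
        else:
            raise ValueError(f"unsupported toy opcode: {op}")
    return func


def remove_self_moves(func: Function) -> None:
    """Example pass: drop assignments of a register to itself."""
    func.instrs = [i for i in func.instrs if i.dest != i.src]


Pass = Callable[[Function], None]


def run_pipeline(func: Function, passes: List[Pass]) -> Function:
    """Apply each pass in order; passes never look at the original bytecode."""
    for p in passes:
        p(func)
    return func


def print_source(func: Function) -> str:
    """Print the final IR in source-like form."""
    return "\n".join(f"{i.dest} = {i.src}" for i in func.instrs)


bytecode = [("LOADK", 0, 42), ("LOADNIL", 1, 2), ("MOVE", 1, 1), ("MOVE", 3, 0)]
func = run_pipeline(toy_frontend(bytecode), [remove_self_moves])
source = print_source(func)
```

Because the passes only see `Function` objects, a second frontend for another bytecode format could feed the same pass list without changes, which is the reuse the paragraph above describes.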
The intermediate representation is at the heart of the decompiler and is designed to represent Lua programs through all stages of decompilation, including the final AST that is used to output the decompiled source. The IR supports multiple representations of control flow and data flow to serve the different passes used for decompilation. It starts with simple jump-label-based control flow, is then transformed into a control flow graph structure, and finally becomes an AST for the final output. The IR can also be transformed into SSA (static single assignment) form, in which every variable is assigned exactly once; this form is used for recovering complex nested expressions and for distinguishing temporary variables from local variables. The IR aims to be as generic as possible (within reason), but nodes within the IR may need to preserve certain details from the opcodes they were generated from in order to decompile perfectly.
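To give a feel for why SSA form helps with expression recovery, here is a minimal Python sketch that renames registers in a single straight-line run of instructions. It is deliberately limited: it handles no branches and therefore needs no phi functions, and the instruction encoding and the `to_ssa_straight_line` and `use_counts` helpers are hypothetical, not the decompiler's own.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each toy instruction is (dest_register, [source_registers]).
Instr = Tuple[str, List[str]]


def to_ssa_straight_line(instrs: List[Instr]) -> List[Instr]:
    """Give every definition a fresh version so each name is assigned once.

    Version 0 of a register stands for whatever value it held on entry.
    """
    version: Dict[str, int] = defaultdict(int)
    out: List[Instr] = []
    for dest, srcs in instrs:
        # Uses refer to the most recent definition of each register.
        new_srcs = [f"{s}_{version[s]}" for s in srcs]
        version[dest] += 1
        out.append((f"{dest}_{version[dest]}", new_srcs))
    return out


def use_counts(ssa: List[Instr]) -> Dict[str, int]:
    """Count how many times each SSA name is read."""
    counts: Dict[str, int] = defaultdict(int)
    for _, srcs in ssa:
        for s in srcs:
            counts[s] += 1
    return dict(counts)


# r0 is written twice; after renaming the two values are distinct names.
program = [("r0", []), ("r1", ["r0"]), ("r0", ["r0", "r1"]), ("r2", ["r0"])]
ssa = to_ssa_straight_line(program)
counts = use_counts(ssa)
```

Once each value has a unique name, a name that is defined once and read once (such as `r1_1` here) is a plausible candidate for a temporary that can be folded into a larger expression, while names read several times look more like real locals. This is one common heuristic, shown as an assumption rather than as the rule DSLuaDecompiler applies.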
Lua is decompiled on a function-by-function basis. Generally speaking, each function is self-contained, and data from other functions doesn't need to be referenced to decompile a specific function. Upvalues allow closures (lambda functions) to reference bound variables from the parent function, and the names of these upvalues can be resolved late in the decompilation process. Each function contains a set of basic blocks as well as information on the function arguments and the registers used in the function.
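As a rough illustration of the per-function data and of late upvalue naming, consider the Python sketch below. Every field name, the idea of recording the parent register each upvalue was captured from, and the `resolve_upvalue_names` helper are assumptions for this example; the real data structures may differ, and upvalues that are themselves captured from a grandparent's upvalues are not handled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Function:
    """Hypothetical per-function record used during decompilation."""
    arg_count: int = 0
    register_count: int = 0
    blocks: List[list] = field(default_factory=list)
    # For each upvalue slot, the parent register it was captured from.
    upvalue_sources: List[int] = field(default_factory=list)
    # Filled in late, once the parent's local names are known.
    upvalue_names: List[Optional[str]] = field(default_factory=list)
    children: List["Function"] = field(default_factory=list)


def resolve_upvalue_names(parent: Function, local_names: Dict[int, str]) -> None:
    """Name each child's upvalues after the parent's locals they capture."""
    for child in parent.children:
        child.upvalue_names = [local_names.get(reg) for reg in child.upvalue_sources]


# The child captures parent registers 1 and 0, in that order.
child = Function(upvalue_sources=[1, 0])
parent = Function(register_count=2, children=[child])

# The child can be decompiled on its own first; names are attached afterwards.
resolve_upvalue_names(parent, {0: "count", 1: "step"})
```

The point of the sketch is the ordering: nothing in the child's own decompilation depends on the parent, and only the final naming step reaches across functions.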
In compiler theory, a basic block is a linear sequence of instructions that execute one after another, with no branching except possibly at the very end. In other words, it is not legal for an if statement or a conditional branch instruction to be in the middle of a basic block, as that would mean the instructions in the block are not all guaranteed to execute in order. Basic blocks are linked together in a directed graph known as a control flow graph (or CFG). If the last instruction in a basic block is a branch or conditional branch instruction, the paths that control flow may take are represented as successor basic blocks in the control flow graph. This allows the decompiler to analyze the program in terms of its possible control flow, and this analysis is critical for detecting and recovering higher-level control flow constructs such as if statements, loops, loop breaks, and conditionals.
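The following Python sketch shows one standard way to split a jump-and-label instruction list into basic blocks and compute each block's successors. The tuple-based instruction format (`"op"`, `"label"`, `"jmp"`, `"cjmp"`, `"ret"`) and the `build_cfg` function are invented for this example and are not the decompiler's real IR.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, ...]


def build_cfg(instrs: List[Instr]) -> Tuple[List[List[Instr]], Dict[int, List[int]]]:
    """Split instructions into basic blocks and link them into a CFG."""
    blocks: List[List[Instr]] = []
    current: List[Instr] = []
    for ins in instrs:
        # A label can be a jump target, so it must start a new block.
        if ins[0] == "label" and current:
            blocks.append(current)
            current = []
        current.append(ins)
        # A branch or return must be the last instruction of its block.
        if ins[0] in ("jmp", "cjmp", "ret"):
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)

    # Map each label name to the index of the block that contains it.
    label_to_block = {
        ins[1]: i for i, block in enumerate(blocks) for ins in block if ins[0] == "label"
    }

    succs: Dict[int, List[int]] = {}
    for i, block in enumerate(blocks):
        last = block[-1]
        has_next = i + 1 < len(blocks)
        if last[0] == "jmp":        # unconditional: only the target
            succs[i] = [label_to_block[last[1]]]
        elif last[0] == "cjmp":     # conditional: target plus fall-through
            succs[i] = [label_to_block[last[1]]] + ([i + 1] if has_next else [])
        elif last[0] == "ret":      # function exit: no successors
            succs[i] = []
        else:                       # plain fall-through into the next block
            succs[i] = [i + 1] if has_next else []
    return blocks, succs


# A toy loop: cjmp leaves the loop when its (omitted) condition fails.
program = [
    ("op", "i = 0"),
    ("label", "head"),
    ("cjmp", "exit"),
    ("op", "i = i + 1"),
    ("jmp", "head"),
    ("label", "exit"),
    ("ret",),
]
blocks, succs = build_cfg(program)
```

In the resulting graph, block 2 jumps back to block 1, and that back edge is the kind of pattern a later pass can use to recognise a loop.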