In part 2, we replaced strlen with a closure, and it worked. Then we tried the same trick with a user function and got a segfault. The GDB backtrace pointed at DO_UCALL. We figured out that the fix is a two-file pattern: define and replace in one file, call from another.
But why? Why does the VM care what the function was at compile time? Why can’t it just look it up again at call time? And is there really no way around the two-file requirement?
This article is about understanding the Zend VM’s compilation model, and the two alternative approaches I tried before accepting the two-file pattern.
PHP Compiles First, Executes Second
When PHP encounters a file, it does two things in strict order:
- Compile the entire file to opcodes (bytecode)
- Execute those opcodes top to bottom
This isn’t line-by-line interpretation. The entire file is compiled before the first line runs. This means:
// This entire file is compiled to opcodes first.
// Then execution starts from the top.
function greet(string $name): string {
    return "Hello, $name!";
}
// By the time this line EXECUTES, the compiler has already decided
// HOW to call greet() a few lines below.
\FunSwap\replace("greet", function($name) { return "Yo, $name!"; });
// This call was compiled to DO_UCALL because greet was a user function
// at compile time. Our replacement happened at runtime. Too late.
echo greet("World");
The compiler saw that greet was a user function, so it emitted DO_UCALL for the call. Then execution starts: replace swaps greet for an internal function, and DO_UCALL tries to call that internal function as if it were a user function. Boom.
The Three Call Opcodes
The Zend VM has three opcodes for calling functions that matter here:
DO_UCALL: “I know this is a user function.” The handler calls i_init_func_execute_data which sets up compiled variables, initializes the op_array, and enters the VM loop. If the function is NOT a user function, it reads op_array.last_var from garbage memory. That’s our segfault.
DO_ICALL: “I know this is an internal function.” The handler calls function->internal_function.handler(INTERNAL_FUNCTION_PARAM_PASSTHRU). Simple function pointer dispatch. If the function is NOT internal, it would try to call a handler that doesn’t exist.
DO_FCALL: “I don’t know what this is, figure it out at runtime.” The handler checks function->type and dispatches to either the user function path or the internal function path. This is the safe, generic call opcode. It’s slightly slower because of the type check, but it handles both.
The compiler chooses which opcode to emit:
- If the function is known and it’s a user function → DO_UCALL
- If the function is known and it’s internal → DO_ICALL
- If the function can’t be resolved at compile time → DO_FCALL
“Can’t be resolved” means: the function doesn’t exist in the function table yet. This happens naturally with autoloading, require(), and conditional function definitions.
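To make the compile-time choice and the runtime mismatch concrete, here is a minimal toy model in plain C. It is a sketch, not Zend source: toy_function, choose_call_opcode, run_call and double_it are hypothetical names, and a user function's "bytecode" is reduced to a single addend. The important difference from the real VM is labeled in the comments: the toy reports a mismatch with -1, while the real DO_UCALL and DO_ICALL handlers perform no such check.

```c
#include <stddef.h>

/* Toy stand-ins for the engine's types. All names are hypothetical. */
typedef enum { FN_INTERNAL = 1, FN_USER = 2 } fn_type;
typedef enum { OP_DO_ICALL, OP_DO_UCALL, OP_DO_FCALL } call_opcode;

typedef struct toy_function {
    fn_type type;
    int (*handler)(int arg); /* used by internal functions: a C handler */
    int body_addend;         /* used by user functions: stand-in "bytecode" */
} toy_function;

/* Compile time: choose the call opcode from what is known right now.
 * NULL means the name could not be resolved yet. */
call_opcode choose_call_opcode(const toy_function *known)
{
    if (known == NULL) {
        return OP_DO_FCALL;
    }
    return known->type == FN_USER ? OP_DO_UCALL : OP_DO_ICALL;
}

/* Run time: execute a call using the opcode chosen earlier.
 * Returns 0 and stores the result on success. Returns -1 on a type
 * mismatch. ASSUMPTION: the -1 is only for this sketch; the real
 * specialized handlers do not check and simply misbehave. */
int run_call(call_opcode op, const toy_function *fn, int arg, int *result)
{
    switch (op) {
    case OP_DO_UCALL:
        if (fn->type != FN_USER) return -1;
        *result = arg + fn->body_addend;
        return 0;
    case OP_DO_ICALL:
        if (fn->type != FN_INTERNAL) return -1;
        *result = fn->handler(arg);
        return 0;
    case OP_DO_FCALL:
        /* Generic path: inspect the type, then dispatch. */
        if (fn->type == FN_USER) {
            *result = arg + fn->body_addend;
        } else {
            *result = fn->handler(arg);
        }
        return 0;
    }
    return -1;
}

/* Example internal handler that plays the role of the replacement. */
int double_it(int arg)
{
    return arg * 2;
}
```

Choosing the opcode while the function is still a user function and then running it against an internal replacement hits the mismatch branch. Choosing with NULL, or choosing after the replacement, does not.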
The Two-File Pattern
This is why the two-file pattern works:
setup.php, define and replace:
<?php
function greet(string $name): string {
    return "Hello, $name!";
}
\FunSwap\replace("greet", function($name) {
    return "Yo, $name!";
});
main.php, the calling code:
<?php
require __DIR__ . '/setup.php';
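// When this file is compiled, setup.php has not run yet, so "greet"
// is not in the function table and the compiler emits the generic
// DO_FCALL. By the time this call executes, require has run setup.php
// and the lookup finds our internal wrapper.
echo greet("World") . "\n";
main.php is compiled in full before its first line runs, so at that point greet does not exist yet. The compiler cannot resolve it and emits DO_FCALL. Then execution starts: require compiles and runs setup.php, which defines greet and replaces it with our internal wrapper. When the call finally executes, DO_FCALL looks up greet, finds an internal function, and dispatches to its handler. No segfault.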
$ docker compose run debian php -d extension=modules/funswap.so main.php
Yo, World!
In a real PHP application with Composer autoloading, this is the natural pattern. Your bootstrap file calls \FunSwap\replace() during initialization. All application code is autoloaded later and compiled after the replacements are in place.
“But Can’t We Just…”
After figuring out the two-file requirement, I spent considerable time trying to work around it. Two approaches seemed promising. Neither worked.
Attempt 1: The Observer API
PHP 8.0 introduced the Zend Observer API, a way for extensions to hook into function calls without modifying the function table. You register an observer, and PHP calls your begin_handler before each function executes and your end_handler after.
The idea: instead of replacing the function, register an observer. In begin_handler, run our replacement logic. Skip the original function somehow.
/* The observer begin handler runs BEFORE the function body */
static void my_begin_handler(zend_execute_data *execute_data) {
    /* Run replacement logic here... */
    /* But how do we skip the original function body? */
    /* Idea: jump the instruction pointer to the RETURN opcode */
    execute_data->opline = &execute_data->func->op_array.opcodes[
        execute_data->func->op_array.last - 1];
}
Set execute_data->opline to point at the last opcode (RETURN). The function body gets skipped, right?
Wrong. The Zend VM caches the instruction pointer in a local variable that the compiler keeps in a CPU register. Here’s the relevant code from zend_vm_execute.h:
#define OPLINE opline /* local variable, not EX(opline) */
#define LOAD_OPLINE() opline = EX(opline)
The VM loads opline from execute_data->opline at the start, then uses the local variable for all dispatch. begin_handler runs and modifies execute_data->opline, but the VM’s local opline was already set. After begin_handler returns, the VM continues from its local copy. Our change is silently ignored.
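The stale-copy effect is easy to reproduce outside PHP. The following toy interpreter loop is a sketch that only models the behaviour described above: toy_frame, toy_execute and skip_to_last are hypothetical names, instructions are reduced to a counter, and the reload_after_hook flag exists purely to contrast what I expected with what happens.

```c
#include <stddef.h>

/* Toy execution frame: opline is an index, standing in for EX(opline). */
typedef struct toy_frame {
    size_t opline;
} toy_frame;

typedef void (*begin_hook)(toy_frame *frame, size_t last);

/* Hook that tries to skip the body by pointing the frame at the last op. */
void skip_to_last(toy_frame *frame, size_t last)
{
    frame->opline = last;
}

/* Runs a function of `count` instructions (count >= 1) and returns how
 * many were executed. With reload_after_hook == 0 the loop keeps using
 * its local copy, which is the behaviour the article describes. With a
 * nonzero flag it re-reads the frame, which is what the hook hoped for. */
size_t toy_execute(toy_frame *frame, size_t count, begin_hook hook,
                   int reload_after_hook)
{
    size_t opline = frame->opline; /* like LOAD_OPLINE(): copy to a local */
    size_t executed = 0;

    if (hook != NULL) {
        hook(frame, count - 1);    /* runs after the local copy was taken */
    }
    if (reload_after_hook) {
        opline = frame->opline;
    }

    while (opline < count) {
        executed++;                /* "execute" one instruction */
        opline++;
    }
    frame->opline = opline;
    return executed;
}
```

With a five-instruction body and the flag off, all five instructions run even though the hook moved frame->opline. With the flag on, only the final instruction runs.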
The Observer API is designed for observing: logging, tracing, profiling. It was never designed to intercept or skip function execution. There’s no mechanism to tell the VM “don’t run this function.”
Attempt 2: Modifying the Op Array
What if, instead of skipping execution, we replace the original function’s opcodes with a minimal “just return” sequence?
The idea: when replace() is called, patch the original function’s op_array->opcodes to be a single RETURN opcode. The function stays as ZEND_USER_FUNCTION (so DO_UCALL is happy), but its body does nothing.
The problems pile up fast:
- opcache may have cached and optimized the original opcodes. Patching them patches a shared, read-only copy. Or worse, opcache may have a different copy that doesn’t get patched.
- The RETURN opcode reads from a specific variable slot (op1). If we construct a RETURN opcode, we need to set up the return value slot correctly, which depends on the function’s variable layout.
- Other extensions (xdebug, profilers) may have cached pointers into the original opcode array for breakpoints and line mapping.
- Memory management: the op_array’s opcodes are allocated as part of a larger block. We can’t just efree and replace them.
I got as far as writing the code before deciding this was a rabbit hole. The function table replacement approach, with the two-file constraint, is clean and predictable. The op_array patching approach is fragile and fights against opcache, xdebug, and the VM’s assumptions.
One More Trap: Runtime Cache Slots
Even when the compiler emits DO_FCALL (the “safe” opcode that handles both types), there’s another layer of caching to be aware of.
The VM doesn’t want to do a zend_hash_find on the function table every time a function is called. Each DO_FCALL opcode has a runtime cache slot: a pointer-sized slot associated with that specific call site. The first time the opcode executes, it resolves the function via the hash table and stores the zend_function* pointer in the cache slot. Every subsequent execution of that same opcode skips the hash table entirely and uses the cached pointer.
This means: if you call a function, then replace() it, then call it again from the same call site (e.g., inside a loop), the second call might still invoke the original function. The opcode already cached the pointer from the first call.
require 'setup.php'; // defines greet(), does NOT replace yet
for ($i = 0; $i < 3; $i++) {
    if ($i === 1) {
        \FunSwap\replace("greet", fn($name) => "Yo, $name!");
    }
    echo greet("World") . "\n";
}
// Iteration 0: "Hello, World!" — first call, caches the original
// Iteration 1: replace() runs, then... "Hello, World!" — cached!
// Iteration 2: "Hello, World!" — still cached
The replacement changed the function table, but the call site’s runtime cache still points to the original. This cache is per-request and per-opcode. It resets on the next request, but within a single request, once cached, it sticks.
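The same per-call-site caching can be sketched in a few lines of C. This is a toy model under stated assumptions, not the engine's data layout: toy_table is a one-entry stand-in for the function table, toy_call_site carries a single cache slot, and every name here is hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef const char *(*toy_fn)(void);

/* A one-entry "function table", enough for the illustration. */
typedef struct toy_table {
    const char *name;
    toy_fn fn;
    unsigned lookups;      /* counts slow-path lookups */
} toy_table;

/* One call site with its runtime cache slot, NULL until first use. */
typedef struct toy_call_site {
    const char *callee;
    toy_fn cache_slot;
} toy_call_site;

/* Slow path: resolve a name through the table. */
toy_fn toy_lookup(toy_table *t, const char *name)
{
    t->lookups++;
    return strcmp(t->name, name) == 0 ? t->fn : NULL;
}

/* Replacing only rewrites the table. No call site is notified. */
void toy_replace(toy_table *t, toy_fn replacement)
{
    t->fn = replacement;
}

/* Fast path: use the cached pointer if this site has resolved before. */
const char *toy_call(toy_table *t, toy_call_site *site)
{
    if (site->cache_slot == NULL) {
        site->cache_slot = toy_lookup(t, site->callee);
        if (site->cache_slot == NULL) {
            return NULL;   /* unknown function */
        }
    }
    return site->cache_slot();
}

const char *greet_original(void)    { return "Hello"; }
const char *greet_replacement(void) { return "Yo"; }
```

A call site that ran once before toy_replace keeps returning "Hello", while a call site that first runs afterwards gets "Yo". That is the loop example in miniature, and it is why the replacement has to happen before the first call.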
The takeaway: replace before the first call, not between calls. This reinforces the bootstrap pattern. Do all your replacements during initialization, before any application code runs.
Accepting the Constraint
The two-file pattern isn’t a hack. It’s how PHP is designed to work. PHP compiles one file at a time. If you want runtime changes to affect compilation, those changes need to happen before compilation starts.
This maps naturally to:
- Composer autoloading: classes and functions are loaded on demand, after bootstrap
- Bootstrap files: vendor/autoload.php, config/bootstrap.php, the place for replace() calls
- Middleware registration: frameworks like Laravel and Symfony register middleware during bootstrap, before the request handler compiles
The only scenario where the two-file pattern is annoying is single-file test scripts. For tests, we use a helper pattern:
/* test file */
require __DIR__ . '/setup.php';   // defines + replaces
echo greet("World") . "\n";       // resolved at runtime, after the replacement
Or inline with file_put_contents + require:
$code = '<?php echo greet("World") . "\n";';
$tmp = tempnam(sys_get_temp_dir(), 'test');
file_put_contents($tmp, $code);
require $tmp;
unlink($tmp);
To be continued …
In the next article, we’ll clean up the mess: replace the global hack with a proper registry, add per-request lifecycle management, and write real .phpt tests. The extension will finally be something you could actually ship.