A 🇫🇷 French version of this article is also available.
For the past few months, I’ve been working on AstMetrics, a tool for analyzing the source code of software projects at scale, regardless of the programming language.
The idea is simple: instead of limiting analysis to line counts or superficial static rules, AstMetrics relies directly on the AST (Abstract Syntax Tree), i.e. the structured representation of code as understood by the compiler.
With an AST, you can measure much more than surface-level metrics: complexity, nesting depth, number of branches, dependencies between logical units, and so on. You can also compare metrics between project versions and detect trends.
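To make this concrete, here is a minimal sketch that uses Go’s own standard-library AST (`go/parser`, `go/ast`) as a stand-in, since it needs no external parser. For each function it computes two of the metrics mentioned above: cyclomatic complexity, using the common approximation "1 plus the number of decision points", and the maximum nesting depth of `{ }` blocks. It only illustrates the idea and is not AstMetrics’ implementation; `analyze`, `FuncMetrics` and `demoSrc` are hypothetical names.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// FuncMetrics holds a few AST-based measurements for one function.
type FuncMetrics struct {
	Name       string
	Cyclomatic int // 1 + number of decision points
	MaxDepth   int // deepest nesting of { } blocks below the function body
}

// cyclomatic counts decision points: if, for, range, non-default cases, && and ||.
func cyclomatic(fn *ast.FuncDecl) int {
	c := 1
	ast.Inspect(fn, func(n ast.Node) bool {
		switch x := n.(type) {
		case *ast.IfStmt, *ast.ForStmt, *ast.RangeStmt:
			c++
		case *ast.CaseClause:
			if x.List != nil { // a nil List means "default:"
				c++
			}
		case *ast.CommClause:
			if x.Comm != nil { // a nil Comm means "default:" in a select
				c++
			}
		case *ast.BinaryExpr:
			if x.Op == token.LAND || x.Op == token.LOR {
				c++
			}
		}
		return true
	})
	return c
}

// maxBlockDepth returns how deeply { } blocks nest inside n (n itself excluded).
func maxBlockDepth(n ast.Node) int {
	deepest := 0
	ast.Inspect(n, func(c ast.Node) bool {
		if c == n {
			return true
		}
		if b, ok := c.(*ast.BlockStmt); ok {
			if d := 1 + maxBlockDepth(b); d > deepest {
				deepest = d
			}
			return false // b's subtree was handled by the recursive call
		}
		return true
	})
	return deepest
}

// analyze parses Go source and returns metrics for every function with a body.
func analyze(src string) ([]FuncMetrics, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var out []FuncMetrics
	for _, d := range file.Decls {
		if fn, ok := d.(*ast.FuncDecl); ok && fn.Body != nil {
			out = append(out, FuncMetrics{fn.Name.Name, cyclomatic(fn), maxBlockDepth(fn.Body)})
		}
	}
	return out, nil
}

const demoSrc = `package demo

func gcd(a, b int) int {
	for b != 0 {
		if a > b {
			a -= b
		} else {
			b -= a
		}
	}
	return a
}
`

func main() {
	metrics, err := analyze(demoSrc)
	if err != nil {
		panic(err)
	}
	for _, m := range metrics {
		fmt.Printf("%s: cyclomatic=%d depth=%d\n", m.Name, m.Cyclomatic, m.MaxDepth)
	}
}
```

For `gcd`, the loop and the `if` give a cyclomatic complexity of 3, and the `if`/`else` blocks nested inside the loop body give a depth of 2. None of this is visible to a line counter.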
From the beginning, AstMetrics was designed to be language-agnostic. Nothing prevents it from analyzing PHP, JavaScript, Python, or Go: as long as I can obtain an AST in a stable format (JSON, for instance), I can build metrics on top of it. This is one of the main differences from PhpMetrics, which is PHP-only, and one of the reasons I started AstMetrics.
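A sketch of why a stable JSON format is enough: once every front end emits the same node shape, a single metric function works for all languages. The schema below (a `kind` string plus a `children` array) and the shared kind names (`If`, `While`, `For`) are assumptions for illustration, not AstMetrics’ actual format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Node is a hypothetical language-neutral AST node as it might appear in JSON.
type Node struct {
	Kind     string  `json:"kind"`
	Children []*Node `json:"children,omitempty"`
}

// Stats are metrics computed without knowing which language produced the tree.
type Stats struct {
	Nodes    int
	Depth    int
	Branches int
}

// branchKinds lists node kinds counted as branches. These shared names are
// assumed; real front ends would need their kinds mapped onto them.
var branchKinds = map[string]bool{"If": true, "While": true, "For": true}

// collect walks the tree recursively and accumulates stats.
func collect(n *Node, depth int, s *Stats) {
	if n == nil {
		return
	}
	s.Nodes++
	if depth > s.Depth {
		s.Depth = depth
	}
	if branchKinds[n.Kind] {
		s.Branches++
	}
	for _, c := range n.Children {
		collect(c, depth+1, s)
	}
}

// Analyze decodes a JSON AST and returns its stats.
func Analyze(data []byte) (Stats, error) {
	var root Node
	if err := json.Unmarshal(data, &root); err != nil {
		return Stats{}, err
	}
	var s Stats
	collect(&root, 1, &s)
	return s, nil
}

func main() {
	// The same function handles trees from two hypothetical front ends.
	php := []byte(`{"kind":"Program","children":[{"kind":"If","children":[{"kind":"Echo"}]}]}`)
	py := []byte(`{"kind":"Module","children":[{"kind":"While","children":[{"kind":"If"}]}]}`)
	for _, src := range [][]byte{php, py} {
		s, err := Analyze(src)
		if err != nil {
			panic(err)
		}
		fmt.Printf("nodes=%d depth=%d branches=%d\n", s.Nodes, s.Depth, s.Branches)
	}
}
```

The metric code never mentions a language. The only per-language work is producing the JSON and agreeing on what each kind means.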
It’s in this context that Go-PHP-Parser was born.
To extract the AST of a language, there are two main approaches: write your own parser from a grammar of the language, or reuse the language’s official parser.
At first, I explored the first option, which seemed more interesting.
The Lex+Yacc combo is classic: it was used to build parsers for many languages in the 80s–90s. There are modern equivalents in Go, like goyacc.
These are foundational tools, used to generate the parsers behind many programming languages.
So I started writing a PHP grammar for Yacc in Go. Very quickly, I hit the limits:
I tried using AI to generate part of the grammar rules, but the task turned out to be too complex for it. Maybe it will be worth trying again in a few months… I spent hours on it, but for now I’m dropping that path.
By the way, the z7zmey/php-parser project took that approach: it’s a native PHP parser written in Go, based on a hand-written grammar. But it isn’t fully up to date (PHP 8.2), and you can see why: maintaining a PHP grammar by hand in another language is a never-ending job.
Result: I learned a lot, but abandoned the idea.
If you’re interested in the subject, I recommend reading Lex & Yacc, by John Levine, Doug Brown and Tony Mason. It’s dense but really useful, especially if you like regular expressions!
The second option is to avoid reinventing the wheel.
PHP already has its official parser, maintained by the language team. There’s even an extension, ext-ast, which exposes PHP’s internal AST in a stable, versioned form (thanks to Nikita Popov 🙏).
The problem: to use it, you must have the right version of PHP installed, and you also need the ext-ast extension enabled.
This works locally, but not for a generic tool like AstMetrics, which must run on any machine without dependencies.
I tried building a standalone PHP binary to parse the code. It worked, but performance was awful and CPU usage was huge.
The most logical solution (not necessarily the simplest, I admit): switch to C and use the Embed SAPI to call the official parser.
The chosen solution was to embed the PHP engine directly as a C library through the Embed SAPI.
PHP offers several SAPIs (Server APIs). The best known is FPM (FastCGI Process Manager), used to run PHP behind a web server.
The Embed SAPI is an interface that allows you to use the PHP engine as a library inside another C program.
You can initialize the engine, feed it some code, and get the result back.
This SAPI is available in the PHP GitHub repository.
By enabling ext-ast, I can ask PHP not for the execution result, but directly for the AST of the code.
This AST is identical to the one PHP uses internally, so it is always up to date with the language.
An AST is simply a tree representation of your source code. For example, the code:
while b ≠ 0:
    if a > b:
        a := a - b
    else:
        b := b - a
return a
is represented by this tree (Wikipedia illustration):
I wrote a small bridge in C that initializes the embedded PHP engine, passes it the code to parse, and retrieves the AST through ext-ast. This bridge is exposed to Go via cgo. In practice, from Go I can simply call:
ast, err := parser.Parse("<?php echo 1 + 2;")
and I get a JSON structure describing the AST.
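On the consuming side, that JSON can be walked like any other tree. The sketch below assumes a schema modeled on php-ast nodes: a `kind` such as `AST_ECHO` or `AST_BINARY_OP`, and named children that are either nodes or scalars. The exact JSON produced by Go-PHP-Parser may differ; `PHPNode`, `countKind` and `sample` are hypothetical, and `sample` only stands in for the result of `parser.Parse("<?php echo 1 + 2;")`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PHPNode is an assumed JSON shape modeled on php-ast nodes: a kind name
// and named children that may be nested nodes or plain scalars.
type PHPNode struct {
	Kind     string                     `json:"kind"`
	Children map[string]json.RawMessage `json:"children"`
}

// countKind returns how many nodes of the given kind appear in raw.
func countKind(raw json.RawMessage, kind string) int {
	var node PHPNode
	// Scalars (numbers, strings) fail to decode as objects; null has no kind.
	if err := json.Unmarshal(raw, &node); err != nil || node.Kind == "" {
		return 0
	}
	total := 0
	if node.Kind == kind {
		total++
	}
	for _, child := range node.Children {
		total += countKind(child, kind)
	}
	return total
}

// sample is a hypothetical AST for "<?php echo 1 + 2;".
var sample = json.RawMessage(`{"kind":"AST_STMT_LIST","children":{"0":{"kind":"AST_ECHO","children":{"expr":{"kind":"AST_BINARY_OP","children":{"left":1,"right":2}}}}}}`)

func main() {
	fmt.Println("binary ops:", countKind(sample, "AST_BINARY_OP"))
}
```

Keeping children as `json.RawMessage` defers decoding, so scalar leaves like `1` and `2` are simply skipped when looking for nodes.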
To avoid forcing users to compile PHP with the Embed SAPI themselves, the project relies on static-php-cli:
Result: the Go user has nothing to install. Just run:
go get github.com/Halleck45/go-php-parser
In the future, I might drop static-php-cli if I see that the project is no longer maintained, even though it saves a lot of time when compiling PHP.
Here’s an overview of the overall architecture of Go-PHP-Parser:
Why Go? Two main reasons:
Performance: Go compiles to native binaries with no separate runtime to install. It can call C directly through cgo, and goroutines make it efficient to process large volumes of files in parallel. Perfect for scanning entire repositories.
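The parallel part can be sketched with a fixed pool of goroutines pulling file indexes from a channel. `ParseFunc`, `Result` and `parseAll` are hypothetical names, and the parse function is a stand-in for a call into the parser. This sketch says nothing about whether the embedded PHP engine itself can be called from several threads at once; that depends on how PHP was built (thread-safe ZTS or not).

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// Result pairs a file path with its parse outcome.
type Result struct {
	Path  string
	Nodes int
	Err   error
}

// ParseFunc stands in for a real parser call (hypothetical signature).
type ParseFunc func(path string) (int, error)

// parseAll fans paths out to a fixed number of worker goroutines and
// returns results in input order.
func parseAll(paths []string, workers int, parse ParseFunc) []Result {
	if workers < 1 {
		workers = 1
	}
	results := make([]Result, len(paths))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				n, err := parse(paths[i])
				results[i] = Result{paths[i], n, err} // each index is written by exactly one worker
			}
		}()
	}
	for i := range paths {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	paths := []string{"a.php", "b.php", "c.php"}
	// Fake parser: the "node count" is just the path length, for demonstration only.
	fake := func(p string) (int, error) { return len(p), nil }
	for _, r := range parseAll(paths, runtime.NumCPU(), fake) {
		fmt.Println(r.Path, r.Nodes, r.Err)
	}
}
```

Writing each result to its own slice index keeps the output in input order without any extra locking.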
Interoperability: Go is a good language for writing easy-to-use libraries. By providing a Go API, I make integration into AstMetrics trivial.
Approach | Advantage | Drawback |
---|---|---|
Custom parser (Lex/Yacc) | Independent, full control | Huge maintenance, slow to keep up to date |
Existing Go parser (e.g. z7zmey/php-parser) | Native Go, fast | Not up to date, costly to maintain |
Calling a PHP binary | Simple to implement | External process, I/O overhead, requires PHP to be installed |
Embed + ext-ast (current) | Fast, always up to date, reduced maintenance | More complex: requires a C bridge and embedded binaries |
Preliminary benchmarks show that:
… php-ast natively (4,000 to 8,000 files per second on my 16-core, 32 GB RAM PC).
… php process for each file.
Go-PHP-Parser was born to serve AstMetrics, but it can be useful for much more:
Basically, anything that needs fast and reliable access to the PHP AST.
Go-PHP-Parser is not a parser written from scratch, and that’s intentional.
Instead of maintaining a parallel PHP grammar, I chose to rely on the language’s official parser, via the Embed SAPI and ext-ast. This keeps the parser up to date with the language while benefiting from native performance and the simplicity of Go.
Next steps for me: using it in AstMetrics! It’s a lot of work, but little by little it’s moving forward, among my many other projects…
I hope the project will be useful to others! If you’d like to test or contribute, the project is available here: https://github.com/Halleck45/go-php-parser
© Jean-François Lépine, 2010 - 2025