Purpose
- Tune the performance of invoking Lua scripts through the gopher-lua library: decide whether to pool virtual machines and, if so, which pooling strategy to use.
- Provide a performance test report for users (developers).
Test Description
The test cases are concurrent benchmarks built on Go's benchmarking support. To keep the script itself from affecting the results, the script contains only simple logic and is pre-compiled. Each run varies the virtual machine pool strategy, CPU count, and concurrency, and records the average time per Lua script call and the memory used.
Test Script
function helloLua(n)
    goSayHello("hello", "my name is lua") -- Call a Go method
    return n, 100000
end
Benchmark Code
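import (
	"fmt"
	"testing"
	"time"
)

// luaMng manages pre-compiled scripts; NewLStatePool selects the virtual machine pool under test.
var luaMng = NewLuaPreCompileManager(NewLStatePool)

// init compiles the test script once; script is expected to hold its Lua source.
func init() {
	err := luaMng.CompileLua("test.lua", script)
	if err != nil {
		panic(err)
	}
}

// invokeLua calls helloLua from the pre-compiled test.lua and prints both return values.
func invokeLua() {
	result, err := luaMng.InvokeScriptFunc("test.lua", "helloLua", 30*time.Second, 2, 1)
	if err != nil {
		panic(err)
	}
	// The print is part of every measured operation.
	fmt.Println(result[0], result[1])
}

// go test -bench='Parallel$' -cpu=2 -benchtime=5s -count=3 -benchmem
func BenchmarkLuaPreCompileManager_InvokeScriptFunc_Parallel(b *testing.B) {
	b.ReportAllocs()
	b.ResetTimer()
	b.SetParallelism(2000) // RunParallel starts 2000*GOMAXPROCS goroutines
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			invokeLua()
		}
	})
}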
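When reading the parallelism figures, it may help to recall that `b.SetParallelism(p)` makes `RunParallel` use `p*GOMAXPROCS` goroutines, and that `-cpu` sets GOMAXPROCS for the run. A small sketch of that arithmetic follows; the helper name `runParallelGoroutines` is hypothetical and not part of the `testing` package.

```go
package main

import "fmt"

// runParallelGoroutines is a hypothetical helper (not part of the testing
// package). It mirrors the documented rule that RunParallel uses
// p*GOMAXPROCS goroutines after b.SetParallelism(p).
func runParallelGoroutines(p, gomaxprocs int) int {
	return p * gomaxprocs
}

func main() {
	// SetParallelism(2000) with -cpu=2, as in the benchmark above.
	fmt.Println(runParallelGoroutines(2000, 2))
	// To keep the same goroutine count with -cpu=4, p would have to be 1000.
	fmt.Println(runParallelGoroutines(1000, 4))
}
```

Whether the Parallelism column in the report refers to `p` or to the resulting goroutine count is not stated here, so treat this only as a reading aid.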
Virtual Machine Configuration:
return NewLState(lua.Options{
	CallStackSize:       32,   // Maximum call stack depth: at most 32 nested calls
	MinimizeStackMemory: true, // The call stack grows and shrinks automatically as needed, up to `CallStackSize`
})
Report Data
A few concepts:
- Core: Virtual machines that stay resident in memory, i.e., are returned to the pool after use
- Non-core: Virtual machines that are closed after use
- Blocking: Wait until a virtual machine is returned to the pool
- Non-blocking: Create a new virtual machine immediately instead of waiting
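The four concepts above can be combined in a single pool. The following is a minimal sketch, not the implementation benchmarked here: `vm`, `pool`, `newPool`, `get`, and `put` are hypothetical names, and `vm` merely stands in for a gopher-lua state. A buffered channel holds the core virtual machines, and a second channel of tokens caps how many virtual machines (core plus non-core) may be in use at once, so callers block when the cap is reached.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// vm is a hypothetical stand-in for a Lua virtual machine; only creation
// and Close matter for this sketch.
type vm struct{ closed bool }

func (v *vm) Close() { v.closed = true }

// pool keeps up to cap(idle) core VMs resident; tokens bounds the total
// number of VMs (core + non-core) that may be checked out at once.
type pool struct {
	idle   chan *vm      // core VMs waiting for reuse
	tokens chan struct{} // one token per VM currently in use
}

func newPool(core, maxNonCore int) *pool {
	return &pool{
		idle:   make(chan *vm, core),
		tokens: make(chan struct{}, core+maxNonCore),
	}
}

var errTimeout = errors.New("timed out waiting for a virtual machine")

// get blocks until fewer than core+maxNonCore VMs are in use, then reuses
// an idle core VM or creates a new one.
func (p *pool) get(timeout time.Duration) (*vm, error) {
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	select {
	case p.tokens <- struct{}{}:
	case <-timer.C:
		return nil, errTimeout
	}
	select {
	case v := <-p.idle:
		return v, nil
	default:
		return &vm{}, nil
	}
}

// put keeps the VM as a core VM if there is room; otherwise the VM is
// treated as non-core and closed. The token is released afterwards.
func (p *pool) put(v *vm) {
	select {
	case p.idle <- v:
	default:
		v.Close()
	}
	<-p.tokens
}

func main() {
	p := newPool(1, 1)
	a, _ := p.get(time.Second)
	b, _ := p.get(time.Second)
	// Both slots are busy, so this call blocks and then times out.
	_, err := p.get(10 * time.Millisecond)
	fmt.Println("third get failed:", err != nil)
	p.put(a) // kept as the core VM
	p.put(b) // no room left in idle: closed as non-core
	fmt.Println("a closed:", a.closed, "b closed:", b.closed)
}
```

In this sketch, setting `maxNonCore` to 0 gives the fixed core count + blocking wait behaviour, while a very large `maxNonCore` approximates unlimited non-core creation.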
Virtual Machine Pool | CPU Count | Benchmark Duration | Benchmark Runs | Parallelism (Goroutine Count) | Time (ms/op) | Allocated Memory/op | Peak CPU Usage | Peak Memory Usage |
---|---|---|---|---|---|---|---|---|
No Pooling | 2 | 10s | 5 | 1000 | 0.18455 | 159.5KB | 190% | 476.9M |
No Pooling | 2 | 10s | 5 | 2000 | 0.168622 | 159.5KB | 191% | 935.8M |
No Pooling | 2 | 10s | 5 | 4000 | 0.175112 | 159.6KB | 190% | 1.82G |
Pooling, Variable Size | 2 | 10s | 5 | 1000 | 0.065165 | 6.53KB | 44% | 291M |
Pooling, Variable Size | 2 | 10s | 5 | 2000 | 0.073247 | 6.50KB | 50% | 560M |
Pooling, Variable Size | 2 | 10s | 5 | 4000 | 0.077863 | 6.47KB | 52% | 1.08G |
Fixed Core Count 1000+Unlimited Non-core | 2 | 10s | 5 | 4000 | 0.046725 | 7.4KB | 90% | 883M |
Fixed Core Count 2000+Unlimited Non-core | 2 | 10s | 5 | 4000 | 0.045968 | 6.8KB | 66% | 962M |
Fixed Core Count 1000+Blocking Wait | 2 | 10s | 5 | 4000 | 0.048416 | 6.52KB | 70% | 326M |
Fixed Core Count 2000+Blocking Wait | 2 | 10s | 5 | 4000 | 0.04729 | 6.52KB | 72% | 652M |
Fixed Core Count 1000+Blocking Wait | 4 | 10s | 5 | 4000 | 0.046915 | 6.52KB | 100% | 348M |
Fixed Core Count 2000+Blocking Wait | 4 | 10s | 5 | 4000 | 0.047518 | 6.52KB | 102% | 649M |
Fixed Core 1000+Non-core 2000+Blocking Wait | 2 | 10s | 5 | 4000 | 0.048806 | 7.2KB | 84% | 682M |
Report Analysis
In every configuration tested, a single script call takes less than 0.2ms on average.
Comparison between Pooling and Non-Pooling:
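- CPU usage differs sharply: about 190% without pooling versus 44% to 52% with a variable-size pool. Creating virtual machines is very CPU-intensive.
- The average time per script call differs by about 0.1ms (roughly 0.17ms to 0.18ms without pooling versus 0.065ms to 0.078ms with a variable-size pool), so pooling improves performance.
- At the same level of parallelism, total memory usage is significantly lower with pooling (for example, 291M versus 476.9M at a parallelism of 1000).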
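The pooling gain can be quantified from the report table. The sketch below does the arithmetic for the two rows at a parallelism of 4000 (no pooling versus the variable-size pool); the `row` type and `ratios` function are illustrative names only.

```go
package main

import "fmt"

// row holds the report columns used here (field names are illustrative).
type row struct {
	msPerOp  float64 // average time per call, in ms
	kbPerOp  float64 // memory per call, in KB
	peakMemG float64 // peak memory usage, in G
}

// ratios reports how many times larger each figure of a is than that of b.
func ratios(a, b row) (timeX, allocX, peakX float64) {
	return a.msPerOp / b.msPerOp, a.kbPerOp / b.kbPerOp, a.peakMemG / b.peakMemG
}

func main() {
	noPool := row{msPerOp: 0.175112, kbPerOp: 159.6, peakMemG: 1.82}
	varPool := row{msPerOp: 0.077863, kbPerOp: 6.47, peakMemG: 1.08}
	timeX, allocX, peakX := ratios(noPool, varPool)
	fmt.Printf("time x%.1f, memory/op x%.1f, peak memory x%.1f\n", timeX, allocX, peakX)
}
```

With these inputs the variable-size pool comes out roughly 2.2 times faster per call, uses about 25 times less memory per operation, and peaks at about 1.7 times less memory.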
Pooling without Fixed Pool Size:
- At a parallelism of 1000, peak memory usage is 291M, and each doubling of parallelism roughly doubles it (560M at 2000, 1.08G at 4000).
- The higher the level of parallelism, the higher the average time taken.
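If the doubling is treated as a linear relationship, the table yields a rough per-unit memory cost that could feed capacity planning. This is a minimal sketch under that assumption: `sample`, `perUnitMB`, and `maxParallelism` are hypothetical names, the 2048M budget is an arbitrary example, and 1.08G is taken as 1080M.

```go
package main

import "fmt"

// sample is one (parallelism, peak memory in M) pair from the report table.
type sample struct {
	parallelism int
	peakMB      float64
}

// perUnitMB averages peak memory per unit of parallelism across samples.
func perUnitMB(samples []sample) float64 {
	var sum float64
	for _, s := range samples {
		sum += s.peakMB / float64(s.parallelism)
	}
	return sum / float64(len(samples))
}

// maxParallelism estimates the highest parallelism that fits in budgetMB,
// assuming memory grows linearly with parallelism.
func maxParallelism(budgetMB, unitMB float64) int {
	return int(budgetMB / unitMB)
}

func main() {
	// Variable-size pool rows; 1.08G is taken as 1080M (assumption).
	unit := perUnitMB([]sample{{1000, 291}, {2000, 560}, {4000, 1080}})
	fmt.Printf("about %.2fM per unit of parallelism\n", unit)
	fmt.Println("parallelism that fits in 2048M:", maxParallelism(2048, unit))
}
```

The estimate only extrapolates three measurements, so it is a starting point for sizing rather than a guarantee.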
Pooling with Fixed Core Count + Unlimited Non-core Count: At the same level of parallelism, the lower the core count, the higher the CPU usage and the average time taken (90% and 0.046725ms with 1000 core virtual machines versus 66% and 0.045968ms with 2000).
Pooling with Fixed Core Count + Blocking Wait: At the same level of parallelism in the 2-CPU runs, the lower the core count, the slightly higher the average time taken, but the lower the total memory usage (326M with 1000 core virtual machines versus 652M with 2000).
Pooling with Fixed Core Count + Unlimited Non-core Count vs Pooling with Fixed Core Count + Blocking Wait: With the same core count of 2000 and a parallelism of 4000, blocking wait uses less memory than unlimited non-core creation (652M versus 962M), but its average time taken is slightly higher (0.04729ms versus 0.045968ms).
Optimization Plan
- Adopt pooling to reduce CPU consumption and memory usage and to lower the average time taken.
- Limit the maximum size of the pool to prevent OOM during sudden traffic spikes.
- Consider a fixed core count + maximum non-core count + blocking wait strategy, with both counts configured from the average concurrency and the memory available to a single process:
  - Performance is best when the concurrency is below the core count;
  - When the concurrency is within the maximum non-core range, non-core virtual machines reduce blocking waits;
  - When the concurrency exceeds the core count plus the maximum non-core count, blocking wait caps the maximum memory usage.
- Implement an idle check and release mechanism so that the pool does not keep occupying memory after, for example, the script rules have been unloaded.
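One way to picture the last point is a pool that timestamps each virtual machine when it is returned and periodically closes those that have sat unused for too long. The sketch below is an assumption about how such a mechanism could look, not the project's implementation; `vm`, `idleVM`, `idlePool`, `put`, and `releaseIdle` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// vm is a hypothetical stand-in for a Lua virtual machine.
type vm struct{ closed bool }

func (v *vm) Close() { v.closed = true }

// idleVM records when a pooled VM was returned to the pool.
type idleVM struct {
	v        *vm
	returned time.Time
}

// idlePool is a hypothetical pool that can release VMs nobody has used lately.
type idlePool struct {
	mu   sync.Mutex
	idle []idleVM
}

// put stores a VM together with the time it became idle.
func (p *idlePool) put(v *vm, now time.Time) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.idle = append(p.idle, idleVM{v: v, returned: now})
}

// releaseIdle closes every VM that has been idle for longer than maxIdle
// and returns how many were released.
func (p *idlePool) releaseIdle(now time.Time, maxIdle time.Duration) int {
	p.mu.Lock()
	defer p.mu.Unlock()
	kept := make([]idleVM, 0, len(p.idle))
	released := 0
	for _, e := range p.idle {
		if now.Sub(e.returned) > maxIdle {
			e.v.Close()
			released++
			continue
		}
		kept = append(kept, e)
	}
	p.idle = kept
	return released
}

func main() {
	p := &idlePool{}
	start := time.Now()
	old, fresh := &vm{}, &vm{}
	p.put(old, start)
	p.put(fresh, start.Add(50*time.Second))
	// One minute after start, with a 30s idle limit, only the first VM is stale.
	n := p.releaseIdle(start.Add(time.Minute), 30*time.Second)
	fmt.Println(n, old.closed, fresh.closed)
}
```

In a service, `releaseIdle` would typically be called from a goroutine driven by a `time.Ticker`; passing `now` explicitly keeps the function easy to test.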