Towards a Synthetic Benchmark to Assess VM Startup, Warmup, and Cold-Code Performance
Just-in-time compilation is a major success story for the performance of dynamic languages. However, because the work has to be done at run time, JIT compilers spend time and memory on optimizing programs instead of executing them. Historically, this tradeoff was well worth it for long-running server applications. In interactive systems such as browsers, the performance of “cold code” that is executed only a few times is known to be an issue. With the increased velocity of software development cycles, cold code has become an issue for “long-running” server applications as well. When we deploy and restart large applications multiple times a day, fewer and fewer parts get “hot” and optimized, making cold-code and interpreter performance more important again.
Since these issues are particularly pronounced in applications with millions of lines of code, academic researchers can rarely study them. Such large systems have often grown over time and sit at the heart of complex architectures. Even if companies were willing to share such code, it would likely be impractical to run and study.
In this talk, I’ll discuss first steps on how to generate large code bases to investigate VM startup, warmup, and cold-code performance. Simply generating a million lines of code is not going to lead to useful insights. One could generate a single method with arithmetic operations, or perhaps a million lines distributed over a few hundred thousand methods. However, a single class containing all these methods would require method lookup to be optimized for scenarios not seen in “natural” code bases, and is thus not directly useful.
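To make the problem concrete, here is a minimal, hypothetical sketch of such a naive generator: it piles a few hundred thousand tiny arithmetic methods into a single class. This is not the generator discussed in the talk; the names, the Python target syntax, and the method counts are illustrative assumptions only.

```python
# A minimal sketch (NOT the talk's generator) of the naive approach:
# pile a few hundred thousand tiny arithmetic methods into one class.
# All names here (emit_method, naive_generator, ...) are hypothetical.

import random

def emit_method(index: int) -> str:
    """Produce one small method doing a bit of arithmetic."""
    a, b = random.randint(1, 100), random.randint(1, 100)
    return (f"    def m{index}(self, x):\n"
            f"        return (x + {a}) * {b} - x\n")

def naive_generator(num_methods: int = 300_000) -> str:
    """Generate a single class holding every method: roughly a million
    lines, but with a method-lookup profile unlike any hand-written
    code base."""
    lines = ["class Everything:\n"]
    lines.extend(emit_method(i) for i in range(num_methods))
    return "".join(lines)

if __name__ == "__main__":
    print(naive_generator(10))  # print a small sample of the generated code
```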
Based on structural code metrics, we can, however, generate such code in a much more “natural” form. I’ll discuss the basic ideas, the statistics behind it, and how the approach can be used to ensure a code structure similar to what one would find in code written by developers. Since this first step focuses only on code structure, I’ll also discuss what the next steps could be in terms of objects, operation mix, and the various metrics that may be used to characterize a program’s behavior.
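As an illustration of the metrics-driven idea, the following sketch draws the number of methods per class and the number of statements per method from a simple distribution instead of emitting one huge class. The geometric-like distribution and its parameters are assumptions made for this example, not the statistics presented in the talk.

```python
# A hedged sketch of metrics-driven generation: class and method sizes
# are sampled from a distribution rather than fixed. The distributions
# and parameters below are illustrative assumptions, not measured data.

import random

METHODS_PER_CLASS_MEAN = 8      # assumed corpus statistic
STATEMENTS_PER_METHOD_MEAN = 5  # assumed corpus statistic

def sample_geometric(mean: float) -> int:
    """Sample a size >= 1 from a geometric-like distribution with the given mean."""
    p = 1.0 / mean
    n = 1
    while random.random() > p:
        n += 1
    return n

def generate_class(class_id: int) -> str:
    """Emit one class whose shape is drawn from the sampled sizes."""
    methods = sample_geometric(METHODS_PER_CLASS_MEAN)
    body = [f"class C{class_id}:\n"]
    for m in range(methods):
        stmts = sample_geometric(STATEMENTS_PER_METHOD_MEAN)
        body.append(f"    def m{m}(self, x):\n")
        for _ in range(stmts):
            body.append(f"        x = x + {random.randint(1, 9)}\n")
        body.append("        return x\n")
    return "".join(body)

def generate_code_base(target_lines: int) -> str:
    """Emit classes until the generated code reaches the requested size."""
    chunks, lines, cid = [], 0, 0
    while lines < target_lines:
        cls = generate_class(cid)
        chunks.append(cls)
        lines += cls.count("\n")
        cid += 1
    return "\n".join(chunks)

if __name__ == "__main__":
    print(generate_code_base(60))  # tiny sample; scale target_lines up for real use
```

In a real setting, the distribution parameters would be fitted to measurements from existing code bases, so that the generated code’s structural profile resembles what developers actually write.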
However, no matter what fancy metrics we use, the code will be artificial. While generating benchmarks makes it easy to experiment with optimizations, it is unclear whether the results are of use in practice. To start a discussion, I’ll propose an experimental setup to assess whether such a synthetic benchmark gives valuable insights.
Tue 23 Mar (displayed time zone: Belfast)

Session 15:00 - 16:30, MoreVMs
15:00 (30 min talk): The Strange and Wondrous Life of Functions in Ř. Jan Ječmen (FIT CTU Prague), Olivier Flückiger (Northeastern University), Sebastián Krynski (Czech Technical University, National University of Quilmes), Jan Vitek (Northeastern University / Czech Technical University). Media attached.
15:30 (30 min talk): Successes and Challenges in Bringing Performance to Java with Inline Types. Sharon Wang (IBM). Media and file attached.
16:00 (30 min talk): Towards a Synthetic Benchmark to Assess VM Startup, Warmup, and Cold-Code Performance. Stefan Marr (University of Kent). Media attached.