Efficient Remote Profiling for Resource-Constrained Devices

Priya Nagpurkar, Hussam Mousa, Chandra Krintz, and Tim Sherwood

ACM Transactions on Architecture and Code Optimization (TACO)
Vol. 3, Number 1, March, 2006, pages 1-32.


Abstract

The widespread use of ubiquitous, mobile, and continuously connected computing agents has inspired software developers to change the way they test, debug, and optimize software. Users now play an active role in the software evolution cycle by dynamically providing valuable feedback about the execution of a program to developers. Software developers can use this information to isolate bugs in, maintain, and improve the performance of a wide range of diverse and complex embedded-device applications. The collection of such feedback poses a major challenge to systems researchers, since it must be performed without degrading a user's experience with, or consuming the severely restricted resources of, the mobile device. At the same time, the resource constraints of embedded devices prohibit the use of extant software profiling solutions.

To achieve efficient remote profiling of embedded devices, we couple two efficient hardware/software program monitoring techniques: Hybrid Profiling Support (HPS) and Phase-Aware Sampling. HPS efficiently inserts profiling instructions into an executing program using a novel extension to Dynamic Instruction Stream Editing (DISE). Phase-aware sampling exploits the recurring behavior of programs to identify key opportunities during execution at which to collect profile information (i.e., to sample). Our prior work on phase-aware sampling required code duplication to toggle sampling. By guiding low-overhead, hardware-supported sampling according to program phase behavior via HPS, our system is able to collect highly accurate profiles transparently.
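The idea behind phase-aware sampling can be illustrated with a minimal sketch. This is not the implementation evaluated in the paper. It assumes, purely for illustration, that execution is divided into fixed-length intervals, that each interval is summarized by a small signature vector (for example, normalized code-region frequencies), and that two intervals belong to the same phase when the Manhattan distance between their signatures is at most a threshold. The names `manhattan` and `phase_aware_sample` and the threshold value are hypothetical.

```python
from typing import List, Sequence


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan (L1) distance between two interval signatures."""
    return sum(abs(x - y) for x, y in zip(a, b))


def phase_aware_sample(
    intervals: Sequence[Sequence[float]], threshold: float = 0.2
) -> List[int]:
    """Return the indices of the intervals that would be profiled.

    Assumption for this sketch: an interval is profiled only if its
    signature differs from every previously profiled phase
    representative by more than `threshold`. Intervals that repeat a
    known phase are skipped, which is where the overhead is saved.
    """
    representatives: List[Sequence[float]] = []  # one signature per phase seen
    sampled: List[int] = []
    for index, signature in enumerate(intervals):
        is_new_phase = all(
            manhattan(signature, rep) > threshold for rep in representatives
        )
        if is_new_phase:
            representatives.append(signature)
            sampled.append(index)
    return sampled


# Hypothetical signatures: phase A, A again, phase B, A, B.
example_intervals = [
    (0.90, 0.10, 0.00),
    (0.88, 0.12, 0.00),
    (0.10, 0.10, 0.80),
    (0.90, 0.10, 0.00),
    (0.12, 0.08, 0.80),
]
sampled_indices = phase_aware_sample(example_intervals)
```

In this example only the first occurrence of each phase is profiled, whereas periodic or random sampling would select intervals without regard to whether their behavior had already been captured.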

We evaluate our system assuming a general-purpose configuration as well as a popular hand-held device configuration. We measure the accuracy and overhead of our techniques, quantifying the overhead in terms of computation, communication, and power consumption. We compare our system to random and periodic sampling for a number of widely used performance profile types. Our results indicate that our system significantly reduces the overhead of sampling while maintaining high accuracy.