Code Optimization
Code optimization is the process of improving a program's efficiency without changing its functionality. The goals are to:
- Reduce execution time (faster code)
- Reduce memory usage
- Reduce power consumption (especially important in embedded systems)
Optimization matters most in embedded ARM systems, where clock speed, memory, and power are limited.
Profiling
Profiling is the process of measuring how much time (or how many CPU cycles) each part of a program, such as a function or subroutine, consumes during execution.
Goals:
- Identify performance bottlenecks
- Focus optimization effort on the critical sections rather than the entire code base
Tool:
- ARM's older development toolkits provide ARMulator, an ARM instruction set simulator.
- The profiler in ARMulator samples the Program Counter (PC) at regular intervals.
How It Works:
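- Each time the PC is sampled, the profiler looks up which function contains that address.
- Each function has a hit counter, which is incremented whenever a sample falls inside that function.
- A higher hit count means the function accounts for a larger share of execution time, either because it is called often or because each call runs for a long time.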
Note:
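- Too few samples give statistically unreliable results; a short function may be missed entirely.
- Some systems instead sample the PC from a periodic timer interrupt, but the interrupt overhead slows the program being measured and can disturb the timing of real-time systems.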
Cycle Counting
Cycle counting is the process of measuring the number of CPU clock cycles taken to execute a specific block of code or function.
Purpose:
- Benchmark before and after optimization
- Measure the exact cost of each function in clock cycles
How to Perform Cycle Counting in ARM:
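- Older cores such as the ARM7TDMI have no built-in cycle counter. (Many newer cores do, for example the DWT CYCCNT register on Cortex-M3/M4/M7 and the Performance Monitor cycle counter on Cortex-A cores.)
- For cores without a counter, use ARMulator's cycle-counting features instead.
- The simulator can model different ARM cores (such as ARM7 and ARM9) and report cycle counts for each; how accurate they are depends on how faithfully the memory system (wait states, caches) is modeled.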
Optimization Workflow in Embedded ARM Development
| Step | Description |
|---|---|
| 1. Profile | Use a profiler to find which functions take the most time. |
| 2. Analyze | Determine why each hot function is slow (loop overhead, memory access, etc.). |
| 3. Optimize | Use better algorithms, reduce memory accesses, replace multiplies with shifts, etc. |
| 4. Cycle Count | Use a simulator to count cycles before and after optimization. |
| 5. Validate | Confirm that the optimized code still produces correct results and is actually faster. |
Example Case:
🔻 Before Optimization:
```c
int i, result = 0;              /* data[] is assumed to hold 10000 ints */
for (i = 0; i < 10000; i++)
    result += data[i] * 2;
```
🔺 After Optimization (hand-written ARM assembly using the barrel shifter):
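```asm
        LDR     R1, =data            ; R1 = base address of data[] (32-bit elements)
        LDR     R5, =10000           ; loop limit; 10000 is not a valid ARM immediate, so load it from the literal pool
        MOV     R2, #0               ; R2 = i
        MOV     R3, #0               ; R3 = result
loop:
        LDR     R4, [R1, R2, LSL #2] ; R4 = data[i]; barrel shifter scales i by 4
        ADD     R3, R3, R4, LSL #1   ; result += data[i] * 2 (shift replaces multiply)
        ADD     R2, R2, #1           ; i++
        CMP     R2, R5               ; compare i with 10000
        BNE     loop
        LDR     R0, =result          ; 'result' assumed to be defined elsewhere
        STR     R3, [R0]             ; write the final sum back to memory
```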