Code Optimization

Code Optimization is the process of improving the efficiency of a program without changing its functionality. The goal is to:

Reduce execution time (faster code)
Reduce memory usage
Reduce power consumption (especially important in embedded systems)

Optimization is especially critical in embedded ARM systems, where processing speed, memory, and power are all limited.

Profiling

Profiling is the process of measuring how much time, or how many CPU cycles, each part (function or subroutine) of a program takes during execution.

Goal:

Identify performance bottlenecks
Focus optimization on the critical sections only, not on the entire code

Tool:

ARM provides an instruction set simulator called ARMulator (ARM Simulator).
The profiler in ARMulator samples the Program Counter (PC) at regular intervals.

How It Works:

Every time PC is sampled, the profiler checks which function it’s pointing to.
A hit counter is maintained per function.
A higher hit count means the function runs more often or for longer, so it accounts for a larger share of the execution time.
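
The sampling steps above can be sketched in C. This is a minimal illustration, not ARMulator's actual implementation: the func_entry type, the function names, and the address ranges used with it are all hypothetical, and a real profiler would take the ranges from the program's symbol table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol-table entry: one function's address range and hit count. */
typedef struct {
    const char *name;
    uint32_t    start;  /* first address of the function */
    uint32_t    end;    /* one past the last address of the function */
    uint32_t    hits;   /* number of PC samples that landed in this range */
} func_entry;

/* Record one PC sample: find the function containing pc and bump its counter.
 * Returns the index of the matching entry, or -1 if pc is outside every range. */
int record_sample(func_entry *table, size_t count, uint32_t pc)
{
    for (size_t i = 0; i < count; i++) {
        if (pc >= table[i].start && pc < table[i].end) {
            table[i].hits++;
            return (int)i;
        }
    }
    return -1;
}

/* Return the index of the entry with the most hits (the likely bottleneck),
 * or -1 if the table is empty. */
int hottest_function(const func_entry *table, size_t count)
{
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (best < 0 || table[i].hits > table[(size_t)best].hits) {
            best = (int)i;
        }
    }
    return best;
}
```

Calling record_sample once per sampling interval and hottest_function at the end gives the function to look at first. A linear search is used here for clarity; with many functions a sorted table and a binary search would be the more usual choice.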

Note:

Too few samples = inaccurate results
Some systems use timer interrupts to sample the PC, but the interrupt overhead can slow down a real-time system and disturb its timing.

Cycle Counting

Cycle counting is the process of measuring the number of CPU clock cycles taken to execute a specific block of code or function.

Purpose:

Benchmark before and after optimization
Measure exact time consumption per function

How to Perform Cycle Counting in ARM:

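Older ARM cores such as ARM7 and ARM9 typically don’t have a built-in cycle counter. (Some newer cores do, for example the optional DWT cycle counter on Cortex-M3 and later Cortex-M cores.)

For cores without one, use a simulator such as ARMulator, which has cycle-counting features.
You can simulate different ARM cores (like ARM7, ARM9) and get cycle counts for each.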
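
Once the simulator has reported cycle counts, turning them into a time and a before/after comparison is simple arithmetic. The helpers below are a small sketch of that step; the function names are made up for illustration, and the clock frequency has to come from the target's actual configuration.

```c
#include <stdint.h>

/* Convert a cycle count to microseconds for a core clock given in Hz. */
double cycles_to_us(uint64_t cycles, double clock_hz)
{
    return (double)cycles * 1e6 / clock_hz;
}

/* Percentage of cycles saved by the optimized version relative to the
 * baseline. Returns 0 if the baseline count is 0, to avoid dividing by zero. */
double percent_saved(uint64_t before_cycles, uint64_t after_cycles)
{
    if (before_cycles == 0) {
        return 0.0;
    }
    return 100.0 * ((double)before_cycles - (double)after_cycles)
                 / (double)before_cycles;
}
```

With purely illustrative figures: 50,000 cycles at a 50 MHz clock is 1000 microseconds, and going from 80,000 cycles to 60,000 cycles is a 25% saving.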

Optimization Workflow in Embedded ARM Development

Step	Description
1. Profile	Use a profiler to find which functions take the most time.
2. Analyze	Understand why the function is slow (loop, memory access, etc.)
3. Optimize	Use better algorithms, reduce memory access, use shift instead of multiply, etc.
4. Cycle Count	Use a simulator to count cycles before and after optimization.
5. Validate	Ensure optimized code works correctly and efficiently.

Example Case:

🔻 Before Optimization:

for (i = 0; i < 10000; i++)
    result += data[i] * 2;

🔺 After Optimization (using barrel shifter in ARM):

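LDR R1, =data              ; R1 = base address of data[]
LDR R5, =10000             ; R5 = loop limit (10000 is not a valid ARM immediate)
MOV R2, #0                 ; R2 = index i
MOV R3, #0                 ; R3 = result

loop:
LDR R4, [R1, R2, LSL #2]   ; R4 = data[i] (byte offset = i * 4)
ADD R3, R3, R4, LSL #1     ; result += data[i] * 2, using the barrel shifter
ADD R2, R2, #1             ; i++
CMP R2, R5                 ; reached 10000?
BNE loop                   ; if not, repeat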
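
Step 5 of the workflow (Validate) can be sketched for this example as a check that the optimized routine returns the same result as the original. The code below is an illustration under stated assumptions: the function names are hypothetical, the optimized version is written in C rather than assembly so that it can run on any host, and unsigned 32-bit arithmetic is assumed so that overflow wraps in the same way in both versions.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference version: the straightforward loop with a multiply. */
uint32_t sum_doubled_ref(const uint32_t *data, size_t n)
{
    uint32_t result = 0;
    for (size_t i = 0; i < n; i++) {
        result += data[i] * 2u;
    }
    return result;
}

/* Optimized candidate: a shift instead of a multiply, walking a pointer. */
uint32_t sum_doubled_opt(const uint32_t *data, size_t n)
{
    uint32_t result = 0;
    while (n--) {
        result += *data++ << 1;  /* data[i] * 2 as a left shift by one */
    }
    return result;
}

/* Validation: returns 1 if both versions agree on the given input, else 0. */
int validate(const uint32_t *data, size_t n)
{
    return sum_doubled_ref(data, n) == sum_doubled_opt(data, n);
}
```

Running validate over a range of inputs, including empty arrays and values large enough to overflow, gives some confidence that the optimization did not change behavior. Note that an optimizing compiler will usually turn a multiply by 2 into a shift or an add by itself, so a change like this should be confirmed with cycle counts before it is kept.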
