RZ/T2M Code Snippets
PMU Cycle Measurement (IAR)
Measuring execution times with peripheral counters requires some effort (configuration in Smart Configurator) and is error-prone if the wrong dividers are applied or assumed.
Furthermore, peripheral counters are not very accurate when measuring short runtimes. It is more convenient to use the Performance Monitor Unit (PMU) of the CR52.
The PMU register description of CR52 is available here:
https://developer.arm.com/documentation/100026/0100/performance-monitor-unit/pmu-register-summary?lang=en
Enable PMU:
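// PMUSERENR.EN (bit 0) = 1: allow PMU access from user mode
asm volatile ("mcr p15, 0, %0, c9, c14, 0" :: "r"(1));
// PMCR.E (bit 0) = 1: enable all counters
asm volatile ("mcr p15, 0, %0, c9, c12, 0" :: "r"(1));
// PMCNTENSET.C (bit 31) = 1: enable the cycle counter (unsigned constant avoids shifting into the sign bit)
asm volatile ("mcr p15, 0, %0, c9, c12, 1" :: "r"(1u << 31));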
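Writing the constant 1 to PMCR also clears every other control bit in that register. A minimal sketch of a read-modify-write variant follows, assuming the AArch32 PMCR layout (E = bit 0, C = bit 2, D = bit 3). The macro names and the helpers pmcr_start_value and pmu_start_cycle_counter are hypothetical and not part of any vendor header. The value computation is kept as a pure function so it can be checked on a host; the register access is only compiled for Arm targets.

```c
#include <stdint.h>

/* PMCR bit masks (illustrative names, assumed AArch32 PMU layout). */
#define PMCR_E (1u << 0)  /* enable all counters */
#define PMCR_C (1u << 2)  /* write 1 to reset the cycle counter */
#define PMCR_D (1u << 3)  /* 1 = cycle counter counts every 64th cycle */

/* Hypothetical helper: compute a PMCR value that enables counting,
 * resets PMCCNTR and selects "count every cycle", while leaving all
 * other bits of the old value unchanged. */
static uint32_t pmcr_start_value(uint32_t old_pmcr)
{
    return (old_pmcr | PMCR_E | PMCR_C) & ~PMCR_D;
}

#if defined(__arm__) || defined(__ICCARM__)
/* Target-only part: read-modify-write PMCR, then enable the cycle counter. */
static inline void pmu_start_cycle_counter(void)
{
    uint32_t pmcr;
    __asm volatile ("mrc p15, 0, %0, c9, c12, 0" : "=r"(pmcr));
    __asm volatile ("mcr p15, 0, %0, c9, c12, 0" :: "r"(pmcr_start_value(pmcr)));
    /* PMCNTENSET.C (bit 31) = 1 */
    __asm volatile ("mcr p15, 0, %0, c9, c12, 1" :: "r"(1u << 31));
}
#endif
```

Clearing PMCR.D here guards against a leftover divide-by-64 setting, which would otherwise make the cycle counts appear 64 times too small.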
Cycle measurement:
uint32_t cycle_start_count, cycle_end_count, resultCycles;
asm volatile ("MRC p15, 0, %0, C9, C13, 0" : "=r"(cycle_start_count)); // read PMCCNTR
do_my_benchmark();
asm volatile ("MRC p15, 0, %0, C9, C13, 0" : "=r"(cycle_end_count)); // read PMCCNTR
resultCycles = cycle_end_count - cycle_start_count;
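The subtraction above can be wrapped in small helpers that also convert the result into time. The following is a sketch only: the function names are hypothetical, and the core clock frequency is an assumption that the caller must supply to match the actual clock configuration.

```c
#include <stdint.h>

/* Elapsed cycles between two 32-bit PMCCNTR samples. Unsigned arithmetic
 * wraps modulo 2^32, so a single counter overflow between the two
 * samples still yields the correct difference. */
static uint32_t pmu_elapsed_cycles(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a cycle count to nanoseconds (rounded down).
 * cpu_hz is the core clock in Hz (assumption, must be non-zero). */
static uint64_t pmu_cycles_to_ns(uint32_t cycles, uint32_t cpu_hz)
{
    return ((uint64_t)cycles * 1000000000u) / cpu_hz;
}
```

For example, with an assumed core clock of 800 MHz, pmu_cycles_to_ns(800, 800000000u) returns 1000. At that clock the 32-bit value wraps after roughly 5.4 seconds, so longer measurements need the counter cleared beforehand or a different time base.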
Optionally, clear the counter to avoid overflows:
asm volatile ("MCR p15, 0, %0, C9, C13, 0" :: "r"(0x0));
Comments:
When single-stepping in IAR, Performance Monitoring must be enabled; otherwise the counter reads 0.
Alternatively, without the code above, the counter can be read out manually after enabling it (display with “I-Jet” -> “Performance Monitoring”).
Using breakpoints will show an increased number of cycles (around 40 more), because the debugger needs some cycles to access and stop the PMU counter.
Doing a “clear – read – benchmark – read” sequence adds around 20 cycles to the result.
This can be avoided by adding NOPs between clear and read:
for (uint8_t i = 0; i < 100; i++) __asm volatile ("nop");
The PMU is available on Cortex-R but not on Cortex-M. Cortex-M has a different unit, the DWT (Data Watchpoint and Trace), whose CYCCNT register provides the cycle count.
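As an alternative to the NOP padding for the measurement overhead mentioned above, the overhead can be measured once and subtracted from each result. This technique is not taken from the notes above; it is a sketch under the assumption that the overhead is roughly constant, and all function names are hypothetical. The subtraction helper is plain C, while the counter access is only compiled for Arm targets.

```c
#include <stdint.h>

/* Subtract a previously measured overhead from a raw result. The result
 * is clamped at zero so that jitter in the overhead cannot produce a
 * huge wrapped-around value. */
static uint32_t pmu_net_cycles(uint32_t raw_cycles, uint32_t overhead_cycles)
{
    return (raw_cycles > overhead_cycles) ? (raw_cycles - overhead_cycles) : 0u;
}

#if defined(__arm__) || defined(__ICCARM__)
/* Read the cycle counter PMCCNTR. */
static inline uint32_t pmu_read_cycles(void)
{
    uint32_t value;
    __asm volatile ("mrc p15, 0, %0, c9, c13, 0" : "=r"(value));
    return value;
}

/* Measure an empty region: two back-to-back reads of the counter. */
static inline uint32_t pmu_measure_overhead(void)
{
    uint32_t start = pmu_read_cycles();
    uint32_t end = pmu_read_cycles();
    return end - start;
}
#endif
```

A typical use would be to call pmu_measure_overhead() once after enabling the PMU and then pass its result to pmu_net_cycles() for every measurement. The measured overhead depends on compiler optimization and cache state, so it should be re-measured whenever the build settings change.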