USB Bulk Endpoint Throughput in a USB HS Device
Overview
Although USB 2.0 High-Speed signals at 480 Mbps, the actual data throughput depends on the endpoint type and is reduced by USB protocol overhead. While certain USB endpoint types are allocated guaranteed bandwidth or fixed throughput, bulk endpoints are not. Instead, they are scheduled only after all other endpoint transfers have been serviced, using whatever bus bandwidth remains. This document explains the factors that affect the achievable throughput of bulk endpoints when an RA MCU is used as a USB High-Speed device, and presents the resulting measurements.
About Bulk Endpoints
If the USB HS device is the only one on the bus, with no other USB devices or hubs competing for bandwidth, the following bulk endpoint throughput is typically considered achievable.
| Case | Throughput | Remark |
|---|---|---|
| EHCI specification (Theory) | ~425 Mbps | Up to 13 max-size bulk transactions are permitted per microframe. |
| Practical (Optimistic at best) | ~320 Mbps | Typically, fewer than 10 max-size bulk transactions per microframe are achievable because of scheduling delays and bus idle gaps. |
Although a PC’s USB port may appear to be a direct root port from the host controller, in most cases it is actually connected through an internal USB hub IC, and what we see externally is just one of its downstream ports. Multiple devices, such as webcams, fingerprint readers, and Bluetooth modules, share this hub and compete for bandwidth, often reducing throughput below the Practical (Optimistic at best) value shown in the table above.
Throughput Influencing Factors
USB host scheduling latency
All USB read and write operations are initiated by the USB host, so the throughput of bulk transfers also depends on how the USB host operates.
The measurement results below compare two USB host software stacks: LibUsbDotNet and PyUSB, both of which are based on the WinUSB driver. The results may differ further depending on whether the host software uses a synchronous or an asynchronous API.
USB FSP using DMA or interrupt
As shown in the measurement results below, DMA achieves much higher performance than interrupt mode. However, when using DMA, the USB peripheral's D0FIFO and D1FIFO can each be assigned to only one pipe until the DMA transfer completes. This restriction means that only one bulk endpoint per IN/OUT direction can be used with DMA at a time.
The current USB FSP driver does not support configuring two or more bulk endpoints in the same direction, with one using DMA mode and the other using interrupt mode.
FIFO size allocated for the bulk pipe
Pipes 1 to 5 can be used for bulk endpoints, and their FIFO size (PIPEBUF.BUFSIZE[4:0]) can be configured up to 0x1F (2 Kbytes). When Double Buffer mode (PIPECFG.DBLB) is enabled, this results in a 4 Kbytes FIFO for the pipe. Note that the total FIFO capacity is limited to 8.5 Kbytes and is shared across all pipes.
The measurement results below compare cases where the FIFO size (PIPEBUF.BUFSIZE[4:0]) is set to the default 512 bytes or to 2048 bytes.
User buffer size allocated for the bulk transfer
If the total data to be transferred by bulk is 1 Mbyte and sufficient user memory is available, a 1 Mbyte buffer can be used. If memory limitations force a smaller buffer, throughput drops as the buffer shrinks.
USB device scheduling latency
For example, with a FIFO size of 4 Kbytes and a user buffer size of 16,384 bytes, two kinds of delay can be observed on the USB bus: one each time the 4 Kbytes FIFO becomes full, and another when the 16,384-byte transfer finishes and must be re-armed. The second delay, re-arming the bulk transfer, can vary considerably depending on the RTOS being used, the way its resources are managed, the synchronization method of the data transfer thread, the USB device class stack employed, and so on.
Measurement results
Hardware
EK-RA6M5, 200MHz CPU clock, USB High-speed port
Software
FreeRTOS (v11.1.0) + PVND class stack
e2 studio (2025-07), FSP (6.0.0), GCC (14.3.1)
Continuous Transfer mode (PIPECFG.CNTMD)=ON
Double Buffer mode (PIPECFG.DBLB)=ON
Others
Transfer size=1,000,000 bytes
Direct connection to USB host (no USB hubs or other devices in between)
USB host=Windows PC
Test code used:

```c
uint8_t g_buf[BUF_SIZE];

/* OUT transfer (host -> device) */
R_USB_PipeRead(&g_basic_ctrl, g_buf, BUF_SIZE, bulk_out_pipe);
/* or IN transfer (device -> host) */
R_USB_PipeWrite(&g_basic_ctrl, g_buf, BUF_SIZE, bulk_in_pipe);
```
| Transfer mode | FIFO size (bytes) | Compiler optimization | User buffer size (BUF_SIZE, bytes) | OUT data-only BW, LibUsbDotNet (Mbps) | OUT data-only BW, PyUSB (Mbps) | IN data-only BW, LibUsbDotNet (Mbps) | IN data-only BW, PyUSB (Mbps) |
|---|---|---|---|---|---|---|---|
| Interrupt | 512 | None (-O0) | 8192 | 27 | 29 | 30 | 31 |
| Interrupt | 512 | More (-O2) | 8192 | 64 | 70 | 65 | 68 |
| Interrupt | 512 | More (-O2) | 16384 | 71 | 74 | 67 | 71 |
| Interrupt | 2048 | None (-O0) | 8192 | 34 | 37 | 41 | 42 |
| Interrupt | 2048 | More (-O2) | 8192 | 77 | 87 | 93 | 98 |
| Interrupt | 2048 | More (-O2) | 16384 | 96 | 97 | 98 | 104 |
| DMA | 512 | None (-O0) | 8192 | 110 | 132 | 121 | 139 |
| DMA | 512 | More (-O2) | 8192 | 143 | 162 | 132 | 156 |
| DMA | 512 | More (-O2) | 16384 | 175 | 201 (Note 1) | 148 | 172 |
| DMA | 2048 | None (-O0) | 8192 | 120 | 126 | 138 | 150 |
| DMA | 2048 | More (-O2) | 8192 | 138 (Note 3) | 153 (Note 3) | 155 | 179 |
| DMA | 2048 | More (-O2) | 16384 | 175 | 196 | 176 | 219 (Note 2) |
Note 1. Best OUT (host->device) result, obtained with FIFO size=512, compiler optimization=More (-O2), the larger BUF_SIZE, and the PyUSB host.
Note 2. Best IN (device->host) result, obtained with FIFO size=2048, compiler optimization=More (-O2), the larger BUF_SIZE, and the PyUSB host.
Note 3. In these cases a FIFO size of 2048 bytes results in lower throughput than 512 bytes, as more idle packets were observed on the bus, causing delays.