@@ -7,6 +7,7 @@ metrics like memory bandwidth, latency, and utilization:
 
 * Unified Coherence Fabric (UCF)
 * PCIE
+* PCIE-TGT
 
 PMU Driver
 ----------
@@ -212,6 +213,11 @@ Example usage:
 
   perf stat -a -e nvidia_pcie_pmu_0_rc_4/event=0x4,src_bdf=0x0180,src_bdf_en=0x1/
 
+.. _NVIDIA_T410_PCIE_PMU_RC_Mapping_Section:
+
+Mapping the RC# to lspci segment number
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 Mapping the RC# to lspci segment number can be non-trivial; hence a new NVIDIA
 Designated Vendor Specific Capability (DVSEC) register is added into the PCIE config space
 for each RP. This DVSEC has vendor id "10de" and DVSEC id of "0x4". The DVSEC register
@@ -267,3 +273,74 @@ Example output::
   000d:40:00.0: Bus=40, Segment=0d, RP=01, RC=04, Socket=01
   000d:c0:00.0: Bus=c0, Segment=0d, RP=02, RC=04, Socket=01
   000e:00:00.0: Bus=00, Segment=0e, RP=00, RC=05, Socket=01
+
+PCIE-TGT PMU
+------------
+
+This PMU is located in the SOC fabric connecting the PCIE root complex (RC) and
+the memory subsystem. It monitors traffic targeting PCIE BAR and CXL HDM ranges.
+There is one PCIE-TGT PMU per PCIE RC in the SoC. Each RC in the Tegra410 SoC can
+have up to 16 lanes that can be bifurcated into up to 8 root ports (RP). The PMU
+provides an RP filter to count PCIE BAR traffic to each RP and an address filter
+to count accesses to PCIE BAR or CXL HDM ranges. The details of these filters are
+described in the following sections.
+
+Mapping the RC# to lspci segment number is similar to the PCIE PMU. Please see
+:ref:`NVIDIA_T410_PCIE_PMU_RC_Mapping_Section` for more info.
+
+The events and configuration options of this PMU device are available in sysfs,
+see /sys/bus/event_source/devices/nvidia_pcie_tgt_pmu_<socket-id>_rc_<pcie-rc-id>.
+
+The events in this PMU can be used to measure bandwidth and utilization:
+
+* rd_req: count the number of read requests to PCIE.
+* wr_req: count the number of write requests to PCIE.
+* rd_bytes: count the number of bytes transferred by rd_req.
+* wr_bytes: count the number of bytes transferred by wr_req.
+* cycles: count the clock cycles of the SOC fabric connected to the PCIE interface.
+
+The average bandwidth is calculated as::
+
+    AVG_RD_BANDWIDTH_IN_GBPS = RD_BYTES / ELAPSED_TIME_IN_NS
+    AVG_WR_BANDWIDTH_IN_GBPS = WR_BYTES / ELAPSED_TIME_IN_NS
+
+The average request rate is calculated as::
+
+    AVG_RD_REQUEST_RATE = RD_REQ / CYCLES
+    AVG_WR_REQUEST_RATE = WR_REQ / CYCLES
+
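As a quick sanity check of the formulas above, the following sketch evaluates them with made-up counter readings (all values are hypothetical, not taken from real hardware):

```python
# Hypothetical counter values sampled over a 1-second window.
rd_bytes = 4_000_000_000    # bytes moved by read requests
wr_bytes = 2_000_000_000    # bytes moved by write requests
rd_req = 50_000_000         # read request count
wr_req = 25_000_000         # write request count
cycles = 1_000_000_000      # SOC fabric clock cycles in the window
elapsed_ns = 1_000_000_000  # 1 second expressed in nanoseconds

# Bytes per nanosecond is numerically equal to gigabytes per second.
avg_rd_bw_gbps = rd_bytes / elapsed_ns
avg_wr_bw_gbps = wr_bytes / elapsed_ns

# Requests issued per fabric clock cycle.
avg_rd_rate = rd_req / cycles
avg_wr_rate = wr_req / cycles
```

Note that because the time base is nanoseconds, no extra unit conversion is needed to get GB/s.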
+The PMU events can be filtered based on the destination root port or target
+address range. Filtering based on RP is only available for PCIE BAR traffic.
+The address filter works for both PCIE BAR and CXL HDM ranges. These filters can
+be found in sysfs, see
+/sys/bus/event_source/devices/nvidia_pcie_tgt_pmu_<socket-id>_rc_<pcie-rc-id>/format/.
+
+Destination filter settings:
+
+* dst_rp_mask: bitmask to select the root port(s) to monitor. E.g. "dst_rp_mask=0xFF"
+  corresponds to all root ports (0 to 7) in the PCIE RC. Note that this filter is
+  only available for PCIE BAR traffic.
+* dst_addr_base: BAR or CXL HDM filter base address.
+* dst_addr_mask: BAR or CXL HDM filter address mask.
+* dst_addr_en: enable the BAR or CXL HDM address range filter. If this is set, the
+  address range specified by "dst_addr_base" and "dst_addr_mask" will be used to
+  filter the PCIE BAR and CXL HDM traffic address. The PMU uses the following
+  comparison to determine whether the traffic destination address falls within
+  the filter range::
+
+    (txn's addr & dst_addr_mask) == (dst_addr_base & dst_addr_mask)
+
+If the comparison succeeds, the event will be counted.
+
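The mask comparison above can be illustrated with the range used in the usage examples (base 0x10000, mask 0xFFF00, covering 0x10000 to 0x100FF). The helper names below are hypothetical, and the one-bit-per-root-port reading of dst_rp_mask is an assumption drawn from the "0xFF selects RP 0 to 7" description, not driver code:

```python
def addr_filter_hit(addr, dst_addr_base=0x10000, dst_addr_mask=0xFFF00):
    """Mirror the PMU comparison: mask both sides and test equality."""
    return (addr & dst_addr_mask) == (dst_addr_base & dst_addr_mask)

def rp_selected(rp_index, dst_rp_mask):
    """Assumed semantics: bit N of dst_rp_mask selects root port N."""
    return bool(dst_rp_mask & (1 << rp_index))

# Addresses inside 0x10000..0x100FF pass; neighbors on either side do not.
assert addr_filter_hit(0x10000) and addr_filter_hit(0x100FF)
assert not addr_filter_hit(0x0FFFF) and not addr_filter_hit(0x10100)

# dst_rp_mask=0x3 selects root ports 0 and 1 only.
assert rp_selected(0, 0x3) and rp_selected(1, 0x3)
assert not rp_selected(2, 0x3)
```

Choosing a power-of-two-aligned base and a contiguous mask, as here, makes the filter cover one contiguous window; a mask with holes would match a striped set of addresses instead.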
+If the destination filter is not specified, the RP filter will be configured by
+default to count PCIE BAR traffic to all root ports.
+
+Example usage:
+
+* Count event id 0x0 to root port 0 and 1 of PCIE RC-0 on socket 0::
+
+    perf stat -a -e nvidia_pcie_tgt_pmu_0_rc_0/event=0x0,dst_rp_mask=0x3/
+
+* Count event id 0x1 for accesses to PCIE BAR or CXL HDM address range
+  0x10000 to 0x100FF on socket 0's PCIE RC-1::
+
+    perf stat -a -e nvidia_pcie_tgt_pmu_0_rc_1/event=0x1,dst_addr_base=0x10000,dst_addr_mask=0xFFF00,dst_addr_en=0x1/