Skip to content

empirical Roofline result abnormal #5

@fancyerii

Description

@fancyerii

In my laptop(4cores/8hyperthreads), the L1 speed is about 795GB/s. far slower than the speed of the book.

image

My environment:
Ubuntu 20.04, gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, openmpi 4.0.7(compiled from source and install to /home/lili/soft/localinstall with --prefix)

my configure:

# Linux workstation, MPI and OpenMP (8-core Intel Xeon CPU E5530, 2.40 GHz)

ERT_RESULTS Results.madonna.lbl.gov.01

ERT_DRIVER  driver1
ERT_KERNEL  kernel1

ERT_MPI         True
ERT_MPI_CFLAGS
ERT_MPI_LDFLAGS

ERT_OPENMP         True
ERT_OPENMP_CFLAGS  -fopenmp
ERT_OPENMP_LDFLAGS -fopenmp

ERT_FLOPS   1,2,4,8,16
ERT_ALIGN   32

ERT_CC      /home/lili/soft/localinstall/bin/mpic++
ERT_CFLAGS  -O3 -march=native -msse3 -mavx2 -fpermissive -fstrict-aliasing -ftree-vectorize

ERT_LD      /home/lili/soft/localinstall/bin/mpic++
ERT_LDFLAGS
ERT_LDLIBS

ERT_PRECISION FP64

ERT_RUN     export OMP_NUM_THREADS=ERT_OPENMP_THREADS; /home/lili/soft/localinstall/bin/mpirun -np ERT_MPI_PROCS ERT_CODE

ERT_PROCS_THREADS  1-8
ERT_MPI_PROCS      1,2,4
ERT_OPENMP_THREADS 1,2,4,8

ERT_NUM_EXPERIMENTS 3

ERT_MEMORY_MAX 1073741824

ERT_WORKING_SET_MIN 1

ERT_TRIALS_MIN 1

ERT_GNUPLOT gnuplot
$ lscpu
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
Address sizes:                      39 bits physical, 48 bits virtual
CPU(s):                             8
On-line CPU(s) list:                0-7
Thread(s) per core:                 2
Core(s) per socket:                 4
Socket(s):                          1
NUMA node(s):                       1
Vendor ID:                          GenuineIntel
CPU family:                         6
Model:                              158
Model name:                         Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
Stepping:                           9
CPU MHz:                            1469.562
CPU max MHz:                        3900.0000
CPU min MHz:                        800.0000
BogoMIPS:                           5799.77
Virtualization:                     VT-x
L1d cache:                          128 KiB
L1i cache:                          128 KiB
L2 cache:                           1 MiB
L3 cache:                           8 MiB
NUMA node0 CPU(s):                  0-7
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit:        KVM: Mitigation: Split huge pages
Vulnerability L1tf:                 Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:                  Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown:             Mitigation; PTI
Vulnerability Mmio stale data:      Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed:             Mitigation; IBRS
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; IBRS, IBPB conditional, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:                Mitigation; Microcode
Vulnerability Tsx async abort:      Mitigation; Clear CPU buffers; SMT vulnerable
Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse
                                    2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopolog
                                    y nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xt
                                    pr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3
                                    dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpi
                                    d ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt inte
                                    l_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clea
                                    r flush_l1d arch_capabilities

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions