Benchmark results

Contents
Flame graph of original implementation
Flame graph of CPU implementation
Flame graph of GPU implementation

Random (N, N, N, N) tensor

N = (10, 12, 14, 17, 20, 24, 29)

Plot by number of X elements

Plot by number of B elements