another example (Newton-Raphson zooming for Mandelbrot set):
with 7 threads : 105.26 Watts * 12.2 seconds = 1284.6 Joules
with 1 thread : 54.17 Watts * 52.1 seconds = 2822.3 Joules
sleeping / idle overhead : 26.01 Watts
7 threads minus idle overhead : (105 - 26.01) Watts * 12.2 seconds = 964 Joules
1 thread minus idle overhead : (54.17 - 26.01) Watts * 52.1 seconds = 1467.2 Joules
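if the machine would be on/idle anyway for the same 52.1 seconds : 105.26 Watts * 12.2 seconds + 26.01 Watts * (52.1 - 12.2) seconds = 2321.9 Joules (still less than the 2822.3 Joules of the 1 thread run)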
thus using more threads saves energy even when parallel efficiency is far from perfect (here 7 threads give a speedup of about 4.3x, i.e. roughly 61% efficiency): best to get in and out as quickly as possible, so you can turn the machine off (ideal case) or leave it fully idle (second best)
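the energy accounting above can be redone with a short script. this is only a sketch: the function and variable names are made up for illustration, and the inputs are the rounded figures quoted above, so the results can differ from the quoted totals by a few Joules (e.g. the "minus overhead" line for 7 threads used 105 rather than 105.26 Watts).

```python
# Rounded measurements copied from the notes above.
# All names here are illustrative, not part of any tool.
IDLE_W = 26.01
RUNS = {
    7: (105.26, 12.2),  # threads -> (average watts, seconds)
    1: (54.17, 52.1),
}


def total_energy(threads):
    """Energy in joules drawn during the run: power * time."""
    watts, seconds = RUNS[threads]
    return watts * seconds


def energy_above_idle(threads):
    """Energy attributable to the job itself, with idle power subtracted."""
    watts, seconds = RUNS[threads]
    return (watts - IDLE_W) * seconds


def energy_with_idle_tail(threads, window_seconds):
    """Energy over a fixed window: the run, then idling until the window ends."""
    watts, seconds = RUNS[threads]
    return watts * seconds + IDLE_W * (window_seconds - seconds)


if __name__ == "__main__":
    window = RUNS[1][1]  # compare both runs over the 1 thread duration
    for threads in (7, 1):
        print(
            f"{threads} thread(s): "
            f"total {total_energy(threads):.1f} J, "
            f"above idle {energy_above_idle(threads):.1f} J, "
            f"over {window} s window {energy_with_idle_tail(threads, window):.1f} J"
        )
```

for the 1 thread run the window equals the run time, so its "window" figure is the same as its total; the 7 thread run comes out lower in all three comparisons.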
power consumption doesn't scale linearly with load: a little load adds a lot relative to baseline (1 thread roughly doubles idle power, 54.17 Watts vs 26.01 Watts), but high load doesn't add proportionally more (7 threads only roughly quadruples idle power, 105.26 Watts vs 26.01 Watts)
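to put numbers on the non-linearity, a small sketch that works out the ratios to idle and the extra power per thread. the names are illustrative, the inputs are the rounded figures above, and splitting the 1 to 7 thread increase evenly over 6 threads is a simplifying assumption (only three power levels were measured).

```python
# Rounded measurements copied from the notes above; names are illustrative.
IDLE_W = 26.01
ONE_THREAD_W, ONE_THREAD_S = 54.17, 52.1
SEVEN_THREADS_W, SEVEN_THREADS_S = 105.26, 12.2


def ratio_to_idle(watts):
    """How many times idle power a given load draws."""
    return watts / IDLE_W


def watts_per_added_thread(low_w, high_w, low_threads, high_threads):
    """Average extra power per thread between two measured load levels.

    Assumes the increase is spread evenly, which the measurements
    cannot confirm since no intermediate thread counts were recorded.
    """
    return (high_w - low_w) / (high_threads - low_threads)


def speedup_and_efficiency(serial_s, parallel_s, threads):
    """Speedup over the 1 thread run, and speedup per thread."""
    speedup = serial_s / parallel_s
    return speedup, speedup / threads


if __name__ == "__main__":
    print(f"1 thread : {ratio_to_idle(ONE_THREAD_W):.2f}x idle power")
    print(f"7 threads: {ratio_to_idle(SEVEN_THREADS_W):.2f}x idle power")
    first = watts_per_added_thread(IDLE_W, ONE_THREAD_W, 0, 1)
    rest = watts_per_added_thread(ONE_THREAD_W, SEVEN_THREADS_W, 1, 7)
    print(f"first thread adds {first:.2f} W, each further thread about {rest:.2f} W")
    speedup, eff = speedup_and_efficiency(ONE_THREAD_S, SEVEN_THREADS_S, 7)
    print(f"speedup {speedup:.2f}x, parallel efficiency {eff:.0%}")
```

the first thread costs far more extra power than each later one, which is why finishing sooner with more threads wins even at well under 100% parallel efficiency.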
measured with turbostat on Debian, on an AMD 2700X CPU with the default CPU scaling governor, with the usual browser/email/etc. running too