It's Not Easy Being Green: On the Energy Efficiency of Programming Languages

9 points, posted a year ago
by whereistimbo

12 Comments

emeryberger

a year ago

Hi, co-author here! We're happy to answer questions. See some discussion here:

* Mastodon: https://mastodon.social/@ltratt/113282264909342842

* Lobsters: https://lobste.rs/s/y12hdo/it_s_not_easy_being_green_on_ener...

igouy

a year ago

> Fig. 4. "This simple model captures the relationship implied in Pereira et al., namely that the choice of programming language has a direct impact on total energy consumption."

"Pereira et al." state that "the quality of the compiler, and its (aggressive) optimizations all greatly influence the performance of the resulting programs".

In other words, the relationship implied in "Pereira et al." is that shown in Fig. 5: "We are in fact comparing implementations of programming languages, not the languages themselves."

"Pereira et al." state that "… energy consumption does not depends only on execution time, as shown in the equation Energy = Time × Power."

In other words, the relationship implied in "Pereira et al." is that shown in Fig. 8: "Energy consumption is the product of power and time."
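
To make that concrete with hypothetical numbers: a program that finishes faster can still consume more energy if it draws more power while it runs.

    E = P × t
    Program A: 10 W × 100 s = 1,000 J
    Program B: 25 W × 50 s = 1,250 J

Here the hypothetical Program B is twice as fast yet consumes more energy, which is exactly why execution time alone does not determine the energy ranking.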

"Pereira et al." state that "… a software program may become faster by improving its source code …"

In other words, the relationship implied in "Pereira et al." has elements of Fig. 12: "The final model incorporating application implementations".

The simple model of Fig. 4 is a strawman.

igouy

a year ago

Perhaps clarifications rather than questions.

> "Critique: 2.2.1 Programming Language versus Implementation"

In context, it seems possible to read "Programming Language" as shorthand for "Programming Language Implementation" where appropriate.

(Especially since "Pereira et al." list "Compiler / Interpreter Versions" such as "JRuby : jruby 9.1.7.0" and "Ruby : ruby 2.4.1".)

> "Critique: 2.2.2 Quality of Benchmark Implementations"

Surely the issue is not the quality of the particular programs selected from the benchmarks game and used for comparison by "Pereira et al.", but rather the suitability of the selection process they used to choose programs for their purpose.

That "corpus of small benchmark implementations" most likely provided both parallel and sequential programs, most likely provided both SIMD and non-SIMD programs, etc.

Presumably "Pereira et al." could have chosen only sequential / non-SIMD / standard library programs for their comparison, but did not.
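
One quick way to vet such a selection is to compare a program's CPU time to its wall-clock time: a ratio well above 1 means it ran in parallel across cores. A minimal sketch in Python (Unix-only; ./benchmark is a placeholder for the program under test):

    import resource
    import subprocess
    import time

    # Run the program under test and compare the total CPU time of
    # the child process to elapsed wall-clock time.
    start = time.perf_counter()
    subprocess.run(["./benchmark"], check=True)
    wall = time.perf_counter() - start

    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = usage.ru_utime + usage.ru_stime  # user + system CPU seconds
    print(f"average cores used ≈ {cpu / wall:.1f}")

A result near 1.0 suggests a sequential program; anything much higher means a comparison would be mixing sequential and parallel implementations.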

> "Critique: 2.2.3 Apparent Anomalies."

> "C++ is reported as being 34% less energy efficient and 56% slower than C"

For a single outlier (regex-redux), there's a 12× difference between the measured times of the selected (pcre) C and (boost/regex) C++ programs.

As you say, apparent anomalies presented without investigation or explanation.

> "TypeScript is reported as being 4.8× less energy efficient and 7.1× slower than JavaScript."

It seems that there may have been some kind of problem with tsc back in the day.

The exact same fannkuch-redux program that took 1,234.81 seconds in July 2017 (node.js v8.1.3 and tsc 2.4.1) took only 147.23 seconds in January 2018 (node.js v9.4.0 and tsc 2.6.2).
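
(That works out to roughly an 8.4× speedup from toolchain updates alone: 1,234.81 / 147.23 ≈ 8.4.)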

(Unfortunately the Internet Archive is currently unable to provide details.)

nicovank

a year ago

> Quality of Benchmark Implementations

Correct. "Selection of Benchmark Implementations" is a better name here. We'll update this in the next iteration. The point in this subsection is indeed that the selection is not adequate for comparison. This is not the only issue, even an adequate selection of perfectly idiomatic and identical implementations would not have resulted in accurate comparison.

> C/C++ Outlier

Correct; Section 4.5.2 details this. It is 8.9× for us.

> JS/TS Outlier

The main outlier on our machine is mandelbrot, at 21× (Section 4.5.1). Our second outlier is n-body (not discussed).

igouy

a year ago

> would not have resulted in accurate comparison

Because? Is the reasoning for that spelled out somewhere in the paper?

> Section 4.5.2

> Section 4.5.1

After the paper had discussed "Pereira et al.", I repeatedly confused discussion of your new measurements with discussion of the old "Pereira et al." measurements.

> "forcing benchmarks to run on a single core" p2&3

> "we eliminate the effect of varying concurrency in different benchmark implementations by limiting benchmarks to execute on a single core" p6

> "the JavaScript version uses 28 cores on average" p14

For what it's worth, I am now very confused.

emeryberger

a year ago

We pin to one core only for the one experiment described in Section 4.3. All the remaining experiments run with full access to all cores.
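
For reference, one common way to restrict a process tree to a single core on Linux is sched_setaffinity; a minimal sketch in Python, with ./benchmark as a placeholder (shown for illustration; the exact mechanism may differ):

    import os
    import subprocess

    # Restrict this process, and any children it spawns, to CPU 0.
    # (Linux-only; one common way to force single-core execution,
    # not necessarily the mechanism used in the paper.)
    os.sched_setaffinity(0, {0})
    subprocess.run(["./benchmark"], check=True)

The shell equivalent is taskset -c 0 ./benchmark.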

igouy

a year ago

Thank you.

I'm concerned that Section 2.2.1 is a misreading of Pereira et al.

[29] "… the performance of a language is influenced by the quality of its compiler, virtual machine, garbage collector, available libraries, etc."

In that context it seems plain that "language" must be understood as a shortening of "language implementation."

> "For instance, Pereira et al. treat Ruby and JRuby as different languages, while they are in fact two separate implementations of the same Ruby language."

It seems to me that Pereira et al. treat Ruby and JRuby as different "language implementations" and compare each one independently against the other language implementations.

(In the "corpus of small benchmark implementations" it was simply convenient to keep separate programs for Ruby and JRuby.)

emeryberger

a year ago

Those papers say "language" over and over again, in the titles and in the body of the text. That work confounds languages and their implementations, and makes it sound like there is a one-to-one connection between the two (of course, there is not necessarily such a correspondence).

With respect to Ruby vs. JRuby: my student just checked and verified that some but not all of the benchmarks are implemented differently (k-nucleotide, mandelbrot, pidigits, spectral-norm).

igouy

a year ago

> Those papers say "language" over and over again, in the titles, in the body of the text.

Yes, they do! And over and over again, in context, we sensibly read that to mean what you would more precisely term "language implementation".

> Fig. 4, Fig. 5: "We are in fact comparing implementations of programming languages, not the languages themselves."

They know. They just prefer shorter names.

Here's their lookup table from short names to precise names:

https://sites.google.com/view/energy-efficiency-languages/se...

steveklabnik

a year ago

Thank you so much for doing this work. I am so glad to see this. I don’t care that the older paper made Rust look good: it was terribly flawed in many ways, and I was embarrassed every time someone would bring it up as an example of why Rust was great.