Benchathon Activities
  - No porting was done!
  - Problem investigation
  - Benchmark improvements: validation, workload size, tools
  - Versions: v11 going into the benchathon, v14 going out, v15 with any remaining changes agreed at the meeting

Meeting
  - Reviewed status and problems
  - Discussed analysis data and desired additional data
  - Discussed benchmark scaling: problem size vs. iteration count, and the impacts on memory size and I/O content
  - Benchmarks to be packaged in zip archives to minimize the impact of http connection overhead on initial run time
  - Eliminated benchmark candidates by consensus: 203_linpack, 204_newmst, 207_tsp, 212_gcbench, 223_diners, 225_shock
  - Set schedule:
    - Keep benchmark gate open to 12/31/97
    - Intel expects up to 5 additional candidates, real applications
    - Everyone urged to solicit new candidates
    - Member vote planned in March, release in April
  - Discussed how to group and report benchmarks and composite metrics
  - Discussed other run rule issues
The other problem discussed at length was timing variability: from one run to another, and from execution to execution within an autorun sequence. Most of the problems were observed with V10 and earlier, where a large console buffer occupied memory and caused excessive garbage collection activity. In V11 and later the amount of output sent to the console is greatly reduced; you can direct console output to your Java console (which may be a file or a system console), or discard it entirely. However, there were some indications that timing variability problems remain in some cases with V11, particularly on lower-memory systems.
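The console-output options described above can be approximated outside the harness with standard Java stream redirection. The sketch below is illustrative only; the class and method names (QuietConsole, nullStream) are assumptions and not the harness's actual mechanism.

```java
import java.io.OutputStream;
import java.io.PrintStream;

// Sketch: discarding console output so a growing console buffer does not
// occupy heap space and trigger extra garbage collection during timing.
// Names here (QuietConsole, nullStream) are illustrative only.
public class QuietConsole {
    /** A PrintStream that silently discards everything written to it. */
    static PrintStream nullStream() {
        return new PrintStream(new OutputStream() {
            @Override public void write(int b) { /* discard */ }
        });
    }

    public static void main(String[] args) {
        PrintStream original = System.out;
        System.setOut(nullStream());     // benchmark chatter goes nowhere
        System.out.println("verbose benchmark output");
        System.setOut(original);         // restore for result reporting
    }
}
```

Redirecting to a file instead would use `new PrintStream(new FileOutputStream(...))` in place of the discarding stream.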
Anirudha Rahatekar (Intel) suggested some additional controls and instrumentation around the benchmark executions in order to better control the memory environment and reduce variability. These are noted in the "DEVELOPMENT RELEASES" section below and will be available in V15.
Members agreed to perform some tests of variability and share the results with the group. (If you don't want to release absolute numbers, then you can at least give relative percentages.)
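One way to share relative percentages rather than absolute numbers is to quote the spread of repeated run times relative to their mean. This is a sketch of that calculation; the class and method names are assumptions, not part of the SPEC tools.

```java
// Sketch: expressing run-to-run variability as a relative percentage,
// (max - min) / mean * 100, so members can share spread without
// revealing absolute run times. percentSpread is an illustrative name.
public class Variability {
    static double percentSpread(double[] times) {
        double min = times[0], max = times[0], sum = 0.0;
        for (double t : times) {
            if (t < min) min = t;
            if (t > max) max = t;
            sum += t;
        }
        double mean = sum / times.length;
        return (max - min) / mean * 100.0;
    }

    public static void main(String[] args) {
        double[] runs = {41.2, 40.8, 42.0, 41.5};  // hypothetical timings
        System.out.printf("spread: %.1f%%%n", percentSpread(runs));
    }
}
```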
Intel offered to provide additional profile information. The profiles in this chart were collected with the ordinary JDK profile flag, which requires that you measure with JIT turned off. Michael Greene saw substantial differences on some of the benchmarks depending on whether JIT was enabled or not, so it was considered important to be able to look at both. Walter Bays thought that he might also be able to get some profiles with JIT but wasn't sure.
There was general agreement that real applications are more important than synthetic benchmarks, although it was noted that for commercial applications the source code is typically not available for inspection to see what the program is doing. The situation is more like BAPCo than the traditional SPEC CPU benchmarks, and we need to look at their rationales.
There was no agreement on how benchmarks might be grouped. Many felt that there should be some solid basis on which to group benchmarks based on program characteristics or application area. Some suggestions were application/synthetic, integer/floating point, or some combination of those divisions. How or whether to combine sub-metrics into a composite metric in these cases was discussed with no resolution.
"Early" "Late"
Nov
Close gate continue benchmark search
Analyze
Dec
subcommittee vote
OSSC vote Close gate
Jan begin member vote Analyze
Annual meeting Annual meeting
end member vote
Feb subcommittee vote
release OSSC vote
Mar begin member vote
Apr end member vote
release
In the next month everyone is encouraged to redouble their efforts to acquire additional benchmark candidates, particularly real applications. Benchmarks should be fitted into the SPEC tool harness; Walter has an action item to send out a guide to the steps needed to do this. To give everyone as much time as possible to examine the candidates, and to improve their chances of being accepted into the suite, do not wait until the last day: send information on any prospective candidates as soon as you have it, and get the benchmark out to committee members as soon as possible. Michael Greene now "owns" the benchmark numbers 232 through 236.
We discussed running short versions (e.g. 10%) of the benchmarks for systems without JIT and with small memories, such as embedded systems or low-end NCs. No resolution was reached. It was thought that perhaps some intermediate problem size (e.g. 20%) would be more appropriate. Attention would have to be paid to both memory size and run time. Perhaps one follow-on benchmark could address both embedded systems and low-end NCs.
_202_jess
    Longer 100% workload
_205_raytrace
    Removed spurious output - KMD/NS
_213_javac
    New longer 100% workload - KMD/NS
_214_deltablue
    New shorter 100% workload - KMD/NS
_222_mpegaudio
    Fixed validation problem per subcommittee vote on floating point accuracy
_224_richards
    Restored printout of subunit timings for academic purposes - KMD/NS
_227_mtrt
    Fixed validation problem that depended on thread order; validation is now order-independent
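The thread-order validation fix in _227_mtrt can be illustrated in general terms: combine per-thread results with an order-insensitive operation before validating. This is a sketch of the technique under that assumption, not the benchmark's actual code; the class and method names are invented.

```java
import java.util.Arrays;

// Sketch: making validation independent of thread completion order by
// sorting per-thread checksums into a canonical order before combining
// them. Names are illustrative, not taken from _227_mtrt.
public class OrderFreeValidation {
    /** Combine per-thread checksums so the result is identical
     *  regardless of the order in which threads finished. */
    static String combine(long[] perThreadChecksums) {
        long[] sorted = perThreadChecksums.clone();
        Arrays.sort(sorted);              // canonical order
        return Arrays.toString(sorted);   // stable validation string
    }

    public static void main(String[] args) {
        // Same checksums, different completion orders -> same result.
        System.out.println(combine(new long[]{7L, 3L, 5L}));
        System.out.println(combine(new long[]{5L, 7L, 3L}));
    }
}
```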
Removed the 6 benchmarks eliminated by subcommittee consensus and put them into a "Removed" group. Some of them also received other changes, but those are not particularly important now.
_203_linpack
_204_newmst
    New workload. Some double changed to float - KMD
_207_tsp
_212_gcbench
    New longer 100% workload, sized to still fit in 30MB heap space - KMD/NS
_223_diners
_225_shock
These remain in the "Removed" category in case someone wants to work on revising or combining them to try again for the committee's approval. Even though they were removed, we all owe these benchmark authors a big thank-you for their effort, and for the beneficial effects their benchmarks have already had on JVMs during suite development. Should any author not wish to have his code remain in the "Removed/work-in-progress" category, we will pull it from the next release and all SPEC members will be asked to delete all copies of that benchmark in their possession. I will be contacting the authors on that subject soon. As a corollary, if any of these benchmarks provides you with useful insights into your systems' performance and you would like to retain access to it, then it is in your interest to contact the author and work with her on addressing its shortcomings for the suite. (Note also that some of these are freely available on the net.)
Version 15 should include changes to the tool harness agreed in last week's meeting, primarily aimed at the issue of timing variability.