From: Yi Kong <Yi.Kong <at> arm.com>
Subject: RFC:LNT Improvements
Newsgroups: gmane.comp.compilers.llvm.devel
Date: Tuesday 29th April 2014 22:49:37 UTC
Dear all,

Following the Benchmarking BoF at the 2013 US dev meeting, I’d like to
propose some improvements to the LNT performance tracking software.

The most significant issue with the current implementation is that the
report is filled with extremely noisy values, which makes it hard to notice
performance improvements or regressions.

After investigating LNT and the LLVM test suite, I propose the following
methods. I've also attached prototype patches for each method.
- Increase the execution time of each benchmark so it runs long enough to
avoid noisy results
        Currently there are two options for running benchmarks, namely small
and large problem size. I propose adding a third option: adaptive. In
adaptive mode, benchmarks scale the problem size according to a pre-measured
system performance value so that the running time is kept at around 10
seconds, the sweet spot between time and accuracy. The downside is that
correctness cannot be checked for some benchmarks. The solution is to check
correctness on a separate board with the small problem size.
        LNT: [PATCH 2/3] Add options to run test-suite in adaptive mode
        Test suite: [PATCH 1/2] Add support for adaptive problem size
                        [PATCH 2/2] A subset of test suite programs
modified for adaptive
- Show and graph total compile time
        There is no obvious way to scale up the compile time of individual
benchmarks, so reporting the total time is the best we can do to minimize
error.
        LNT: [PATCH 1/3] Add Total to run view and graph plot
- Only show performance changes with high confidence in the summary report
        To investigate the correlation between a program's run time and its
variance, I ran Dhrystone with different problem sizes multiple times. The
results show that some fluctuation is expected and that shorter tests have
much greater variance. By modelling the run time as normally distributed, we
can calculate the minimum difference required for statistical significance.
Using this, we can hide results with a low confidence level from the summary
report. They are still available, marked in colour, in the detailed report
for anyone interested.
        LNT: [PATCH 3/3] Ignore tests with very short run time
- Make sure the board has low background noise
        Perform a system performance benchmark before each run and compare
the value with the reference (obtained during machine set-up). If the
percentage difference is too large, abort or defer the run. In the prototype
this feature is implemented as a Bash script and is not integrated into LNT.
I will rewrite it in Python.
        LNT: benchmark.sh
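
To make the adaptive-size and background-noise items above more concrete,
here is a minimal Python sketch. It is not taken from the attached patches
or from benchmark.sh. The function names, the 5% threshold, and the
assumption that run time grows linearly with problem size are all
illustrative; only the 10-second target and the "percentage difference"
check come from the proposal.

```python
TARGET_SECONDS = 10.0  # run-time target named in the proposal


def adaptive_problem_size(base_size, base_seconds, target_seconds=TARGET_SECONDS):
    """Scale a problem size so the run takes roughly target_seconds.

    base_seconds is the pre-measured run time of base_size on this board.
    Assumes run time is linear in problem size (hypothetical; many
    benchmarks will need a per-benchmark scaling rule instead).
    """
    if base_size <= 0 or base_seconds <= 0:
        raise ValueError("base_size and base_seconds must be positive")
    return max(1, round(base_size * target_seconds / base_seconds))


def percent_difference(measured, reference):
    """Absolute difference between two scores, as a percentage of reference."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return abs(measured - reference) * 100.0 / reference


def board_is_quiet(measured, reference, max_percent=5.0):
    """True if the pre-run system benchmark is close enough to the reference.

    max_percent is an illustrative threshold, not a value from the proposal.
    A caller would abort or defer the run when this returns False.
    """
    return percent_difference(measured, reference) <= max_percent
```

For example, a benchmark that takes 0.5 seconds at size 1000 would be run at
size 20000 under the linear assumption, and a pre-run score of 110 against a
reference of 100 would cause the run to be deferred.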

In my prototype implementation, the summary report becomes much more
useful. There are almost no noisy readings, while small regressions are
still detectable for long-running benchmark programs. The implementation is
backwards compatible with older databases.
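
As an illustration of the "minimum difference for statistical significance"
idea, the following Python sketch computes the threshold for a two-sided
test under the normal model. It is a sketch under stated assumptions, not
the logic of PATCH 3/3: it assumes both runs share a known standard
deviation sigma and the same number of samples, and the function names and
the 95% default are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def min_significant_difference(sigma, n_samples=1, confidence=0.95):
    """Smallest difference of mean run times that counts as significant.

    Assumes run times are normally distributed with standard deviation
    sigma and that each run contributes n_samples measurements, so the
    difference of the two means has standard deviation sigma * sqrt(2 / n).
    """
    if sigma < 0 or n_samples < 1 or not 0.0 < confidence < 1.0:
        raise ValueError("invalid sigma, n_samples or confidence")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return z * sigma * sqrt(2.0 / n_samples)


def is_significant(old_time, new_time, sigma, n_samples=1, confidence=0.95):
    """True if the change between two mean run times exceeds the threshold."""
    threshold = min_significant_difference(sigma, n_samples, confidence)
    return abs(new_time - old_time) > threshold
```

With sigma = 0.1 s and two samples per run, the threshold is about 0.196 s,
so a change from 10.0 s to 10.1 s would be hidden from the summary while a
change to 10.5 s would be shown. Because shorter tests have larger relative
variance, their thresholds are proportionally larger.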

Screenshots from a sample run are attached.

Thanks for reading!
