Subject: Ottawa Linux Power Management Summit, June 25-26, 2007 - Minutes
Date: Wednesday 5th September 2007 08:26:04 UTC

A Linux Power Management "mini-summit" was held in Ottawa on June 25 and 26, 2007, immediately preceding the Ottawa Linux Symposium. An effort was made to follow the best-known-method for a Linux mini-summit, thought to be the most recent storage-summit.

The invitation to the meeting was open -- sent to [email protected] in early May. The focus of the meeting was on technical discussion. Thus, only presentations which supported discussion were encouraged, and the size of the forum was capped at 20. The agenda was set by consensus of the attendees.

Thank you to the Intel Open Source Technology Center for sponsoring the meeting.

Day 1 attendees:

  Len Brown, Intel OTC, Linux Kernel ACPI Maintainer
  Mark Gross, Intel OTC, embedded Linux team
  Paul Mundt, Renesas, Linux Kernel Super-H Maintainer
  Kevin Hilman, MontaVista, MV DPM Maintainer
  Igor Stoppa, Nokia, OSSO Power Management
  Sakari Poussa, Nokia, OSSO Power Management
  Dave Jones, Red Hat, Fedora Maintainer, Linux Kernel Cpufreq Maintainer
  Klaus Pedersen, Nokia, OSSO Power Management
  Ken Rozendal, IBM, Linux on Power
  Vivek Kashyap, IBM LTC
  Adam Belay, Novell/MIT, cpuidle developer
  Eugeny S. Mints, NGS Power Management
  Scott E. Preece, Motorola
  Marcelo Tosatti, Red Hat, One Laptop Per Child

Day 2 additional attendees:

  Tariq Shureih, Intel OTC, MID power policy manager
  Rishi Bhattacharya, Texas Instruments
  Iliasbiris, Instituto de Tecnologia

Notes:

Mark Gross showed off a Classmate PC. The unit he had was a 900MHz Celeron (model 13). Find out more at http://classmatepc.com

Mark led a discussion about constraints/quality of service. An application specifies a QOS/SLA to some middle-ware, which translates that into operation constraints. We discussed the vocabulary for constraints. More on this below.

Igor Stoppa presented findings from the Nokia Tablet team. The OMAP1 used in the n770 had idle/big-sleep/deep-sleep. The OMAP2, used in the n800, is built on 90nm technology.
The OMAP3 is expected to be built on high leakage 65nm technology, and will thus require software to take advantage of power-gating off states. Indeed, the OMAP3 has over 30 power gates. http://linux.omap.com has OMAP Linux resources. http://source.mvista.com hosts OMAP patches before they get to kernel.org.

Re: Performance States

Igor asserted that once a voltage is selected, it is always the best policy to run at the maximum frequency supported by that voltage. However, the OMAP2 throws Linux a curve ball: when increasing the ARM core to its maximum speed, it will _reduce_ the speed of the DSP, e.g. 400MHz and 133MHz respectively. cpufreq doesn't have a concept of this kind of dependency. cpufreq_set_policy() doesn't match Nokia's needs as it is a 1-way notification, and there is no way to register constraints.

Igor reported a frequency scaling bug where the current polling interval and minimum residency formulas in ondemand don't work on Nokia's hardware.

He also described "spread to deadline" in contrast to "race to idle". In spread-to-deadline, the work is run at the minimum rate such that it will complete in time for a known future deadline. The deadline might be an expected external periodic communication event, for example.

Re: pause/resume

Total pause/resume on the n800 is 20-80ms. PLL re-lock takes about 0.1ms and the voltage ramp is about 5ms by comparison. The big time consumer is drivers, in particular syncing with screen updates.

Paul Mundt contrasted the clock framework with cpufreq, saying that one could build a rate table of all P-state transitions, though this would need to be prototyped to see if it is viable.

Marcelo Tosatti showed off an OLPC XO-1 (http://laptop.org/). It includes a 433MHz AMD Geode LX (this replaced the previous cache-less Geode GX). The XO-1 has 1G of NAND flash and a 1200x900 LED screen which uses 0.2W min, 1.0W max. These screen power numbers are truly impressive.

OLPC wants to aggressively auto-suspend to a suspend-to-RAM-like state, except that the screen stays on (and wireless stays on). The system wakes upon user input. The requirement for this state is < 100ms resume latency. Jim Gettys asserts that the iPAQ could resume in 10ms by comparison. Marcelo reports that the XO-1 can resume in 160ms today if USB is disabled. However, if USB is enabled, it resumes in 250ms. He thinks that resume needs to be multi-threaded, and that it needs to be smarter so that it doesn't blindly resume every device in the system.

The XO-1 has a Display Controller (DCON), which will refresh the display even when the processor is completely powered off.

Regarding wake, enable_irq_wake(irq) is ugly b/c it is IRQ specific. It needs to be enable-wakeup(device) -- a generic API.

The audio amplifier must delay power-up by ~100ms to avoid a pop.

OLPC is not using suspend-to-disk, yet. We discussed the STD vs STR path. The expectation is that STR can be faster if it doesn't follow the same path as STD. Per the list, Rafael is working on this.

OLPC is using OHM - Open HW Manager -- a generic system manager, of which power management is just one part. olpc_pm_enter() in olpc-pm.c is kicked off by OHM on detecting idle.

Dave Jones led a discussion on cpufreq.

Re: Accounting vs cpufreq. Enterprise capacity planning applications get confused by cpufreq. When cpufreq lowers the MHz due to low demand, the management application sees no idle time left -- indicating that the system has reached capacity and needs to be upgraded.

Dave commented that the cpufreq conservative governor should be deleted and whatever hooks are needed should simply be added to ondemand.

MHz vs scheduler: today cpufreq simply tracks idle time and the scheduler is completely unaware that cpufreq changes the frequency. Application hints may be appropriate for apps to tell the scheduler about their MHz needs. Also, the scheduler may be better off scheduling cycles instead of scheduling time.
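
A small worked example may help show why time-based accounting misleads capacity planning once cpufreq is active. The sketch below is only an illustration, not something presented at the summit: the function names and sample numbers are hypothetical, and it assumes the clock rate stays constant over the sampling window.

```python
def time_utilization(busy_s, interval_s):
    """Fraction of wall-clock time the CPU was not idle."""
    return busy_s / interval_s


def capacity_utilization(busy_s, interval_s, cur_khz, max_khz):
    """Busy time scaled by the share of the maximum clock rate in use.

    Assumes (for illustration only) that the CPU ran at cur_khz for the
    whole interval and that work scales linearly with frequency.
    """
    return time_utilization(busy_s, interval_s) * cur_khz / max_khz


if __name__ == "__main__":
    # Hypothetical sample: busy 0.9 s out of 1 s, clocked at 800 MHz
    # on a part whose maximum is 2400 MHz.
    by_time = time_utilization(0.9, 1.0)
    by_capacity = capacity_utilization(0.9, 1.0, 800_000, 2_400_000)
    print(f"time-based: {by_time:.0%}, frequency-scaled: {by_capacity:.0%}")
```

With these sample numbers a time-based monitor reports 90% busy, while only about 30% of the full-speed capacity is actually in use. Accounting in cycles rather than time, as suggested above for the scheduler, would avoid this mismatch.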

Discussion on APERF/MPERF MSRs on Intel processors: The APERF/MPERF ratio conveys the "actual" to "maximum" MHz ratio since the MSRs were last reset. Note that with Intel Dynamic Acceleration (IDA), this ratio can be greater than 1 -- so maybe "maximum" needs to be re-worded as "marketing" :-)

Governors: It isn't clear why there needs to be a governor per core. It seems to be unused today, except on incorrectly administered systems.

User-space: cpuspeed and powernowd are not used so much these days.

The fabled DPM/PowerOP/cpufreq integration isn't happening fast. Per previous discussion, an abstract notion of Operating Points makes the most sense, and perhaps dealing in units of absolute MHz is not the right model. Though users are now accustomed to thinking they know the absolute MHz....

Dave Jones was open to the idea of transforming cpufreq into a generic clock scaling implementation.

Dave mentioned that Fedora Core 7 32-bit is now shipping with CONFIG_NO_HZ=y and CONFIG_HZ=1000.

Kevin Hilman led a discussion on DPM (Dynamic Power Management, http://dynamicpower.sourceforge.net/). DPM has been shipping since Linux-2.4 and is a part of many successful products, so it will continue to be supported. One key aspect of DPM is that it allows customers to put their platform-specific proprietary control code in user-space. DPM has hooks in the scheduler where applications explicitly request an operating state. MontaVista is hoping to migrate to mainline, now that mainline is becoming more capable. In particular, they need solid tickless, cpufreq, and wake-up events.

Paul Mundt described the cutting edge in the Super-H space. The SH4A-SMP has 4 cores and is expected to be used in high-end consumer electronics, navigation etc. It has per-core voltage regulation, and CPU offline saves real power. Often ITRON is run on a core.

Mark Gross led a discussion on Device QOS Parameters, to see if common language might be suitable, say in a sysfs interface.
We brainstormed on how throughput, rate, power gain, latency, acoustic and timeout applied to various classes of devices, such as storage, wired and wireless networks, and the display.

Suspend/Resume: Earlier on the list, Linus stated that he might prefer multiple entry points that do simpler functions rather than the over-loaded .suspend/.resume I/F we have today.

Adam Belay described a 2-pass device suspend-to-RAM loop, where .stop is first called for each device before the first .suspend is called:

  .start
  .stop
      don't touch hardware
      able to return failure
  .suspend(target state)
      saves HW state
      enable wake feature
      invoke D-state (power-off)
  [take STD snapshot here]
  .resume

There is also a .reset, especially for kexec, that can be called after .stop. It removes the IRQ and int src. The .stop loop allows a device to veto the suspend and for the system to quickly back out of the operation.

sysfs brainstorm...

  /sys/class/power/state
  /sys/device/.../power/state

A class could provide default hooks for devices.

Tariq Shureih presented an effort to implement a Linux Power Policy Manager (PPM). This effort is primarily intended to fill the needs of Linux mobile-internet-devices (http://moblin.org/). However, nothing limits its use to that market segment. The big question was how this compares to the OHM effort (http://ohm.freedesktop.org). The initial answer is that PPM will be BSD licensed, and OHM will be LGPL. In the lab, Nokia hopes to replace their proprietary solution with OHM.

We discussed powertop. Go to http://linuxpowertop.org/ for the latest.

Virtualization: It was observed that power management comes "for free" in the hosted model, and that there is heavy lifting to make it work in the hypervisor model. In particular, Xen, the popular hypervisor-hybrid model, currently lacks C-state and P-state support.
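
For readers who want to see the shape of the 2-pass suspend loop Adam described above, here is a minimal model of the entry path. It is a sketch under stated assumptions rather than kernel code: the Device class, the suspend_devices() helper, the "S3" state label and the choice to back out by calling .start on already-stopped devices in reverse order are all hypothetical.

```python
class Device:
    """Hypothetical stand-in for a driver with the callbacks from the minutes."""

    def __init__(self, name, trace, veto=False):
        self.name = name
        self.trace = trace  # shared list recording the call order
        self.veto = veto    # if True, this device refuses to stop

    def stop(self):
        # Pass 1: must not touch hardware; may report failure.
        self.trace.append(f"{self.name}.stop")
        return not self.veto

    def start(self):
        # Undo a successful stop (assumed counterpart of .stop).
        self.trace.append(f"{self.name}.start")

    def suspend(self, target_state):
        # Pass 2: a real driver would save HW state, enable its wake
        # feature and enter a D-state here.
        self.trace.append(f"{self.name}.suspend({target_state})")


def suspend_devices(devices, target_state):
    """Call .stop on every device before the first .suspend is called."""
    stopped = []
    for dev in devices:
        if not dev.stop():
            # Veto: back out quickly, restarting what was already stopped.
            for prev in reversed(stopped):
                prev.start()
            return False
        stopped.append(dev)
    for dev in devices:
        dev.suspend(target_state)
    return True


if __name__ == "__main__":
    trace = []
    devs = [Device("disk", trace), Device("usb", trace, veto=True)]
    print(suspend_devices(devs, "S3"), trace)
```

The point of the first pass is that no hardware has been touched when a veto arrives, so backing out costs only the .start calls.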