Subject: Vectorization Cost Models and Multi-Instruction Patterns?
Date: Monday 19th January 2015 23:50:38 UTC
Hi all,

While tinkering with saturation instructions, I hit problems with the cost model calculations.

The loop vectorizer cost model accumulates the individual TTI cost of each instruction. For saturating arithmetic, this is a gross overestimate, since you have 2 sexts (inputs), 2 icmps + 2 selects (for the saturation), and a truncate (output); these all fold away.

With an intrinsic, you'd end up with a better estimate; however, I'm trying to see what problems we would encounter without intrinsics, and I think this is the biggest one. Note that AFAICT, costs for min/max patterns (icmp+select) are also overestimated, but not as much as saturate.

Proposal: Add a method, part of the vector API of TargetTransformInfo, for multi-instruction cost computation. It would take a scalar Instruction and a reference to a set of Instructions. If it's able to match a min/max/saturate/.., it adds all the matched instructions to the set, so the caller (say LoopVectorizationCostModel) can ignore them.

But:
- this all seems icky: a very blunt hammer.
- what, if anything, should we do about legality checks? The expanded IR equivalent of a saturate uses larger types than necessary, so this might prevent vectorization. In practice it doesn't, because only load/store/PHI types are checked there.
- is this useful in other cases, beyond min/max (maybe abs?) and saturate?

Thanks!
-Ahmed
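
To make the proposal above concrete, here is a rough standalone sketch of the idea. It deliberately does not use LLVM's classes: `Inst`, `Block`, `instCost`, `matchSaturatingAdd`, and the flat cost of 1 per instruction (and 1 per matched saturate) are all hypothetical, invented for illustration. It only shows how a matcher that fills a set of matched instructions would let the caller skip them when summing costs.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical toy IR, not LLVM's. Constant operands are not modeled.
enum class Op { Load, SExt, Add, ICmp, Select, Trunc, Store };

struct Inst {
  Op op;
  std::vector<std::size_t> operands; // indices of defining instructions
};
using Block = std::vector<Inst>;

// Assumption: every instruction costs 1 when considered on its own.
unsigned instCost(const Inst &) { return 1; }

// What the post describes today: sum the per-instruction costs.
unsigned naiveCost(const Block &b) {
  unsigned total = 0;
  for (const Inst &i : b) total += instCost(i);
  return total;
}

static bool isOp(const Block &b, std::size_t i, Op op, std::size_t n) {
  return i < b.size() && b[i].op == op && b[i].operands.size() == n;
}

// Hypothetical multi-instruction hook. If `root` is the trunc of
//   trunc(clamp(clamp(add(sext a, sext b))))
// where each clamp is select(icmp(x), x), it adds every matched index to
// `matched`, sets `cost`, and returns true. One-use checks are omitted.
bool matchSaturatingAdd(const Block &b, std::size_t root,
                        std::set<std::size_t> &matched, unsigned &cost) {
  assert(root < b.size() && "root must index into the block");
  if (!isOp(b, root, Op::Trunc, 1)) return false;
  std::vector<std::size_t> found{root};
  std::size_t v = b[root].operands[0];
  for (int k = 0; k < 2; ++k) { // two clamps: upper and lower bound
    if (!isOp(b, v, Op::Select, 2)) return false;
    std::size_t cmp = b[v].operands[0], inner = b[v].operands[1];
    if (!isOp(b, cmp, Op::ICmp, 1) || b[cmp].operands[0] != inner) return false;
    found.push_back(v);
    found.push_back(cmp);
    v = inner;
  }
  if (!isOp(b, v, Op::Add, 2)) return false;
  found.push_back(v);
  for (std::size_t s : b[v].operands) {
    if (!isOp(b, s, Op::SExt, 1)) return false;
    found.push_back(s);
  }
  matched.insert(found.begin(), found.end());
  cost = 1; // assumption: the target has a single saturating-add instruction
  return true;
}

// Caller side: walk bottom-up so pattern roots are seen before their operands.
unsigned patternAwareCost(const Block &b) {
  std::set<std::size_t> matched;
  unsigned total = 0;
  for (std::size_t i = b.size(); i-- > 0;) {
    if (matched.count(i)) continue; // already accounted for by a pattern
    unsigned c = 0;
    total += matchSaturatingAdd(b, i, matched, c) ? c : instCost(b[i]);
  }
  return total;
}

// The expanded saturating add from the post, between two loads and a store.
Block saturatingAddExample() {
  return {{Op::Load, {}},      {Op::Load, {}},      {Op::SExt, {0}},
          {Op::SExt, {1}},     {Op::Add, {2, 3}},   {Op::ICmp, {4}},
          {Op::Select, {5, 4}}, {Op::ICmp, {6}},    {Op::Select, {7, 6}},
          {Op::Trunc, {8}},    {Op::Store, {9}}};
}
```

With these made-up costs, `naiveCost(saturatingAddExample())` is 11, while `patternAwareCost(saturatingAddExample())` is 4 (two loads, one store, one saturate). The real numbers would of course depend on the target and on how the hook is actually specified.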