AMD's processor families K7, K8, K10, K12, K15 and K16 are known to have SHLD/SHRD instructions with very poor latency. The optimization guides for these processors recommend using an alternative sequence of instructions.

I couldn't find an optimization guide for AMD's family K14 processors on the Web, but actual performance measurements showed a 30% speedup for Bobcat (family K14). I'd like to get confirmation from the community's AMD experts that family K14 processors have poor-latency SHLD/SHRD instructions.

Experiments on Ivy Bridge showed a 15% improvement when an alternative sequence of instructions was generated (thanks to Dmitry Babokin from Intel for running the performance measurements for me). I would also like to hear from Intel experts. If you know which Intel processors should have the flag "have poor latency for SHLD/SHRD instructions", please let me know.

Here are the references to AMD's processor optimization guides:

Athlon, Athlon-tbird, Athlon-4, Athlon-xp, Athlon-mp
Athlon64, Opteron, AMD 64 FX, AMD k8-sse, AMD Athlon64-sse3, AMD Opteron-sse3
> Software Optimization Guide for AMD Family 10h and 12h Processors
> Couldn't find Optimization guide for AMD Fam 14, but I think shld documentation is applicable for Bobcat as well.
> search for "Software Optimization Guide for AMD Family 15h Processors"

Introduced a new feature, FeatureSlowSHLD, that should be set for the architectures that are known to have SHLD/SHRD instructions with very poor latency.
Enabled this feature for all AMD's family K8-K16 architectures.
Don't fold (or (x << c) | (y >> (64 - c))) if SHLD/SHRD instructions have high latencies and we are not optimizing for size.
When autodetecting subtarget features, set IsSHLDSlow to true for AMD processors.

There were several reasons for adding FeatureSlowSHLD:

(1) I don't really know which Intel architectures have very poor latency for shld/shrd. Based on my friend's performance measurements, the Ivy Bridge microarchitecture seems to be a good candidate, but that still needs to be confirmed (that's why I haven't even changed the code for Ivy Bridge). I have a feeling that all other modern Intel processors will fall into this category as well. However, I don't want to change the code purely based on my "feelings". So far, I haven't heard a recommendation from a person who is intimately familiar with Intel's architecture. I'd rather make the change for Intel when I'm 100% sure, or let someone else who cares about the performance of shld/shrd on any of Intel's processors (and who knows what they are doing :)) make this change. After this patch, changing the code to disable this folding for any particular processor will be very easy (just a couple of lines of code). I've put a FIXME comment in the code, mentioning that it might make sense to disable this folding for Intel, so there is a clue in the code.

(2) There are other features (e.g. FeatureSlowBTMem) that are enabled for all modern Intel and AMD processors, but these features still exist (I suspect for a reason).

(3) Having FeatureSlowSHLD is a more flexible approach. Even assuming that shld/shrd instructions do have very high latency on all modern Intel processors, we should still respect "older" processors and make support for new ones easier (what if AMD fixes the shld issue in its next-generation processor?).

(4) Someone wrote this folding in the past.
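To make the folding and its new guard concrete, here is a minimal sketch in C++. It is not the code from the patch: `shld64`, `SubtargetSketch` and `shouldFoldToSHLD` are hypothetical names introduced only for illustration, and only the `isSHLDSlow` flag mirrors the `IsSHLDSlow` setting mentioned above. The first function shows the value that the shift/or pattern computes for a 64-bit operand, and the second shows the decision described in the patch summary, assuming it reduces to a check of the slow-SHLD flag and the optimize-for-size setting.

```cpp
#include <cassert>
#include <cstdint>

// Reference semantics of the pattern (or (x << c) | (y >> (64 - c))):
// shift x left by c and fill the vacated low bits from the top bits of y.
// This is the value a 64-bit SHLD produces for 0 < c < 64.
uint64_t shld64(uint64_t x, uint64_t y, unsigned c) {
    // c == 0 or c >= 64 would make one of the C++ shifts undefined.
    assert(c > 0 && c < 64 && "shift count must be in (0, 64)");
    return (x << c) | (y >> (64 - c));
}

// Hypothetical stand-in for the subtarget information; the real compiler
// would query its own subtarget object instead.
struct SubtargetSketch {
    bool isSHLDSlow;  // corresponds to FeatureSlowSHLD being set
};

// Hypothetical helper: keep folding into SHLD/SHRD unless the target has
// slow SHLD/SHRD and we are not optimizing for size.
bool shouldFoldToSHLD(const SubtargetSketch &st, bool optForSize) {
    return optForSize || !st.isSHLDSlow;
}
```

With this shape, a target with slow SHLD still gets the single-instruction form when optimizing for size, since the alternative sequence is longer; a target without the flag always gets the fold.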