By doing simple tricks it is possible to reduce the number of branches, or to simplify them so that the predictor is more likely to predict them correctly.
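As a minimal sketch of one such trick: an unpredictable `if` inside a hot loop can often be replaced with arithmetic on the comparison result, so the loop body carries no data-dependent branch at all. The function names and the space-counting task here are just illustrative, not from any particular codebase.

```c
#include <stddef.h>

/* Branchy version: the per-byte conditional may mispredict badly
   when the data is irregular. */
size_t count_spaces_branchy(const char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == ' ')
            n++;
    }
    return n;
}

/* Branchless version: in C, a comparison evaluates to 0 or 1, so the
   result can be added directly - no data-dependent branch in the body. */
size_t count_spaces_branchless(const char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        n += (buf[i] == ' ');
    return n;
}
```

Whether this actually wins depends on the data and the architecture: if the branch is highly predictable, the branchy version can be just as fast or faster.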
Parsing data can be done in many ways. The order of instructions in a loop can affect the branch prediction success rate and the cache hit/miss rate, both of which contribute directly to the performance of the code. All of this is architecture dependent: for example, Core2 predicts branches far better than K8/K10, and K8/K10 suffers cache misses far more often than Core2, which exaggerates the differences between architectures.

The data does not necessarily have to be checked on every iteration if it is split into chunks and the chunks are parsed separately. Chunk size influences cache hit/miss rates, the parsing algorithm has its own influence on the flow of data and instructions through the pipeline, and instruction order influences the branch predictor. All of this can vary between programs unless the machine code is exactly the same. Besides, compiling with a different compiler version can generate differences that easily affect performance, e.g. via branch prediction or cache hits/misses. And there is more to the topic than branches and caches.
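A minimal sketch of the chunking idea, under stated assumptions: `process_byte` is a hypothetical stand-in for whatever per-byte work the parser does, and the chunk size of 4096 is just a plausible value (small enough to stay in L1 data cache on most CPUs; it would need tuning per architecture).

```c
#include <stddef.h>

enum { CHUNK = 4096 };  /* assumed chunk size; tune per architecture */

/* Hypothetical per-byte work, standing in for the real parser step. */
static unsigned long process_byte(unsigned long state, unsigned char b)
{
    return state * 31 + b;
}

/* Instead of re-checking conditions against the full input on every byte,
   the outer loop hands out fixed-size chunks and the inner loop runs with
   a simple, predictable trip count over data that stays cache-resident. */
unsigned long parse_chunked(const unsigned char *buf, size_t len)
{
    unsigned long state = 0;
    size_t pos = 0;
    while (pos < len) {
        size_t n = len - pos < CHUNK ? len - pos : CHUNK;
        for (size_t i = 0; i < n; i++)
            state = process_byte(state, buf[pos + i]);
        pos += n;
    }
    return state;
}
```

The point is not this particular loop shape but the separation: the cheap, predictable inner loop does the bulk of the work, while the rarely-taken outer-loop checks are the only ones the predictor has to get wrong occasionally.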
Offtopic? You choose.




