The improvements in latency translate directly into improved small-block throughput. Large-block throughput is virtually unaffected by NAND latency and depends mostly on internal stripe sizes and parallelization.
FastPath also seems to enable further scaling past about 80-100K IOPS, but that's not relevant for most common user scenarios, since you only get into that range above roughly QD 24-32.
There should be a graph I posted showing scaling at QD 1-8; if you look at that, you'll see FP helping out.
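For a rough sense of why latency matters so much for small blocks at low queue depth, here is a minimal sketch using Little's Law (IOPS is roughly outstanding I/Os divided by average latency). The 100 µs and 80 µs latencies and the function names are made up for illustration, not measurements from any drive or controller, and the model ignores the saturation you hit at higher QD.

```python
def iops_from_latency(queue_depth: int, latency_us: float) -> float:
    """Little's Law estimate: IOPS = outstanding I/Os / average latency."""
    return queue_depth * 1_000_000 / latency_us


def throughput_mb_s(iops: float, block_bytes: int) -> float:
    """Convert IOPS at a given block size to MB/s (1 MB = 10**6 bytes)."""
    return iops * block_bytes / 1_000_000


if __name__ == "__main__":
    # Hypothetical per-I/O latencies in microseconds (assumed, not measured).
    baseline_us = 100.0
    improved_us = 80.0
    block = 4096  # 4 KiB random reads

    for qd in (1, 2, 4, 8):
        base = iops_from_latency(qd, baseline_us)
        fast = iops_from_latency(qd, improved_us)
        print(
            f"QD {qd}: {base:8.0f} -> {fast:8.0f} IOPS "
            f"({throughput_mb_s(base, block):6.1f} -> "
            f"{throughput_mb_s(fast, block):6.1f} MB/s)"
        )
```

In this simple model a 20% cut in latency gives 25% more IOPS at every queue depth, which is why the gain is easiest to see on small blocks at QD 1-8. For large blocks the transfer time dominates the per-I/O latency, so the same latency cut barely moves the MB/s figure.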