Posts Tagged array_unique
Fast PHP array_unique for removing duplicates
Posted by Howard Yeend in PHP on June 21, 2010
PHP’s native dedupe function, array_unique, can be slow for large amount of input. Here I’m going to talk about a simple function that performs the same task but which runs a lot faster.
Often, people spout PHP optimisation advice that is incredibly misguided, so I want to make it clear up-front that you should be benchmarking your scripts using xdebug before you start optimising, and very often the real bottlenecks cannot be avoided by using “micro-optimization”. Here’s a nice PDF on Common Optimization Mistakes.
Now I’ve got that disclaimer out of the way, I believe it’s OK to talk about making a faster array_unique because it’s often used on large amounts of data – for example removing duplicate email addresses, and as such there are genuine use cases for a fast array_unique in PHP. Also our fast_unique function can be several seconds faster than array_unique so we’re not necessarily talking about micro-optimisation here; on 200k duplicate entries, array_unique takes nearly 5 seconds on my windows dev box, fast_unique takes <1 second on the same data. On 2 million entires the results are staggering – performance graphs are given below (but then, why are you doing that kind of work in PHP?).
The function looks like this:
Read the rest of this entry »
Recent Comments