Fast PHP – effective optimisation and bottleneck detection


PHP is not the fastest language on earth. That honour probably goes to machine code. But like many high-level languages, PHP provides some handy abstractions, like named variables, hashmaps (associative arrays), a C-like syntax, object oriented capabilities, loose typing and so on – we trade processing speed for development ease.

So it’s quite a common problem that people find their large PHP web applications running quite slowly.

Here are some frequently encountered bottlenecks found in web applications generally, and PHP specifically:

1) The Database

So often we treat databases like big persistent arrays. They’re not.

First of all, remember that anything that goes in or comes out of the DB has to be transferred to your web server. That’s a network hit whenever the database lives on a separate machine. So storing images or other binary data in the database is generally a Bad Idea (for other reasons too). But it’s not just images – if you’ve got big chunks of HTML or thousands of rows of data flicking back and forth, all of it crosses the network on every request. MySQL compression can help with this, but be sure to benchmark for your own scripts – it may be that the CPU overhead cancels out the benefit on your servers.

Secondly, a great many databases are thrown together without much thought for optimisation. You need to look closely at what queries you’ll be running, and optimise the structure of the database for them. Often it’s as simple as adding an index, but sometimes you might need to examine your queries to make sure they’re doing what they should be.

The database is a very very common source of speed problems – you will often find that simply adding an index to a table will solve your troubles. Look to the database first!
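To see what “adding an index” actually changes, you can ask the database for its query plan before and after. The following is only a sketch under stated assumptions: it uses an in-memory SQLite database through PDO so that it runs anywhere, and the table, column and index names are made up for illustration. On MySQL you would prefix the query with EXPLAIN rather than EXPLAIN QUERY PLAN.

```php
<?php
// Sketch only: in-memory SQLite stands in for the real database server.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)');

// Return the plan text the engine reports for a statement.
function query_plan(PDO $db, string $sql): string
{
    $rows = $db->query('EXPLAIN QUERY PLAN ' . $sql)->fetchAll(PDO::FETCH_ASSOC);
    return implode('; ', array_column($rows, 'detail'));
}

$sql = "SELECT id FROM users WHERE last_name = 'Doe'";

// Without an index the engine has to scan the whole table.
$planBefore = query_plan($db, $sql);

// Hypothetical index name; the point is that it covers the WHERE column.
$db->exec('CREATE INDEX idx_users_last_name ON users (last_name)');

// With the index in place the plan should mention it.
$planAfter = query_plan($db, $sql);

echo $planBefore, PHP_EOL, $planAfter, PHP_EOL;
```

The exact wording of the plan varies between database engines and versions, so treat the output as something to read rather than parse.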

Here’s a very neat tip from Google’s page:

Avoid doing SQL queries within a loop

They note that it’s faster to run one query like this:

INSERT INTO users (first_name, last_name) VALUES ('John', 'Doe'), ('Jane', 'Doe');

Than it is to run two queries like this:

INSERT INTO users (first_name, last_name) VALUES ('John', 'Doe');
INSERT INTO users (first_name, last_name) VALUES ('Jane', 'Doe');

A great little SQL optimisation tip. Construct one query in a loop and run it once outside the loop rather than running the query inside the loop.
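In PHP that might look something like the sketch below. It assumes PDO and, so that it is self-contained, an in-memory SQLite database in place of your real server; the $people array is made-up sample data. Placeholders are used rather than pasting values into the SQL string, so the batching doesn’t open you up to SQL injection.

```php
<?php
// Sketch only: in-memory SQLite stands in for the real database server.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (first_name TEXT, last_name TEXT)');

// Made-up sample rows.
$people = [['John', 'Doe'], ['Jane', 'Doe']];

// Build the placeholders and the flat parameter list inside the loop...
$placeholders = [];
$params = [];
foreach ($people as [$first, $last]) {
    $placeholders[] = '(?, ?)';
    $params[] = $first;
    $params[] = $last;
}

// ...then hit the database once, outside the loop.
$sql = 'INSERT INTO users (first_name, last_name) VALUES ' . implode(', ', $placeholders);
$stmt = $db->prepare($sql);
$stmt->execute($params);

$inserted = (int) $db->query('SELECT COUNT(*) FROM users')->fetchColumn();
echo $inserted, PHP_EOL;
```

Two caveats: an empty $people array would produce invalid SQL, so guard against that, and very large batches may need splitting into chunks to stay within your server’s query size limits.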

2) Client-side optimisation

Often when we hear about “slow web applications”, people are actually talking about page load time, which can be quite independent of how well the server-side code is optimised. Take a good look at your images – are photographs being served as large PNGs? Do you really lose anything by converting them to 90% quality JPGs instead? You’ll certainly gain page load speed and reduce bandwidth. Also make sure you’re using gzip compression on the web server where applicable, and think about minifying your CSS, JavaScript and HTML. Often user complaints about sluggish code are actually more related to the delivery than the processing. In certain use cases (eg toolbar icons), using CSS sprites is a good client-side optimisation, but don’t forget about the hit on initial development time.

Install the Firebug add-on to get a good look at where the client-side bottlenecks are.

3) Slow Code

PHP is easy. Easy to get wrong, that is. There are a lot of posts out there with micro-optimisation techniques, which can work wonders in very specific circumstances, but generally speaking these “tips” will be counter-productive in the long run, forcing you to code unnaturally and making your code resistant to changes. For example, using & to access 1d arrays is faster, but what if you add a dimension later on in development – will you remember that it hurts performance on >1d arrays? And what if the next version of PHP changes that behaviour?

However, there are a few optimisations that are so blindingly obvious, yet often ignored – caching the count() in a loop for instance:

for($i=0 ; $i < count($arr) ; $i++) {
}

versus

for($i=0, $c=count($arr) ; $i < $c ; $i++) {
}

In the first snippet, that count function runs on every iteration. In the second, it runs only once. This really can have a big impact not only on a per-script basis, but also on the server as a whole.

You should really work to make the second style a habit. It’s hardly any different syntactically, and it saves a function call on every iteration, which adds up in hot loops.
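How much this matters on a given server is easy to measure rather than assume. Here is a rough timing sketch; the best_time() helper and the array size are my own choices for illustration, and the numbers will vary with PHP version and hardware, so no expected output is shown.

```php
<?php
// Hypothetical helper: run a callable several times and
// return the best wall-clock time in seconds.
function best_time(callable $fn, int $runs = 5): float
{
    $best = INF;
    for ($r = 0; $r < $runs; $r++) {
        $start = microtime(true);
        $fn();
        $best = min($best, microtime(true) - $start);
    }
    return $best;
}

$arr = range(1, 200000);

// count() is evaluated on every iteration.
$uncached = best_time(function () use ($arr) {
    $sum = 0;
    for ($i = 0; $i < count($arr); $i++) {
        $sum += $arr[$i];
    }
    return $sum;
});

// count() is evaluated once, before the loop starts.
$cached = best_time(function () use ($arr) {
    $sum = 0;
    for ($i = 0, $c = count($arr); $i < $c; $i++) {
        $sum += $arr[$i];
    }
    return $sum;
});

printf("count() every iteration: %.5fs\n", $uncached);
printf("count() cached:          %.5fs\n", $cached);
```

Taking the best of several runs reduces noise from whatever else the machine is doing. If the two figures come out close for your workload, that tells you this particular loop is not where your time is going.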

But aside from a few simple good practice optimisations like that one, how do you tell where the bottlenecks in your PHP code really are?

A lot of people just throw down echo microtime(); in a bunch of likely-looking places and run it a few times. Let me share a tip about code profiling.

Using the xdebug extension, we can get an insight into the speed footprint of every single function call in a script. Here’s an example of the output we can get:

[Screenshot: xdebug profiler output]

We have a nice breakdown of the amount of time (percentage or ms) that each function call takes, how many times it is run, and whether the time is taken up within that function (self) or elsewhere (cumulative). And you can get this breakdown simply by appending ?XDEBUG_PROFILE to the GET request – no code changes required, as long as trigger-based profiling is enabled in your xdebug configuration!
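For reference, the trigger only works once profiling-on-request is switched on in php.ini. A minimal fragment might look like this, assuming the extension itself is already loaded; the /tmp output directory is just a placeholder, and the commented-out lines are what I understand to be the Xdebug 3 equivalents, so check them against the version you have installed.

```ini
; Xdebug 2.x: profile only requests that carry the XDEBUG_PROFILE parameter
xdebug.profiler_enable_trigger = 1
xdebug.profiler_output_dir = /tmp

; Xdebug 3.x equivalents (the trigger parameter there is XDEBUG_TRIGGER)
;xdebug.mode = profile
;xdebug.start_with_request = trigger
;xdebug.output_dir = /tmp
```

The profiler writes cachegrind-format files into the output directory, which is what tools like wincachegrind read.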

Sadly, there is one caveat: it’s not very stable on Windows. I would moan at you and tell you that you should be developing on the same architecture as your live servers, but as you can clearly see from the screenshot, I’m still humping Bill Gates too.

Here’s a little post about setting up xdebug and wincachegrind for PHP code profiling. It’s pretty easy to do.

4) Repeated work is inefficient

So many content managed web applications feature code like this:

include "db.php";
include "header.php";

$sql = "SELECT Content FROM Pages WHERE PageID=5";
$res = mysql_query($sql,CN);
if($res) {
        $row = mysql_fetch_assoc($res);
        echo $row['Content'];
} else {
        echo "error fetching content";
}

include "footer.php";

And I mean, that’s one of the nice things about PHP – you can quickly slap together an interface between the DB and HTML – wonderful for rapid prototyping.

But think about it – how often does that page content change? Wouldn’t it be better to link to page.html, and when the administrator hits save in the backend CMS, overwrite page.html with the generated content?

That way the million users who request that page every month don’t even spawn a PHP process – Apache can handle that entire transaction. This means the page is served super-quick instead of running the same code with the same input and the same output a million times. Everyone benefits – the server is running less duplicate code, and users get faster web applications.
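The “overwrite page.html on save” step could be sketched roughly as follows. The publish_static_page() function name and the sample HTML are hypothetical, and the example writes into the system temp directory purely so it can run anywhere; in a real CMS the target would sit under the web root and the HTML would be the fully rendered header, content and footer.

```php
<?php
// Hypothetical helper: write generated HTML to a static file when content is saved.
function publish_static_page(string $path, string $html): void
{
    // Write to a temporary file in the same directory first, then rename it
    // into place, so visitors never receive a half-written page.
    $tmp = tempnam(dirname($path), 'page');
    if ($tmp === false || file_put_contents($tmp, $html) === false) {
        throw new RuntimeException("Could not write static page for $path");
    }
    // tempnam() creates the file readable by its owner only;
    // open it up so the web server can read it.
    chmod($tmp, 0644);
    if (!rename($tmp, $path)) {
        unlink($tmp);
        throw new RuntimeException("Could not move static page into place at $path");
    }
}

// Example: the CMS "save" handler would call this with the rendered page.
$target = sys_get_temp_dir() . '/page.html';
$html = '<html><body><h1>About us</h1><p>Content from the CMS.</p></body></html>';
publish_static_page($target, $html);
```

The temp-file-then-rename dance is there because a request can arrive while the file is being written; renaming within the same directory swaps the old page for the new one in a single step on most filesystems.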

It’s not just rarely-accessed parts of the site that can be optimised in this way either.

Consider an online store. Well, why not generate a pretty much static copy of the site – it only needs to be updated when product info changes. Obviously you’ll want to look at your own application to see where it makes sense to cache on the disk instead of generating on-the-fly, but it’s an option that’s too often overlooked.

That brings us nicely on to:

5) Caching

This is where the line between web developer and server admin starts to blur, but for large scale applications you will need to start thinking about opcode caching, memcache, squid, the MySQL query cache and other types of cache.
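Application-level caches like memcache generally get used in the same way: look the value up, and only do the expensive work if it isn’t there. Here is a minimal sketch of that pattern. The ArrayCache class is a hypothetical stand-in that only lives for the duration of one script, and remember() is a made-up helper name; a real setup would talk to an actual cache server instead.

```php
<?php
// Hypothetical stand-in for a real cache client such as memcache:
// it only keeps values in a PHP array for the lifetime of the script.
final class ArrayCache
{
    private $items = [];

    public function get(string $key)
    {
        if (!isset($this->items[$key])) {
            return null;
        }
        [$value, $expires] = $this->items[$key];
        return $expires >= time() ? $value : null;
    }

    public function set(string $key, $value, int $ttl): void
    {
        $this->items[$key] = [$value, time() + $ttl];
    }
}

// Return the cached value if present, otherwise compute it and store it.
function remember(ArrayCache $cache, string $key, int $ttl, callable $compute)
{
    $value = $cache->get($key);
    if ($value === null) {
        $value = $compute();
        $cache->set($key, $value, $ttl);
    }
    return $value;
}

$cache = new ArrayCache();
$calls = 0;
$load = function () use (&$calls) {
    $calls++;               // stands in for an expensive database query
    return 'page content';
};

$first = remember($cache, 'page:5', 60, $load);
$second = remember($cache, 'page:5', 60, $load);   // served from the cache
```

One limitation of this sketch is that a null result is indistinguishable from a cache miss, so null values are recomputed every time.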

These types of solution are almost always highly tailored to the individual needs of the application, so I will not talk much about them – the chances are that if you need this level of caching, you already know more about it than I do, so I won’t pretend to be an expert in this area. Here are some links/pics from the guys who really do know about this kind of stuff:

That’s all for today. To recap:

  1. Don’t transfer more than you need to across the network.
  2. Make sure your tables are indexed properly.
  3. Don’t write slow code.
  4. Cache rarely-changing output in files on disk.
  5. Draw out your architecture to identify where more serious caching can happen.

Thanks to Daniel at toosweettobesour for taking me to task about the micro-optimisations ;)



  1. #1 by Canyon on April 18, 2010 - 8:03 am

    for($i=0 ; $i < count($arr) ; $i++) {
    }

    This one++ :

    $count = count ($arr);
    for($i=0 ; $i < $count ; $i++) {
    }

  2. #2 by Daniel on April 19, 2010 - 9:10 am

    I find it ironic there’s an example of micro-optimization (the for loop example) without any evidence whatsoever of its efficacy and immediately afterwards discusses Xdebug and how profiling can show you your bottlenecks.

    I hate to throw the baby out with the bathwater, but the PHP community is desperately trying to escape the quagmire of micro-optimizations and this article isn’t helping.

    If the article wanted to be more genuine, it would have either (preferably) dropped the for loop segment or it would have compared for loops with foreach loops and provided benchmarking details. And it would address issues like the fact that PHP arrays are not numerically indexed, and said numerical indexes do not have to start at 0 and go in perfect sequential order (making for loops a stability liability when it comes to accessing an array).

    I wish everyone would PLEASE stop trying to outsmart the compiler and instead focus on real issues like algorithm inefficiencies (“By changing how I call my recursive functions I can become O(log n)”).

    And I really wish everyone would realize that no one’s sites are going to be so high traffic that they need to worry about single/double quotes, for vs. while loops, etc. It has been shown time and time again that you will hit database and IO bottlenecks long before language bottlenecks that will force you to scale up or out your architecture (thus delaying the possibility of reaching language bottlenecks) so by the time you hit language bottlenecks you’re so massive you’ll have a team of engineers using HipHop (and STILL avoiding these silly little micro-optimizations).

    • #3 by Howard Yeend on April 19, 2010 - 10:26 am

      You’re taking one little example and throwing the rest of the article out along with it.

      I did kinda try to dismiss that as a “simple good practice optimisation” and then try to direct the reader to a more structured approach.

      “compared for loops with foreach loops and provided benchmarking details”

      In an earlier draft I did link to http://www.phpbench.com for that, but decided (as you say) that micro-optimisations aren’t really worth the bother.

      “single/double quotes, for vs. while loops, etc.”

      I agree, that’s why I didn’t speak about them.

      “It has been shown time and time again that you will hit database and IO bottlenecks”

      Yep, that was my very first point in the article :)

      There are indeed a lot of articles out there giving really silly advice (eg using single quotes), to which your comment is quite relevant. But I don’t feel that this article quite warrants your reply.

    • #5 by Trung on April 21, 2010 - 5:25 am

      It will be absolutely awesome to get ur complexity from O(N) -> O(logN). But, hmnnn, I am wondering if there are much space for improvements of algorithm inefficiencies in many web sites (such as CMS/blogs/etc.) apart from big sites like Google/Facebook/YouTube. Also most of such improvements are in database layer i think.

      In this busy & competitive world where time is gold and waiting is frustrating, even an improvement from 1s -> 0.9s is a considerable job.

      In my opinion, micro-optimisation is important (thou absolutely agree that it is much less important than backend/database/client-side optimisations) and doesn’t mean to break the code layout/semantics if we do it properly. That habit also makes developers feel good about their codes (for being optimal) and micro-optimisations improve the whole site considerably if they are applied throughout the whole site.

  3. #6 by Adrian on April 19, 2010 - 9:32 am

    I must agree with Daniel; database access will bring a site to its knees long before anything else does.

    If your site is performing something like 20 queries per page, you’re doing it wrong.

    • #7 by Howard Yeend on April 19, 2010 - 10:27 am

      I did say that, right at the top!

      “1) The Database”

  4. #8 by Canyon on April 20, 2010 - 12:52 am

    I think micro optimization are more a way of “well coding”, not real optimization with PHP. I didn’t understand all (cause my dirty english and sorry for that) but i want to say PHP is like that: “you like it or go away…” Php is very limited but very useful for construct website, its principal problem is the core. There are many ways to optimize all that but a good comprehension of all mechanisms are essential: code faster than > procedural faster than>oop and a good way, in your exemple, use PDO instead, for exemple…
    Amically. Regards.

  5. #9 by Trung on April 20, 2010 - 2:45 am

    Another helpful post! Can’t wait for the rest of the article.

  6. #10 by Webdesign - Kim Tetzlaff on January 4, 2011 - 12:30 am

    My first time on your blog and i must say, this is a great post.

    thanks ;)

  7. #11 by billige ure on October 10, 2011 - 11:00 pm

    Thanks it was very useful to read this…

    I’m new with PHP and i just want to learn it in the right way, so thanks for this

  8. #12 by Nitin Gupta on June 21, 2012 - 2:53 am

    Could you please tell me, how to optimize SQL queries and IO Operations. AS IO are the bottleneck for every system.
