Friday, February 15, 2013

Agile Toolkit

The question of which PHP framework is the fastest is often asked, but never answered properly. So how do you measure framework speed? Let me also explain why “scalability” is more important than general “performance”.
The primary goal of Frameworks and Libraries is to make certain things simpler and easier for developers. Frameworks can also be categorized by what it is they’re simplifying:
  • MVC frameworks help you organize a flexible page structure
  • Database frameworks (also called ORMs) help you organize a flexible database access interface
  • User interface frameworks help you keep your user interface consistent
  • Security frameworks help you introduce security practices in your software
Frameworks do that by introducing some assumptions, generalizing, and restricting developers to certain practices which the framework authors approve of.
It’s fairly obvious to anyone that adding thousands of lines of code to your “Hello World” application won’t make it execute faster. The usual argument of framework authors is that they make the development process faster and smoother on larger projects. This often has nothing to do with the speed of code execution: “Developer time is more expensive than CPU time”.
What’s worse is that some frameworks impose standards which slow your code down. An example is a basic Active Record implementation, which brings your powerful SQL engine down to the level of Simple DB to match the lowest common feature denominator across multiple database vendors. This, along with the general overhead of the framework, greatly contributes to the “slowness” of your project.
So how can a framework contribute to the performance of your project?

1. Make Fewer SQL Queries

When your framework sends a request to the database and waits for the result, it introduces latency. The PHP code waits for the query to be parsed and executed by the SQL engine and for the results to be returned. Because SQL is still the most popular way to access data today, your job is to make as few queries as possible. If you could build your page with 10 SQL queries instead of 50, that would speed things up a lot.
Having fewer queries means each query will be more complex and will retrieve more data. Yet we don’t want to write those queries by hand. Your database framework has to have the concept of expressions, joins and sub-selects to be able to off-load as much logic as possible into the SQL engine. If the number of queries on your page increases along with the number of records displayed on the page, then you have produced non-scalable code. If your framework endorses this practice, then your framework is not scalable.
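To make the difference concrete, here is a minimal sketch. It is written in Python with the standard sqlite3 module purely for brevity; the idea is the same in PHP. The author and post tables, the CountingConnection wrapper and both functions are hypothetical names made up for this illustration, not part of any framework.

```python
import sqlite3


def make_db():
    """Create a tiny in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE post (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO post VALUES (1, 1, 'First'), (2, 2, 'Second'), (3, 1, 'Third');
    """)
    return conn


class CountingConnection:
    """Wraps a connection and counts how many queries are executed."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def posts_one_query_per_row(db):
    """Non-scalable: one extra query for every post displayed."""
    rows = db.execute("SELECT author_id, title FROM post ORDER BY id").fetchall()
    result = []
    for author_id, title in rows:
        name = db.execute(
            "SELECT name FROM author WHERE id = ?", (author_id,)
        ).fetchone()[0]
        result.append((title, name))
    return result


def posts_joined(db):
    """Scalable: the join is off-loaded to the SQL engine, always one query."""
    return db.execute(
        "SELECT post.title, author.name FROM post "
        "JOIN author ON author.id = post.author_id ORDER BY post.id"
    ).fetchall()
```

Both functions return the same rows. The first issues one query plus one more per post, so its query count grows with the data, while the second stays at a single query no matter how many posts are shown.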

2. Selective Render

When you look at a page such as this one, you see many different areas with many different types of dynamic data. As is common with dynamic web pages, some areas of the screen will refresh themselves. There are two approaches to doing so:
  1. Build a separate JSON service for sending updates
  2. Let framework handle the refresh
Option (2) is very tempting, but average frameworks can’t perform web page refreshes gracefully, which is why many developers have to duplicate some code and build a separate mechanism for JSON polling. This introduces inconsistency and bugs in the user interface. Many developers resort ONLY to JSON polling after the page is loaded, increasing the number of dynamic requests, making the page unusable for search engines, building reliance on JavaScript and severely increasing total page load time.
For a scalable framework it’s an absolute necessity that a fragment of the page can be inexpensively re-rendered and reloaded with as little database traffic as possible and without additional coding from the developer.
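As a rough sketch of what that could look like, imagine a page made of named widgets, each of which knows how to load its own data. The Widget and Page classes and the render_fragment method below are hypothetical, written in Python only to keep the example short; they are not taken from any real framework.

```python
import html


class Widget:
    """A page area that loads its own data and renders itself."""

    def __init__(self, name, load_data):
        self.name = name
        self.load_data = load_data  # callable that hits the data source

    def render(self):
        items = "".join(
            f"<li>{html.escape(str(item))}</li>" for item in self.load_data()
        )
        return f'<ul id="{self.name}">{items}</ul>'


class Page:
    """Holds widgets by name so any one of them can be re-rendered alone."""

    def __init__(self, widgets):
        self.widgets = {w.name: w for w in widgets}

    def render(self):
        # Full page load: every widget loads its data and renders.
        return "\n".join(w.render() for w in self.widgets.values())

    def render_fragment(self, name):
        # Refresh: only the requested widget touches its data source.
        return self.widgets[name].render()
```

The same rendering code serves both the initial page load and the later refresh, so there is no duplicated JSON endpoint to keep in sync, and a refresh only costs the queries of the one widget being reloaded.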

3. Parallelization

Suppose you have two different API services behind your web frontend. One service is built to return JSON with a list of online users. The other service returns JSON with the recent post feed. Your frontend is written in your favorite PHP framework, which makes use of templates or some page logic to determine if any of those page areas (or both) need displaying.
JSON backends introduce some serious latency into your request, so it’s imperative that your frontend PHP framework executes both requests in parallel. Unfortunately, most PHP frameworks want the data pretty much right away, at the moment the page is rendered.
A scalable framework would be capable of identifying which widgets you need to display on your page and would allow both JSON requests to be started first, before collecting their output.
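The "start everything first, collect afterwards" idea can be sketched as follows. This is a Python illustration using the standard concurrent.futures module, not PHP. The two fetch functions are hypothetical stand-ins that simulate a slow JSON backend with a short sleep.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_online_users():
    """Hypothetical stand-in for the online-users JSON service."""
    time.sleep(0.1)  # simulated network latency
    return ["ann", "bob"]


def fetch_recent_posts():
    """Hypothetical stand-in for the recent-posts JSON service."""
    time.sleep(0.1)  # simulated network latency
    return ["Hello world"]


def load_widgets(fetchers):
    """Start every request first, then collect the results by widget name."""
    with ThreadPoolExecutor(max_workers=max(1, len(fetchers))) as pool:
        # Phase 1: submit all requests so they run concurrently.
        futures = {name: pool.submit(fn) for name, fn in fetchers.items()}
        # Phase 2: wait for each result only after all have been started.
        return {name: future.result() for name, future in futures.items()}
```

Called as load_widgets({"users": fetch_online_users, "posts": fetch_recent_posts}), the total wait is roughly that of the slowest request rather than the sum of both. The framework first has to know which widgets will be displayed, which is exactly the capability described above.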

4. Overheads

Minimization of overheads is often the ONLY practice frameworks concern themselves with. This is where modularity is important. Let’s say developer X has built a plug-in for WordPress. He decided to use a super-fast framework and thoroughly benchmarked his add-on. Next comes developer Y, who has also built a plug-in for WordPress with a different but equally fast framework. WordPress now has to initialize two frameworks, introducing huge overheads.
You might have seen this in practice with a different set of frameworks, such as Zend and CodeIgniter. And then you integrate the Facebook API, which does not rely on any framework and makes you chew through thousands of lines of PHP code which you don’t need anyway. And you also use an ORM of your choice which has no dependencies yet re-implements tons of code.
A scalable framework should provide a unified set of modules and practices which are used by all of the plug-ins and add-ons to avoid any code duplication. In practice, decoupled frameworks are very bad for scalability. How many frameworks provide add-on developers with a consistent API for building user interfaces?

5. Caching

After making every mistake in the book, frameworks implement a caching strategy and brand themselves as “High Performance”. Caching is not a cure and will introduce complications and limitations. A framework must be fast and scalable without any caching. Yet caching is important too. There are two types of caches: data cache and output cache.
The data cache helps the developer skip some database queries. If your implementation requires you to traverse models on each row (such as to determine the URL for user thumbnails), then adding a memcache seems like a good-enough solution to the problem. This does help scalability a little, but it essentially displays outdated data and can lead to many nasty problems.
Output caching helps you cache either parts of the page or the whole page. Many MVC frameworks don’t have sufficient support for implementing output caching, because they are used to “echo”-ing things out. As the developer, it becomes your burden to collect output into a buffer and build the logic for substitution and for keeping the cache up to date.
A scalable framework should handle both tasks flexibly and consistently. It should allow you to exchange your expensive data source (SQL, JSON) for a faster data source such as Memcache gracefully and transparently.
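A transparent swap of the data source might look something like the sketch below. It is Python for brevity, and every name in it (SlowUserSource, CachedSource, thumbnail_url) is hypothetical. An in-process dictionary stands in for Memcache so the example stays self-contained.

```python
import time


class SlowUserSource:
    """Stands in for an expensive SQL or JSON backend."""

    def __init__(self):
        self.calls = 0

    def get(self, key):
        self.calls += 1
        return f"user-{key}"


class CachedSource:
    """Same get() interface; repeat reads are served from memory until ttl expires."""

    def __init__(self, source, ttl_seconds, clock=time.monotonic):
        self.source = source
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh, skip the backend
        value = self.source.get(key)
        self._entries[key] = (now + self.ttl, value)
        return value


def thumbnail_url(source, user_id):
    """Calling code does not know or care which kind of source it was given."""
    return f"/thumbs/{source.get(user_id)}.png"
```

Because both sources expose the same get() method, thumbnail_url works unchanged with either one. The ttl makes the trade-off explicit: until an entry expires, the page may show outdated data.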
The framework should be equally good at letting you substitute whole areas of the page with static alternatives without losing any dynamic functionality on your page and without requiring you to go under the hood of page routing.

Conclusion

Many PHP frameworks suck at properly understanding scalability principles. Look at the Silex framework and imagine that your project has 200 pages. On every page request, it will execute up to 200 pattern-matching checks which are based on regular expressions. How is that scalable? Possibly yet another cache would help?
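To put a number on the routing cost, the sketch below counts pattern checks in a naive matcher that tries each of 200 route patterns in turn, and compares it with a lookup keyed on the first path segment. It is a Python illustration with hypothetical routes (/page1/<id> to /page200/<id>), with the patterns compiled as regular expressions as an assumption. It is not how Silex or any other framework is actually implemented.

```python
import re


def build_routes(n):
    """Hypothetical routes /page1/<id> ... /pageN/<id>, one regex each."""
    return [
        (re.compile(rf"^/page{i}/(\d+)$"), f"page{i}") for i in range(1, n + 1)
    ]


def match_linear(routes, path):
    """Try every pattern in order; return (route name, checks performed)."""
    checks = 0
    for pattern, name in routes:
        checks += 1
        if pattern.match(path):
            return name, checks
    return None, checks


def build_index(n):
    """Same routes, keyed by their static first path segment."""
    return {
        f"page{i}": (re.compile(r"^(\d+)$"), f"page{i}") for i in range(1, n + 1)
    }


def match_indexed(index, path):
    """One dictionary lookup, then at most one pattern check."""
    parts = path.strip("/").split("/", 1)
    entry = index.get(parts[0])
    if entry is None or len(parts) < 2:
        return None, 0
    pattern, name = entry
    return (name if pattern.match(parts[1]) else None), 1
```

For the last of 200 routes the linear matcher performs 200 pattern checks, while the indexed one performs a single check regardless of how many pages the project has.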
I am saddened by the state in which PHP frameworks are today and I believe there are many things which can be improved. I also believe that developers can be educated to have a good judgement and understanding of scalability.
There is huge potential for the next generation of PHP frameworks and I strongly encourage you to write and talk about it.
So should someone ask you “what PHP framework is the fastest?”, tell them that they are all slow, but some of them are “scalable”.
