
Wednesday, February 27, 2013

Wvdial - Wikipedia, the free encyclopedia

Wvdial - Wikipedia, the free encyclopedia:


WvDial (pronounced 'weave-dial'[1]) is a utility, included in several major Linux distributions,[2] that makes modem-based connections to the Internet. WvDial is a Point-to-Point Protocol dialer: it dials a modem and starts pppd in order to connect to the Internet.
When WvDial starts, it first loads its configuration from /etc/wvdial.conf and ~/.wvdialrc, which contain basic information about the modem port, speed, and init string, along with information about your ISP, such as the phone number, your user name, and your password.[3]
Then it initializes the modem (by sending it strings), dials the server, and waits for a connection (a CONNECT string from the modem). Any time after connecting, WvDial starts PPP if it sees a PPP sequence from the server or, alternatively, tries to start PPP itself. If all of this fails, WvDial just runs pppd and hopes for the best.[3]
The connection started with WvDial can be dropped by switching back to the terminal from where it was started and pressing Ctrl-C.
WvDial uses the wvstreams library.
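For illustration only, a minimal /etc/wvdial.conf along the lines described above might look like this (the port, number and credentials are placeholders, not taken from the article):

[Dialer Defaults]
Modem = /dev/ttyS0
Baud = 115200
Init1 = ATZ
Init2 = ATQ0 V1 E1 S0=0
Phone = 5551234
Username = myisplogin
Password = mypassword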

Tuesday, February 26, 2013

11.10 - How to remove all associated files and configuration settings of an app installed through 'force architecture' command - Ask Ubuntu

11.10 - How to remove all associated files and configuration settings of an app installed through 'force architecture' command - Ask Ubuntu:


1- Go to Synaptic Package Manager and remove [CrossPlatformUI] and all its dependencies. Some packages may still remain; no problem.
2- As mentioned above, run:
gksudo gedit /var/lib/dpkg/status
in terminal.
3- Search for [Package: crossplatformui]. You will find four or five lines listing file paths. (I cannot quote them exactly, because mine are already removed.) Either way, follow each of those paths to find the files.
4- When you find the files, right-click each one and choose 'Open as administrator'. When the file is open, remove its contents and save it.
5- Do this for all of the files. Now go back to Synaptic Package Manager and mark the packages that remain for complete removal. Synaptic Package Manager will remove them.
6- Finally, go back to the terminal and run again:
gksudo gedit /var/lib/dpkg/status
7- Search for [Package: crossplatformui] again and delete all of its entries.
8- You are done. You can check in the terminal by running:
sudo apt-get remove CrossPlatformUI
9- As soon as you connect to the Internet, run:
sudo apt-get update 
If everything was removed correctly, there should be no errors.
I hope this helps, and sorry if I have forgotten something. I am new to Linux. Good luck.

Monday, February 25, 2013

Easily Turn Your Ubuntu into a Virtual Router | Zhangyou's Blog

Easily Turn Your Ubuntu into a Virtual Router | Zhangyou's Blog:


Do you have a PSP or a BlackBerry that supports Wi-Fi but cannot join an ad-hoc wireless network? I own both of these devices and have been struggling to get them onto my laptop's Internet connection. With the default network manager (nm-applet), you can only create an ad-hoc wireless network share, which these devices cannot see. After some Googling, I finally found a solution: hostapd.

In Ubuntu 10.04, it is extremely easy to get hostapd working. First, of course, you need to install it. It is already in the official repositories, so just run this in the terminal:
sudo apt-get install hostapd
Then, open a text editor program, for example gedit. Copy the following into it.

interface=wlan0
driver=nl80211
ssid=
channel=1
hw_mode=g
auth_algs=1
wpa=3
wpa_passphrase=
wpa_key_mgmt=WPA-PSK
wpa_pairwise=TKIP CCMP
rsn_pairwise=CCMP


Please don’t forget to fill in the name of your network after “ssid=”, as well as the password after “wpa_passphrase=”.

After all these, save the file as hostapd.conf in your home folder.

Now, in your terminal:

sudo hostapd hostapd.conf

Turn on the Wi-Fi connection on your devices and enjoy the fast network share!

Friday, February 15, 2013

Agile Toolkit

Agile Toolkit:


This question is often asked but never answered properly. So how do you measure framework speed? Let me also explain why "scalability" is more important than general "performance".
The primary goal of frameworks and libraries is to make certain things simpler and easier for developers. Frameworks can also be categorized by what it is they're simplifying:
  • MVC frameworks help you organize a flexible page structure
  • Database frameworks (also called ORMs) help you organize a flexible database access interface
  • User Interface frameworks will help you make your user interface consistent
  • Security frameworks will help you introduce security practices in your software.
Frameworks do that by introducing assumptions, generalizing, and restricting the developer to certain practices that the framework authors approve of.
It's fairly obvious to anyone that adding thousands of lines of code to your "Hello World" application won't make it execute faster. The usual argument from framework authors is that they make the development process faster and smoother on larger projects. This often has nothing to do with the speed of code execution: "Developer time is more expensive than CPU time".
What's worse is that some frameworks impose standards which slow your code down. An example is a basic Active Record implementation, which brings your powerful SQL engine down to the level of SimpleDB to match the lowest common feature denominator across multiple database vendors. This, along with the general overhead of the framework, greatly contributes to the "slowness" of your project.
So how can a framework contribute to the performance of your project?

1. Make Fewer SQL Queries

When your framework sends a request to the database and waits for the result, it introduces latency. The PHP code waits for the query to be parsed and executed by the SQL engine and for the results to be returned. Because SQL is still the most popular storage layer today, your job is to make as few queries as possible. If you can build your page with 10 SQL queries instead of 50, that will speed things up a lot.
Having fewer queries means each query will be more complex and will retrieve more data. Yet we don't want to write those queries by hand. Your database framework has to have a concept of expressions, joins and sub-selects so that it can off-load as much logic as possible into the SQL engine. If the number of queries on your page grows along with the number of records displayed on the page, then you have produced non-scalable code. If your framework endorses this practice, then your framework is not scalable.
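To make the difference concrete, here is a rough sketch (hypothetical table and column names, not taken from the article) of the non-scalable N+1 pattern versus a single join that off-loads the work to the SQL engine:

<?php
// Hypothetical sketch: N+1 queries vs. one join (table/column names are made up).
$pdo = new PDO('sqlite:example.db');   // assumes a database with orders and users tables

// Non-scalable: one extra query per displayed record, so the query count grows with the page.
$orders = $pdo->query('SELECT id, user_id, total FROM orders LIMIT 50')->fetchAll(PDO::FETCH_ASSOC);
foreach ($orders as $i => $order) {
    $stmt = $pdo->prepare('SELECT name FROM users WHERE id = ?');
    $stmt->execute(array($order['user_id']));
    $orders[$i]['user_name'] = $stmt->fetchColumn();   // 50 extra round trips
}

// Scalable: the query count stays constant no matter how many rows are shown.
$rows = $pdo->query(
    'SELECT o.id, o.total, u.name AS user_name
       FROM orders o JOIN users u ON u.id = o.user_id
      LIMIT 50'
)->fetchAll(PDO::FETCH_ASSOC);
?>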

2. Selective render

When you look at a page such as this one, you see many different areas with many different types of dynamic data. As happens with dynamic web pages, some areas of the screen will refresh themselves. There are two approaches to doing so:
  1. Build a separate JSON service for sending updates
  2. Let framework handle the refresh
Option (2) is very tempting, but average frameworks can't refresh parts of a page gracefully, which is why many developers end up duplicating code and building a separate mechanism for JSON polling. This introduces inconsistency and bugs in the user interface. Many developers resort to loading dynamic data ONLY through JSON polling after the page has loaded, increasing the number of dynamic requests, making the page unusable for search engines, building a reliance on JavaScript and severely slowing down the total page load.
For a scalable framework, it's an absolute necessity that a fragment of the page can be inexpensively re-rendered and reloaded with as little database traffic as possible and without additional coding from the developer.

3. Parallelization

Suppose you have two different API services behind your web frontend. One service returns JSON with a list of online users. The other returns JSON with the recent post feed. Your frontend is written in your favorite PHP framework, which uses templates or some page logic to determine whether either of those page areas (or both) needs displaying.
The JSON backends introduce serious latency into your request, so it's imperative that your frontend PHP framework execute both requests in parallel. Unfortunately, most PHP frameworks want the data pretty much right away, when the page is rendered.
A scalable framework would be capable of identifying which widgets need to be displayed on your page and would issue both JSON requests first, before collecting their output.
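One low-level way to issue two backend JSON requests concurrently from PHP is curl_multi. This is only an illustrative sketch (the URLs are placeholders), not something any particular framework provides:

<?php
// Illustrative sketch: fetch two JSON services in parallel with curl_multi.
$urls = array(
    'online' => 'http://api.example.com/online-users',   // placeholder URL
    'posts'  => 'http://api.example.com/recent-posts',   // placeholder URL
);

$mh = curl_multi_init();
$handles = array();
foreach ($urls as $key => $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$key] = $ch;
}

// Issue both requests first, then collect the output.
do {
    $status = curl_multi_exec($mh, $running);
    if ($running) {
        curl_multi_select($mh);   // wait for activity instead of busy-looping
    }
} while ($running && $status == CURLM_OK);

$results = array();
foreach ($handles as $key => $ch) {
    $results[$key] = json_decode(curl_multi_getcontent($ch), true);
    curl_multi_remove_handle($mh, $ch);
}
curl_multi_close($mh);
?>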

4. Overheads

Minimizing overhead is often the ONLY practice frameworks concern themselves with. This is where modularity is important. Let's say developer X has built a plug-in for WordPress. He decided to use a super-fast framework and thoroughly benchmarked his add-on. Next comes developer Y, who has also built a plug-in for WordPress with a different but equally fast framework. WordPress now has to initialize two frameworks, introducing huge overhead.
You might have seen this in practice with a different pair of frameworks, such as Zend and CodeIgniter. Then you integrate the Facebook API, which does not rely on any framework and makes you chew through thousands of lines of PHP code you don't need anyway. And you also use the ORM of your choice, which has no dependencies yet re-implements tons of code.
A scalable framework should provide a unified set of modules and practices which are used by all of the plug-ins and add-ons to avoid code duplication. In practice, decoupled frameworks are very bad for scalability. How many frameworks provide an add-on developer with a consistent API for building a user interface?

5. Caching

After making every possible mistake in the book, frameworks then implement a caching strategy and brand themselves as "High Performance". Caching is not a cure, and it introduces complications and limitations. A framework must be fast and scalable without any caching. Yet caching is important too. There are two types of cache: the data cache and the output cache.
The data cache helps the developer skip some database queries. If your implementation requires you to traverse models for each row (for example, to determine the URL of a user's thumbnail), then adding memcached seems like a good-enough solution to the problem. This does help scalability a little, but it essentially displays outdated data and can lead to many nasty problems.
Output caching lets you cache either parts of the page or the whole page. Many MVC frameworks don't have sufficient support for output caching, because they are used to simply "echo" things out. As the developer, it becomes your burden to collect the output into a buffer and build the logic for substituting it and keeping the cache up to date.
A scalable framework should handle both tasks flexibly and consistently. It should allow you to exchange your expensive data source (SQL, JSON) for a faster data source such as memcached gracefully and transparently.
The framework should be equally good at letting you substitute whole areas of the page with static alternatives, without losing functionality, without requiring you to go under the hood of the page routing, and without losing any dynamic functionality on your page.
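As a sketch of the data-cache idea described above (the cache key and query are invented for illustration, and memcached stands in for whatever fast store you use):

<?php
// Illustrative sketch: wrap an expensive query in a data cache (memcached).
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

function user_thumbnail_urls(PDO $pdo, Memcached $mc) {
    $key  = 'user_thumbnail_urls';                   // hypothetical cache key
    $urls = $mc->get($key);
    if ($urls !== false) {
        return $urls;                                // fast path: no SQL at all
    }
    $urls = $pdo->query('SELECT id, thumbnail_url FROM users')
                ->fetchAll(PDO::FETCH_KEY_PAIR);
    $mc->set($key, $urls, 300);                      // cached copy may go stale
    return $urls;
}
?>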

Conclusion

Many PHP frameworks suck at properly applying scalability principles. Look at the Silex framework and imagine that your project has 200 pages. On every page request, it will execute 200 pattern-matching checks which are NOT based on regular expressions. How is that scalable? Perhaps yet another layer of caching would help?
I am saddened by the state PHP frameworks are in today, and I believe there are many things which can be improved. I also believe that developers can be educated to have good judgement and an understanding of scalability.
There is huge potential for the next generation of PHP frameworks, and I strongly encourage you to write and talk about it.
So should someone ask you "which PHP framework is the fastest", tell them that they are all slow, but some of them are "scalable".

The no-framework PHP MVC framework | Rasmus' Toys Page

The no-framework PHP MVC framework | Rasmus' Toys Page:


Since a lot of people seem to be misunderstanding this article: it isn't about OOP vs. procedural programming styles. I happen to lean more towards procedural, but could easily have gone more OOP. I simplified the code a bit for brevity, but have added a light OO layer back in the model now. Not that it makes a difference. What I was hoping to get across here is a simple example of how you can use PHP as-is, without additional complex external layers, to apply an MVC approach with clean and simple views and still have all the goodness of fancy Web 2.0 features. If you think I am out to personally offend you and your favourite framework, then you have the wrong idea. I just happen to find most of them too complex for my needs, and this is a proposed alternative. If you have found a framework that works for you, great.



So you want to build the next fancy Web 2.0 site? You'll need some gear. Most likely in the form of a big complex MVC framework with plenty of layers that abstracts away your database, your HTML, your Javascript and in the end your application itself. If it is a really good framework it will provide a dozen things you'll never need.
I am obviously not a fan of such frameworks. I like stuff I can understand in an instant. Both because it lets me be productive right away and because 6 months from now, when I come back to fix something, it will again take me only an instant to figure out what is going on. So, here is my current approach to building rich web applications; the main pieces are described below.

MVC?

I don't have much of a problem with MVC itself. It's the framework baggage that usually comes along with it that I avoid. Parts of frameworks can be useful as long as you can separate the parts out that you need. As for MVC, if you use it carefully, it can be useful in a web application. Just make sure you avoid the temptation of creating a single monolithic controller. A web application by its very nature is a series of small discrete requests. If you send all of your requests through a single controller on a single machine you have just defeated this very important architecture. Discreteness gives you scalability and modularity. You can break large problems up into a series of very small and modular solutions and you can deploy these across as many servers as you like. You need to tie them together to some extent most likely through some backend datastore, but keep them as separate as possible. This means you want your views and controllers very close to each other and you want to keep your controllers as small as possible.

Goals for this approach

  1. Clean and simple design
    • HTML should look like HTML
    • Keep the PHP code in the views extremely simple: function calls, simple loops and variable substitutions should be all you need
  2. Secure
    • Input validation using pecl/filter as a data firewall
    • When possible, avoid layers and other complexities to make code easier to audit
  3. Fast
    • Avoid include_once and require_once
    • Use APC and apc_store/apc_fetch for caching data that rarely changes
    • Stay with procedural style unless something is truly an object
    • Avoid locks at all costs

Example Application

Here is the example application I will be describing.

It is a form entry page with a bit of Javascript magic along with an sqlite backend. Click around a bit. Try to add an entry, then modify it. You will see the server->client JSON traffic displayed at the bottom for debug purposes.

The Code

This is the code layout. It uses AJAX (with JSON instead of XML over the wire) for data validation. It also uses a couple of components from the Yahoo! user interface library and PHP's PDO mechanism in the model.

 
The presentation layer is above the line and the business logic below. In this simple example I have just one view, represented by the add.html file. It is actually called add.php on the live server, but I was too lazy to update the diagram and it really doesn't matter. The controller for that view is called add_c.inc. I tend to name files that the user loads directly as something.html or something.php and included files as something.inc. The rest of the files in the presentation layer are common files that all views in my application would share.

ui.inc has the common user interface components, common.js contains Javascript helper functions that mostly call into the presentation platform libraries, and styles.css provides the stylesheet.

A common db.inc file implements the model. I tend to use separate include files for each table in my database. In this case there is just a single table called "items", so I have a single items.inc file.

Input Filtering

You will notice a distinct lack of input filtering, yet if you try to inject any sort of XSS it won't work. This is because I am using the pecl/filter extension to automagically sanitize all user data for me.
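The article relies on the filter extension's automatic sanitization (configured in php.ini rather than in code); as a rough sketch of the same idea used explicitly, you could write something like this:

<?php
// Sketch: explicit use of the filter extension (the article instead configures
// the extension to sanitize the superglobals automatically).
$name = filter_input(INPUT_POST, 'name', FILTER_SANITIZE_SPECIAL_CHARS);
$id   = filter_input(INPUT_POST, 'id', FILTER_VALIDATE_INT);
if ($id === false || $id === null) {
    // missing or non-numeric id: reject the request
}
?>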

View - add.html

Let's start with the View in add.html:

The main thing to note here is that the majority of this file is very basic HTML. No styles or Javascript, and no complicated PHP. It contains only simple presentation-level PHP logic. A modulus operation toggles the colours for the rows of items, and a loop around a heredoc (<<<) block performs variable substitutions. head() and foot() function calls add the common template headers and footers.
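The original listing isn't reproduced in this excerpt; a view along those lines might look roughly like the following sketch (the variable, column and include names are assumptions, not Rasmus's code):

<?php include 'add_c.inc'; include 'ui.inc'; head(); ?>
<table>
<?php
$i = 0;
foreach ($item_list as $item) {            // $item_list assumed to be set by the controller
    $row = ($i++ % 2) ? 'odd' : 'even';    // modulus toggles the row colour
    echo <<<ROW
  <tr class="$row"><td>{$item['name']}</td><td>{$item['price']}</td></tr>
ROW;
}
?>
</table>
<?php foot(); ?>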

If you wanted to make it even cleaner you could use an auto_prepend_file configuration setting which tells PHP to always include a certain file at the top of your script. Then you could take out the include calls and the initial head() function call. I tend to prefer less magic and to control my template dependencies right in my templates with a very clean and simple include structure. Try to avoid using include_once and require_once if possible. You are much better off using a straight include or require call, because the *_once() calls are very slow under an opcode cache. Sometimes there is no way around using these calls, but recognize that each one costs you an extra open() syscall and hash lookup.

ui.inc

Here is the UI helper code from ui.inc:

This file just contains the head() and foot() functions that contain mostly plain HTML. I tend to drop out of PHP mode if I have big blocks of HTML with minimal variable substitutions. You could also use heredoc blocks here, as we saw in add.html.
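Again, the listing isn't shown in this excerpt; a hedged sketch of head() and foot() helpers that drop out of PHP mode might look like this (the markup is invented):

<?php
function head($title = 'Items') { ?>
<html>
 <head>
  <title><?php echo $title ?></title>
  <link rel="stylesheet" href="styles.css" />
  <script src="common.js"></script>
 </head>
 <body>
<?php }

function foot() { ?>
 </body>
</html>
<?php } ?>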

Controller - add_c.inc

Our Controller is in add_c.inc:

Our controller is going to manipulate the model, so it first includes the model files. The controller then determines whether the request is a POST request, which means there is a backend request to deal with. (You could do further checks to allow an empty POST to work like a GET, but I am trying to keep the example simple.) The controller also sets the Content-Type to application/json before sending back JSON data, although this MIME type is not yet official, so you might want to use application/x-json instead. As far as the browser is concerned, it doesn't care either way.

The controller then performs the appropriate action in the model according to the specified command. A load_item, for example, ends up calling the load() method in the data model for the items table and sends back a JSON-encoded response to the browser.
The important piece here is that the controller is specific to a particular view. In some cases you may have a controller that can handle multiple very similar views. Often the controller for a view is only a couple of lines and can easily be placed directly at the top of the view file itself.
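The actual add_c.inc isn't reproduced here, but a controller with that structure might look like this sketch (the command names and request parameters are assumptions):

<?php
// Sketch of a view-specific controller (add_c.inc): include the model, handle
// POST backend requests as JSON, otherwise fall through to the view.
require 'db.inc';
require 'items.inc';

$items = new items();

if ($_SERVER['REQUEST_METHOD'] == 'POST') {
    header('Content-Type: application/json');
    switch ($_POST['command']) {                     // hypothetical command field
        case 'load_item':
            echo json_encode($items->load($_POST['id']));
            break;
        case 'insert_item':
            echo json_encode($items->insert($_POST['name'], $_POST['price']));
            break;
    }
    exit;
}
// Otherwise this was a plain GET: prepare $item_list here for the view to loop over.
?>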

common.js

Next I need to catch these JSON replies, which I do in common.js:

The postForm() and postData() functions demonstrate the genius of the Yahoo user interface libraries: they provide us with single-line functions to do our backend requests. The fN function in the callback object does the bulk of the work, taking the JSON replies generated by our controller and manipulating the DOM in the browser in some way. There are also fade() and unfade() functions that are called on status messages, and on validate errors to produce flashing red field effects.

Note the bottom half of this file where fancyItems() and fancyForm() implement all the client-side magic to animate the forms by attaching handlers to various events. Often you will see server-side business logic nicely separated from the templates, but then there are big blocks of complicated client-side Javascript mixed into the template which in my opinion defeats the clean separation goal. By going through and attaching appropriate mouseover, mouseout, focus, blur and click handlers after the fact I can keep my templates extremely clean and still get a very dynamic experience. Here I am using the event library from the Yahoo! user interface libraries to add the handlers.

Model - db.inc

Now for the model. First the generic db.inc which applies to all our model components:

I am using sqlite via PDO for this example, so the connect() function is quite simple. The example also uses a fatal error function that provides a helpful backtrace for any fatal database error. The backtrace includes all the arguments passed to the functions along the trace.

The load_list() function uses an interesting trick: it uses APC's apc_fetch() function to fetch an array containing the list of item categories. If the list isn't in shared memory, I read the file from disk and generate the array. I have made it generic by using a variable variable. If you call it with load_list('categories'), it automatically loads categories.txt from the disk and creates a global array called $categories.
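As a rough sketch of those two helpers (the database path and list format are assumptions, not the original code):

<?php
// Sketch of db.inc: a PDO/sqlite connection helper and an APC-backed list loader.
function connect() {
    return new PDO('sqlite:./data.db');              // placeholder database path
}

function load_list($name) {
    $list = apc_fetch($name, $hit);
    if (!$hit) {
        // Not in shared memory yet: read e.g. categories.txt and cache the array.
        $list = file("{$name}.txt", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
        apc_store($name, $list);
    }
    $GLOBALS[$name] = $list;                         // creates e.g. a global $categories
}
?>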

Model - items.inc

Finally, I have the model code for the items table, items.inc:

At the top of each model file, I like to use a comment to record the schema of any associated tables. I then provide a simple class with a couple of methods to manipulate the table: in this case, insert(), modify() and load(). Each function checks the database handle property to avoid reconnecting in case I have multiple calls on each request. You could also handle this directly in your connect() method.

To avoid an extra time syscall, I use $_SERVER["REQUEST_TIME"] to retrieve the request time. I am also using PDO's named parameters mechanism, which is cleaner than trying to use question mark placeholders.
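A sketch of a model class with that shape (the schema and fields are invented for illustration, and connect() is the helper sketched above for db.inc):

<?php
# Table: items (id INTEGER PRIMARY KEY, name TEXT, price REAL, created INTEGER)
class items {
    public $dbh = false;

    private function db() {
        if (!$this->dbh) { $this->dbh = connect(); }  // reuse the handle across calls
        return $this->dbh;
    }

    function insert($name, $price) {
        $stmt = $this->db()->prepare(
            'INSERT INTO items (name, price, created) VALUES (:name, :price, :created)');
        return $stmt->execute(array(
            ':name'    => $name,
            ':price'   => $price,
            ':created' => $_SERVER['REQUEST_TIME'],   // avoids an extra time() syscall
        ));
    }

    function load($id) {
        $stmt = $this->db()->prepare('SELECT * FROM items WHERE id = :id');
        $stmt->execute(array(':id' => $id));
        return $stmt->fetch(PDO::FETCH_ASSOC);
    }
}
?>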

Conclusion

Clean separation of your views, controller logic and backend model logic is easy to do with PHP. Using these ideas, you should be able to build a clean framework aimed specifically at your requirements instead of trying to refactor a much larger and more complex external framework.

Many frameworks may look very appealing at first glance because they seem to reduce web application development to a couple of trivial steps leading to some code generation and often automatic schema detection, but these same shortcuts are likely to be your bottlenecks as well, since they achieve this simplicity by sacrificing flexibility and performance. Nothing is going to build your application for you, no matter what it promises. You are going to have to build it yourself. Instead of starting by fixing the mistakes in some foreign framework and refactoring all the things that don't apply to your environment, spend your time building a lean and reusable pattern that fits your requirements directly. In the end I think you will find that your homegrown small framework has saved you time and aggravation and that you end up with a better product.

Friday, February 8, 2013

What causes svn error 413 Request Entity Too Large? - Stack Overflow

What causes svn error 413 Request Entity Too Large? - Stack Overflow: "LimitXMLRequestBody "


Try to add the following configuration directives to your Apache configuration file:
LimitXMLRequestBody 0
LimitRequestBody 0
IF THE ABOVE DOESN'T WORK, THEN TRY THIS:
Upon examining the 'mod_security.conf' configuration file, I discovered that the value 'SecRequestBodyInMemoryLimit' was indeed set to the default 131072 bytes (128 KB). I commented out this line and the problem disappeared! I suspect it will now allow file sizes up to the 1GB hard limit. 
There is another parameter, 'SecResponseBodyLimit', which may also need adjusting. This is set to 524,288 bytes (512 KB).
IF YOU ARE USING NGINX, THEN USE THIS:
Add ‘client_max_body_size xxM’ inside the server section, where xx is the size (in megabytes) that you want to allow.

Thursday, February 7, 2013

How to tune MySQL’s sort_buffer_size at Xaprb

How to tune MySQL’s sort_buffer_size at Xaprb:


I perpetually see something like the following:
My server load is high and my queries are slow and my server crashes. Can you help me tune my server? Here is some information.
[random sample of SHOW GLOBAL STATUS, like the query cache counters]
my.cnf:
[mysqld]
key_buffer_size=1500M
query_cache_size= 64M
max_connections = 256
key_buffer = 8M
sort_buffer_size = 100M
read_buffer_size = 8M
delay_key_write = ALL
There are many problems in this my.cnf file, but the sort_buffer_size is a glaring one that identifies the user as someone who should not be playing with live ammunition. Therefore, I have developed an advanced process for tuning sort_buffer_size, which you can follow to get amazing performance improvements. It’s magical.
  1. How expert are you?
    • I know that there is a sort buffer, and that it is related to sort_merge_passes. When sort_merge_passes is high, I have been told to increase the sort_buffer_size. I also know that it is somehow related to the number of sorts the server does, so when there are a lot of sorts shown in variables like Sort_rows and Sort_scan, I think I should also increase it. You are a beginner.
    • I have been administering MySQL for many years. I know that there are two sort algorithms inside MySQL. I know exactly how to optimize the key cache hit ratio. You are a novice.
    • I have read every blog post Peter Zaitsev ever wrote, and I can improve on them all. You are an expert.
  2. Based on your score on the scale above, find your optimal sort_buffer_size tuning algorithm below:
    • Beginners and novices should leave this setting at its default, and comment it out of the configuration file.
    • Experts don’t need me to tell them what to do, but most of them will leave this setting at its default, and comment it out of the configuration file.
The most amazing thing about sort_buffer_size is how many people utterly ruin their server performance and stability with it, but insist that they know it's vital to change it instead of leaving it at its default. I do not know why this is always the case. Why don't people choose random variables to destroy their performance? It's not as though there is a shortage to choose from. Why does everyone always pick sort_buffer_size instead of something else? It's like a flame drawing the moths in.
Feel free to ask questions if anything is unclear, but be prepared for a direct answer if you ask for tuning advice.
PS: I considered a simpler tuning guide, such as Domas's guide to tuning the query cache, but I am convinced that people need a more complex guide for sort_buffer_size, or they will not believe in the validity of the instructions. I base this on multiple experiences of being paid a lot of money to suggest not setting sort_buffer_size to 256M, and being told that I must be an idiot.

More on understanding sort_buffer_size « MySQL Expert | MySQL Performance | MySQL Consulting

More on understanding sort_buffer_size « MySQL Expert | MySQL Performance | MySQL Consulting:


More on understanding sort_buffer_size

There have been a few posts by Sheeri and Baron today on the MySQL sort_buffer_size variable. I wanted to add some more information about this buffer, what is impacted when it is changed, and what to do about it.
The first thing you need to know is that sort_buffer_size is a per-session buffer; that is, this memory is assigned per connection/thread. I've seen clients set it assuming it's a global buffer. Don't Assume – Per Session Buffers.
Second, at the OS level and independently of MySQL, there is a threshold above 256K. From Monty Taylor: "if buffer is set to over 256K, it uses mmap() instead of malloc() for memory allocation. Actually – this is a libc malloc thing and is tunable, but defaults to 256k. From the manual:". He goes on in a further comment to show that the impact of going over 256K for a buffer is a 37x slowdown. This applies to all per-session buffers, not just the sort buffer. I have heard recently about this limit being 512K, but I wasn't able to pin down the specific speaker to see whether this was a newer library, kernel or OS.
With the sort_buffer_size we are lucky as far as MySQL instrumentation goes: there is the Sort_merge_passes status variable. While it's not perfect, it does indicate whether the size of the buffer is insufficient. However, even if you use a sort_buffer_size of, say, 256K and you see Sort_merge_passes increasing slowly, that does not mean you have to increase the buffer.
So, all this still does not tell you how to tune the buffer. Unfortunately, with MySQL there is no easy answer. You do need to monitor mysqld's memory usage overall, especially if you are using persistent connections. A connection/thread will not release the memory assigned to it until it is closed, so it's important to monitor for memory creep of the PGA, knowing what your initial SGA is. Morgan Tocker wrote a patch in Bug #33540 to create a RESET CONNECTION type of command.
You do need to look at memory as a bottleneck. You need to learn how MySQL uses memory, not just the sort_buffer_size. Many years ago I actually started writing global/session variables to indicate when buffers were used, and by how much; I started with sort_buffer_size, which is buried down in some very old filesort code. When I sought the input of an expert C coder on this, they wondered how the code, especially one loop handler, even worked at all.
Nobody knows what the optimal setting is, and that's the problem. In certain areas, especially memory usage, the MySQL instrumentation is simply non-existent, and I'd like to see that fixed.
In conclusion, if I ever see a sort_buffer_size above 256K, e.g. 1M or 2M, I always reset it to 256K. My reasoning is simple: until you have evidence that, in your specific environment, increasing the buffer makes performance better, it's better to use a smaller value. There are bigger wins, like avoiding sorting altogether, better design, or, even better, simplifying or eliminating the SQL.

How fast can you sort data with MySQL ? - MySQL Performance Blog

How fast can you sort data with MySQL ? - MySQL Performance Blog: "set sort_buffer_size=100000;"


I took the same table I used for the MySQL Group by Performance Tests to see how fast MySQL can sort 1,000,000 rows, or rather return the top 10 rows from the sorted result set, which is the most typical way sorting is used in practice.
A full table scan of the table completes in 0.22 seconds, giving us about 4.5 million rows/sec. Obviously we can't get a sorted result set faster than that.
I placed the temporary sort files on tmpfs (/dev/shm) to take disk IO out of the equation, as my data set fits in memory anyway, and decided to experiment with the sort_buffer_size variable.
The minimum value for sort_buffer_size is 32K, which gives us the following speed:
mysql> select * from gt order by i desc limit 10;
+--------+------------------------------------------+
| i      | c                                        | 
+--------+------------------------------------------+
| 100000 | 635e8e8f8e3b9dc547bbd3deaadb1f297f691729 |
| 100000 | 0a7750a1393e77a2871ecfb39d5032d0b0f7c37c |
| 100000 | 0db0601036fb9d1d5e17631d4d1bed9149675bb3 | 
| 100000 | eb6d2b5ed1897bdd0ff6e22ee1b44814ffb8f912 |
| 100000 | 1bff67cc134e316dad5370de38020bef818ec45c |
|  99999 | 635da2e73d88dbe5f7297253680398e58d32ff65 |
|  99999 | a1feec5f8ee6c6a96723a2a0b57c418bb3ced929 | 
|  99999 | 72b934f76863791f740b96858d5acb6a60459644 |
|  99999 | 855b47aaa25054e77dcc27de5def8de1e265f371 |
|  99999 | 81980bcd9dbaa565f22a93ce1faf9e9d53407f0a |
+--------+------------------------------------------+ 
10 rows in set (0.56 sec)
Not bad! Even though MySQL does not optimize "get top N sorted rows" very well, it takes just 2.5 times longer than a full table scan to get the data. And this is with the minimum sort_buffer_size allowed, when a lot of sort merge passes are required to complete the sort:
mysql> show status  like "sort%";
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Sort_merge_passes | 321   |
| Sort_range        | 0     |
| Sort_rows         | 10    |
| Sort_scan         | 1     |
+-------------------+-------+
4 rows in set (0.00 sec)
As you can see from this SHOW STATUS output, MySQL only counts completely sorted rows in the Sort_rows variable. In this case 1,000,000 rows were partially sorted, but only 10 rows were fetched from the data file and sent, and only those are counted. In practice this means Sort_rows may well understate the sort activity happening on the system.
Let's now increase sort_buffer_size and see how performance is affected:
set sort_buffer_size=100000;

mysql> select * from gt order by i desc limit 10;
10 rows in set (0.44 sec)

mysql> show status  like "sort%";
+-------------------+-------+
| Variable_name     | Value | 
+-------------------+-------+
| Sort_merge_passes | 104   |
| Sort_range        | 0     |
| Sort_rows         | 10    |
| Sort_scan         | 1     |
+-------------------+-------+
4 rows in set (0.00 sec)
OK, raising sort_buffer_size to 100K gives the expected performance benefit; now we're just 2 times slower than a table scan, and considering the table size was about 60MB, we get a 120MB/sec sort speed, though 2,000,000 rows/sec is of course the more relevant figure in this case.
There are still a lot of sort merge passes, so let's go with even higher buffer sizes.
set sort_buffer_size=1000000;

mysql> select * from gt order by i desc limit 10;
10 rows in set (0.70 sec)

mysql> show status  like "sort%";
+-------------------+-------+
| Variable_name     | Value | 
+-------------------+-------+
| Sort_merge_passes | 10    |
| Sort_range        | 0     |
| Sort_rows         | 10    |
| Sort_scan         | 1     |
+-------------------+-------+
4 rows in set (0.00 sec)

set sort_buffer_size=10000000;

mysql> select * from gt order by i desc limit 10;
10 rows in set (1.34 sec)

mysql> show status  like "sort%";
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Sort_merge_passes | 1     |
| Sort_range        | 0     |
| Sort_rows         | 10    |
| Sort_scan         | 1     |
+-------------------+-------+ 
4 rows in set (0.00 sec)
Wait, that is not right. We're increasing sort_buffer_size and the number of sort_merge_passes decreases accordingly, but it does not help sort speed; instead the time triples, from 0.44 sec to 1.34 sec!
Let's try an even higher value to finally get rid of sort merge passes; maybe it is the sort merge which is inefficient with a large sort_buffer_size?
mysql> set sort_buffer_size=100000000;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from gt order by i desc limit 10;
+--------+------------------------------------------+ 
| i      | c                                        |
+--------+------------------------------------------+
| 100000 | eb6d2b5ed1897bdd0ff6e22ee1b44814ffb8f912 |
| 100000 | 635e8e8f8e3b9dc547bbd3deaadb1f297f691729 | 
| 100000 | 1bff67cc134e316dad5370de38020bef818ec45c |
| 100000 | 0db0601036fb9d1d5e17631d4d1bed9149675bb3 |
| 100000 | 0a7750a1393e77a2871ecfb39d5032d0b0f7c37c |
|  99999 | 41f091f4074717bf80d2b1a788e6a4a122057d11 | 
|  99999 | 049d9591ef0f584deaaf0433c0f3eda8631bdb85 |
|  99999 | 72b934f76863791f740b96858d5acb6a60459644 |
|  99999 | f0a42a16a41b4249da7c31f2d9556f05622a87b4 |
|  99999 | 35de8ae483779e6024c51998eb5b5e69e02eb74c | 
+--------+------------------------------------------+
10 rows in set (1.55 sec)

mysql> show status  like "sort%";
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+ 
| Sort_merge_passes | 0     |
| Sort_range        | 0     |
| Sort_rows         | 10    |
| Sort_scan         | 1     |
+-------------------+-------+
4 rows in set (0.00 sec)
Nope. We finally got rid of sort_merge_passes, but our sort performance got even worse!
I decided to experiment a bit further to see what sort_buffer_size is optimal for this platform and this query (I did not test whether it is the same for all platforms or data sets). The optimal sort_buffer_size in this case was 70K-250K, which is smaller than even the default value.
The CPU in question was a Pentium 4 with 1024K of cache.
A while ago I already wrote that large buffers are not always better, but I never expected the optimal buffer to be so small, at least under some conditions.
What do we learn from these results:
  • Benchmark your application. Unfortunately, general tuning guidelines can be wrong for your particular case, or simply wrong in general, because they tend to reprint the manual, which is often written based on theoretical expectations rather than supported by a large amount of testing.
  • sort_merge_passes are not that bad. Setting your sort_buffer_size large enough that there are zero sort_merge_passes may not be optimal.
  • The world is full of surprises. I obviously did not expect to get these results, and this is not an exception. Even after spending a lot of time optimizing and otherwise working with MySQL, I continue to run into results which surprise me. Some turn out to be expected on further analysis; others come from underlying bugs and are later fixed.