Saturday, October 11, 2014

Building Scalable Databases: Pros and Cons of Various Database Sharding Schemes - Dare Obasanjo's weblog

Database sharding is the process of splitting up a database across multiple machines to improve the scalability of an application. The justification for database sharding is that after a certain scale point it is cheaper and more feasible to scale a site horizontally by adding more machines than to grow it vertically by adding beefier servers.

Why Shard or Partition your Database?

Let's take Facebook.com as an example. In early 2004, the site was mostly used by Harvard students as a glorified online yearbook. You can imagine that the entire storage requirements and query load on the database could be handled by a single beefy server. Fast forward to 2008 where just the Facebook application related page views are about 14 billion a month (which translates to over 5,000 page views per second, each of which will require multiple backend queries to satisfy). Besides query load with its attendant IOPs, CPU and memory cost there's also storage capacity to consider. Today Facebook stores 40 billion physical files to represent about 10 billion photos which is over a petabyte of storage. Even though the actual photo files are likely not in a relational database, their metadata such as identifiers and locations still would require a few terabytes of storage to represent these photos in the database. Do you think the original database used by Facebook had terabytes of storage available just to store photo metadata?
At some point during the development of Facebook, they reached the physical capacity of their database server. The question then was whether to scale vertically by buying a more expensive, beefier server with more RAM, CPU horsepower, disk I/O and storage capacity or to spread their data out across multiple relatively cheap database servers. In general if your service has lots of rapidly changing data (i.e. lots of writes) or is sporadically queried by lots of users in a way which causes your working set not to fit in memory (i.e. lots of reads leading to lots of page faults and disk seeks) then your primary bottleneck will likely be I/O. This is typically the case with social media sites like Facebook, LinkedIn, Blogger, MySpace and even Flickr. In such cases, it is either prohibitively expensive or physically impossible to purchase a single server to handle the load on the site. In such situations sharding the database provides excellent bang for the buck with regards to cost savings relative to the increased complexity of the system.
Now that we have an understanding of when and why one would shard a database, the next step is to consider how one would actually partition the data into individual shards. A number of options, along with their individual tradeoffs, are presented below.

How Sharding Changes your Application

In a well designed application, the primary change sharding adds to the core application code is that instead of code such as

// The connection string should live in web.config, for example:
// @"Driver={MySQL};SERVER=dbserver;DATABASE=CustomerDB;"
string connectionString = ConfigurationSettings.AppSettings["ConnectionInfo"];
OdbcConnection conn = new OdbcConnection(connectionString);
conn.Open();

// ODBC binds the "?" placeholder by position, so the parameter name is informational only.
OdbcCommand cmd = new OdbcCommand("SELECT Name, Address FROM Customers WHERE CustomerID = ?", conn);
OdbcParameter param = cmd.Parameters.Add("@CustomerID", OdbcType.Int);
param.Value = customerId;
OdbcDataReader reader = cmd.ExecuteReader();

the actual connection information about the database to connect to depends on the data we are trying to store or access. So you'd have the following instead

// The only change: the connection string is chosen per customer instead of read from config.
string connectionString = GetDatabaseFor(customerId);
OdbcConnection conn = new OdbcConnection(connectionString);
conn.Open();

// The query itself is unchanged from the unsharded version.
OdbcCommand cmd = new OdbcCommand("SELECT Name, Address FROM Customers WHERE CustomerID = ?", conn);
OdbcParameter param = cmd.Parameters.Add("@CustomerID", OdbcType.Int);
param.Value = customerId;
OdbcDataReader reader = cmd.ExecuteReader();

the assumption here being that the GetDatabaseFor() method knows how to map a customer ID to a physical database location. For the most part everything else should remain the same unless the application uses sharding as a way to parallelize queries.
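
As a rough, runnable illustration of the same idea, the sketch below routes each query through a lookup function before touching a database. It is written in Python for brevity, uses in-memory SQLite databases as stand-ins for separate database servers, and picks the shard with a simple modulo rule; the shard list, the helper names and the routing rule are all assumptions made for the example, not part of the original application.

```python
import sqlite3

# Hypothetical shard list: each in-memory SQLite database stands in for a
# separate physical database server.
SHARDS = [sqlite3.connect(":memory:") for _ in range(2)]

for shard in SHARDS:
    shard.execute(
        "CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT, Address TEXT)"
    )


def get_database_for(customer_id):
    """Map a customer ID to the connection of the shard that owns it.

    The modulo rule is a placeholder assumption; any scheme could sit here.
    """
    return SHARDS[customer_id % len(SHARDS)]


def save_customer(customer_id, name, address):
    """Write the row to whichever shard owns this customer."""
    conn = get_database_for(customer_id)
    conn.execute(
        "INSERT INTO Customers (CustomerID, Name, Address) VALUES (?, ?, ?)",
        (customer_id, name, address),
    )
    conn.commit()


def load_customer(customer_id):
    """Read the row back; the query is identical to the unsharded version."""
    conn = get_database_for(customer_id)
    return conn.execute(
        "SELECT Name, Address FROM Customers WHERE CustomerID = ?",
        (customer_id,),
    ).fetchone()


save_customer(1, "Ada", "1 Main St")
save_customer(2, "Grace", "2 Oak Ave")
```

Calling load_customer(1) reads from the second shard and load_customer(2) from the first, yet neither caller needs to know that.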

A Look at Some Common Sharding Schemes

There are a number of different schemes one could use to decide how to break up an application database into multiple smaller DBs. Below are four of the most popular schemes used by various large scale Web applications today.

Vertical Partitioning: A simple way to segment your application database is to move tables related to specific features to their own server. For example, placing user profile information on one database server, friend lists on another and a third for user generated content like photos and blogs. The key benefit of this approach is that it is straightforward to implement and has low impact on the application as a whole. The main problem with this approach is that if the site experiences additional growth then it may be necessary to further shard a feature specific database across multiple servers (e.g. handling metadata queries for 10 billion photos by 140 million users may be more than a single server can handle).
Range Based Partitioning: In situations where the entire data set for a single feature or table still needs to be further subdivided across multiple servers, it is important to ensure that the data is split up in a predictable manner. One approach to ensuring this predictability is to split the data based on value ranges that occur within each entity. For example, splitting up sales transactions by what year they were created or assigning users to servers based on the first digit of their zip code. The main problem with this approach is that if the value whose range is used for partitioning isn't chosen carefully then the sharding scheme leads to unbalanced servers. In the previous example, splitting up transactions by date means that the server with the current year gets a disproportionate amount of read and write traffic. Similarly, partitioning users based on their zip code assumes that your user base will be evenly distributed across the different zip codes, which fails to account for situations where your application is popular in a particular region and for the fact that human populations vary across different zip codes.
Key or Hash Based Partitioning: This is often a synonym for user based partitioning for Web 2.0 sites. With this approach, each entity has a value that can be used as input into a hash function whose output is used to determine which database server to use. As a simplistic example, suppose you have ten database servers and your user IDs are numeric values incremented by 1 each time a new user is added. In this example, the hash function could perform a modulo operation on the user ID with the number ten and then pick a database server based on the remainder value. This approach should ensure a uniform allocation of data to each server. The key problem with this approach is that it effectively fixes your number of database servers, since adding new servers means changing the hash function, which without downtime is like being asked to change the tires on a moving car.
Directory Based Partitioning: A loosely coupled approach to this problem is to create a lookup service which knows your current partitioning scheme and abstracts it away from the database access code. This means the GetDatabaseFor() method actually hits a web service or a database which stores and returns the mapping between each entity key and the database server it resides on. This loosely coupled approach means you can perform tasks like adding servers to the database pool or changing your partitioning scheme without having to impact your application. Consider the previous example where there are ten servers and the hash function is a modulo operation. Let's say we want to add five database servers to the pool without incurring downtime. We can keep the existing hash function, add these servers to the pool and then run a script that copies data from the ten existing servers to the five new servers based on a new hash function implemented by performing the modulo operation on user IDs using the new server count of fifteen. Once the data is copied over (although this is tricky since users are always updating their data) the lookup service can change to using the new hash function without any of the calling applications being any wiser that their database pool just grew 50% and the database they went to for accessing John Doe's pictures five minutes ago is different from the one they are accessing now.
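
The range, hash and directory schemes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and class names are made up for the example, the zip code rule and the ten-to-fifteen server growth come from the text, and the directory is reduced to an in-process object rather than a real lookup service. The fraction_moved() helper shows why changing the modulus is painful: most users end up mapped to a different server.

```python
NUM_SERVERS = 10  # the ten-server example from the text


def range_shard(zip_code):
    """Range-based: the first digit of the zip code picks the server."""
    return int(zip_code[0])


def hash_shard(user_id, num_servers=NUM_SERVERS):
    """Key/hash-based: the remainder of user ID over server count picks the server."""
    return user_id % num_servers


def fraction_moved(user_ids, old_count, new_count):
    """Share of users whose server changes when the modulus changes."""
    ids = list(user_ids)
    moved = sum(1 for uid in ids if uid % old_count != uid % new_count)
    return moved / len(ids)


class ShardDirectory:
    """Directory-based: callers ask the directory and never see the rule.

    A real directory would be a web service or database; this in-process
    class is a stand-in for illustration.
    """

    def __init__(self, num_servers):
        self._num_servers = num_servers

    def get_database_for(self, user_id):
        return hash_shard(user_id, self._num_servers)

    def grow_pool(self, new_num_servers):
        """Switch rules; meant to be called only after the data has been copied."""
        self._num_servers = new_num_servers


directory = ShardDirectory(10)
before = directory.get_database_for(27)   # server 7 under modulo 10
directory.grow_pool(15)
after = directory.get_database_for(27)    # server 12 under modulo 15
share = fraction_moved(range(30000), 10, 15)
```

With sequential IDs, two thirds of all users map to a different server after growing from ten to fifteen servers, which is why the copy step has to finish before the directory switches rules.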

Problems Common to all Sharding Schemes

Once a database has been sharded, new constraints are placed on the operations that can be performed on the database. These constraints primarily center around the fact that operations across multiple tables or multiple rows in the same table no longer will run on the same server. Below are some of the constraints and additional complexities introduced by sharding

Joins and Denormalization – Prior to sharding a database, any queries that require joins on multiple tables execute on a single server. Once a database has been sharded across multiple servers, it is often not feasible to perform joins that span database shards, both because of performance constraints (data has to be compiled from multiple servers) and because of the additional complexity of performing such cross-server joins.
A common workaround is to denormalize the database so that queries that previously required joins can be performed from a single table. For example, consider a photo site which has a database which contains a user_info table and a photos table. Comments a user has left on photos are stored in the photos table and reference the user's ID as a foreign key. So when you go to the user's profile it takes a join of the user_info and photos tables to show the user's recent comments. After sharding the database, it now takes querying two database servers to perform an operation that used to require hitting only one server. This performance hit can be avoided by denormalizing the database. In this case, a user's comments on photos could be stored in the same table or server as their user_info AND the photos table also has a copy of the comment. That way rendering a photo page and showing its comments only has to hit the server with the photos table while rendering a user profile page with their recent comments only has to hit the server with the user_info table.
Of course, the service now has to deal with all the perils of denormalization such as data inconsistency (e.g. user deletes a comment and the operation is successful against the user_info DB server but fails against the photos DB server because it was just rebooted after a critical security patch).
Referential integrity – As you can imagine if there's a bad story around performing cross-shard queries it is even worse trying to enforce data integrity constraints such as foreign keys in a sharded database. Most relational database management systems do not support foreign keys across databases on different database servers. This means that applications that require referential integrity often have to enforce it in application code and run regular SQL jobs to clean up dangling references once they move to using database shards.
Dealing with data inconsistency issues due to denormalization and lack of referential integrity can become a significant development cost to the service.
Rebalancing (Updated 1/21/2009) – In some cases, the sharding scheme chosen for a database has to be changed. This could happen because the sharding scheme was improperly chosen (e.g. partitioning users by zip code) or the application outgrows the database even after being sharded (e.g. too many requests being handled by the DB shard dedicated to photos so more database servers are needed for handling photos). In such cases, the database shards will have to be rebalanced, which means the partitioning scheme is changed AND all existing data is moved to new locations. Doing this without incurring downtime is extremely difficult and not supported by any off-the-shelf solution today. Using a scheme like directory based partitioning does make rebalancing a more palatable experience at the cost of increasing the complexity of the system and creating a new single point of failure (i.e. the lookup service/database).
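
To make the join and referential integrity problems concrete, here is a small Python sketch. Two dictionaries stand in for two database shards, and every table, column and function name is hypothetical. The first function performs in application code the join that a single server used to do; the second is the kind of periodic cleanup job that removes dangling references once foreign keys can no longer be enforced by the database.

```python
# Two dicts stand in for two shards; all names here are hypothetical.
user_shard = {
    "user_info": {1: {"name": "alice"}, 2: {"name": "bob"}},
}
photo_shard = {
    "comments": [
        {"comment_id": 10, "photo_id": 100, "user_id": 1, "text": "Nice shot"},
        {"comment_id": 11, "photo_id": 100, "user_id": 2, "text": "Great light"},
        # user 3 was deleted on the user shard, so this row is dangling
        {"comment_id": 12, "photo_id": 101, "user_id": 3, "text": "Orphaned"},
    ],
}


def comments_with_author(photo_id):
    """Join comments to user names in application code (two round trips)."""
    comments = [c for c in photo_shard["comments"] if c["photo_id"] == photo_id]
    users = user_shard["user_info"]
    return [
        (users[c["user_id"]]["name"], c["text"])
        for c in comments
        if c["user_id"] in users
    ]


def remove_dangling_comments():
    """Cleanup job: drop comments whose user_id no longer exists."""
    users = user_shard["user_info"]
    kept = [c for c in photo_shard["comments"] if c["user_id"] in users]
    removed = len(photo_shard["comments"]) - len(kept)
    photo_shard["comments"] = kept
    return removed


joined = comments_with_author(100)
removed_count = remove_dangling_comments()
```

Between cleanup runs the dangling row is still visible to any reader that does not filter it, which is the inconsistency cost described above.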

Mr. Moore gets to punt on sharding by David of Basecamp

Sharding is a database technique where you break up a big database into many smaller ones. Instead of having 1 million customers on a single, big iron machine, you perhaps have 100,000 customers on 10 different, smaller machines.

The general advice on sharding is that you don’t until you have to. It’s similar to Martin Fowler’s First Law of Distributed Object Design: Don’t distribute your objects! Sharding is still relatively hard, has relatively poor tool support, and will definitely complicate your setup.

Now I always knew that the inevitable day would come where we would have no choice. We would simply have to shard because there was no more vertical scaling to be done. But that day seems to get pushed further and further into the future.

Bigger caches, more reads
Our read performance is to some extent being taken care of by the fact that you can get machines with 256GB RAM now. We upgraded the Basecamp database server from 32GB to 128GB RAM a while back and we thought that would be the end of it.

The box was maxed out and going beyond 128GB at the time was stupid expensive. But now there’s 256GB to be had at a reasonable price and I’m starting to think that by the time we reach that, there’ll be reasonably priced 512GB machines.

So as long as Moore’s law can give us capacity jumps like that, we can keep the entire working set in memory and all will be good. And even if we should hit a ceiling there, we can still go to active read slaves before worrying about sharding.

The bigger problem is writes
Traditionally it hasn’t been read performance that caused people to shard anyway. It has been write performance. Our applications are still very heavy on the reads vs writes, so it’s less of a problem than it is for many others.

But with the rise of SSD, like Fusion-IO’s ioDrive that can do 120K IOPS, it seems that we’re going to be saved by the progress of technology once again by the time we’ll need it.

Punt on sharding
So where does that leave sharding? For us, we’re in the same position we’ve been in for the past few years. We just don’t need to pay the complexity tax yet, so we don’t. That’s not to say that sharding doesn’t have other benefits than simply allowing that which otherwise couldn’t be, but the trade is not yet good enough.

One point of real pain we’ve suffered, though, is that migrating a database schema in MySQL on a huge table takes forever and a day. That’s a very real problem if you want to avoid an enterprisey schema full of kludges put in place to avoid adding, renaming, or dropping columns on big tables. Or avoid long scheduled maintenance windows.

I really hope that the clever chaps at MySQL come up with something more reasonable for that problem, though. I’m told that PostgreSQL is a lot more accommodating in this regard, so hopefully competition will raise all boats for that.

Don’t try to preempt tomorrow
I guess the conclusion is that there’s no use in preempting the technological progress of tomorrow. Machines will get faster and cheaper all the time, but you’ll still only have the same limited programming resources that you had yesterday.

If you can spend them on adding stuff that users care about instead of prematurely optimizing for the future, you stand a better chance of being in business when that tomorrow finally rolls around.

An Unorthodox Approach to Database Design : The Coming of the Shard - High Scalability

Once upon a time we scaled databases by buying ever bigger, faster, and more expensive machines. While this arrangement is great for big iron profit margins, it doesn't work so well for the bank accounts of our heroic system builders who need to scale well past what they can afford to spend on giant database servers. In an extraordinary two article series, Dathan Pattishall explains his motivation for a revolutionary new database architecture--sharding--that he began thinking about even before he worked at Friendster, and fully implemented at Flickr. Flickr now handles more than 1 billion transactions per day, responding in less than a few seconds, and can scale linearly at a low cost.

What is sharding and how has it come to be the answer to large website scaling problems?

What Is Sharding?

While working at Auction Watch, Dathan got the idea to solve their scaling problems by creating a database server for a group of users and running those servers on cheap Linux boxes. In this scheme the data for User A is stored on one server and the data for User B is stored on another server. It's a federated model. Groups of 500K users are stored together in what are called shards.

The advantages are:

High availability. If one box goes down the others still operate.

Faster queries. Smaller amounts of data in each user group mean faster querying.

More write bandwidth. With no master database serializing writes you can write in parallel, which increases your write throughput. Writing is a major bottleneck for many websites.

You can do more work. A parallel backend means you can do more work simultaneously. You can handle higher user loads, especially when writing data, because there are parallel paths through your system. You can load balance web servers, which access shards over different network paths, which are processed by separate CPUs, which use separate caches of RAM and separate disk IO paths to process work. Very few bottlenecks limit your work.

How Is Sharding Different Than Traditional Architectures?

Sharding is different than traditional database architecture in several important ways:

Data are denormalized. Traditionally we normalize data. Data are splayed out into anomaly-less tables and then joined back together again when they need to be used. In sharding the data are denormalized. You store together data that are used together.

This doesn't mean you don't also segregate data by type. You can keep a user's profile data separate from their comments, blogs, email, media, etc, but the user profile data would be stored and retrieved as a whole. This is a very fast approach. You just get a blob and store a blob. No joins are needed and it can be written with one disk write.

Data are parallelized across many physical instances. Historically database servers are scaled up. You buy bigger machines to get more power. With sharding the data are parallelized and you scale by scaling out. Using this approach you can get massively more work done because it can be done in parallel.

Data are kept small. The larger a set of data a server handles the harder it is to cache intelligently because you have such a wide diversity of data being accessed. You need huge gobs of RAM that may not even be enough to cache the data when you need it. By isolating data into smaller shards the data you are accessing is more likely to stay in cache.

Smaller sets of data are also easier to backup, restore, and manage.

Data are more highly available. Since the shards are independent a failure in one doesn't cause a failure in another. And if you make each shard operate at 50% capacity it's much easier to upgrade a shard in place. Keeping multiple data copies within a shard also helps with redundancy and making the data more parallelized so more work can be done on the data. You can also set up a shard to have a master-slave or dual master relationship within the shard to avoid a single point of failure within the shard. If one server goes down the other can take over.

It doesn't use replication. Replicating data from a master server to slave servers is a traditional approach to scaling. Data is written to a master server and then replicated to one or more slave servers. At that point read operations can be handled by the slaves, but all writes happen on the master.

Obviously the master becomes the write bottleneck and a single point of failure. And as load increases the cost of replication increases. Replication costs in CPU, network bandwidth, and disk IO. The slaves fall behind and have stale data. The folks at YouTube had a big problem with replication overhead as they scaled.

Sharding cleanly and elegantly solves the problems with replication.

Some Problems With Sharding

Sharding isn't perfect. It does have a few problems.

Rebalancing data. What happens when a shard outgrows your storage and needs to be split? Let's say some user has a particularly large friends list that blows your storage capacity for the shard. You need to move the user to a different shard.

On some platforms I've worked on this is a killer problem. You had to build out the data center correctly from the start because moving data from shard to shard required a lot of downtime.

Rebalancing has to be built in from the start. Google's shards automatically rebalance. For this to work data references must go through some sort of naming service so they can be relocated. This is what Flickr does. And your references must be invalidateable so the underlying data can be moved while you are using it.

Joining data from multiple shards. To create a complex friends page, or a user profile page, or a thread discussion page, you usually must pull together lots of different data from many different sources. With sharding you can't just issue a query and get back all the data. You have to make individual requests to your data sources, get all the responses, and then build the page. Thankfully, because of caching and fast networks this process is usually fast enough that your page load times can be excellent.

How do you partition your data in shards? What data do you put in which shard? Where do comments go? Should all user data really go together, or just their profile data? Should a user's media, IMs, friends lists, etc go somewhere else? Unfortunately there are no easy answers to these questions.

Less leverage. People have experience with traditional RDBMS tools so there is a lot of help out there. You have books, experts, tool chains, and discussion forums when something goes wrong or you are wondering how to implement a new feature. Eclipse won't have a shard view and you won't find any automated backup and restore programs for your shard. With sharding you are on your own.

Implementing shards is not well supported. Sharding is currently mostly a roll your own approach. LiveJournal makes their tool chain available. Hibernate has a library under development. MySQL has added support for partitioning. But in general it's still something you must implement yourself.

Scaling Digg and Other Web Applications - High Scalability

MemcacheDB: Evolutionary Step For Code, Revolutionary Step For Performance

Imagine Kevin Rose, the founder of Digg, who at the time of this presentation had 40,000 followers. If Kevin diggs just once a day that's 40,000 writes. As the most active diggers are the most followed it becomes a huge performance bottleneck. Two problems appear.

You can't update 40,000 follower accounts at once. Fortunately the queuing system we talked about earlier takes care of that.

The second problem is the huge number of writes that happen. Digg has a write problem. If the average user has 100 followers that’s 300 million diggs a day. That's 3,000 writes per second, 7GB of storage per day, and 5TB of data spread across 50 to 60 servers.
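
The figures above can be checked with a little arithmetic. The sketch below takes the quoted 300 million per day as the number of fanned-out writes, which implies roughly 3 million original diggs a day at 100 followers each; that reading, and treating 7GB as decimal gigabytes, are assumptions rather than anything stated in the presentation.

```python
SECONDS_PER_DAY = 24 * 60 * 60

writes_per_day = 300_000_000         # quoted in the text
avg_followers = 100                  # quoted in the text
storage_per_day_bytes = 7 * 10**9    # 7GB, read as decimal gigabytes (assumption)

# Implied number of original diggs before fan-out (inferred, not quoted).
diggs_per_day = writes_per_day // avg_followers

# Average write rate if the load were spread evenly over the day.
writes_per_second = writes_per_day / SECONDS_PER_DAY

# Average storage consumed by each fanned-out write.
bytes_per_write = storage_per_day_bytes / writes_per_day

print(f"{diggs_per_day:,} diggs/day")
print(f"{writes_per_second:,.0f} writes/second on average")
print(f"{bytes_per_write:.1f} bytes per write")
```

The average comes out near 3,500 writes per second, consistent with the rounded 3,000 quoted, and each write costs a little over 23 bytes, which fits a row holding a few integer IDs.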

With such a heavy write load MySQL wasn’t going to work for Digg. That’s where MemcacheDB comes in. In initial tests on a laptop MemcacheDB was able to handle 15,000 writes a second. MemcacheDB's own benchmark shows it capable of 23,000 writes/second and 64,000 reads/second. At those write rates it's easy to see why Joe was so excited about MemcacheDB's ability to handle their digg deluge.

What is MemcacheDB? It's a distributed key-value storage system designed for persistence. It is NOT a cache solution, but a persistent storage engine for fast and reliable key-value based object storage and retrieval. It conforms to the memcache protocol (though not all of it is implemented), so any memcached client can connect to it. MemcacheDB uses Berkeley DB as its storage backend, so many features, including transactions and replication, are supported.

Before you get too excited keep in mind this is a key-value store. You read and write records by a single key. There aren't multiple indexes and there's no SQL. That's why it can be so fast.

Digg uses MemcacheDB to scale out the huge number of writes that happen when data is denormalized. Remember it's a key-value store. The value is usually a complete application level object merged together from a possibly large number of normalized tables. Denormalizing introduces redundancies because you are keeping copies of data in multiple records instead of just one copy in a nicely normalized table. So denormalization means a lot more writes as data must be copied to all the records that contain a copy. To keep up they needed a database capable of handling their write load. MemcacheDB has the performance, especially when you layer memcached's normal partitioning scheme on top.
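
The write amplification described here can be sketched as follows. A plain Python dict stands in for the key-value store; this is not the memcached client API, and the key format, follower data and function names are invented for the example. Each digg is copied into the record of every follower, so reading a user's friend activity becomes a single key lookup at the price of one write per follower.

```python
from collections import defaultdict

# A plain dict stands in for a key-value store such as MemcacheDB.
kv_store = defaultdict(list)
stats = {"writes": 0}

# Hypothetical follower lists: user -> people who follow that user.
followers = {"kevin": ["ann", "raj", "lee"], "ann": ["raj"]}


def record_digg(user, story_id):
    """Copy the digg into every follower's record: one write per follower."""
    for follower in followers.get(user, []):
        key = f"friend_diggs:{follower}"
        kv_store[key].append((user, story_id))
        stats["writes"] += 1


def friend_diggs(user):
    """Reading is a single key lookup, with no join and no secondary index."""
    return kv_store.get(f"friend_diggs:{user}", [])


record_digg("kevin", 42)   # three followers, so three writes
record_digg("ann", 42)     # one follower, so one write
```

Two diggs produced four writes here; scale the follower counts up to 40,000 and the need for a store with very high write throughput becomes clear.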

I asked Joe why he didn't turn to one of the in-memory data grid solutions? Some of the reasons were:

This data is generated from many different databases and takes a long time to generate. So they want it in a persistent store.

MemcacheDB uses the memcache protocol. Digg already uses memcache so it's a no-brainer to start using MemcacheDB. It's easy to use and easy to setup.

Operations is happy with deploying it into the datacenter as it's not a new setup.

They already have memcached high availability and failover code so that stuff already works.

Using a new system would require more ramp-up time.

If there are any problems with the code you can take a look. It's all open source.

Not sure those other products are stable enough.

So it's an evolutionary step for code and a revolutionary step for performance. Digg is looking at using MemcacheDB across the board.

Scaling Digg And Other Web Applications

Joe Stump, Lead Architect at Digg, gave this presentation at the Web 2.0 Expo. I couldn't find the actual presentation, but fortunately Kris Jordan took some great notes. That's how key moments in history are accidentally captured forever. Joe was also kind enough to respond to my email questions with a phone call.

In this first part of the post Joe shares some timeless wisdom that you may or may not have read before. I of course take some pains to extract all the wit from the original presentation in favor of simple rules. What really struck me however was how Joe thought MemcacheDB will be the biggest new kid on the block in scaling. MemcacheDB has been around for a little while and I've never thought of it in that way. We'll learn why Joe is so excited by MemcacheDB at the end of the post.

Impressive Stats

80th-100th largest site in the world

26 million uniques a month

30 million users.

Uniques are only half that traffic. Traffic = unique web visitors + APIs + Digg buttons.

2 billion requests a month

13,000 requests a second, peak at 27,000 requests a second.

3 Sys Admins, 2 DBAs, 1 Network Admin, 15 coders, QA team

Lots of servers.

Scaling Strategies

Scaling is specialization. When off the shelf solutions no longer work at a certain scale you have to create systems that work for your particular needs.

Lesson of web 2.0: people love making crap and sharing it with the world.

Web 2.0 sucks for scalability. Web 1.0 was flat with a lot of static files. Additional load is handled by adding more hardware. Web 2.0 is heavily interactive. Content can be created at a crushing rate.

Languages don't scale. 100% of the time bottlenecks are in IO. Bottlenecks aren't in the language when you are handling so many simultaneous requests. Making PHP 300% faster won't matter. Don't optimize PHP by using single quotes instead of double quotes when the database is pegged.

Don’t share state. Decentralize. Partitioning is required to process a high number of requests in parallel.

Scale out instead of up. Expect failures. Just add boxes to scale and avoid the fail.

Database-driven sites need to be partitioned to scale both horizontally and vertically. Horizontal partitioning means storing a subset of rows on different machines. It is used when there's more data than will fit on one machine. Vertical partitioning means putting some columns in one table and some columns in another table. This allows you to add data to the system without downtime.

Data are separated into separate clusters: User Actions, Users, Comments, Items, etc.

Build a data access layer so partitioning is hidden behind an API.

With partitioning comes the CAP Theorem: you can only pick two of the following three: Strong Consistency, High Availability, Partition Tolerance.

Partitioned solutions require denormalization, which has become a big problem at Digg. Denormalization means data is copied in multiple objects and must be kept synchronized.

MySQL replication is used to scale out reads.

Use an asynchronous queuing architecture for near-term processing.
- This approach pushes chunks of processing to another service and lets that service schedule the processing on a grid of processors.
- It's faster and more responsive than cron and only slightly less responsive than real-time.
- For example, issuing 5 synchronous database requests slows you down. Do them in parallel.
- Digg uses Gearman. An example use is to get a permalink. Three operations are done in parallel: get the current logged-in user, get the permalink, and grab the comments. All three are then combined to return a single answer to the client. It's also used for site crawling and logging. It's a different way of thinking.
- See Flickr - Do the Essential Work Up-front and Queue the Rest and The Canonical Cloud Architecture for more information.
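
The permalink example above, three independent lookups issued in parallel and then merged, can be sketched with a thread pool. Note the substitution: Digg used Gearman, a job server, whereas this sketch uses Python's standard concurrent.futures purely to show the fan-out and combine shape. The three lookup functions and their return values are hypothetical stand-ins with a short sleep to simulate I/O.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def get_current_user():
    time.sleep(0.05)  # simulated I/O latency
    return {"user": "alice"}


def get_permalink():
    time.sleep(0.05)
    return {"permalink": "/story/42"}


def get_comments():
    time.sleep(0.05)
    return {"comments": ["first", "second"]}


def build_permalink_page():
    """Run the three lookups concurrently and merge their results."""
    tasks = (get_current_user, get_permalink, get_comments)
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(task) for task in tasks]
        page = {}
        for future in futures:
            page.update(future.result())  # blocks until that lookup finishes
    return page


page = build_permalink_page()
```

Because the lookups overlap, the total wait is close to the slowest single lookup rather than the sum of all three.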

Bottlenecks are in IO so you have to tune the database. When the database is bigger than RAM the disk is hit all the time, which kills performance. As the database gets larger the table can't be scanned anymore. So you have to:
- denormalize
- avoid joins
- avoid large scans across databases by partitioning
- cache
- add read slaves
- don't use NFS

Run numbers before you try and fix a problem to make sure things actually will work.

Files such as icons and photos are handled by using MogileFS, a distributed file system. DFSs support high request rates because files are distributed and replicated around a network.

Cache forever and explicitly expire.

Cache fairly static content in a file based cache.

Cache changeable items in memcached

Cache rarely changed items in APC. APC is a local cache. It's not distributed so no other program has access to the values.

For caching use the Chain of Responsibility pattern. Cache in MySQL, memcached, APC, and PHP globals. First check PHP globals as the fastest cache. If not present check APC, memcached and on up the chain.
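
A simplified version of that lookup chain might look like the following Python sketch. Dicts stand in for PHP globals, APC, memcached and MySQL, and the class names are made up for the example. Copying a value back into the faster levels after a hit is an assumption of this sketch; the presentation notes do not say how Digg handled that.

```python
class CacheTier:
    """One cache level; a dict stands in for the real store."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class TieredCache:
    def __init__(self, tiers, load_from_source):
        self.tiers = tiers                      # ordered fastest to slowest
        self.load_from_source = load_from_source

    def get(self, key):
        """Return (value, name of the level that answered)."""
        for position, tier in enumerate(self.tiers):
            if key in tier.data:
                value = tier.data[key]
                # Backfill the faster levels that missed (assumption).
                for faster in self.tiers[:position]:
                    faster.data[key] = value
                return value, tier.name
        # Every level missed: go to the source and populate all levels.
        value = self.load_from_source(key)
        for tier in self.tiers:
            tier.data[key] = value
        return value, "source"


database = {"story:1": "Scaling Digg"}          # stands in for MySQL
request_globals = CacheTier("globals")
apc = CacheTier("apc")
memcached = CacheTier("memcached")
cache = TieredCache([request_globals, apc, memcached], database.__getitem__)

first = cache.get("story:1")     # answered by the source
second = cache.get("story:1")    # answered by the fastest level
request_globals.data.clear()     # a new request starts with empty globals
third = cache.get("story:1")     # answered by the next level up
```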

Digg's recommendation engine is a custom graph database that is eventually consistent. Eventually consistent means that writes to one partition will eventually make it to all the other partitions. After a write reads made one after another don't have to return the same value as they could be handled by different partitions. This is a more relaxed constraint than strict consistency which means changes must be visible at all partitions simultaneously. Reads made one after another would always return the same value.
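
The difference between the two consistency models can be seen in a toy model. The Python sketch below is not how Digg's graph database works; it is a minimal illustration in which a write lands on one partition immediately and reaches the others only when a propagate step runs. All names are hypothetical.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy model: writes reach other partitions only when propagate() runs."""

    def __init__(self, num_partitions):
        self.partitions = [{} for _ in range(num_partitions)]
        self.pending = deque()  # queued (partition_index, key, value) updates

    def write(self, key, value, via=0):
        """Apply the write locally and queue it for every other partition."""
        self.partitions[via][key] = value
        for index in range(len(self.partitions)):
            if index != via:
                self.pending.append((index, key, value))

    def read(self, key, via):
        """Read from one partition; it may not have seen recent writes yet."""
        return self.partitions[via].get(key)

    def propagate(self):
        """Deliver all queued updates, after which the partitions agree."""
        while self.pending:
            index, key, value = self.pending.popleft()
            self.partitions[index][key] = value


store = EventuallyConsistentStore(3)
store.write("story:1:diggs", 5, via=0)
fresh = store.read("story:1:diggs", via=0)     # sees the write
stale = store.read("story:1:diggs", via=1)     # has not seen it yet
store.propagate()
caught_up = store.read("story:1:diggs", via=1)
```

Two back-to-back reads served by different partitions return different answers until propagation completes, which is exactly what strict consistency rules out.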

Assume 1 million people a day will bang on any new feature, so make it scalable from the start. Example: the About page on Digg did a live query against the master database to show all employees. It was just a quick hack to get the page out. Then a spider went crazy and took the site down.

Miscellaneous

Digg buttons were a major key to generating traffic.

Uses Debian Linux, Apache, PHP, MySQL.

Pick a language you enjoy developing in, pick a coding standard, add inline documentation that's extractable, and use a code repository and a bug tracker. Likes PHP, Trac, and SVN.

You are only as good as your people. You have to trust that the person next to you is doing their job. To cultivate trust, empower people to make decisions. Trust that people have it handled and they'll take care of it. This cuts down on meetings because you know people will do the job right.

Completely a Mac shop.

Almost all developers are local. Some people are remote to offer 24 hour support.

Joe's approach is pragmatic. He doesn't have a language fetish. People went from PHP, to Python/Ruby, to Erlang. He uses vim and develops from the command line. He has no idea how people can constantly change tool sets. It's not very productive.

Services (SOA) decoupling is a big win. Digg uses REST. Internal services return a vanilla structure that's mapped to JSON, XML, etc. Version in URL because it costs you nothing, for example: /1.0/service/id/xml. Version both internal and external services.
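
To make the URL scheme concrete, here is a small sketch of how a path such as /1.0/service/id/xml could be split into version, service, id, and output format, with one vanilla structure mapped to either JSON or XML. The `story` service, its fields, and the function names are hypothetical; this is not Digg's code.

```python
import json
import xml.etree.ElementTree as ET

def to_xml(record):
    """Map a flat dict to a simple XML document."""
    root = ET.Element("result")
    for key, value in record.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root, encoding="unicode")

def to_json(record):
    return json.dumps(record, sort_keys=True)

SERIALIZERS = {"json": to_json, "xml": to_xml}

# Hypothetical internal service returning a plain ("vanilla") structure.
SERVICES = {"story": lambda item_id: {"id": item_id, "title": "example"}}

def handle(path):
    """Dispatch a /version/service/id/format path and serialize the result."""
    version, service, item_id, fmt = path.strip("/").split("/")
    if version != "1.0":
        raise ValueError("unsupported version: " + version)
    record = SERVICES[service](item_id)
    return SERIALIZERS[fmt](record)

as_json = handle("/1.0/story/7/json")
as_xml = handle("/1.0/story/7/xml")
```

Because the version is the first path segment, a later 2.0 handler can be added alongside this one without breaking existing callers.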

People don't understand how many moving parts are in a website. Something is going to happen and it will go down.

How Facebook Makes Mobile Work at Scale for All Phones, on All Screens, on All Networks - High Scalability -

How Facebook Makes Mobile Work At Scale For All Phones, On All Screens, On All Networks

When you find your mobile application that ran fine in the US is slow in other countries, how do you fix it? That’s a problem Facebook talks about in a couple of enlightening videos from the @Scale conference. Since mobile is eating the world, this is the sort of thing you need to consider with your own apps.

In the US we may complain about our mobile networks, but that’s more #firstworldproblems talk than reality. Mobile networks in other countries can be much slower and cost a lot more. This is the conclusion from Chris Marra, Project Manager at Facebook, in a really interesting talk titled Developing Android Apps for Emerging Market.

Facebook found in the US there’s 70.6% 3G penetration with 280ms average latency. In India there’s 6.9% 3G penetration with 500ms latency. In Brazil there’s 38.6% 3G penetration with more than 850ms average latency.

Chris also talked about Facebook’s comprehensive research on who uses Facebook and what kind of phones they use. In summary they found not everyone is on a fast phone, not everyone has a large screen, and not everyone is on a fast network.

It turns out the typical phone used by Facebook users is from circa 2011, dual core, with less than 1GB of RAM. By designing for a high end phone, Facebook found that their low end users, who are the typical users, had poor user experiences.

For the slow phone problem Facebook created a separate application that used lighter weight animations and other strategies to work on lower end phones. For the small screen problem Facebook designers made sure applications were functional at different screen sizes.

Facebook has moved to a product organization. A single vertical group is responsible for producing a particular product rather than having, for example, an Android team try to create all Android products. There’s also a horizontally focussed Android team trying to figure out best practices for Android, delving deep into the details of what makes a platform tick.

Each team is responsible for the end-to-end performance and reliability for their product. There are also core teams looking at and analyzing general performance problems and helping where needed to improve performance.

Both core teams and product teams are needed. The core team is really good at instrumentation and identifying problems and working with product teams to fix them. For mobile it’s important that each team owns their full product end-to-end. That means owning core engagement metrics, core reliability, and core performance metrics including daily usage, cold start times, and reliability, while also knowing how to fix problems.

To solve the slow network problem there’s a whole other talk. This time the talk is given by Andrew Rogers, Engineering Manager at Facebook, and it’s titled Tuning Facebook for Constrained Networks. Andrew talks about three methods to help deal with network problems: Image Download Sizes, Network Quality Detection, Prefetching Content.

Overall, please note the immense effort that is required to operate at Facebook scale. Not only do you have different phones like Android and iOS, you have different segments within each type of phone you must code and design for. This is crazy hard to do.

Reducing Image Sizes - WebP Saved Over 30% JPEG, 80% Over PNG

Image data dominates the number of bytes that are downloaded from Facebook applications. It accounts for 85% of total data download in Facebook for Android and 65% in Messenger.
Reducing image sizes would reduce the amount of data downloaded and result in quicker downloads, especially on high latency networks (which we learned were typical).
Request an appropriate image size for the viewport
- Resize on the server. Don’t send a large image to the client and then have the client scale the image down to a smaller size. This wastes a lot of bandwidth and takes a lot of time.
- Send a thumbnail (for profile pictures) and a small preview image (this is seen in the newsfeed), and then a full image (seen in photo stories) when a user asks to zoom in on it. A low-res phone may never need a full image. Most of the time a thumbnail and a small preview is all you need.
- Significant savings. Scaling a 960 pixel image that is 79KB in size to 240 pixels wide yields an 86% size reduction, to 480 pixels is a 58% size reduction, to 720 pixels is a 23% size reduction.
- With screen sizes becoming larger scaling images isn’t as effective as it used to be. Still worth doing, but the payoff isn’t as much.
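
As a rough illustration of requesting an appropriate size for the viewport, the sketch below picks the smallest server-side rendition that still covers the viewport width and estimates the download size from the reduction figures quoted above. The set of available widths and the function names are assumptions made for the example; only the 79KB baseline and the percentages come from the talk.

```python
import bisect

# Assumed set of server-side renditions, in pixels wide.
AVAILABLE_WIDTHS = [240, 480, 720, 960]

# Size reductions quoted above, relative to the 79KB, 960 pixel original.
ORIGINAL_KB = 79
REDUCTION = {240: 0.86, 480: 0.58, 720: 0.23, 960: 0.0}

def pick_width(viewport_px):
    """Smallest available width that covers the viewport, capped at the largest."""
    i = bisect.bisect_left(AVAILABLE_WIDTHS, viewport_px)
    return AVAILABLE_WIDTHS[min(i, len(AVAILABLE_WIDTHS) - 1)]

def estimated_kb(width):
    """Approximate download size for a rendition, from the quoted percentages."""
    return round(ORIGINAL_KB * (1 - REDUCTION[width]), 1)

width = pick_width(320)      # a 320 pixel viewport is served the 480 pixel rendition
size = estimated_kb(width)   # roughly 33KB instead of 79KB
```
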
Change the image format
- 90% of images sent to Facebook and Messenger for Android use the WebP format.
- WebP format released by Google in 2010.
- WebP saves 7% download size over JPEG for equivalent quality.
- WebP saved over 30% download size over JPEG by tuning quality and compression parameters with no noticeable quality differences. A big savings.
- WebP also supports transparency and animation, which is useful for their Stickers product.
- WebP saves 80% over PNG.
- For older versions of Android the images are transported using WebP but transcoded on the client to JPEG for rendering on the device.
Network Quality Detection - Adjust Behavior To Network Quality
- On the same network technology (2G, 3G, LTE, WiFi), connection speeds in the US were 2-3x faster than in India and Brazil.
- LTE is often faster than WiFi, so you can’t just base the connection speed on technology.
- Built into the client the ability to:
  - Measure throughput on all large network transfers
  - Facebook servers provide a Round Trip Time (RTT) estimate in the HTTP header in every response
  - The client maintains a moving average of throughput and RTT times to determine network quality.
- Connections are bucketized into the following groups: Poor is < 150kbps. Moderate is 150-600kbps. Good is 600-2000kbps. Excellent is > 2000kbps.
- Feature developers can adjust their behavior depending on the connection quality.
- Some possible responses based on quality:
  - Increase/decrease compression.
  - Issue more/fewer parallel network requests. Don’t saturate the network.
  - Disable/enable auto-play video. Don’t cause more traffic on slow networks.
  - Pre-fetch more content.
- A tool developed at Facebook called Air Traffic Control supports the simulation of different traffic profiles. Each profile can be configured: bandwidth, packet loss, packet loss-correlation, delay, delay correlation, delay jitter. Extremely helpful in finding unexpected behaviour on slow networks.
- There are buckets for videos, but not sure what they are.
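
The bucketing described above can be sketched as follows. The talk only says the client keeps a moving average, so the fixed-size window used here is an assumption, as are the class name and the choice to put a reading of exactly 600kbps in the good bucket and exactly 2000kbps in the good bucket rather than excellent.

```python
from collections import deque

class NetworkQuality:
    """Tracks recent throughput samples and maps their average to a bucket."""

    def __init__(self, window=5):
        # Assumed: a simple moving average over the last `window` samples.
        self.samples = deque(maxlen=window)

    def record(self, kbps):
        self.samples.append(kbps)

    def average_kbps(self):
        if not self.samples:
            return None
        return sum(self.samples) / len(self.samples)

    def bucket(self):
        avg = self.average_kbps()
        if avg is None:
            return "unknown"
        if avg < 150:
            return "poor"
        if avg < 600:
            return "moderate"
        if avg <= 2000:
            return "good"
        return "excellent"

quality = NetworkQuality(window=3)
for sample in (100, 200, 300):
    quality.record(sample)
bucket_after_slow_start = quality.bucket()  # average 200kbps
```

A feature could then branch on `quality.bucket()` to decide, for example, whether to auto-play video or how many parallel requests to issue.
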
Prefetching Content
- Issuing network requests for content ahead of when the content is actually needed on the device.
- Prefetching is especially important on networks with high latency. If you wait to issue a download request for an image until it is needed, the user will be looking at a blank screen.
- Issue network requests for feeds early in the app startup process so the data will be present when the feed is displayed to the user. The download can occur in parallel with other initialization tasks.
- Keep a priority queue of network requests. Don’t block foreground network requests on background network requests or the user won’t see the data they are currently interested in.
- Monitor for overfetching and excess consumption of device resources. Overfetching could fill up the disk or waste a lot of money on the user’s data plan.
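
One way to keep background prefetches from blocking foreground requests is a priority queue, sketched here with Python's `heapq`. The two priority levels, the class name, and the request names are hypothetical; the talk does not describe Facebook's actual implementation.

```python
import heapq
import itertools

FOREGROUND, BACKGROUND = 0, 1  # lower number is served first

class RequestQueue:
    """Priority queue of pending requests; FIFO among equal priorities."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserving insertion order

    def push(self, name, priority):
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def pop(self):
        _priority, _order, name = heapq.heappop(self._heap)
        return name

    def __len__(self):
        return len(self._heap)

queue = RequestQueue()
queue.push("prefetch-next-story", BACKGROUND)
queue.push("visible-feed", FOREGROUND)
queue.push("prefetch-profile-photo", BACKGROUND)
queue.push("visible-image", FOREGROUND)

order = [queue.pop() for _ in range(len(queue))]
```

Foreground requests drain first regardless of when they were queued, so the data the user is currently looking at is never stuck behind a prefetch.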

Miscellaneous

Client uploading to servers. The idea is to send fewer bytes over the wire from the client to the server, which means resizing images on the client side before they are sent to the server. If an upload fails retry quickly. It’s usually just a problem in the pipe.
There are 20 different APKs (Android application package) for the Facebook app, cut by API level, screen size, and processor architecture.

5 Psychological Strategies for Building a Winning Team Culture - Tech News | Techgig

5 Psychological Strategies for Building a Winning Team Culture

To be an effective leader of a team in the business world, you have to know yourself as well as the strengths, motivations, quirks and downfalls of those working for you. You cannot prompt members of your team to be their most effective through linear and one-way management. You have to be flexible and take risks yet be conservative when necessary. Most of all you must be someone the members of your team want to work for and impress. Here are five key strategies:

1. Know your own emotions.

To be an efficient leader, you need to know how to manage your emotions. Analyze yourself and identify what your trigger points are. These trigger points will teach you when to act on an emotion and when it's smarter to stay quiet.

Leading a team involves managing a matrix of conflict and balancing measures. You cannot lead effectively unless you can identify the emotions in yourself and members of your team. If you do not know your own emotions, you will not be able to effectively manage the emotions of others. Having understanding and empathy will guide you in how to effectively lead others as well as yourself. The first person to manage is you.

2. Use the mind to manage feelings.

To manage effectively, know that emotions are always more powerful than the mind. When even the most rational people are confronted with intense emotion, they lose the capacity to think straight. Influential leaders understand the power of feelings. They know it is the emotions and not the people they have to lead. Amid the turmoil of events, maintain your presence of mind as a leader.

To establish presence of mind, expose yourself to conflict and learn how to work through the emotions involved to reach a resolution. The more exposed you are to turmoil, the better you will be at seeing the full story and rising above the smallness involved with some emotion. You need to lead your team to see the big picture.

3. Understand the feelings of others.

Emotions follow logical patterns if you know how to examine them. They rise and they fall. When employees' negative emotions are triggered and at their peak, effective management cannot occur and little to no rational thinking can take place. Allow your employees time to calm down and regain composure before you step in.

Should emotions expressing excitement be involved, step in at the peak of their intensity and push hard. This move involves the art of looking beyond the present and calculating ahead. The passage of time can bring learning and presence of mind. The use of timing is a great strategy both for managing conflict and engineering movement.

4. Use emotion to move a team.

Different emotions help people's thinking in different ways. Learn to navigate your own feelings and spot emotions in employees by the signs and patterns that reveal hostility or excitement. Once you have these patterns in mind, be deliberate in how to use that emotion so employees become more deeply motivated.

In this way you can fill your employees with purpose and direction by offering rewards at the end of a win or the close of sale. To ensure continued motivation, be sure you follow through on the rewards offered. Otherwise you will have motivated them to the reward stage but instead built resentment when rewards don't follow as promised. Effective managers live by their word and walk their talk.

4. Channel emotions effectively.

The problem in leading any group is that people inevitably have their own agendas. You have to create an environment in which employees do not feel constrained by your influence yet follow your lead. Create a sense of participation, but do not fall for groupthink so that individual contributions are minimized.

Each person you manage will require something different from you. Motivate each individual in a specific way in an effort to make the whole team better. Lead each individual to do his or her best while encouraging the whole team to seek victory. Teach each person to focus on the goal, not to sweat the small stuff of others, and reward each person’s contribution to the whole.

5. Use sentiment to boost morale.

Emotions determine experience and perception. To proficiently manage members of a team, maintain morale by getting them to think less about themselves and more about the group. Involve them in a cause. Let them know how critical the closing of a certain deal will be and how it will affect the company as a whole. The critical elements for building morale are speed and adaptability, the ability to move and make decisions faster than the team and painting that big picture in such a way that employees want nothing more than to have their victory.

Break your team into independent groups of people who can operate on their own. Allow each person to become infused with the spirit of the deal or project at hand, giving that individual a mission to accomplish and then letting him or her run. In this way you transform a goal into a crusade.

Building a winning team culture means managing yourself and others intuitively and intelligently. This requires self-analysis and self-control. Self-analysis enables you to understand yourself and other people. The more a leader knows about emotion the better he or she will be able to guide, inspire and motivate others.

Mustali Kachwala's Blog
