
Friday, December 14, 2012

95th percentile bandwidth metering explained and analyzed | Semaphore Corporation

95th percentile bandwidth metering explained and analyzed

We frequently see a fair amount of confusion from new customers as to what 95th percentile bandwidth metering and billing is and how it works. While we have a nice text description of how it's calculated, I figured since it's such a prevalent billing method I'd explain it with a more visual representation, as well as provide some analysis of 95th percentile vs. other potential metering methods and why it is used at all. Hopefully our future posts here will be somewhat more technically interesting, but this topic is at least informative and answers a question we hear a lot.

What is 95th Percentile metering?

The short answer is that 95th percentile is a way to meter bandwidth usage that allows a customer to burst beyond their committed base rate, and still provides the carrier with the ability to scale their billing with the cost of the infrastructure and transit commits (if any). It's an alternative to either capped ports with fixed billing or billing on actual data transferred, models more frequently seen outside the datacenter where occasional bursting is either not allowed or penalized with higher bills.
Carriers sample the amount of data transferred on a customer's port(s) every 5 minutes and use that value to derive a data rate (typically in megabits per second, or Mbps) for that 5-minute interval. Over the course of a customer's monthly billing cycle, around 8000 of these samples are taken. These values are then sorted and ranked by percentile, and the value that falls on the 95th percentile will be the customer's bill for the month if it exceeds their base commit rate. The higher a customer's base commit rate, the lower their per-Mbps cost will be, allowing bulk purchasing of bandwidth as well as less volatility in the event their 95th percentile rate exceeds their base commit. For a fairly normal business traffic pattern, that provides a value that is fair to both the carrier and the customer in terms of service delivered to the customer and the ability of the carrier to scale its infrastructure to meet customer needs over time.
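The calculation described above can be sketched in a few lines. This is an illustrative Python sketch of the general method, not any particular carrier's billing code; the 5-minute interval and ~8000-sample month are the figures from the text.

```python
# Illustrative sketch of 95th percentile billing from 5-minute counters.
# Not any particular carrier's actual billing code.

def rate_mbps(bytes_transferred: int, interval_s: int = 300) -> float:
    """Convert one 5-minute byte-counter delta to megabits per second."""
    return bytes_transferred * 8 / interval_s / 1_000_000

def billable_95th(samples_mbps: list[float], commit_mbps: float) -> float:
    """Sort the month's ~8000 samples, discard the top 5%, and bill the
    highest remaining sample -- or the base commit if that's higher."""
    ranked = sorted(samples_mbps)
    idx = max(int(len(ranked) * 0.95) - 1, 0)  # 95% of samples fall at/below
    return max(ranked[idx], commit_mbps)

# A 100-sample month with rates 1..100 Mbps bills at the 95th value:
billable_95th([float(r) for r in range(1, 101)], commit_mbps=0.0)  # 95.0
```

Real billing systems differ in tie-breaking and exact index arithmetic, but the clip-the-top-5% shape is the same.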
That's all well and good, but without having a month's worth of data in front of you, it's hard to tell what exactly that means. I'll provide some samples of traffic patterns, beginning with a fairly typical one, as well as some more abnormal patterns later on, to explain both how 95th percentile works, as well as how it compares with other burstable and non-burstable methods.

95th percentile metering of normal business traffic

Normal Traffic Patterns - Original Data
The above chart shows a fairly normal month of usage for a business customer. Weekend usage is minimal, and weekday usage when smoothed follows a curve during normal business hours. Usage during off-peak hours is minimal, even during the week. The vast majority of customers fall into this pattern. You can also see short bursts throughout the day to nearly double the top of the daily curve. These bursts are what 95th percentile is designed to address.
Normal Traffic Patterns - Ranked Data
The chart above shows what the same month of traffic looks like when sorted and ranked. The 95th percentile falls around 6Mbps, or 60% of the highest burst. Above the 95th percentile, the rate increases rapidly, demonstrating the customer's ability to make use of their full available bandwidth on a momentary basis without penalty. (Remember, these are 5-minute samples, the momentary data rate for those bursts might have been as high as the 100Mbps Fast Ethernet line rate for a few seconds.) The curve above is a fairly typical distribution. 

Alternate metering methodologies

There are a couple of other common methodologies for bandwidth metering that are frequently seen in the Internet world. Each has its own application for specific types of services. I'll provide a brief outline as to how these methods work and where they are likely to be found.

Committed Information Rate (CIR)

CIR is a guarantee that the port will always have the bandwidth you're paying for available to it. This is similar to the base commit in 95th percentile metering as well. Traditional TDM circuits such as Frame Relay use this method for provisioning, so a customer always knows what they are going to get. Most customers are familiar with this model through home broadband services such as cable, DSL or home fiber connections. Somebody purchasing a 7Mbps/3Mbps DSL line will always have 7Mbps of download speed and 3Mbps of upload speed (in theory at least), and they will always pay for that rate. However, they are also not allowed to exceed that rate; it is a hard cap without any allowance for bursting. This allows the ISP to tightly control the amount of bandwidth entering and leaving their network, which is required on large broadband networks where a given segment is likely oversubscribed rather than overprovisioned.
These rates are controlled by the protocol or physical limitations of the circuit itself, or by shaping and policing the rates on the circuit, depending on the technology. Frame Relay (still used, but becoming less and less common) also allows for limited best-effort bursting. A Frame Relay circuit may have a CIR of 128kbps which is guaranteed to be available, and a PIR (Peak Information Rate) of 256kbps which allows for bursting up to double the CIR if excess bandwidth on the carrier network is available. Typically the PIR is no more than double the CIR and is often only available for customer use for a very limited time (either due to traffic control on the carrier network, or due to the bandwidth being unavailable).
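A toy model of the CIR/PIR behavior described above. The 128kbps/256kbps figures are the ones from the text; the `spare_capacity` flag is a made-up simplification of the carrier's best-effort bursting decision.

```python
# Toy CIR/PIR policer: traffic up to the CIR is guaranteed, traffic between
# CIR and PIR is best-effort (delivered only if spare capacity exists), and
# anything above the PIR is dropped. Rates are illustrative, in kbps.

def police(offered_kbps: float, cir: float = 128.0, pir: float = 256.0,
           spare_capacity: bool = True) -> float:
    """Return the delivered rate for one interval under a CIR/PIR model."""
    guaranteed = min(offered_kbps, cir)
    if not spare_capacity:
        return guaranteed               # only the committed rate is honored
    return min(offered_kbps, pir)       # burst allowed up to the PIR

police(200.0)                       # 200.0: burst delivered, capacity exists
police(300.0)                       # 256.0: clamped to the PIR
police(200.0, spare_capacity=False) # 128.0: only the CIR is guaranteed
```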
Because most carrier connectivity is now Ethernet, this model causes problems. Ethernet has no inherent ability to limit throughput at layer 2 like the above protocols, and the data rates typically used make shaping more expensive and management more complex. In order to avoid the cost and complexity of controlling the rate at layer 3, the carrier would only be able to provide services at 10Mbps, 100Mbps, 1Gbps or 10Gbps, which are the Ethernet protocol's physical line rates. Purchasing bandwidth only in these increments would be cost prohibitive and undesirable for most customers.

Actual throughput billing

Billing based on actual throughput is metered similarly to 95th percentile. However, rather than calculating a rate, the ISP will simply record how much data you moved over the circuit for that interval. This value is usually seen in megabytes or gigabytes. This model is typically seen in limited and shared bandwidth networks such as mobile data networks where overprovisioning is not possible due to limited spectrum availability. You also see this method frequently used by web and virtual hosting companies where a given site may not use much bandwidth on a per-second basis so a more granular unit is needed to bill the customer. This method also has the potential for the most volatility, as bursts are not smoothed in any way, they simply are invoiced. This is one of the significant customer advantages for 95th percentile metering vs actual throughput used.
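For contrast, actual-throughput billing over the same 5-minute counters might look like this minimal sketch. The included allowance and per-GB price are made-up numbers for illustration.

```python
# Total-transfer billing: the same 5-minute byte counters, but summed into
# gigabytes moved rather than converted to a rate. Every burst is invoiced;
# nothing is smoothed. Tier prices below are invented.

def total_gb(sample_bytes: list[int]) -> float:
    """Sum the month's byte counters into gigabytes transferred."""
    return sum(sample_bytes) / 1e9

def invoice(gb: float, included_gb: float = 100.0, per_gb: float = 0.10) -> float:
    """Flat allowance plus overage billing, as a hosting provider might do."""
    return max(gb - included_gb, 0.0) * per_gb

total_gb([500_000_000] * 4)  # 2.0 GB moved across four busy intervals
```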

Average data rate

This sounds like it should be a potential method for billing: take the same 5-minute data rates used for 95th percentile, and simply calculate the mean data rate for the month. This would certainly have the effect of smoothing out peaks, but it would require significant overprovisioning on the part of the carrier, even beyond what is currently typical. To the best of my knowledge, this method is not used for bandwidth metering anywhere. (I'll show why in a later section)
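A quick synthetic example shows why the mean misbehaves: a mostly idle month with heavy bursts barely moves the mean, even though the carrier still has to provision for the bursts. The sample counts below are invented for illustration.

```python
# Why the mean is a poor billing metric: a synthetic month of 8000 samples
# where 500 samples (just over 5%) burst to 10 Mbps over a 1 Mbps baseline.

def p95(samples: list[float]) -> float:
    ranked = sorted(samples)
    return ranked[int(len(ranked) * 0.95) - 1]

month = [1.0] * 7500 + [10.0] * 500
mean_rate = sum(month) / len(month)  # 1.5625 Mbps: barely registers bursts
burst_rate = p95(month)              # 10.0 Mbps: what must be provisioned
```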

Atypical traffic patterns

The pairs of charts below will show some less typical usage patterns and how they affect 95th percentile metering. We'll use these to perform some analysis on why 95th percentile is used for transit connections vs some other potential methods.

Sustained traffic patterns

Sustained Traffic Patterns - Original Data
Sustained Traffic Patterns - Ranked Data
This traffic pattern looks a good deal like the normal one, except that the average rates in its 24 hour cycle are fairly flat rather than curved. The slope of the sorted data is much more gradual and the area under the curve is greater on a relative scale. However, the bursts still occupy the top 5th percentile and will be clipped prior to billing. This pattern in particular demonstrates the advantage of 95th percentile vs actual throughput for a customer.

Bursty traffic patterns

Bursty Traffic Patterns - Original Data
Bursty Traffic Patterns - Ranked Data
In this traffic pattern, you can see short periods of extremely high bursting with minimal data rates outside these periods. This performance driven traffic pattern is seen for customers who need extremely high throughput but typically don't use it. Most carriers will require a minimum base commit rate for a customer attached to a Gigabit Ethernet port like the one above, so the 95th percentile measurement isn't all that relevant to this traffic pattern. This allows the carrier to provision their network to allow the large bursts required by the customer and still provide a reasonable compromise of price vs performance.

Sustained-burst traffic patterns

Sustained-burst Traffic Patterns - Original Data
Sustained-burst Traffic Patterns - Ranked Data
This unusual traffic pattern shows several long bursts at line rate (100Mbps), putting the peak mostly below the 95th percentile. This is effectively a capped service similar to a DSL model where the customer would commit to 100Mbps to receive the best per-Mbps pricing assuming this is a typical month.

Data Analysis

95th Percentile Data Table
As you can see from the data table above, the 95th percentile measurement smoothed the peaks out effectively in all but the last model. However, the ratio of actual data to 95th percentile data is fairly proportional in all 4 cases. As expected, the 95th percentile calculation is higher for more bursty traffic, but not unfairly so. However, if you compare the Mean Data Rate to 95th Percentile, the mean data rate ends up higher relative to 95th percentile for customers with smoother traffic patterns (more area under the curve). This is exactly the opposite behavior of what is desired.
Using this data, we can compare 95th percentile to the other potential models.
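The spirit of that data table can be reproduced on synthetic data. The two traffic patterns below are invented, not the article's actual chart values; `metrics` computes the three quantities being compared.

```python
# Synthetic sustained vs. bursty months (8000 five-minute samples each).
# Invented data for illustration, not the article's chart data.

def metrics(samples_mbps: list[float], interval_s: int = 300):
    """Return (95th percentile Mbps, mean Mbps, total GB moved)."""
    ranked = sorted(samples_mbps)
    p95 = ranked[int(len(ranked) * 0.95) - 1]
    mean = sum(samples_mbps) / len(samples_mbps)
    total_gb = sum(samples_mbps) * interval_s / 8 / 1000  # Mbps -> gigabytes
    return p95, mean, total_gb

sustained = [5.0] * 7600 + [9.0] * 400   # flat usage, short bursts clipped
bursty = [0.5] * 7000 + [90.0] * 1000    # mostly idle, heavy sustained bursts

# sustained: p95 = 5.0,  mean = 5.2     -> mean/p95 above 1
# bursty:    p95 = 90.0, mean = 11.6875 -> mean/p95 far below 1
```

As in the table, the bursty pattern's 95th percentile sits far above its mean, while the smooth pattern's mean actually exceeds its clipped 95th percentile, which is exactly the inversion the article describes.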

95th Percentile vs CIR

Advantages:
  • Bursting up to line rate is available; line rate can be many times the committed rate
  • Customers can pay for what they use (to some extent) rather than having a bandwidth cap applied
  • Carriers do not need to try to shape or police ethernet traffic down to the customer's commit rate which lowers the carrier's cost basis for infrastructure and management
  • 95th Percentile is industry standard for Tier 1 and Tier 2 settlement-based peering, transit and customer connections

Disadvantages:
  • 95th percentile is more volatile than CIR in a particularly bursty month, and your monthly invoice could swing dramatically. (Somewhat offset by base commit rates being correctly set)

95th Percentile vs Actual Usage

Advantages:
  • 95th percentile is a good deal less volatile than actual usage, since it is still an averaging method
  • The carrier has guaranteed minimum bandwidth commits that allow them to scale upstream capabilities more consistently
  • Bursty customers are billed more than sustained customers for the same data, which takes into account that the impact of bursty traffic is greater and more difficult to plan for
  • 95th Percentile is industry standard for Tier 1 and Tier 2 settlement-based peering, transit and customer connections

Disadvantages:
  • 95th percentile results in paying for a lot of idle bandwidth in most circumstances. (Although not as much as CIR)

Conclusions

None! I invite you to draw your own conclusions. =)
I hope the method by which 95th percentile is calculated is now a good deal clearer, and the ramifications of various types of traffic on the final number are more obvious.
The one thing I will mention as far as the comparisons between different models go, is that when you compare the three, 95th percentile strikes a balance between the other two methods. Many of the advantages of 95th over CIR are disadvantages against Actual Usage, and vice versa. By design, 95th percentile seeks to draw a compromise between scalability, cost and volatility for both the carrier and the customer. I think it does a fairly good job at that and that the above data bears that out, but no model is perfect.

Thursday, November 8, 2012

High Scalability - Spanner - It's About Programmers Building Apps Using SQL Semantics at NoSQL Scale

Spanner - It's About Programmers Building Apps Using SQL Semantics At NoSQL Scale

A lot of people seem to passionately dislike the term NewSQL, or pretty much any newly coined term for that matter, but after watching Alex Lloyd, a Senior Staff Software Engineer at Google, give a great talk on Building Spanner, that's the term that fits Spanner best.
Spanner wraps the SQL + transaction model of OldSQL around the reworked bones of a globally distributed NoSQL system. That seems NewSQL to me.

As Spanner is a not so distant cousin of BigTable, the NoSQL component should be no surprise. Spanner is charged with spanning millions of machines inside any number of geographically distributed datacenters. What is surprising is how OldSQL has been embraced. In an earlier 2011 talk given by Alex at the HotStorage conference, the reason for embracing OldSQL was the desire to make it easier and faster for programmers to build applications. The main ideas will seem quite familiar:
  • There’s a false dichotomy between little complicated databases and huge, scalable, simple ones. We can have features and scale them too.
  • Complexity is conserved, it goes somewhere, so if it’s not in the database it's pushed to developers.
  • Push complexity down the stack so developers can concentrate on building features, not databases, not infrastructure.
  • Keys for creating a fast-moving app team: ACID transactions; global Serializability; code a 1-step transaction, not 10-step workflows; write queries instead of code loops; joins; no user defined conflict resolution functions; standardized sync; pay as you go, get what you pay for predictable performance.
Spanner did not start out with the goal of becoming a NewSQL star. Spanner started as a BigTable clone, with a distributed file system metaphor. Then Spanner evolved into a global ProtocolBuf container. Eventually Spanner was pushed by internal Google customers to become more relational and application programmer friendly.
Apparently the use of Dremel inside Google had shown developers it was possible to have OLAP with SQL at scale and they wanted that same ease of use and time to market for their OLTP apps. It seems Google has a lot of applications to get out the door and programmers didn’t like dealing with real-world complexities of producing reliable products on top of an eventually consistent system. 
The trick was in figuring out how to make SQL work at truly huge scales. As an indicator of how deep we are still in the empirical phase of programming, that process has taken even Google over five years of development effort. Alex said the real work has actually been in building a complex, reliable distributed system. That's the hard part to get correct.
With all the talk about atomic clocks, etc., you might get the impression that there's magic in the system, that you can make huge cross-table, cross-datacenter transactions on millions of records with no penalty. That is not true. Spanner is an OLTP system. It uses two-phase commit, so long and large updates will still lock and block; programmers are still on the hook to get in and get out. The idea is that these restrictions are worth the programmer productivity, and any bottlenecks that do arise can be dealt with on a case-by-case basis. From the talk I get the feeling that, over time, specialized application domains like pub-sub will be brought within Spanner's domain. While the transaction side may be conventional, except for all the global repartitioning magic happening transparently under the covers, their timestamp approach to transactions does have a lot of cool capabilities on the read path.

As an illustration of the difficulties of scaling to a large number of replicas per Paxos group, Alex turned to a hydrology metaphor:
You could use a Spanner partition as a strongly ordered pub-sub scheme where you have read-only replicas all over the place of some partition and you are trying to use it to distribute some data in an ordered way to a lot of different datacenters. This creates different challenges. What if you are out of bandwidth to some subset of those datacenters? You don’t want data buffered in the leader too long. If you spill it to disk you don’t want to incur the seek penalty when bandwidth becomes available. It becomes like hydrology. You have all this data going to different places at different times and you want to keep all the flows moving smoothly under changing conditions. Smoothly means fewer server restarts, means better latency tail, means better programming model.
This was perhaps my favorite part of the talk. I just love the image of data flowing like water drops through millions of machines and networks, temporarily pooling in caverns of memory and disk, always splitting, always recombining, always in flux, always making progress, part of a great always flowing data cycle that never loses a drop. Just wonderful.
If you have an opportunity to watch the video I highly recommend that you do, it is really good, there’s very little fluff at all. The section on the use of clocks in distributed transactions is particularly well done. But, in case you are short on time, here’s a gloss of the talk:
  • 5 years of effort.
  • SQL semantics at NoSQL scale.
  • Trying to get an abstraction that looks like a single giant MySQL.
  • Relational databases are a very familiar and productive environment to build apps in.
  • Spanner is an existence proof that it is possible to scale a relational database into a globally distributed storage system.
  • Write your app without thinking about transaction semantics. Just get it right. Then you go back and optimize a few high transaction writes, where the optimization will really pay off.
  • Wanted to offer really straightforward semantics to app developers. App developers should be thinking about business logic and not concurrency.
  • The way they did this was by building clocks with bounded absolute error, then integrating them with timestamp assignment in concurrency control:
    • Total order of timestamps respects the partial order of transactions. If transaction A happens before transaction B we know transaction A’s timestamp is smaller than transaction B’s timestamp.
    • This implies efficient serializable queries over everything. You can add up, to the penny, your petabyte database that spans dozens of datacenters. It might take a while to fetch all the data, but you can expect your answer to be correct.
  • BigTable was an early NoSQL database. The paper came out in 2006.
  • MegaStore was built on top of BigTable. It added a Paxos synchronization layer and a richer data model. Paper came out in 2011.
  • Spanner in its low-level architecture looks a lot like BigTable. Updates are appended to logs that live in a datacenter-local distributed file system. Periodically they are compacted into immutable B-trees (SSTables), and periodically those SSTables are merged together. LevelDB is an open-source version.
  • Developers must still think about how to partition data for efficiency, but as much as possible developers should concentrate on business logic.
  • Goal was no down time for data repartitioning. Everything Google does is racing with data movement because moving users between datacenters and resharding is a continuous background activity. Because it is a continuous process all kinds of concurrency bugs kept arising. Transactions helped get their constant repartition stream logic right. Before they had transactions they had a lot of bugs, which is part of why it took 5 years.
  • Wanted programmers to be less bound by partitioning decisions they made early in the design process.
  • Wanted to capture the most successful part of Megastore:
    • Handle large scale datacenter outages without user visible impact.
    • Handle smaller outages, little micro outages internal to a cell. Examples: outage of underlying tablet because it was overloaded or corrupted, power went out on just one rack, etc.
    • Users might see a latency bump as a timer fires and they move over to a different datacenter, but they see no impact on the semantics of the transaction.
  • But Megastore had some problems:
    • The database is partitioned into a bunch of entity groups. Entity groups are their own transaction domain; you can't do transactions across entity groups.
    • It uses optimistic concurrency. When replicas are spread 50 msecs apart and you are doing synchronous replication the writes will take at least 50 msecs which creates a large vulnerability window for optimistic concurrency failures.
    • There are benefits to consolidating a layered system into a single integrated system: the interface between Spanner and the physical storage is much richer and more optimized than the interface between Megastore and Bigtable.
  • Cultural shift to SQL
    • The SQL-based analytics system Dremel made a lot of SQL converts at Google because of its power to push the semantics of your query down into the storage system and let it figure out what to do.
    • The ingrained culture was that you can have scale or you can have SQL. Dremel showed that for analytics you can have both. Spanner shows you can have both for OLTP.
  • Business concerns drove the requirement for easier geographic partitioning. For example, moving user data from one region to another.
    • Legal concerns
    • Product growth means you want to be as efficient as possible about how many users you put where
  • Spanner’s Data Model
    • Did not always have a relational model. Quite different than what Jeff Dean presented in 2009.
    • Example with a single relational table of Users, each user has a name and home region.
    • The User database can be divided into a couple partitions. One read-only partition in the US and another partition in Europe. A large database will have millions of partitions. Partition 1 will have three replicas in the US. The other partition has three replicas in Europe. There’s a read-only replica in the US. This is useful so you can have a write quorum in Europe which means you are never blocking on transatlantic RPC calls yet you can still query all the data, though it might be a bit old, in the US.
    • The master-detail hierarchy is physically clustered together. This matters more in a distributed system, where the odds are the records would otherwise be stored on some other server.
    • Underneath the relational abstraction is how programmers might hand-encode the keys into Bigtable. Each table cell is just an entry like:
        • Customer.ID.1.Name@11 -> Alice
        • Customer.ID.1.Name@10 -> Alize
        • Customer.ID.1.Order.ID.100.Product@20 -> Camera
        • Customer.ID.2.Name@5 -> Bob
      • The @10 part is the timestamp.
      • Orders for Alice will be stored with Alice before the Bob record begins.
  • Concurrency
    • At a high level Spanner uses a combination of two-phase locking and snapshot isolation.
    • They didn't try to create a crazy new model. The goal was to figure out how to scale proven models.
    • This model is best for read-dominated workloads. You spend most of your time in cheap snapshot isolation reads and not a lot of time on pessimistic transaction writes.
    • Blogger example of worrying about correctness first, optimization later.
      • Blogger has 280 servlets. Low-frequency, high-complexity operations: for example, a user creates a blog by sending a text message, then wants to merge that blog into an existing blog.
      • It took an embarrassing amount of time to create this as a series of idempotent operations orchestrated by an elaborate work flow.
      • With ACID transactions Blogger would have been faster to build. Time spent on these complicated programming tasks with no performance benefit could have gone into shaving 50 msecs off some high-frequency page, with a much bigger overall impact.
    • Same process as programming on a single machine. You start with mutexes and only then do you try atomic operations.
    • NoSQL databases that only enforce weak consistency are enforcing a broadly applied premature optimization on the entire system. It should be opt-in for the pages where it is worth it.
  • Preserving commit order on a pattern Google sees a lot
    • Rule of thumb they think about during the design:
      • If T1 finishes before T2 then they want to preserve that fact. There's a commit-order dependency of T2 on T1.
      • Say T3 wrote something T4 reads, so there’s a traditional data dependency, so T3 must always happen before T4.
      • T1 and T2 have no relationship with T3 and T4.
      • System performance comes from running transactions concurrently that don’t have dependencies.
      • Goal is to preserve the same dependency order as the original history.
      • Serializability is an overloaded term with a large number of variations.
      • Linearizability is an idea borrowed from concurrent programming and applies well to programming a distributed system on top of a distributed database.
        • Includes serializability and can’t commute commit order.
        • Even when there’s no detectable dependency, if a transaction happens before another that order must be preserved, even when it happens across different machines.
    • Example schema:
      • One partition is table of purchased ads. Write quorum in the US and a read-only replica in Europe.
      • One partition in the US of impressions of these ads. At 2:00 and 2:01 someone viewed a puppies ad, for example.
      • One partition in Europe for the impression of ads.
      • There’s a read-only replica of data from Europe in the US and vice versa. This allows both sides to have stale reads and fast writes without crossing the pond. You could still to a MapReduce efficiently on either side.
    • Example transactions:
      • Transaction 1: a user buys an ad.
      • In the background the ad serving system is continually scanning the partition for ads to show, sending a retrieve ads query.
      • The ad serving system accumulates over time a batch of impressions that it wants to save in the database.
      • Transaction 2: the ad server writes to the impressions partition to record the impressions.
      • These are two different partitions, different replicas, different datacenters, potentially different continents.
    • Now you want to write a SQL query to audit the impressions at the hour level. Pick a timestamp.
      • There are only three legal outcomes depending on the timestamp:
        • Sees neither the ad nor the impressions.
        • Sees the ad but no impressions.
        • Sees both the ad and all impressions.
      • In all the systems they are replacing, the MapReduce or the query has to tolerate an infinite variation of results.
        • It might see the impressions but not the ad.
        • Writing a query against such weak semantics means it's difficult to tell the difference between corruption, a bug, and a concurrency anomaly.
      • The way around this was to serialize every update through a single central server. Works if updates are located together, but does not work in a decentralized model where partitions are all over the world.
  • Options for scaling Spanner’s desired semantics in a global distributed database:
    • One partition model. Lots of WAN communication. Involve all partitions in every transaction.
    • Centralized timestamp oracle. Doesn’t work well if you have updates happening on two different continents at the same time.
    • Lamport clocks. Propagate the timestamp through every external system and protocol. This works if you have few enough systems and few enough protocols, doesn’t work so well when you have a huge legacy of different distributed systems or protocols that you don’t control, like with trading partners or they are just protocols you would rather not touch. Tried several times at Google but were never successful at threading timestamps all the way through a complicated system.
    • Build a distributed timestamp oracle. They had one of these already, TrueTime, produced from a general time cleanup at Google. Time comes with an epsilon, so you know the real time of when you made that now call is somewhere within that interval. It is derived from GPS receivers in a bunch of different datacenters, which are backed up by atomic clocks. The GPS system does sometimes have bugs; one code push took down a bunch of satellites, so the backup is useful.
  • TrueTime
    • The target invariant: with writes A and B, if A happens before B, meaning A finishes before B starts, then A should have a smaller timestamp than B. Finishes means anyone could see the effects: not just a client, but a Paxos slave too.
      • With this invariant it means you can say that snapshot reads at a particular timestamp are serializable.
    • TrueTime works analogously to celestial navigation, except that it has hard error bounds instead of just guessing.
    • A time daemon in every Google server, every server has a crystal, every datacenter has a few time masters with GPSs from different manufacturers for bug diversity, some of them have atomic clocks to crosscheck the GPSs.
    • Every 30 seconds the daemon talks to the time masters and gets a time fix. In between, it dead-reckons based on its own crystal. The server error margin widens over time; they picked 200 parts per million.
    • What time is it? Read the time on the local machine. Send GetTime to the time master. Back comes time t. Read the local server time again; you get some delta. Then you dilate that delta by whatever error margin you think is necessary, and back comes an epsilon. Now you can say the time is in [t, t+e]. Time is not earlier than t because there's a causal relationship from the time master reporting t to your receiving its response. But there's a lot of slop, because it's possible that message was sent epsilon ago; you don't know where in the round trip t was generated. The dominant error, before the clock starts drifting, is the round-trip time to the masters.
    • You can build other systems for distributing time besides GPS. For example, have a blinking LED and pulse a clock to all systems, or use a heartbeating system that periodically talks to a central server, where in between those heartbeats all of your servers think time is stationary. But GPS is there and it works. Atomic clocks are a convenient cross-check. And none of the hardware is that expensive.
  • Flow of a Paxos leader for a partition when it receives a commit request from a client
    • Receives a start commit.
    • Acquires transaction locks. Typical of any two-phase commit system: grab read and write locks to make sure conflicting transactions are not executed simultaneously.
    • Picks a timestamp at TrueTime's now.max, so the timestamp is guaranteed to be no earlier than the current true time.
    • In parallel do two things: run Paxos to get consensus on whatever the write was, wait until the true time is definitely past that write timestamp.
    • Typically Paxos will take longer than the wait so it adds no extra overhead.
    • Notify Paxos slaves to unlock.
    • Ack back to the client.
  • Why does this work? By waiting until the commit timestamp is in the past, we are pushing all future transactions toward bigger timestamps. The next transaction is guaranteed to have a timestamp bigger than the previous transaction's. Every transaction agrees to pick a timestamp that's bigger than its start point, and every transaction agrees to defer its commit until its own commit timestamp is in the past.
    • When Paxos commits quickly, or there is just one replica committing to local disk, the TrueTime epsilon can be large compared to what it would otherwise take to commit the write.
    • Happens when TrueTime epsilon has spiked, as in the case when TrueTime masters have gone down and you have to go to a more remote TrueTime master. Or when Paxos replicas are unusually close.
    • Real world epsilon bounces between 1-7ms.
    • Tail latency went down when networking was improved by sending time packets at a higher QoS. (An argument for SDN: carry control packets like time and Paxos at a higher QoS through the system.)
    • Reduce epsilon by polling time masters more often, poll at a high QoS, improve kernel handling, record timestamps in NIC driver, buy better oscillators, watch out for kernel bugs (power savings mode, which clock you are using).
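
The commit-wait step above can be sketched in a few lines of shell. This is a toy, single-machine illustration: EPS_MS is an assumed fixed error bound (the talk says the real epsilon bounces between 1-7 ms), and the local clock stands in for TrueTime; it also assumes GNU date (for %N) and GNU sleep (for fractional seconds).

```shell
# Toy commit wait. TT.now() is modeled as [now_ms - EPS_MS, now_ms + EPS_MS].
EPS_MS=7
now_ms() { echo $(( $(date +%s%N) / 1000000 )); }

# Pick a commit timestamp guaranteed greater than the current true time:
# TT.now().latest.
commit_ts=$(( $(now_ms) + EPS_MS ))

# Commit wait: block until TT.now().earliest > commit_ts, i.e. until the
# chosen timestamp is definitely in the past. In Spanner this wait overlaps
# with the Paxos round, which usually takes longer anyway.
while (( $(now_ms) - EPS_MS <= commit_ts )); do sleep 0.001; done
echo "commit timestamp $commit_ts is now in the past"
```

With EPS_MS=7 the wait is on the order of 2x epsilon (about 14 ms), which is why a Paxos round that takes longer than that hides the wait entirely.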

Read Path

  • Kinds of reads:
    • Within a read-modify-write transaction, reads look like they do under standard two-phase locking, except that they happen at the Paxos leader. Acquire a read lock.
    • Strong reads that are not part of a transaction, where a client will not write based on the reads. Spanner will pick a bigger timestamp and read at that timestamp.
    • Boundedly-stale reads, for when you just want to know your data is at most 5 or 10 seconds old. Spanner will pick the largest committed timestamp that falls within your staleness bound.
    • MapReduce / batch read - don’t care if data is fresh. Let the client pick a timestamp and say I want to know everything as of noon, for example.
  • Picking timestamps for strong reads:
    • Ask TrueTime what is a time bigger than now. You know that is also bigger than the commit timestamp of all previously committed transactions. Not always the best timestamp to pick because you want to maximize the replicas that are capable of serving the read at that timestamp.
    • Look at commit history. Pick a timestamp from recent writes.
    • Forced to declare the scope of the read up front. For example, these 5 users and this range of products. The idea of scope is above the idea of partitions.
    • Prepared distributed transactions: because a distributed transaction has to commit at the same timestamp on every partition, there is an uncertainty window in which you don't yet know what timestamp certain objects will commit at.
  • Principle for effective use:
    • Data locality is still important. Reading a bunch of stuff from one machine is better than reading from a bunch of machines. Put customers and orders in the same partition.
    • Big users will span partitions, but there’s no semantic impact at the transaction level when crossing partitions.
    • Design the app for correctness. Deal with the hundreds of nitty-gritty corner cases that just need to work but don't need to be that fast. For example, how many times a day do you change your Gmail filters?
    • Relax the semantics for carefully audited high-traffic queries. Maybe there are cases where a boundedly-stale read will do. The further you can read in the past the more replicas will be able to serve that read.
    • The default semantics in Spanner are linearizable.
    • Using flash backing would allow for millisecond writes.
  • First big user: F1
    • Migrated revenue-critical shared MySQL instances to Spanner. Big impact on Spanner data model. F1 needed a giant MySQL.
    • Spanner started out as Son of BigTable
      • Included a lot of forked BigTable code, with a distributed file system metaphor. You had a tree of directories and each directory was a unit of geographic placement. That doesn’t relate to someone building a database.
      • They added structured keys but in the end they were just building the next Megastore.
      • They decided Spanner needed a richer data model, deciding Spanner was a store for protocol buffers. Spanner would be a giant protocol buffer. This seemed reasonable, but again, it was not how users modeled data.
      • They decided F1 was a better model so they moved in the end to a relational data model. 
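
The timestamp choices for reads described earlier can be sketched the same way. Again a toy: EPS_MS is an assumed fixed TrueTime error bound, GNU date is assumed, and times are in milliseconds.

```shell
# Toy read-timestamp selection under a fixed TrueTime bound EPS_MS.
EPS_MS=7
now_ms() { echo $(( $(date +%s%N) / 1000000 )); }

# Strong read: TT.now().latest is bigger than the commit timestamp of every
# already-committed transaction. (Looking at recent commit history can give
# a smaller timestamp that more replicas are able to serve.)
strong_ts=$(( $(now_ms) + EPS_MS ))

# Boundedly-stale read, "at most 10 seconds old": any replica that has
# caught up past stale_ts can serve it.
bound_ms=10000
stale_ts=$(( $(now_ms) - bound_ms ))
```

The further stale_ts sits in the past, the more replicas qualify to serve the read, which is the trade-off the notes describe.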

What Are They Doing Now?

  • Polishing the SQL engine. A timestamp plus an iterator position is enough information to restart a cross-partition query. If a query will take a minute, and in that minute a load balancer moves a tablet you depend on to a different server, or that server is preempted because you are running in a shared cell and the preemption forces those tablets to move, the query does not have to be aborted and the work doesn't have to be tossed. Moving servers like this is hard to make work, but it's really valuable to users.
  • Fine-grained control over memory usage, so you don't create a distributed deadlock in which a bunch of servers all depend on each other: they all need memory to make progress, and they all need to make progress to free that memory.
  • Fine-grained CPU scheduling. Keep fast queries fast even when a big, hopelessly slow query comes in; the hopelessly slow query should be time-sliced so the fast queries stay fast. Keep latency tails in check.
  • Strong reads based on snapshot isolation are still pretty green. Those are entering production in incrementally more use cases. 
  • Scaling to large numbers of replicas per Paxos group. You could use a Spanner partition as a strongly ordered pub-sub scheme where you have read-only replicas all over the place of some partition and you are trying to use it to distribute some data in an ordered way to a lot of different datacenters. This creates different challenges. What if you are out of bandwidth to some subset of those datacenters? You don’t want data buffered in the leader too long. If you spill it to disk you don’t want to incur the seek penalty when bandwidth becomes available. It becomes like hydrology. You have all this data going to different places at different times and you want to keep all the flows moving smoothly under changing conditions. Smoothly means fewer server restarts, means better latency tail, means better programming model.


  • How do you prove there are no dependencies between transactions? Say someone emails some data to a person, and that person clicks on a link based on that data, causing a write in another datacenter. It's very hard to say there is no causal dependency between transactions in two different datacenters. And they want you to be able to relate the ordering of transactions across the whole system, on the assumption that there could be causal dependencies between them.
  • Room for eventual consistency and transaction models.
    • Mobile is a case where users update local caches and the data makes its way back to the servers, so you can't depend on transactions; you need some sort of eventual-consistency mechanism to merge changes and handle conflicts.
    • Google Docs' concurrent editing is another case. You have five windows open, each making local updates. Those updates bubble back to the server asynchronously; an operational-transform algebra for merging the updates is applied, and only then are the updates applied to the canonical copy of the database.

High Scalability - Gone Fishin': 10 Ways to Take your Site from One to One Million Users by Kevin Rose


Gone Fishin': 10 Ways To Take Your Site From One To One Million Users By Kevin Rose  

This is the post that got me kicked off my original shared hosting service and prompted the move to SquareSpace. I couldn't figure out why so many people were reading this article. But they kept on coming. The site went down and I was told to vamoose.  It finally dawned on me nobody actually cared about the article, it was the name Kevin Rose that was magic. Learned a good lesson about publishing biz...
At the Future of Web Apps conference Kevin Rose (Digg, Pownce, Wefollow) gave a cool presentation on the top 10 down and dirty ways you can grow your web app. He took the questions he's most often asked and turned it into a very informative talk.

This isn't the typical kind of scalability we cover on this site. There aren't any infrastructure and operations tips. But the reason we care about scalability is to support users and Kevin has a lot of good techniques to help your user base bloom.

Here's a summary of the 10 ways to grow your consumer web application:

1. Ego. Ask: does this feature increase the user's self-worth or stroke their ego? What emotional and visible rewards will a user receive for contributing to your site? Are they gaining reputation and badges? Can they showcase what they've done in the community? Sites that have done it well:

Followers. Followers turn every single celebrity into a spokesperson for your service. Celebrities continually pimp your service in the hopes of getting more followers. It's an amazing self-reinforcing traffic generator. Why do followers work? Twitter communication is one-way. It's simple. Followers don't have to be approved and there aren't complicated permission schemes about who can see what. It means something for people to increase their follower count. It becomes a contest to see who can have more. So even spam followers are valuable to users, as they help them win the game.

Leader boards. Leader boards show the score for a user activity. In Digg it was based on the number of articles submitted. Encourage people to have a competition and do work inside the Digg ecosystem. Everyone wants to see their name in lights.

Highlight users. Users who submitted stories were rewarded by having their name in a larger font and a friending icon put beside their story submission. Users liked this.

2. Simplicity. Simplicity is the key. A lot of people overbuild features; don't. Release something and see what users actually do with it. Pick 2-3 things on your site and do them extremely well. Focus on those 2-3 things. Always ask if there's anything you can take out of a feature. Make it lighter, cleaner, and easier to understand and use.

3. Build and Release. Stop thinking you understand your users. You think users will love this or that and you'll probably be wrong. So don't spend 6 months building features users may not love or will only use 20% of. Learn from what users actually do on your site. Avoid analysis paralysis, especially as you get larger. Decide, build, release, get feedback, iterate.

4. Hack the Press. There are techniques you can use that will get you more publicity.

Invite-only system. Get press by creating an invite-only system. Have a limited number of invites and seed them with bloggers. Get the buzz going. Give each user a limited number of invites (4 or 5). It gets bloggers talking about your service. The mainstream press calls and you say you are not ready; this amps the hype cycle. Make new features login-only (accessible only if you log in) but visible and marked beta on the site. This increases the number of registered users.

Talk to junior bloggers. On TechCrunch, for example, find the most junior blogger and pitch them. It's more likely you'll get covered.

Attend parties for events you can't afford.  You can go to the after parties for events you can't afford. Figure out who you want to talk to. Follow their twitter accounts and see where they are going.

Have a demo in hand. People won't understand your great vision without a demo. Bring an iPhone or laptop to showcase the demo. Keep the demo short, 30-60 seconds. Say: Hey, I just need 30 seconds of your time, it's really cool, and here's why I think you'll like it. Slant it toward what they do or what they cover.

5. Connect with your community.

Start a podcast. A big driver in the early days of Digg. Influencers will listen and they are the heart of your ecosystem.

Throw a launch party, plus yearly and quarterly events. Personally invite influencers and their friends. Just have a party at a bar. Throw them around conferences, as people are already there.

Engage and interact with your community.

Don't visually punish users. Often users don't understand that their behaviour is bad; they think they are just playing the game your system sets up. Walk them through the positive behaviours you want to reinforce on the site.

6. Advisors. Have a strong group of advisors. Think about which technical, marketing and other problems you'll have and seek out people to help you. Give them stock compensation. A strong advisory team helps with VCs.

7. Leverage your user base to spread the word.

FarmVille tells users when other players have helped them and asks the player to repay the favor. This gets players back into the system using a social-obligation hack. They also require having a certain number of friends before you can expand your farm. They give away rare prizes.

Wefollow tweets hashtags when people follow someone else, which further publicizes the system. They also ask a new user hitting the system if they want to be added to the directory, telling the user that X hundred thousand of their closest friends have already added themselves. This is the number one way they get new users.

8. Provide value for third-party sites. The Wall Street Journal, for example, puts FriendFeed, Twitter, etc. links on every page because they think it adds value to their site. Is there some way you can provide value like that?

9. Analyze your traffic. Install Google Analytics. See where people are entering from, where they are going, where they are exiting from, and how you can improve those pages.

10. The entire picture. Step back and look at the entire picture. Look at users who are creating quality content. Quality content drives more traffic to your site. Traffic going out of your site encourages other sites to add buttons to your site, which encourages more users and more traffic into your site. It's a circle of life. Look at how your whole ecosystem is doing.

Tuesday, November 6, 2012

Port Forwarding Using iptables | Linux and Virtualization


Port Forwarding Using iptables

By: Zhiqiang Ma In: Linux
Port forwarding is simple to do with iptables on a Linux box, which may well already be in use as the firewall or as part of the gateway operation. In the Linux kernel, port forwarding is achieved by packet-filter rules in iptables.

Port forwarding

Port forwarding, also called "port mapping," commonly refers to a network address translator gateway changing the destination address and/or port of a packet so that it reaches a host within a masqueraded, typically private, network.
Port forwarding can be used to allow remote computers (e.g., public machines on the Internet) to connect to a specific computer within a private network such as a local area network (LAN), so that external hosts can communicate with services provided by hosts within the LAN. For example, running a public HTTP server (port 80) on a host within a private LAN, or permitting secure shell (ssh, port 22) access to hosts within the private LAN from the Internet.
On a Unix/Linux box, where ports below 1024 can only be listened on by software running as root, port forwarding is also used to redirect incoming traffic from a low-numbered port to software listening on a higher port. That software can then run as a normal user, which avoids the security risk of running as root.
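
For example, a single nat rule can bounce local inbound traffic from a privileged port to an unprivileged one on the same host (the port numbers are illustrative; installing the rule itself still requires root):

```shell
# Redirect inbound port 80 to port 8080, where an unprivileged process listens:
iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 8080
```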


iptables is a very powerful firewall which handles packets based on the type of packet activity and enqueues each packet in one of its built-in 'tables'. On a Linux box, iptables is implemented in the kernel as a set of kernel modules.
There are three tables in total: mangle, filter and nat. The mangle table is responsible for altering the Type of Service (TOS) bits in the IP header. The filter table is responsible for packet filtering. The nat table performs Network Address Translation (NAT). Each table may have some built-in chains in which firewall policy rules can be placed.
The filter table has three built-in chains:
* Forward chain: Filters packets destined for networks protected by the firewall.
* Input chain: Filters packets destined for the firewall.
* Output chain: Filters packets originating from the firewall.
The nat table has the following built-in chains:
* Pre-routing chain: NATs packets when the destination address of the packet needs to be changed.
* Post-routing chain: NATs packets when the source address of the packet needs to be changed.
* Output chain: NATs packets originating from the firewall.
Below is a brief view of how packets are processed by the chains:
 --> PREROUTING ----> routing ----> FORWARD ----> POSTROUTING -->
      - nat (dst)        |           - filter      - nat (src)
                         |                             ^
                       INPUT                           |
                        - filter                    OUTPUT
                         |                           - nat (dst)
                    local process ------------------> - filter
Note: if the packet is from the firewall, it will not go through the PREROUTING chain.
Here we look only at the packets that require port forwarding, which is the topic of this post.
A packet entering the firewall is inspected by the rules in the nat table's PREROUTING chain to see whether it requires destination modification (DNAT). The packet is then routed by the Linux router after leaving the PREROUTING chain. A packet destined for a "protected" network is filtered by the rules in the FORWARD chain of the filter table. Then the packet undergoes SNAT in the POSTROUTING chain before arriving at the "protected" network. When the destination server replies, the packet goes through the same sequence of steps in the reverse direction.

Port forwarding using iptables

A port-forwarded packet will pass the PREROUTING chain in nat table, FORWARD chain in filter table, POSTROUTING chain in nat table and other chains. We need to add rules to these chains.
Let's use a scenario to introduce how to configure iptables for port forwarding. Suppose our gateway connects to both the Internet and the LAN: the gateway's eth0 interface has a public IP, while eth1 has a LAN IP. Now suppose we have set up an HTTP server on a host in the LAN and we want to provide service to the Internet through the public IP. We need to configure iptables to forward packets arriving on port 80 of the public address to port 8080 of the internal server.
Below is the network topology:
Normally we deny all incoming connections to a gateway machine by default because opening up all services and ports could be a security risk. We will only open the ports for the services that we will use. In this example, we will open port 80 for HTTP service.
These are the rules to forward connections on port 80 of the gateway to the internal machine:
# iptables -A PREROUTING -t nat -i eth0 -p tcp --dport 80 -j DNAT --to
# iptables -A FORWARD -p tcp -d --dport 8080 -j ACCEPT
These two rules are straightforward. The first specifies that all incoming TCP connections to port 80 should be sent to port 8080 of the internal machine. On its own this rule doesn't complete the job, since, as described above, we deny all incoming connections by default. So the second rule accepts the incoming connections arriving on eth0, the interface that connects to the Internet with the public IP. As the processing path in the "iptables" section shows, the packet also passes through the FORWARD chain, so we add the second rule to the FORWARD chain to allow forwarding the packets to port 8080 of the internal machine.
By now, we have set up the iptables rules for forwarding port 80. For other services, the method is similar to the HTTP case.
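
Putting the whole scenario together, with stand-in addresses, since the post's original IPs did not survive (assume the gateway's public IP is on eth0 and the web server is on eth1):

```shell
# Allow the kernel to forward packets at all (off by default on most distros):
sysctl -w net.ipv4.ip_forward=1

# 1. Rewrite the destination of web traffic hitting the public IP
#    (stand-in: so it goes to the internal server (stand-in):
iptables -t nat -A PREROUTING -i eth0 -p tcp -d --dport 80 \
         -j DNAT --to-destination

# 2. Allow the rewritten packets through the filter table's FORWARD chain:
iptables -A FORWARD -p tcp -d --dport 8080 -j ACCEPT

# 3. If LAN hosts also reach the Internet through this gateway, SNAT their
#    outbound traffic here; conntrack handles the reply direction:
iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
```

The addresses come from the RFC 5737 documentation range and RFC 1918 private space; substitute your own public and LAN IPs.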

The conntrack entries

The "nf_conntrack_*" kernel modules enable iptables to examine the status of connections by caching the related information for them. A cat of /proc/net/nf_conntrack (in some older Linux kernels, the file is /proc/net/ip_conntrack) gives a list of all the current entries in the conntrack database.
A conntrack entry looks like this:
ipv4     2 tcp      6 431581 ESTABLISHED
src= dst= sport=53867 dport=80 packets=22 bytes=13861
src= dst= sport=8080 dport=53867 packets=14 bytes=3535
[ASSURED] mark=0 secmark=0 use=2
This entry contains all the information that the conntrack module maintains to know the state of a specific connection. First come the layer-3 protocol (ipv4) and its decimal code, then the transport protocol (tcp) and its decimal code. After this, we get how long this conntrack entry has left to live, and next the actual state the entry is in at this point in time. Then we get the source IP address, destination IP address, source port and destination port. After that, we get the IPs and ports we expect on the return packets.
In this entry we can find that the arriving connection is: -->
while the returning connection is: -->
which reflects the port forwarding which we have set.

Friday, November 2, 2012

PHP PCNTL in Debian/Ubuntu PHP-installation | Triple-networks


Process Control support in PHP implements the Unix style of process creation, program execution, signal handling and process termination.
Check the complete introduction to Process Control. Here is how to enable Process Control in a standard PHP installation on Debian/Ubuntu:

$ mkdir /tmp/phpsource
$ cd /tmp/phpsource
$ apt-get source php5
$ cd /tmp/phpsource/php5-*/ext/pcntl
$ phpize
$ ./configure
$ make
# then copy your module to php5 module-lib path (in my case:)
# and create an .ini-file to enable the module for sapi after graceful restart.

$ cp /tmp/phpsource/php5-*/ext/pcntl/modules/ /usr/lib/php5/20090626/
$ echo "" > /etc/php5/conf.d/pcntl.ini

Howto: Enable PCNTL in Ubuntu PHP installations « crimulus-dot-com


PCNTL in PHP allows for some handy advanced “trickery” using the OS process functions inherent in Linux (*nix?).  I believe some features are available in Windows, but I know for certain that pcntl_fork() is not.
Anyway, it is not enabled by default, so if you want to take advantage of the functions on your Ubuntu LAMP server, you might spend hours searching the web for that magic aptitude command.  But, as far as I can tell, it doesn’t exist.
Luckily, I stumbled across this article on the Ubuntu forums, so I'm dedicating a post here with the hope that others will find it more easily.
Please note that you’ll probably need build-essentials and a few other source compilation basics, but as long as you have that, the following code will get you what you want.
First, in your home directory:
mkdir php
cd php
apt-get source php5
cd php5-(WHATEVER_RELEASE)/ext/pcntl
phpize
./configure
make
cp modules/ /usr/lib/php5/WHEREVER_YOUR_SO_FILES_ARE/
echo "" > /etc/php5/conf.d/pcntl.ini
FYI: “make install” does not appear to put the files in the correct place.
Btw, please direct any thanks/praise to skout23 on the Ubuntu forums.

Problems with campaign targeting and banner clicks


We have investigated the targeting issue and this is indeed a problem inside the script, which made the Publication targeting not function correctly for campaigns. 

This issue has now been fixed, and a new version of the script is online. All you have to do to fix the issue is replace the following file: 


Simply download the new ZIP, grab this file and replace it on your server. Publication targeting will then work as it should. 

php - Max size of URL parameters in _GET - Stack Overflow


Please note that PHP setups with the suhosin patch installed have a default limit of 512 characters for GET parameters. Although long URLs are bad practice, most browsers (including IE) support URLs up to around 2000 characters, while Apache has a default limit of 8000.
To add support for longer parameters with suhosin, set suhosin.get.max_value_length = <limit> in php.ini.
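
For instance, a conf.d snippet that matches the ~2000-character browser limit mentioned above (the exact value is illustrative, pick one that fits your application):

```
; /etc/php5/conf.d/suhosin-get.ini - raise suhosin's limit on GET values
suhosin.get.max_value_length = 2048
```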

Does mAdserve detect fraudulent clicks?



mAdserve is a Publisher-centric ad server, and not a fully featured ad network. Since a Publisher would not fake his own traffic, mAdserve does not have sophisticated fraud detection features built in. If you intend to build your own mobile ad network based on mAdserve, please feel free to add such a feature.

However, there is one feature you can activate in mAdserve if you'd like to track only unique clicks. In order to activate this feature,

1. Open config_variables.php in the main directory of mAdserve
2. Change the MAD_TRACK_UNIQUE_CLICKS variable to TRUE

Once activated, mAdserve will only track unique clicks. For example, if a user clicks on the same ad multiple times, mAdserve will track only one click.

* Please note that this feature works only if you have also enabled caching on your mAdserve ad server. You can enable caching by changing the MAD_ENABLE_CACHE variable to TRUE in config_variables.php

Make sure that the /data/cache folder is writeable if you want to enable the file-based cache.


Detect Mobile Browsers - Mobile User Agent Detection


Server-Side Device Detection: History, Benefits And How-To


The expansion of the Web from the PC to devices such as mobile phones, tablets and TVs demands a new approach to publishing content. Your customers are now interacting with your website on countless different devices, whether you know it or not.
As we progress into this new age of the Web, a brand’s Web user experience on multiple devices is an increasingly important part of its interaction with customers. A good multi-device Web experience is fast becoming the new default. If your business has a Web presence, then you need fine-grained control over this experience and the ability to map your business requirements to the interactions that people have with your website.
Drawing on the work of people behind the leading solutions on the market, we’ll discuss a useful tool in one’s armory for addressing this problem: server-side device detection.

The History Of Server-Side Device Detection

First-generation mobile devices had no DOM and little or no CSS, JavaScript or ability to reflow content. In fact, the phone browsers of the early 2000s were so limited that the maximum HTML size many of them could handle was well under 10 KB; many didn't support images, and so on. Pages that didn't cater to a device's particular capabilities would often cause the browser or even the entire phone to crash. In this very limited environment, server-side device detection ("device detection" henceforth) was literally the only way to safely publish mobile Web content, because each device had restrictions and bugs that had to be worked around. Thus, from the very earliest days of the mobile Web, device detection was a vital part of any developer's toolkit.
With device detection, the HTTP headers that browsers send as part of every request they make are examined and are usually sufficient to uniquely identify the browser or model and, hence, its properties. The most important HTTP header used for this purpose is the user-agent header. The designers of the HTTP protocol anticipated the need to serve content to user agents with different capabilities and specifically set up the user-agent header as a means to do this; namely, RFC 1945 (HTTP 1.0) and RFC 2616 (HTTP 1.1). Device-detection solutions use various pattern-matching techniques to map these headers to data stores of devices and properties.
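
As a crude illustration of the idea only: real device-detection solutions match user-agent headers against large, curated device databases, not a handful of substrings, and both the sample user-agent string and the patterns below are assumptions for the sketch.

```shell
# Toy user-agent classifier: map a UA header to a coarse device class.
ua="Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X) AppleWebKit/536.26"

case "$ua" in
  *iPad*|*Tablet*)           device=tablet ;;   # check tablets before phones
  *iPhone*|*Android*|*Mobi*) device=mobile ;;
  *)                         device=desktop ;;
esac
echo "$device"
```

A server would branch on the result to choose a template or resource set; a real system would also consult the device database for properties like screen size and supported media formats.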
With the advent of smartphones, a few things have changed, and many of the device limitations described above have been overcome. This has allowed developers to take shortcuts and create full-fledged websites that partially adapt themselves to mobile devices, using client-side adaptation.
This has sparked the idea that client-side adaptation could ultimately make device detection unnecessary. The concept of a “one-size-fits-all” website is very romantic and seductive, thanks to the potential of JavaScript to make the device-fragmentation problem disappear. The prospect of not having to invest in a device-detection framework makes client-side adaptation appealing to CFOs also. However, we strongly believe that reality is not quite that simple.

The Importance Of Device Detection

Creating mobile websites with a pure client-side approach, including techniques such as progressive enhancement and responsive Web design (RWD), often takes one only so far. Admittedly, this might be far enough for companies that want to minimize the cost of development, but is it what every company really wants? From our long experience in the mobile space, we know one thing for sure: companies want control over the user experience. No two business models are alike, and slightly different requirements could affect the way a website and its content are managed by its stakeholders. That’s where client-side solutions often fall short.
We will look more deeply at the main issue, one that engineers and project managers alike will recognize: no matter how clever they are, usually “one-size-fits-all” approaches cannot really address the requirements of a large brand’s Web offering.


The first point to note is that not all devices support modern HTML5 and JavaScript features equally, so some of the problems from the early days of the mobile Web still exist. Client-side feature detection can help work around this issue, but many browsers still return false positives for such tests.
The other problem is that a website might look OK to a user on a high-end device with great bandwidth, but in the process of creating it, several bridges have been burned and very little room has been left for the adjustments, backtracking and corrections brought about by the company’s always-evolving business model (let alone the advent of a new class of HTTP clients on the market). The maintenance of responsive Web sites is difficult. As an analogy, an engineer who deals with responsive Web design is like a funambulist, up in the air, on a long wire, with a pole in their hands for balance, standing on one leg, with a pile of dishes on their head. They might have performed well so far, but there is little else that one can reasonably ask them to do at this point.
Device-detection engineers are in a luckier situation. Each one of them has a lot more freedom to deliver the final result to the various target platforms, be it desktop, mobile or tablet. They are also better positioned to address unexpected events and changes in requirements. Extending the analogy, device-detection engineers have chosen different ways (than the suspended wire) to reach their destinations. One person is driving, another is on a train, yet another is biking, and a fourth is on a helicopter. The one with the car decided to stop at a motel for the night and rest a bit. The one in the helicopter was asked to bring the CEO to Vegas for a couple of days. No big deal; those changes were unexpected, but the situation is still under control.
No analogy is perfect, but this one is pretty close. Device detection might seem to cost significantly more than responsive design, but device detection is what gives a company control. When the business model changes, device detection enables a company to shuffle quite a few things around and still avoid panic mode. One way to look at it is that device detection is one application of the popular divide-and-conquer strategy. On the other hand, RWD might be very limiting (particularly if the designer who has mastered all of that magic happens to have changed jobs).
While many designers embrace the flexible nature of the Web, with device detection, you can fine-tune the experience to exactly match the requirements of the user and the device they are using. This is often the main argument for device detection — it enables you to deliver a small contained experience to feature phones, a rich JavaScript-enhanced solution to smartphones and a lean-back experience to TVs, all from the same URL.
In my opinion, no other technique has this expressive range today. This is the reason why Facebook, Google, eBay, Yahoo, Netflix and many other major Internet brands use device detection. I’ve written about how Facebook and Google deliver their mobile Web experiences over on mobiForge.


Device detection has some other key attributes in addition to flexibility and control:
  • Rendering speed
    Because device-detection solutions can prepare just the right package of HTML and resources on the server, the browser is left to do much less work in rendering the page. This can make a big difference to the end-user experience and evidently is part of the reason why Twitter recently abandoned its client-side rendering approach in favor of a server-side model.
  • Efficiency
    Device-detection solutions allow the developer to send only the content required to the requesting browser, making the process as efficient as possible. Remember that even with a 3G or Wi-Fi connection, the effective bandwidth available to the user can be far lower than the nominal rate; ask anyone who has used airport Wi-Fi or congested cellular data on their mobile device. This can make the difference between a page that loads in a couple of seconds and one that never finishes loading, causing the user to abandon the website in frustration.
  • Choice of programming language
    You can implement the adaptation logic in whatever programming language you want, instead of being limited to just JavaScript.
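To make the last point concrete, here is a deliberately crude sketch of adaptation logic expressed in an arbitrary server-side language (JavaScript here rather than the article's PHP, purely for illustration). Real frameworks such as WURFL and DeviceAtlas consult large device databases rather than regexes like these, and the helper name `classifySegment` is hypothetical:

```javascript
// Hypothetical sketch: naive segment classification from a User-Agent string.
// Real device-detection frameworks use curated device databases, not regexes;
// this only illustrates that adaptation logic can live in any server language.
function classifySegment(userAgent) {
  const ua = userAgent.toLowerCase();
  if (/ipad|tablet/.test(ua)) return "tablet";
  if (/mobile|iphone|android/.test(ua)) return "smartphone";
  return "desktop";
}
```

A regex approach like this misclassifies many devices (Android tablets, for one), which is exactly why the database-driven frameworks discussed below exist.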


Another area where device detection has an advantage is in fine-tuning media formats and other resources delivered to the device in question. A good device-detection system is able to inform you of the exact media types supported by the given device, from simple image formats such as PNG, JPEG and SVG to more advanced media formats involving video codecs and bit rates.


Many client-side JavaScript libraries have recently been released that enable developers to determine certain properties of the browser with a simple JavaScript API. The best known of these libraries is undoubtedly Modernizr. These feature tests are often coupled with “polyfills” that replace missing browser capabilities with JavaScript substitutes. These client-side feature-detection libraries are very useful and are often combined with server-side techniques to give the best of both worlds, but differences in approach limit their usefulness:
  • They detect only browser features and cannot determine the physical nature of the underlying device. In many cases, browser features are all that is required, but if, for example, you wish to supply a deep link to an app download for a particular Android OS version, a feature detection library typically cannot tell you what you need to know.
  • The browser features are available only after the DOM has loaded and the tests have run, by which time it is too late to make major changes to the page’s content. For this reason, client-side detection is mostly used to tweak visual layouts, rather than make substantive changes to content and interactions. That said, the features determined via client-side detection can be stored in a cookie and used on subsequent pages to make more substantive changes.
  • While some browser properties can be queried via JavaScript, many browsers still return false positives for certain tests, causing incorrect decisions to be made.
  • Some properties are not available at all via JavaScript; for example, whether a device is a phone, tablet or desktop, its model name, vendor name and maximum HTML size, whether it supports click-to-call, and so on.
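The cookie hand-off mentioned above (storing client-side test results for the server to use on subsequent pages) can be sketched as follows. This is a hypothetical illustration: the `buildFeatureCookie` helper and its cookie format are made up, not part of Modernizr or any other library:

```javascript
// Hypothetical sketch of the cookie hand-off: results of client-side feature
// tests (e.g. from Modernizr) are serialized into a cookie so that the server
// can adapt subsequent pages without re-running the tests.
function buildFeatureCookie(features) {
  // e.g. { touch: true, svg: false } -> "features=touch:1|svg:0"
  const body = Object.entries(features)
    .map(([name, supported]) => `${name}:${supported ? 1 : 0}`)
    .join("|");
  return `features=${body}`;
}

// In a browser one would then assign:
//   document.cookie = buildFeatureCookie({ touch: Modernizr.touch, svg: Modernizr.svg });
```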
While server-side detection does impose some load on the server, it is typically negligible compared to the cost of serving the page and its resources. Locally deployed server-side solutions can typically manage well in excess of tens of thousands of recognitions per second, and far higher for Apache and NGINX module variants that are based on C++. Cloud-based solutions make a query over the network for each new device they see, but they cache the resulting information for a configurable period afterward to reduce latency and network overhead.
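The caching strategy just described can be sketched as a small TTL cache keyed by user-agent. The factory name and TTL handling below are illustrative assumptions, not any vendor's API:

```javascript
// Hypothetical sketch: cache cloud device-lookup results for a configurable
// TTL so repeat requests from the same user-agent skip the network round-trip.
// The clock is injectable so the expiry logic can be tested deterministically.
function makeDeviceCache(ttlMs, now = Date.now) {
  const entries = new Map(); // userAgent -> { data, expires }
  return {
    get(userAgent) {
      const hit = entries.get(userAgent);
      return hit && hit.expires > now() ? hit.data : undefined;
    },
    set(userAgent, data) {
      entries.set(userAgent, { data, expires: now() + ttlMs });
    },
  };
}
```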
The main disadvantages of server-side detection are the following:
  • The device databases that they utilize need to be updated frequently. Vendors usually make this as easy as possible by providing sample cron scripts to download daily updates.
  • Device databases might get out of date or contain bad information, causing the wrong data to be delivered for a specific range of devices.
  • Device-detection solutions are typically commercial products that must be licensed.
  • Not all solutions allow for personalization of the data stored for each device.

Time For A Practical Example

Having covered all of that, it’s time for a practical example. Let’s say that a company wishes to provide an optimized user experience for a variety of devices: desktop, tablets and mobile devices. Let’s also say that the company splits mobile devices into two categories: smartphone and legacy devices. This categorization is dictated by the company’s business model, which requires optimal placement of advertising banners on the different platforms. This breakdown of “views” based on the type of HTTP client is often referred to as “segmentation.” In this example, we have four segments: desktop, tablets, smartphones and legacy mobile devices.
Some notes:
  • The green boxes below represent the advertising banners. Obviously, the company expects them to be positioned differently for each segment. Ads are typically delivered by different ad networks (say, Google AdSense for the Web and for tablets, and a mobile-specific ad network for mobile).
  • In addition to banners, the content and content distribution will vary wildly. Not all desktop Web content is equally relevant for mobile, so a less cluttered navigation model would make more sense.
  • We want certain content to be available in the form of a carousel (i.e. slideshow) for tablets, but not other devices.
  • Mobile should be broken down into smartphone and legacy devices. (Some companies want to stick to smartphones exclusively, whereas supporting the long tail of legacy devices would make more business sense for other companies, such as this one.)
Structure of the desktop website.
Structure of the tablet website.
On the left: structure of the smartphone website. On the right: structure of the legacy-device website.
We will show how the segmentation above can be achieved through device detection. But first, a note about the main device-detection frameworks out there.

Server-Side Detection Frameworks

Device-detection frameworks are essentially built around two components: a database of device information and an API that lets one map an HTTP request or other device ID to the list of its properties. The main device-detection frameworks out there are WURFL by ScientiaMobile and DeviceAtlas by dotMobi. Another player in this area is DetectRight.
The code snippets below refer to the WURFL and DeviceAtlas APIs and show a rough outline of how an application could be made to route HTTP requests in different directions to support each segment.
The two frameworks are available either as a standalone self-hosted option or in the cloud. The pseudo-code below is inspired by the frameworks’ respective cloud APIs and should be easily adaptable by PHP programmers as well as programmers on other platforms.
A note about dispatching HTTP requests: The concept of “dispatching” a URL request is familiar to most Java and .NET programmers because most popular frameworks rely on it. Basically, it is about “assigning” an HTTP request to an appropriate “handler” behind the scenes, without implications for the URL that triggered all of the action. While not always immediately obvious, the same approach is also possible in all other Web programming languages. In PHP, it’s about having a “controller” PHP file invoke a different PHP file to handle the request.
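The dispatching idea itself is language-neutral. As a sketch (in JavaScript for illustration; the segment labels mirror the article's example, and the handler bodies are hypothetical placeholders), it boils down to a lookup table of per-segment handlers behind a single entry point:

```javascript
// Hypothetical front-controller sketch: one entry URL, with the request
// handed internally to a per-segment handler. The URL the user sees never
// changes; only the handler invoked behind the scenes does.
const views = {
  desktop:    () => "desktop view",
  tablet:     () => "tablet view",
  smartphone: () => "smartphone view",
  legacy:     () => "legacy mobile view",
};

function dispatch(segment) {
  // Fall back to the desktop view for unknown segments.
  const handler = views[segment] || views.desktop;
  return handler();
}
```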


// Include the DeviceAtlas Cloud API client
include './DeviceAtlasCloud/Client.php';

$test_mode = false;

// Get device data from the cloud
$da_data = DeviceAtlasCloudClient::getDeviceData($test_mode);
$props = $da_data['properties'];

if (isset($props['isBrowser'])) {
    // Dispatch HTTP request to desktop view
} elseif (isset($props['isTablet'])) {
    // Dispatch HTTP request to tablet view
} elseif (isset($props['mobileDevice'])) {
    // Time to handle mobile devices
    if ($props['displayWidth'] < 320) {
        // Dispatch HTTP request to legacy (feature phone) view
    } else {
        // Dispatch HTTP request to smartphone view
    }
}


// Include the WURFL Cloud Client
require_once '../Client/Client.php';

// Create a configuration object
$config = new WurflCloud_Client_Config();

// Set your WURFL Cloud API key
$config->api_key = 'xxxxxx:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';

// Create the WURFL Cloud client
$client = new WurflCloud_Client_Client($config);

// Detect the requesting device
$client->detectDevice();

if ($client->getDeviceCapability('ux_full_desktop')) {
    // Dispatch HTTP request to desktop view
} elseif ($client->getDeviceCapability('is_tablet')) {
    // Dispatch HTTP request to tablet view
} elseif ($client->getDeviceCapability('is_wireless_device')) {
    // Time to handle mobile devices
    if ($client->getDeviceCapability('resolution_width') < 320) {
        // Dispatch HTTP request to legacy (feature phone) view
    } else {
        // Dispatch HTTP request to smartphone view
    }
}
Of course, the dispatching part will involve the execution of the relevant PHP files (tabletView.php, desktopView.php, etc.). It goes without saying that each of those PHP files will need to be dedicated to supporting one user experience, not multiple ones (with the possible exception of minor UX adjustments within the view itself, which is always possible; there is no reason why, say, tabletView.php could not access device detection and further optimize the user experience for, say, an iPad user).
We picked PHP for the example above, but Java, .NET and many other languages are equally well supported, and programmers in any language should be able to make sense of the code above.
For the sake of simplicity, let’s define a smartphone as any device whose screen is at least 320 pixels wide. This simple definition probably won’t quite cut it for your needs; engineers can use any capability or combination of capabilities to define smartphones in a way that makes sense for the given company’s business model.
The following table lists the capabilities involved in the example above. Each device-detection solution comes with hundreds of capabilities that can be used to adapt a company’s Web offering to the most specific requirements. For example, Adobe Flash support, video and audio codecs, OS and OS version, browser and browser version are all dimensions of the problem that might be relevant to supporting an organization’s business model.
Category               WURFL Capability      DeviceAtlas Property
Desktop Web browser    ux_full_desktop       isBrowser
Mobile device          is_wireless_device    mobileDevice
Display width          resolution_width      displayWidth


One aspect that we did not touch on in our example but that certainly represents one of the main use cases of device detection is image resizing. Serving a large picture on a regular website is fine, but there are multiple reasons why sending the same picture to a mobile user is not OK: mobile devices have much smaller screens, (often) more limited bandwidth and more limited computation power. By providing a lower resolution and/or a smaller size of the picture, the mobile UX is made acceptable. There are two approaches to image resizing:
  • Each published image can be created in multiple versions, with device detection used to serve a pointer to the most appropriate version for that device or browser.
  • Software libraries such as JAI, ImageMagick and GD can be used to resize images on the fly, based on the properties of the requesting HTTP client.
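The first approach can be sketched as a simple variant picker. The function name and the set of pre-generated widths below are assumptions for illustration (in JavaScript rather than the article's PHP):

```javascript
// Hypothetical sketch: pre-generate each image at a few widths, then let
// device detection pick the smallest variant that still covers the screen,
// falling back to the largest available one for very wide displays.
function pickImageWidth(displayWidth, available = [320, 640, 1024]) {
  const sorted = [...available].sort((a, b) => a - b);
  for (const width of sorted) {
    if (width >= displayWidth) return width;
  }
  return sorted[sorted.length - 1];
}
```

The server would then emit a pointer such as `photo-320.jpg` instead of the full-size image, with `displayWidth` coming from a capability like `resolution_width` or `displayWidth` in the frameworks above.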
An interesting article on .net magazine, “Getting Started With RESS,” shows an example of server-side enhanced RWD — i.e. how a website built with RWD can still benefit from image resizing and server-side detection. This technique is commonly referred to as RESS (responsive design + server-side components).


We touched on this already when mentioning the dispatching of HTTP requests. Segmenting your content could lead to mapping different segments to different URLs. This in itself is not good, because users might exchange links across devices, resulting in URLs that point to the wrong experience. It doesn’t need to be that way, though. Proper design of your application (and proper management of URLs) makes it possible to have common URL entry points for all segments. All you need to do is dispatch your request internally. This is easily achievable with pretty much any development language and framework around.

Criticism Of Device Detection

Recently, we have observed a certain level of criticism of device detection and have found it odd. After all, device detection is an option for those who want extra control, not a legal obligation of any kind. Much of this criticism can be traced back to the abstract ideal of a unified Web (“One Web” is the key term here). Ideals are nice, but end users don’t care about ideals nearly as much as they care about a decent experience. In fact, if we called end-users by their other name, i.e. consumers, the sentence above would make even more immediate sense to everyone. How can anyone be surprised that companies care about controlling the UX delivered to their consumers, by whatever means they choose?
Interestingly, one doesn’t need to go any further than this website to hear vocal critics of server-side detection (or UA-sniffing, as they disparagingly call it). Yet Opera adopted WURFL (a popular open-source device-detection framework) to customize its content and services for mobile users. In addition, the Opera Mini browser sends extra HTTP headers with device information (X-OperaMini-Width/Height) to certain partners to enable better server-side fine-tuning of the user experience. For good measure, the device’s original user-agent header is also preserved in the X-OperaMini-Phone-Ua header. In short, ideals are one thing, but then there is the reality of the devices that people have in their hands.


Client-side detection is a viable way to render websites on smartphones and tablets. We believe that the approach is unlikely to afford the level of optimization, control and maintainability that enterprises demand for their Web (and mobile Web) content and services. In these cases, we are confident that server-side detection still represents the best and most cost-effective solution to deliver a rich Web experience to multiple devices.


For organizations and brands with business-critical applications, DeviceAtlas provides rich, actionable intelligence on the devices that your customers are using, based on data sourced from major industry partners. This server-side, enterprise-grade device-detection solution is highly extensible and customizable, delivering unsurpassed accuracy for mobile device detection, testing and analysis. Cloud, locally deployed and OEM options are available.


Created in 2002, WURFL (Wireless Universal Resource FiLe) is a popular open-source framework that solves the device-fragmentation problem for mobile Web developers and other stakeholders in the mobile ecosystem. WURFL has been, and still is, the de facto standard device-description repository adopted by mobile developers. WURFL is open source (AGPL v3) and a trademark of ScientiaMobile.