4/20/2018 David Mercer, Illinois Computer Science
When BMW owners use the BMW Connected app for navigation or traffic updates, they’re relying on a little-known but widely used distributed-systems platform known as Service Fabric.
Service Fabric is the backbone of Microsoft’s Azure cloud-computing business, where BMW Connected runs, along with a number of key Microsoft products such as Skype, Cortana, and the Intune device manager. Service Fabric has more than a decade of history, but for most of its life has existed behind Microsoft’s proprietary walls.
But that’s changing. Microsoft opened the service’s source code last month, and a professor and PhD student from Illinois Computer Science are the first people outside the company to dig into Service Fabric and what makes it work.
Professor Indranil Gupta and his student, Shegufta Bakht Ahsan, were among 33 authors of the first paper published on Service Fabric. It is being presented this month at the European Conference on Computer Systems in Portugal. Gupta and Ahsan are the only authors who don’t work for Microsoft.
Service Fabric, Gupta says, is essentially an operating system for Azure and its data centers, collections of large numbers of servers.
“Say you have a thousand servers. Some servers are running BMW’s app, other servers are running Microsoft’s Intune framework. All of them are running on top of the same OS. This is a distributed OS, and that has to be correct and fast even though servers might crash and there are many servers,” Gupta said.
Gupta and Ahsan became part of this initial dive into Service Fabric through a conversation a few years ago with a Microsoft executive, Gopal Kakivaya, a corporate vice president working on Azure.
Kakivaya says opening Service Fabric up for study outside the company is intended to give the developer community access and to create a standard for reliability that can advance cloud computing.
“We decided that the best way to get everybody to understand the system and even to contribute to it is to open source it,” Kakivaya said. “Cloud software is a part of every industry, and systems like this, they all need that similar kind of reliability. Some platform needs to emerge.”
According to Gupta, Service Fabric provides a framework for writing web applications as microservices, a relatively new method of software design that structures an app as a set of linked services. The older, monolithic approach builds an app as a single, large service.
Monolithic design “turns out to be more complex because you’re dealing with one big code that’s doing a lot of things – it’s like juggling a ball and trying to flip knives at the same time,” Gupta said. “With the microservices approach you say, ‘OK, this service that I’m building, I’m going to think of this as two separate microservices, one that only juggles balls and one that only juggles knives. So I’ll write the code for juggling balls separately and the code for juggling knives separately.’”
This allows for each piece of software to be written separately by separate people, provided there is an effective application programming interface to connect them.
“As long as they use an understandable API, two different microservices can talk to each other,” Ahsan said.
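Gupta’s juggling analogy can be sketched in a few lines of Python. This is only an illustration of the idea, not Service Fabric’s actual programming interface: the names `JugglingService`, `BallJuggler`, `KnifeJuggler`, and `run_show` are hypothetical, and real microservices would run as separate processes talking over a network rather than as classes in one program.

```python
from typing import Protocol


class JugglingService(Protocol):
    """The shared API: any service that can juggle for some number of rounds."""

    def juggle(self, rounds: int) -> str: ...


class BallJuggler:
    """Hypothetical microservice that only juggles balls."""

    def juggle(self, rounds: int) -> str:
        return f"juggled balls for {rounds} rounds"


class KnifeJuggler:
    """Hypothetical microservice that only juggles knives."""

    def juggle(self, rounds: int) -> str:
        return f"juggled knives for {rounds} rounds"


def run_show(services: list[JugglingService], rounds: int) -> list[str]:
    """Call each service through the shared API, without knowing its internals."""
    return [service.juggle(rounds) for service in services]


show = run_show([BallJuggler(), KnifeJuggler()], rounds=3)
```

Each juggler could be written by a different team. The caller depends only on the agreed `juggle` method, which is the role Ahsan assigns to an understandable API.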
Service Fabric provides what Gupta calls a substrate, a low-level layer like an operating system for all those apps to run on, at a huge scale – millions of users across, potentially, many thousands of servers.
Built on top of that lowest layer are more layers, the highest being the applications themselves, like BMW Connected and Skype.
“These higher applications, they have these two big requirements: They want consistency, and they also want fault tolerance – when servers fail, you want the behavior of an application to be not different from when the servers didn’t fail,” Gupta said.
Microsoft works to build those fundamental requirements into the lowest levels of Service Fabric.
“So when the BMW guys wrote the BMW app, they did not worry about failures at all. They could just treat their data as being always consistent,” Gupta said. “We can actually tell you formally what consistency, what fault tolerance properties we provide. And you know that that’s always going to be true and you can build your application on top of that knowing that.”
Ahsan adds, “Inside the Service Fabric framework, we can imbue consistency and fault-tolerance from the ground up. Upper layers in the systems stack can rely on guarantees from the lower layers.”
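The fault-tolerance requirement Gupta describes, that an application should behave the same whether or not a server has failed, can be illustrated with a toy replicated store in Python. This is a deliberately simplified sketch and not how Service Fabric replicates data: the `Replica` and `ReplicatedStore` classes are hypothetical, and the example ignores recovery, concurrent writes, and network problems.

```python
class Replica:
    """One hypothetical server holding a full copy of the data."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.alive = True


class ReplicatedStore:
    """Toy store that hides individual server crashes from the application."""

    def __init__(self, replica_count: int) -> None:
        self.replicas = [Replica() for _ in range(replica_count)]

    def put(self, key: str, value: str) -> None:
        # Write to every live replica so any one of them can serve reads.
        for replica in self.replicas:
            if replica.alive:
                replica.data[key] = value

    def get(self, key: str) -> str:
        # Read from the first live replica; the caller never sees which one.
        for replica in self.replicas:
            if replica.alive:
                return replica.data[key]
        raise RuntimeError("all replicas have failed")

    def crash(self, index: int) -> None:
        # Simulate a server failure.
        self.replicas[index].alive = False


store = ReplicatedStore(replica_count=3)
store.put("destination", "Urbana")
before_failure = store.get("destination")
store.crash(0)
after_failure = store.get("destination")
```

The application code calls only `put` and `get`. Because the lower layer keeps copies on several servers, the read after the crash returns the same value as the read before it.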
According to the paper, the Microsoft Azure SQL Database service running on Service Fabric hosts 1.82 million databases, holding 3.48 petabytes of data, and runs on more than 100,000 machines. Service Fabric’s cloud telemetry engine processes 3 trillion events a week.
The paper, Gupta said, is just a starting point, and sketches out directions where he and other researchers might head next, among them potential improvements to some of the components inside Service Fabric.
In all, though, Gupta, Ahsan, and Kakivaya believe Microsoft’s decision to make Service Fabric open source could be a step toward creating new, higher standards of consistency and reliability.
“Ideally, we would not want the developer to even think there are servers, we just want them to imagine one big beefy and powerful server that is the cloud,” Gupta said.