Building Secure Operating Systems for the Future

By: Yasmin Ranade

August 2, 2023

It’s not surprising that Boyd Multerer, Microsoft Xbox’s “father of invention,” has a device-oriented vision for the future of operating systems (OS), one informed by decades of gaming and OS development experience.

Boyd Multerer

Multerer, CEO/CTO of Kry10, is building next-generation platforms for connected devices (IoT) and cross-platform applications. His company’s KRY10 operating system (KOS) and platform are intended for mission-critical devices, which demand stability and support for building, managing, and extending applications, all while fending off cyberattacks.

I asked Multerer about the future of operating systems as well as his experience in device management and security, and the following are his responses.

Q: For our readers, can you comment on the evolution of operating systems and what you see for the future?

A: For the past 30 years, operating systems have been designed primarily for single devices, where one device is used by one human who is physically present. This includes PCs, laptops, phones, and more.

Servers changed this by taking OSs built for people and modifying them to run physically away from the user, centrally managed in a data center. They are still assumed to be physically controlled – you can’t just walk into any data center and access the computers – but there is no longer a 1:1 relationship between a person and a compute unit. In other words, you don’t really know where your work is being done, and it may move around from computer to computer.

While we’ve found ways to make it work, we have fundamentally still been using operating systems designed and built for a very different world.

Modern operating systems have much more variety in purpose than OSs of the past and exist in very different realities and scenarios – security threats being the most critical. Their designs answer questions such as: Is this device intended primarily to interact with a human? Is it managed by the user or a central administrator? Is it in a physically secure environment? What are the consequences of an error or a successful attack? The answers dramatically affect the design and technology that go into a modern system.

The next frontier is devices that are in the field – or even outer space – and may or may not have a human associated with them at all. These are the devices that run our infrastructure, make our vehicles work, harvest our food, and much more. While we do get the benefits of the computer, we don’t really think of it as “interacting with a computer”. Critically, these devices are not assumed to be physically isolated from the outside world, yet they still need to be maintained, controlled, and kept safe.

So, future OS designs for devices in the field need to account for a very different security reality. And as these devices move into higher-value scenarios [such as] hospitals, energy, vehicles, etc., the consequences of a failure go up, raising the bar on what is required.


Q: What can modern operating systems learn from gaming platforms?

A: As an early example of devices in the field, gaming consoles taught me a lot about how to think about device management and physical protection.

Game consoles are still more advanced than most devices when it comes to physical protection, but the OS designs are weaker than what is needed to run a vehicle or an energy system.

So, there are some similarities between physical protection and the philosophy of device management, but the actual OS implementations are pretty different. For example, game consoles care about seamless management of different workloads (games), the maximum performance delivered to the single game running at a time, and the consequence of a failure is having to re-load the game and maybe replay a little bit of it. Infrastructure devices care about the seamless management of different workloads, multiple critical components running simultaneously, and the consequences of a failure are that people could die.

These differences result in consoles emphasizing raw performance and authenticity – no cheating – while not minding an occasional restart. Infrastructure, by contrast, needs less raw performance (though possibly stronger real-time guarantees, but that’s another subject), greater isolation between components, and absolutely minimal downtime.

Q: How can operating systems and/or gaming platforms be made more resilient in the face of cyberattacks?

A: There are two fundamental advancements, based on technology and approaches to software design that have been around for some time but whose benefits we have only recently been able to leverage in modern computing: microkernel architecture and formal methods to prove the kernel is written correctly.

Microkernels have been around since the 1980s and are well known to be more stable, but it’s only since the 2000s that chips have been able to run them with enough performance.
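The stability argument for microkernels comes from isolation: drivers and services run as separate, unprivileged processes, so a fault in one is contained rather than crashing the whole system. A minimal sketch of that idea (purely illustrative – using ordinary OS processes as stand-ins for microkernel components, not any real kernel or Kry10 code):

```python
# Illustrative sketch only -- not a real microkernel. Each "driver" runs
# in its own isolated process; a fault in one request is contained, and a
# supervisor could restart the driver, instead of the fault bringing down
# the whole system as it would inside a monolithic kernel.
import multiprocessing as mp

def flaky_driver(conn):
    # A "driver" component that faults on certain requests.
    msg = conn.recv()
    if msg == "bad":
        raise RuntimeError("driver fault")
    conn.send(f"handled {msg}")

def call_driver(msg):
    # Supervisor: send one request to an isolated driver process.
    parent_end, child_end = mp.Pipe()
    proc = mp.Process(target=flaky_driver, args=(child_end,))
    proc.start()
    parent_end.send(msg)
    proc.join()
    if proc.exitcode != 0:
        return None  # fault contained; supervisor could restart the driver
    return parent_end.recv()

if __name__ == "__main__":
    print(call_driver("read"))   # a good request succeeds
    print(call_driver("bad"))    # the faulting request is isolated
    print(call_driver("write"))  # the rest of the system keeps running
```

Real microkernels such as seL4 achieve this isolation with hardware memory protection and message-passing IPC rather than heavyweight OS processes, but the fault-containment principle is the same.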

The other big change is formal methods. Formal methods use pure mathematics to *prove* that code was written correctly. This is different from normal testing, which tries a bunch of things and then effectively says, “Yeah, this is probably going to work.” That is good enough when you can tolerate some minimal downtime. When you can’t tolerate downtime, or when the consequences of a failure are too high, you need to *prove* your code is correct using formal methods.
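The testing-versus-proving contrast can be made concrete in a proof assistant such as Lean (a hypothetical toy example, not from Kry10 or seL4): a test checks one input, while a theorem covers every possible input.

```lean
-- Hypothetical illustration: testing vs. proving.
-- clamp restricts x to the range [lo, hi].
def clamp (lo hi x : Nat) : Nat := min hi (max lo x)

-- A test checks a single input and effectively says
-- "this probably works":
example : clamp 0 10 42 = 10 := rfl

-- A proof covers *all* inputs: the result never exceeds hi.
theorem clamp_le_hi (lo hi x : Nat) : clamp lo hi x ≤ hi :=
  Nat.min_le_left hi (max lo x)
```

The seL4 proofs apply this same idea at vastly larger scale, establishing that the kernel implementation matches its specification for every possible execution.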

It is only very recently that formal methods have become practical enough to use at any real scale. They are still very hard and expensive to apply, so they are typically used only on encryption code and the like. But recent breakthroughs, specifically on the seL4 project, have shown that you can use them on larger projects, like an entire OS kernel.

We believe that any device that does a high-value job will need to have formally proven code at its heart. It may be that only the most critical parts of the OS need to be formally proven, but those parts absolutely require it.

Q: What does the future hold for devices in the field and for gaming platforms?

A: All devices in the field face an uncertain future. The laws regulating them, the technology available, and the people you need to hire are going through rapid change.

What we do know is that uptime, isolating failures, and formally proving critical components are all going to be requirements. How quickly we get there depends on the use of the specific device, but the days when we could casually assume humans can tolerate a restart after a failure are gone.

Gaming will probably lag behind infrastructure in these areas while leading in terms of anti-hacking and anti-cheating. Eventually, gaming design will influence technology designed for the “real world” and vice versa.

