UPDATE 1 - SEPTEMBER 2014

THE OPERATING SYSTEM OF THE METAVERSE

At Lucidscape we are building a new kind of massively-distributed 3D simulation engine to power the vast network of interconnected virtual worlds known as the Metaverse.

Fast Company has described our endeavor as the Big Bang Of The Next Internet.

Our work is necessary because incremental changes built atop existing 3D engine architectures will not be sufficient to create the kind of massively multi-user experiences we expect from tomorrow’s web-of-worlds.

Handling massive virtual worlds requires a fundamentally different approach to engine design, one that prioritizes the requirements of scale, thoroughly embraces distributed computing and many-core architectures, and that is also unencumbered by the legacy decisions which hold current engines back.

We are developing the open source software foundations for a future where millions of us will interact within enormous virtual worlds through standardized clients which allow users and programs to flow from one server to the next as easily as we navigate the web today. Our distributed simulation stack is intended to serve as the VR equivalent of the web server and browser, while also offering OS-like security and scheduling capabilities necessary to manage resource usage by users and nomadic programs.

We recently conducted our first large-scale test, in which we simulated the interactions of 10 million participants on a cluster of over 800 servers. The goal of this article is to update the community on our progress and to inspire you to share this journey with us.

ABOUT US

We are a team of developers who are passionate about creating a future where massively multi-user virtual reality is pervasive and everyone can participate in a Metaverse which is:

  • Free and open source so everyone may add their own worlds to the Metaverse the same way anyone can run a web server today.
  • Devoid of any form of centralized control, free of gatekeepers and censorship under the guise of curation. Because if anyone can tell you what you are allowed to do in it, then it is not the real Metaverse.

THE “10,000,000 THINGS IN A SEAMLESS WORLD” TEST

When building something that has never been built before, it is important to face your worst-case scenarios early on, to make sure you are building a solid foundation that will ultimately fulfill your requirements.

Online game engines of today, with a single notable exception, can only handle smaller-scale simulations with no more than several dozen participants in the same virtual space. To overcome this limitation world designers are forced to sacrifice the user's experience of scale by partitioning virtual worlds into a multitude of smaller instances where only a limited number of participants may interact.

In contrast, we are committed to building an engine capable of handling seamless worlds of unprecedented scale (e.g. whole populated cities, planets, solar systems).

As we approached this challenge, our biggest fear was discovering that we had built yet-another-traditional-3D-engine, one that could never serve as the foundation of the Metaverse as we envision it.

With this concern in mind, we decided that our first milestone had to be the kind of worst-case-scenario stress test that made us worry the most:

  • It had to involve a large number of servers contributing to a single, seamless virtual world.
  • There could be no opaque boundaries between servers, forcing renderers to take into consideration scene graph contributions of several adjacent servers in order to compose each rendered frame.
    • For example, while standing inside server A you must be able to see the content of adjacent servers B and C.
  • It must require broadcasting hundreds of millions of position, orientation and scale updates per second.
  • It must accurately handle millions of collisions per second while determining all contact points involved.
  • It must execute billions of spatial query predicates per second, across server boundaries when needed.
  • It must contain millions of autonomous programs that:
    • Are physically simulated, contain at least one rigid-body which is always in motion, are influenced by external force fields, and handle collision accurately.
    • Can move freely across server boundaries, taking both state and their source code with them.
    • Communicate with other programs across server boundaries.
    • Make large, volume based, cross-server spatial queries to locate other programs in order to make decisions in real time.
Note: This article contains several simplistic visualizations generated to help readers understand the scope and motivations behind the first large-scale test of the Metaverse engine. A high tolerance for programmer art is required from this point forward, and you may rest assured that the current level of graphical fidelity is not representative of what will be possible in the Metaverse proper.

TEST PARTICIPANTS (STRESSORS)

We utilized two basic program types that act as the architecture stressors in this test. They both exist within a virtual world which is devoid of meaning and their sole purpose is to generate an elevated and sustained workload in order to put each critical engine subsystem under significant pressure.

The design process of the stressors was iterative and after each architecture review cycle we repeatedly increased: (1) the number of servers involved; (2) the amount of work generated by each stressor and (3) the number of stressors participating in the simulation. We repeated this process until there were no obvious architecture bottlenecks left to address.

Stressor Type #1 - The Emitter
  • Approximately 432,000 instances
  • Physically simulated, mostly stationary but can be moved by impact
  • Emits a gravitational field which affects other programs within range
  • Spawns two Drones per second on average
  • Coordinates swarms of at least 20 orbiting Drones to attack Emitters of different colors
    • Make spatial queries to locate nearby Drones
    • Make cross-server spatial queries to locate distant adversaries
    • Coordinate attacks by dispatching orders to Drones
  • Constantly communicates with its Drones across server boundaries to maintain situational awareness
Stressor Type #2 - The Drone
  • Approximately 10 million instances
  • Physically simulated, transfers kinetic energy to targets on impact
  • Always in motion, stochastic term added to all trajectory changes to make client-side prediction challenging
  • May attack enemy emitters by itself or as part of a swarm
    • Make cross-server spatial queries to locate targets
    • Cast rays to prevent colliding with friendly programs
    • Send cross-server messages to communicate with friendly programs
  • Despawns on impact, spawning an Explosion
    • Explosions are simple, short-lived programs which exist solely to put pressure on the code path used to spawn/despawn entities.
Sample simulation that requires programs to: (1) make cross-server spatial queries to locate their targets; (2) migrate across server boundaries taking both their state and source code with them.

TEST COMPONENTS (CELLS)

The virtual world for this test was composed of a large number of smaller simulation “cells”, each containing 16,000 stressors on average: 512 emitters, approximately 15,360 drones, and several hundred short-lived explosions.

Each cube-shaped cell ran inside its own independent Metaverse server which was connected to adjacent cells by six “portals”, each covering a face of the cube that defines the cell boundary.

Portals are transparent, user-defined structures which allow users, programs, spatial queries and messages to flow from one server to the next.
Each full-size test cell was assigned to its own Amazon EC2 instance and contained 512 emitters and approximately 15,360 drones.
Illustrative cluster composed of 8 adjacent cells connected by portals which allow users, programs, spatial queries and messages to flow between cells.
Brief tour of a single simulation cell used on stress test #1, demonstrating the participating stressors and their behavior.
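
The cell and portal layout described in this section can be sketched as a small data structure. This is only an illustration, not the engine's actual code: the `Cell` class, the integer grid coordinates, the face names, and the 2 x 2 x 2 arrangement used for the 8-cell example are all assumptions made here for clarity.

```python
from dataclasses import dataclass

# The six cube faces, expressed as unit offsets along each axis.
# Face names are hypothetical; the article only says each face has a portal.
FACE_OFFSETS = {
    "+x": (1, 0, 0), "-x": (-1, 0, 0),
    "+y": (0, 1, 0), "-y": (0, -1, 0),
    "+z": (0, 0, 1), "-z": (0, 0, -1),
}


@dataclass(frozen=True)
class Cell:
    """Hypothetical handle for one cube-shaped cell, addressed by grid coordinates."""
    x: int
    y: int
    z: int

    def portals(self):
        """Map each face name to the adjacent cell reached through that face's portal."""
        return {
            face: Cell(self.x + dx, self.y + dy, self.z + dz)
            for face, (dx, dy, dz) in FACE_OFFSETS.items()
        }


def build_cluster(nx, ny, nz):
    """Return every cell of an nx * ny * nz block (assumed layout)."""
    return {Cell(x, y, z) for x in range(nx) for y in range(ny) for z in range(nz)}


def portal_links(cells):
    """Collect each portal connection between two cells of the cluster exactly once."""
    links = set()
    for cell in cells:
        for neighbor in cell.portals().values():
            if neighbor in cells:
                links.add(frozenset((cell, neighbor)))
    return links


if __name__ == "__main__":
    cluster = build_cluster(2, 2, 2)
    print(len(cluster), "cells,", len(portal_links(cluster)), "portal links")
```

Under these assumptions, an 8-cell block has 12 internal portal connections; portals on the outer faces lead to cells that are simply absent from the set.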

BEHIND THE SCENES

The stressors were purposefully crafted to be inefficient in the same manner we recognize average user-written code may be. Additionally, we decided against modeling swarms of drones as a single entity, opting instead to make each drone an independent program and embrace the associated overhead as part of this test.

An important aspect of the Metaverse engine is that programs are never given direct access to another program or to the simulation scene graph. Instead, programs must sense the world through either spatial queries or virtual sensors. Similarly, all program-to-program communication is brokered by a messaging pipeline which is also capable of routing events across server boundaries.

This is a required abstraction to enable seamless cross-server interactions and allow the execution of low-trust, nomadic programs. As a consequence, writing a Metaverse program is more akin to programming an embodied robot than to writing typical code, because entities must rely on their virtual senses to gather information about the world instead of having privileged access to the global simulation state, which in the Metaverse is the equivalent of having root access to the simulation.
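
To make the idea of brokered communication concrete, here is a minimal sketch of a per-server message broker. Everything in it is hypothetical: the `Broker` class, its method names, and the one-hop routing to directly connected servers are assumptions for illustration, not a description of the Metaverse engine's pipeline. The point it shows is that programs address each other by identifier only and never hold a reference to one another.

```python
from collections import deque


class Broker:
    """Hypothetical per-server broker; programs only ever see ids and payloads."""

    def __init__(self, server_id):
        self.server_id = server_id
        self.inboxes = {}  # program id -> deque of (sender_id, payload)
        self.peers = []    # brokers of adjacent servers

    def connect(self, other):
        """Link two brokers so events can be routed in both directions."""
        self.peers.append(other)
        other.peers.append(self)

    def register(self, program_id):
        """Create an inbox for a program hosted on this server."""
        self.inboxes[program_id] = deque()

    def deliver(self, sender_id, recipient_id, payload):
        """Deliver locally; return False if the recipient is not hosted here."""
        inbox = self.inboxes.get(recipient_id)
        if inbox is None:
            return False
        inbox.append((sender_id, payload))
        return True

    def send(self, sender_id, recipient_id, payload):
        """Try local delivery first, then relay to adjacent servers (one hop only)."""
        if self.deliver(sender_id, recipient_id, payload):
            return True
        return any(peer.deliver(sender_id, recipient_id, payload) for peer in self.peers)

    def receive(self, program_id):
        """Pop the oldest pending message for a local program, or None if empty."""
        inbox = self.inboxes[program_id]
        return inbox.popleft() if inbox else None


if __name__ == "__main__":
    a, b = Broker("A"), Broker("B")
    a.connect(b)
    a.register("emitter-1")
    b.register("drone-7")
    a.send("emitter-1", "drone-7", {"order": "attack"})
    print(b.receive("drone-7"))
```

A real pipeline would also need addressing for programs that migrate mid-flight, ordering guarantees, and access control; none of that is modeled here.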

Much of that deliberate inefficiency comes from spatial queries: the stressors issue a large number of them as part of their decision-making process. Queries that intersect with cell boundaries are relayed to adjacent servers for fulfillment.
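
One way to decide which adjacent servers a query must be relayed to is a sphere-versus-cube overlap test. The sketch below assumes spherical query volumes, axis-aligned cubic cells on an integer grid, and a made-up `CELL_SIZE`; the article does not specify any of these, so treat the names and numbers as placeholders.

```python
import math
from itertools import product

CELL_SIZE = 100.0  # hypothetical edge length of a cell, in world units


def cell_of(point):
    """Grid coordinates of the cell that contains a world-space point."""
    return tuple(math.floor(c / CELL_SIZE) for c in point)


def cells_overlapping_sphere(center, radius):
    """Return the grid coordinates of every cell whose cube intersects the sphere."""
    lo = [math.floor((c - radius) / CELL_SIZE) for c in center]
    hi = [math.floor((c + radius) / CELL_SIZE) for c in center]
    hits = set()
    for cell in product(*(range(l, h + 1) for l, h in zip(lo, hi))):
        # Squared distance from the sphere center to the nearest point of the cube.
        d2 = 0.0
        for axis, index in enumerate(cell):
            lo_edge = index * CELL_SIZE
            hi_edge = lo_edge + CELL_SIZE
            nearest = min(max(center[axis], lo_edge), hi_edge)
            d2 += (center[axis] - nearest) ** 2
        if d2 <= radius * radius:
            hits.add(cell)
    return hits


def relay_targets(center, radius):
    """Cells other than the home cell that must help fulfill the query."""
    return cells_overlapping_sphere(center, radius) - {cell_of(center)}


if __name__ == "__main__":
    # A query near the +x face of cell (0, 0, 0) spills into cell (1, 0, 0).
    print(relay_targets((95.0, 50.0, 50.0), 10.0))
```

Testing against the nearest point of each cube, rather than against the query's bounding box, avoids relaying to diagonal neighbors that the sphere does not actually reach.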

“AT SCALE, EVERYTHING BREAKS”

- Urs Hölzle, Distinguished Google Fellow

If it is true that everything breaks at scale, it is doubly so when network latency itself becomes a factor. Unlike a Hadoop cluster (for example) which is meant to yield the results of a computation, a Metaverse cluster is meant to deliver a soft real-time experience where users have a significant degree of agency over the outcome.

Those added requirements introduce several other possible failure modes where communication latency leads to problems ranging from minor and recoverable (e.g. a momentary break of immersion for users), to more extreme cases where it may even compromise the integrity of complex simulations that span several nodes (e.g. a 'frozen' block in the middle of a simulated city).

While we anticipated several of the challenges we faced as the simulation cluster grew, we also came across a handful of unexpected bottlenecks that only revealed themselves at scale. Finding those at-scale bottlenecks was one of the main purposes of this test.

In most cases scalability is not something that can be easily bolted onto an existing architecture; instead, it must be a guiding principle for every decision made from the very beginning of the project. In our case, we wanted to make sure we had a solid foundation for the Metaverse as a real-time computing fabric before working on eye-candy.

Some of the bottlenecks we found could be easily addressed by changing the stressor code to be more 'friendly' to our architecture, which is something we did not allow ourselves to do because writing such code would require a deeper understanding of the Metaverse engine than a newcomer to the platform could possibly have.

We feel particularly strongly about keeping userland code simple and straightforward because we believe the Metaverse must be accessible on all dimensions. Simply being free and open source would not suffice if by design it excludes children or non-technical users for example.

It took a few failed attempts before we achieved a stable run with over 300,000 active stressors, from which point we were able to scale up to the ten-million target without incident.

TEST STATISTICS

Metric                                                                  Mean            Peak
Participating Servers (Amazon EC2 c3.2xlarge)                            826             828
Active Stressors (entities)                                       10,529,796      10,966,376
Spawned Stressors (per second)                                       394,322         875,672
Despawned Stressors (per second)                                     368,518         678,118
Points of Contact (collisions, per second)                         1,965,790       4,353,930
Late Bound Calls to Script-Based “Userland” Code (per second)    479,903,340   1,372,735,420
Spatial Query Predicates Handled (per second)                  4,000,024,013  13,268,585,400
Cross-Server Events Dispatched (per second)                       27,492,976      55,678,378
Cross-Server Entity Migrations (per second)                          100,897         291,354
Cross-Server Spatial Query Relays (per second)                        31,988          70,826
Cross-Server Transactional Links (1st degree, per server)                 27              27
Servers Contributing to a Single Rendered Frame                            6              11
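
To put the cluster-wide means in per-server terms, the figures above can simply be divided by the mean server count. This is a rough back-of-the-envelope sketch: it assumes load was spread evenly across servers, which the article does not claim, and the shortened metric labels are ours.

```python
SERVERS_MEAN = 826  # mean participating servers, from the statistics above

# Mean cluster-wide figures copied from the statistics above.
CLUSTER_MEANS = {
    "Active Stressors": 10_529_796,
    "Points of Contact (per second)": 1_965_790,
    "Late Bound Calls (per second)": 479_903_340,
    "Spatial Query Predicates Handled (per second)": 4_000_024_013,
    "Cross-Server Events Dispatched (per second)": 27_492_976,
    "Cross-Server Entity Migrations (per second)": 100_897,
}


def per_server(cluster_means, servers):
    """Divide each cluster-wide mean by the server count (assumes even load)."""
    return {name: value / servers for name, value in cluster_means.items()}


if __name__ == "__main__":
    for name, value in per_server(CLUSTER_MEANS, SERVERS_MEAN).items():
        print(f"{name}: {value:,.0f}")
```

Under that even-load assumption, each server hosted roughly 12,700 active stressors and evaluated about 4.8 million spatial query predicates per second. The stressor figure is lower than the roughly 16,000 quoted for a full-size cell, so it should be read as a cluster-wide average only.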

Over-the-Internet tour of the test cluster spanning across 826 servers and containing over 10 million active programs.

WHAT IS NEXT

Unless technological progress grinds to a halt, the Metaverse is inevitable. What is yet to be determined is what kind of Metaverse it is going to be: one that belongs to everybody, embracing freedom, accessibility and personal expression without compromise, or one that is controlled and shaped by the will and whims of its creator.

As a team we know where we stand as we are committed to building a Metaverse which has no gatekeepers, is free of centralized control, and is composed of sovereign servers without sacrificing the mobility of users and programs alike.

Our goal today was to inspire you to share this journey with us. Despite the technical nature of this first update, everyone is welcome regardless of background or level of expertise. We want to hear from you. More importantly, we want to work with you and rely on your guidance and support to make sure we are building a Metaverse that is for everyone.

Sound good? Join our mailing list and we will keep you posted!