This is Part III of ‘THE METAVERSE PRIMER’, which focuses on the role of Networking in ‘The Metaverse’. Here, Networking is defined as “the provisioning of persistent, real-time connections, high bandwidth, and decentralized data transmission by backbone providers, the networks, exchange centers, and services that route amongst them, as well as those managing ‘last mile’ data to consumers.”
The three core areas of networking — bandwidth, latency, and reliability — are likely to be the least interesting Metaverse enablers to most readers. However, their constraints and growth shape how we design Metaverse products and services, when we can use them, and what we can (and may never be able to) do.
Bandwidth
Bandwidth is commonly thought of as ‘speed’, but it’s actually how much data can be transmitted over a unit of time. The requirements for the Metaverse are much higher than most internet applications and games, and beyond many modern connections. The best way to understand this is via Microsoft Flight Simulator.
Microsoft Flight Simulator is the most realistic and expansive consumer simulation in history. It includes 2 trillion individually rendered trees, 1.5 billion buildings and nearly every road, mountain, city and airport globally… all of which look like the ‘real thing’, because they’re based on high-quality scans of the real thing. But to pull this off, Microsoft Flight Simulator requires over 2.5 petabytes of data — or 2,500,000 gigabytes. There is no way for a consumer device (or most enterprise devices) to store this amount of data.
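To put that figure in context, here is a quick, illustrative back-of-the-envelope calculation (the connection speeds are assumptions, not measurements) of how long a full download of a 2.5-petabyte dataset would take:

```python
# Back-of-the-envelope: how long would it take to download Microsoft Flight
# Simulator's full ~2.5 PB dataset over a typical home connection?
# The 100 Mbps and 1 Gbps figures below are illustrative assumptions.

DATASET_BYTES = 2.5e15          # ~2.5 petabytes, per Microsoft's figures

def days_to_download(bits_per_second: float) -> float:
    """Return the number of days needed to transfer the full dataset."""
    seconds = (DATASET_BYTES * 8) / bits_per_second
    return seconds / 86_400

print(f"At 100 Mbps: {days_to_download(100e6):,.0f} days")   # ~2,315 days (over six years)
print(f"At 1 Gbps:   {days_to_download(1e9):,.0f} days")     # ~231 days
```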
And even if they could, Microsoft Flight Simulator is a live service that updates to reflect real-world weather (including accurate wind speed and direction, temperature, humidity, rain, and lighting) and air traffic. You can literally fly into real-world hurricanes and storms, while trailing IRL commercial airliners on their exact flight paths.
“Above & around Hurricane Laura. Microsoft Flight Simulator (Live Weather)” pic.twitter.com/T7v8aJ0jhG (Petri Levälahti, @Berduu, August 27, 2020)
Microsoft Flight Simulator works by storing a core amount of data on your local device (which also runs the game, like any console game and unlike cloud-based game-streaming services such as Stadia). But when users are online, Microsoft streams immense volumes of additional data to the player’s local device on an as-needed basis. Think of it the way a real-world pilot might. When they come over a mountain or around a bend, new light information streams into their retinas, revealing and then clarifying what’s there for the first time. Before then, they have nothing but the knowledge that something will be there.
Many gamers think this is what happens in all online multiplayer video games. But, in truth, most game services send only positional data, player input data (e.g. shoot, throw bomb), and summary-level data (e.g. players remaining in a battle royale) to individual players. All of the asset and rendering data is already on your local device, hence the brutal download and installation times, plus hard-drive usage.
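As a rough sketch of just how light this traditional traffic is, consider a toy per-tick update packet versus a single streamed asset. The packet layout, tick rate, and asset size below are illustrative assumptions, not any real game’s protocol:

```python
# A minimal sketch of why traditional multiplayer traffic is so light: per-tick
# updates carry only positions and inputs, while the heavy asset data already
# lives on disk. Field names and sizes are illustrative, not any real protocol.
import struct

def encode_state_update(player_id: int, x: float, y: float, z: float,
                        yaw: float, action_flags: int) -> bytes:
    """Pack one player's per-tick update: id, position, facing, input bits."""
    return struct.pack("<I4fB", player_id, x, y, z, yaw, action_flags)

packet = encode_state_update(42, 110.5, 23.0, -4.2, 1.57, 0b00000101)
print(f"Per-tick update: {len(packet)} bytes")                    # 21 bytes

# At a 60 Hz tick rate with 100 players, the total is still tiny...
print(f"100 players @ 60 Hz: {21 * 100 * 60 / 1e3:.0f} KB/s")     # ~126 KB/s

# ...whereas streaming a single detailed, never-before-seen building model
# on demand might mean tens of megabytes for that one asset alone.
ASSET_CHUNK_BYTES = 30e6   # illustrative size for one photogrammetry asset
print(f"One streamed asset: {ASSET_CHUNK_BYTES / 1e6:.0f} MB")
```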
By sending rendering data on an as-needed basis, games can have a much greater diversity of items, assets and environments. And they can do so without requiring game-delaying downloads and installations, update batching, or enormous user hard drives. As a result, many games are now embracing this hybrid model of locally stored information plus data streaming. However, this approach is most important for Metaverse-focused platforms. Roblox, for example, needs (and benefits from) far greater asset, item, and environmental diversity than a title like Mario Kart or Call of Duty.
As the complexity and importance of virtual simulation grows, the amount of data that needs to be streamed will increase. At least for now, Roblox benefits from the fact that a number of underlying prefabs and assets are widely repurposed and lightly customized. Given this, Roblox is mostly streaming data on how to tweak previously-downloaded items. But eventually the virtual platform will want a near-infinite number of permutations and creations (nearly all of which it won’t be able to fully predict).
Virtual twinning platforms (also known as ‘mirrorworlds’), such as Microsoft Flight Simulator, already need to recreate the nearly infinite (and provable) diversity of the real world. This means sending far more (i.e. heavier) data than ‘dark cloud here’ or ‘a dark cloud that is 95% like dark cloud C-95’. Instead, it’s a dark cloud exactly like this. And crucially, this data is changing in real-time.
This last point is key. If we want to interact in a large, real-time, shared, and persistent virtual environment, we will need to receive a superabundance of cloud-streamed data.
(One of these isn’t real)
Compare the ‘real world’ to Fortnite’s map. Everyone on earth is in the same ‘simulation’, at the same time, and with full permanence. If I cut down a tree, that tree is irrevocably gone, and gone for everyone. When you play Fortnite, it’s only via a fixed, point-in-time version of the map. And everything you do within that map is shared only with a handful of users, and for a short period of time before it’s reset. Cut down a tree? It will be reset within 1–25 minutes, and it was only ever gone for up to 99 other users in the first place. The map only really changes when Epic Games sends out a new version. And if Epic Games wanted to send your world out to everyone else, it would be selecting your universe, disregarding theirs, and fixing your universe at a specific point in time. For many virtual experiences this is fine. It will be fine for many Metaverse-specific experiences, too. But some (if not the most important) experiences will want persistence across all users, and at all times.
Cloud data streaming is also essential if we want to seamlessly jump between different virtual worlds. Fortnite’s Travis Scott concert involved transporting players from the game’s core map to the depths of a never-before-seen ocean, then to a never-before-seen planet, and then deep into outer space. To pull this off, Epic sent all of these game worlds to users days-to-hours before the event via a standard Fortnite patch (this, of course, meant that if users hadn’t downloaded and installed the update before the event started, they couldn’t participate in it). Then, during each set piece, every player’s device was loading the next set piece in the background. This system works incredibly well, but it requires the publisher to know, well in advance, which worlds a user will go to next. If you want to choose, and to choose from a wide range of destinations, you must either download the entirety of all potential options (which isn’t possible) or cloud stream them.
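That ‘known itinerary’ approach can be sketched as a toy background-preloading loop, in which the next world is fetched while the current one plays. The world names and timings are invented for illustration; this is not Epic’s actual implementation:

```python
# A sketch of the 'known itinerary' preloading described above: while one set
# piece plays, the next one is fetched in a background thread. World names and
# download times are made up for illustration.
import threading, time

SET_PIECES = ["core_map", "ocean", "alien_planet", "outer_space"]

def download_world(name: str) -> None:
    print(f"  [background] downloading '{name}'...")
    time.sleep(1.0)                      # stand-in for a multi-GB transfer
    print(f"  [background] '{name}' ready")

for current, upcoming in zip(SET_PIECES, SET_PIECES[1:] + [None]):
    prefetch = None
    if upcoming:                         # only works because the next world is known
        prefetch = threading.Thread(target=download_world, args=(upcoming,))
        prefetch.start()
    print(f"Playing '{current}'...")
    time.sleep(1.5)                      # stand-in for the set piece itself
    if prefetch:
        prefetch.join()                  # next world must be ready before the cut
```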
In addition to increased environmental data, there is incremental player data. When you see your friend in Fortnite today, the Fortnite server just needs to send you information on where your friend is and what they’re trying to do; the animations (e.g. reloading an assault rifle or falling) are already loaded onto your device and just need to run. But when a real-time motion capture is being mapped to your friend’s avatar, this detailed information needs to be sent, too, along with everyone else’s. If you want to watch a video file within this game, as Fortnite sometimes offers, then this too needs to be streamed inside a virtual world. Hear the spatial audio of a crowd? Same. Feel a passerby brush the shoulder of your haptic bodysuit? Same.
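Rough, illustrative arithmetic helps show the jump from today’s approach to live motion capture. The bone count, update rate, and precision below are assumptions rather than measurements of any particular engine:

```python
# Rough arithmetic on why real-time motion capture is so much heavier than
# today's 'play animation #N' approach. Bone counts, precision, and rates are
# illustrative assumptions, not measurements of any particular engine.

# Today: the server mostly sends an animation ID plus a timestamp.
animation_trigger_bytes = 8

# With live mocap: a rotation (4 floats) per bone, per tick, plus a root position.
BONES = 70                     # a typical humanoid skeleton, give or take
TICK_RATE = 30                 # pose updates per second
pose_bytes = BONES * 4 * 4 + 3 * 4            # 1,132 bytes per pose
mocap_bps = pose_bytes * TICK_RATE * 8        # bits per second, uncompressed

print(f"Animation trigger: {animation_trigger_bytes} bytes, sent once")
print(f"Live mocap: ~{mocap_bps / 1e3:.0f} kbps per avatar")         # ~272 kbps
print(f"In a 100-player lobby: ~{mocap_bps * 100 / 1e6:.0f} Mbps")   # ~27 Mbps
```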
Many players already struggle with bandwidth and network congestion for online games that require only positional and input data. The Metaverse will only intensify these needs. The good news is that broadband penetration and bandwidth are consistently improving worldwide. Compute, which will be discussed more in Section #3, is also improving and can help substitute for constrained data transmission by predicting what should occur until the point at which the ‘real’ data can be substituted in.
Latency
The biggest challenge in networking is also its least understood: latency. Latency refers to the time it takes for data to travel from one point to another and back. Compared to network bandwidth (above) and reliability (below), latency is typically considered the least important KPI. This is because most internet traffic is one-way or asynchronous. It doesn’t matter if there’s a 100ms, 200ms, or even two-second delay between sending a WhatsApp message and receiving a read receipt. It also doesn’t matter if it takes 20ms or 150ms or 300ms after you click YouTube’s pause button for the video to stop. When watching Netflix, it’s more important that the stream plays continuously than that it starts right away. To that end, Netflix artificially delays the start of a video stream so that your device can download ahead of the moment you’re watching. That way, should your network crunch or hiccup for a moment or two, you’ll never notice.
Even video calls, which are synchronous and persistent connections, have a relatively high tolerance for latency. Video is the least important element of the calls, and so audio, which is the ‘lightest’ data, is usually prioritized by video-calling software if there’s a network crunch. And if your latency temporarily increases — even to the point of seconds, not milliseconds — software can save you by increasing the playback speed of your audio-backlog and rapidly editing out the pauses. In addition, it’s easy for participants to manage latency — you just learn to wait a bit.
The most immersive AAA online multiplayer games, however, require low latency. This is because latency determines how quickly a player receives information (e.g. where a player is, whether a grenade has been thrown or a soccer ball kicked) and how quickly their response is transmitted to other players. Latency, in other words, determines whether you win or lose, kill or are killed. This is why most modern games are played at 2–4× the average framerate of video, and why we’ve rapidly embraced these increases even as we resist higher frame rates for traditional video. It’s required for performance.
The human threshold for latency is incredibly low in video gaming, especially versus other mediums. Consider, for example, traditional video versus video games. The average person doesn’t even notice if audio is out-of-sync with video unless it arrives more than 45ms too early, or more than 125ms late (170ms total). Acceptability thresholds are even wider, at 90ms early and 185ms late (275ms). With digital buttons, such as a YouTube pause button, we only think our clicks have failed if we don’t see a response after 200–250ms. In AAA games, avid gamers are frustrated at 50ms and even non-gamers feel impeded at 110ms. Games are unplayable at 150ms. Subspace finds that on average, a 10ms increase or decrease in latency reduces or increases weekly play time by 6%. That’s an extraordinary exposure — and one no other business faces.
With the above bands in mind, let’s look at average latency globally. In the United States, the median round-trip time for data sent from one city to another is 35ms. Many pairings exceed this, especially cities with high density and intense demand peaks (e.g. San Francisco to New York during the evening). Then there’s the ‘city-to-the-user’ transit time, which is particularly prone to slowdowns. Dense cities, neighborhoods, or condominiums can easily congest. If you’re playing via mobile, 4G technology today averages another 40ms. And if you live outside a major city center, your data may have to travel another 100 miles, often on antiquated, poorly maintained wireline infrastructure. Globally, median delivery latency between cities ranges from 100–200ms.
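Stacking the averages above against the gameplay thresholds cited earlier gives a sense of how little headroom exists. The ‘last mile’ figure below is an assumed illustration; the others come from the numbers quoted in this section:

```python
# Adding up the latency contributions cited above against the gameplay
# thresholds cited earlier. The 'last mile' figure is an assumed illustration;
# the rest come from the averages quoted in the text.

budget_ms = {
    "US inter-city backbone (median round trip)": 35,
    "4G radio link": 40,
    "congested last mile / in-building wiring": 30,   # assumption
}

total = sum(budget_ms.values())
for hop, ms in budget_ms.items():
    print(f"{hop:>45}: {ms:>3} ms")
print(f"{'Total':>45}: {total:>3} ms")

print("Avid gamers frustrated at 50 ms:", total > 50)      # True
print("Non-gamers feel impeded at 110 ms:", total > 110)   # False, but with little headroom
print("Unplayable at 150 ms:", total > 150)                # False domestically; international routes (100-200 ms) blow past it
```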
To manage latency, the online gaming industry has developed a number of partial solves and hacks. However, none scale particularly well.
For example, most high-fidelity multiplayer gaming is ‘match made’ around server regions. By minimizing the player roster to those who live in the Northeast United States, or Western Europe, or Southeast Asia, game publishers are able to minimize latency on a geographic basis. As gaming is a leisure activity and typically played with one-to-three friends, this clustering works well enough. After all, you’re unlikely to game with someone several time zones away. And you don’t really care where your unknown opponents (who you usually can’t even talk to) live, anyway. Still, Subspace finds that roughly three quarters of all internet connections in the Middle East are outside playable latency levels for dynamic multiplayer games, while in the United States and Europe, a quarter are. This mostly reflects the limitations of broadband infrastructure, not server placement.
Multiplayer online games also use ‘netcode’ solutions to ensure synchronization and consistency and keep players playing. Delay-based netcode will tell a player’s device (e.g. a PlayStation 5) to artificially delay its rendering of its owner’s inputs until the more latent player’s (i.e. their opponent’s) inputs arrive. This will annoy players with muscle memory attuned to low latency, but it works. Rollback netcode is more sophisticated. If an opponent’s inputs are delayed, a player’s device will proceed based on what it expected to happen. If it turns out the opponent did something different, the device will try to unwind in-process animations and then replay them ‘correctly’.
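Here is a deliberately tiny sketch of the rollback idea: predict the remote player’s input, keep simulating, and re-simulate from the last confirmed frame when the real input arrives and differs. The ‘simulation’ is just a running total, not any engine’s actual netcode:

```python
# A minimal sketch of the rollback idea: predict the remote player's input,
# keep simulating, and re-simulate from the last confirmed frame if the real
# input turns out to be different. This is a toy model, not any engine's code.

def step(state: int, local_input: int, remote_input: int) -> int:
    """Toy simulation: state is just a running total of both players' inputs."""
    return state + local_input + remote_input

history = []               # (frame, state_before, local_input, predicted_remote)
state = 0
predicted_remote = 0       # simplest prediction: repeat the last known input

for frame, local_input in enumerate([1, 1, 0, 1]):
    history.append((frame, state, local_input, predicted_remote))
    state = step(state, local_input, predicted_remote)

# The remote player's real inputs finally arrive, and frame 2 was mispredicted.
actual_remote = [0, 0, 1, 0]
for frame, state_before, local_input, predicted in history:
    if predicted != actual_remote[frame]:
        # Roll back to the state before the misprediction...
        state = state_before
        # ...and replay every frame from there with the confirmed inputs.
        for f, _, l_input, _ in history[frame:]:
            state = step(state, l_input, actual_remote[f])
        break

print("Corrected state:", state)
```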
These solutions work well for 1v1 games (e.g. 2D fighters), for small latency hiccups (e.g. ±40ms), and for titles with a limited range of highly predictable actions (e.g. a driving game). But as we expand to more Metaverse-focused experiences with more players, greater variations in latency, and more dynamic scenarios, these solutions degrade. It’s difficult to coherently and correctly predict a dozen players, and to ‘roll them back’ in a non-disruptive fashion. Instead, it makes more sense to simply disconnect a laggy player. And while a video call has many participants, only one really matters at a time and thus there is a ‘core’ latency. In a game, getting the right information from all players matters, and latency compounds.
Low latency isn’t a requirement for most games. Titles such as Hearthstone or Words with Friends are either turn-based or asynchronous, while other hits such as Honour of Kings or Candy Crush need neither pixel-perfect nor millisecond-precise inputs. It’s really just fast-twitch titles like Fortnite, Call of Duty and Forza that require low latency. These sorts of games are lucrative, but they are a small portion of the total games market by titles produced — and an even smaller share of total game time.
Yet while the Metaverse isn’t a fast-twitch AAA game, its social nature and desired importance mean it will require low latency. Slight facial movements are incredibly important to human conversation — and we’re incredibly sensitive to slight mistakes and synchronization issues (hence the uncanny valley problem in CGI). Social products, too, depend on their ubiquity. Just imagine if FaceTime or Facebook didn’t work unless your friends or family were within 500 miles, for example. Or only when you were at home. And if we want to tap into foreign or at-distance labor in the virtual world, we need considerably more than just excess bandwidth.
Unfortunately, latency is the hardest and slowest network attribute to fix. Part of the issue, as mentioned above, is how few services and applications need ultra-low-latency delivery. This constrains the business case for any network operator or latency-focused content-delivery network (CDN) — and the business case here is already challenged and in contention with the fundamental laws of physics.
At 11,000–12,500km, it takes light 40–45ms to travel from NYC to Tokyo or Mumbai. This meets all low-latency thresholds. Yet while most of the internet backbone is fiber optics, fiber-optic cable transmits data roughly 30% slower than the speed of light because the light isn’t traveling through a vacuum (and signal loss is typically 3.5 dB/km). Copper and coaxial cables have even worse latency degradation at distance, and more limited bandwidth, which means greater risks of congestion and delayed delivery. These cables still make up the bulk of the wiring found inside residential and commercial buildings, as well as across neighborhoods.
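The arithmetic behind those figures is straightforward. The sketch below assumes a refractive index of roughly 1.47 for silica fiber (a commonly cited value), which is where the ~30% slowdown comes from; the results land close to the ~40ms cited above:

```python
# The numbers behind the paragraph above: one-way travel time for the NYC-Tokyo
# and NYC-Mumbai distances, in a vacuum versus in fiber. The refractive index
# of ~1.47 for silica fiber (light travels ~32% slower than c) is a common
# figure, used here as an assumption.

C_KM_PER_MS = 299_792.458 / 1000        # speed of light: ~300 km per millisecond
FIBER_FACTOR = 1 / 1.47                 # light in glass vs light in a vacuum

for route, km in [("NYC-Tokyo", 11_000), ("NYC-Mumbai", 12_500)]:
    vacuum_ms = km / C_KM_PER_MS
    fiber_ms = vacuum_ms / FIBER_FACTOR
    print(f"{route}: {vacuum_ms:.0f} ms in a vacuum, "
          f"{fiber_ms:.0f} ms through ideal straight-line fiber (one way)")
# NYC-Tokyo:  ~37 ms vacuum, ~54 ms fiber
# NYC-Mumbai: ~42 ms vacuum, ~61 ms fiber
# Real routes are longer and multi-operator, hence the 3-5x gap noted below.
```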
In addition, none of these cables are laid as the crow flies. And what we typically think of as ‘the internet backbone’ is really a loose federation of private networks, none of which fully delivers a data packet on its own (or has the incentive to hand off stretches to a competitor with a faster segment or two). As a result, the networking distance between a pair of servers, or a server and client, can be substantially larger than their geographic distance. Furthermore, network congestion can result in traffic being routed even less directly in order to ensure reliable and continuous delivery, rather than to minimize latency. This is why average latency from NYC to Tokyo is over 3× the time it takes light to travel between the two cities, and 4–5× from NYC to Mumbai.
It is incredibly costly and difficult to upgrade or re-lay any cable-based infrastructure, especially if the goal is to minimize geographic distance. It also requires considerable regulatory/government approval, typically at many levels. It’s easier to fix wireless, of course. And 5G certainly helps, as it shaves 20–40ms off 4G on average (and promises as little as 1ms of latency). However, this only helps the last few hundred meters of data transmission. Once your data hits the tower, it returns to traditional backbones.
Starlink, SpaceX’s satellite internet constellation, promises to provide high-bandwidth, low-latency internet service across the United States, and eventually the rest of the world. But this doesn’t solve for ultra-low latency, especially at great distances. While Starlink achieves 18–35ms travel time from your house to the satellite and back, this extends when the data has to go from New York to Los Angeles and back, as that requires relaying across multiple satellites. In some cases, Starlink even exacerbates travel distances. New York to Philadelphia is around 100 miles in a straight line and potentially 125 miles by cable, but over 700 miles when traveling to a low-orbit satellite and back down. In addition, fiber-optic cable is much less lossy than signals transmitted through the atmosphere, especially on cloudy days. Dense city areas are also noisy and thus subject to interference. In 2020, Elon Musk emphasized that Starlink is focused “on the hardest-to-serve customers that [telecommunications companies] otherwise have trouble reaching.” In this sense, it brings more people into the Metaverse, rather than boosting those already participating.
Entirely new technologies, business lines, and services are being developed to cater to the growing need for real-time, bandwidth-intensive applications. Subspace (Disclosure: portfolio company), for example, deploys hardware across hundreds of cities in order to develop ‘weather maps’ for low-latency network pathfinding. It also operates a networking stack that coordinates the needs of a low-latency application with the many third parties that make up this path, and has even built an optical network that splices across various fiber networks to further shorten the distance between servers and minimize the use of non-fiber cabling.
Fastly, meanwhile, provides a CDN optimized for low-latency applications, rather than just delivery reliability and bandwidth. The company uses an ‘infrastructure-as-code’ approach that allows clients to customize nearly every aspect of its edge-computing clusters; it promises that a software application can clear and replace all cached content across these clusters globally within 150ms, and that it can cache and accelerate individual blockchain transactions in real time.
Reliability
Reliability is fairly obvious. Our ability to shift to virtual labor and education is directly dependent on reliable quality of service. This spans overall uptime as well as the consistency of other attributes, such as download/upload bandwidth and latency. For many of those who ‘live online’ today, much of the above might seem alarmist. Netflix streams in 1080p or even 4K perfectly fine most of the time! However, services such as Netflix leverage reliability solutions that don’t work well for games or Metaverse-specific applications.
Non-live video services like Netflix receive all video files hours to months before they’re made available to audiences. This allows them to perform extensive analysis of frame data to determine what information can be discarded, shrinking (or ‘compressing’) file sizes. Netflix’s algorithms will ‘watch’ a scene with blue skies and decide that, if a viewer’s internet speed drops, 500 different shades of blue can be simplified to 200, or 50, or 25. The streamer’s analytics even do this on a contextual basis — recognizing that scenes of dialogue can tolerate more compression than those of faster-paced action. This is multipass encoding. As discussed earlier, Netflix also uses spare bandwidth to send video to a user’s device before it’s needed — thus, if there is a temporary drop in connectivity or increase in latency, the end-user experiences no change. In addition, Netflix will pre-load content at local nodes; so when you ask for the newest episode of Stranger Things, it’s actually only a few blocks away. None of this is possible for video or data created live, which, per above, also needs to arrive faster. This is why it’s harder to cloud-stream 1GB of Stadia than 1GB of Netflix.
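A toy sketch of two of those tricks, prefetch buffering and quality step-downs, is below. The bitrates, buffer target, and throughput trace are invented for illustration; the point is that both tricks depend on the content existing before it’s watched:

```python
# A toy sketch of the two tricks described above: buffer ahead of the playhead,
# and step quality down when measured throughput drops. Bitrates and buffer
# targets are illustrative. Neither trick applies when each frame is generated
# in response to a player's live input, which is why 1 GB of Stadia is harder
# to deliver than 1 GB of Netflix.

QUALITY_LADDER_MBPS = [15.0, 5.0, 3.0, 1.5]   # e.g. 4K down to 480p
TARGET_BUFFER_SECONDS = 30                     # playback held back until filled

def pick_quality(measured_throughput_mbps: float) -> float:
    """Choose the highest bitrate that leaves ~25% headroom under current throughput."""
    for bitrate in QUALITY_LADDER_MBPS:
        if bitrate * 1.25 <= measured_throughput_mbps:
            return bitrate
    return QUALITY_LADDER_MBPS[-1]

buffered = 0.0
for second, throughput in enumerate([25, 25, 8, 8, 2, 25]):   # a mid-stream dip
    bitrate = pick_quality(throughput)
    buffered += throughput / bitrate - 1        # seconds of video gained vs. played
    buffered = min(buffered, TARGET_BUFFER_SECONDS)
    print(f"t={second}s  throughput={throughput:>2} Mbps  "
          f"streaming at {bitrate} Mbps  buffer={max(buffered, 0):.1f}s")
```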
So even though its objective isn’t necessarily competitive in nature, we should think of the Metaverse as raising the requirements for all aspects of networking — latency, reliability/resilience, and bandwidth — to that of AAA multiplayer games. It doesn’t matter how powerful your device is (see hardware and compute) if it can’t receive all the information it needs in a timely fashion.
This is Part III of the nine-part ‘METAVERSE PRIMER’.
Matthew Ball (@ballmatthew)