Edge cases

March 13, 2021 Daniel Liberman, Datagen

Computer vision systems are being embedded in, and are changing, a huge variety of applications and disciplines. Many of these applications require a high level of accuracy and have very small margins of error. A good example is autonomous vehicles. Injuries and fatalities as a result of driving are tragically common, and autonomous vehicles potentially offer a better future with fewer accidents and safer transportation. However, this potential can only be realized if the systems are accurate and error-free. At the moment, autonomous driving systems are still imperfect and cause crashes. In many cases, these accidents occur when autonomous vehicles encounter edge case scenarios. In this post, we will discuss edge cases, why they are so challenging for computer vision systems, and how synthetic data can help.

Edge cases are situations that occur outside of normal operating conditions and parameters. These are situations that aren’t planned for and aren’t supposed to happen but do, even if only rarely. Given a large enough scale, strange and unforeseen events will occur. For example, a Google autonomous vehicle was involved in an accident when it detected sandbags surrounding a storm drain and had difficulty interpreting the situation correctly. The chance that a vehicle will encounter sandbags in the street is minuscule in most geographies. But for a globally deployed autonomous driving system to work, it needs to handle even the rarest occurrences. Otherwise, when the model is confronted with a situation that it hasn’t been trained for or fails to recognize, its behavior becomes unpredictable.

Essentially, the main safety challenge for autonomous vehicles is to improve a 98% safety rate to a 99.999999% safety rate. That increase is incredibly challenging, as it requires training data for practically every single edge case imaginable, and even for unforeseeable cases. Elon Musk described the need for this level of accuracy on Twitter. In order for humans to rely on autonomous vehicles, or on other applications of computer vision, the margin of error needs to be infinitesimal.
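To get a feel for the size of that gap, here is a quick back-of-the-envelope sketch. It only restates the two percentages above as failure rates; the idea of counting failures "per million events" is our own assumption for illustration, since the figures above don't say what unit the safety rate is measured in.

```python
from fractions import Fraction


def failure_rate(safety_percent: str) -> Fraction:
    """Turn a safety rate written as a percentage into an exact failure fraction."""
    return 1 - Fraction(safety_percent) / 100


# The two safety rates quoted in the text. Fractions avoid floating-point rounding.
current_failure = failure_rate("98")
target_failure = failure_rate("99.999999")

# How many times smaller the failure rate has to become.
reduction_factor = current_failure / target_failure

# "Per million events" is an illustrative unit, not one taken from the text.
per_million = 1_000_000
print(f"failures per million events at 98%:        {float(current_failure * per_million):,.2f}")
print(f"failures per million events at 99.999999%: {float(target_failure * per_million):,.2f}")
print(f"required reduction in failures:            {int(reduction_factor):,}x")
```

Seen this way, the jump from 98% to 99.999999% is a two-million-fold reduction in failures, which is why the remaining errors are dominated by rare situations rather than everyday ones.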

Another example: drones are often used for overhead monitoring in applications such as security or analytics. These applications require a consistently high level of accuracy, for example when monitoring the integrity and functionality of railway infrastructure. But if faced with an edge case such as a sudden lightning storm, flood, or fire, the model that interprets the visual data might fail to correctly interpret what it sees and react accordingly, putting the safety of train crews, passengers, and cargo at risk.

The challenge of ensuring that computer vision (CV) models are robust enough to handle edge cases is a serious one, and it is holding back the deployment of, and full reliance on, key applications.

Generally, we would solve this problem with more data. Manually collecting more data over time increases the chances that our training data will contain edge cases that can be used for training. This is one reason autonomous vehicle training requires millions of miles of driving data.

But edge cases present specific challenges. First, some edge cases may simply never appear in manually collected data, even if we collect a lot of it. Second, trying to act out or recreate edge cases in the real world in order to generate training data can be impossible, dangerous, or even illegal. For instance, to collect data for training robots to save lives in a burning building, you would have to burn down a building.
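The first point can be made concrete with a small probability sketch. Every number in it is hypothetical: the per-mile chance of the edge case and the amount of data collected are made up for illustration, and we assume each mile is independent of the others, which real driving data is not.

```python
import math


def prob_never_seen(p_per_mile: float, miles: float) -> float:
    """Chance that an event with per-mile probability p never shows up in `miles`
    miles, i.e. (1 - p) ** miles, assuming independent miles."""
    return math.exp(miles * math.log1p(-p_per_mile))


def miles_needed(p_per_mile: float, confidence: float) -> float:
    """Miles required to see the event at least once with the given confidence."""
    return math.log(1.0 - confidence) / math.log1p(-p_per_mile)


# Hypothetical values, chosen only to illustrate the shape of the problem.
P_EDGE_CASE = 1e-9            # assumed chance of the edge case on any given mile
MILES_COLLECTED = 10_000_000  # assumed size of the manually collected dataset

miss = prob_never_seen(P_EDGE_CASE, MILES_COLLECTED)
needed = miles_needed(P_EDGE_CASE, 0.95)

print(f"chance the dataset contains no example at all: {miss:.1%}")
print(f"miles needed for a 95% chance of one example:  {needed:,.0f}")
```

Under these assumed numbers, ten million miles of driving would still miss the edge case about 99% of the time, and capturing even a single example with 95% confidence would take roughly three billion miles.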

Fortunately, synthetic data provides an exciting solution to this edge case problem: through a combination of algorithms, software, and 3D modeling, it offers the ability to simulate edge cases that can’t be captured manually.

At Datagen, we can help teams bring to life the edge cases that they are aware of but can’t capture manually. In some cases, these might be scenarios like those described earlier, where safety or other constraints make manual capture impossible. In other cases, teams might simply want more random variation across a wider range of variables. With an awareness of the importance of edge cases, we can design unique solutions together, supplying teams with the datasets needed to train their models effectively.