Eric Yu

Robotics Institute Research

A mock-up image of synthetic data to be used for training machine learning models, complete with bounding boxes as well.

NOTE: These projects are still ongoing, so some data has been omitted so as not to reveal sensitive information.

February 1st, 2018

I am a Senior Animation Designer at the Robotics Institute of Carnegie Mellon University, where I support research in Computer Vision and Machine Learning. My main project investigates the benefits of synthetic data for training object-recognition machine learning models. As artists, we were given the unique chance to apply our skills to generating as many images as we could, not for the eyes of an audience, but for the eyes of a machine.

The Problem

At the intersection of AI and computer graphics, two areas of study are rapidly growing in popularity: machine learning and computer vision. Machine learning is a field of AI research that studies how systems can learn and improve from data without being explicitly programmed. Computer vision develops techniques that help computers “see”, or understand, the objects in digital images and videos.

With a powerful computer, you can train a machine learning model to recognize dozens of objects with higher accuracy than a human, so long as you feed it enough images or videos. The catch is that training requires an enormous amount of data, and that data must be labeled so the algorithm can learn from it. The number of real-life photos required is immense, and the time needed to acquire and label them is so great that it is prohibitive for most people to even start, and of course more data is always better.

To solve this, many groups are now looking into synthetic data, or fake data generated in animation/game software or via programming, to augment their datasets. It’s cheaper, easier to amass and label, and can sometimes improve the dataset as well. This is where our job comes in.

The Team

Jessica Hodgins, a professor at Carnegie Mellon and President of SIGGRAPH, put together a small team of artists and programmers to help the researchers in the Robotics Institute create synthetic data.

  • Eric Yu (Technical Artist, Modeler/Texturer): I was one of the original people hired, and I am in charge of creating and texturing assets that can’t be acquired online. I also write scripts, primarily in Python, to help with the scene creation process.

  • Melanie Danver (Scene Layout, Render Master): Melanie is in charge of assembling all of our assets into scenes that we can make videos in (primarily in Maya). She also manages our rendering processes, which run on a specialized cloud server.

  • Kevin Carlos (Motion Capture, Game Engine Programmer): Kevin is a mocap specialist who recorded, cleaned, and animated most of the motions we use. As a former game design major, he also helped us create scenes in Unity and Unreal. He currently works at Gearbox Software.

  • Jimmy Krahe (Modeler/Texturer, Scene Layout): Jimmy, the other original hire, is a 3D artist who helped shape our very first scenes in Maya and ZBrush. He is currently studying at Virginia Commonwealth University.

The Solution

With synthetic data we are able to generate millions of “fake” images to train machine learning models, at a fraction of the time and cost of acquiring real-life images. Labeling is handled by the program, so we no longer need to do it by hand. Plus, if there are gaps in the dataset (such as not enough dogs or not enough women), we can add the underrepresented objects or actions and generate millions of images of those to fill the gaps. We can also export any additional information that is required, such as depth maps or bounding boxes.
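
As a rough illustration of what “automatic labeling” can look like in a Maya-based pipeline, here is a minimal Python sketch that writes each tagged object’s class and world-space bounding box out to JSON. The naming convention and output format are assumptions for the example, not our actual export code.

```python
# Hypothetical sketch: export per-object labels from a Maya scene as JSON.
# The "class prefix" naming convention here is illustrative only.
import json
import maya.cmds as cmds

def export_labels(output_path, class_prefixes=("car_", "person_", "bike_")):
    """Write class names and world-space bounding boxes for tagged objects."""
    labels = []
    for node in cmds.ls(type="transform"):
        for prefix in class_prefixes:
            if node.startswith(prefix):
                # exactWorldBoundingBox returns [xmin, ymin, zmin, xmax, ymax, zmax]
                bbox = cmds.exactWorldBoundingBox(node)
                labels.append({"object": node,
                               "class": prefix.rstrip("_"),
                               "bbox_world": bbox})
    with open(output_path, "w") as f:
        json.dump(labels, f, indent=2)

export_labels("frame_0001_labels.json")
```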

An example of domain randomization, programmed in Unity by Kevin Carlos. Notice the random textures, vehicle layout, and distractor shapes.

Of course, this is an ever-evolving field, so new findings are bound to come up. We originally tried to make realistic images that matched real-life data, using Autodesk Maya and the Arnold renderer to make our scenes as realistic as possible, with textures done in Substance Painter.

However, according to a paper from researchers at NVIDIA (click here to see it), domain randomization, or building deliberately random, unrealistic scenes, can offer better results. Instead of striving for realism, you randomly change the textures and layout, or throw in distractor shapes. These unrealistic scenes have been shown to improve performance, because the network is forced to focus only on the essential characteristics of the objects. For this, we turned to game engines such as Unreal and Unity, since it is easier there to build quick randomization routines whose output is ready to use without offline rendering.
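
To show the flavor of the technique, here is a small domain-randomization sketch written in Maya Python for consistency with the rest of this page; our actual randomization was built in Unity, and the object names and value ranges below are placeholders.

```python
# Illustrative domain-randomization sketch (not our Unity implementation):
# give each target object a random flat color and scatter distractor primitives.
import random
import maya.cmds as cmds

def randomize_scene(target_objects, num_distractors=10):
    for obj in target_objects:
        # Assign a randomly colored Lambert shader to the object.
        shader = cmds.shadingNode("lambert", asShader=True)
        cmds.setAttr(shader + ".color",
                     random.random(), random.random(), random.random(),
                     type="double3")
        cmds.select(obj)
        cmds.hyperShade(assign=shader)

    for _ in range(num_distractors):
        # Drop in primitive "distractor" shapes at random positions and scales.
        maker = random.choice([cmds.polyCube, cmds.polySphere, cmds.polyCone])
        shape = maker()[0]
        cmds.xform(shape,
                   translation=[random.uniform(-50, 50) for _ in range(3)],
                   scale=[random.uniform(0.5, 3.0) for _ in range(3)])
```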


My Role

Making the Environment

Different sun angles allow us to multiply our number of unique images by 13.

I was one of the artists tasked with creating realistic assets, using Maya for modeling and Arnold for rendering. I also learned Substance Painter and Substance Designer to create realistic textures, supplemented with high-quality scans from Quixel, so that the areas we built resembled real urban centers. For foliage, we primarily used Maya’s MASH networks to instance trees and bushes, keeping file sizes down. To multiply the number of images from each 5-second sequence, we varied the sun angle and the camera used, turning 150 individual frames into 58,500 images. Render times were kept below 20 seconds per frame. I was also able to bring my modeling and texturing skills into game engines, particularly Unreal Engine.
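
The multiplication above comes from batching every frame through every sun angle and camera combination. Here is a hedged sketch of that loop in Maya Python; the light name, frame range, and use of cmds.render are assumptions for illustration, not our exact pipeline script.

```python
# Hypothetical batching sketch: render every frame of a sequence under every
# sun angle and through every camera in the scene.
import maya.cmds as cmds

SUN_ANGLES = range(0, 130, 10)        # 13 sun elevations (placeholder values)
CAMERAS = cmds.ls(type="camera")      # every camera in the scene
FRAMES = range(1, 151)                # a 150-frame (5 s at 30 fps) sequence

for angle in SUN_ANGLES:
    cmds.setAttr("sunLight.rotateX", -angle)   # tilt the key light / sun
    for cam in CAMERAS:
        for frame in FRAMES:
            cmds.currentTime(frame)
            # Render the current frame through this camera with the active renderer.
            cmds.render(cam)
```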

I also helped our mocap artist, Kevin, record motion capture shoots. With that data, I used MotionBuilder to bake the animation onto our characters’ skeletons, then combined separate clips into a cohesive motion that could be imported into the scene.

Garage model with sand brick material created in Substance Designer.

Turnaround of a small apartment, made in Maya and textured in Substance Painter.

Recording bicycle motion capture at the Carnegie Mellon Motion Capture Lab.

Scripting

With my background in programming, I created scripts that helped my team generate scenes more efficiently, cutting scene creation time to one third of what it was previously. Written mostly in Python, these scripts range from rigging helpers to tools that produce depth maps for the researchers.

Vehicle Rigging: Many of the vehicles we find online are not rigged, so previously we rigged them by hand. Now, this script produces controllers for the hood/trunk, the body, and the doors, binds each one to the corresponding vehicle part, and links wheel rotation to forward movement. A drivable vehicle with a click of a button!

car_rigging.gif
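
To give a sense of how a rig like this can be scripted, here is a simplified Maya Python sketch covering just two pieces: a controller for one door and an expression tying wheel spin to forward travel. The object names and wheel radius are placeholders; the real script handles every part automatically.

```python
# Simplified vehicle-rigging sketch (illustrative names, not our production tool).
import maya.cmds as cmds

def rig_door(door, name="door_ctrl"):
    """Create a NURBS-circle controller and drive the door with it."""
    ctrl = cmds.circle(name=name, normal=(0, 1, 0))[0]
    cmds.delete(cmds.parentConstraint(door, ctrl))   # snap controller to the door
    cmds.makeIdentity(ctrl, apply=True, translate=True, rotate=True)
    cmds.parentConstraint(ctrl, door, maintainOffset=True)
    return ctrl

def link_wheels(body, wheels, wheel_radius=35.0):
    """Spin each wheel as the body translates forward (rotation = distance / radius)."""
    for wheel in wheels:
        cmds.expression(string="{w}.rotateX = {b}.translateZ / {r} * 57.2958;".format(
            w=wheel, b=body, r=wheel_radius))

rig_door("car01_door_L")
link_wheels("car01_body", ["car01_wheel_FL", "car01_wheel_FR"])
```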

Vehicle Spawning: Previously, when we made a scene, we had to place vehicles by hand. Now we can randomly generate vehicles at designated spawn points, with a pop-up window to set the number of vehicles, the types of vehicles, and whether the vehicles move or not.

vh_spawn.gif
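
A rough sketch of the spawning idea is below: prompt for how many vehicles to place, then drop random vehicle instances onto locator spawn points. The "spawn_*" locator and "veh_*" source-mesh naming is hypothetical, and the real tool offers more options (vehicle type, movement) than this stripped-down version.

```python
# Minimal vehicle-spawning sketch, assuming locators named spawn_* and
# source vehicle meshes named veh_* exist in the scene.
import random
import maya.cmds as cmds

def spawn_vehicles():
    result = cmds.promptDialog(title="Vehicle Spawner",
                               message="Number of vehicles:",
                               button=["Spawn", "Cancel"],
                               defaultButton="Spawn", cancelButton="Cancel")
    if result != "Spawn":
        return
    count = int(cmds.promptDialog(query=True, text=True))

    spawn_points = cmds.ls("spawn_*", type="transform")
    vehicle_sources = cmds.ls("veh_*", type="transform")

    for point in random.sample(spawn_points, min(count, len(spawn_points))):
        source = random.choice(vehicle_sources)
        new_vehicle = cmds.instance(source)[0]
        position = cmds.xform(point, query=True, worldSpace=True, translation=True)
        cmds.xform(new_vehicle, worldSpace=True, translation=position)

spawn_vehicles()
```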

Depth Map Creation: Made with help from researcher Stanislav Panev, this script generates a depth map of the scene, with white being closest and black being furthest away, automatically for each frame and for every camera.

depth_map.gif
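
The core of any depth map tool is the remap from raw depth values to a grayscale image. Here is a minimal sketch of that step alone, assuming the per-pixel depth has already been pulled from a rendered Z pass (the extraction itself is renderer-specific and omitted here).

```python
# Minimal remap sketch: float depth array -> 8-bit image, white = nearest.
import numpy as np

def depth_to_grayscale(depth):
    """Normalize a float depth array to uint8, inverting so near pixels are white."""
    near, far = depth.min(), depth.max()
    normalized = (depth - near) / max(far - near, 1e-6)   # 0 = nearest, 1 = furthest
    return ((1.0 - normalized) * 255).astype(np.uint8)

# Example with fake data: a 4x4 "scene" whose depth increases left to right.
fake_depth = np.tile(np.linspace(1.0, 50.0, 4), (4, 1))
print(depth_to_grayscale(fake_depth))
```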

Blinking: Adds blinking to a character model without manual animation. You designate a time range, and the script inserts blinks at random intervals by keying the model’s blend shape deformations.

eyeblink_1.gif
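
Here is a hedged sketch of the blinking idea: key a blink blend-shape target open-shut-open at random intervals over a chosen frame range. The blend shape node and target name ("blendShape1.blink") and the timing defaults are placeholders for whatever a given character uses.

```python
# Illustrative random-blink sketch (placeholder attribute name and timings).
import random
import maya.cmds as cmds

def add_random_blinks(blink_attr="blendShape1.blink", start=1, end=300,
                      min_gap=40, max_gap=120, blink_length=6):
    """Scatter blinks between start and end frames by keying the blend shape weight."""
    frame = start + random.randint(min_gap, max_gap)
    while frame + blink_length < end:
        cmds.setKeyframe(blink_attr, time=frame, value=0.0)                     # open
        cmds.setKeyframe(blink_attr, time=frame + blink_length / 2, value=1.0)  # closed
        cmds.setKeyframe(blink_attr, time=frame + blink_length, value=0.0)      # open again
        frame += random.randint(min_gap, max_gap)

add_random_blinks()
```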

What I Learned

Though this project is still ongoing, I have learned so much during my time here. Having never studied Computer Vision or Machine Learning before, I had to learn a lot to keep up with the researchers and better aid them. This cross-functional environment taught me how to communicate the intricacies of the 3D pipeline to the researchers of the Robotics Institute. Through frequent meetings we determined what they needed and communicated solutions (not everyone knows about UVs or the joints in a skeletal system), which helped them scope their projects. Collaboration is key in our workplace.

Plus, since the fields we work in are ever-evolving, we had to come up with creative solutions to new problems, domain randomization and depth map creation to name a few. We had to think on our feet, drawing on what we knew and learning new methods or software to keep up with the demands of the field. When domain randomization came to our attention, we needed to learn Unreal and Unity. When depth maps were required to train models on depth, we needed an efficient way to produce them in Maya (ultimately using Legacy Render Layers or Arnold AOVs). We had to become adept problem solvers, but we were always up for the challenge.

The future of synthetic data is looking bright, with autonomous cars and AI-driven solutions on the rise. Who knows what the future has in store? Whatever it is… we’re ready!