Can synthetic data help train your AI model?

Yes and no. It’s complicated.
The saying “data is the new oil” was reportedly coined by British mathematician and marketing whiz Clive Humby in 2006. Humby’s remark rings truer now than ever with the rise of deep learning. Data is the fuel powering modern AI models; without enough of it, the performance of these systems will sputter and fail. And like oil, the resource is scarce and controlled by big businesses. So what do you do if you’re a small computer vision company? You can turn to fake data to train your models, and if you’re lucky, it might just work.

The market for synthetic data generation grew to over $110 million in 2021 and is expected to reach $1.15 billion by the end of 2027, according to a report published by research firm Cognilytica. Numerous startups have built tools to spin up synthetic images to help companies train their machine learning algorithms.

There are many benefits to using computer-generated data, Gil Elbaz, co-founder and CTO of Datagen, explained to The Register. The startup, founded in 2018 and based in Israel, has built a software platform that lets customers create mock images at the click of a button. Synthetic data provides a way to scale up datasets and automatically annotate each picture with the necessary metadata without much human labor. Issues of privacy and bias can be avoided too.

“Privacy for human faces is very, very hard, and it’s not ideal to even hold [that kind of data] in your servers,” Elbaz says. “With our data, there’s no [personally identifiable information]. This is not a real person. This is completely synthetic, so there’s no privacy issues. And bias-wise we can generate whatever distribution of ethnicities, ages, genders you want in your data, so we are not biased in any way,” he says as he shows us a three-dimensional fake face.

Datagen works with companies to train computer vision models for different tasks. Simulated data is used by the automotive industry to develop AI software that automatically detects driver behavior, such as when drivers are distracted or falling asleep at the wheel. Fake data has also been used by surveillance camera companies to flag when packages have been delivered outside people’s homes. AI applications in augmented and virtual reality also benefit from ingesting copious amounts of synthetic data.

Rendering fake data is a complicated process. Datagen uses multiple methods to create computer-made images, from physics-based ray tracing algorithms to generative adversarial networks (GANs).
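Datagen’s own pipeline isn’t described in code here. As a rough, toy illustration of the two benefits Elbaz describes (labels that come for free, and a distribution you choose up front), the following Python sketch generates tiny labeled images. Everything in it is hypothetical: `render_square` is a stand-in for a real renderer such as a ray tracer or a GAN, and `generate_dataset`, the attribute names and the bounding-box label format are invented for illustration, not Datagen’s API.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class SyntheticSample:
    image: list  # tiny grayscale image as rows of pixels: 0 = background, 255 = object
    label: dict  # annotation derived from the generation parameters, not from a human


def render_square(size, x, y, side):
    """Hypothetical stand-in for a real renderer: draw a filled square on a blank canvas."""
    image = [[0] * size for _ in range(size)]
    for row in range(y, y + side):
        for col in range(x, x + side):
            image[row][col] = 255
    return image


def generate_dataset(attribute_counts, size=32, seed=0):
    """Generate labeled samples with an exact, caller-chosen attribute distribution.

    attribute_counts maps a (hypothetical) attribute value, e.g. a demographic
    group or object class, to the number of samples wanted for it.
    """
    rng = random.Random(seed)
    # Build an exact quota of attribute values, then shuffle their order.
    attributes = [value for value, count in attribute_counts.items() for _ in range(count)]
    rng.shuffle(attributes)

    samples = []
    for attribute in attributes:
        # Choose scene parameters at random, keeping the square inside the canvas.
        side = rng.randint(4, size // 2)
        x = rng.randint(0, size - side)
        y = rng.randint(0, size - side)
        image = render_square(size, x, y, side)
        # The annotation is exact by construction because we chose the parameters.
        # bbox is (x_min, y_min, x_max, y_max) with the max edges exclusive.
        label = {"attribute": attribute, "bbox": (x, y, x + side, y + side)}
        samples.append(SyntheticSample(image, label))
    return samples


if __name__ == "__main__":
    data = generate_dataset({"group_a": 300, "group_b": 300, "group_c": 400}, seed=42)
    print(len(data))  # 1000
    print(Counter(s.label["attribute"] for s in data))
```

Because the quotas are fixed before generation, the attribute mix comes out exactly as requested rather than approximately. A production system would render photorealistic scenes and richer annotations, but the principle is the same: the label is read off the generator’s parameters instead of being drawn by a human annotator.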