
Facebook advances computer vision using hashtagged pictures


At F8, Facebook explains how it’s training computer vision models more efficiently and more effectively with the help of photos already labeled with hashtags.
Hashtagging pictures of your #pitbull on Instagram is accomplishing more than just connecting you to other dog lovers.
Facebook announced Wednesday that it’s been using publicly available, hashtagged photos to train computer vision models — and it’s achieved breakthrough results.
Computer vision models typically rely almost entirely on hand-curated, human-labeled data sets. That reliance on hand labeling is the biggest limiting factor in computer vision, Facebook CTO Mike Schroepfer said on Day 2 of the F8 developer conference in San Jose.
To address this, Facebook has instead trained models with a set of 3.5 billion publicly available images and 17,000 hashtags. By using 1,500 user-supplied hashtags as labels for a 1 billion-image version of the data set, Facebook achieved a record score of 85.4 percent accuracy on ImageNet, a common benchmarking tool.
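The core idea is to treat user-supplied hashtags as noisy, weak labels rather than curated annotations. A minimal sketch of that setup (illustrative only, not Facebook's code, with a made-up five-tag vocabulary) converts each image's hashtag list into a multi-hot target vector, as in standard multi-label classification:

```python
# Illustrative sketch: user-supplied hashtags as weak multi-label targets.
# The vocabulary and example tags are invented for the demonstration;
# Facebook's actual set covered 17,000 hashtags.
HASHTAG_VOCAB = ["#pitbull", "#dog", "#beach", "#sunset", "#food"]
INDEX = {tag: i for i, tag in enumerate(HASHTAG_VOCAB)}

def hashtags_to_multihot(tags):
    """Map an image's hashtags to a multi-hot vector over the vocabulary.

    Tags outside the vocabulary are silently ignored: weak supervision
    tolerates noisy and incomplete labels by design.
    """
    vec = [0.0] * len(HASHTAG_VOCAB)
    for tag in tags:
        if tag in INDEX:
            vec[INDEX[tag]] = 1.0
    return vec

print(hashtags_to_multihot(["#pitbull", "#dog", "#tbt"]))
# -> [1.0, 1.0, 0.0, 0.0, 0.0]  (#tbt is not in the vocabulary)
```

A vector like this would serve as the target for a multi-label image classifier, so no human ever has to draw a bounding box or pick a category by hand.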
In a blog post, Facebook explains that it trained a large-scale hashtag prediction model to sort through hashtags that aren’t useful for tagging images, like #tbt.
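Facebook's filtering is done by a learned prediction model; as a rough intuition for why a tag like #tbt gets discarded, a simple frequency heuristic (an illustrative stand-in, not the method from the post) already separates ubiquitous, non-visual tags from specific ones:

```python
from collections import Counter

def filter_generic_hashtags(tagged_images, max_fraction=0.5):
    """Keep only hashtags appearing on at most max_fraction of images.

    Tags attached to a large share of all photos (like #tbt) carry little
    visual signal. This frequency cutoff is a toy stand-in for the learned
    hashtag prediction model described in Facebook's blog post.
    """
    counts = Counter(tag for tags in tagged_images for tag in set(tags))
    n = len(tagged_images)
    return {tag for tag, c in counts.items() if c / n <= max_fraction}

images = [
    ["#tbt", "#dog"],
    ["#tbt", "#beach"],
    ["#tbt", "#sunset"],
    ["#dog", "#pitbull"],
]
print(filter_generic_hashtags(images))
# #tbt appears on 3 of 4 images and is dropped; the specific tags survive
```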
In the future, Facebook plans to open source the embeddings of these models for the research community.
Schroepfer called AI “the foundation of everything we do” at Facebook, running through the company’s various AI research efforts.
In addition to working on computer vision, Facebook is working on natural language processing. It’s open sourcing Translate, a PyTorch language library, for fast machine translation. Schroepfer also mentioned Facebook’s early work on Multilingual Unsupervised and Supervised Word Embeddings (MUSE), which should help increase the number of languages available for translation on Facebook.
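At a high level, MUSE aligns word-embedding spaces across languages by learning a linear mapping from one space into the other, after which translation candidates can be found by nearest-neighbor search. The sketch below illustrates just that final lookup step with tiny, invented 2-D vectors and a placeholder mapping matrix; it is not MUSE's code or training procedure:

```python
import math

def apply_mapping(W, v):
    """Apply a linear mapping W to a source-language word vector v,
    projecting it into the target-language embedding space. In MUSE the
    matrix is learned (with or without bilingual supervision); here it is
    a hypothetical placeholder."""
    return [sum(W[i][j] * v[j] for j in range(len(v))) for i in range(len(W))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def translate(source_vec, W, target_vocab):
    """Return the target-language word whose embedding is the nearest
    neighbor (by cosine similarity) of the mapped source vector."""
    mapped = apply_mapping(W, source_vec)
    return max(target_vocab, key=lambda w: cosine(mapped, target_vocab[w]))

# Toy example: identity mapping, two target words with made-up vectors.
target_vocab = {"dog": [1.0, 0.0], "cat": [0.0, 1.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
print(translate([0.9, 0.1], identity, target_vocab))
# -> dog
```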
Meanwhile, Facebook AI Research (FAIR) is working on advancing reinforcement learning. In partnership with researchers at Georgia Tech, FAIR developed a collection of virtual agents that use vision, dialog, and reasoning to physically navigate environments and answer questions. For instance, Schroepfer explained, the agent could answer a question like, “In which room is the light on?” Because the agents are trained in 3D virtual environments rather than the real world with robots, Facebook can train them several times faster. The research team is open sourcing their EmbodiedQA and House3D projects.
Facebook also announced PyTorch 1.0, the latest version of the open source framework PyTorch, as well as the expansion of the Open Neural Network Exchange (ONNX) format, which enables engineers to easily move AI models between frameworks.
