
MIT's new AI: So smart it can predict what happens next from a still image


The AI can add animations to static images, although the researchers acknowledge that it rarely generates the “correct” future.


Image: MIT

Researchers have developed a deep-learning system that can do the very human task of interpreting what’s happening in a photo and guessing what’s likely to happen next.

Better yet, the system, developed by machine-learning researchers at MIT, can express its idea of a plausible future by adding animations to still images, such as waves that would ultimately crash, people who might move in a field, or a train that might roll forward on its tracks.


The work could provide a new direction for exploring computer vision by giving machines the ability to understand how objects move in the real world.

The researchers achieved their objective by training two deep networks on thousands of hours of unlabeled video. As they note, annotating video is expensive, but there is no shortage of unlabeled footage with which to begin teaching machines to read signals about how the world behaves.

Carl Vondrick, a PhD student at MIT, who specializes in machine learning and computer vision, told New Scientist that the ability to predict movements in a scene could ensure tomorrow’s domestic helper robots don’t become a hindrance. You wouldn’t, for example, appreciate a robot pulling a chair out from under you as you’re about to sit down, he said.

The model was trained on two million Flickr videos amounting to 5,000 hours of content, covering four main scene types: golf courses, beaches, train stations, and hospital rooms, the last consisting mostly of footage of babies.

As New Scientist notes, the videos the model produces are grainy and short, lasting about one second, but they do capture the right general movement of a given scene, such as a train moving forward or a baby scrunching up its face.

However, the model still has much to learn about how the world works, for example that a train cannot keep pulling out of a scene indefinitely. Still, it did show that machines can be taught to dream up brief, plausible futures.

The model can also “hallucinate” fictional but reasonable motion for each of the scene categories it was trained on.

The model is based on a machine-learning technique called adversarial learning, in which two deep networks compete against each other: one network generates synthetic video, while the other tries to discriminate between generated and real video.
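
For readers who want a concrete picture of how the two networks are pitted against each other, here is a minimal PyTorch sketch of the adversarial setup. It is not the MIT team’s actual architecture; the clip length, layer sizes, and the toy random data are illustrative assumptions only.

```python
# Minimal adversarial-learning sketch (assumed shapes, not the paper's model):
# a generator maps a still frame to a short clip, and a discriminator tries to
# tell generated clips from real ones.
import torch
import torch.nn as nn

FRAMES, H, W = 8, 64, 64  # assumed clip length and resolution

class Generator(nn.Module):
    """Maps a single RGB frame (B, 3, H, W) to a clip (B, 3, FRAMES, H, W)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3 * FRAMES, 3, padding=1), nn.Tanh(),
        )
    def forward(self, frame):
        out = self.net(frame)                     # (B, 3*FRAMES, H, W)
        return out.view(-1, 3, FRAMES, H, W)      # fold channels into a clip

class Discriminator(nn.Module):
    """Scores a clip as real (1) or generated (0) using 3-D convolutions."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv3d(64, 1, 4, stride=2, padding=1),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
    def forward(self, clip):
        return self.net(clip)                     # (B, 1) logits

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(100):                           # toy loop on random data
    real_clip = torch.rand(4, 3, FRAMES, H, W) * 2 - 1  # stand-in for video
    still = real_clip[:, :, 0]                    # first frame as the input

    # Discriminator step: push real clips toward 1, generated clips toward 0.
    fake_clip = G(still).detach()
    loss_d = bce(D(real_clip), torch.ones(4, 1)) + \
             bce(D(fake_clip), torch.zeros(4, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: try to make the discriminator label its clips as real.
    loss_g = bce(D(G(still)), torch.ones(4, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

As the two networks improve together, the generator is forced to produce motion that looks statistically like the real footage it was trained on, which is the intuition behind the grainy but broadly plausible clips described above.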

Vondrick has previously also trained deep-learning models on hundreds of hours of unlabeled YouTube videos and TV programs such as ‘The Office’ to predict human interactions and gestures, such as a handshake, hug or kiss.

Read more research from MIT

(via PCMag)
