Generative artificial intelligence (AI) has notoriously struggled to create consistent images, often getting details like fingers and facial symmetry wrong. Moreover, these models can fail completely when prompted to generate images at different sizes and resolutions.
Rice University computer scientists' new method of generating images with pre-trained diffusion models (a class of generative AI models that "learn" by adding layer after layer of random noise to the images they are trained on and then generate new images by removing that noise) could help correct such issues.
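As a rough illustration of that training recipe, the sketch below implements a standard DDPM-style forward (noising) step, the kind of process such models learn to reverse. The function name, the variance schedule, and the stand-in image are all illustrative assumptions, not part of ElasticDiffusion itself.

```python
import numpy as np

# Illustrative DDPM-style forward (noising) process: a clean training image x0
# is mixed with Gaussian noise according to a variance schedule. The model is
# then trained to predict that noise so it can remove it when generating.
def add_noise(x0, t, alphas_cumprod, rng=np.random.default_rng()):
    """Sample x_t ~ q(x_t | x_0) = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)   # random Gaussian noise
    a_bar = alphas_cumprod[t]             # cumulative signal fraction at step t
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps, eps

# Linear beta schedule over 1000 steps (a common choice, used here only for illustration).
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)

x0 = np.zeros((64, 64, 3))  # stand-in for a training image
noisy, eps = add_noise(x0, t=500, alphas_cumprod=alphas_cumprod)
```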
Moayed Haji Ali, a Rice University computer science doctoral student, described the new approach, called ElasticDiffusion, in a peer-reviewed paper presented at the Institute of Electrical and Electronics Engineers (IEEE) 2024 Conference on Computer Vision and Pattern Recognition (CVPR) in Seattle.
"Diffusion models like Stable Diffusion, Midjourney, and DALL-E create impressive results, generating fairly lifelike and photorealistic images," Haji Ali said. "But they have a weakness: They can only generate square images. So, in cases where you have different aspect ratios, like on a monitor or a smartwatch … that's where these models become problematic."
If you tell a model like Stable Diffusion to create a non-square image, say at a 16:9 aspect ratio, the elements used to build the generated image get repetitive. That repetition shows up as strange-looking deformities in the image or image subjects, like people with six fingers or a strangely elongated car.
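For readers who want to reproduce the failure mode described above, the hedged sketch below asks a square-trained Stable Diffusion checkpoint for a roughly 16:9 canvas through the Hugging Face diffusers library. The model ID, prompt, and resolution are illustrative assumptions; results will vary, but wide canvases like this are where repeated elements and deformities tend to appear.

```python
# Sketch: assumes the Hugging Face `diffusers` package, PyTorch, and a CUDA GPU are available.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # illustrative model ID
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "a red sports car parked on a city street",
    height=512,   # close to the model's square training resolution
    width=912,    # ~16:9, far from the square size the model was trained on
).images[0]
image.save("wide_car.png")
```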