In filmmaker Riccardo Fusetti’s experimental animated short Generation, AI-generated imagery is used to tell an expansive yet brief story of human existence. What began as Fusetti’s fascination with text-to-image AI software (which has increasingly been making the news as access to tools such as DALL-E 2, Stable Diffusion and Midjourney opens up to the public) quickly evolved into a full-fledged project in which Fusetti would use such imagery on the canvas of an interpretive dancer to tell a story spanning the history of human life. Clocking in at a little over two minutes, Fusetti’s film packs a sensory overload of colourful particles and textures that soar through space and time. Directors Notes invited Fusetti to join us in conversation, as Generation arrives online, to talk about his keen interest in overwhelming the viewer with images, the decision to centre his experimental short on a human element, and the lengthy post-production journey it took to manifest the film into existence.
What inspired you to make a short about the beginning of human existence as seen through the eyes of AI?
I have been following the AI text-to-image generation scene for a while, but I only had the chance to start experimenting with it around last May. I was immediately blown away by the potential of this tool and had a lot of fun trying things out at first.
The goal is to completely overwhelm the viewer by the sheer intensity of what’s happening on screen.
My main interest lay in the video application of this new technology and I was very curious to see what other people were doing. Although I could see some incredible work being made out there, everything I saw felt like some kind of proof of concept of what could be achieved. I couldn’t see anyone using it as a storytelling tool so I thought I would try my hand at it.
What attracted you to centre the story around the journey of human existence?
I’ve had the vague idea of portraying the overwhelming feeling of human life for a while, and this tool really gave me the means to fully visualise and express the concept.
Visually, it’s incredibly striking. Was it important for you to make the visual language as intense as possible?
The goal is to completely overwhelm the viewer by the sheer intensity of what’s happening on screen. The film can be looped infinitely and it’s up to the viewer whether they want to be caught in the madness of it all, or pause on a certain frame and appreciate the intricate details it has to offer.
What text-to-image software did you use for the imagery? And were there any other elements of the project that you considered having AI-generated?
I used Disco Diffusion, a free, open-source text-to-image tool that allowed me to generate incredibly detailed individual frames that I could combine together.
The initial idea was for the narration to be written by AI software as well, but I couldn’t get interesting results from that process, and I realised that it was very important to focus on the human element, so I decided to write the narration myself. Teodosia Dobriyanova, our producer, and I were constantly bouncing ideas off each other until we ended up with the final draft.
At what stage did you decide to use a dancer as your canvas?
I knew immediately that I wanted a human element to piece everything together, so using an interpretive dancer as a canvas for the computer-generated images was a no-brainer.
Could you walk us through the trajectory of the film from the inception of the idea to its practical production?
This process of experimentation and writing took less than two months, after which I asked Paul Thompson to record the narration. Paul is an actor and musician I had collaborated with before, and as I was writing the script I could hear the narration in his voice. We then needed the most important piece of the puzzle, a dancer and choreographer who could really bring life to the narration, and we couldn’t have found a better fit than Evie Webzell. Evie’s delicate yet powerful performance is the main drive of the whole film; even when you cannot see her, her movement leads the whole animation process.
We got in touch with DoP Natalja Safronova, who was excited about the project and brought along Focus Puller Dominika Besińska and Gaffer Dorothy Dee. It was a simple setup but they all did an incredible job. It was great to have such a talented crew join our project and I am very excited about the absolutely striking visuals we pulled off in such a small space and on a very tight budget.
And when it finally came to post, how challenging was it to meld the imagery with Evie’s performance?
I didn’t want to rely completely on AI-generated images. It was crucial for me to be able to control the visuals whilst keeping Evie’s silhouette legible throughout the piece. I achieved this by completely rotoscoping Evie’s performance and then manually animating the imagery seen inside her body, combining stock footage, stock images and custom graphics made from scratch. The result was a very rough pass of the video that was then fed to the AI to refine and to “paint” on top of.
I wanted a human element to piece everything together, so using an interpretive dancer as a canvas for the computer-generated images was a no-brainer.
It wasn’t too dissimilar to working with a human illustrator, where I would make a rough pass of what I had in mind and then ask them to work on each frame and make it better and more detailed. I would then get the results from the AI frame by frame and composite them on top of the original footage. The film was then scored and sound designed by the artist SINK, who did an excellent job in finding the perfect tone for the project and truly elevated it.
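The workflow Fusetti describes, a rotoscoped rough pass in, an AI-refined frame out, then a composite back over the original footage, could be sketched roughly as below. This is an illustrative outline only: `refine_with_ai` is a hypothetical stand-in for the Disco Diffusion pass (which in reality runs in a notebook and takes minutes per frame), and frames are simplified to flat lists of greyscale pixel values.

```python
# Hypothetical sketch of the frame-by-frame pipeline described above.
# Function names are illustrative assumptions, not Disco Diffusion's
# actual interface.

def refine_with_ai(rough_frame, prompt):
    """Placeholder for the AI 'paint-over' pass. In the real pipeline
    this is where the diffusion model repaints the manually animated
    frame, guided by a simple text prompt."""
    return rough_frame  # stub: returns the frame unchanged

def composite(ai_frame, original_frame, opacity=0.8):
    """Blend the AI-refined frame back over the original footage.
    Frames here are flat lists of greyscale pixel values (0-255)."""
    return [round(opacity * a + (1 - opacity) * o)
            for a, o in zip(ai_frame, original_frame)]

def process_sequence(rough_frames, original_frames, prompt):
    """Run every rough frame through the AI pass, then composite
    each result over the matching original frame."""
    return [
        composite(refine_with_ai(rough, prompt), original)
        for rough, original in zip(rough_frames, original_frames)
    ]

# Tiny usage example: three 4-pixel 'frames'.
rough = [[255, 0, 255, 0]] * 3
original = [[0, 0, 0, 0]] * 3
final = process_sequence(rough, original, "a swirl of colourful particles")
```

The key design point matches what Fusetti describes: the AI pass operates on one frame at a time, so the human-authored rough animation, not the prompt, is what keeps the sequence coherent across frames.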
It sounds incredibly time-consuming as a project. How long were you able to work on it from beginning to end?
This whole process was extremely time-consuming and there was a lot of trial and error, especially because each individual frame took around one to five minutes to render. I considered animating the film at 12fps to effectively halve the amount of rendering required, but I love the fluidity of 24fps so I committed to that. This being a passion project, I had to work on it in between paid work, and it took me about a month and a half of late nights to complete the process, but it was very rewarding in the end.
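The render budget Fusetti weighed here can be made concrete with some back-of-the-envelope arithmetic, assuming a roughly two-minute runtime and the one-to-five-minute per-frame render times he mentions (both figures from the interview; the exact runtime is an assumption):

```python
# Back-of-the-envelope render-time estimate, assuming a ~2 minute film
# and 1-5 minutes of rendering per frame.

RUNTIME_SECONDS = 2 * 60              # approximate length of the film
MIN_PER_FRAME, MAX_PER_FRAME = 1, 5   # render time range, in minutes

def render_hours(fps):
    """Return (best case, worst case) total render time in hours."""
    frames = RUNTIME_SECONDS * fps
    return frames * MIN_PER_FRAME / 60, frames * MAX_PER_FRAME / 60

best_24, worst_24 = render_hours(24)  # 48.0 to 240.0 hours
best_12, worst_12 = render_hours(12)  # 24.0 to 120.0 hours
```

So at 24fps the film needs roughly 2,880 frames, or somewhere between two and ten solid days of rendering; dropping to 12fps really would have halved that, at the cost of the fluidity he wanted to keep.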
There’s a skilled, nuanced approach needed when writing text prompts for the AI to interpret. How much trial and error did it take you to arrive at the generated images which feature in the film?
When working with text-to-image AI, the prompt is absolutely crucial. There are many AI artists who achieve incredible results with their prompts, but after some initial testing I found that I couldn’t get very consistent outputs. That’s when I realised that my personal way to achieve this control was to animate everything manually first and only then feed it to the AI with some pretty simple prompts, in order to enhance what was already there. There was still a lot of trial and error and experimentation; whenever something didn’t work I would have to go back to the animation stage, which probably added more time to the project overall.
Has working on this project changed your perspective on AI-generated imagery?
I don’t think so. I find the debate around AI-generated imagery extremely interesting, but I think it’s really polarised. Most of the opinions I’ve heard seem to consider this tech either a complete revolution or something that will put every artist out of a job. I honestly think this is nothing more than a new tool, a very powerful one for sure, but I don’t think that these tools will dramatically change the way people make art.
Is there anything you can tell us about what you’re working on next?
I’m currently developing some ideas, both with and without AI. They’re still in the early stages, so there’s nothing I can really talk about for now, but I’m very excited for the future and can’t wait to be doing interesting work.