Genie

Reimagining homescreen interfaces using generative AI

Genie

Reimagining homescreen interfaces using generative AI

Type

Master's Graduation Project

Year

2025

Timeline

Jan - Apr (4 Months)

Team Size

Individual + 2 Supervisors

Keywords

Generative AI, Generative UI, Design Futures, Design Fiction, Multi-Agent Systems.

The Context

The app had a cluttered interface, making it difficult for users to navigate and find essential features. Users were facing issues with the onboarding process, which was affecting new user adoption rates. The app lacked personalization and customization options, making it less engaging and user-friendly.

Objectives

The redesigned app features a clean, clutter-free interface, making it easier for users to navigate and access essential features.

The improved onboarding process resulted in a 35% increase in new user adoption rates.

The addition of personalization and customization options enhanced user engagement, leading to a 25% increase in user retention rates.

Is this the best we can do?

For decades, home-screen interfaces have remained fundamentally unchanged: a passive, app-centric model that forces users to adapt to the system.

For over forty years, the way we interact with our digital devices has remained largely the same. We navigate through a static grid of applications, forcing our intentions to conform to a rigid, app-centric model. In fact, our home screens have looked essentially the same since Xerox invented the graphical interface Alto and the System 1 by Apple. Of course, we have come a long way when it comes to what our operating systems could do. However, what hasn’t changed all that much is how we interact with it.

This realization hit me during my master's program at Simon Fraser University, leading to Genie, my exploration of what interfaces could become. This project explores an alternative by reimagining the home-screen as a dynamic, responsive environment.

Identifying the problem

This approach works for many good reasons, which is why we've used it for 40 years. But it assumes users know exactly which apps to use, how to use them, and in what order.

The current paradigm led me to believe that there is a "crisis of imagination," a stagnation in design where we've accepted a passive, fragmented user experience. The process of performing a simple task like checking flight times or creating a calendar event requires multiple manual steps and navigation across different apps, creating unnecessary cognitive load.

It’s also inaccessible for people who aren’t tech-savvy, or who have physical or cognitive disabilities as they find these systems rigid and unforgiving.

And even when the user knows what they’re doing, there are so many options. There are multiple apps that all do the same thing slightly differently. You spend time downloading and comparing tools to find the right one, which is then behind a subscription.

Designing for the future?

So then how do we go beyond incremental improvements and rethink the way we interact with our devices?

Acknowledging this crisis of imagination led me to ask a broader question:

How do we go beyond incremental improvements and rethink the way we interact with our devices?

To tackle that, I needed a methodology that wasn’t just about fixing what’s broken, but one that could help me imagine what should be and what could be. I employed a dual methodological approach, using Design Futures and Design Fiction.

Design Futures merges Design Thinking's problem-solving with Futures Thinking's foresight. The (modified) diagram above by Elliot P. Montgomery shows where these methods lie in this spectrum between the Unconstrained & Imaginary and Constrained & Applied design practices.

In this approach, I looked at what’s currently trending today—AI, LLMs, agentic frameworks and tools and designed for probable futures i.e. scenarios that are likely enabled by near-term technological advancements in other words, what could we realistically build now?

Acknowledging this crisis of imagination led me to ask a broader question:

How do we go beyond incremental improvements and rethink the way we interact with our devices?

To tackle that, I needed a methodology that wasn’t just about fixing what’s broken, but one that could help me imagine what should be and what could be. employed a dual methodological approach, using Design Futures and Design Fiction.

Acknowledging this crisis of imagination led me to ask a broader question:

How do we go beyond incremental improvements and rethink the way we interact with our devices?

But designing only for what’s currently possible wasn’t enough. You cannot imagine the future without thinking further ahead. Drawing inspiration from Dunne and Raby in their book Speculative Everything: Design, Fiction and Social Dreaming, who posit speculative artifacts as tools for critical inquiry and debate, I developed a diegetic prototype video. This approach, which David A. Kirby defines as a prototype that exists within a fictional narrative in his book Lab Coats in Hollywood , allowed for the visualization of Genie's potential in a more technologically mature ecosystem. This approach makes use of the power of 'vision videos,' which are identified as 'very effective' platforms for telling stories and projecting technological implications over time, much like Apple's historical 'Knowledge Navigator' vision

Design × Futures

Design Futures is a fairly new industry practice; it's a hybrid speculative design methodology combining elements of Design Thinking and Futures Thinking. According to Jod Kaftan, Head of Product Design & Research at Oracle, to design for the future (in UX contexts) requires an understanding of future thinking. He explains that the first step in this process is to identify a future's drivers and signals.

Drivers are "tides" or underlying factors in the present that shape the future with deep roots in historic patterns of change. This can be cultural, economic, technological, or environmental but most often they are a combination of these forces. Signals are "waves" or a "hit" when we scan our surroundings for evidence of a "specific example of the future happening today." A signal is often a recent small or local innovation (a new product service, initiative, policy, data, social convention or technology) with the potential to affect how we might live in the future.

My research identified two major drivers shaping the future of computing: the advancement of generative AI and the concept of multi-agent systems. I found my signals in emerging products. Vercel's v0, Lovable and Bolt are for instance, a clear signal that generative AI could create functional user interfaces. Similarly, the development of multi-agent frameworks like AutoGen and LangGraph were signals that AI systems were becoming more capable of complex, collaborative tasks. These observations gave me the confidence to choose generative AI and multi-agent systems as the core technologies for my solution.

Phil Balagtas, Experience Design Director at McKinsey and founder of the Design Futures Initiative emphasizes that the future should not be seen as a straight line but as a cone of possibilities (many "futures"), encompassing probable, plausible, and possible outcomes. Speculative design encourages exploring the outer realm of possibility to ignite imagination and understand the impacts and implications of design responses.

I chose to focus on two: the probable future and the preferred future. The probable future is what is most likely to happen based on current trends. I designed and built my functional proof-of-concept for this future, grounded in the technological constraints of today. The preferred future, by contrast, is the ideal future we wish to create, informed by our values and a vision for what could be. By designing for both, I was able to create a product that not only works today but also serves as a guiding vision for tomorrow.

Identifying the Drivers of Future Interfaces

As part of identifying drivers, I looked into how large language models (LLMs) and multi-agent systems are reshaping software development and interface design. Li et al. (2024) and Liu et al. (2024) showed how LLMs can handle everything from requirements analysis to debugging, shifting effort from coding to orchestration. Udoidiok et al. (2024) explored multi-agent setups where specialized “workers” collaborate under a “supervisor”. MAxPrototyper and PrototypeAgent, developed by Yuan et al. (2024a, 2024b), use coordinated agents to generate text, images, and layouts for coherent UI prototypes.

I also studied the role of voice interaction in next-generation interfaces. Corbett & Weber (2016) and Myers et al. (2018) outlined how early voice systems struggled with rigid commands and poor recovery. With LLMs, those constraints ease and models can interpret flexible phrasing, handle follow-up naturally, and even reveal hidden features mid-conversation. This opens the door for voice to act as a fast, low-friction way to feed intent into a generative UI pipeline.

In short, the takeaway I got from this was that LLMs provide the reasoning, agents provide the structure, and voice lowers the barrier to stating what you need.

Okay, so what now?

Going a step further, I analyzed some tools and devices that have attempted to change how we interact with interfaces. I chose three interesting different products that target different needs to understand the strengths they offer and the limitations they faced. My work was informed by closely analyzing Vercel's v0, Natural by Brain Technologies Inc., and the Rabbit R1 device.

Vercel’s v0 can transform plain-text descriptions into working React components, making it a powerful tool for developers. Yet its focus remains front-end prototyping, not complete end-to-end experiences, and it still assumes a developer in the loop.

Natural by Brain Technologies Inc. takes a more radical stance, abandoning the app grid entirely in favor of dynamically generated interfaces from natural language. But its closed architecture and limited extensibility make it feel less like a platform and more like a specialized search engine.

Rabbit R1 positions itself as an AI-powered personal assistant with a “Generative UI” that adapts to the user. It’s impressive, but its scope is bound to a fixed set of predefined tasks, leaving no room for dynamic, extensible orchestration.

Taken together, these tools hint at something larger, a future where the interface itself is ephemeral, summoned only when needed. But none fully bridges the gap between intent and execution.

Genie explores how Generative AI and multi-agent systems can translate user input into interfaces.

Creating Genie

Genie is a proof-of-concept that reimagines the home screen as a dynamic, responsive environment. Instead of hunting for the right app, users simply state their needs in natural language, and Genie instantly generates a tailored, context-specific interface. This project was a deep dive into how generative AI and multi-agent systems could orchestrate a fluid user experience that is proactive, intuitive, and efficient.

How does it work?

Genie uses a small network of specialized agents coordinated by a Supervisor. The Supervisor parses user intent (via LLMs) and orchestrates Worker agents — for example: Weather, Stocks, Calendar, and a Coder agent that produces React widgets. Orchestration is implemented as visual workflows in n8n, which maps well to the agent metaphor: its node-based editor makes decision logic explicit, enables LangChain-style nodes, and integrates with many LLMs and external APIs. During the prototype the Supervisor used GPT-4o for intent reasoning, a memory buffer maintained recent context, and Worker agents executed API calls and returned structured JSON that the Coder agent translated into components committed to GitHub. This approach let me iterate quickly on agent coordination without building a bespoke orchestration engine.

Squashing early assumptions

The objective was to challenge preconceived notions, refine the project direction, and build a stronger foundation before moving into development.

The project applied my supervisor’s 5 Intelligent Failures exercise, grounded in Research Through Design, to surface and test foundational assumptions. I decomposed the work into five components and deliberately induced small failures using sketches, mockups, and simple code prototypes. Each failure was tested with peers and external observers to gather feedback and iterate quickly.

Early Development

This approach has many good reasons, which is why we've used it for 40 years. It's the least-worst design method. It assumes users know exactly which apps to use, how to use them, and in what order.

The initial design process began with a rough framework diagram that mapped out the essential components of the AI-driven interface system. This diagram provided a visual outline of how different agents would interact with user inputs and generate corresponding outputs. At this stage, the focus was on:

Defining a decision-making “Supervisor” agent responsible for delegating tasks based on user input.

Mapping specialized “Worker” agents (e.g., weather, stock market, calendar) that would handle specific queries. These were chosen based on their high relevance to common user tasks and the relative ease of integrating public APIs for real-time data retrieval.

Outlining how generated content would be transformed into visual components for display.

Preprototyping

Before writing production code I built two parallel, low-cost artifacts

(1) tldraw sketches that explored layout and interaction states (how a generated UI collapses, expands, or hands control back to the user)

(2) n8n workflow prototypes that encoded voice → action sequences.

The tldraw frames validated layout affordances and interaction hand-offs; the n8n prototypes validated orchestration: intent parsing, worker sequencing, and the structured JSON the frontend would consume. These artifacts reduced risk before investing in the full stack.

Development Phase

Genie is built as a hybrid prototype that blends Generative AI with a multi-agent orchestration layer to create dynamic, task-specific interfaces. At its core, the system takes a natural language request, parses it into structured intent, delegates subtasks to specialized agents, and then returns a fully functional interface assembled in real time. This process is coordinated through n8n, a visual workflow automation tool. By using n8n as the orchestration layer, I could design, debug, and refine agent flows visually without investing in a custom backend from scratch.

How does it work?

Frontend Architecture

The interface is rendered in React, using GridStack.js to manage layout, resizing, and drag-and-drop interactions. The renderer interprets the JSON specification coming from the orchestration layer and maps each component definition to a corresponding widget. GridStack’s persistence features, paired with a Supabase backend, allow widget positions and configurations to be stored between sessions. This means that even though Genie dynamically generates new widgets on demand, users retain control over how those widgets live on their screen.

Backend and Multi-Agent Architecture

The multi-agent architecture in Genie is expressed entirely within n8n. The Supervisor workflow receives the parsed intent and decides which Workers to activate. For example, a travel-related request might trigger Calendar, Flight Search, and Weather Workers in sequence, with their results fed into the Coder Agent. The Coder Agent, an AI model prompted to produce React components from structured data, transforms these outputs into functional UI code.

During prototyping, GPT-4o served as the reasoning model for the Supervisor, while o3-mini was used for code generation due to its speed and efficiency. n8n’s LangChain-style nodes made it possible to structure these calls and chain results between agents. Context was preserved between turns via a lightweight memory buffer in the Supervisor flow, enabling multi-step interactions without losing track of prior actions.

Explaining what each agent does.

Results and Limitations

Design Fiction

So what could be next? While our core interface concepts haven't shifted much, another area of technology has exploded: Generative Artificial Intelligence. We've seen the rise of incredibly powerful Large Language Models – systems like ChatGPT, Gemini, Claude, and others. These systems go beyond chat—they can plan, delegate, and collaborate across tasks. We’re already seeing hints of this potential manifesting in the real world. Tools like Vercel's v0 can generate functional web UI components directly from text descriptions without needing to know how to code. My work was informed by a detailed analysis of these tools and their limitations.