Generative AI models, such as DALL-E or Midjourney, learn to create images by analyzing billions of existing image-text pairs. However, these datasets are not neutral; they are predominantly composed of Western media. As a result, when a user prompts for a generic concept like a 'beautiful garden,' the AI reverts to its statistical mean—often a manicured English or French garden—ignoring other cultural styles like Mughal or Zen gardens. This phenomenon reinforces specific cultural norms as 'universal' or 'natural' while rendering others invisible. In STEM, understanding the composition of training data is crucial because it determines the output's accuracy and cultural relevance.
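The skew described above can be made concrete with a toy audit of caption text. This is a minimal sketch under stated assumptions: the five captions and the style keyword set are invented for illustration, not drawn from a real dataset, though a genuine audit would apply the same counting logic to billions of image-text pairs.

```python
from collections import Counter

# Hypothetical mini-corpus of image captions (invented for illustration;
# a real audit would scan a large-scale image-text dataset).
captions = [
    "a beautiful english garden with manicured hedges",
    "a formal french garden at versailles",
    "an english cottage garden in summer",
    "a zen garden with raked gravel",
    "an english country garden with roses",
]

# Assumed toy taxonomy of garden styles, not a real labeling scheme.
styles = {"english", "french", "zen", "mughal"}

# Tally how often each style keyword appears across all captions.
counts = Counter(
    word for caption in captions for word in caption.split() if word in styles
)

# The most frequent style is what the model will treat as the "default"
# garden; styles with zero examples (here, "mughal") cannot be learned.
print(counts.most_common())
```

Because "english" dominates this toy corpus while "mughal" never appears, a model trained on it would render a generic 'garden' prompt as English by default, which is the statistical-mean behavior the paragraph describes.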
Australian research institutions and government bodies actively study data ethics and the impact of algorithmic bias on society.
Computer vision and AI image generation are deeply rooted in mathematical patterns. Western art since the Renaissance has prioritized 'linear perspective,' in which parallel lines converge at a single vanishing point, simulating a viewer standing in one fixed spot. This is also how cameras (and thus most training data) record the world. In contrast, other traditions, such as Indian miniature painting, use 'floating' or multi-point perspectives. These allow a viewer to see a scene from above and from the side simultaneously, or to see the same character in multiple places within one frame to show the passage of time. Current AI struggles to replicate these non-Western spatial geometries because its statistical model of space is dominated by the linear-perspective patterns in its camera-derived training data.
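The geometry of linear perspective can be sketched in a few lines. This is a simplified pinhole-camera model (the function name and sample points are illustrative assumptions): a 3D point (x, y, z) projects to (f·x/z, f·y/z), so two parallel rails receding in depth converge toward a single vanishing point on the image plane.

```python
def project(x, y, z, f=1.0):
    """Pinhole (linear-perspective) projection onto an image plane
    at focal length f: the farther a point, the closer to center."""
    return (f * x / z, f * y / z)

# Two parallel rails at x = -1 and x = +1, sampled at increasing depth.
depths = [1, 10, 100, 1000]
left = [project(-1.0, 0.0, z) for z in depths]
right = [project(1.0, 0.0, z) for z in depths]

# As depth grows, both rails approach the same vanishing point (0, 0),
# even though the rails themselves never meet in 3D space.
print(left)
print(right)
```

A floating or multi-point perspective has no single function like this: different regions of the same picture obey different projections, which is precisely what a model trained on camera imagery has few statistical patterns for.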
To understand different visual systems and indigenous mapping, Australian cultural institutions provide resources on non-Western art history and First Nations knowledge systems.
Modern AI image generators rely on 'diffusion models,' systems that learn to reverse a noising process, gradually turning random noise into a coherent image guided by a text prompt. These systems are often called 'black boxes' because their internal decision-making processes are opaque, proprietary, and difficult to audit. Furthermore, these models depend on the pairing of text and image, creating a constraint in which the 'visible is enslaved to the sayable.' If a visual concept (like the multisensory experience of a garden's sound and temperature) cannot be easily described in words or captured in a dataset, the AI cannot generate it. This limits AI's ability to capture the 'ineffable' or untranslatable aspects of human experience.
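The noise-to-image loop and the text constraint can both be sketched in a 1-D toy. This is an assumption-laden stand-in, not a real model: the "denoiser" is a hand-written nudge toward a target value, and `PROMPT_TARGETS` is an invented toy vocabulary standing in for a learned text encoder.

```python
import random

# Assumed toy "text embedding": each known prompt maps to a target value.
# Real diffusion models use a learned text encoder instead.
PROMPT_TARGETS = {"garden": 0.8, "ocean": -0.5}

def denoise_step(x, prompt, strength=0.1):
    """One refinement step: nudge the sample toward the concept
    the prompt describes (stand-in for a learned denoising network)."""
    target = PROMPT_TARGETS[prompt]
    return x + strength * (target - x)

random.seed(0)
x = random.gauss(0.0, 1.0)   # start from pure noise
for _ in range(100):         # iterative refinement, as in diffusion sampling
    x = denoise_step(x, "garden")

print(round(x, 3))  # converges near the "garden" target

# The 'sayable' constraint in miniature: a concept with no entry in the
# text vocabulary cannot steer generation at all.
# denoise_step(x, "the warmth of afternoon shade")  # would raise KeyError
```

The loop converges to whatever the text pathway can name; anything outside that vocabulary simply has no handle on the output, which is the constraint the paragraph describes.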
Several Australian organizations are dedicated to the responsible development of machine learning and the regulation of digital technologies.