KAIKAI

Singapore’s public sector is examining how AI agents could support government services through a joint sandbox initiative with Google. The four‑month project involved the Cyber Security Agency of Singapore (CSA), the Government Technology Agency of Singapore (GovTech) and the Infocomm Media Development Authority (IMDA). As outlined in the official report on the AI Agents Sandbox, the initiative tested how autonomous systems behave in practical government scenarios while identifying governance, security and operational considerations for future deployment.

The sandbox focused on computer-use agents: systems that interact with digital interfaces much as human users do. The objective was to explore both their practical benefits and the risks that come with greater autonomy. The work builds on Singapore’s broader efforts to develop responsible AI governance, including its push to advance global standards for generative AI testing and benchmarking.

Testing AI Agents Across Government Use Cases

To generate practical insights, participants tested AI agents in three government-related scenarios with varying levels of operational risk.

Automated Quality Assurance for Digital Services

One trial examined whether AI agents could automate aspects of quality assurance (QA) testing for government digital services. The system evaluated websites by assessing response times, verifying search functions and checking page integrity.

The agent was also able to identify intentionally inserted issues such as inactive pages, placeholder text and mismatched staging URLs. Unlike conventional scripted testing tools, the agent used natural language understanding and contextual interpretation, demonstrating how agentic systems may improve efficiency in software testing and maintenance.
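
For comparison, the kinds of issues listed above can be illustrated with a conventional scripted check, the baseline the agent was contrasted with. This is a minimal sketch and not the sandbox's method: the function name `check_page`, the placeholder patterns, the staging hostname hints and the two-second threshold are all assumptions made for illustration, since the report does not specify them.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical markers; the report does not list the exact strings used.
PLACEHOLDER_PATTERNS = [r"lorem ipsum", r"\bTODO\b"]
STAGING_HINTS = ("staging", "uat", "dev")


class LinkCollector(HTMLParser):
    """Collects href values from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def check_page(html, elapsed_seconds, max_seconds=2.0):
    """Return a list of human-readable findings for one page."""
    findings = []
    # Response time is measured by the caller and passed in.
    if elapsed_seconds > max_seconds:
        findings.append(f"slow response: {elapsed_seconds:.2f}s")
    # Placeholder text is matched against the raw markup, case-insensitively.
    for pattern in PLACEHOLDER_PATTERNS:
        if re.search(pattern, html, flags=re.IGNORECASE):
            findings.append(f"placeholder text matches {pattern!r}")
    # A link is flagged when any label of its hostname is a staging hint.
    collector = LinkCollector()
    collector.feed(html)
    for link in collector.links:
        host = urlparse(link).hostname or ""
        if any(hint in host.split(".") for hint in STAGING_HINTS):
            findings.append(f"staging URL: {link}")
    return findings
```

A script like this only catches the patterns it was given in advance. The report's point is that an agent interpreting the page in context may notice issues that fixed rules of this kind would miss.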

Automating AI Safety Testing

A second use case explored how AI agents could support safety testing for AI applications, including chatbots. Government agencies must ensure such systems meet defined safety requirements before deployment.

The sandbox demonstrated that agents can perform large-scale testing across multiple languages and formats, significantly reducing the manual effort required for assessment. While the approach is not entirely error-free, it may provide a more scalable way to strengthen AI assurance as adoption grows. This aligns with broader regulatory attention to AI risk management, such as Singapore’s work to mitigate AI risks across the financial sector.

Supporting Social Assistance Applications

The third scenario examined how AI agents could help individuals navigate complex social assistance application processes. In the trial, agents guided applicants or social workers through programme requirements and submission steps.

The results suggest that agentic systems could reduce the administrative burden created by incomplete submissions, follow-up requests and helpline enquiries. Such capabilities reflect growing interest in agent-based automation across public services, a trend also noted in discussions of how agentic automation is transforming Singapore’s public sector.

Key Risk Areas Identified in the Sandbox

While the trials demonstrated potential productivity gains, the sandbox also revealed several risk areas that require careful management as AI agents become more capable and autonomous.

Human oversight: Maintaining accountability and ensuring appropriate supervision when AI agents perform actions that may affect individuals.
Customisation and control: Allowing flexibility for experimentation while ensuring safeguards remain in place.
Cybersecurity risks: Including indirect prompt injection, where agents may be manipulated into executing unintended commands or code.
Data protection and privacy: Managing risks when agents interact with sensitive or personal data.

Practical Considerations for Deploying AI Agents

The sandbox identified several near-term considerations for organisations evaluating the use of AI agents.

Controlled testing environments and gradual real-world deployment were highlighted as effective ways to build trust in agentic technologies. Oversight mechanisms should also be calibrated according to risk: high‑impact actions may require prior approval, while lower‑risk activities may be reviewed retrospectively if appropriate safeguards exist.
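
The risk-calibrated oversight described above can be sketched as a small gate that sits between an agent and the actions it proposes. This is an illustrative assumption rather than anything specified in the report: the two-tier `Risk` enum, the `OversightGate` class and the audit log format are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Risk(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass
class Action:
    name: str
    risk: Risk


@dataclass
class OversightGate:
    """Routes agent actions by risk tier (the tiers are an assumption)."""

    audit_log: list = field(default_factory=list)

    def execute(self, action, run, approved_by=None):
        if action.risk is Risk.HIGH and approved_by is None:
            # High-impact actions are blocked until a human signs off.
            self.audit_log.append((action.name, "blocked: awaiting approval"))
            return None
        result = run()
        # Anything that ran without a named approver is logged so it
        # can be reviewed retrospectively.
        if approved_by:
            status = f"approved by {approved_by}"
        else:
            status = "executed: review later"
        self.audit_log.append((action.name, status))
        return result
```

In this sketch a high-risk action such as submitting an application returns `None` until `approved_by` is supplied, while a low-risk action runs immediately and leaves an audit entry for later review. A real deployment would likely need more than two tiers and a durable log.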

Security responsibilities must be shared across multiple layers, including the underlying model or platform, organisational systems and end users. At the same time, systems should be secure by default while allowing bounded flexibility for experimentation and adaptation.

Future Governance and Technical Challenges

The initiative also raised longer-term questions about the infrastructure and governance frameworks required to support an agent-driven digital environment.

Participants highlighted current technical limitations, such as reliance on screenshot-based perception for interpreting interfaces. Additional techniques may be required to improve accuracy in environments containing dense or complex information.

The potential for multi-agent collaboration was also discussed. In such models, several agents may review or refine each other’s outputs. While this could enhance capabilities, it also introduces interoperability and governance questions, particularly if agents developed by different organisations need to interact through common protocols.

More broadly, today’s digital infrastructure is largely designed around human users. Identity systems, authentication mechanisms and access controls may need to evolve if autonomous agents are to operate reliably at scale. Policymakers will also need to balance the benefits of personalisation with privacy protections as agents gain access to more contextual user data.

The sandbox provides an early foundation for understanding how AI agents could be deployed responsibly in public services. Continued collaboration between government, technology providers and the research community will be important as both the capabilities and governance requirements of agentic AI continue to evolve.

This article was created with the assistance of OpenGov AI.