Popular generative AI web browser assistants are collecting and sharing sensitive user data, such as medical records and social security numbers, without adequate safeguards, finds a new study led by researchers from UCL and Mediterranea University of Reggio Calabria.
The study, which will be presented and published as part of the USENIX Security Symposium, is the first large-scale analysis of the privacy practices of generative AI browser assistants. It uncovered widespread tracking, profiling, and personalisation practices that raise serious privacy concerns, with the authors calling for greater transparency and user control over data collection and sharing.
The researchers analysed nine of the most popular generative AI browser extensions, including ChatGPT for Google, Merlin, and Copilot (not to be confused with the Microsoft app of the same name). These tools, which must be downloaded and installed before use, are designed to enhance web browsing with AI-powered features like summarisation and search assistance, but were found to collect extensive personal data from users’ web activity.
Analysis revealed that several assistants transmitted full webpage content – including any information visible on screen – to their servers. One assistant, Merlin, even captured form inputs such as online banking details or health data.
Extensions like Sider and TinaMind shared users’ questions, along with information that could identify them (such as their IP address), with platforms like Google Analytics, enabling potential cross-site tracking and ad targeting.
ChatGPT for Google, Copilot, Monica, and Sider demonstrated the ability to infer user attributes such as age, gender, income, and interests, and used this information to personalise responses, even across different browsing sessions.
Only one assistant, Perplexity, did not show any evidence of profiling or personalisation.
Dr. Anna Maria Mandalari, senior author of the study from UCL Electronic & Electrical Engineering, said: “Though many people are aware that search engines and social media platforms collect information about them for targeted advertising, these AI browser assistants operate with unprecedented access to users’ online behaviour in areas of their online life that should remain private. While they offer convenience, our findings show they often do so at the cost of user privacy, without transparency or consent and sometimes in breach of privacy legislation or the company’s own terms of service.
“This data collection and sharing is not trivial. Besides the selling or sharing of data with third parties, in a world where massive data hacks are frequent, there’s no way of knowing what’s happening with your browsing data once it has been gathered.”
For the study, the researchers simulated real-world browsing scenarios by creating the persona of a ‘rich, millennial male from California’, which they used to interact with the browser assistants while completing common online tasks.
This included activities in the public (logged-out) space, such as reading online news, shopping on Amazon or watching YouTube videos.
It also included activities in the private (logged-in) space, such as accessing a university health portal, logging into a dating service or accessing pornography. The researchers assumed that users would not want this activity to be tracked, given how personal and sensitive the data is.
During the simulation, the researchers intercepted and decrypted traffic between the browser assistants, their servers and third-party trackers, allowing them to analyse what data was flowing in and out in real time. They also tested whether assistants could infer and remember user characteristics based on browsing behaviour, by asking them to summarise webpages and then posing follow-up questions – such as ‘what was the purpose of the current medical visit?’ after accessing an online health portal – to see whether they had retained personal data.
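To picture the interception step, the minimal sketch below uses the open-source mitmproxy tool to inspect decrypted requests leaving the browser and flag any whose body contains sensitive-looking values. It is an illustrative assumption, not the study’s actual instrumentation; the marker list and log format are invented for the example.

```python
# Minimal sketch (assumed, not the study's actual tooling): a mitmproxy
# addon that inspects decrypted outbound requests and flags any whose
# body contains sensitive-looking markers.
from mitmproxy import http

# Illustrative markers only; the study's real detection criteria differ
SENSITIVE_MARKERS = ["ssn", "diagnosis", "account_number"]

def request(flow: http.HTTPFlow) -> None:
    """Called by mitmproxy for every intercepted request."""
    body = (flow.request.get_text() or "").lower()
    hits = [m for m in SENSITIVE_MARKERS if m in body]
    if hits:
        # Record which server received the sensitive-looking payload
        print(f"[FLAG] {flow.request.pretty_host} received markers: {hits}")
```

In a setup like this, the browser would be configured to route traffic through the proxy and to trust mitmproxy’s CA certificate, then the script run with, for example, `mitmdump -s flag_sensitive.py`.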
The experiments revealed that some assistants, including Merlin and Sider, continued recording activity after the user switched to the private space, when they are meant to stop.
The authors say the study highlights the urgent need for regulatory oversight of AI browser assistants in order to protect users’ personal data. Some assistants were found to violate US data protection laws such as the Health Insurance Portability and Accountability Act (HIPAA) and the Family Educational Rights and Privacy Act (FERPA) by collecting protected health and educational information.
The study was conducted in the US, so compliance with UK/EU data laws such as GDPR was not assessed, but the authors say the practices observed would likely violate those laws too, given that privacy regulations in the UK and EU are more stringent.
The authors recommend that developers adopt privacy-by-design principles, such as local processing or explicit user consent for data collection.
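One way to picture the local-processing recommendation is data minimisation at the source: scrubbing obviously sensitive values from page text before anything leaves the device. The sketch below is purely illustrative; the patterns and the redact_locally function are assumptions, not code from any assistant studied.

```python
# Minimal sketch of local redaction before transmission (illustrative;
# patterns and function name are assumptions, not a studied assistant's code).
import re

PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US social security number format
    re.compile(r"\b(?:\d[ -]?){15}\d\b"),  # 16-digit payment card numbers
]

def redact_locally(page_text: str) -> str:
    """Strip sensitive-looking values so they never reach a remote server."""
    for pattern in PATTERNS:
        page_text = pattern.sub("[REDACTED]", page_text)
    return page_text

# Example: the card number is removed before any upload
print(redact_locally("Card: 4111 1111 1111 1111, visit notes follow."))
```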
Dr. Aurelio Canino, an author of the study from UCL Electronic & Electrical Engineering and Mediterranea University of Reggio Calabria, said: “As generative AI becomes more embedded in our digital lives, we must ensure that privacy is not sacrificed for convenience. Our work lays the foundation for future regulation and transparency in this rapidly evolving space.”