Google’s Rush to Take Its AI Chatbot Bard Public Led to Ethical Lapses, Employees Say
Shortly before Google introduced Bard, its AI chatbot, to the public in March, it asked employees to test the tool.
One worker’s conclusion: Bard was “a pathological liar,” according to screenshots of the internal discussion. Another called it “cringe-worthy.” One employee wrote that when they asked Bard suggestions for how to land a plane, it regularly gave advice that would lead to a crash; another said it gave answers on scuba diving “which would likely result in serious injury or death.”
Google launched Bard anyway. The trusted internet-search giant is providing low-quality information in a race to keep up with the competition, while giving less priority to its ethical commitments, according to 18 current and former workers at the company and internal documentation reviewed by Bloomberg. The Alphabet-owned company had pledged in 2021 to double its team studying the ethics of artificial intelligence and to pour more resources into assessing the technology’s potential harms. But the November 2022 debut of rival OpenAI’s popular chatbot sent Google scrambling to weave generative AI into all its most important products in a matter of months.
That was a markedly faster pace of development for the technology, and one that could have profound societal impact. The group working on ethics that Google pledged to fortify is now disempowered and demoralized, the current and former workers said. The staffers who are responsible for the safety and ethical implications of new products have been told not to get in the way or to try to kill any of the generative AI tools in development, they said.
Google is aiming to revitalize its maturing search business around the cutting-edge technology, which could put generative AI into millions of phones and homes around the world — ideally before OpenAI, with the backing of Microsoft, beats the company to it.
“AI ethics has taken a back seat,” said Meredith Whittaker, president of the Signal Foundation, which supports private messaging, and a former Google manager. “If ethics aren’t positioned to take precedence over profit and growth, they will not ultimately work.”
In response to questions from Bloomberg, Google said responsible AI remains a top priority at the company. “We are continuing to invest in the teams that work on applying our AI Principles to our technology,” said Brian Gabriel, a spokesperson. The team working on responsible AI shed at least three members in a January round of layoffs at the company, including the head of governance and programs. The cuts affected about 12,000 workers at Google and its parent company.
Google, which over the years spearheaded much of the research underpinning today’s AI advancements, had not yet integrated a consumer-friendly version of generative AI into its products by the time ChatGPT launched. The company was cautious of its power and the ethical considerations that would go hand-in-hand with embedding the technology into search and other marquee products, the employees said.
By December, senior leadership decreed a competitive “code red” and changed its appetite for risk. Google’s leaders decided that as long as it called new products “experiments,” the public might forgive their shortcomings, the employees said. Still, it needed to get its ethics teams on board. That month, the AI governance lead, Jen Gennai, convened a meeting of the responsible innovation group, which is charged with upholding the company’s AI principles.
Gennai suggested that some compromises might be necessary in order to pick up the pace of product releases. The company assigns scores to its products in several important categories, meant to measure their readiness for release to the public. In some, like child safety, engineers still need to clear the 100 percent threshold. But Google may not have time to wait for perfection in other areas, she advised in the meeting. “‘Fairness’ may not be, we have to get to 99 percent,” Gennai said, referring to its term for reducing bias in products. “On ‘fairness,’ we might be at 80, 85 percent, or something” to be enough for a product launch, she added.
In February, one employee raised issues in an internal message group: “Bard is worse than useless: please do not launch.” The note was viewed by nearly 7,000 people, many of whom agreed that the AI tool’s answers were contradictory or even egregiously wrong on simple factual queries.
The next month, Gennai overruled a risk evaluation submitted by members of her team stating Bard was not ready because it could cause harm, according to people familiar with the matter. Shortly after, Bard was opened up to the public — with the company calling it an “experiment”.
In a statement, Gennai said it wasn’t solely her decision. After the team’s evaluation she said she “added to the list of potential risks from the reviewers and escalated the resulting analysis” to a group of senior leaders in product, research and business. That group then “determined it was appropriate to move forward for a limited experimental launch with continuing pre-training, enhanced guardrails, and appropriate disclaimers,” she said.
Silicon Valley as a whole is still wrestling with how to reconcile competitive pressures with safety. Researchers building AI outnumber those focused on safety by a 30-to-1 ratio, the Center for Humane Technology said at a recent presentation, underscoring the often lonely experience of voicing concerns in a large organization.
As progress in artificial intelligence accelerates, new concerns about its societal effects have emerged. Large language models, the technologies that underpin ChatGPT and Bard, ingest enormous volumes of digital text from news articles, social media posts and other internet sources, and then use that written material to train software that predicts and generates content on its own when given a prompt or query. That means that by their very nature, the products risk regurgitating offensive, harmful or inaccurate speech.
But ChatGPT’s remarkable debut meant that by early this year, there was no turning back. In February, Google began a blitz of generative AI product announcements, touting chatbot Bard, and then the company’s video service YouTube, which said creators would soon be able to virtually swap outfits in videos or create “fantastical film settings” using generative AI. Two weeks later, Google announced new AI features for Google Cloud, showing how users of Docs and Slides will be able to, for instance, create presentations and sales-training documents, or draft emails. On the same day, the company announced that it would be weaving generative AI into its health-care offerings. Employees say they’re concerned that the speed of development is not allowing enough time to study potential harms.
The challenge of developing cutting-edge artificial intelligence in an ethical manner has long spurred internal debate. The company has faced high-profile blunders over the past few years, including an embarrassing incident in 2015 when its Photos service mistakenly labeled images of a Black software developer and his friend as “gorillas.”
Three years later, the company said it did not fix the underlying AI technology, but instead erased all results for the search terms “gorilla,” “chimp,” and “monkey,” a solution that it says “a diverse group of experts” weighed in on. The company also built up an ethical AI unit tasked with carrying out proactive work to make AI fairer for its users.
But a significant turning point, according to more than a dozen current and former employees, was the ousting of AI researchers Timnit Gebru and Margaret Mitchell, who co-led Google’s ethical AI team until they were pushed out in December 2020 and February 2021 over a dispute regarding fairness in the company’s AI research. Samy Bengio, a computer scientist who oversaw Gebru and Mitchell’s work, and several other researchers would end up leaving for competitors in the intervening years.
After the scandal, Google tried to improve its public reputation. The responsible AI team was reorganized under Marian Croak, then a vice president of engineering. She pledged to double the size of the AI ethics team and strengthen the group’s ties with the rest of the company.
Even after the public pronouncements, some found it difficult to work on ethical AI at Google. One former employee said they asked to work on fairness in machine learning and they were routinely discouraged — to the point that it affected their performance review. Managers protested that it was getting in the way of their “real work,” the person said.
Those who remained working on ethical AI at Google were left questioning how to do the work without putting their own jobs at risk. “It was a scary time,” said Nyalleng Moorosi, a former researcher at the company who is now a senior researcher at the Distributed AI Research Institute, founded by Gebru. Doing ethical AI work means “you were literally hired to say, I don’t think this is population-ready,” she added. “And so you are slowing down the process.”
To this day, AI ethics reviews of products and features, two employees said, are almost entirely voluntary at the company, with the exception of research papers and the review process conducted by Google Cloud on customer deals and products for release. AI research in delicate areas like biometrics, identity features, or kids are given a mandatory “sensitive topics” review by Gennai’s team, but other projects do not necessarily receive ethics reviews, though some employees reach out to the ethical AI team even when not required.
Still, when employees on Google’s product and engineering teams look for a reason the company has been slow to market on AI, the public commitment to ethics tends to come up. Some in the company believed new tech should be in the hands of the public as soon as possible, in order to make it better faster with feedback.
Before the code red, it could be hard for Google engineers to get their hands on the company’s most advanced AI models at all, another former employee said. Engineers would often start brainstorming by playing around with other companies’ generative AI models to explore the possibilities of the technology before figuring out a way to make it happen within the bureaucracy, the former employee said.
“I definitely see some positive changes coming out of ‘code red’ and OpenAI pushing Google’s buttons,” said Gaurav Nemade, a former Google product manager who worked on its chatbot efforts until 2020. “Can they actually be the leaders and challenge OpenAI at their own game?” Recent developments — like Samsung reportedly considering replacing Google with Microsoft’s Bing, whose tech is powered by ChatGPT, as the search engine on its devices — have underscored the first-mover advantage in the market right now.
Some at the company said they believe that Google has conducted sufficient safety checks with its new generative AI products, and that Bard is safer than competing chatbots. But now that the priority is releasing generative AI products above all, ethics employees said it’s become futile to speak up.
Teams working on the new AI features have been siloed, making it hard for rank-and-file Googlers to see the full picture of what the company is working on. Company mailing lists and internal channels that were once places where employees could openly voice their doubts have been curtailed with community guidelines under the pretext of reducing toxicity; several employees said they viewed the restrictions as a way of policing speech.
“There is a great amount of frustration, a great amount of this sense of like, what are we even doing?” Mitchell said. “Even if there aren’t firm directives at Google to stop doing ethical work, the atmosphere is one where people who are doing the kind of work feel really unsupported and ultimately will probably do less good work because of it.”
When Google’s management does grapple with ethics concerns publicly, they tend to speak about hypothetical future scenarios about an all-powerful technology that cannot be controlled by human beings — a stance that has been critiqued by some in the field as a form of marketing — rather than the day-to-day scenarios that already have the potential to be harmful.
El-Mahdi El-Mhamdi, a former research scientist at Google, said he left the company in February over its refusal to engage with ethical AI issues head-on. Late last year, he said, he co-authored a paper that showed it was mathematically impossible for foundational AI models to be large, robust and remain privacy-preserving.
He said the company raised questions about his participation in the research while using his corporate affiliation. Rather than go through the process of defending his work, he said he volunteered to drop the affiliation with Google and use his academic credentials instead.
“If you want to stay on at Google, you have to serve the system and not contradict it,” El-Mhamdi said.
© 2023 Bloomberg LP