10 Most-Read Blog Posts of 2024
This year’s most-read blog posts focus on the impact of AI across fields like healthcare, law, and democracy, underscoring both AI’s potential and its challenges. As detailed in the AI Index, AI development is shifting toward open-source models and deeper integration into industry, with investment surging, particularly in generative AI. Yet as studies on legal, medical, and mental health applications reveal, these models often produce errors, “hallucinate” false information, and raise concerns about privacy, transparency, and safety. In their work, our scholars also called for stronger oversight, ethical frameworks, rigorous evaluation, and a human-centered approach to AI’s expansion.
The AI Index, a comprehensive report from Stanford HAI, tracks significant global trends in AI. This story offered up the highlights for readers: The year saw a shift toward open-source models, surging investments in generative AI, and increasing AI regulation. A record 149 foundation models were released this past year, 66% of them open-source (though closed-source models still outperform on benchmarks). The U.S. led in model development and private investment, with $67.2 billion in AI funding, far outpacing other countries. AI has reached or surpassed human-level performance on many benchmarks, driving businesses to adopt AI tools for automation and personalization. Despite these advances, concerns about job security, AI product safety, and the need for regulatory measures are on the rise, with younger and more educated demographics particularly attuned to AI’s impact on employment.
The rise of general-purpose AI, particularly large language models (LLMs), brings serious privacy concerns: How is our personal data being used and protected? Potential for misuse runs the gamut from web data scraped for training to AI-driven threats like voice cloning and identity theft. To address these risks, Stanford HAI’s Jennifer King and Caroline Meinhardt argue in their white paper, “Rethinking Privacy in the AI Era,” that stronger regulatory frameworks are essential. They advocate for a shift to opt-in data sharing, a supply chain approach to data privacy, and collective solutions like data intermediaries to empower users in an era dominated by AI and vast data collection.
The emergence of LLMs like ChatGPT and PaLM is transforming the legal field, but it carries serious risks, especially “hallucinations” that generate inaccurate legal information. A recent study by Stanford’s RegLab and Stanford HAI reveals that LLMs frequently produce false or misleading responses to legal queries, with error rates ranging from 69% to 88% on key tasks. These errors are particularly common in complex or localized legal matters, where LLMs tend to misinterpret case precedents, misattribute authorship, and respond with overconfidence to flawed premises.
While LLMs hold potential for democratizing access to legal information, their current limitations pose risks, particularly for users who most need accurate and nuanced legal support. The findings suggest that AI tools in the legal domain require careful, supervised integration to ensure they complement, rather than undermine, human judgment and legal diversity.
Nearly three-quarters of lawyers plan to use generative AI for tasks like contract drafting, document review, and legal research. However, reliability is a concern: These tools are known to “hallucinate” or generate false information. This study tested the claims of AI-powered legal research tools from LexisNexis and Thomson Reuters, finding that although these tools reduced errors compared to general models, they still hallucinated up to 34% of the time. The study highlights issues in the AI-assisted legal research process, such as inaccurate citations and “sycophancy,” where AI tools agree with false user assumptions. The findings underscore the need for transparency and rigorous benchmarking in legal AI products, as current opacity around these tools’ design and performance makes it difficult for lawyers to evaluate their reliability and comply with ethical obligations.
Large language models are rapidly making their way into healthcare, with one in ten doctors using ChatGPT for everyday tasks and some patients turning to AI for self-diagnosis. Despite the enthusiasm, a recent Stanford study highlights significant challenges with LLMs’ reliability in healthcare, particularly around substantiating medical information. Researchers found that even the most advanced LLMs frequently hallucinate unsupported claims or cite irrelevant sources, with GPT-4 using retrieval-augmented generation producing unsupported statements up to 30% of the time. These issues are more pronounced for lay inquiries, such as those found on Reddit’s r/AskDocs, suggesting that patients seeking information without a doctor’s mediation may be misled. As AI tools become increasingly common in healthcare, experts urge more rigorous evaluation and regulation to ensure these systems provide reliable, evidence-based information.
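To make the idea of “unsupported claims” concrete: verifying substantiation means asking whether the source a model cites actually backs up the statement it makes. The study relied on careful model-assisted and expert review for this; the sketch below is only a crude lexical proxy, and the `content_words` and `support_score` helpers, the stopword list, and the example texts are illustrative assumptions, not the researchers’ method.

```python
# Crude lexical proxy for "does the cited source support this statement?"
# The Stanford study used model-assisted and expert review, not this heuristic;
# the helper functions, stopword list, and examples are illustrative assumptions.
import re

def content_words(text):
    stopwords = {"the", "a", "an", "of", "and", "or", "to", "in", "is", "are",
                 "that", "this", "with", "for", "on", "as", "be", "by", "it"}
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords}

def support_score(statement, source):
    """Fraction of the statement's content words that also appear in the source."""
    claim, evidence = content_words(statement), content_words(source)
    return len(claim & evidence) / max(len(claim), 1)

statement = "Ibuprofen is safe to combine with this blood thinner."
source = "Ibuprofen may increase bleeding risk when taken with anticoagulants."
print(f"support score: {support_score(statement, source):.2f}")  # low score -> flag for review
```

A low score here only signals that the claim and its citation barely overlap; in practice, substantiation checks need semantic comparison and human judgment, which is exactly the gap the study points to.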
During her time at Google, Stanford computer scientist Fei-Fei Li saw firsthand how AI was transforming industries from agriculture to energy. Inspired, she returned to Stanford with a vision: to make AI serve humanity ethically. This led to the founding of Stanford HAI, now five years into its mission of shaping ethical AI. Through interdisciplinary research, industry collaborations, and active policy engagement, HAI has become a leading voice in responsible AI development, investing over $40 million in research projects spanning healthcare, refugee assistance, and sustainable mining while educating the next generation of AI leaders and policymakers.
Stanford’s James Zou and his team investigated the growing use of LLMs in academic writing and peer reviews, revealing that nearly 18% of computer science papers and 17% of peer reviews include AI-generated content. Through linguistic analysis and expert verification, they identified certain “AI-associated” words that surged in usage following the release of ChatGPT. This rapid adoption, particularly in AI and computer science fields, underscores both the potential benefits and ethical challenges of LLMs in research. Zou argues for more transparency in LLM usage, noting that while AI can enhance clarity and efficiency, researchers must remain accountable for their work to maintain integrity in the scientific process.
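The study’s own estimation method is more sophisticated than a simple before-and-after word count, but the underlying signal can be illustrated with one: compare how often each word appears in a pre-ChatGPT corpus versus a post-ChatGPT corpus and flag the words whose relative frequency jumped. In the sketch below, the `word_freqs` and `surging_words` functions, the 5x ratio threshold, and the toy corpora are all assumptions made for illustration.

```python
# Simplified illustration: flag words whose relative frequency surged after
# ChatGPT's release. The actual study fits a statistical model over word
# frequencies; this sketch only computes per-word frequency ratios.
# Function names, thresholds, and toy corpora are illustrative assumptions.
from collections import Counter
import re

def word_freqs(texts):
    """Return each word's share of all word occurrences in the corpus."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    total = sum(counts.values()) or 1
    return {w: c / total for w, c in counts.items()}

def surging_words(pre_texts, post_texts, min_post_freq=1e-5, ratio=5.0):
    """Words whose relative frequency grew at least `ratio`-fold post-release."""
    pre, post = word_freqs(pre_texts), word_freqs(post_texts)
    flagged = []
    for word, f_post in post.items():
        f_pre = pre.get(word, 1e-9)  # smooth words unseen before the release
        if f_post >= min_post_freq and f_post / f_pre >= ratio:
            flagged.append((word, f_post / f_pre))
    return sorted(flagged, key=lambda x: -x[1])

# Toy example; real use would load paper abstracts or peer reviews by date.
pre = ["we study graph algorithms", "the results are significant"]
post = ["we delve into pivotal and intricate aspects of graph algorithms"]
print(surging_words(pre, post)[:5])
```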
Despite the promise of LLMs in healthcare, we have some major challenges to overcome before they can safely be integrated into clinical practice, Stanford scholars find. Their recent study highlights that while LLMs could alleviate physician workload by handling administrative tasks and answering patient queries, these tools pose safety risks and create errors that could lead to harmful outcomes. Current evaluations of LLMs often rely on curated data rather than real-world patient information, and evaluation efforts are uneven across healthcare tasks and specialties. The research team recommends more rigorous, systematic assessments using real patient data and suggests leveraging human-guided AI agents to scale evaluation efforts.
In this conversation with Stanford HAI Policy Fellow Marietje Schaake, the former European Parliament member warns about the unchecked influence of tech companies on democratic institutions. She argues that private companies are increasingly performing functions traditionally reserved for governments – such as surveillance, cybersecurity, and even influence over military and election infrastructure – without the necessary public accountability. Schaake draws on her experiences in the European Parliament and at Stanford to advocate for stricter regulations, transparency, and oversight of tech firms, especially as they control vast resources and data critical to democracy. She suggests reforms including independent tech advisory boards for lawmakers and greater public accountability for companies performing government functions. Schaake calls on citizens to demand federal tech regulation, support data protection laws, and promote transparency around data centers and AI developments to safeguard democratic principles.
As mental health needs surge, Stanford medical students Akshay Swaminathan and Ivan Lopez developed an AI tool called Crisis Message Detector 1 (CMD-1) to improve response times for patients in crisis. CMD-1 uses natural language processing to identify and prioritize high-risk messages, enabling rapid triage within a Slack interface where human responders review flagged cases. Tested on data from mental health provider Cerebral, CMD-1 achieved 97% accuracy in identifying urgent cases and reduced patient wait times from over 10 hours to 10 minutes. The project highlights the potential of AI to support clinicians by streamlining workflows and enhancing crisis response in healthcare settings, and underscores the importance of collaborative, interdisciplinary development to meet clinical needs effectively.
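The summary above doesn’t specify CMD-1’s model or integration details, but the general pattern is familiar: score incoming messages for crisis risk, then route the high-scoring ones to humans for immediate review. The sketch below is a minimal version of that pattern; the training examples, the TF-IDF plus logistic regression model, the 0.5 threshold, and the `triage` helper are assumptions for illustration, with a print statement standing in for the Slack channel where responders review flagged cases.

```python
# Minimal sketch of a crisis-message triage flow, loosely in the spirit of the
# CMD-1 workflow described above: classify incoming messages as high-risk or
# routine, and escalate high-risk ones for immediate human review.
# The training data, model choice, and threshold are illustrative assumptions,
# not details of the actual CMD-1 system.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled data: 1 = potential crisis, 0 = routine request.
train_messages = [
    "I don't want to be here anymore, I can't take it",
    "I'm thinking about hurting myself tonight",
    "Can I reschedule my appointment to next week?",
    "What's the dosage for my new prescription?",
]
train_labels = [1, 1, 0, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_messages, train_labels)

def triage(message, threshold=0.5):
    """Return the crisis probability and whether to escalate for human review."""
    prob = model.predict_proba([message])[0][1]
    return prob, prob >= threshold

incoming = "I feel like giving up and don't see a way out"
prob, escalate = triage(incoming)
if escalate:
    # CMD-1 surfaces flagged messages in a Slack interface for human responders;
    # here we simply print the alert.
    print(f"ESCALATE ({prob:.2f}): {incoming}")
```

The key design point the project emphasizes is that the model only prioritizes; human responders still review every flagged case, which is what keeps the workflow safe while cutting wait times.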