Intro
This week was an exciting one for two reasons:
It snowed in New Haven a lot. It is extremely cold, but also exceptionally pretty and, as of yet, I am not over the novelty of waking up in Narnia every day. Even if it means I currently have to run on the treadmill (yawn!).
I officially became Dr. Jess Morley. I passed my viva on 1st November last year with no corrections, but it took until Friday of this week (19/01/2024) to get what Oxford refers to as 'leave to supplicate.' This is the official university confirmation that you have completed your degree, with the examiners' board having signed off the report from the viva examiners, and it gives you permission to a) use the 'Dr' title; b) book graduation; and c) upload your thesis to the Oxford Research Archive for all to read. I feel relieved and strangely emotional about the whole thing. I also immediately updated my title on X and LinkedIn - this might be cringey, but I worked super hard for the privilege and I'm going to take full advantage of it.
Ok, now on with the weeknotes!
Things I worked on
Auditing the quality of AI/ML clinical trials. Over the last few years there has been a significant push to encourage those developing AI/ML-based clinical interventions to conduct clinical trials. Similarly, there has been a concurrent push to encourage the publication of trial results. The impact of both these 'pushes' is currently unknown i.e., it is not clear whether the drive to increase AI/ML-based trials and their publication has resulted in: (a) an increased proportion of the AI/ML-based products currently available on the market being subject to clinical trials; (b) an increase in the number of AI/ML-based clinical trials completing and reporting results; and (c) an increase in the quality of AI/ML clinical trials and their results. I would like to fill this knowledge gap, so I am currently in the process of scoping what an audit of clinicaltrials.gov would look like in this context.
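To make the scoping a little more concrete, below is a minimal sketch of the kind of query such an audit might start from. It assumes the ClinicalTrials.gov API v2 (https://clinicaltrials.gov/api/v2/studies) and an illustrative search term; the field names and search strategy are assumptions that would need to be validated against the current API schema before any real audit.

```python
# Minimal sketch: retrieve AI/ML-related registered trials from ClinicalTrials.gov
# and count how many are completed and how many have posted results.
# Assumes the v2 API (https://clinicaltrials.gov/api/v2/studies); the search term,
# field names, and status values are illustrative and would need checking.
import requests

BASE_URL = "https://clinicaltrials.gov/api/v2/studies"
SEARCH_TERM = '"artificial intelligence" OR "machine learning"'  # illustrative query only

def fetch_studies(page_size: int = 100, max_pages: int = 5):
    """Page through the registry for studies matching the search term."""
    studies, page_token = [], None
    for _ in range(max_pages):
        params = {"query.term": SEARCH_TERM, "pageSize": page_size}
        if page_token:
            params["pageToken"] = page_token
        resp = requests.get(BASE_URL, params=params, timeout=30)
        resp.raise_for_status()
        payload = resp.json()
        studies.extend(payload.get("studies", []))
        page_token = payload.get("nextPageToken")
        if not page_token:
            break
    return studies

if __name__ == "__main__":
    studies = fetch_studies()
    with_results = sum(1 for s in studies if s.get("hasResults"))
    completed = sum(
        1 for s in studies
        if s.get("protocolSection", {}).get("statusModule", {}).get("overallStatus") == "COMPLETED"
    )
    print(f"Retrieved: {len(studies)}")
    print(f"Completed: {completed}")
    print(f"With posted results: {with_results}")
```

A real audit would obviously need a much more careful search strategy (intervention-type filters, deduplication, manual screening) and a quality-assessment stage, but the registry query itself is roughly this shape.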
App Store audit. As I mentioned last week I'm working with a couple of other members of the DEC to update a large audit I conducted a few years ago of health, wellness, and medicine apps available on the Apple and Google 'App Stores' and the availability/quality of the evidence available to support their claims of efficacy. We have now completed the first round of analysis of the results and I am in the process of scoping what the paper will look like based on the results. This largely involves (a) developing the supporting theory that explains why the results are what they are and what the implications of the results are; and (b) devising the relevant policy recommendations in response to the results.
Risk stratification. For some time now, I have been working with a couple of colleagues, and good friends, on a couple of papers about the (largely) unregulated use of risk-stratification tools in the NHS. We wanted to empirically assess how many such tools are in use and understand the impact of their use. The papers reporting these studies are now nearly finished and ready for submission, so I was just doing a final round of checks to ensure no references are missing etc.
Ethics as a Service workshops. Back in 2021, I published a paper introducing the concept of Ethics as a Service, based on the idea of software as a service in cloud computing, which aimed to find a balance between top-down externally-driven ethics-based practices in AI development, and bottom-up internally-driven ethics-based practices. I am currently involved in developing some workshops that will eventually result in a piloting of the concept, but for now simply involve exploring how much is currently understood about AI ethics by developers of AI-based healthcare products. Mostly this involves posing very open questions and listening to the ensuing discussion. For example:
Data Collection & Curation:
How might the data being used to train and test the model be assessed for ethical validity?
How should decisions regarding what data types and items are needed to train, test, and operate the AI product be made? Who should be involved in these decisions?
What does data quality mean from an ethical perspective? How can this be assessed? How should quality be reported? What might be the implications of poor data quality?
Model Building:
How might the choice of model affect the ethical outcomes of an AI product?
What trade-offs might need to be considered in the model selection and design process? How should decisions involving trade-offs be resolved?
What are the ethical implications of parameter selection and tuning? How might these implications be anticipated?
How might the principles of pro-ethical design be built into the model build and design process?
Model Validation:
How is model ‘performance’ defined?
How can the causation of clinical outcomes be assessed and determined?
How can the model be safely tested in a real world environment?
How can the potential social consequences of the model be anticipated and assessed?
Model Deployment:
What sociotechnical considerations might arise when the model is packaged and deployed?
What are the responsibilities of the AI developer with regards to assessing ‘system readiness’ of the model purchaser?
What are the ethical implications of model versioning? How might this be controlled?
Should the model be sold with ‘on-label’ and ‘off-label’ uses? What might be the consequences if the model is used off-label?
What needs to be considered when deploying a model into a clinical pathway? How can the effects of this deployment be tested?
Things I did
Published a paper in BMJ Medicine with my colleagues from Oxford, looking at the availability of trial results from clinical trials registered on the EU Clinical Trials Register: https://bmjmedicine.bmj.com/content/3/1/e000738
Submitted the below abstract, based on my thesis, to the Yale AI in Medicine Symposium (a toy sketch of the decision-support pattern the abstract describes follows at the end of this list):
Healthcare systems across the globe are under increasing strain from spiraling healthcare costs, staff shortages, inequitable coverage, and increasing numbers of complex multi-morbid patients. Consequently, many healthcare systems are witnessing diminishing returns. Life expectancy is, for example, declining in both the US and the UK. Datafication, increased interest in the use of AI, and the emergence of data-driven approaches to systems biology have combined to give national and international policymakers the idea that the answer to overcoming these systemic issues might lie in the learning healthcare system (LHS) concept. The LHS concept centers on the importance of informational feedback loops where “real-world” data is recorded in clinic when a patient is “treated” or “diagnosed”. This data is used to train computational models that lead to new understandings of disease and its underlying causes (systems medicine). This new understanding is translated into tools used in the clinic to help clinicians advise patients on how to prevent disease. The outcomes of these preventive interventions are recorded and fed back both into new computational models and into the wider system for the development of, for example, new clinical guidelines. Via these feedback loops, global healthcare systems can “learn” how to produce the best outcomes most efficiently and so become more efficient, more effective, and more sustainable, as well as more evidence-based, less wasteful, and less harmful. Capitalizing on these opportunities requires the implementation of translational tools that can bring together the three elements (data, AI, and systems biology) underpinning the hopes in the LHS concept. Specifically, turning global healthcare systems into national learning healthcare systems (NLHS) requires the implementation of ‘Algorithmic Clinical Decision Support Software’ (ACDSS). ACDSS works by using algorithms, such as neural networks or decision trees, housed within the backend (or knowledge engine) of a software application to ask and answer questions such as “if this patient’s physiological and medical conditions are similar to the majority of other patients with the same category of disease, then what will the most effective treatment be?” and pass this information back to individual clinicians and the wider system, so enabling the transition to NLHS. The difficulty is that, whilst researchers have successfully developed examples of ACDSS ‘in the lab,’ ACDSS has not yet made its way into clinical practice (at least not at scale). Implementation has been beset with multiple technical, ethical, regulatory, and cultural difficulties. This gap between the ‘hype’ and ‘reality’ of ACDSS has become known as the ‘implementation gap’. Drawing from a 3-year mixed-methods research project, the purpose of this talk is to present the conceptual model of information infrastructure requirements designed to enable the successful (technically feasible, socially acceptable, ethically justifiable, and legally compliant) closure of this gap.
Recorded a television programme about AI and disinformation.
Checked proofs for one of the articles we are publishing soon as part of the BMJ Future of the NHS Commission.
Peer-reviewed 3 papers for 3 different journals.
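The abstract above leans on the idea of a model (for example a decision tree) sitting in the 'knowledge engine' of a decision-support tool and suggesting the treatment that worked best for similar historical patients. Purely to make that pattern concrete, here is a deliberately toy sketch; all features, labels, and data are invented for illustration and bear no relation to any real ACDSS.

```python
# Toy illustration of the ACDSS pattern described in the abstract: a simple model
# in the "knowledge engine" answers "given patients like this one, which treatment
# tended to work best?". All data, features, and labels are synthetic.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

# Hypothetical features: [age, systolic BP, HbA1c]; labels: the treatment that
# 'worked' for historical patients (0 = treatment A, 1 = treatment B).
X_history = rng.normal(loc=[60, 135, 7.0], scale=[10, 15, 1.0], size=(500, 3))
y_history = (X_history[:, 2] > 7.5).astype(int)  # invented rule standing in for outcomes

# The 'knowledge engine': a decision tree fitted on historical patients.
engine = DecisionTreeClassifier(max_depth=3, random_state=0)
engine.fit(X_history, y_history)

# A new patient arrives in clinic; the engine suggests the treatment that was
# most effective for similar historical patients, feeding back to the clinician.
new_patient = np.array([[58, 140, 8.1]])
suggestion = engine.predict(new_patient)[0]
print("Suggested treatment:", "B" if suggestion == 1 else "A")
```

The point of the sketch is only the feedback-loop shape: historical data in, a model in the middle, a recommendation back out to the clinician, whose recorded outcome would then feed the next round of training.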
Things I thought about
The Chevron deference. There is currently a major case in the US Supreme Court discussing whether the Chevron deference doctrine (established in 1984) should be overturned. Chevron states that where a legal statute is ambiguous, the interpretation of the statute by the relevant technical federal agency (e.g., the FDA in health) is prioritised and deemed ‘correct’ even if the court disagrees. It’s a huge case, for multiple reasons, but it piqued my interest because the regulation of AI was explicitly mentioned several times during the court proceedings. It was used as an argument both for and against the overturning of Chevron. Those who wish to keep Chevron (mostly liberals) argue that AI will pose increasingly technical questions/problems that courts will not be sufficiently knowledgeable to deal with, and so any interpretation of how specific legal statutes apply to AI should be deferred to the relevant federal agencies (i.e., judges should not become policymakers). Those who wish to overturn Chevron (mostly conservatives) argue the other way: that it is the courts’ responsibility to determine how the law should be applied to AI. Within this debate, healthcare, defense, the environment, and education were all named as specific ‘industries’ that might be the most affected. The result won’t be known until the summer, but either way I think there are fairly significant implications for the regulation of AI SaMD, given the limited degree of technical understanding of current senior non-specialist policymakers and lawmakers.
Trust in digital health technologies/AI and explainability. One of the frustrations I have with high-level or highly abstract discussions about the ethics and governance of AI in healthcare is the lack of reasoning underpinning assertions. It is, for example, very common to see papers claim that "explainability" is essential for the purpose of building trust in the use of AI. But the exact mechanism for building this trust via explainability is rarely discussed. Reading this paper by Müller et al, which discusses Onora O'Neill's model of trust and the role this plays in individuals deciding whether or not health apps are trustworthy, made some things clearer for me. The paper does not explicitly mention explainability, but it does highlight that, if an individual is to judge whether another person or thing is trustworthy, they need to assess that person or thing according to honesty, competence, and reliability with regard to a specific task, i.e., whether the person or thing can complete a specific task honestly, competently, and reliably. The honesty, competence, and reliability of a black box cannot be assessed, and hence we have come to rely on explicability as a proxy for trustworthiness. I suspect there are also implications for better understanding meaningful accountability and for what we should be testing during algorithmic validation and evaluation processes, but those musings will be for another day.
WHO Guidance on LLMs. This week the WHO published guidance on the ethics and governance of LLMs (large language models) in healthcare. It is a reasonably comprehensive and well-referenced document and highlights the main well-rehearsed points: that LLMs can hallucinate, potentially threaten privacy, and currently sit outside regulatory remit. It also highlights the potential threat posed by LLMs to the epistemic authority of clinicians, which I was really pleased to see - it's a point often missed. In general, it's great to see the WHO paying attention to this rapidly developing technology and trying to be proactive rather than purely reactive in its governance. However, I think a couple of tricks were missed in the recommendations. Almost all the recommendations would apply to all AI and most software, not just foundation models, and so there is a lack of specificity which means that a few crucial areas are missed. The risk of future LLMs or AI/ML models being trained on the outputs of a pre-existing LLM, and the exacerbation of 'rubbish in, rubbish out' problems that would result, is, for example, missed. Similarly, whilst there is acknowledgement of the need for governments to publicly outline what they want LLMs for (i.e., what tasks their introduction is targeted at) and the need to robustly evaluate LLMs, there is no discussion of societal redlines regarding their use (i.e., are there areas of care we do not ever want foundation models going near), nor of the numerous complexities associated with the evolution of LLMs. If the utility of the guidance is to be maximised, these gaps need to be filled either by the WHO or by others.
The ethics of digital mental health. I am currently working on expanding my guide to critical thinking about AI to cover a wider range of digital health technologies, and to highlight the specific ethical concerns raised by the use of digital health technologies in mental health care. There appears to be a particularly pressing problem with the assumption that evidence of efficacy for face-to-face or 'in-person' treatment can automatically be transferred to digitised versions of those treatments in the mental health sphere. It is, for example, very common to see digital CBT apps on the App Store claiming to be 'evidence-based.' In reality, what the apps mean is that there is evidence of efficacy for in-person CBT, not for a digitised version of CBT delivered by chatbot.
NHS 'tech' priorities for the next Government. As the UK heads into an election year, I've spent quite a bit of time thinking about what I think the next Government should prioritise in terms of tech/digital/data strategy for the NHS. The main things I would like to see are:
A change in attitude towards the role regulation plays in innovation. I think the current Government's position on "pro-innovation regulation" is antithetical to designing digital services that serve the public. In my opinion we should be aiming for pro-ethical design enabled by good regulation, i.e., "regulation-friendly innovation." More prosaically, the problem I spend the most time thinking about is attempted leap-frogging in the NHS. There has been a growing trend over the last five years to funnel money into the adoption of new technologies, whether that be via the Accelerated Access Collaborative, the more recent £21 million for the adoption of AI in imaging, or the COVID-19 chest-imaging database. There seems to be little recognition of the fact that the NHS's information infrastructure is far from being fit-for-purpose for implementing these technologies. Considerably more investment in the basics is needed before any of the high-tech ambitions are realistic. I am seriously concerned that unless a more step-wise approach is taken, harm will result, chilling effects will kick in, and ultimately opportunities to save and improve lives will be lost.
Funding for curation of NHS data - to ensure its quality is maximised, thus minimising harms associated with (e.g.,) bias, or epistemic uncertainty in the development of data-driven technologies for healthcare.
Mandated interoperability standards and an equivalent to the 'data portability' clause within HIPAA in England.
Revised regulation of software as a medical device, making clear the evidence of efficacy and safety requirements of any software deployed within the healthcare sector, and requiring all software tools (including risk calculators) to be registered and listed publicly for auditing purposes.
A change in the direction of impetus from "this is the technology that we want the health service to adopt what can we achieve with it" to "these are the specific healthcare system and patient outcomes that we wish to achieve, and these are the technologies that may help us achieve these goals."
(A selection of) Things I read
The highlighted papers are those I particularly enjoyed.
Al-Uqdah, Lola, F. Abron Franklin, Chu-Chuan Chiu, and Brianna N. Boyd. “Associations Between Social Media Engagement and Vaccine Hesitancy.” Journal of Community Health 47, no. 4 (August 2022): 577–87. https://doi.org/10.1007/s10900-022-01081-9.
D’Hotman, Daniel, and Jesse Schnall. “A New Type of ‘Greenwashing’? Social Media Companies Predicting Depression and Other Mental Illnesses.” The American Journal of Bioethics 21, no. 7 (July 3, 2021): 36–38. https://doi.org/10.1080/15265161.2021.1926583.
Haque, M D Romael, and Sabirat Rubya. “An Overview of Chatbot-Based Mobile Mental Health Apps: Insights From App Description and User Reviews.” JMIR mHealth and uHealth 11 (May 22, 2023): e44838. https://doi.org/10.2196/44838.
Kahane, K., J. François, and J. Torous. “The Digital Health App Policy Landscape: Regulatory Gaps and Choices through the Lens of Mental Health.” Journal of Mental Health Policy and Economics 24, no. 3 (2021): 101–8.
Khan, Wishah, Bertina Jebanesan, Sarah Ahmed, Chris Trimmer, Branka Agic, Farhana Safa, Aamna Ashraf, et al. “Stakeholders’ Views and Opinions on Existing Guidelines on ‘How to Choose Mental Health Apps.’” Frontiers in Public Health 11 (November 22, 2023): 1251050. https://doi.org/10.3389/fpubh.2023.1251050.
Kunkle, Sarah, Manny Yip, Watson Ξ, and Justin Hunt. “Evaluation of an On-Demand Mental Health System for Depression Symptoms: Retrospective Observational Study.” Journal of Medical Internet Research 22, no. 6 (June 18, 2020): e17902. https://doi.org/10.2196/17902.
Laacke, S., R. Mueller, G. Schomerus, and S. Salloch. “Artificial Intelligence, Social Media and Depression. A New Concept of Health-Related Digital Autonomy.” American Journal of Bioethics 21, no. 7 (2021): 4–20. https://doi.org/10.1080/15265161.2020.1863515.
Lagan, Sarah, Ryan D’Mello, Aditya Vaidyam, Rebecca Bilden, and John Torous. “Assessing Mental Health Apps Marketplaces with Objective Metrics from 29,190 Data Points from 278 Apps.” Acta Psychiatrica Scandinavica 144, no. 2 (August 2021): 201–10. https://doi.org/10.1111/acps.13306.
Lau, Nancy, Alison O’Daffer, Joyce P Yi-Frazier, and Abby R Rosenberg. “Popular Evidence-Based Commercial Mental Health Apps: Analysis of Engagement, Functionality, Aesthetics, and Information Quality.” JMIR mHealth and uHealth 9, no. 7 (2021): e29689.
Lin, X., L. Martinengo, A.I. Jabir, A.H.Y. Ho, J. Car, R. Atun, and L.T. Car. “Scope, Characteristics, Behavior Change Techniques, and Quality of Conversational Agents for Mental Health and Well-Being: Systematic Assessment of Apps.” Journal of Medical Internet Research 25 (2023). https://doi.org/10.2196/45984.
Lipschitz, J.M., S.L. Connolly, C.J. Miller, T.P. Hogan, S.R. Simon, and K.E. Burdick. “Patient Interest in Mental Health Mobile App Interventions: Demographic and Symptom-Level Differences.” Journal of Affective Disorders 263 (2020): 216–20. https://doi.org/10.1016/j.jad.2019.11.083.
Marshall, Jamie M, Debra A Dunstan, and Warren Bartik. “Clinical or Gimmickal: The Use and Effectiveness of Mobile Mental Health Apps for Treating Anxiety and Depression.” Australian & New Zealand Journal of Psychiatry 54, no. 1 (2020): 20–28.
Martinez-Martin, Nicole, Henry T Greely, and Mildred K Cho. “Ethical Development of Digital Phenotyping Tools for Mental Health Applications: Delphi Study.” JMIR mHealth and uHealth 9, no. 7 (2021): e27343.
Mazlan, Idayati, Noraswaliza Abdullah, and Norashikin Ahmad. “Exploring the Impact of Hybrid Recommender Systems on Personalized Mental Health Recommendations.” International Journal of Advanced Computer Science and Applications 14, no. 6 (2023).
McCashin, Darragh, and Colette M Murphy. “Using TikTok for Public and Youth Mental Health–A Systematic Review and Content Analysis.” Clinical Child Psychology and Psychiatry 28, no. 1 (2023): 279–306.
Mendes, Jean PM, Ivan R Moura, Pepijn Van de Ven, Davi Viana, Francisco JS Silva, Luciano R Coutinho, Silmar Teixeira, Joel JPC Rodrigues, and Ariel Soares Teles. “Sensing Apps and Public Data Sets for Digital Phenotyping of Mental Health: Systematic Review.” Journal of Medical Internet Research 24, no. 2 (2022): e28735.
Milton, Ashlee, Leah Ajmani, Michael Ann DeVito, and Stevie Chancellor. “‘I See Me Here’: Mental Health Content, Community, and Algorithmic Curation on TikTok,” 1–17, 2023.
Müller, Regina, Nadia Primc, and Eva Kuhn. “‘You Have to Put a Lot of Trust in Me’: Autonomy, Trust, and Trustworthiness in the Context of Mobile Apps for Mental Health.” Medicine, Health Care and Philosophy 26, no. 3 (September 2023): 313–24. https://doi.org/10.1007/s11019-023-10146-y.
Naslund, John A, Kelly A Aschbrenner, Lisa A Marsch, and Stephen J Bartels. “The Future of Mental Health Care: Peer-to-Peer Support and Social Media.” Epidemiology and Psychiatric Sciences 25, no. 2 (2016): 113–22.
Naslund, John A, Ameya Bondre, John Torous, and Kelly A Aschbrenner. “Social Media and Mental Health: Benefits, Risks, and Opportunities for Research and Practice.” Journal of Technology in Behavioral Science 5 (2020): 245–57.
Ng, Michelle M, Joseph Firth, Mia Minen, and John Torous. “User Engagement in Mental Health Apps: A Review of Measurement, Reporting, and Validity.” Psychiatric Services 70, no. 7 (2019): 538–44.
Orben, Amy, and Sarah-Jayne Blakemore. “How Social Media Affects Teen Mental Health: A Missing Link.” Nature 614, no. 7948 (February 16, 2023): 410–12. https://doi.org/10.1038/d41586-023-00402-9.
Oudin, Antoine, Redwan Maatoug, Alexis Bourla, Florian Ferreri, Olivier Bonnot, Bruno Millet, Félix Schoeller, Stéphane Mouchabac, and Vladimir Adrien. “Digital Phenotyping: Data-Driven Psychiatry to Redefine Mental Health.” Journal of Medical Internet Research 25 (2023): e44502.
Parker, Lisa, Vanessa Halter, Tanya Karliychuk, and Quinn Grundy. “How Private Is Your Mental Health App Data? An Empirical Study of Mental Health App Privacy Policies and Practices.” International Journal of Law and Psychiatry 64 (2019): 198–204.
Rutter, Lauren A, Jacqueline Howard, Prabhvir Lakhan, Danny Valdez, Johan Bollen, and Lorenzo Lorenzo-Luaces. “‘I Haven’t Been Diagnosed, but I Should Be’—Insight Into Self-Diagnoses of Common Mental Health Disorders: Cross-Sectional Study.” JMIR Formative Research 7, no. 1 (2023): e39206.
Tong, Fangziyun, Reeva Lederman, Simon D’Alfonso, Katherine Berry, and Sandra Bucci. “Conceptualizing the Digital Therapeutic Alliance in the Context of Fully Automated Mental Health Apps: A Thematic Analysis.” Clinical Psychology & Psychotherapy 30, no. 5 (September 2023): 998–1012. https://doi.org/10.1002/cpp.2851.
Valentine, Lee, Simon D’Alfonso, and Reeva Lederman. “Recommender Systems for Mental Health Apps: Advantages and Ethical Challenges.” AI & Society 38, no. 4 (2023): 1627–38.
Wellman, Mariah L. “‘A Friend Who Knows What They’re Talking about’: Extending Source Credibility Theory to Analyze the Wellness Influencer Industry on Instagram.” New Media & Society, 2023, 14614448231162064.
Wongkoblap, Akkapon, Miguel A Vadillo, and Vasa Curcin. “Researching Mental Health Disorders in the Era of Social Media: Systematic Review.” Journal of Medical Internet Research 19, no. 6 (2017): e228.
Yıldırım, Seda. “The Challenge of Self-Diagnosis on Mental Health Through Social Media: A Qualitative Study.” In Computational Methods in Psychiatry, 197–213. Springer, 2023.
Zsila, Ágnes, and Marc Eric S. Reyes. “Pros & Cons: Impacts of Social Media on Mental Health.” BMC Psychology 11, no. 1 (July 6, 2023): 201. https://doi.org/10.1186/s40359-023-01243-x.