Exploring Pre-trained Models in Software Engineering

In our recent article, "PeaTMOSS: A Dataset for Investigating the Supply Chain of Pre-trained Models in Open-Source Software," we examine how deep learning models are integrated into software development. The growing reliance on pre-trained models (PTMs) raises important questions about their trustworthiness, maintenance, and evolution within software projects. By curating and analyzing a dataset that tracks PTMs in open-source repositories, we provide insights into how these models propagate, how they are updated, and what challenges arise in their long-term use. Our goal is to help researchers and practitioners better understand the software supply chain dynamics of AI-driven components.

Since its publication, the article has drawn interest from both the software engineering and machine learning communities, particularly from those concerned with reproducibility, licensing, and security. As PTMs become more prevalent, ensuring their responsible use in software projects will require collaboration between AI and software engineering researchers. We hope this work contributes to ongoing discussions about the sustainability of AI models in production environments. If you're curious, you can read the full paper here.

Wenxin Jiang, Jerin Yasmin, Jason Jones, Nicholas Synovic, Jiashen Kuo, Nathaniel Bielanski, Yuan Tian, George K. Thiruvathukal, and James C. Davis, Challenges and practices of deep learning model reengineering: A case study on computer vision. Empirical Software Engineering 29, 142 (2024). https://doi.org/10.1007/s10664-024-10521-0