Reading Group/2020-04-22

Date	2020/04/22 11:30:00 – 2020/04/22 12:30:00
Location	https://global.gotomeeting.com/join/121469005
Speaker	Hamideh Hajiabadi
Research group	ARE
Title	What Do Developers Ask About ML Libraries? A Large-scale Study Using Stack Overflow
Authors	Md Johirul Islam, Hoan Anh Nguyen, Rangeet Pan, and Hridesh Rajan
PDF	http://arxiv.org/abs/1906.11940
URL	http://arxiv.org/abs/1906.11940
BibTeX	https://dblp.uni-trier.de/rec/bibtex/journals/corr/abs-1906-11940
Abstract	Modern software systems are increasingly including machine learning (ML) as an integral component. However, we do not yet understand the difficulties faced by software developers when learning about ML libraries and using them within their systems. To that end, this work reports on a detailed (manual) examination of 3,243 highly-rated Q&A posts related to ten ML libraries, namely Tensorflow, Keras, scikit-learn, Weka, Caffe, Theano, MLlib, Torch, Mahout, and H2O, on Stack Overflow, a popular online technical Q&A forum. We classify these questions into seven typical stages of an ML pipeline to understand the correlation between the library and the stage. Then we study the questions and perform statistical analysis to explore the answer to four research objectives (finding the most difficult stage, understanding the nature of problems, nature of libraries and studying whether the difficulties stayed consistent over time). Our findings reveal the urgent need for software engineering (SE) research in this area. Both static and dynamic analyses are mostly absent and badly needed to help developers find errors earlier. While there has been some early research on debugging, much more work is needed. API misuses are prevalent and API design improvements are sorely needed. Last and somewhat surprisingly, a tug of war between providing higher levels of abstractions and the need to understand the behavior of the trained model is prevalent.