Main database information: Today, medical image analysis papers require solid experiments to prove the usefulness of proposed methods. However, experiments are often performed on data selected by the researchers, which may come from different institutions, scanners, and populations. Different evaluation measures may also be used, making it difficult to compare methods. In this paper, we introduce a dataset of 7909 breast cancer histopathology images acquired from 82 patients, which is now publicly available from http://web.inf.ufpr.br/vri/breast-cancer-database. The dataset includes both benign and malignant images. The task associated with this dataset is the automated classification of these images into two classes, which would be a valuable computer-aided diagnosis tool for the clinician. To assess the difficulty of this task, we show some preliminary results obtained with state-of-the-art image classification systems. The accuracy ranges from 80% to 85%, showing that there is still room for improvement. By providing this dataset and a standardized evaluation protocol to the scientific community, we hope to bring together researchers in both the medical and the machine learning fields to advance toward this clinical application.
Federal University of Parana (UFPR), Rua Cel. Francisco H. dos Santos, 100, Curitiba, PR, Brazil. C. Petitjean and L. Heutte are with LITIS EA 4108, University of Rouen, 76801 Saint-Etienne-du-Rouvray, France
Contact name (PI/Team):
Fabio A. Spanhol
Contact email (PI/Helpdesk):