The collection of face image datasets has grown steadily over the past decade, and researchers are encouraged to report results on a standard test dataset so that those results can be compared. Choosing an appropriate dataset depends on several characteristics, including the task to be performed, the algorithm to be trained or tested, and the properties of the datasets against which results will be compared. The following are the most prominent face image datasets used for evaluating face recognition technology.
MS-Celeb-1M
Published: 2016
Images: 8.2 million
Subjects: 4,101
Source: American and British actors
Publicly available: Yes
Download at: Microsoft Celeb Dataset
Download clean version at: C-MS-Celeb
Reference: Yandong Guo, Lei Zhang, Yuxiao Hu, Xiaodong He, and Jianfeng Gao. MS-Celeb-1M: A dataset and benchmark for large scale face recognition. In European Conf. on Computer Vision (ECCV), 2016.
Megaface
Published: 2016
Images: 4.7 million
Subjects: 672,057
Source: Flickr users’ photo albums
Publicly available: Yes
Download at: Megaface
Reference: Ira Kemelmacher-Shlizerman, Steven M Seitz, Daniel Miller, and Evan Brossard. The megaface benchmark: 1 million faces for recognition at scale. In Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), 2016.
VGG2
Published: 2018
Images: 3.31 million
Subjects: 9,131
Source: Google images of actors, athletes, and politicians
Publicly available: Yes
Download at: VGG2
Reference: Q. Cao, L. Shen, W. Xie, O.M. Parkhi, and A. Zisserman. Vggface2: A dataset for recognising faces across pose and age. In Intl. Conf. on Automatic Face and Gesture Recognition (FG), 2018.
VGG
Published: 2015
Images: 2.6 million
Subjects: 2,622
Source: Google images of actors, athletes, and politicians
Publicly available: Yes
Download at: VGG
Reference: O. M. Parkhi, A. Vedaldi, and A. Zisserman. Deep face recognition. In BMVC, 2015.
IMDB-Face
Published: 2018
Images: 1.7 million
Subjects: 59K
Source: Celebrities collected from movie screenshots and posters from the IMDb website
Publicly available: Yes
Download at: IMDB-Face
Reference: Fei Wang, Liren Chen, Cheng Li, Shiyao Huang, Yanjie Chen, Chen Qian, and Chen Change Loy. The devil of face recognition is in the noise. In European Conf. on Computer Vision (ECCV), 2018.
Diversity in Faces (DiF)
Published: 2019
Images: 0.97 million
Source: Users of the Flickr photo service
Publicly available: Yes
Download at: DiF
Reference: Michele Merler, Nalini Ratha, Rogerio S. Feris, and John R. Smith. Diversity in faces. arXiv preprint arXiv:1901.10436, 2019.
IMDB-Wiki
Published: 2018
Images: 523,051
Source: Celebrities from IMDb and Wikipedia
Publicly available: Yes
Download at: IMDB-Wiki
Reference: R. Rothe, R. Timofte, and L. Van Gool. Deep expectation of real and apparent age from a single image without facial landmarks. Int J Comput Vis, 126:144–157, 2018.
Casia-Webface
Published: 2014
Images: 494,414
Subjects: 10,575
Source: Crawled from Internet
Publicly available: Yes
Download at: Casia-Webface
Reference: Dong Yi, Zhen Lei, Shengcai Liao, and Stan Z. Li. Learning face representation from scratch. arXiv preprint, 2014.
UMDFaces
Published: 2016
Images: 367,888
Subjects: 8,277
Source: Crawled from Internet
Publicly available: Yes
Download at: UMDFaces
Reference: Ankan Bansal, Anirudh Nanduri, Carlos D Castillo, Rajeev Ranjan, and Rama Chellappa. Umdfaces: An annotated face dataset for training deep networks. arXiv preprint, 2016.
CelebA
Published: 2015
Images: 202,599
Subjects: 10,177
Source: Celebrity images
Publicly available: Yes
Download at: CelebA
Reference: Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In IEEE Intl. Conf. on Computer Vision (ICCV), 2015.
CACD
Published: 2014
Images: 163,446
Subjects: 2,000
Source: Celebrity images
Publicly available: Yes
Download at: CACD
Reference: B. C. Chen, C. S. Chen, and W. H. Hsu. Face recognition and retrieval using cross-age reference coding with cross-age celebrity dataset. IEEE Trans. on Multimedia, 17(6):804–815, 2015.
FaceScrub
Published: 2014
Images: 106,863
Subjects: 530
Source: Public figures on the Internet
Publicly available: Yes
Download at: FaceScrub
Reference: H.-W. Ng and S. Winkler. A data-driven approach to cleaning large face datasets. In ICIP, 2014.
IJB-C
Published: 2018
Images: 31,334
Subjects: 3,531
Source: Celebrities and Internet personalities
Publicly available: Yes
Download at: IJB-C
Reference: B. Maze, J. Adams, J. A. Duncan, N. Kalka, T. Miller, C. Otto, A. K. Jain, W. T. Niggel, J. Anderson, J. Cheney, and P. Grother. Iarpa janus benchmark – c: Face dataset and protocol. In Intl. Conf. on Biometrics (ICB), 2018.
IJB-B
Published: 2017
Images: 21,798
Subjects: 1,845
Source: Celebrities and Internet personalities
Publicly available: Yes
Download at: IJB-B
Reference: C. Whitelam, E. Taborsky, A. Blanton, B. Maze, J. Adams, T. Miller, N. Kalka, A. K. Jain, J. A. Duncan, K. Allen, J. Cheney, and P. Grother. Iarpa janus benchmark-b face dataset. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) Workshop, 2017.
Pubfig
Published: 2011
Images: 58,797
Subjects: 200
Source: Internet personalities
Publicly available: Yes
Download at: Pubfig
Reference: N. Kumar, A. Berg, P. N. Belhumeur, and S. Nayar. Describable visual attributes for face verification and image search. IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), 33(10), 2011.
Morph
Published: 2006
Images: 55,134
Subjects: 13,618
Source: Public records
Publicly available: Yes
Download at: Morph
Reference: Karl Ricanek and Tamirat Tesafaye. Morph: A longitudinal image database of normal adult age-progression. In Intl. Conf. on Automatic Face and Gesture Recognition (FG), 2006.
Adience
Published: 2014
Images: 26,580
Subjects: 2,284
Source: Online image repositories
Publicly available: Yes
Download at: Adience
Reference: Eran Eidinger, Roee Enbar, and Tal Hassner. Age and gender estimation of unfiltered faces. IEEE Trans. on Information Forensics and Security, 9(12), 2014.
UTKface
Published: 2017
Images: 24,108
Source: Internet personalities
Publicly available: Yes
Download at: UTKface
Reference: Zhifei Zhang, Yang Song, and Hairong Qi. Age progression/regression by conditional adversarial autoencoder. In Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), 2017.
AgeDB
Published: 2017
Images: 16,488
Subjects: 568
Source: Manually collected Google images
Publicly available: Yes
Download at: AgeDB
Reference: S. Moschoglou, A. Papaioannou, C. Sagonas, J. Deng, I. Kotsia, and S. Zafeiriou. Agedb: the first manually collected, in-the-wild age database. In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) Workshop, Honolulu, Hawaii, 2017.
LFW(A)
Published: 2007
Images: 13,233
Subjects: 5,749
Source: Web images
Publicly available: Yes
Download at: LFW(A)
Reference: Gary B. Huang, Manu Ramesh, Tamara Berg, and Erik Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst, October 2007.
LFW+
Published: 2017
Images: 15,699
Subjects: 8,000
Source: Google Images
Publicly available: Yes
Download at: LFW+
Reference: H. Han, A. K. Jain, S. Shan, and X. Chen. Heterogeneous face attribute estimation: A deep multi-task learning approach. IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), 2017.
IJB-A
Published: 2015
Images: 5,712
Subjects: 500
Source: Celebrities and Internet personalities
Publicly available: Yes
Download at: IJB-A
Reference: B. F. Klare, B. Klein, E. Taborsky, A. Blanton, J. Cheney, K. Allen, P. Grother, A. Mah, M. Burge, and A. K. Jain. Pushing the frontiers of unconstrained face detection and recognition: Iarpa janus benchmark a. In Intl. Conf. on Computer Vision and Pattern Recognition (CVPR), 2015.
PPB
Published: 2018
Images: 1,270
Subjects: 1,270
Source: Parliamentarians from three African countries (Rwanda, Senegal, and South Africa) and three European countries (Iceland, Finland, and Sweden)
Publicly available: Yes
Download at: PPB
Reference: Joy Buolamwini and Timnit Gebru. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Proceedings of the 1st Conf. on Fairness, Accountability and Transparency, 2018.
FGNet
Published: 2016
Images: 1,002
Subjects: 82
Source: Scanning photographs of subjects found in personal collections
Publicly available: Yes
Download at: FGNet
Reference: Gabriel Panis and Andreas Lanitis. An overview of research on facial aging using the fg-net aging database. IET Biometrics, 5(2):37–46, 2016.
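When choosing among the datasets listed above, one rough property to compare is how many images a dataset holds per subject, since that separates deep collections (many images of few people) from broad ones (few images of many people). The following is a minimal sketch of that comparison, not a recommended selection procedure. The `FaceDataset` class, the `shortlist` function, and the threshold values are hypothetical names and numbers introduced here for illustration; the image and subject counts are copied from the entries above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FaceDataset:
    """One entry from the listing (hypothetical record type)."""
    name: str
    year: int
    images: int
    subjects: Optional[int]  # None when the listing gives no subject count

    @property
    def images_per_subject(self) -> Optional[float]:
        # Average number of images per subject, or None if unknown.
        if not self.subjects:
            return None
        return self.images / self.subjects


# A few entries copied from the listing above.
DATASETS = [
    FaceDataset("VGG2", 2018, 3_310_000, 9_131),
    FaceDataset("Casia-Webface", 2014, 494_414, 10_575),
    FaceDataset("CelebA", 2015, 202_599, 10_177),
    FaceDataset("LFW(A)", 2007, 13_233, 5_749),
    FaceDataset("UTKface", 2017, 24_108, None),
]


def shortlist(datasets, min_images=0, min_images_per_subject=0.0):
    """Return datasets meeting both thresholds, largest first.

    Datasets without a subject count are dropped only when a
    per-subject threshold is actually requested.
    """
    kept = []
    for d in datasets:
        if d.images < min_images:
            continue
        if min_images_per_subject > 0:
            ratio = d.images_per_subject
            if ratio is None or ratio < min_images_per_subject:
                continue
        kept.append(d)
    return sorted(kept, key=lambda d: d.images, reverse=True)


if __name__ == "__main__":
    # Illustrative thresholds only; pick values that suit the task.
    for d in shortlist(DATASETS, min_images=100_000, min_images_per_subject=10):
        print(f"{d.name}: {d.images:,} images, "
              f"{d.images_per_subject:.1f} per subject")
```

With these example thresholds, the shortlist keeps VGG2, Casia-Webface, and CelebA. It drops LFW(A) for its size and UTKface because the listing gives no subject count for it.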