4 Diversity Sampling
This chapter covers
- Understanding diversity in the context of Machine Learning, so that you can discover your model’s “unknown unknowns”
- Using Model-based Outliers, Cluster-based Sampling, Representative Sampling, and Sampling for Real-World Diversity to increase the diversity of data selected for Active Learning
- Using Diversity Sampling in different types of Machine Learning models so that you can apply the technique to any Machine Learning architecture
- Evaluating the success of Diversity Sampling so that you can more accurately measure your model’s performance across diverse data
- Deciding on the right number of items to put in front of humans per iteration cycle to optimize the Human-in-the-Loop process
In the last chapter, you learned how to identify where your model is uncertain: what your model “knows it doesn’t know”. In this chapter, you will learn how to identify what’s missing from your model: what your model “doesn’t know that it doesn’t know”, that is, its “unknown unknowns”. This is a hard problem, and it is made harder because what your model needs to know is often a moving target in a constantly changing world. Just as humans learn new words, new objects, and new behaviors every day in response to a changing environment, most Machine Learning algorithms are deployed in an environment that keeps changing.