Pattern discovery: how to learn like a machine

For almost ten years I have been researching how to identify meaningful patterns in real-world data. This can be achieved by defining a model that learns to extract a specific pattern from data. However, this only goes so far before performance hits a ceiling (caused by the model, the data or both). Most machine learning algorithms rely on specific assumptions about the data, which in turn determine the best-suited models.

The real-world dichotomy: complexity versus simplicity

Unfortunately, the gap between theory and practice is huge. Validating any assumption made about real-world data is a challenging problem. Data is not always distributed as the assumed model requires, variables are mostly not independent, and outliers can make up a significant part of the dataset. As a consequence, a complex model that matches the assumptions in theory can produce worse results than a simplistic model with fewer parameters. On real-world data, a simple model often outperforms a poorly defined complex one.

Real-life example

Both the data and the model used contribute to this complexity dichotomy. To understand it better, let me take a human learning example from the medical profession.

Let’s assume that a medical doctor receives only a few test measurements about a patient and has to diagnose the patient’s condition without ordering more tests.

A simple model would compare the patient’s test measurements with the population average and link deviations to possible illnesses. Because only the average is used for comparison, sub-patterns displayed by small groups within the population are ignored. Ignoring meaningful variations is a key characteristic of under-fitting. Under-fitting happens when the model is excessively simple and fails to capture a pattern in the data in sufficient detail.

In contrast, a complex model would consider not just the average but a multitude of diagnostic patterns to match the patient’s test measurements against. As only a small number of measurements is available, a complex model has too many options to choose from and is likely to pick the wrong pattern. This phenomenon is known as over-fitting. It occurs when a complex model has too few data points to learn from and has “memorised” individual training examples rather than “learning” to generalise from the underlying trend.

Fortunately, if the doctor is experienced and skilful, they will choose the right diagnostic pattern even in this situation of limited data. The main ingredients that allow such pattern discovery are experience and skills.
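To make the two failure modes concrete, here is a minimal sketch in Python. Everything in it is an assumption chosen for illustration, not part of the medical example: the trend y = 2x, the noise level, and the three hypothetical models (a population-mean predictor that under-fits, a straight line, and a nearest-neighbour "memoriser" that over-fits).

```python
import random

def make_data(n, rng):
    """Noisy samples of a hypothetical trend y = 2x (an assumption for this sketch)."""
    xs = [rng.random() for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 0.3)) for x in xs]

def fit_mean(train):
    """Too simple: always predicts the average outcome, ignoring x."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_line(train):
    """Ordinary least-squares straight line."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    slope = sum((x - mx) * (y - my) for x, y in train) / sxx
    return lambda x: my + slope * (x - mx)

def fit_memoriser(train):
    """Too flexible: returns the outcome of the closest training example."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]

def mse(model, data):
    """Mean squared error of a fitted model on a list of (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

def run_demo(n_train=8, n_test=500, seed=0):
    """Fit all three models on few samples; report error on seen and unseen data."""
    rng = random.Random(seed)
    train, test = make_data(n_train, rng), make_data(n_test, rng)
    fitters = {"mean": fit_mean, "line": fit_line, "memoriser": fit_memoriser}
    results = {}
    for name, fit in fitters.items():
        model = fit(train)
        results[name] = {"train": mse(model, train), "test": mse(model, test)}
    return results

if __name__ == "__main__":
    for name, errors in run_demo().items():
        print(f"{name:10s} train={errors['train']:.3f} test={errors['test']:.3f}")
```

The memoriser reproduces its training data perfectly, so its training error is zero, yet it still makes errors on new data. The mean predictor cannot fit even the training data as well as the line does. The exact numbers depend on the seed and on the assumptions above.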

How to foster experience and skills in a machine?

The main mechanism to improve the experience and skills of a machine learning algorithm is testing. Testing in machine learning is more than an assessment mechanism: it allows designers to assess the algorithm’s performance and to transfer their own skills and experience to the machine. This leads to a better selection of parameters (i.e. building machine skills) and to the right training set size (i.e. building machine experience), so that the algorithm achieves the best results that the data and model allow.

Testing as a design philosophy

To adopt testing as a design philosophy, you should embrace testing and be open to changing your approach based on test results. Only this will allow you to adjust to changes in real life. The testing mindset keeps you sharp and prevents you from developing tunnel vision (over-fitting) or losing focus (under-fitting). Through it you can accept uncertainty and open up to the real world around you.

Testing as a mindset makes you a better person. It allows you to be inspired and surprised every day and to go where the evidence takes you. It keeps your mind sharp and your spirits high, and it levels you out, resulting in a deep understanding of your (algorithm’s) capabilities and the world around you. It allows you to find balance in your life, neither getting lost in too much detail nor ignoring important details. Testing allows you to accept that you do not have it all figured out, to fail from time to time, and to take a step back in order to leap forward and achieve your maximum potential.

How to achieve your maximum potential

As with machine learning, there are two potentials to take into account:

  • Experience potential: defined by the number of real-life examples we need to master a specific task.
  • Skill potential: defined by the resources available to master a specific task.

Experience potential: learning curves

Learning curves are a test framework in machine learning for assessing the number of real-life examples needed to achieve the maximum performance potential. A learning curve shows how the classification error changes with the training set size (i.e. the number of training samples). When the difference between the apparent error (the error on the training set, i.e. previously seen data) and the true error (the test error on the blind set, i.e. new data) is large, the classifier is called over-fitted or over-trained.

 


Figure 1: illustrating a classifier learning curve [1] 

Figure 1 presents a typical learning curve, showing the true and apparent error rates. The curve shows that the classifier’s performance improves significantly as more training data is added. When the curves flatten out (i.e. reach a plateau), this suggests that the machine learning algorithm is well trained and that more training data will not improve the classifier much.
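A learning curve of this kind can be produced with very little code. The sketch below is only an illustration under my own assumptions: two overlapping Gaussian classes as synthetic data, and a simple nearest-mean classifier. It is not the classifier or the data behind Figure 1.

```python
import random

def sample(n, rng):
    """Draw n labelled 2-D points from two overlapping Gaussian classes (hypothetical data)."""
    data = []
    for i in range(n):
        label = i % 2                      # alternate labels so both classes are present
        centre = 1.5 * label
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return data

def fit_nearest_mean(train):
    """Estimate one mean vector per class."""
    means = {}
    for label in (0, 1):
        pts = [x for x, y in train if y == label]
        means[label] = tuple(sum(c) / len(pts) for c in zip(*pts))
    return means

def error_rate(means, data):
    """Fraction of points assigned to the wrong class mean."""
    wrong = 0
    for x, y in data:
        pred = min(means, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])))
        wrong += pred != y
    return wrong / len(data)

def learning_curve(sizes=(4, 8, 16, 32, 64, 128), n_test=2000, repeats=20, seed=0):
    """Return (training size, apparent error, true error) averaged over repeats."""
    rng = random.Random(seed)
    test = sample(n_test, rng)             # the blind set: never used for training
    curve = []
    for n in sizes:
        apparent = true = 0.0
        for _ in range(repeats):
            train = sample(n, rng)
            model = fit_nearest_mean(train)
            apparent += error_rate(model, train)
            true += error_rate(model, test)
        curve.append((n, apparent / repeats, true / repeats))
    return curve

if __name__ == "__main__":
    for n, apparent, true in learning_curve():
        print(f"n={n:4d} apparent={apparent:.3f} true={true:.3f}")
```

Plotting the two error columns against the training size gives the pair of curves described above. The gap between them indicates over-training, and the point where the true error stops decreasing indicates that more data is unlikely to help this particular model.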

Adopting the concept of a learning curve in our human life helps us to assess our expertise and independence, and to track our transition from novice to master learner. As continuously learning creatures, we can use learning curves to monitor our personal growth and the speed at which we learn to master new concepts, interact successfully with people and confront new challenges. Reaching your potential in a situation, and therefore no longer improving, is an indication to move on: to get engaged in new activities with new people, to learn beyond the usual things, to read more books and blogs, to change some aspect of one’s life, to change…

Skill potential: feature curves

Feature curves are a tool in machine learning for obtaining a classifier that generalises well and for avoiding the curse of dimensionality. The curse of dimensionality arises when a machine learning algorithm does not scale well to high-dimensional spaces, typically because the number of training samples is too small for the number of features. Feature curves support dimension reduction: they help to turn a high-dimensional feature space into a lower-dimensional one without significant loss of information.


Figure 2: illustrating the concept of feature curves [1] 

Figure 2 illustrates the concept of feature curves. A feature curve shows how the error changes with the dimensionality of the feature space, and therefore how many features are needed to achieve a low error. The optimal performance is reached where the curves start to flatten out. For some machine learning algorithms, adding features beyond this optimal point leads to a counterintuitive deterioration in performance.
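A feature curve can be sketched in the same spirit. The example below rests on assumptions of my own, not on Figure 2: synthetic data in which only the first three features carry class information and the remaining ones are pure noise, a deliberately small training set, and a nearest-mean classifier that is trained on the first k features only.

```python
import random

def sample(n, n_features, n_informative, rng):
    """Only the first n_informative features differ between the two classes (hypothetical data)."""
    data = []
    for i in range(n):
        label = i % 2                      # alternate labels so both classes are present
        x = [rng.gauss(label if j < n_informative else 0.0, 1.0)
             for j in range(n_features)]
        data.append((x, label))
    return data

def nearest_mean_error(train, test, k):
    """Train a nearest-mean classifier on the first k features; return its test error."""
    means = {}
    for label in (0, 1):
        pts = [x[:k] for x, y in train if y == label]
        means[label] = [sum(c) / len(pts) for c in zip(*pts)]
    wrong = 0
    for x, y in test:
        pred = min(means, key=lambda c: sum((a - b) ** 2 for a, b in zip(x[:k], means[c])))
        wrong += pred != y
    return wrong / len(test)

def feature_curve(ks=(1, 2, 3, 5, 10, 20, 40), n_train=10, n_test=1000,
                  repeats=20, seed=0):
    """Return (number of features used, test error) averaged over repeats."""
    rng = random.Random(seed)
    n_features = max(ks)
    test = sample(n_test, n_features, 3, rng)
    curve = []
    for k in ks:
        total = 0.0
        for _ in range(repeats):
            train = sample(n_train, n_features, 3, rng)
            total += nearest_mean_error(train, test, k)
        curve.append((k, total / repeats))
    return curve

if __name__ == "__main__":
    for k, err in feature_curve():
        print(f"features={k:3d} test error={err:.3f}")
```

Plotting the error against the number of features gives the feature curve. With so few training samples, I would expect the error to stop improving once the informative features are included and possibly to rise again as noise features are added, but the exact shape depends on the seed and on the assumptions above.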

The feature curve concept can add a lot of value to our human life as well. It allows us to assess our skills and resources on our lifelong learning journey. When learning new concepts, we connect with new people, read new books and engage in new activities. By assessing how much each of those resources contributes to our judgement and learning, we are effectively reducing the dimensionality of our overall resource consumption. This focuses us on the resources that add value and helps us to avoid the curse of dimensionality in our life, allowing us to make sense of the world around us with the limited resources we possess.

The ability to learn is key to humans, animals and, more recently, machines. In our roller-coaster lives and over-systemised routines, we almost forget that learning is a basic human capability that we need to teach machines. To instruct machines how to learn, we have to truly understand our own learning processes. We should leverage this understanding to create more space for learning new concepts, expanding our social circle and thinking beyond our daily roller-coaster.

[1] David Tax, PR Laboratory

[2] Feature image: http://www.shutterstock.com/