Brief report on “Getting Better from Worse: Augmented Bagging and a Cautionary Tale of Variable Importance” by Lucas Mentch

Isha Sharma
3 min read, Nov 27, 2021


Image source: Morgan Housel, Unsplash

Over the past few years, a significant number of algorithms have emerged that have revolutionized Artificial Intelligence.

Artificial Intelligence algorithms are enabling machines to think like humans and perform complex cognitive tasks ranging from perception to language.

We come across these machines every day, and they keep learning continuously, largely thanks to the evolution of Deep Learning, a subset of AI.

Deep Learning enables machines to learn from experience through training on specific data, so they can perform tasks on their own during unforeseen events.

Image source: Specialization of AI algorithms, IBM

As data has grown significantly, scientists have come to rely on “black-box” learning algorithms. Analyzing huge data sets with learning algorithms can result in “black-box” models, in which you can’t examine how the algorithm is achieving what it’s achieving.

Models like Random Forests have helped significantly in analyzing the relationships among different variables in a huge data set.

Random forest is a supervised learning algorithm that builds a forest by assembling many decision trees, and it is trained with the bagging method. Bagging rests on the idea that combining various learning models can improve the overall result. This particular research was about adding extra noise variables to improve the accuracy of the model.
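To make the bagging idea concrete, here is a minimal sketch in plain Python. It is not the paper's code or any library's implementation: the helper names `fit_stump` and `bagged_stumps` are made up for illustration, and a one-split "stump" stands in for a full decision tree to keep things short. Each stump is trained on a bootstrap sample (rows drawn with replacement), and the ensemble prediction is the average over all stumps.

```python
import random
from statistics import mean


def fit_stump(X, y):
    """Fit a one-split regression tree (a stump) by minimizing squared error."""
    best = None  # (error, feature index, threshold, left mean, right mean)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds sit halfway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            lm, rm = mean(left), mean(right)
            err = sum((yi - lm) ** 2 for yi in left) + sum((yi - rm) ** 2 for yi in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    if best is None:  # no split possible (all rows identical): predict the mean
        overall = mean(y)
        return lambda row: overall
    _, j, t, lm, rm = best
    return lambda row: lm if row[j] <= t else rm


def bagged_stumps(X, y, n_estimators=25, seed=0):
    """Bagging: fit one stump per bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: mean(m(row) for m in models)


# Toy data: the response jumps from 0 to 1 between x = 2 and x = 3.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
predict = bagged_stumps(X, y)
```

A random forest goes one step further than this sketch: besides resampling rows, it also considers only a random subset of features at each split.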

It was fascinating to learn that including extra noise variables in the model during training could lead to a significant improvement in accuracy, more so than that of an optimally tuned traditional random forest.

Noise variables are additional, unimportant features that are not needed for training the model. However, by adding these noise features under the idea of Augmented Bagging (AugBagg), we can obtain better results and performance, even better than that of an optimally tuned traditional random forest.
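The augmentation step itself is simple to sketch. The snippet below is only an illustration under my own assumptions, not the authors' implementation: the function name `augment_with_noise` is hypothetical, and both the standard normal noise and the number of noise columns are arbitrary choices here. It appends randomly generated columns, which are independent of the response, to every row of the feature matrix; a bagged ensemble would then be trained on the widened matrix.

```python
import random


def augment_with_noise(X, n_noise, seed=0):
    """Return a copy of X with n_noise extra columns of independent N(0, 1) draws.

    The added columns carry no information about the response; they only
    enlarge the feature space the ensemble is trained on.
    """
    rng = random.Random(seed)
    return [list(row) + [rng.gauss(0.0, 1.0) for _ in range(n_noise)] for row in X]


# Two original features per row, plus three pure-noise features.
X = [[1.0, 2.0], [3.0, 4.0]]
X_aug = augment_with_noise(X, n_noise=3)
```

Each row of `X_aug` keeps its original two values and gains three noise values, and `X` itself is left unchanged. How many noise features to add is a tuning choice that this sketch does not address.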

Various deep learning models can be useful for performing many cognitive tasks, whether it be logic and reasoning, auditory and visual processing, etc. Deep learning is a way through which we can dig deeper into how cognition works and answer some of the burning questions about cognitive science in general.

Deep Convolutional Neural Networks (CNNs) are by far the models most similar to patterns of activation in the brain and the nervous system.

Image source: CNN in action, Analytics Vidhya

The cognitive computing system known as the STEM model (senses (S), thoughts (T), experiences (E), and memory (M)) has been proposed as a big data classification problem, in which we would need “black box” models like random forests to make accurate decisions at the time they are needed and to enhance continuous learning. With the emergence of novel and robust algorithms, we might be able to answer more questions and create better AI that performs cognitive tasks better.

p.s. this brief report was written for my Cognitive Science course at Cornell University.


Mentch, Lucas, and Zhou, Siyu. “Getting Better from Worse: Augmented Bagging and a Cautionary Tale of Variable Importance”, Cornell University Library Database.


