The Importance of Domain Knowledge

Domain knowledge is essential for most data analysis projects, from problem definition to results interpretation.


Domain knowledge helps in (i) understanding the data from multiple perspectives, (ii) providing a refined set of features that can improve accuracy, and (iii) improving model interpretability and interaction with humans. It also plays an essential role in data preprocessing, where it informs decisions such as which values are implausible and how missing entries should be handled.
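As a small illustration of point (ii), the sketch below derives a body mass index (BMI) feature from raw height and weight. The records, field names, and the `add_bmi` helper are hypothetical and chosen only for illustration; the point is that a domain expert knows the ratio matters more than either raw column on its own.

```python
# Hypothetical patient records; field names and values are made up for illustration.
records = [
    {"height_cm": 170.0, "weight_kg": 65.0},
    {"height_cm": 160.0, "weight_kg": 82.0},
]


def add_bmi(record):
    """Return a copy of the record with a domain-derived BMI feature added."""
    height_m = record["height_cm"] / 100.0
    # BMI is weight in kilograms divided by the square of height in metres.
    bmi = record["weight_kg"] / (height_m ** 2)
    return {**record, "bmi": bmi}


features = [add_bmi(r) for r in records]
for row in features:
    print(row)
```

A model could in principle learn this relationship from the raw columns, but supplying it directly usually means it needs less data to do so.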

It is also often impossible to train algorithms directly in the real world because of ethical and logistical constraints, so they are trained in simulation instead. Simulations, however, are never perfect and always involve trade-offs. So the billion-dollar question is: what should we include in our simulation? This is where domain knowledge comes to the rescue.

Despite all of the positive examples above, there is one notable counter-example: AlphaGo Zero. The original AlphaGo relied heavily on human expertise: it was first trained on games played by human experts before being refined through self-play. In contrast, AlphaGo Zero was trained solely by self-play reinforcement learning combined with Monte Carlo Tree Search, starting from nothing but the rules of the game, and it defeated an earlier version of AlphaGo 100-0. In other words, a model trained from scratch without human domain knowledge handily beat the model that used it. During self-play, AlphaGo Zero rediscovered moves known to human experts, effectively becoming a “domain expert” over the course of training.

However, this was possible only because the rules of Go are well-defined and the environment is deterministic. In real-world applications, this is rarely the case.

In most real-world scenarios, rules, objectives, and boundaries are ambiguous, and the data is noisy. In high-stakes fields such as healthcare, relying on an algorithm alone to make decisions is insufficient and can have severe consequences.