3 data lessons from poor presidential election forecasts

In the weeks leading up to the 2016 presidential election, numerous polls projected Democratic candidate Hillary Clinton as the victor. By Tuesday evening, however, it was clear Republican Donald Trump had secured his place as president-elect.

Analytics experts say narrow datasets, faulty algorithms and human fallibility produced the flawed prediction models. While pollsters retrace their steps to identify what led to their incorrect projections, business leaders can apply the same lessons to reduce forecasting errors in their own predictive data analytics, according to the Wall Street Journal.

Here are three mistakes to avoid:

1. Incomplete samples. While some pollsters may collect large swaths of data, misleading predictions often arise when unknown or uncollected data from those "on the fringes" is not accounted for, according to Thomas H. Davenport, an information technology and management professor at Wellesley, Mass.-based Babson College, who spoke with the Wall Street Journal.

In the presidential election, the effect of this unaccounted-for data showed up when Michigan and Wisconsin were called for the Republican candidate. Many polls had not predicted this outcome, since neither state had backed a Republican presidential candidate since the 1980s.
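The effect of an unrepresentative sample can be sketched in a few lines of Python. Everything below is a hypothetical illustration: the group names, population shares and response counts are invented, and weighting by known population shares is one common correction rather than a method the article attributes to any pollster.

```python
# Hypothetical numbers for illustration only; not real polling data.

# Assumed share of each group in the full population (sums to 1.0).
population_share = {"urban": 0.5, "suburban": 0.3, "rural": 0.2}

# Per group: (number of respondents, respondents supporting candidate A).
# Rural respondents are badly under-sampled relative to their 20% share.
sample = {"urban": (600, 360), "suburban": (350, 175), "rural": (50, 15)}


def raw_estimate(sample):
    """Support for candidate A computed directly from the pooled sample."""
    total = sum(n for n, _ in sample.values())
    support = sum(s for _, s in sample.values())
    return support / total


def weighted_estimate(sample, population_share):
    """Support for candidate A after re-weighting each group to its population share."""
    return sum(population_share[g] * (s / n) for g, (n, s) in sample.items())


print(f"raw:      {raw_estimate(sample):.2f}")
print(f"weighted: {weighted_estimate(sample, population_share):.2f}")
```

With these made-up figures the raw sample overstates support for candidate A because the under-sampled group leans the other way. Weighting only helps when at least some responses from each group were collected; it cannot recover a group that is missing from the data entirely.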

2. Narrow datasets. Many election polls relied on asking which candidate a respondent planned to vote for; this method of data collection is vulnerable, since people may answer dishonestly or may change their minds closer to the election date. Some polls worked around this issue by asking participants to estimate the probability of voting for each of the candidates, to provide a more expansive dataset.
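To see why probability-style answers can carry more information than a single stated choice, consider the minimal sketch below. The five responses are invented for illustration, and averaging the stated probabilities is a simplifying assumption, not a description of how any particular poll aggregated its data.

```python
# Hypothetical survey responses: each respondent states the probability
# of voting for candidate A or candidate B. Invented for illustration.
responses = [
    {"A": 0.90, "B": 0.10},
    {"A": 0.55, "B": 0.45},
    {"A": 0.60, "B": 0.40},
    {"A": 0.20, "B": 0.80},
    {"A": 0.10, "B": 0.90},
]


def top_choice_share(responses, candidate):
    """Share of respondents whose single most likely choice is `candidate`."""
    picks = [max(r, key=r.get) for r in responses]
    return picks.count(candidate) / len(responses)


def expected_share(responses, candidate):
    """Average stated probability of voting for `candidate`."""
    return sum(r[candidate] for r in responses) / len(responses)


print(f"top-choice share for A: {top_choice_share(responses, 'A'):.2f}")
print(f"expected share for A:   {expected_share(responses, 'A'):.2f}")
```

In this toy data a which-candidate question puts A ahead, while the averaged probabilities put A behind, because A's supporters are less certain than B's. The single-choice tally discards that uncertainty.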

To address the issue of dishonest survey responses in industries other than politics, the Wall Street Journal notes that some companies have begun using geolocation technology and sensors. By collecting data from mobile apps and customer loyalty programs, rather than relying on simple polls, businesses can bridge the gap between "what people say and what they do."

3. Faulty algorithms. In addition to accurate and generalizable data, predictive models require well-designed algorithms, many of which rely on historical data to identify patterns.

However, this election played out differently from those in recent history, in part due to "high emotional rhetoric and relatively less time spent comparing policy positions," according to the Wall Street Journal. Because the election did not follow historical patterns, prediction models built on those patterns proved misguided.
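The risk of leaning on historical patterns can be illustrated with a deliberately naive model. The sketch below is an assumption-laden toy: the state names and results are invented, and "predict whatever happened most often before" stands in for far more sophisticated forecasting algorithms.

```python
from collections import Counter

# Hypothetical past results per state, oldest to newest. Invented data.
history = {
    "state_1": ["blue", "blue", "blue", "blue"],
    "state_2": ["blue", "blue", "blue", "blue"],
    "state_3": ["red", "red", "red", "red"],
    "state_4": ["red", "blue", "red", "red"],
}

# Hypothetical outcome of a new election that breaks with the past.
new_cycle = {
    "state_1": "red",
    "state_2": "red",
    "state_3": "red",
    "state_4": "red",
}


def predict_from_history(past_results):
    """Naive model: predict the most frequent past outcome."""
    return Counter(past_results).most_common(1)[0][0]


def accuracy(history, actual):
    """Fraction of states where the history-based prediction matches the outcome."""
    hits = sum(predict_from_history(history[s]) == actual[s] for s in actual)
    return hits / len(actual)


print(f"accuracy on the new cycle: {accuracy(history, new_cycle):.2f}")
```

The model misses exactly the states whose behavior changed, which are the cases a forecast most needs to get right. A business analogue would be a demand model trained on past seasons that fails when customer behavior shifts.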