The first job of artificial intelligence should be to clean up science

One of the dreams of artificial intelligence is that if we take all of the world's research papers -- let's say in cancer research -- and put them into the machine, it will come up with novel approaches to finding cures.

I'm hopeful that will be the case.

But I'm fearful that the first job it will have to do is clean up the data. Given the accessibility of data, more and more people are taking a fresh look at academic research and finding troubling -- perhaps rampant -- signs of fraud.

Last week, a top cancer surgeon at Columbia University had his research papers retracted after an online sleuth found he was using the same images across different articles purporting to show different experiments.

The man who discovered the problems -- Sholto David -- has found problems in many papers, including dozens at a Harvard cancer center.

This is bad...really bad. We're talking about Columbia and Harvard here, two of the world's most respected institutions and in one of the highest-stakes fields possible. If we want to put all this data into an AI, we've poisoned the chalice.

So I suspect the first job of AI isn't going to be solving scientific problems; it will be sorting out how much fraud has been going on.

Worse yet, these papers were only caught because they included the raw data, or raw images. Most papers don't contain images, and the researchers provide only a summary of the data while the rest stays hidden. This is just the tip of the iceberg.

Most recently in the news was the resignation of Harvard President Claudine Gay following plagiarism allegations that Bill Ackman helped publicize. That was followed by Ackman's own wife -- a former MIT professor -- being accused of the same.

How deep does this rabbit hole go?

Remember Dan Ariely? He made waves in economics and sold millions of copies of his book Predictably Irrational. An influential study of his began with:

People sometimes cheat to advance their financial self-interests—at great costs to society.

That proved to be more profound than he envisioned. Much of his research was on honesty, including a study that used the number of miles a car-insurance company’s customers reported having driven. Well, it turns out that the data in that study had been faked.

Not even the honesty researchers are honest. And you don't have to spend much time in financial markets to see just how deep the scams run.

With the power of AI, we should be able to find out how big the problem is in academia. I would liken it to having a blood sample from every athlete ever and a machine that can detect any level of doping.

Sadly, I imagine many people in high places are trying to put an end to this kind of scrutiny. One analysis suggested that there are hundreds of thousands of bogus articles hidden in the scientific literature. Given how critical this research is, I would say that perhaps the most important scientific task in the coming years will be cleaning up the mess.