The Keys to Effective Data Science Projects – Part 7: Create and Train the Model

We’re in part seven of our series on the Keys to Effective Data Science Projects. This is the section most people picture when they think of “Data Science”. It’s where we take the question and the source data that has been turned into the proper Features (and potentially Labels), and select an algorithm or two to create a Model.

Let’s hold there for a moment: just what is a Machine Learning Model anyway? It’s actually not a trivial question, and it’s made more difficult by the way Data Science discussions often conflate the terms Algorithm and Model. Ali-Kazim Zaidi, a Data Scientist here on my team, defines it this way:

“Models form our hypothesis set of what generates or approximates a true target function / data generating system. All models are wrong, but some can be useful for inference. Models are what you define through your inductive bias of a data generating system. Algorithms are what you use to fit parameters or values to that model so that it resembles data you’ve observed. What you get in the end is a parameterization of a function that you use to do inference about an underlying system.”
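To make the distinction concrete, here is a minimal Python sketch under toy assumptions. The “model” is the hypothesis family we choose (a straight line, y = w·x + b), and the “algorithm” is the procedure that fits its parameters to observed data (here, closed-form ordinary least squares). The function names `linear_model` and `fit_least_squares` and the four data points are hypothetical, made up purely for illustration.

```python
# Model: the hypothesis family we assume (our inductive bias), here a straight line.
def linear_model(w, b):
    """Return a predictor for the hypothesis y = w * x + b."""
    return lambda x: w * x + b

# Algorithm: a procedure that picks parameter values so the model fits observed data.
# Here: ordinary least squares in closed form for a single feature.
def fit_least_squares(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b

# Hypothetical toy observations, generated by y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w, b = fit_least_squares(xs, ys)   # the algorithm produces a parameterization...
predict = linear_model(w, b)       # ...of the model, which we then use for inference
print(w, b, predict(5.0))
```

Swapping the algorithm (say, gradient descent instead of the closed-form solution) changes how we arrive at w and b, but not the model; swapping the model (say, a quadratic) changes the hypothesis we are betting on.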

At the end of the day, Models are what we build and operationalize. So what are the important things to remember about Modeling that will help your project succeed?

The first thing is to realize that Modeling is experimental. You don’t simply select some data, run it through an algorithm, and get a definite answer. You need to run lots of experiments: changing the features, tuning the parameters of the algorithms you choose, and perhaps even trying different algorithms. All the while, treat it as a true scientific test (Data Science, y’all) by changing only one thing at a time. As an aside, this is where the new Azure Machine Learning Services kind of shines: you can track the runs of your experiments and go back to one that was working well. But I digress.
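A minimal sketch of the “one change at a time” discipline in plain Python, assuming a toy setup. The hypothetical `train` function fits a one-feature ridge regression through the origin, and `one_factor_sweep` copies a baseline configuration, varies a single setting per run, and logs the validation error of each run. This only illustrates the idea; it is not the Azure Machine Learning Services API, which provides its own run tracking.

```python
import copy

def train(config, xs, ys):
    """Fit w for y ≈ w * (scale * x) with ridge regression through the origin."""
    s, lam = config["scale"], config["lam"]
    zs = [s * x for x in xs]
    w = sum(z * y for z, y in zip(zs, ys)) / (sum(z * z for z in zs) + lam)
    return lambda x: w * s * x  # the predictor applies the same scaling

def mse(predict, xs, ys):
    """Mean squared error of a predictor on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def one_factor_sweep(baseline, param, values, train_data, valid_data):
    """Run one experiment per value, changing only `param` from the baseline."""
    runs = []
    for v in values:
        config = copy.deepcopy(baseline)  # never mutate the baseline itself
        config[param] = v                 # the single setting under test
        predict = train(config, *train_data)
        runs.append({"config": config, "valid_mse": mse(predict, *valid_data)})
    return runs

# Hypothetical toy data drawn from y = 2x.
train_data = ([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
valid_data = ([4.0], [8.0])
baseline = {"scale": 1.0, "lam": 1.0}

runs = one_factor_sweep(baseline, "lam", [0.0, 1.0, 10.0], train_data, valid_data)
for run in runs:
    print(run["config"], round(run["valid_mse"], 4))
```

Because each run differs from the baseline in exactly one setting, any change in validation error can be attributed to that setting rather than to some mix of changes.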

The next thing that is important for a successful project is to ensure everyone on the team, and especially the project’s stakeholders, understands that all ML is at its essence a guess. A calculated, really good (hopefully) guess, but a guess. Most people who work on IT projects are used to VERY deterministic outcomes: something either is or it isn’t. But Machine Learning relies on statistics, specifically on probabilities, so at best you’re getting a very good guess. But a guess. More than once I have seen an incredulous manager come into a meeting with a fistful of charts, saying “but the smart-phone thingie TOLD me this would work!” No, the smart-phone thingie told you it MIGHT work. Probably. Mostly.

This is where DevOps practices can really save the day. Communication between all the teams would have helped to avoid this misunderstanding. But that’s an article for another time.
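To make the “it’s a guess” point concrete: a typical classifier doesn’t return a yes or a no, it returns a probability. The minimal logistic-regression sketch below uses hypothetical, already-fitted weights and made-up feature values; the point is only that the raw output is a number between 0 and 1, not a promise.

```python
import math

def predict_proba(features, weights, bias):
    """Logistic regression: a linear score squashed into (0, 1) by the sigmoid."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical parameters and input, for illustration only.
weights, bias = [0.8, -0.4], -0.2
p = predict_proba([2.0, 1.0], weights, bias)

print(f"Estimated probability of the positive outcome: {p:.2f}")
# Turning the probability into a label discards the uncertainty.
print("Predicted label:", "yes" if p >= 0.5 else "no")
```

Here the model reports about 0.73. Even if the model is well calibrated, that means roughly a quarter of cases like this one would still go the other way; the “yes” label hides exactly the uncertainty the manager in the story above was missing.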

In our next installment, I’ll give you another key. See you then.
