
THE 10 COMMANDMENTS OF A DATA SCIENCE PROJECT


Let’s get straight to the point:

1. Post-analysis: There is no point in just communicating a number. You have to analyse the numbers your algorithm produces, tie them back to the domain, and work with the domain experts on what the results mean.

For example: What prices is your algorithm recommending? What does the fitted curve for these products look like? The algorithm recommends something; does it make sense? Do any of the results raise eyebrows? (That can be a good thing, if you have discovered something new, or it could mean you need to recheck your analysis.) Can you, as a human, immediately put a finger on the graph where the price range should be?

How have you verified your results? Do you know which products your algorithm can recommend prices for (fish? canned goods? wine? chocolate?)? Do its recommendations intuitively make sense for those products?
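One way to make the verification step concrete is to check every recommended price against a plausible range agreed with the domain experts and surface anything outside it for review. The sketch below is only an illustration under stated assumptions: recommendations arrive as simple (product, category, price) records, the expert ranges are a per-category dictionary, and the names `EXPERT_RANGES`, `Recommendation` and `flag_suspicious_prices`, as well as all the numbers, are hypothetical rather than taken from the article.

```python
from dataclasses import dataclass

# Hypothetical price bands per category, as agreed with domain experts.
# The numbers are illustrative only.
EXPERT_RANGES = {
    "fish": (5.0, 40.0),
    "canned goods": (0.5, 6.0),
    "wine": (4.0, 150.0),
    "chocolate": (1.0, 15.0),
}


@dataclass
class Recommendation:
    product: str
    category: str
    price: float


def flag_suspicious_prices(recommendations, ranges=EXPERT_RANGES):
    """Return (recommendation, reason) pairs that need a human look.

    A recommendation is flagged if its category has no agreed range
    (the algorithm may not really cover that product type) or if its
    price falls outside the agreed range (inclusive bounds).
    """
    flagged = []
    for rec in recommendations:
        band = ranges.get(rec.category)
        if band is None:
            flagged.append((rec, "no expert range for this category"))
            continue
        low, high = band
        if not low <= rec.price <= high:
            flagged.append((rec, f"outside expected range {low}-{high}"))
    return flagged


if __name__ == "__main__":
    recs = [
        Recommendation("salmon fillet", "fish", 12.5),
        Recommendation("tinned tomatoes", "canned goods", 9.0),
        Recommendation("dark chocolate bar", "chocolate", 2.5),
        Recommendation("olive oil", "oils", 8.0),
    ]
    for rec, reason in flag_suspicious_prices(recs):
        print(f"{rec.product}: {rec.price} ({reason})")
```

A flagged price is not necessarily wrong; it may be a genuine discovery. It is, however, exactly the kind of result worth walking through with the domain experts before it is communicated as a number.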


CLEAN CODE FOR A DATA SCIENTIST (?)

“I’m a data scientist. I don’t need to write clean code, because most of my code is throwaway anyway.” “Clean code and agile are good for developing software. They don’t make sense in my work.” I am baffled by how often I have heard statements like these, and by the reluctance to even try some of the suggestions on clean code.

Well, let me tell you: you don’t need to write clean code for software development either. You don’t need to practise agile for software development either. One can build perfectly working software without either of them (maintaining, modifying or scaling it will get difficult, but that’s not the focus of this article). Where you do need clean code practices is when you are working in a TEAM, irrespective of whether you are developing software, building an algorithm or trying out multiple algorithms.

The basic idea of clean code is that your fellow team members should be able to understand what you have written. This is especially important in data science. As a scientist, your experiments must be reproducible and verifiable. That means others on your team should be able to understand and reproduce your results.
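To show what "reproducible" can mean at the level of a single function, here is a minimal sketch of a train/test split written so a teammate can rerun it and get exactly the same result: every setting lives in one explicit config object, randomness comes from a dedicated generator seeded from that config rather than from hidden global state, and the names say what things are. It uses only the Python standard library; `SplitConfig`, `split_train_test` and the example data are hypothetical illustrations, not the author's code.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitConfig:
    """Every setting that affects the split, stated explicitly."""
    test_fraction: float = 0.2
    seed: int = 42  # fixed seed, so reruns produce the same split


def split_train_test(rows, config):
    """Shuffle a copy of `rows` and split it into (train, test).

    A dedicated random.Random instance is seeded from the config instead of
    using the module-level generator, so other code calling random.* cannot
    change the result. The input sequence itself is left untouched.
    """
    rng = random.Random(config.seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * config.test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    customer_ids = list(range(10))
    config = SplitConfig(test_fraction=0.5, seed=7)
    train, test = split_train_test(customer_ids, config)
    # A teammate rerunning with the same config gets exactly the same split.
    assert (train, test) == split_train_test(customer_ids, config)
    print("train:", train)
    print("test: ", test)
```

The frozen dataclass is a design choice rather than a requirement: it keeps the experiment's settings in one place that can be logged or committed alongside the result, which is what lets someone else verify it.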

We exist in a team. It is impossible to be a data scientist by yourself. Most of the time in industry, you will be working in applied science. This means you have to understand someone else’s problem, and they (your team, the business folks) in turn need to understand your solution.
