Thoughts on Data Science, ML and Startups

  • Want to join a startup? Here is how to choose.

    Nowadays, startups are a popular destination for data scientists (and tech people in general) at every level. But which startup is right for you? How do you choose one when the term covers very different companies? I thought about this while hiring teammates myself, and here are my conclusions.

  • Book Notes: Designing Data-Intensive Applications

    Notes from Designing Data-Intensive Applications.

  • Machine Learning Design Patterns: Problem Representation Part 2

    In the first part of this Problem Representation series, we saw that representing a seemingly regression problem as a classification problem can increase performance. We also noticed that constructing the label in a specific way can increase performance further, but the results were still not great: we achieved only about 30% correlation with the correct label. To improve our predictions, we will first remove unpopular tracks from the dataset and only then predict popularity. For this, we will try two design patterns: Rebalancing and Ensembles.
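    To make the Rebalancing and Ensembles idea more concrete, here is a minimal, standard-library-only sketch. It is not the code from the post. It assumes a single hypothetical numeric feature per track and a binary label where 1 means "unpopular". `oversample` rebalances the classes by randomly duplicating minority-class examples. `fit_bagged_stumps` builds an ensemble of one-feature threshold classifiers (decision stumps), each trained on a bootstrap sample of the rebalanced data and combined by majority vote. All function names and the toy data are illustrative assumptions.

```python
import random
from collections import Counter


def oversample(xs, ys, seed=0):
    """Rebalancing by random upsampling: duplicate minority-class examples
    (chosen at random) until every class is as frequent as the largest one."""
    rng = random.Random(seed)
    target = max(Counter(ys).values())
    out_x, out_y = list(xs), list(ys)
    for label, count in Counter(ys).items():
        members = [x for x, y in zip(xs, ys) if y == label]
        for _ in range(target - count):
            out_x.append(rng.choice(members))
            out_y.append(label)
    return out_x, out_y


def fit_stump(xs, ys):
    """Fit a one-feature threshold classifier (hypothetical helper).

    Returns (threshold, above): predict 1 when (x > threshold) == above.
    The first (threshold, above) pair with the highest training accuracy wins."""
    best_t, best_above, best_acc = None, True, -1
    for t in sorted(set(xs)):
        for above in (True, False):
            acc = sum(int((x > t) == above) == y for x, y in zip(xs, ys))
            if acc > best_acc:
                best_t, best_above, best_acc = t, above, acc
    return best_t, best_above


def fit_bagged_stumps(xs, ys, n_models=15, seed=0):
    """Ensemble (bagging): train each stump on a bootstrap sample
    of the rebalanced data. An odd n_models avoids tied votes."""
    rng = random.Random(seed)
    bx, by = oversample(xs, ys, seed=seed)
    n = len(bx)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([bx[i] for i in idx], [by[i] for i in idx]))
    return models


def predict(models, x):
    """Majority vote over the stumps; returns 1 (unpopular) or 0."""
    votes = sum(int((x > t) == above) for t, above in models)
    return int(2 * votes > len(models))


if __name__ == "__main__":
    # Toy data (hypothetical): one feature value per track, label 1 = unpopular.
    xs = [1, 2, 3, 50, 60, 70, 80, 90, 100, 110]
    ys = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    models = fit_bagged_stumps(xs, ys)
    # Keep only tracks the ensemble does not flag as unpopular,
    # before handing them to a (separate) popularity model.
    candidates = [1, 40, 120]
    kept = [x for x in candidates if predict(models, x) == 0]
    print(kept)
```

    Random upsampling is only one way to rebalance. Downsampling the majority class or weighting classes in the loss are common alternatives. Bagging is likewise just one ensemble method, chosen here because it is easy to write without external libraries.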

  • Machine Learning Design Patterns: Problem Representation Part 1

    In my previous post, I discussed the data representation patterns presented in Machine Learning Design Patterns by V. Lakshmanan, S. Robinson & M. Munn. In this post, I would like to talk about the next topic in that book: problem representation design patterns.

  • Machine Learning Design Patterns: Data Representation

    Design patterns are a set of best practices and solutions to common problems. Machine learning engineers, like engineers in other disciplines, can benefit immensely from following such idioms. In this and the following posts, I will discuss the ML patterns outlined in Machine Learning Design Patterns by V. Lakshmanan, S. Robinson & M. Munn. Let us start with Data Representation Patterns.

  • 2020 in Books

    2020 was not easy by any measure. COVID-19, adapting to parenthood, the rollercoaster of working at an early-stage startup: it was a memorable year (though I might not remember much of it because of sleep deprivation). Still, I managed to read a few books, and several of them are worth reflecting on. Here is an overview; I will share more details on a few of them in later posts.

  • A Case For Agile Data Science

    I have encountered a lot of resistance in the data science community to agile methodology, and to the scrum framework in particular. I see it differently and argue that most disciplines would improve by adopting an agile mindset. We will go through a typical scrum sprint to highlight how compatible the data science process is with the agile development process. Finally, we will discuss when scrum is not an appropriate process to follow: for example, if you are a consultant working on many projects at a time, or if your work requires deep concentration on a single, narrow issue (narrow enough that you alone can solve it).