Thoughts on Data Science, ML and Startups

Want to join a startup? Here is how to choose.

Joining a startup is exciting. You join a company that against the odds pushes to bring value to their customers. Or you join founders and together discover the solution to the overlooked customer pain point. For a data scientist, joining a startup also means that you will have freedom in delivering value without all the baggage that comes with the corporate environment. You will also have a more significant say in what and how to build.

However, the same things that make joining a startup exciting also make this decision difficult. Not all startups are equal; primarily, they differ in their maturity as a business. This has significant implications for the joiners and what they can expect. Unless you are joining a Growth stage startup (we will define these terms shortly), there is little information about the company, its product, and the people working on the idea. For Series A stage companies, you will probably have some PR news circulating, but you know better than to take it at face value. Because there is no long list of former employees from whom you could gather feedback, you will have to make your decision based on conversations with the hiring manager and founders. As startup success is very much dependent on the founders' and first joiners' ability to figure out what pain point to solve and how to solve it, you will have to make a judgment call regarding those individuals. This is not a trivial decision.

All in all, joining a startup is more like an investment decision than an employment decision. The main reason is that your success will depend heavily on whether you picked a winner. This is especially true for Early Venture and Series A stage startups. You will also be asked to take on a lot more responsibility in this setting than in your median corporate job and to stomach a pivot or two before declaring the venture a success (or failure). Therefore it is imperative for you to trust the founders and to have their trust too.

Apparently, this line of thinking is not unique to me (surprise, surprise). When I shared my thoughts with my brilliant colleague Živilė, she pointed me to the Jeffrey Bussgang video on his book Entering Startupland, where the author expressed very similar ideas to my own. I encourage you to watch that video as well if you are thinking about joining a startup.

Disclaimer: I have worked for big banks at the beginning of my professional career (roles that were not necessarily related to data science), and I have worked in several startups since. As things stand, I wouldn't consider joining a corporation, at least not one with "the usual" approach to work. So take my advice with this in mind.

The Venture Ladder

First, let us define a venture ladder - the segmentation of startups according to their maturity. I have adopted the names for the phases from the post by Gil Dibner, but they echo nicely the stages described by Jeffrey Bussgang in the video mentioned above - Jungle, Dirt Road, Highway. The ladder captures the company's path nicely - finding the product-market fit, proving that this product/business model scales, and building an efficient organization for high growth.

Phase: Early Venture

Early Venture or Jungle phase is the most dynamic and uncertain. There is only an idea of the business at this stage, and all effort is directed into finding that elusive product-market fit. As described in The Lean Product Playbook, product-market fit is found when

...you have built a product that creates significant customer value. This means that your product meets real customer needs and does so in a better way than the alternatives.

As you would expect, this phase is, more often than not, full of trial and error, as finding a genuine problem that customers would be willing to pay to have solved is a challenge. Even if you identify the real pain point, solving it in a way that is better than the alternatives is not straightforward. This means that the uncertainty level is high at this stage. Everyone on the team should be hyper-focused on finding the best solution possible for the present iteration of the business problem hypothesis.

Phase: Series A (also B and C)

Series A or Dirt Road phase is pretty much defined by the name in Gil Dibner's post, and it makes sense from the investor's perspective. This stage means that the company has a working, proven product that customers are happy to pay for. So this stage is about scaling: most importantly, scaling the sales team to fuel revenue growth, but also scaling product efforts to keep up with the new users and cleaning up any technical debt accumulated during the early venture stage. From the potential joiner's perspective, this stage takes longer than the 18 to 24 months a Series A round usually lasts. Jeffrey Bussgang's definition includes an approximate size of the company - 50 to 500 people - and it takes some time to scale your workforce ten times. For our discussion purposes, it makes sense to include the other early rounds (B, C). Companies raising these rounds are still solving scaling pains and cleaning up their technical debt while delivering value to their new and ever-increasing number of clients/customers. For joiners, it means a bit more clarity on what to build, but lots of hard work in making that happen in a scalable way. Team processes are not yet settled either, which makes this stage still a bit chaotic at times. Couple that with high pressure to move quickly, and you have an environment that is not for the faint of heart.

Phase: Growth

Growth or Highway phase is all about efficient growth. There is still an occasional bump in the road, and there are definitely pains felt because of high growth. Still, the product offering is worked out, and the technology is well designed and configured for scalability. Onboarding processes are efficient, and managers take good care of new joiners, which is their primary concern. Hiring processes are also a lot better, as companies need to add joiners very quickly [1]. There are processes, a team structure, and clarity of product and goals for the new data science joiners. There are still plenty of challenges to solve, but the high-level details are pretty fleshed out at this point.

Interaction between venture and data scientist levels

With the three phases of the company defined above, we can match them to data scientists according to their experience level. This comes from my personal experience, so if you disagree - I would definitely like to hear your thoughts!

| Company Phase | Junior Data Scientist | Data Scientist | Senior Data Scientist |
|---|---|---|---|
| Early Venture | -- | -/+ | ++ |
| Series A | -/+ | + | + |
| Growth | ++ | + | + |

Let us go through the different experience levels and consider the options.

Junior Data Scientist

As a junior data scientist, you are probably fresh out of university or already have one or two years of experience working as a data scientist. This means that you have the technical skills to conduct an exploratory data analysis and build a model, as well as a good grasp of different ML concepts and the statistics/mathematics underpinning those concepts. Still, you lack experience with varying types of datasets and business problems. You need guidance on the approach to take or the angle from which to look at a particular situation. Depending on your background, you might lack the business understanding to put your work into context. Or you might lack practical software engineering skills that can only be built by working on a product - e.g., being part of a development team, planning small increments, delivering insights/code consistently. Or your data science communication might need improvement - it is challenging to communicate technical concepts to a non-technical audience. Data science adds to this complexity because you need to share probabilistic results a lot of the time, and those are challenging for a business owner to interpret. Data science is a complex, interdisciplinary area, and developing all the necessary components takes time and dedication.

As you can see from the table above, I wouldn't recommend that junior data scientists join early-stage ventures, for two reasons. First, there is still little clarity about what to build. This might sound exciting, but it will be frustrating for someone with little to no previous data science experience. No one can tell you how accurate your models have to be for them to be helpful. No one can tell you how to interpret the data because, most of the time, it has not been done before. Second, since the experienced colleagues (if there are any) are swamped with solving open-ended and complex problems, they will have little time to spare for support. Therefore, you will be left to your own devices for way too long. You might think that this is not so bad - you would be able to tinker with algorithms and "gain experience" that way. I would caution against this line of thinking - the experience of tinkering with models is OK, but it cannot compare to the experience of solving business problems in a meaningful way with those algorithms. With proper support from more experienced colleagues, you will progress a lot faster while benefiting from the best practices your senior colleagues share with you.

Advice: Join a startup in a Growth stage. You still need to do due diligence on the team, on who is leading data science, and so on. If data science is an essential part of the product, there will be good people in senior positions at this stage. They will be motivated to help you succeed, and, most importantly, it will be their primary focus. At this stage, the company has figured out its main offering; the data science team is mature and able to onboard new and junior members efficiently, so you can concentrate on solving well-defined data science problems. You will be pushed out of your comfort zone from time to time, but that is one of the main reasons to join a startup in the first place.

A legitimate question here is - maybe I will be better off joining a corporation for a couple of years? The answer depends on your personality and aspirations. If you think you are the kind of person who would enjoy a slower pace, very well-defined roles, and the security that comes from knowing that the company will not go bust for decades, then, by all means, look for a corporate job. However, if you desire to work in a more dynamic environment, I would suggest going there from the start. The technical skills will be transferable from corporate to startup (though corporations usually have many legacy systems, knowledge of which will be useless elsewhere). Still, the culture, work processes, and the level of responsibility and ownership are very different.

What if you are already in an Early Venture or Series A type of startup? Well, if you are in one of these companies and experience some of the pains described above (or any others), there are things you can do to increase your productivity and speed up your learning.

If you have joined someone experienced but find that they cannot dedicate enough time - be more proactive. If you think that you need a consultation - ping your lead at any time or suggest regular check-ins where you can address any questions you may have. I would bet that your lead knows and feels terrible about the situation but is swamped with other work and may just get lost in it too often. By being proactive, you will not only benefit from guidance but will also keep the project on track. This will also help your lead realize how much time they need to dedicate to you for you to be effective, and to devise ways to find that time.

If you are the only data scientist in the company, it is a bit more tricky. I know people who managed to get through on sheer work ethic, but it is hard, and it leaves "scars." The primary tool to utilize is to step back from the problems you are stuck on. The reason check-ins with senior data scientists work is that they give you a different perspective/angle from which to look at your situation. Or they give more business background and reasoning about why one approach is more suitable. You will have to simulate that help. When you realize you are going in circles - stop. Stop trying to solve the main challenge; take stock of what has been done. If you see no particular pattern, it might be worth going back to the business problem. Are you solving the right one? Maybe you can modify your approach to tackle a different but highly correlated problem that would be easier to solve algorithmically? Having a mentor or a "support group" would be great in this situation too. If you can, try to find someone senior that you could talk to from time to time, or a group of other data scientists who could be your sounding board - the mere fact of explaining the problem to other people might trigger the enlightenment.

Data Scientist

As you gather experience, you will be more comfortable taking on familiar problems or problems from a domain you have already worked in and delivering value. Your technical skills will be more refined, and you will have insights gathered from previous experience on what works and what does not, which will make you more efficient. Above all else, you will be more comfortable with vague business requirements and will be able to figure out how to map them to concrete tasks and technical metrics. You will also be comfortable delegating more straightforward tasks to more junior team members, mentoring, or giving them feedback.

This is a tricky position to give advice for. I would say that it is still too early to join an early-stage startup. You really need to be an all-around data scientist or have worked on quite a few projects to bridge the gaps on your own where you need to. If you find an opportunity with an outstanding lead data scientist in a domain that you are comfortable in, give it a shot. Just be aware that it can get quite uncomfortable from time to time, as ambiguity increases when product metrics do not behave the way you expected and you need to go back to the drawing board. Or your company may need to pivot to a domain that you have little or no experience with.

Advice: I would recommend joining at least a Series A maturity startup. Having a validated product gives ventures at this stage a bit more stability and a core offering that you will be helping to perfect. At the same time, there will be plenty of learnings along the way you will benefit from. Some learnings will be technical and data science specific, but some will be more abstract, about value creation through data science. It will also offer a proper balance between existing data science culture and the ability to influence and build that culture further as teams are still relatively small at this stage.

Joining a growth stage startup is also a good option. Especially if you feel that one of the areas (business, statistics, software engineering) needs a more significant improvement. At companies in this stage, there is a lot more clarity on what to build, so a lot more focus is on building it. And the team already in place will help you bridge any gaps faster.

What if you decided to take a leap, joined an early-stage startup, and feel that maybe it was too early? Since you are already in it - take this chance to soak in learning how to help clients through data science. This is an invaluable experience. As a data scientist with experience, you have already solved business problems, or parts of them, before. You have the most essential technical (and other) skills at your disposal, and you know how to structure a data science project. Therefore the main challenges might lie in the ambiguity of the goals or in working together with the team to accomplish them - you will have to wear more than one hat in an early-stage venture, and this might not be what you are used to. A strong support network would often help immensely - to discuss your ideas or "failures" or to brainstorm potential approaches to test. You should also invest time in learning the technical skills required to wear other hats (data engineer, data analyst, business analyst, etc.). I assume that everyone is constantly learning, so all that is needed is a re-prioritization of which skills to learn.

Senior Data Scientist

By the time you are a senior data scientist, you have worked on several projects, most probably in different domains. You can figure out other businesses quite quickly (what's essential and what's less so), and you can point to areas where data and algorithms can bring the most value. You are still very technical, but most of your focus is on making other team members productive: helping to set up the team's environment, processes, and tools, and helping to define and set the proper context for the problems to solve.

As you gather experience, a wide range of choices becomes available to you. Because you have already worked on different projects and in various sectors, you would add value wherever you decide to move. Therefore everything depends on the direction you want to take. If you would like to develop/improve your team management skills, a Growth stage startup would be best. If, on the other hand, you would like to experience full chaos and the emotional rollercoaster of finding what works, an early venture is your choice. The Series A stage venture falls somewhere in the middle of these two - with the core offering settled, there is still quite a lot of work on expanding the product, but the team is also growing steadily, and the need for processes to deliver value is genuine.

Advice: I suggest finding an early-stage startup. This might come with limitations - early-stage startups might not have pockets as deep as their more mature competitors. However, in my opinion, everyone should experience this chaotic and very creative stage. You will be challenged not only from a data science perspective - coming up with solutions to user problems that are better than the competition - but from all areas of the business. You will see firsthand how value is delivered to your customers and how value is created for the company. There is one major factor to consider when joining an early-stage startup - the founders. Since you are joining at an early stage, founders will be very much involved with every aspect of the company. You have to be acutely aware of that and reach a mutual agreement on how you will work together. Being an experienced specialist, you will want freedom of action, especially with all the opportunities to create from scratch. Problems may arise if the founders do not trust your judgment, but the worst situation would be if you were forced to do the founders' bidding against your judgment (we all know the name for this). Try to evaluate the chances of this happening and the steps you would take to improve the situation.

Another factor to consider is the engineering team and the influence you will have on the engineering decisions. Data science cannot deliver quick and sound results without acceptable engineering practices and well-designed data systems. If you cannot influence those decisions, you will have to work hard to work around them rather than concentrate on the best solution.

What if you joined an early venture and found yourself in the worst possible situation? The founder is forcing your hand when you know better, and the engineering team prioritizes projects and approaches other than the ones you think would benefit the company.

For one, this is an excellent opportunity for you to practice stakeholder management and communication skills. Since you do not have the formal authority, you will be forced to present your case and lobby for it with founders, engineers, or anyone else if you want something done the way you see it. Persuasion is a crucial skill in today's workplace - you will often have to convince your colleagues of your idea's merits, even if you have formal authority over them. The second helpful skill you will have the opportunity to practice is committing to a decision even without agreeing with it - disagree and commit. This is hard, for me at least, but essential for productive work. I find it difficult because, given the experience I have gathered, I can anticipate specific difficulties down the road for data science applications when selecting between a couple of alternative solutions. I weigh them mostly from the data science perspective and try to include engineering considerations to the best of my ability. However, other stakeholders can have very different takes on the problem. Maybe they think that it is OK to take on massive technical debt in this particular situation just to get to market faster. Or they weigh the engineering complexity vs. model performance differently and therefore impose a limitation that you don't like, eliminating your preferred approach. These situations should be used to practice communication skills - you need to be able to lay out your ideas understandably and persuasively - and the humility to learn from others.

Discussion

While expanding my team not long ago, I thought about what candidate would be the best ROI hire and would benefit the most from this opportunity. Above I have laid out my thoughts. When thinking about what organization to join, the general rule that I came up with is: figure out how much chaos and uncertainty you can handle in your domain and in general. On average, the more experience you gather, the more you can connect the dots and thrive in chaotic environments professionally - you can formulate goals and directions yourself, and you can see what needs to be done. However, chaos from the organizational standpoint might still cause you frustrations. If this is the case, no matter how experienced you are, early venture is probably not for you. However, the level of chaos even in early-stage ventures depends heavily on the founders' ability to run an efficient operation.

Looking from that perspective, if you are just starting out in your DS/ML journey - start with someone that already has learned their lessons and has efficient processes in place to help you learn faster. This will also make you more desirable down the road, as other companies will hire you to more senior positions because you can introduce best practices. As you progress in your career, I would suggest looking for opportunities to join an early-stage startup to experience the joys and pains of creation. The younger the company is, the more critical founders are - when choosing early-stage startups, you are effectively selecting founders.

The above categorization of the startup by stage is also not bullet-proof. Some organizations might be in their Growth stage but just starting out with the data science function. Therefore, those organizations should be treated more like Series A stage companies - the organization is relatively mature, but the data science function has not matured yet. For the data science-centric company, even at the Series A stage, the data science department might feel quite robust and experienced, so your experience would be like joining the Growth stage company.

All in all, finding a promising startup to join is not easy, and neither is working at one. There are numerous challenges to conquer, but the dynamism, ownership, learnings, and fast pace make it all worth it - more in a long-term, fulfillment kind of way and less so in terms of short-term happiness/joy. Good luck in your ventures!


[1] I have been through a hiring process with such a company, which lasted 2-3 weeks from the first HR call to the offer, including a flight to the Netherlands. I have never experienced that before or since.