Building a Data Science Team

As you may know, I regularly work for Data Science Retreat, a boot-camp-style program in which participants (with prior experience) are trained to become freshly minted data scientists. Over the course of three months they also build a (larger) portfolio project, which is presented on a demo day.

On the last demo day Ronert Obst gave a very well received keynote talk about building a data science team. Because the talk was not recorded, Ronert gave me the slides. Let me do a quick write-up on this.

New Yorker

Ronert works as head of data science for a large German fashion company, New Yorker [1], with revenue exceeding a billion euros. The challenge he had to solve was to build a DS [2] department from scratch in a conservative company in Germany. As you may know, we are pretty far behind tech-wise in Germany, so he also had to overcome several obstacles inside the company. To add to the complexity, the only office to begin with was in Braunschweig (Brunswick), a smaller town in Germany where, as an employer, you compete with the huge VW group.

But not everything worked to his disadvantage: if New Yorker's organisation were compared to a state, it would be an authoritarian dictatorship. The founder Friedrich Knapp, who would be the dictator, strongly supports the DS initiative. This empowers Ronert (and by proxy the team) to propose changes to business processes, which then actually get implemented at a fast pace. I don't think that this is a necessary condition for a successful DS operation, but it certainly shortens the cycle time. [3]

Hiring the right people

Ronert tried different channels for hiring:

Very important insight: people may look very good on paper, yet turn out to be incompetent. Let me add that competence has a subjective factor, i.e. how well the person fits into the team and the company.

The tech stack

Ronert claims that a mix of 2/3 DS to 1/3 DE is appropriate. I disagree: it depends on the domain you are working in. Ronert is great at DE himself, so he got the best tooling and setup/workflow in place. Of course, this shifts the balance in favour of DS over DE.

As such, this is closely related to the tech stack in place. New Yorker actually bought licenses from Hadoop vendors. I have seen many startups that wanted to save money on this, but let me put it this way: you then pay a lot in blood and sweat instead. Setting up a Hadoop distribution with proper monitoring and logging is harder than it sounds at first. Not to mention updating it.

Continuous Presentation

I'll let the slide speak ;)

[Slide: continuous]

What do DS expect?

Lastly, from an employer's point of view, he described what employees are looking for. Here is the slide:

[Slide: Expectations]

This sounds to me as if he asked, but only got semi-honest answers. I can tell you what I would want on top of the points listed:

And please stop overhyping Slack.


  1. It is not the newspaper. Funnily, not many people in Germany know or expect that New Yorker was actually founded in Germany (in 1971). For the last decade the company has also been expanding outside of Germany (and Europe!) at an interesting pace.

  2. DS = Data Science (or Scientist), DE = Data Engineering (or Engineer). 

  3. Cycle time is the time between starting a process and seeing its effect. For example, you come up with a machine learning experiment, then implement and run it. The cycle time is the time that has elapsed by the point you can evaluate the result.