5 Habits of Highly Effective Data Scientists

While COVID has negatively impacted many sectors, bringing the global economy to its knees, one sector has not only survived but thrived: Data Science. If anything, the current pandemic has only scaled up demand for data scientists, as the world’s leaders scramble to make sense of the exponentially expanding data streams generated by the pandemic. 

“These days the data scientist is king. But extracting true business value from data requires a unique combination of technical skills, mathematical know-how, storytelling, and intuition.” 1

Geoff Hinton

According to Gartner’s 2020 report on AI, 63% of the United States labor force either (i) has already transitioned to, or (ii) is actively transitioning toward, a career in data science. However, the same report shows that only 5% of this cohort eventually lands their dream job in Data Science.

We interviewed top executives in Big Data, Machine Learning, Deep Learning, and Artificial General Intelligence, and distilled their advice into these 5 tips to guarantee success in Data Science.2

BUILD YOUR BRAND

On the internet, you are your brand: the sum and average of your Twitter, GitHub, and other social profiles.

If a tree falls in a forest and nobody hears it, was the tree ever really there? If you {learn something, think something, build something, eat something} and don’t share, all that value goes down the drain.

Image Source: https://www.franchiseindia.com/wellness/Bringing-in-a-New-Player–7-Ways-to-Build-your-Own-Brand-in-the-Wellness-Industry.9964

Anything you do—or pretend to do—has the potential to build your brand:

  • Screenshot your favorite equations and share them on Twitter.
  • Take your favorite ML tutorial, redraw the graphics by hand, and post to Medium, finding a maximal audience with minimal effort.
  • Whether in Homer, Shakespeare, or the Kardashians, any engaging story has conflict. Don’t like your boss? Post it to Twitter. Waiting for an hour on the tarmac? Subtweet @AmericanAir. Identified a nemesis? Launch a flame war and watch both of your brands grow.

After you reach 1000 followers, haters are going to start to question why you have so much influence. Some people might say you’re not a “real researcher”. This is why you need to publish papers. Worried that your ideas aren’t original enough? Rest assured. Science proceeds by standing on the shoulders of giants. Search and replace on a technical term. A thesaurus can help here. For instance, convert “quantum gate” to “quantum door”. Or substitute “complicated Hilbert space” for “complex Hilbert space”. Post to arXiv. Rinse and repeat. 

KNOW YOUR DATA

Do the rows in your dataset correspond to real people? Yes? How do you know? Have you met these people? Get off your butt and work the phones. Find out who these people are. Cross-reference usernames against other datasets. Hire a private investigator. Whatever it takes, find out who they are. Call them on the phone. The primary metric you should be optimizing is customer delight, not predictive accuracy.3

Image Source: https://blogs.prio.org/2016/11/get-to-know-your-data-double/

EXECUTE WITH RADICAL TRANSPARENCY

Machine learning is in the throes of a reproducibility crisis. According to a Bloomberg industry report in June 2020, over 83% of machine learning results are entirely fabricated or artifacts due to multiple hypothesis testing, excessive hyperparameter optimization, or bugs in the code.
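To see how excessive hyperparameter optimization alone can manufacture a result, here is a minimal sketch. It assumes a hypothetical setup that is not from any report: 50 "configurations" that are really just coin-flip guessers, all scored on the same 100-example test set. The function name and parameters are invented for illustration.

```python
import random


def best_of_k_accuracy(n_test=100, n_configs=50, seed=0):
    """Score n_configs coin-flip 'models' on one shared test set.

    Hypothetical illustration: no model here has any skill, so the
    honest expected accuracy of every configuration is 0.5.
    Returns (best accuracy, mean accuracy) across configurations.
    """
    rng = random.Random(seed)
    # One fixed test set with random binary labels.
    labels = [rng.randint(0, 1) for _ in range(n_test)]
    accuracies = []
    for _ in range(n_configs):
        # Each "configuration" predicts by flipping coins.
        predictions = [rng.randint(0, 1) for _ in range(n_test)]
        correct = sum(p == y for p, y in zip(predictions, labels))
        accuracies.append(correct / n_test)
    return max(accuracies), sum(accuracies) / n_configs


best, mean = best_of_k_accuracy()
print(f"mean accuracy over all configs: {mean:.2f}")
print(f"best accuracy (the one that gets reported): {best:.2f}")
```

The mean stays near chance, while the best configuration looks better than chance purely because it was selected after the fact. The same selection effect drives multiple hypothesis testing: run enough tests on noise and some will cross any fixed threshold.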

Image Source: https://liberationist.org/how-to-move-from-secrecy-to-radical-transparency/

The only way to build trust with the scientific community is to commit to radical transparency. Post all of your code to GitHub, use human-readable variable names, and post all of your training runs to a public dashboard via Weights & Biases.

But that’s not enough. Even with public code, you’re still hiding all of the domain knowledge that goes into creating it in the first place. Why was a particular type of layer chosen? Which other ideas were tried but failed to make the cut for a publication? Which podcasts were in heavy rotation when inspiration struck?

Real commitment to transparency requires 24h livestreams of your entire life. Real science happens in real time.

Image Source: http://cseweb.ucsd.edu/classes/sp15/cse190-c/slides/week3/lecture5.pdf

GIVE BACK TO THE COMMUNITY

You might be the best data scientist in the world, but how is your next employer supposed to know that? They don’t see the code that you write for yourself or for your boss. The best way to make your work known in the real world is to make an impact in open source.

Image Source: https://www.advanceddentistry.org/a-dental-practice-focusing-on-community

With so many commits on so many repositories, nobody has time to actually see what you contributed. Therefore, the best strategy is to single out high-impact projects and make a large number of fixes, such as removing whitespace or fixing typos. Make sure that each fix gets its own commit to maximize your work’s visibility.

The open source community is famous for gatekeepers who decide what is or is not a valid contribution. However, the entire raison d’être of the internet is that it eliminates the need for gatekeepers. Don’t let anyone tell you what you can or cannot accomplish.

ADOPT A GROWTH MENTALITY

Data science has been around far longer than the phrase “data science”. But the field moves so fast that the Data Science of your forebears would hardly be recognizable to our generation. To keep up with this rapidly evolving field, you must evolve your skill set.

Image Source: https://sites.dartmouth.edu/learning/2017/05/18/understanding-the-growth-mindset/

Centuries ago, being a great data scientist required mastery of the abacus. In the 20th century, classical statistics took over, demanding command of calculus and measure theory. Today, conquering data science requires virtuosity with package management.4 Top data scientists can work the full stack of package installation. From apt-get to pip to conda to CRAN, today’s data heroes can install any package on any machine at any time.

And tomorrow? To be a data scientist is to constantly challenge yourself to become a better thinker, writer, mathematician, illuminator, and programmer. A data scientist never settles. A data scientist strives to learn every framework that makes it to the front page of Hacker News. A data scientist rejects dogma and reasons from first principles in a singular quest toward truth.

Co-authored by Mark Saroufim and Zachary C. Lipton

FOOTNOTES

  1. Actual quote source: https://www.cio.com/article/3263790/the-essential-skills-and-traits-of-an-expert-data-scientist.html 
  2. Some other important keywords are: blockchain, cryptofascism, optical computing, Neuralink, quantum security, Siraj Raval, $TSLA.
  3. Do not do this.
  4. To acquire virtuosity in package management we recommend the following exercise. Every hour, on the hour, visit Hacker News. Identify the most trending package and install it immediately. Do not blink. Do not think. Create.

✝ These reports do not exist.

Author: Mark Saroufim

Mark Saroufim (twitter.com/formalsystem) is a Machine Learning Engineer at Graphcore focused on drug discovery. In his past lives, Mark worked on his company Yuri.ai, a game AI service, and also worked as a Product Manager and Applied Scientist at Microsoft on language models and large-scale analytics systems. Mark is also passionate about online teaching; he regularly writes on robotoverlordmanual.com and streams on twitch.tv/formalsystem
