Author Spotlight

To Do Great Data Science, Embrace Domain Knowledge

It’s never wasted effort, even if the details are messy, inconclusive, and hard to automate around

Published in Towards Data Science, June 16, 2021

In the Author Spotlight series, TDS Editors chat with members of our community about their career path in data science, their writing, and their sources of inspiration. Today, we’re thrilled to present Elliot Gunn in conversation with Randy Au.

Randy Au is currently a Quantitative User Experience Researcher at Google, using data science and traditional research methods to better understand users and help create better products. Before that, he spent over a decade doing data work for various tech startups in the NYC area. He writes a weekly newsletter, Counting Stuff, highlighting the mundane but still important aspects of data science. His current interests include doing uncomfortably awkward things with various SQL dialects and pursuing lots of peculiar hobbies to absurd lengths, such as cooking, photography, and gem cutting.

You have a unique job title/scope in data science and a really interesting educational background (Continental Philosophy in undergrad!). How did you become a Senior Quantitative UX Researcher at Google?

Yes, I have a quirky background, largely driven by doing whatever seemed interesting at the time. Officially, I studied Business Administration and Philosophy in undergrad, primarily because I wound up taking most of the courses in both majors out of interest. The professors were good, and I did a lot of undergraduate research in both departments. Then I went on to a Master’s program in Communications, which is where I learned social science, philosophy of science, and research methods. After realizing that academic publishing life was not for me, I landed at a boutique interior design consultancy, helping them with survey analysis and automating Excel/PowerPoint work. I eventually learned SQL on the job, hands-on in production, in an ad-tech role, before moving fully into tech as a data analyst.

For most of my career, I was just a “data analyst” at small NYC tech startups; “Data Science” as a term was only starting to catch on during that period. These were small organizations, always under 150 people and often under 100. Most of the time I was the only data person on the team, tasked with making everyone else smarter and more effective with data. That meant I got to work with literally everyone in the company, top to bottom. It was a hyper-broad experience that exposed me to lots of viewpoints, problems, and people.

Quant UX research is a pretty obscure job title in industry; only a few companies officially have the role. I was looking for work, and a friend at Google was helping me search through positions, the usual data science and analyst roles, when they happened to find this one for me. It fit the product work I had been doing for years so perfectly that I applied and, to my surprise, got hired. If you have a data science and research skill set but find yourself always drawn to learning about users, Quant UXR might be for you.

What is your favorite project, or a project you’re particularly proud of?

Probably the most important projects I take on aren’t even officially labeled as work “projects.” They involve working with teams and people who have never had quantitative research support before and staying with them until they truly understand how to use data to build products. That means helping them understand what sorts of questions are best answered with data, how to form testable hypotheses, what the costs and benefits are, and the ins and outs of instrumenting products, setting metrics, reporting, and monitoring. Processes need to be put in place and evolved, and new habits formed. It’s always very involved and always different, and I learn a bit more each time I do it.

This sort of work takes a surprisingly long time, months and sometimes years, but the end result is something to be proud of. A team will go from not knowing what to do with the data given to them to generating hypotheses, figuring out ways to measure and test things, and actively seeking out research and data to help them make the best decisions they can. They won’t be experts, and they’ll need guidance at times, but they’ll know when they need expert help. Those individuals will eventually move on to other teams and organizations and take that experience with them. That alone makes me really proud of this work.

You have written many posts advising both new entrants and more established analysts, from showing value as a support data scientist, to scaling yourself, and staying afloat as a new-ish solo data scientist. Any advice for readers looking to follow in your footsteps?

Embrace domain knowledge!

It’s messy, confusing, and often difficult (or downright impossible) to automate around, so no one enjoys learning it. It means learning from experts who have a completely different background and speak a different language. But so much of good data science relies on this knowledge: knowing what data to collect and how to collect it, knowing what questions to ask, and finally communicating your results in the best way. Domain knowledge is often underemphasized in discussions about data science because the complicated details mean the answer is always “it depends,” but it’s never wasted effort.

How do you manage to write so consistently in addition to your work and personal responsibilities? How do you find inspiration for articles?

I write consistently once a week because I know myself well enough to know that if I let myself slip “just this once,” I’d quickly get distracted by all the other things I have going on. I swore to myself that I’d publish one post a week, and I’m doing all I can not to disappoint myself. Having a drumbeat keeps me honest, and there’s a comforting cadence in knowing that when Friday rolls around, I need to start drafting something to get it out the door by Monday night.

Coming up with ideas is always a challenge if you plan on writing steadily. Luckily, life is full of inspiration. I draw from the work I’ve been doing, things going on around me, tweets and memes about data I see during the day, and questions from readers. If I’m struggling with something, or I see someone else struggling with it, that’s a good starting point for an article. The work is in taking that seed, which can be very small, and analyzing it until you can pull an article out of it.

It helps to keep a notebook, or an open file, to toss germs of ideas into when you come across them. That way you have a stock of material to start from instead of just a blank page.

What kind of writing in DS/ML would you like to see more of?

There’s an endless amount of content aimed at new entrants to the field these days. You can’t go a day without finding another variation of a “How to become a DS” article published somewhere. It gets eyeballs and metrics because the field is red hot right now, but as a practicing community we need more content for practitioners: more people producing work where experienced folks can continue to learn, share, and grow.

That means sharing experiences, techniques, successes and failures, all the tools and lessons that make up data science. We could use more posts that translate the latest academic work into layman’s terms, introduce less common techniques, or shed light on quirks and gotchas in extremely common ones. It doesn’t have to be bleeding-edge “I created a new ML framework and solved world peace” content, either. Even if you write about your trusty favorite method that was invented 150 years ago, there are tons of people out there who aren’t familiar with it and can benefit from your hands-on knowledge. I believe there’s lots of room for people to write about their experiences and join the community of data science writers.

What are your hopes for the DS community in the next few months/couple of years?

As the world very slowly thaws out of the COVID-19 lockdowns, faster in some countries and unfortunately slower in others, I hope the data community continues to be as awesome as it has been. We’ll be able to meet each other at conferences and events again soon. Beyond that, I hope we’ve learned a thing or two about running great online data events, and that some of them stick around, because they’re fun and can be very inclusive.

Curious to learn more about Randy’s work and data science interests? You’ll find his writing on his Medium profile, on his Substack, Counting Stuff, as well as on his Twitter account. Here are some of our recent favourites.

Data Cleaning IS Analysis, Not Grunt Work (TDS, April 2021): Randy explores how “cleaning” is a form of analysis that imposes values, judgments, and interpretations upon data, and shouldn’t be considered beneath real data science work.
Dates, Times, Calendars — The Universal Source of Data Science Trauma (TDS, September 2019): A deep and comprehensive dive into a major pain point for data scientists.
Be Yourself: The Data Scientists You See In Public Are Not Representative (TDS, December 2019): Randy reminds us that if you work with data, your work is within the scope of the data sciences. Don’t be intimidated by gatekeepers.

Stay tuned for our next featured author, coming soon. If you have suggestions for people you’d like to see in this space, drop us a note in the comments!