Mona Khalil is a Senior Data Scientist at Greenhouse, a developer of hiring and talent management software.
We talked about their background, data ethics, and how to responsibly deploy ML and AI to make decisions about humans and careers. This interview has been condensed and edited for clarity.
Mona: I think my path into data science was a little bit nontraditional. My formal academic background is in Research Psychology. I was pursuing a PhD. I had an opportunity to take a Psychometrics project class, and that is where I learned about this shiny new field called Data Science. I learned quite a handful of sophisticated statistical methods from within that program, and ultimately took what I learned and pursued a career within Data Analytics and Data Science.
We work on a whole different suite of problems. In the last couple of years that has really evolved given the team has gone from two to four people. We are in the process of scaling out product A/B testing and experimentation to be a lot more robust than it was. We are also working on creating more sophisticated frameworks for doing A/B tests. And we are starting to think about what machine learning can look like in the product, which is something we are really being thoughtful about ahead of time. We want to make sure we do it right when we get there.
What we have been thinking about internally is a little bit different than what we have seen from others in the field. Some other systems are trying to create a lot of automation, like candidate matching and rankings.
At Greenhouse we have actually been extremely hesitant to touch a lot of the big problems that some of our customers have been requesting that we take a look at. What we have decided instead is to focus on a tiered approach to different solutions. Instead of trying to rank people, match people, or make decisions about whether or not a person deserves a job, we have decided the thing to do is actually to use ML to augment people's abilities to make effective decisions.
What that looks like is doing things like creating benchmark departments: predicting department hierarchies and classes of departments so we can provide benchmarking to different customers. We also have a couple of recommendation engines in the product where we recommend the use of different reports, like things you may want to report on based on what your role is or what you have previously reported on.
We also have a couple of Bayesian features in the product, for example predicting how long we think it is going to take to fill a specific role in your company. So, really just leaning into providing intelligent information about processes as opposed to about people.
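As a rough illustration of what a Bayesian time-to-fill estimate can look like, here is a minimal sketch. It is not Greenhouse's model. It assumes, purely for illustration, that days-to-fill follow an exponential distribution with a Gamma prior on the rate, which gives a closed-form posterior. The names `GammaPosterior`, `update`, `predictive_mean`, and `predictive_quantile`, and all of the numbers, are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class GammaPosterior:
    """Gamma(shape, rate) belief about the fill rate (roles filled per day)."""
    shape: float
    rate: float  # measured in days


def update(prior: GammaPosterior, days_to_fill: Sequence[float]) -> GammaPosterior:
    """Conjugate update: each filled role adds 1 to shape and its duration to rate."""
    return GammaPosterior(prior.shape + len(days_to_fill),
                          prior.rate + sum(days_to_fill))


def predictive_mean(post: GammaPosterior) -> float:
    """Expected days to fill the next role (posterior mean of 1 / fill rate)."""
    if post.shape <= 1:
        raise ValueError("mean is undefined unless shape > 1")
    return post.rate / (post.shape - 1)


def predictive_quantile(post: GammaPosterior, q: float) -> float:
    """Quantile of the posterior predictive (a Lomax distribution) for the next role."""
    if not 0 <= q < 1:
        raise ValueError("q must be in [0, 1)")
    return post.rate * ((1 - q) ** (-1 / post.shape) - 1)


# Hypothetical prior: roles like this take about 60 days on average.
prior = GammaPosterior(shape=2.0, rate=60.0)
# Hypothetical history for one company: four past roles, in days.
posterior = update(prior, [30, 45, 50, 35])

print(f"expected days to fill: {predictive_mean(posterior):.1f}")
print(f"90% of roles filled within: {predictive_quantile(posterior, 0.9):.1f} days")
```

The estimate describes a process (how long a role stays open), not any individual candidate, and the prior lets a company with little history borrow from a broader benchmark until its own data accumulates.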
I think for the time being, yes, we are leaving certain topics untouched. We are also keeping a very close eye on legislation that’s coming out in different areas. We know that there is a bill in the New York City Council that will require reporting the use of all ML and automated systems that make decisions about candidates and the hiring process, so we would like to see how the dust settles on that.
We are also working on just capturing better data underlying our system. For example, if you are a previous Greenhouse user you know that it’s a highly configurable system and a lot of the data is messy in different ways. So, we are thinking about how to be intelligent in making different data fields first class and capturing better data over time.
We are also starting to think about what a decision science or data science research function might look like. For example, instead of just saying, you know, we will never ever rank a candidate, we would see if it’s possible to create more intelligent data sets that can actually mitigate bias, or look at different areas of the hiring process where bias tends to creep in. For example, making better recommendations about what kind of technical take-home tests are likely to lead to candidates of different backgrounds dropping out of a hiring process.
You have actually touched on what is going to be our next big set of metrics, which we are in the process of developing. We have put a lot of work over the last six months into developing a set of inclusion metrics to begin to understand how effective our customers are in different aspects of creating an inclusive and diverse pipeline.
We thought about starting a data ethics committee at a time when we were beginning to roadmap what machine learning in the product would look like. We were looking at a pretty significant expansion of data science on the horizon, and we were also having internal discussions about things that could go wrong with different models.
One of my team members and I put together a Request for Comments (RFC) for the internal team, saying that we would actually benefit from having some type of bird's eye view of any data project that could have an impact on candidates getting a job or getting chosen for an interview. We wanted to make sure we would have a number of different perspectives looking at the project, to make sure that we were not missing anything and did not have a blind spot in terms of potential impact.
I led an effort to find out how people have done this, and drew pretty heavily from what I learned in graduate school. My advisor in my PhD program, Dr. Celia Fisher, was actually, I think, one of the world’s leading researchers. She wrote the Psych ethics code and contributed to the Public Health Ethics code, so ethics has been a passion and interest of mine for a very long time, specifically from a behavioral science perspective. I started reaching out to a number of different networks to see how this has been done elsewhere. The decisions we make as a company can impact millions of people, so we wanted to do things right.
The one thing that really didn’t exist was a framework for how to set this up successfully. I had spoken to a number of different companies and people that had small working groups, mostly within data science itself, that didn’t have much of an external perspective from different stakeholders around the company.
Even compared to a year ago, today there are a number of incredible tools to be able to evaluate models for bias and fairness, which is pretty fantastic. I would also love to see more accessible tools to not just mitigate bias, but actually to remove it. For instance, the ability to correct or review outcomes associated with previously disadvantaged groups, or working guidelines to create more effective datasets where you’re not just trying to mitigate bias in the model itself.
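To make "evaluating a model for bias" concrete, here is a minimal sketch of one common check: comparing selection rates across groups and taking the ratio of the lowest to the highest. It is not a tool named in this interview, and it is only one of many fairness measures. The function names and the data are hypothetical, and the 0.8 threshold in the comment refers to the widely cited "four-fifths" rule of thumb, used here as an assumption rather than a recommendation.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable


def selection_rates(groups: Iterable[Hashable],
                    selected: Iterable[bool]) -> Dict[Hashable, float]:
    """Fraction of positive outcomes (for example, advanced to interview) per group."""
    totals: Dict[Hashable, int] = defaultdict(int)
    positives: Dict[Hashable, int] = defaultdict(int)
    for group, outcome in zip(groups, selected):
        totals[group] += 1
        positives[group] += int(bool(outcome))
    return {group: positives[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates: Dict[Hashable, float]) -> float:
    """Lowest group selection rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group has any positive outcomes")
    return min(rates.values()) / highest


# Synthetic, made-up outcomes for two hypothetical groups of ten candidates each.
groups = ["a"] * 10 + ["b"] * 10
selected = [1] * 5 + [0] * 5 + [1] * 3 + [0] * 7

rates = selection_rates(groups, selected)
ratio = disparate_impact_ratio(rates)
# A ratio below 0.8 is often treated as a signal to review the process more closely.
print(rates, f"ratio={ratio:.2f}")
```

A check like this only flags a disparity. It does not explain where in the process the disparity arises or how to remove it, which is the gap described above.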
Right now the committee has ten members from different perspectives. We have me from Data Science, someone from Data Engineering, we have folks from Product, Security, Sales, Customer Success, and Marketing. We brought in a number of different perspectives along the line for people who interact with customers and may receive questions about automated systems, machine learning, or AI, or those who may be involved in handling them.
Given that our Data Science function is still small and growing, we want to cast a wide net, so right now it’s called The Data Ethics and Product Committee. We want to be a source of ethical perspective and a resource to anybody who has some type of dilemma in any of those areas.
What that has looked like so far is we have been working on sets of Product and Data Science guidelines for what’s considered risky in terms of the candidate’s outcome and experience, what’s considered risky for the types of information we might share and feedback loops we might generate in our system, and areas in which somebody’s privacy might be at risk. So, we have a number of different perspectives, an expert who consults on each one of those, and members from other parts of the org, and together we have been, for the most part, creating guidelines and resources. We already have a set of ethical principles published to our website which outline what we agree to as a company. We also wrote a blog post announcing it. We are also looking to share our perspective, including talks at a couple of conferences.
And now we are asking around the company, “what can we do?” One example is working with Sales and Customer Success to educate those teams so that, if a customer asks for some type of AI feature to rank or sort through their candidates, they can actually take on those conversations and explain why we make the decisions that we do.
We have also been bringing in external experts to speak to the company. We did a screening of Coded Bias that was hugely successful and started an org-wide conversation. And we have been brought in on a number of questions related to DE&I, a couple of different dilemmas, customer questions, and external partner questions. Really, we are just seen as a resource right now, one that’s working on supporting and educating the company.
I 100% agree with you. I think that’s a really important conversation to actually start having within Data Science across a number of different perspectives. So much of what we have used as training data sets is just what happens to be available, as opposed to looking at what outcomes we want, and then actually creating or curating training sets that can generate predictions that make better decisions than people.
One of the big appeals of AI was that the computer can make unbiased and objective decisions because it doesn’t have the 30, 40, 50 years of bias and experience that a human being does. But it is still just reflecting whatever happens to be in a single data point.
I suspect that it is possible to actually create more effective curated training sets to not just mitigate or reduce bias, but to actually start with what outcomes you want for people. You want a model that’s going to predict an outcome that’s fair across different subgroups within a population. And if you have an understanding of where those unbiased processes are occurring within a subset of your data set, I think there is a world where it is just possible to actually create better training sets. I don’t know if we are there yet, but that is definitely something of interest to us and something that I would hope to start a conversation about.
We have more than 100 years of research methods and experience that we can draw on from the social sciences as to effective data collection, like creating more laboratory settings. Taken to an extreme, that may actually lead to poor outcomes in a different direction, but the big promise of Data Science was actually leveraging the domain expertise of different fields, and I think that there is an opportunity there.
We are so excited about what’s next. We have laid out our goals for the year, mostly focused on expanding the resources we can offer internally. But we’re also looking to have dedicated time and resources to create our own research, to publish our own findings, to actually be an external resource. We are still in the early stages of planning that out, but we have got a lot on the horizon.