Senior Machine Learning Engineer, Anti-Abuse AI
In a tutorial at the recent Strata Data Conference in San Francisco, we shared our experiences and success in leveraging emerging techniques to power intelligent decisions that lead to impactful outcomes at LinkedIn.
We started our tutorial by addressing two important questions:
* How do you determine the right metric (KPI) for a business goal?
* How do you test out a new feature on the site to make business decisions?
We shared a few real business problems and project examples to demonstrate how our data scientists tackle those problems at LinkedIn.
Matching KPIs and business goals
To design a business KPI, there are many elements to consider. A KPI must align with the ultimate business goals, not short-term vanity numbers. It needs to be simple and interpretable, and it needs to be actionable and movable. That often leaves data scientists with the important responsibility of translating a fuzzy business question into a data science question that can be answered with rigorous analysis, and then translating the results back into business terms. With these principles in mind, we’re often tasked with questions such as “What is an active member?” and “How do we define a highly engaged customer?” Once the KPI is determined, we focus on driving improvements in it.
Testing features and making business decisions
To test out a new feature, we do extensive controlled experiments, also known as A/B testing. LinkedIn has its own A/B testing platform that enables convenient setup and fast iteration of experiments. At any given time, the platform runs hundreds of experiments.
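As a rough illustration of how a single controlled experiment might be read, here is a minimal sketch of a two-proportion z-test comparing conversion rates between a control group (A) and a treatment group (B). The function name and the counts are made up for illustration, and this is a generic textbook test, not a description of how LinkedIn's A/B testing platform computes its results.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (lift, z, two-sided p-value) for B vs. A conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value

# Hypothetical counts: conversions and members exposed in each variant.
lift, z, p_value = two_proportion_z_test(conv_a=480, n_a=10_000, conv_b=540, n_b=10_000)
print(f"lift={lift:.4f}, z={z:.2f}, p={p_value:.4f}")
```

In practice the metric being compared would be the KPI chosen for the business goal, and a platform running hundreds of experiments would also need to account for multiple comparisons, which this sketch ignores.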
The data science and machine learning process
During our tutorial, we talked about the machine learning process for a model to be deployed in production at LinkedIn. This process has six major steps:
* Problem formation
* Label preparation
* Feature engineering
* Model learning
* Model deployment
* Model management
Defining the problem is critical to the success of any project. Pre-analysis is often conducted to understand the current business problems and challenges, along with what we want to achieve and how to align this with business priorities.
A label is the thing we are predicting in a machine learning scenario (for example, a relevant piece of content in a feed recommendation system). Label definitions are key to building the training, testing, and validation data sets. Depending on the implications and business priorities, the label definition could be different. As an example, when developing a churn prediction model, the labels can be defined as complete churn (renewal rate = 0) vs. not complete churn (renewal rate > 0). Alternatively, the labels could be defined as partial churn (renewal rate < 1) vs. no churn (renewal rate >= 1). The first definition fits better when we focus on keeping customers, while the second definition fits better when we are focused on growth.
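The two churn label definitions can be sketched in a few lines. The function name, dictionary keys, and account data below are hypothetical; only the thresholds on the renewal rate come from the definitions above.

```python
def churn_labels(renewal_rate):
    """Return both candidate churn labels (1 = churned) for one customer."""
    return {
        # Retention view: churned only if nothing was renewed.
        "complete_churn": int(renewal_rate == 0),
        # Growth view: churned if anything less than everything was renewed.
        "partial_churn": int(renewal_rate < 1),
    }

# Hypothetical renewal rates per account.
accounts = {"acct_1": 0.0, "acct_2": 0.6, "acct_3": 1.0, "acct_4": 1.3}
labels = {name: churn_labels(rate) for name, rate in accounts.items()}
for name, label in labels.items():
    print(name, label)
```

An account that renews 60% of its contract is a non-churner under the first definition and a churner under the second, so the choice of label changes what the model learns to predict.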
Features are the inputs to a machine learning system; for a feed recommendation system, the feature is the content. We are often faced with too many features from multiple data sources. Thus, we must first collect the features that are not only meaningful in solving our problem, but also in line with the labels we defined. Then, we join these features with the label, taking care with the alignment of dynamic features. Finally, we clean and transform the features to best reveal the patterns in the data.
We start by partitioning our data into training, validation, and test sets. Then, we train our model on the training set. While doing that, we should choose our solver by considering the type of problem, the system requirements, and the balance to strike between performance and interpretability. To find the best-performing parameters for a solver, we can also run a hyperparameter search. Then, we use different evaluation techniques to choose the best model on the validation set. While choosing the best model, we should also consider business metrics. Finally, we report our model’s results on the test set.
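The flow above (split, train, search hyperparameters on the validation set, report on the test set) can be sketched with the standard library alone. Everything here is an assumption made for illustration: the synthetic one-feature data, the 60/20/20 split, and the choice of a k-nearest-neighbors classifier with `k` as the only hyperparameter. It stands in for whatever solver and search method a real project would use.

```python
import random

def split(rows, seed=0, train_pct=60, valid_pct=20):
    """Shuffle and partition rows into training, validation, and test sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_train = len(rows) * train_pct // 100
    n_valid = len(rows) * valid_pct // 100
    return rows[:n_train], rows[n_train:n_train + n_valid], rows[n_train + n_valid:]

def knn_predict(train_rows, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train_rows, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(votes * 2 > k)

def accuracy(train_rows, eval_rows, k):
    hits = sum(knn_predict(train_rows, x, k) == y for x, y in eval_rows)
    return hits / len(eval_rows)

# Synthetic data: one feature in [0, 1], with a noisy label threshold at 0.5.
rng = random.Random(42)
rows = []
for _ in range(200):
    x = rng.uniform(0, 1)
    rows.append((x, int(x + rng.gauss(0, 0.15) > 0.5)))

train_set, valid_set, test_set = split(rows)

# Grid search: pick the hyperparameter using the validation set only.
grid = [1, 5, 15]
best_k = max(grid, key=lambda k: accuracy(train_set, valid_set, k))

# The test set is touched once, to report the chosen model's performance.
test_accuracy = accuracy(train_set, test_set, best_k)
print(f"best k={best_k}, test accuracy={test_accuracy:.2f}")
```

Keeping the test set out of the search is the point of the three-way split: the validation score is optimistically biased by the selection, while the test score is not.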
Once the modeling process is over, we deploy and run the model in production. This enables us to schedule and run the scoring pipeline regularly.
Once we deploy the model, we regularly monitor feature and model performance to see how the model is performing and whether it is using the right data. If we decide to refresh our model, we retrain it and then conduct A/B testing to compare the new model with the old one. Depending on the A/B test results, we decide which model to use in production.
Even if we go through the six steps of the machine learning process, there is a chance that our model may not deliver the desired performance. This happens because there are many common pitfalls and challenges that may pop up during the process. During our tutorial, we talked about the two most common challenges: model interpretation and data quality.
Model interpretation is one of the challenges that we face in our day-to-day work. When we present our modeling results to our business partners, they care not only about the results, but also about the “why?” We could use the feature importance (coming from the machine learning model that was used to generate the results) to explain the key drivers of the results, but this method comes with some drawbacks, such as difficulty in interpreting the ranking of correlated variables, or bias toward variables with more categories. For example, let’s say we are using logistic regression to build a model that predicts who should receive an email about the career subscription. Suppose both the job search feature and the job view feature are important in deciding who we should email. If these two features are also correlated, then how do we decide which one is more important?
Instead, we use group-wise feature interpretation. In this method, we cluster features into buckets with semantic meaning, and then build models based on only the subset of the features within each bucket.
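A minimal sketch of the idea, under several assumptions: the feature names other than job search and job view, the bucket assignments, the training rows, and the two members are all made up, and a small hand-written logistic regression stands in for whatever model is actually used. One primary model is trained on all features, and one model is trained per bucket using only that bucket's features.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, columns, steps=500, lr=0.5):
    """Fit logistic regression on `columns` only, by batch gradient descent."""
    weights = {c: 0.0 for c in columns}
    bias = 0.0
    for _ in range(steps):
        grad_w = {c: 0.0 for c in columns}
        grad_b = 0.0
        for row, y in zip(rows, labels):
            err = sigmoid(bias + sum(weights[c] * row[c] for c in columns)) - y
            grad_b += err
            for c in columns:
                grad_w[c] += err * row[c]
        bias -= lr * grad_b / len(rows)
        for c in columns:
            weights[c] -= lr * grad_w[c] / len(rows)
    return lambda row: sigmoid(bias + sum(weights[c] * row[c] for c in columns))

# Hypothetical feature buckets with semantic meaning.
GROUPS = {
    "behavioral": ["job_search", "job_view"],
    "identity": ["profile_complete"],
    "social": ["connections"],
}

# Made-up training rows (features scaled to [0, 1]) and binary labels.
train_rows = [
    {"job_search": 0.9, "job_view": 0.8, "profile_complete": 0.5, "connections": 0.2},
    {"job_search": 0.1, "job_view": 0.2, "profile_complete": 0.6, "connections": 0.9},
    {"job_search": 0.8, "job_view": 0.9, "profile_complete": 0.4, "connections": 0.8},
    {"job_search": 0.1, "job_view": 0.1, "profile_complete": 0.5, "connections": 0.1},
    {"job_search": 0.2, "job_view": 0.1, "profile_complete": 0.4, "connections": 0.2},
    {"job_search": 0.2, "job_view": 0.3, "profile_complete": 0.6, "connections": 0.1},
]
train_labels = [1, 1, 1, 0, 0, 0]

# Primary model on all features, plus one model per bucket.
all_columns = [c for cols in GROUPS.values() for c in cols]
models = {"primary": fit_logistic(train_rows, train_labels, all_columns)}
for name, cols in GROUPS.items():
    models[name] = fit_logistic(train_rows, train_labels, cols)

# Two hypothetical members to score with every model.
members = {
    "member_a": {"job_search": 0.9, "job_view": 0.9, "profile_complete": 0.5, "connections": 0.1},
    "member_b": {"job_search": 0.1, "job_view": 0.1, "profile_complete": 0.5, "connections": 0.9},
}
scores = {m: {name: model(row) for name, model in models.items()}
          for m, row in members.items()}
for member, by_model in scores.items():
    print(member, {name: round(s, 2) for name, s in by_model.items()})
```

Comparing each member's per-bucket scores against the primary score points to the group of features driving the prediction, which sidesteps the question of ranking correlated features within a bucket.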
In the example from our tutorial, two members, Emily and Steve, both received high scores of 0.9 from the primary model, which uses all the features. However, their scores diverge on the three other models (Behavioral, Identity, Social), which are built from features related to behavior, identity, and social activity, respectively. These results suggest that Emily has a high score with the primary model because of her behavioral features, while Steve received a high score because of his social features.
Data quality
Examples of low-quality data include missing historical data and noisy data. In order to discover potential issues ahead of time, we have a quality monitoring flow that generates insights into data quality for each week, month, or year. With this flow, we regularly generate feature profiles and use statistics from those profiles to compute a quality score. With this quality score, we compute the health index anomaly score by checking the percentage of anomalous features. We store this information in a quality tracking table that records whether a given feature can be used for modeling. If we see any problem, we send an alert to the owner of the feature.
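The monitoring flow could look roughly like the sketch below. The profile statistics (missing rate and mean), the scoring formula, the 0.8 threshold, and the feature names are all assumptions chosen for illustration; the source does not specify how the quality score or health index is actually computed.

```python
from statistics import mean

def profile(values):
    """Feature profile: share of missing values and mean of the present ones."""
    present = [v for v in values if v is not None]
    return {
        "missing_rate": 1 - len(present) / len(values),
        "mean": mean(present) if present else None,
    }

def quality_score(current, baseline):
    """Hypothetical score in [0, 1]: penalize missing values and mean drift."""
    if current["mean"] is None:
        return 0.0
    drift = abs(current["mean"] - baseline["mean"]) / (abs(baseline["mean"]) or 1.0)
    return max(0.0, 1.0 - current["missing_rate"] - min(drift, 1.0))

THRESHOLD = 0.8  # assumed cutoff below which a feature is flagged as anomalous

# Made-up weekly snapshots; None marks a missing value.
baseline_week = {"job_views": [3, 5, 4, 6], "connections": [10, 12, 11, 9]}
current_week = {"job_views": [3, None, None, 6], "connections": [10, 11, 12, 9]}

# One row per feature, as it might be stored in a quality tracking table.
quality_table = []
for feature, values in current_week.items():
    score = quality_score(profile(values), profile(baseline_week[feature]))
    quality_table.append({"feature": feature, "score": score, "usable": score >= THRESHOLD})

anomalous = [row["feature"] for row in quality_table if not row["usable"]]
# Health index anomaly score: share of features flagged as anomalous.
health_index = len(anomalous) / len(quality_table)

for feature in anomalous:
    print(f"ALERT: notify owner of '{feature}'")
print(f"health index anomaly score: {health_index:.2f}")
```

Here `job_views` loses half of its values in the current week and is flagged, while `connections` keeps the same mean and no missing values, so half of the features are anomalous.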
Thanks to the rapid growth in data resources, it is common for business leaders to appreciate the challenges and importance of mining information from data. In our tutorial, we presented a case study on marketing, a field that was traditionally considered more artistic than quantitatively driven until the recent rise of big data. Much of the data involved describes the customers, such as who they are, what they do, and, more importantly, their interactions within their companies. Mining such information has become an important part of marketing practice, helping marketers better understand their customers.
In this example, we demonstrated how data science techniques and methodologies can help companies’ marketing decisions through customer acquisition, customer engagement, and prevention of customer churn. The combination of data science tools and techniques with marketing experience and business acumen leads to the success of data-driven marketing.
In customer acquisition, marketing and sales would like to identify the right customers to target, evaluate marketing investments across different channels, and determine future resource allocations. Data science tools like media mix modeling, customer response models, and multi-touch attribution approaches can be applied to solve these problems.
In customer retention and engagement, data science tools can measure the level of engagement of each customer or customer segment. A customer dynamic response model can also help evaluate how effectively various marketing interventions encourage engagement, and help design targeted messages to the right customers to improve retention. In addition, by analyzing customers’ actions and engagement, a customer lifetime value (CLV) model can help identify the most valuable customers in terms of the revenue or profit they bring, both through their own purchases across one or multiple products and categories and through their influence on other customers.
Finally, statistical and behavioral modeling approaches could help predict the customers who are more likely to churn, based on the observed information related to the customer and his/her interactions with marketing and the company’s product. Once those customers are identified, the marketing team would design the right message to re-engage the customer and hopefully prevent a valuable customer from churning.