Deploying machine learning models requires coordination across multiple teams. Developing a statistical model, or picking which one to use, is not enough on its own. A machine learning engineer must also be able to integrate it into a larger system.
Between the various teams required to deploy the model and all the expertise to design it, new models can often become blocked or slowed down. Pre-made machine learning models that can be easily integrated can speed up the deployment of ML models and reduce the need to involve experts to handle them. BigQuery, for example, allows you to implement several models into your SQL queries using BigQuery ML.
Through BigQuery ML, teams can create and execute ML models directly in BigQuery using SQL, rather than having to export the data and work with it in another language such as Python or Java. BigQuery ML was designed to democratize machine learning and shorten the time required to develop and implement models.
This article covers some of the key models supported by BigQuery ML and how your team can benefit from them.
5 ML Models Supported by BigQuery
BigQuery ML offers a wide variety of machine learning models that can be implemented in your SQL queries. This includes built-in models as well as the ability to import neural networks you’ve created externally. Below are five models you can use in BigQuery ML.
Binary Logistic Regression
One popular statistical model is binary logistic regression. Logistic regression is a classification algorithm for categorizing data with a binary output such as true/false or yes/no. The model combines multiple input variables to calculate what is known as the log odds, converts that into a probability, and then provides a true/false style output.
Binary logistic regression works well whenever you are trying to predict a binary outcome. Examples include whether or not a person is likely to click “buy” or whether a medical patient will be readmitted to the hospital within 30 days.
This conceptual simplicity is what makes binary logistic regression such a popular model.
Implementation of Binary Logistic Regression
Implementing binary logistic regression involves two distinct steps. The first step is to create the model with a CREATE MODEL statement.
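As a minimal sketch of that first step, the statement below trains a binary logistic regression model on a table of web sessions. The dataset, table, and column names (mydataset, sessions, purchased, and so on) are hypothetical placeholders, not names from any real project.

```sql
-- Hypothetical dataset, table, and column names; replace with your own.
CREATE OR REPLACE MODEL `mydataset.purchase_model`
OPTIONS (
  model_type = 'LOGISTIC_REG',       -- binary logistic regression
  input_label_cols = ['purchased']   -- the true/false column to predict
) AS
SELECT
  country,
  device_type,
  pageviews,
  purchased
FROM
  `mydataset.sessions`;
```

Every column in the SELECT other than the label column is treated as an input feature.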
The second step is to use the model. Once it has been created, your team can call the ML.PREDICT function, passing it the model’s name and the rows you want predictions for.
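A prediction query could look like the sketch below. It assumes a hypothetical model named mydataset.purchase_model trained with a label column called purchased, and a hypothetical table of new rows with the same feature columns.

```sql
-- Hypothetical model and table names; the feature columns must match
-- the ones the model was trained on.
SELECT
  *
FROM
  ML.PREDICT(
    MODEL `mydataset.purchase_model`,
    (
      SELECT
        country,
        device_type,
        pageviews
      FROM
        `mydataset.new_sessions`
    )
  );
```

The result includes the input columns plus prediction columns named after the label, such as predicted_purchased and predicted_purchased_probs in this example.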
Time Series
There are many classic time-series models. For example, the ARIMA model is a standard predictive model used for forecasting outputs like sales, users, or attendees. A time-series model relies on two pieces of data: the time and the actual value you are trying to predict. By using historical data and accounting for concepts such as seasonality, time-series models can produce finely tuned forecasts from very limited inputs.
BigQuery allows you to choose from several popular time-series models like ARIMA and also enables you to set parameters to manage holidays. This allows you to develop a fine-tuned time-series model.
Time-series models are useful for predicting quantities such as monthly sales or the number of attendees (e.g., at a theme park). This is because they take seasonality and other recurring patterns into account, which makes the forecasts more robust. In addition, a time-series model needs only a single value column, alongside the timestamp, to train on.
Implementation in BigQuery
Once again, you will use the CREATE MODEL statement to implement this model. In its options, you specify the date column, the time-series data column, and your preferred model type.
It is also possible to add an identifier column so that a separate forecast is produced for each category you are trying to predict.
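The options described above might be combined as in the sketch below. It assumes the ARIMA_PLUS model type, which is the ARIMA-based option in current BigQuery ML, and uses hypothetical dataset, table, and column names for a theme park attendance example.

```sql
-- Hypothetical dataset, table, and column names; adjust to your schema.
CREATE OR REPLACE MODEL `mydataset.park_attendance_forecast`
OPTIONS (
  model_type = 'ARIMA_PLUS',                 -- ARIMA-based time-series model
  time_series_timestamp_col = 'visit_date',  -- the date column
  time_series_data_col = 'attendees',        -- the value to forecast
  time_series_id_col = 'park_name',          -- optional: one series per park
  holiday_region = 'US'                      -- optional: model holiday effects
) AS
SELECT
  visit_date,
  park_name,
  attendees
FROM
  `mydataset.daily_attendance`;

-- Forecast the next 30 time points with a 90% prediction interval.
SELECT
  *
FROM
  ML.FORECAST(
    MODEL `mydataset.park_attendance_forecast`,
    STRUCT(30 AS horizon, 0.9 AS confidence_level)
  );
```

Dropping time_series_id_col (and the park_name column) yields a single forecast for the whole table instead of one per park.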
Boosted Trees
Boosted trees are like decision trees, except that they also integrate the concept of ensemble ML algorithms. Ensemble learning is the process in which multiple weak learners each contribute to the final prediction and together form a single strong learner, which is the model. While there are many different methods of ensemble learning, boosting trains the learners in succession. Each new learner is trained to correct the errors made by the ones before it. This layering technique in turn improves the model’s performance.
Boosted trees have become a catch-all model. Thanks to the model’s robustness and its flexibility to fit most use cases, you can use boosted trees to predict the likelihood that a loan applicant will repay, to power a recommendation system, or to forecast the weather, for example.
Because of how the model is set up in BigQuery ML, it is easy to implement into your workflow.
Implementation in BigQuery
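As a rough sketch, a boosted tree classifier for the loan repayment example could be trained as follows. The dataset, table, and column names are hypothetical, and the max_iterations value is an arbitrary illustration rather than a recommended setting.

```sql
-- Hypothetical dataset, table, and column names; replace with your own.
CREATE OR REPLACE MODEL `mydataset.loan_repayment_model`
OPTIONS (
  model_type = 'BOOSTED_TREE_CLASSIFIER',  -- use BOOSTED_TREE_REGRESSOR for numeric targets
  input_label_cols = ['repaid'],           -- true/false: was the loan repaid?
  max_iterations = 50                      -- number of boosting rounds (illustrative value)
) AS
SELECT
  income,
  loan_amount,
  credit_history_years,
  repaid
FROM
  `mydataset.loans`;
```

Predictions are then retrieved with ML.PREDICT in the same way as for the other models.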
Linear Regression
Both linear regression and multivariate linear regression models take a set of independent variables and use them to determine some form of linear relationship with a dependent variable.
For example, a single-variable linear regression model looks like the formula y = mx + b, because that is the form of a linear equation in one variable. You are essentially trying to find the slope m that multiplies x and the intercept b that is added to it to get y.
Multivariable regression is similar, except that each additional input variable gets its own coefficient.
BigQuery’s linear regression can be used for both multivariate regression and basic linear regression models.
The BigQuery linear regression model can be used to calculate outputs such as housing costs or the impact that advertising spend has on your company’s bottom line, based on multiple variables. Linear regression is often a great place to start when your team is trying to find the relationship between two or more variables.
Implementation in BigQuery
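A linear regression model for the housing cost example might be created as in this sketch. All dataset, table, and column names here are hypothetical.

```sql
-- Hypothetical dataset, table, and column names; replace with your own.
CREATE OR REPLACE MODEL `mydataset.house_price_model`
OPTIONS (
  model_type = 'LINEAR_REG',           -- linear regression (one or many inputs)
  input_label_cols = ['sale_price']    -- the numeric value to predict
) AS
SELECT
  square_feet,
  bedrooms,
  neighborhood,
  sale_price
FROM
  `mydataset.home_sales`;

-- Inspect the learned coefficients to see how each input relates to the price.
SELECT
  *
FROM
  ML.WEIGHTS(MODEL `mydataset.house_price_model`);
```

Selecting a single feature column gives the basic y = mx + b case, while selecting several gives a multivariate model.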
TensorFlow
TensorFlow is an open-source framework used to express and process complex mathematical calculations. It is particularly known for neural networks: TensorFlow can be used to train and run deep neural networks and is one of many libraries that have made neural networks much more accessible to developers.
It is now also possible to use a TensorFlow model in your SQL queries. Doing so involves a slightly different process than the previous models referenced in this article. Unlike those models, which BigQuery ML trains for you, a TensorFlow model has to be trained externally and then imported into BigQuery.
As for where these models can be useful: typically, TensorFlow is used to develop neural networks that classify videos, pictures, and text. For example, many self-driving cars utilize neural networks to help classify what is in front of them. Another example of where you could use a TensorFlow neural network is classifying text by sentiment. Overall, TensorFlow allows for a broad range of solutions that can now be brought into BigQuery.
Implementation in BigQuery
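Importing a model could look like the sketch below. It assumes a TensorFlow SavedModel has already been exported to a Cloud Storage bucket. The bucket path, dataset, table, and column names are hypothetical, and the input column alias must match whatever input name your own model expects.

```sql
-- Hypothetical model name and Cloud Storage path; replace with your own.
CREATE OR REPLACE MODEL `mydataset.imported_sentiment_model`
OPTIONS (
  model_type = 'TENSORFLOW',                        -- import rather than train
  model_path = 'gs://my-bucket/sentiment_model/*'   -- location of the SavedModel files
);

-- Run the imported model. The alias "input" is an assumption: it must
-- match the input name defined in the TensorFlow model itself.
SELECT
  *
FROM
  ML.PREDICT(
    MODEL `mydataset.imported_sentiment_model`,
    (
      SELECT
        review_text AS input
      FROM
        `mydataset.reviews`
    )
  );
```

Note that there is no AS SELECT training query here, since the model arrives already trained.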
The Benefits of BigQuery ML
BigQuery ML is a good fit for teams with limited technical resources. Rather than spend time implementing a model in code, teams can use SQL, enabling them to naturally integrate their ML models into the rest of their data infrastructure, such as ETL pipelines.
BigQuery ML can also be a good temporary solution. Perhaps you need to deploy a model quickly and want to fine-tune it later. Instead of developing a complex code-based model, your team can start with a BigQuery ML model and see how it works out. Based on the user interaction, you can choose to fine-tune it or leave it as is. Overall, BigQuery ML is an effective solution for faster ML model deployment.
For development teams lacking an ML expert in their ranks but looking to implement ML models quickly, BigQuery ML is a viable alternative. Because it is driven by SQL, which is widely known among developers, BigQuery ML is easy to use, whether you’re a data engineer, software engineer, or perhaps even an analyst.
The real question then is which model is right for you. Whether you need a simple binary logistic regression model or something more complex like TensorFlow can only be determined once you start testing out your next model.