Statsmodels API OLS Model Example

Here’s an example of how to load and analyze the “tips” dataset using the pandas and statsmodels libraries in Python:

import pandas as pd
import statsmodels.api as sm

# Load the tips dataset from the seaborn-data GitHub repository (seaborn itself is not required)
tips = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv')

# Fit a linear regression model to predict tip amount based on total bill
X = tips['total_bill']
y = tips['tip']
X = sm.add_constant(X)
model = sm.OLS(y, X).fit()

# Print the summary of the regression model
print(model.summary())

Output:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                    tip   R-squared:                       0.457
Model:                            OLS   Adj. R-squared:                  0.454
Method:                 Least Squares   F-statistic:                     203.4
Date:                Wed, 22 Feb 2023   Prob (F-statistic):           6.69e-34
Time:                        22:02:24   Log-Likelihood:                -350.54
No. Observations:                 244   AIC:                             705.1
Df Residuals:                     242   BIC:                             712.1
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.9203      0.160      5.761      0.000       0.606       1.235
total_bill     0.1050      0.007     14.260      0.000       0.091       0.120
==============================================================================
Omnibus:                       20.185   Durbin-Watson:                   2.151
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               37.750
Skew:                           0.443   Prob(JB):                     6.35e-09
Kurtosis:                       4.711   Cond. No.                         53.0
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

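In this example, we first load the tips dataset using the pandas read_csv() function. Then, we extract the relevant columns (total_bill as the predictor and tip as the response) and fit a simple linear regression model by constructing an OLS object from statsmodels and calling its fit() method. Finally, we print the summary of the model using the summary() method of the fitted results object. In the output above, the total_bill coefficient of 0.1050 means that each additional dollar on the bill is associated with roughly 10.5 cents more in tip, and the R-squared of 0.457 indicates that total bill explains about 46% of the variance in tip amount.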

Note that we also add a constant term to the predictor variable X using the add_constant() function from statsmodels, which adds a column of ones named const. This is necessary because sm.OLS does not include an intercept term by default; without it, the fitted line would be forced through the origin.
