Fast Load Data To SQL Server Using Python

Introduction

There’s plenty of good online material on how to insert data into SQL Server using Python, but most of it is aimed at OLTP-type workloads. Essentially, using the Python modules pyodbc or SQLAlchemy, you can insert data from a CSV file into a table, but you’ll notice that each row is inserted one at a time.
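
For example, a typical row-by-row load with pyodbc reads the CSV and issues one INSERT per record. The sketch below is purely illustrative; the connection string, file name and the table t(name, id) are placeholders borrowed from the examples later in this post rather than a real schema.

import csv
import pyodbc

# Placeholder connection string and table - illustrative only
conn = pyodbc.connect(
    'Driver={SQL Server};'
    'Server=localhost;'
    'Database=myDB;'
    'Trusted_Connection=yes;'
    )
cursor = conn.cursor()

with open('myfileofdata.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        # one INSERT statement (and one round trip) per row
        cursor.execute('insert into t(name, id) values (?, ?)', row[0], row[1])

conn.commit()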

To make this process faster, you can leverage SQL Server’s bulk insert capabilities.

Why use BULK INSERT over INSERT?

Take the following example of the Cursor.executemany() method from the pyodbc documentation:

params = [ ('A', 1), ('B', 2) ]
cursor.fast_executemany = True
cursor.executemany("insert into t(name, id) values (?, ?)", params)

Conceptually, this is equivalent to sending a SQL statement like the one below:

insert into t(name, id) 
values 
 ('A', 1)
,('B', 2)

For a small number of rows this performs perfectly well, but for larger files it will be noticeably slow. The same is true for the sqlalchemy.sql.expression.insert() method.
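
For reference, the SQLAlchemy version of the same pattern looks roughly like the sketch below. The engine URL and table definition are assumptions for illustration; the point is that the rows still travel through ordinary INSERT statements rather than a bulk load path.

from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String, insert

# Engine URL and table definition are assumptions for illustration
engine = create_engine(
    'mssql+pyodbc://localhost/myDB?driver=ODBC+Driver+17+for+SQL+Server&trusted_connection=yes'
)
metadata = MetaData()
t = Table('t', metadata, Column('name', String(10)), Column('id', Integer))

params = [{'name': 'A', 'id': 1}, {'name': 'B', 'id': 2}]
with engine.begin() as conn:
    # still ordinary INSERT statements, not a bulk load
    conn.execute(insert(t), params)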

Example BULK INSERT with Python

In the Python script below I’m leveraging the pyodbc module once again, but this time I’m passing a BULK INSERT SQL statement containing the path to a CSV file.

The code inserts the data into a staging table, hence the truncation before the insert.

import pyodbc

server = 'localhost'
database = 'myDB'
filepath = r'C:\temp\myfileofdata.csv'  # raw string so the backslashes aren't treated as escape sequences

# SQL Server Connection
conn = pyodbc.connect(
    'Driver={SQL Server};'
    'Server='+ server + ';'
    'Database=' + database + ';'
    'Trusted_Connection=yes;'
    )

# Clear the staging table before loading
truncsql = 'TRUNCATE TABLE [Stage].[Transactions]'

bulksql = f"""
    BULK INSERT [Stage].[Transactions]
    FROM '{filepath}'
    WITH (
        DATAFILETYPE = 'char'       -- plain character data
        ,FIELDTERMINATOR = ','      -- columns are comma separated
        ,ROWTERMINATOR = '0x0a'     -- rows end with a line feed
        ,FIRSTROW = 2               -- skip the header row
        ,KEEPNULLS                  -- keep empty fields as NULL rather than column defaults
        );
"""

# Create a cursor on the open connection
cursor = conn.cursor()

# Execute the SQL statements and commit the transaction
cursor.execute(truncsql)
cursor.execute(bulksql)
conn.commit()
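
One thing worth noting about BULK INSERT is that the file path is resolved by the SQL Server service itself, so the CSV has to be readable from the server (by its service account), not just from the machine running the Python script. As a quick sanity check after the load, you can query the row count on the same connection; the snippet below is an illustrative addition rather than part of the original script.

# Optional sanity check - illustrative addition
rowcount = cursor.execute('SELECT COUNT(*) FROM [Stage].[Transactions]').fetchone()[0]
print(f'{rowcount} rows loaded into [Stage].[Transactions]')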