Skip to content

Snowflake Example ❄️

snowflake

Establishing a connection between an Snowflake database and whitson+ through our external API is a standard practice. While this example involves a Snowflake database, the majority of the steps are applicable to connecting any data source to whitson+. Read a Devon case study example here.

Auto Update Prod Data to whitson+: Workflow Overview

bhp input data model

Engineers spend 95% of their time uploading data. This example demonstrates an automated daily update of production data. The process adheres to a widely used structure: connecting to a database, verifying if the well is already uploaded to whitson+, and creating a new well with production data if it isn't. Alternatively, if the well exists, it appends the new production data to the existing entity. Let's explore the details!

Connect Snowflake to whitson+: Production Data Example

What does the snowflake.py file do?

This file shows an example of how one connect to a Snowflake database and a relevant whitson+ domain. The file uses a helper class WhitsonConnection in whitson_connect.py provided here.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
import whitson_connect
import pandas as pd

# ----------------------------------------------------------------------------------------------------------------
# 1. WHITSON CONNECTION
# ----------------------------------------------------------------------------------------------------------------

CLIENT = "your_domain_here" #This is the company suffix in your whitson urls ex. 'courses' in courses.whitson.com
CLIENT_ID = "your_client_id_here" # Available on request
CLIENT_SECRET = "your_client_secret" # Available on request

OVERWRITE_LAST_45_DAY_BOOL = True # Will overwrite last 45 days of prod data
SYNC_WELLHEADER_DATA = False

PROJECT_DICT = {
    ('PERMIAN'): 1,
}

PROJECT_ID_LIST = [value for value in PROJECT_DICT.values()]

whitson_connection = whitson_connect.WhitsonConnection(
    CLIENT, CLIENT_ID, CLIENT_SECRET
)

whitson_connection.access_token = whitson_connection.get_access_token_smart()

# ----------------------------------------------------------------------------------------------------------------
# 2. SNOWFLAKE CONNECTION
# ----------------------------------------------------------------------------------------------------------------

snowflake_connection = whitson_connection.snowflake_connection(
    account='YOUR_ACCOUNT',             # example: 'abc12345.east-us-1.azure'
    user='YOUR_USERNAME',               # example: 'WHITSON_READ'
    password='YOUR_PASSWORD',           # example: 'password123!'
    database='YOUR_DB_NAME'             # example: 'ABCDEFG_123456_WHITSON'
)

# ----------------------------------------------------------------------------------------------------------------
# 3. Wellheader Info
# ----------------------------------------------------------------------------------------------------------------

snowflake_query = """SELECT * FROM YOUR_DB_NAME.PUBLIC.WELL_DATA"""
snowflake_wells_df = whitson_connection.snowflake_table_to_dataframe(snowflake_connection, snowflake_query)
whitson_wells = whitson_connection.get_wells_from_projects(PROJECT_ID_LIST, 200)

for index, well in snowflake_wells_df.iterrows():
    wellname = well['well_name']
    if any(well["name"] == wellname for well in whitson_wells) and SYNC_WELLHEADER_DATA == False:
        print(f"Well with name '{wellname}' already exists.")
    else:
        well_info = {
            "project_id": PROJECT_DICT[('PERMIAN')],
            "name": well['well_name'],
            "uwi_api": well['uwi_api'],
            "t_res": well['t_res'] if well['t_res'] else 200,
            "p_res_i": well['p_res_i'] if well['p_res_i'] else 8000,
            "h": well['h'] if well['h'] else 200,
            "h_f": well['h_f'] if well['h_f'] else 200,
            "phi": well['phi'] if well['phi'] else 0.05,
            "l_w": float(well['l_w']) if well['l_w'] else 5280,
            "n_f": well['n_f'] if well['n_f'] else 100,
            "Sw_i": float(well['sw_i']) if well['sw_i'] else 30,
            "cr": well['cr'] if well['cr'] else 4,
            "gamma_m": well['gamma_m'] if well['gamma_m'] else 0,
            "gamma_f": well['gamma_f'] if well['gamma_f'] else 0,
            "salinity": well['salinity'] if well['salinity'] else 0,
            "country": well['country'],
            "state": well['state'],
            "county": well['county'],
            "sub_field": well['sub_field'],
            "reservoir": well['reservoir'],
            "formation": well['formation'],
            "surf_lat": well['surf_lat'],
            "surf_long": well['surf_long'],
            "bothole_lat": well['bothole_lat'],
            "bothole_long": well['bothole_long'],
            "operator": well['operator'],
            "pad_name": well['pad_name'],
            "reserve_classification": well['reserve_classification'],
            "fluid_pumped": float(well['fluid_pumped']) if pd.notna(well['fluid_pumped']) else None,  # Convert from gallons to STB * 0.0238095238
            "prop_pumped": float(well['prop_pumped']) if pd.notna(well['prop_pumped']) else None,
            "stages": int(well['stages']) if pd.notna(well['stages']) else None,
            "clusters": int(well['clusters']) if  pd.notna(well['clusters']) else None,
            "spacing": well['spacing'],
            "bounded": well['bounded'].lower() if well['bounded'] else 'bounded',
            "vintage": int(well['vintage']) if  pd.notna(well['vintage']) else None,
            "true_vertical_depth": float(well['true_vertical_depth']) if pd.notna(well['true_vertical_depth']) else None,
            "well_trajectory": well['well_trajectory'].lower() if well['well_trajectory'] else 'horizontal',
            "spud_date": well['spud_date'].strftime("%Y-%m-%d") if pd.notna(well['spud_date']) else None,
            "rig_release_date": well['rig_release_date'].strftime("%Y-%m-%d") if pd.notna(well['rig_release_date']) else None,
            "first_prod_date": well['first_prod_date'].strftime("%Y-%m-%d") if pd.notna(well['first_prod_date']) else None,
            "completion_date": well['completion_date'].strftime("%Y-%m-%d") if pd.notna(well['completion_date']) else None,
        }  
        if any(well["name"] == wellname for well in whitson_wells) and SYNC_WELLHEADER_DATA:
            print(f"Well with name {wellname} already exists, but we're updating the wellheader data.")
            well_id = whitson_connection.get_well_id_by_wellname(whitson_wells, wellname)
            well_info['id'] = well_id
            del well_info['project_id']
            del well_info['name']
            whitson_connection.edit_well_info(payload=[well_info])
            well_info = {}
        else:
            whitson_connection.create_well(payload=well_info)

# ----------------------------------------------------------------------------------------------------------------
# 4. Production Data
# ----------------------------------------------------------------------------------------------------------------

whitson_wells = whitson_connection.get_wells_from_projects(PROJECT_ID_LIST, 200)
uwi_id_dict = {item['uwi_api']: item['id'] for item in whitson_wells}

# Fetch daily production data from relevant Snowflake table
last_45_string = "where date >= DATEADD(DAY,-45, GETDATE())" if OVERWRITE_LAST_45_DAY_BOOL else ""
snowflake_query = f"""SELECT * FROM YOUR_DB_NAME.PUBLIC.PRODUCTION_DATA {last_45_string}"""
snowflake_prod_df = whitson_connection.snowflake_table_to_dataframe(snowflake_connection, snowflake_query)

# Bulk upload production data from Snowflake to whitson+
chunk_size = 25000 
num_chunks = len(snowflake_prod_df) // chunk_size + (1 if len(snowflake_prod_df) % chunk_size != 0 else 0)

for i in range(num_chunks):
    chunk_df = snowflake_prod_df[i * chunk_size: (i+1) * chunk_size]
    chunk_df.loc[:, 'well_id'] = chunk_df['well_id'].map(uwi_id_dict)
    payload = whitson_connection.convert_dataframe_to_prod_payload(chunk_df, columns_to_drop = ['insert_date'])
    whitson_connection.bulk_upload_production_to_well({"production_data": payload})

# ----------------------------------------------------------------------------------------------------------------
# 5. Run BHP calculations
# ----------------------------------------------------------------------------------------------------------------

whitson_connection.run_bhp_calc_in_projects(PROJECT_ID_LIST)

Connect Snowflake to whitson+: Bottomhole Pressure Example

What does the aries.py file do?

This file shows an example of how one connect to a Snowflake database and a relevant whitson+ domain. The file uses a helper class WhitsonConnection in whitson_connect.py provided here.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
import whitson_connect

CLIENT = "your_domain_here" # This is the company suffix in your whitson urls ex. 'courses' in courses.whitson.com
CLIENT_ID = "your_client_id_here" # Available on request
CLIENT_SECRET = "your_client_secret" # Available on request

PROJECT_DICT = {
    ('PERMIAN'): 1,
}

PROJECT_ID_LIST = [value for value in PROJECT_DICT.values()]

whitson_connection = whitson_connect.WhitsonConnection(
    CLIENT, CLIENT_ID, CLIENT_SECRET
)

whitson_connection.access_token = whitson_connection.get_access_token_smart()

# ------------------------------------------------------------------------------------------------------------------
# SNOWFLAKE CONNECTION
# ------------------------------------------------------------------------------------------------------------------

snowflake_connection = whitson_connection.snowflake_connection(
    account='YOUR_ACCOUNT',             # example: 'abc12345.east-us-1.azure'
    user='YOUR_USERNAME',               # example: 'WHITSON_READ'
    password='YOUR_PASSWORD',           # example: 'password123!'
    database='YOUR_DB_NAME'             # example: 'ABCDEFG_123456_WHITSON'
)

whitson_wells = whitson_connection.get_wells_from_projects(PROJECT_ID_LIST, 200)
uwi_id_dict = {item['uwi_api']: item['id'] for item in whitson_wells}

# ------------------------------------------------------------------------------------------------------------------
# 2.1 Upload well deviation surveys
# ------------------------------------------------------------------------------------------------------------------

snowflake_query = """SELECT * FROM YOUR_DB_NAME.PUBLIC.DEVIATION_SURVEY_DATA"""
snowflake_dev_df = whitson_connection.snowflake_table_to_dataframe(snowflake_connection, snowflake_query)

payloads_by_well = {}

for index, row in snowflake_dev_df.iterrows():
    uwi_api = row['well_id']
    md_value = row['md']
    tvd_value = row['tvd']

    if uwi_api in payloads_by_well:
        payloads_by_well[uwi_api].append({"md": md_value, "tvd": tvd_value})
    else:
        payloads_by_well[uwi_api] = [{"md": md_value, "tvd": tvd_value}]

for uwi_api, payload in payloads_by_well.items():
    well_id = uwi_id_dict.get(uwi_api)
    if whitson_connection.is_default_deviation_survey(well_id):
        try:
            whitson_connection.edit_well_deviation_survey(well_id=well_id, payload=payload)
        except:
            print(f'{uwi_api} (well_id: {well_id}): Something is off with this well id. This is the payload: {payload}')
            continue
    else:
            print(f'{uwi_api} (well_id: {well_id}): Well deviation survey already uploaded for this well.')

# ------------------------------------------------------------------------------------------------------------------
# 2.2 Upload top and bottom perforations
# ------------------------------------------------------------------------------------------------------------------

snowflake_query = """SELECT * FROM YOUR_DB_NAME.PUBLIC.PERFORATIONS"""
snowflake_perf_df = whitson_connection.snowflake_table_to_dataframe(snowflake_connection, snowflake_query)

processed_wells = {}

for index, row in snowflake_perf_df.iterrows():
    uwi_api = row['uwi']
    top_md = row['depthtop']
    bottom_md = row['depthbtm']

    if uwi_api in processed_wells:
        continue

    payload = {
        "top_perforation_md": top_md,
        "bottom_perforation_md": bottom_md
    }

    well_id = uwi_id_dict.get(uwi_api)

    if whitson_connection.is_default_perforated_interval(well_id):
        try: 
            whitson_connection.edit_perf_interval(well_id, payload=payload)
        except:
            print(f'{uwi_api} (well_id: {well_id}): Something is off with this well id. This is the payload: {payload}')
    else: 
        print(f'{uwi_api} (well_id: {well_id}): Perforated interval is already uploaded for this well.')

    processed_wells[uwi_api] = True

# -------------------------------------------------------------------------------------------------------------------
# 2.3.1 Upload Casings First
# -------------------------------------------------------------------------------------------------------------------

snowflake_query = """SELECT * FROM YOUR_DB_NAME.PUBLIC.CASING"""
snowflake_casing_df = whitson_connection.snowflake_table_to_dataframe(snowflake_connection, snowflake_query)
well_payloads = {}

def _get_well_casing_data(uwi, group):
    """ Returns the casing data in relevant format. Returns None if not a production casing is found. """

    if 'Production' not in group['casingdes'].values:
        return None, None

    well_id = uwi_id_dict.get(uwi)
    top_md = 0
    bottom_md = whitson_connection.get_max_md_well_deviation_data(well_id)
    d_casing_inner = whitson_connection.round_to_significant_digits(group['szidnommincalc'].min() * 39.37) # in this example DB is in m, need to convert to inches
    use_from_date = group['dttmspud'].iloc[0].strftime('%Y-%m-%d')

    well_casing_data = [ {
        "d_casing_inner": d_casing_inner,  
        "pipe_number": 1,
        "k_casing": roughness,
        "bottom_md": bottom_md,
        "top_md": top_md,  
    }
    ]

    return well_casing_data, use_from_date

# Define some example values
last_valve_open = True 
calculate_from_gauge = False 
compute_through = "flowing_side" 
compute_static_down_to = "end_of_tubing"
roughness = 0.0006 # default in whitson+
well_data_casing = []

for uwi, group in snowflake_casing_df.groupby('uwi'):

    well_data_casing, use_from_date = _get_well_casing_data(uwi, group)

    if well_data_casing is not None: 

        well_payload = {
            "use_from_date": use_from_date,
            "flow_path": "casing",  # Placeholder, replace with actual logic if available
            "lift_method": "none",  # Placeholder, replace with actual logic if available
            "gas_lift_config": 'automatic',  # Placeholder, replace with actual logic if available
            "last_valve_open": last_valve_open,
            "gauge_depth": None,  # Placeholder, replace with actual logic if available
            "calculate_from_gauge": calculate_from_gauge,
            "compute_through": compute_through,
            "compute_static_down_to": compute_static_down_to,
            "well_data_casing": well_data_casing,
            "well_data_tubing": [],  # Placeholder, add logic to populate if needed
            "gas_lift_data": []  # Placeholder, add logic to populate if needed
        }

        # Append the payload to the well_payloads dictionary
        if uwi not in well_payloads:
            well_payloads[uwi] = []

        well_payloads[uwi].append(well_payload)

# -------------------------------------------------------------------------------------------------------------------
# # 2.3.2 Upload Tubings Second
# -------------------------------------------------------------------------------------------------------------------

snowflake_query = """SELECT * FROM YOUR_DB_NAME.PUBLIC.TUBING"""
snowflake_tubing_df = whitson_connection.snowflake_table_to_dataframe(snowflake_connection, snowflake_query)

# Iterate through each UWI
for uwi, tubing_group in snowflake_tubing_df.groupby('uwi'):

    for install_date, tubing in tubing_group.groupby('dttmspud'):

        tubing = tubing[tubing['compsubtyp'] == 'tubing']
        tubing_bottom_md = tubing['depthbtm'].max() * 3.28084 # Convert from meter to ft
        install_date = install_date.strftime('%Y-%m-%d')

        # Filter rows where 'compsubtyp' is 'tubing'
        df_tubing = tubing[tubing['compsubtyp'] == 'tubing']

        # Check if all values in 'szidnom' and 'szodnom' are the same
        szidnom_unique = df_tubing['szidnom'].nunique()
        szodnom_unique = df_tubing['szodnom'].nunique()

        if szidnom_unique != 1 or szodnom_unique != 1:
            print("Values of 'szidnom' and/or 'szodnom' are not the same for all tubing sizes.")

        # Select the most common value for tubing ID and OD 'szidnom' and 'szodnom'
        tubing_id = whitson_connection.round_to_significant_digits(df_tubing['szidnom'].mode().iloc[0] * 39.37) # in this example DB is in m, need to convert to inches
        tubing_od = whitson_connection.round_to_significant_digits(df_tubing['szodnom'].mode().iloc[0] * 39.37) # in this example DB is in m, need to convert to inches

        if tubing_od <= tubing_id: 
            print(f'WARNING! Tubing ID ({tubing_od}) > Tubing OD ({tubing_id}). Tubing OD will be set to tubing ID / 0.84 : {tubing_od/0.84}')
            tubing_od = whitson_connection.round_to_significant_digits(tubing_id/0.84) 

        well_data_tubing = [
            {
                "pipe_number": 1,
                "d_tubing_inner": tubing_id, 
                "d_tubing_outer": tubing_od,
                "k_tubing": roughness,
                "bottom_md": tubing_bottom_md, # Assembly_Btm_Md of last tubing string
            }
        ]

        well_data_casing = well_payloads.get(uwi, [{}])[0].get('well_data_casing')

        if well_data_casing == None:
            continue 

        glv_exists = bool(tubing['des'].str.contains('AGL|GL', case=False, na=False).any())
        lift_method = 'gas_lift' if glv_exists else 'none'

        # Define the well payload for this api_no12
        well_payload = {
            "use_from_date": install_date,
            "flow_path": "unknown", 
            "lift_method": lift_method, 
            "gas_lift_config": 'automatic', 
            "last_valve_open": last_valve_open,
            "gauge_depth": None,  
            "calculate_from_gauge": calculate_from_gauge,
            "compute_through": compute_through,
            "compute_static_down_to": compute_static_down_to,
            "well_data_casing": well_data_casing,
            "well_data_tubing": well_data_tubing,
            "gas_lift_data": []  
        }

        # Append the payload to the well_payloads dictionary
        if uwi not in well_payloads:
            well_payloads[uwi] = []

        well_payloads[uwi].append(well_payload)


for uwi, new_well_configurations in well_payloads.items():
        well_id = uwi_id_dict.get(uwi)
        existing_wellbore_data = whitson_connection.get_well_data(well_id)

        # Delete the default wellbore if the well has a default
        try:
            if existing_wellbore_data and whitson_connection.is_default_well_configuration(existing_wellbore_data): 
                well_data_id = existing_wellbore_data[0]['id']
                whitson_connection.delete_wellbore_config_by_well_data_id(well_data_id=well_data_id)

            # Check if there is a well config upload for this date already
            for new_well_configuration in new_well_configurations:
                if whitson_connection.is_wellbore_configuration_already_uploaded(new_well_configuration, existing_wellbore_data):
                    print('already uploaded for this date')
                    continue
                else:
                    try:
                        whitson_connection.upload_well_data_to_well(well_id, new_well_configurations)
                    except:
                        print(str(uwi) +': failed for this well. Here is the payload. ' + str(new_well_configuration))
        except:
            print('Something went wrong. Here is the api number ' + str(well_id) + 'and existing wellbore data ' + str(existing_wellbore_data)