Connecting to SQL Server with Python

About

James Alexander

  • Consultant with Leaf Software Solutions
  • 10 years experience writing C# / .NET applications - Mostly FinTech
  • Python Web Developer / DevOps since 2017
  • Frequent IndyPy attendee since 2014
In [ ]:
 
In [ ]:
 

Application Background

  • Parking Garage Customer Service Application
  • ~700 devices
  • ~15 CSRs
  • Hosted in Azure with App Service, Azure SQL, REDIS, and Azure SignalR

In [ ]:
 
In [ ]:
 

Problem

  • Irregular outages at Irregular times
  • Thundering Herd problem during device reconnects
In [ ]:
 
In [ ]:
 

Sample Loging Table Output

In [ ]:
 
In [1]:
import os
from datetime import datetime, timedelta, timezone
import pandas as pd
pd.set_option('display.max_rows', 1000)
pd.set_option('display.max_colwidth', 400)
import pymssql
import sqlalchemy
import pandas_profiling as pp
# import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
%matplotlib inline
In [3]:
from sqlalchemy import create_engine
from sqlalchemy.orm import Session

engine = create_engine(os.environ['NOTEBOOK_DB'])
In [ ]:
 
In [ ]:
 

SQL Connection String GOT-CHA

For Azure SQL Connections, the full username is <user>@<server> E.g.:

<protocol>://<user>:<pass>@<server>/<database>

mssql+pymssql://username@servername:password@servername.database.windows.net/database_name?charset=utf8

In [ ]:
 

Alternative ODBC Connection String

In [ ]:
server = os.environ['SERVER']
database = os.environ['DATABASE']
username = os.environ['USERNAME']
password = os.environ['PASSWORD']

driver= '{ODBC Driver 17 for SQL Server}'
cstr = f'DRIVER={driver};SERVER={server};PORT=1433;DATABASE={database};UID={username};PWD={password};Connection Timeout=30'
params = urllib.parse.quote_plus(cstr)
conn_str = 'mssql+pyodbc:///?odbc_connect={}'.format(params)
In [ ]:
 
In [50]:
def get_log_data():
    sql = f'''SELECT TOP 1000000
                      [Id],convert(varchar(50), [Date], 121) as LogDate,
                      [Date] as TestDate,
                      [Thread]
                      ,[Level],[Logger],[Event]
                      ,[Context],[Message],convert(varchar(50), [DeviceDate], 121) as DeviceDate
                      ,[Exception],[EndpointId],[CsrId]
                      ,[FacilityId],[CallId]
                  FROM [dbo].[Logging]
                  ORDER BY Id DESC'''

    return pd.read_sql(sql, engine)
In [ ]:
 
In [51]:
tmp = get_log_data()
tmp.head()
Out[51]:
Id LogDate TestDate Thread Level Logger Event Context Message DeviceDate Exception EndpointId CsrId FacilityId CallId
0 116510334 2019-11-19 01:53:26.8870000 +00:00 b'p\x878\xd9\x0f\x00\x00\x00\n\xab\x00\x00\x00\x00\x07\xe0' 80 40000 global_asax 3 (null) Search for endpoints with login: 121emdevice5 2019-11-19 01:53:26.8870000 +00:00 None NaN NaN NaN NaN
1 116510333 2019-11-19 01:53:26.5800000 +00:00 b'@\xaf\t\xd9\x0f\x00\x00\x00\n\xab\x00\x00\x00\x00\x07\xe0' 102 40000 global_asax 3 (null) Search for endpoints with login: 003018573728 2019-11-19 01:53:26.5800000 +00:00 None NaN NaN NaN NaN
2 116510332 2019-11-19 01:53:26.3000000 +00:00 b'\xc0\xf5\xde\xd8\x0f\x00\x00\x00\n\xab\x00\x00\x00\x00\x07\xe0' 140 40000 global_asax 3 (null) Search for endpoints with login: 003018570cd4 2019-11-19 01:53:26.3000000 +00:00 None NaN NaN NaN NaN
3 116510331 2019-11-19 01:53:26.2100000 +00:00 b' :\xd1\xd8\x0f\x00\x00\x00\n\xab\x00\x00\x00\x00\x07\xe0' 208 40000 global_asax 3 (null) Search for endpoints with login: 00301807cdec 2019-11-19 01:53:26.2100000 +00:00 None NaN NaN NaN NaN
4 116510330 2019-11-19 01:53:25.9600000 +00:00 b'\x80\x14\xab\xd8\x0f\x00\x00\x00\n\xab\x00\x00\x00\x00\x07\xe0' 126 40000 global_asax 3 (null) Search for endpoints with login: 0030183b64f6 2019-11-19 01:53:25.9600000 +00:00 None NaN NaN NaN NaN
In [52]:
tmp.dtypes
Out[52]:
Id              int64
LogDate        object
TestDate       object
Thread         object
Level           int64
Logger         object
Event           int64
Context        object
Message        object
DeviceDate     object
Exception      object
EndpointId    float64
CsrId         float64
FacilityId    float64
CallId        float64
dtype: object
In [ ]:
 
In [24]:
tmp.describe()
Out[24]:
Id Level Event EndpointId CsrId FacilityId CallId dayOfWeek hour
count 9.981240e+05 998124.000000 998124.000000 998124.000000 998124.000000 998124.000000 998124.0 998124.000000 998124.000000
mean 1.159234e+08 39725.444935 4.508255 177.453213 0.867205 2.663168 0.0 3.000493 11.513677
std 2.885934e+05 1647.621419 5.058757 736.588870 69.005271 56.942589 0.0 3.000001 6.739440
min 1.154236e+08 30000.000000 2.000000 0.000000 0.000000 0.000000 0.0 0.000000 0.000000
25% 1.156735e+08 40000.000000 3.000000 0.000000 0.000000 0.000000 0.0 0.000000 6.000000
50% 1.159235e+08 40000.000000 3.000000 0.000000 0.000000 0.000000 0.0 6.000000 12.000000
75% 1.161732e+08 40000.000000 3.000000 0.000000 0.000000 0.000000 0.0 6.000000 17.000000
max 1.164236e+08 70000.000000 25.000000 3864.000000 5781.000000 1376.000000 0.0 6.000000 23.000000
In [ ]:
 
In [26]:
pp.ProfileReport(tmp)
Out[26]:

Overview

Dataset info

Number of variables 18
Number of observations 998124
Total Missing (%) 0.0%
Total size in memory 137.1 MiB
Average record size in memory 144.0 B

Variables types

Numeric 8
Categorical 8
Boolean 1
Date 0
Text (Unique) 0
Rejected 1
Unsupported 0

Warnings

  • CallId has constant value 0 Rejected
  • CsrId is highly skewed (γ1 = 79.904) Skewed
  • CsrId has 997965 / 100.0% zeros Zeros
  • DeviceDate has a high cardinality: 972827 distinct values Warning
  • EndpointId has 942193 / 94.4% zeros Zeros
  • FacilityId is highly skewed (γ1 = 21.52) Skewed
  • FacilityId has 995926 / 99.8% zeros Zeros
  • LogDate has a high cardinality: 972827 distinct values Warning
  • Message has a high cardinality: 26957 distinct values Warning
  • Thread has a high cardinality: 276 distinct values Warning
  • hour has 33620 / 3.4% zeros Zeros

Variables

CallId
Constant

This variable is constant and should be ignored for analysis

Constant value 0

Context
Categorical

Distinct count 3
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
(null)
998121
ChangePassword
 
2
SendResetEmail
 
1
Value Count Frequency (%)  
(null) 998121 100.0%
 
ChangePassword 2 0.0%
 
SendResetEmail 1 0.0%
 

CsrId
Numeric

Distinct count 74
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 0.8672
Minimum 0
Maximum 5781
Zeros (%) 100.0%

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
Median 0
Q3 0
95-th percentile 0
Maximum 5781
Range 5781
Interquartile range 0

Descriptive statistics

Standard deviation 69.005
Coef of variation 79.572
Kurtosis 6401.8
Mean 0.8672
MAD 1.7341
Skewness 79.904
Sum 865580
Variance 4761.7
Memory size 7.6 MiB
Value Count Frequency (%)  
0.0 997965 100.0%
 
5236.0 11 0.0%
 
5735.0 11 0.0%
 
5234.0 7 0.0%
 
5466.0 6 0.0%
 
5697.0 5 0.0%
 
5742.0 4 0.0%
 
5575.0 4 0.0%
 
5733.0 4 0.0%
 
5521.0 4 0.0%
 
Other values (64) 103 0.0%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 997965 100.0%
 
1011.0 1 0.0%
 
3101.0 1 0.0%
 
4108.0 3 0.0%
 
4120.0 2 0.0%
 

Maximum 5 values

Value Count Frequency (%)  
5765.0 2 0.0%
 
5771.0 3 0.0%
 
5774.0 3 0.0%
 
5779.0 3 0.0%
 
5781.0 1 0.0%
 

DeviceDate
Categorical

Distinct count 972827
Unique (%) 97.5%
Missing (%) 0.0%
Missing (n) 0
2019-11-17 01:00:38.740000+00:00
 
13
2019-11-17 22:06:41.370000+00:00
 
12
2019-11-17 01:16:38.850000+00:00
 
12
Other values (972824)
998087
Value Count Frequency (%)  
2019-11-17 01:00:38.740000+00:00 13 0.0%
 
2019-11-17 22:06:41.370000+00:00 12 0.0%
 
2019-11-17 01:16:38.850000+00:00 12 0.0%
 
2019-11-18 18:42:42.637000+00:00 12 0.0%
 
2019-11-17 18:00:39.743000+00:00 12 0.0%
 
2019-11-17 11:16:40.843000+00:00 12 0.0%
 
2019-11-18 18:30:39.190000+00:00 12 0.0%
 
2019-11-17 13:36:41.353000+00:00 12 0.0%
 
2019-11-17 02:56:36.060000+00:00 11 0.0%
 
2019-11-18 17:44:34.473000+00:00 11 0.0%
 
Other values (972817) 998005 100.0%
 

EndpointId
Numeric

Distinct count 609
Unique (%) 0.1%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 177.45
Minimum 0
Maximum 3864
Zeros (%) 94.4%

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
Median 0
Q3 0
95-th percentile 2551
Maximum 3864
Range 3864
Interquartile range 0

Descriptive statistics

Standard deviation 736.59
Coef of variation 4.1509
Kurtosis 14.114
Mean 177.45
MAD 335.02
Skewness 3.982
Sum 177120000
Variance 542560
Memory size 7.6 MiB
Value Count Frequency (%)  
0.0 942193 94.4%
 
3358.0 15565 1.6%
 
3400.0 693 0.1%
 
3473.0 638 0.1%
 
3025.0 607 0.1%
 
3832.0 538 0.1%
 
3253.0 507 0.1%
 
2570.0 494 0.0%
 
3684.0 485 0.0%
 
3672.0 481 0.0%
 
Other values (599) 35923 3.6%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 942193 94.4%
 
48.0 8 0.0%
 
1051.0 37 0.0%
 
1052.0 33 0.0%
 
1060.0 16 0.0%
 

Maximum 5 values

Value Count Frequency (%)  
3855.0 13 0.0%
 
3856.0 17 0.0%
 
3862.0 52 0.0%
 
3863.0 37 0.0%
 
3864.0 9 0.0%
 

Event
Numeric

Distinct count 13
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 4.5083
Minimum 2
Maximum 25
Zeros (%) 0.0%

Quantile statistics

Minimum 2
5-th percentile 3
Q1 3
Median 3
Q3 3
95-th percentile 23
Maximum 25
Range 23
Interquartile range 0

Descriptive statistics

Standard deviation 5.0588
Coef of variation 1.1221
Kurtosis 8.1112
Mean 4.5083
MAD 2.7637
Skewness 3.1404
Sum 4499798
Variance 25.591
Memory size 7.6 MiB
Value Count Frequency (%)  
3 914471 91.6%
 
23 53733 5.4%
 
17 25271 2.5%
 
24 2198 0.2%
 
16 1995 0.2%
 
11 248 0.0%
 
19 169 0.0%
 
10 13 0.0%
 
4 11 0.0%
 
8 8 0.0%
 
Other values (3) 7 0.0%
 

Minimum 5 values

Value Count Frequency (%)  
2 3 0.0%
 
3 914471 91.6%
 
4 11 0.0%
 
8 8 0.0%
 
10 13 0.0%
 

Maximum 5 values

Value Count Frequency (%)  
17 25271 2.5%
 
19 169 0.0%
 
23 53733 5.4%
 
24 2198 0.2%
 
25 1 0.0%
 

Exception
Categorical

Distinct count 12
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
0
998089
System.Web.Http.HttpResponseException: Processing of the HTTP request resulted in an exception. Please see the HTTP response returned by the 'Response' property of this exception for details. at RPM2Way.Controllers.CallsController.GetCall(Int32 id) in C:\agent\_work\1\s\RPM2Way\Controllers\CallsController.cs:line 83
 
12
System.Web.Http.HttpResponseException: Processing of the HTTP request resulted in an exception. Please see the HTTP response returned by the 'Response' property of this exception for details. at RPM2Way.Controllers.LoginController.PostLogin(Login credentials) in C:\agent\_work\1\s\RPM2Way\Controllers\LoginController.cs:line 64
 
10
Other values (9)
 
13
Value Count Frequency (%)  
0 998089 100.0%
 
System.Web.Http.HttpResponseException: Processing of the HTTP request resulted in an exception. Please see the HTTP response returned by the 'Response' property of this exception for details. at RPM2Way.Controllers.CallsController.GetCall(Int32 id) in C:\agent\_work\1\s\RPM2Way\Controllers\CallsController.cs:line 83 12 0.0%
 
System.Web.Http.HttpResponseException: Processing of the HTTP request resulted in an exception. Please see the HTTP response returned by the 'Response' property of this exception for details. at RPM2Way.Controllers.LoginController.PostLogin(Login credentials) in C:\agent\_work\1\s\RPM2Way\Controllers\LoginController.cs:line 64 10 0.0%
 
System.FormatException: Memory Collision--abort save at Parker.Business.FacilityManager.MapDtoToData(IFacility dto, Facility data) in C:\agent\_work\1\s\Parker.Business\Manager\FacilityManager.cs:line 180 at Parker.Business.BaseManager`3.BaseUpdate(DtoType dto) in C:\agent\_work\1\s\Parker.Business\Manager\BaseManager.cs:line 49 at RPM2Way.Controllers.FacilitiesController.PutFacility(Int32 id, Facility facility) in C:\agent\_work\1\s\RPM2Way\Controllers\FacilitiesController.cs:line 126 5 0.0%
 
System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: The underlying connection was closed: A connection that was expected to be kept alive was closed by the server. at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult) at System.Net.Http.HttpClientHandler.GetResponseCallback(IAsyncResult ar) --- End of inner exception stack trace --- at Twilio.Http.SystemNetHttpClient.MakeRequest(Request request) at Twilio.Clients.TwilioRestClient.Request(Request request) 1 0.0%
 
Twilio.Exceptions.ApiConnectionException: Connection Error: DELETEhttps://api.twilio.com/2010-04-01/Accounts/ACe12e496cc87d50f774696d8c92585cec/Recordings/RE7d73d3f8b85473b3ae4b6c9c36e4afa5.json ---> System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: The underlying connection was closed: A connection that was expected to be kept alive was closed by the server. at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult) at System.Net.Http.HttpClientHandler.GetResponseCallback(IAsyncResult ar) --- End of inner exception stack trace --- at Twilio.Http.SystemNetHttpClient.MakeRequest(Request request) at Twilio.Clients.TwilioRestClient.Request(Request request) --- End of inner exception stack trace --- at Twilio.Clients.TwilioRestClient.Request(Request request) at Twilio.Rest.Api.V2010.Account.RecordingResource.Delete(DeleteRecordingOptions options, ITwilioRestClient client) at RPM2Way.Controllers.TwilioRecordCallbackController.PostTwilioRecordCallback() in C:\agent\_work\1\s\RPM2Way\Controllers\TwilioRecordCallbackController.cs:line 39 1 0.0%
 
System.Data.Entity.Core.UpdateException: An error occurred while updating the entries. See the inner exception for details. ---> System.Data.SqlClient.SqlException: Violation of UNIQUE KEY constraint 'UniqueEntry'. Cannot insert duplicate key in object 'dbo.FacilityCallCenters'. The duplicate key value is (1121, 1020, , , ). The statement has been terminated. at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction) at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose) at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady) at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData() at System.Data.SqlClient.SqlDataReader.get_MetaData() at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted) at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest) at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry) at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method) at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method) at System.Data.Entity.Infrastructure.Interception.InternalDispatcher`1.Dispatch[TTarget,TInterceptionContext,TResult](TTarget target, Func`3 operation, TInterceptionContext interceptionContext, Action`3 executing, Action`3 executed) at System.Data.Entity.Infrastructure.Interception.DbCommandDispatcher.Reader(DbCommand command, DbCommandInterceptionContext interceptionContext) at System.Data.Entity.Core.Mapping.Update.Internal.DynamicUpdateCommand.Execute(Dictionary`2 identifierValues, List`1 generatedValues) at System.Data.Entity.Core.Mapping.Update.Internal.UpdateTranslator.Update() --- End of inner exception stack trace --- at System.Data.Entity.Core.Mapping.Update.Internal.UpdateTranslator.Update() at System.Data.Entity.Core.Objects.ObjectContext.ExecuteInTransaction[T](Func`1 func, IDbExecutionStrategy executionStrategy, Boolean startLocalTransaction, Boolean releaseConnectionOnSuccess) at System.Data.Entity.Core.Objects.ObjectContext.SaveChangesToStore(SaveOptions options, IDbExecutionStrategy executionStrategy, Boolean startLocalTransaction) at System.Data.Entity.SqlServer.DefaultSqlExecutionStrategy.Execute[TResult](Func`1 operation) at System.Data.Entity.Core.Objects.ObjectContext.SaveChangesInternal(SaveOptions options, Boolean executeInExistingTransaction) at System.Data.Entity.Internal.InternalContext.SaveChanges() 1 0.0%
 
System.Threading.Tasks.TaskCanceledException: A task was canceled. at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Net.Http.HttpClient.d__58.MoveNext() --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at Parker.Business.CallManager.d__11.MoveNext() in C:\agent\_work\1\s\Parker.Business\Manager\CallManager.cs:line 264 --- End of stack trace from previous location where exception was thrown --- at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at RPM2Way.Controllers.CallRecordingController.d__5.MoveNext() in C:\agent\_work\1\s\RPM2Way\Controllers\CallRecordingController.cs:line 139 1 0.0%
 
System.Net.WebException: The underlying connection was closed: A connection that was expected to be kept alive was closed by the server. at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult) at System.Net.Http.HttpClientHandler.GetResponseCallback(IAsyncResult ar) 1 0.0%
 
System.Web.Http.HttpResponseException: Processing of the HTTP request resulted in an exception. Please see the HTTP response returned by the 'Response' property of this exception for details. at RPM2Way.Controllers.LoginController.PostLogin(Login credentials) in C:\agent\_work\1\s\RPM2Way\Controllers\LoginController.cs:line 126 1 0.0%
 
Other values (2) 2 0.0%
 

FacilityId
Numeric

Distinct count 138
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 2.6632
Minimum 0
Maximum 1376
Zeros (%) 99.8%

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
Median 0
Q3 0
95-th percentile 0
Maximum 1376
Range 1376
Interquartile range 0

Descriptive statistics

Standard deviation 56.943
Coef of variation 21.382
Kurtosis 464.92
Mean 2.6632
MAD 5.3146
Skewness 21.52
Sum 2658200
Variance 3242.5
Memory size 7.6 MiB
Value Count Frequency (%)  
0.0 995926 99.8%
 
1126.0 242 0.0%
 
1272.0 147 0.0%
 
1042.0 107 0.0%
 
1352.0 103 0.0%
 
1351.0 74 0.0%
 
1311.0 70 0.0%
 
1261.0 68 0.0%
 
1376.0 59 0.0%
 
1030.0 58 0.0%
 
Other values (128) 1270 0.1%
 

Minimum 5 values

Value Count Frequency (%)  
0.0 995926 99.8%
 
1029.0 20 0.0%
 
1030.0 58 0.0%
 
1031.0 11 0.0%
 
1032.0 9 0.0%
 

Maximum 5 values

Value Count Frequency (%)  
1368.0 23 0.0%
 
1369.0 48 0.0%
 
1372.0 9 0.0%
 
1375.0 26 0.0%
 
1376.0 59 0.0%
 

Id
Numeric

Distinct count 998124
Unique (%) 100.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 115920000
Minimum 115423564
Maximum 116423564
Zeros (%) 0.0%

Quantile statistics

Minimum 115423564
5-th percentile 115470000
Q1 115670000
Median 115920000
Q3 116170000
95-th percentile 116370000
Maximum 116423564
Range 1000000
Interquartile range 499740

Descriptive statistics

Standard deviation 288590
Coef of variation 0.0024895
Kurtosis -1.1995
Mean 115920000
MAD 249920
Skewness 0.0003236
Sum 115705974036820
Variance 83286000000
Memory size 7.6 MiB
Value Count Frequency (%)  
115425240 1 0.0%
 
115689980 1 0.0%
 
116059144 1 0.0%
 
116036615 1 0.0%
 
116038662 1 0.0%
 
116032517 1 0.0%
 
116034564 1 0.0%
 
116044803 1 0.0%
 
116046850 1 0.0%
 
116040705 1 0.0%
 
Other values (998114) 998114 100.0%
 

Minimum 5 values

Value Count Frequency (%)  
115423564 1 0.0%
 
115423565 1 0.0%
 
115423566 1 0.0%
 
115423567 1 0.0%
 
115423568 1 0.0%
 

Maximum 5 values

Value Count Frequency (%)  
116423560 1 0.0%
 
116423561 1 0.0%
 
116423562 1 0.0%
 
116423563 1 0.0%
 
116423564 1 0.0%
 

Level
Numeric

Distinct count 4
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 39725
Minimum 30000
Maximum 70000
Zeros (%) 0.0%

Quantile statistics

Minimum 30000
5-th percentile 40000
Q1 40000
Median 40000
Q3 40000
95-th percentile 40000
Maximum 70000
Range 40000
Interquartile range 0

Descriptive statistics

Standard deviation 1647.6
Coef of variation 0.041475
Kurtosis 34.558
Mean 39725
MAD 536.24
Skewness -5.4404
Sum 39650920000
Variance 2714700
Memory size 7.6 MiB
Value Count Frequency (%)  
40000 970568 97.2%
 
30000 27517 2.8%
 
70000 35 0.0%
 
60000 4 0.0%
 

Minimum 5 values

Value Count Frequency (%)  
30000 27517 2.8%
 
40000 970568 97.2%
 
60000 4 0.0%
 
70000 35 0.0%
 

Maximum 5 values

Value Count Frequency (%)  
30000 27517 2.8%
 
40000 970568 97.2%
 
60000 4 0.0%
 
70000 35 0.0%
 

LogDate
Categorical

Distinct count 972827
Unique (%) 97.5%
Missing (%) 0.0%
Missing (n) 0
2019-11-17 01:00:38.740000+00:00
 
13
2019-11-17 22:06:41.370000+00:00
 
12
2019-11-17 01:16:38.850000+00:00
 
12
Other values (972824)
998087
Value Count Frequency (%)  
2019-11-17 01:00:38.740000+00:00 13 0.0%
 
2019-11-17 22:06:41.370000+00:00 12 0.0%
 
2019-11-17 01:16:38.850000+00:00 12 0.0%
 
2019-11-18 18:42:42.637000+00:00 12 0.0%
 
2019-11-17 18:00:39.743000+00:00 12 0.0%
 
2019-11-17 11:16:40.843000+00:00 12 0.0%
 
2019-11-18 18:30:39.190000+00:00 12 0.0%
 
2019-11-17 13:36:41.353000+00:00 12 0.0%
 
2019-11-17 02:56:36.060000+00:00 11 0.0%
 
2019-11-18 17:44:34.473000+00:00 11 0.0%
 
Other values (972817) 998005 100.0%
 

Logger
Categorical

Distinct count 13
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
global_asax
939669
EndpointLogsController
 
53733
LoginController
 
2379
Other values (10)
 
2343
Value Count Frequency (%)  
global_asax 939669 94.1%
 
EndpointLogsController 53733 5.4%
 
LoginController 2379 0.2%
 
TwilioWaitController 1677 0.2%
 
TwilioSipAnswerController 318 0.0%
 
GateVendLogsController 248 0.0%
 
TwilioQueueActionController 70 0.0%
 
CallsController 12 0.0%
 
FacilitiesController 8 0.0%
 
UserManager 3 0.0%
 
Other values (3) 7 0.0%
 

Message
Categorical

Distinct count 26957
Unique (%) 2.7%
Missing (%) 0.0%
Missing (n) 0
NODE readbutton cmd failed
 
14025
CALL Call button push detected
 
3368
ANSWER ANSWER message received
 
3251
Other values (26954)
977480
Value Count Frequency (%)  
NODE readbutton cmd failed 14025 1.4%
 
CALL Call button push detected 3368 0.3%
 
ANSWER ANSWER message received 3251 0.3%
 
ENDCALL ENDCALL message received 3084 0.3%
 
RINGER Begin ringing 2859 0.3%
 
StartWebRTC starting 2799 0.3%
 
StartWebRTC connection initialized 2799 0.3%
 
START Endpoint Page Loaded 2191 0.2%
 
SIGNALR signalR start 2098 0.2%
 
EVENT Call button pressed 1958 0.2%
 
Other values (26947) 959692 96.1%
 

Thread
Categorical

Distinct count 276
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
106
 
9626
51
 
9512
80
 
8535
Other values (273)
970451
Value Count Frequency (%)  
106 9626 1.0%
 
51 9512 1.0%
 
80 8535 0.9%
 
68 8419 0.8%
 
16 8409 0.8%
 
15 8268 0.8%
 
21 8000 0.8%
 
107 7869 0.8%
 
114 7858 0.8%
 
13 7776 0.8%
 
Other values (266) 913852 91.6%
 

date
Categorical

Distinct count 2
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
2019-11-17
499144
2019-11-18
498980
Value Count Frequency (%)  
2019-11-17 499144 50.0%
 
2019-11-18 498980 50.0%
 

dayOfWeek
Boolean

Distinct count 2
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Mean 3.0005
6
499144
0
498980
Value Count Frequency (%)  
6 499144 50.0%
 
0 498980 50.0%
 

hour
Numeric

Distinct count 24
Unique (%) 0.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 11.514
Minimum 0
Maximum 23
Zeros (%) 3.4%

Quantile statistics

Minimum 0
5-th percentile 1
Q1 6
Median 12
Q3 17
95-th percentile 21
Maximum 23
Range 23
Interquartile range 11

Descriptive statistics

Standard deviation 6.7394
Coef of variation 0.58534
Kurtosis -1.2247
Mean 11.514
MAD 5.8672
Skewness -0.063861
Sum 11492077
Variance 45.42
Memory size 7.6 MiB
Value Count Frequency (%)  
20 53127 5.3%
 
21 51881 5.2%
 
19 49354 4.9%
 
18 45681 4.6%
 
17 45266 4.5%
 
16 44641 4.5%
 
15 43977 4.4%
 
14 43491 4.4%
 
2 43204 4.3%
 
1 42831 4.3%
 
Other values (14) 534671 53.6%
 

Minimum 5 values

Value Count Frequency (%)  
0 33620 3.4%
 
1 42831 4.3%
 
2 43204 4.3%
 
3 42418 4.2%
 
4 42046 4.2%
 

Maximum 5 values

Value Count Frequency (%)  
19 49354 4.9%
 
20 53127 5.3%
 
21 51881 5.2%
 
22 26823 2.7%
 
23 21287 2.1%
 

index
Numeric

Distinct count 998124
Unique (%) 100.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 500120
Minimum 0
Maximum 999999
Zeros (%) 0.0%

Quantile statistics

Minimum 0
5-th percentile 50097
Q1 250350
Median 500050
Q3 750090
95-th percentile 949980
Maximum 999999
Range 999999
Interquartile range 499740

Descriptive statistics

Standard deviation 288590
Coef of variation 0.57705
Kurtosis -1.1995
Mean 500120
MAD 249920
Skewness -0.00032551
Sum 499179194341
Variance 83286000000
Memory size 7.6 MiB
Value Count Frequency (%)  
1 2047 0.2%
 
1 823495 82.5%
 
1 897251 89.9%
 
1 899298 90.1%
 
1 893153 89.5%
 
1 895200 89.7%
 
1 807135 80.9%
 
1 809182 81.1%
 
1 803037 80.5%
 
1 805084 80.7%
 
Other values (998114) 499171559459 50010976.5%
 

Minimum 5 values

Value Count Frequency (%)  
1 2047 0.2%
 
1 823495 82.5%
 
1 897251 89.9%
 
1 899298 90.1%
 
1 893153 89.5%
 

Maximum 5 values

Value Count Frequency (%)  
1 374300 37.5%
 
1 364059 36.5%
 
1 362010 36.3%
 
1 368153 36.9%
 
1 0 0.0%
 

Correlations

Sample

Id LogDate Thread Level Logger Event Context Message DeviceDate Exception EndpointId CsrId FacilityId CallId dayOfWeek date hour
0 116423564 2019-11-18 22:13:21.053000+00:00 146 40000 global_asax 3 (null) Search for endpoints with login: b827eb70be91 2019-11-18 22:13:21.053000+00:00 0 0.0 0.0 0.0 0.0 0 2019-11-18 22
1 116423563 2019-11-18 22:13:21.033000+00:00 219 40000 global_asax 3 (null) Search for endpoints with login: b827eba00de7 2019-11-18 22:13:21.033000+00:00 0 0.0 0.0 0.0 0.0 0 2019-11-18 22
2 116423562 2019-11-18 22:13:20.943000+00:00 82 40000 global_asax 3 (null) Search for endpoints with login: 0030184f850c 2019-11-18 22:13:20.943000+00:00 0 0.0 0.0 0.0 0.0 0 2019-11-18 22
3 116423561 2019-11-18 22:13:19.837000+00:00 5 40000 global_asax 3 (null) Search for endpoints with login: 003018570d3e 2019-11-18 22:13:19.837000+00:00 0 0.0 0.0 0.0 0.0 0 2019-11-18 22
4 116423560 2019-11-18 22:13:19.810000+00:00 41 40000 global_asax 3 (null) Search for endpoints with login: b827eb1ce094 2019-11-18 22:13:19.810000+00:00 0 0.0 0.0 0.0 0.0 0 2019-11-18 22
In [ ]:
 
In [ ]:
 
In [ ]:
 

SQL Server Date Workaround

tl;dr - Query as a string, parse date in Python

https://pymssql.readthedocs.io/en/stable/freetds_and_dates.html

In [ ]:
 
In [27]:
def clean_log_data(df):
    df = df.fillna(0)
    df['LogDate'] = pd.to_datetime(df['LogDate'])
    df['DeviceDate'] = pd.to_datetime(df['DeviceDate'])
    df = df[df['Message'] != ' ']

    # Return the day of the week as an integer, where Monday is 0 and Sunday is 6.
    df['dayOfWeek'] = df['LogDate'].apply(lambda x: x.date().weekday())
    df['date'] = df['LogDate'].apply(lambda x: x.strftime('%Y-%m-%d'))
    df['hour'] = df['LogDate'].dt.hour

    df = df.loc[:, ~df.columns.str.contains('^Unnamed')]

    return df
In [5]:
def get_endpoints():
    sql = f'''
    SELECT
        [EndpointId],[FacilityId],[EndpointName],[EndpointType],[Login],[Password]
        ,[HasMap],[HasPhoto],[InvertCamera],[AudioBandwidth],[VideoBandwidth]
        ,[SupervisorPhone],[AudioInputGain],[AudioOutputGain],[Hardware]
        ,[InvertDisplay],[Manufacturer],[Model],[InboundPhone]
        ,[GateVendDigits],[Version],[ExternalCameraURL],[IdlePageText]
        ,[UseStaticImage],[BrainSerialNumber],[DisplaySerialNumber],[InServiceDate]
        ,[ExternalMonitoring],[Notes],[AllowGateLock],[Deleted]
        ,[InboundPhoneType]
      FROM [dbo].[Endpoints]
    '''

    return pd.read_sql(sql, engine)
In [ ]:
 
In [6]:
def get_calls_from_hour_ago(hourago):
    sql = """
    select 
        convert(varchar, start, 121) as Start,
        convert(varchar, [end], 121) as [endDate],
        CallId, EndpointId, FacilityName, EndpointName, SecondsToAnswer
    from Calls
    where Start >= '{0}'
    order by CallId desc;""".format(hourago.strftime("%Y-%m-%d %H:%M:%S"))

    calls = pd.read_sql(sql, engine)
    calls['Start'] = pd.to_datetime(calls['Start'])
    calls['endDate'] = pd.to_datetime(calls['endDate'])
    time = hourago + timedelta(hours=1)
    return calls.loc[calls['Start'] < time].loc[calls['Start'] > hourago]
In [7]:
hour_ago = datetime(2019, 6, 3, 3, 0, 0)
hour_ago = hour_ago.replace(tzinfo=timezone.utc)

calls = get_calls_from_hour_ago(hour_ago)
In [8]:
callIds = calls.loc[calls['SecondsToAnswer'] == 9999]['CallId'].unique()
for callId in callIds:
    c = calls.loc[calls['CallId'] == callId]
    print(f'Missed Facility: {c["FacilityName"].values} -> Endpoint: {c["EndpointName"].values}')
Missed Facility: ['945 Market Street'] -> Endpoint: ['945 Market - Lane 1']
Missed Facility: ['945 Market Street'] -> Endpoint: ['945 Market - Lane 1']
In [ ]:
 
In [9]:
missed_calls = calls.loc[calls['SecondsToAnswer'] == 9999]

# missed_calls
for i in range(len(missed_calls)):
    print(f'Missed Facility: {missed_calls.loc[i:,"FacilityName"]}')
Missed Facility: 427740    945 Market Street
427746    945 Market Street
Name: FacilityName, dtype: object
Missed Facility: 427740    945 Market Street
427746    945 Market Street
Name: FacilityName, dtype: object
In [ ]:
 
In [ ]:
 
In [11]:
full_message_types = [
    'SIGNALR Error: Error: WebSocket closed.',
    'START Endpoint Page Loaded','SIGNALR signalR start',
    'ANSWER ANSWER message received',
    'CALL Call button push detected','AJAX POST EndpointLogs failed',
    'ENDCALL ENDCALL message received','getUserMedia local audio/video stream aquired',
    'SIGNALR Disconnected','SignalR SignalR Connection Drop',
    'Initializing Media Streams Start','RINGER Begin ringing',
    'StartWebRTC starting','StartWebRTC connection initialized',
    'version: 52','EVENT Call button pressed','SIGNALR Connection slow',
    'HOLDCALL Placing on hold', 'NODE readbutton cmd failed', 'Exception in SendWebRTCMessage',
    'OPENGATE OPENGATE command received', 'SIGNALR Error: Error: Failed to ping server.',
    'AJAX SUCCESS get endpoint success: 05-141,v1.5.6_16.04.1 LTS,v1.5.5,\n\nOK\n'
]

starts_with_types = [
    'Initializing endpoint status complete'
    ,'Endpoint login OK, Login'
    ,'AJAX ERROR Open Gate Error: OPENGAT'
    ,'AJAX SUCCESS get endpoint success'
    ,'GetGateVendLogs took'
    ,'GetGateVendLogsExport took:'
    ,'SIGNALR Error: Error: The client has been inactive since'
    ,'User login OK. Login'
    ,'Error'
    ,'Exception'
    ,'User login failed. Login'
    ,'Diagnostic Version info'
]
In [ ]:
 
In [ ]:
 
In [12]:
def alert_threshold(df, groupby=['date', 'EndpointId']):
    thresholds = []
    for m in full_message_types:
        msg = df.loc[df['Message'] == m]
        msg = msg.groupby(groupby)
        results = pd.DataFrame()
        results['counts'] = msg.count().sort_values('Id', ascending=False)['Id']
        if results.empty:
            continue
        std_dev = results.std()['counts']
        avg = results.mean()['counts']
        thresholds.append((m, int(avg + 2 * std_dev), avg))
    return thresholds
In [14]:
def get_failing_endpoints(log_df, day, thresholds, endpoints_df):
    endpoints = log_df['EndpointId'].unique()
    today_df = log_df.loc[log_df['date'] == day]

    time = datetime.utcnow()
    hourago = time + timedelta(hours=-1)
    hourago = hourago.replace(tzinfo=timezone.utc)
    calls = get_calls_from_hour_ago(hourago)

    failing_endpoints = {}
    for e in endpoints:
        endpdf = today_df.loc[today_df['EndpointId'] == e]
        call_count = calls.loc[calls['EndpointId'] == e].count().values[0]
        for t in thresholds:
            msgdf = endpdf.loc[endpdf['Message'] == t[0]]
            if msgdf.empty:
                continue

            hour_ago_count = msgdf.loc[msgdf['LogDate'] < hourago].count().values[0]
            now_count = msgdf.count().values[0]
            last_hour_count = now_count - hour_ago_count
            if last_hour_count > t[1] and call_count < t[1]:
                endpoint_name = endpoints_df.loc[endpoints_df['EndpointId'] == e]['EndpointName'].values[0]
                if e not in failing_endpoints.keys():
                    failing_endpoints[e] = {'name': endpoint_name, 'messages': [], 'total_delta': 0}
                failing_endpoints[e]['messages'].append( {
                    'message': t[0],
                    'expected': t[1],
                    'actual': last_hour_count,
                    'delta': last_hour_count - t[1]
                })
                failing_endpoints[e]['total_delta'] += last_hour_count - t[1]
                failing_endpoints[e]['call_count'] = call_count
    return failing_endpoints
In [15]:
log_df = get_log_data()
log_df = clean_log_data(log_df)
endpoints_df = get_endpoints()
thresholds = alert_threshold(log_df, ['date', 'EndpointId', 'hour'])
In [ ]:
 
In [ ]:
 
In [46]:
endpoints = log_df['EndpointId'].unique()
today = datetime.now().strftime('%Y-%m-%d')
today_df = log_df.loc[log_df['date'] == today]
thresholds = alert_threshold(log_df, ['date', 'EndpointId', 'hour']) # List of Tuples: [('Message', ExpectedCount, Average),...]

failing_endpoints = get_failing_endpoints(log_df, today, thresholds, endpoints_df)

sorted_endpoints = sorted(
    failing_endpoints.keys(), 
    key=lambda x: failing_endpoints[x]['total_delta'], 
    reverse=True)
for e in sorted_endpoints:
    if failing_endpoints[e]['total_delta'] < 25:
        print('Endpoint ID: %s - Total delta less than 30, move on, delta: %s, calls: %s', e, failing_endpoints[e]['total_delta'], failing_endpoints[e]['call_count'])
        continue
    msg = f'\nId: {e} - {failing_endpoints[e]["name"]} total delta: {failing_endpoints[e]["total_delta"]:.0f} Calls: {failing_endpoints[e]["call_count"]}'
    for m in failing_endpoints[e]['messages']:
          msg += f'\n\tMesage: {m["message"]} - expected: {m["expected"]} - actual: {m["actual"]}'
    print(msg)
In [ ]:
 
In [ ]:
 

Rollbar

Rollbar - Endpoint Message

In [ ]:
 
In [ ]:
 

Parallelizing NumPy / Pandas

https://dask.org/

In [ ]:
 

Python Extras / Syntax

In [60]:
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
start, stop = 2, 3
a[start:stop]  # items start through stop-1
Out[60]:
[2]
In [61]:
a[start:]      # items start through the rest of the array
Out[61]:
[2, 3, 4, 5, 6, 7, 8, 9]
In [62]:
a[:stop]       # items from the beginning through stop-1
Out[62]:
[0, 1, 2]
In [63]:
a[:]           # a copy of the whole array
Out[63]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [64]:
a[-1]    # last item in the array
Out[64]:
9
In [65]:
a[-2:]   # last two items in the array
Out[65]:
[8, 9]
In [66]:
a[:-2]   # everything except the last two items
Out[66]:
[0, 1, 2, 3, 4, 5, 6, 7]
In [ ]:
 

FIN

  • Questions?
In [ ]: