Finding Patterns and Outcomes in Time Series Data - Hands-On with Python
Introduction
Let's analyze time-series data and assign outcome variables depending on pattern types. If you are looking to model raw time series for classification, this video is for you.
Code
from IPython.display import Image
Image(filename='viralml-book.png')
Pattern Recognition on Time Series Data - Finding Outcomes using Matching Shapes
ViralML-Hands-On-Time-Seriees-Pattern-Recognition-and-Assigning-Outcome-Variables
Gold Price: London Fixing
https://www.quandl.com/data/LBMA/GOLD-Gold-Price-London-Fixing
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import io, base64, os, json, re
import pandas as pd
import numpy as np
import datetime
import warnings
warnings.filterwarnings('ignore')
path_to_market_data = '/Users/manuel/Documents/financial-research/market-data/2019-08-03/'
Load Data
# https://www.quandl.com/data/LBMA/GOLD-Gold-Price-London-Fixing
gold_df = pd.read_csv(path_to_market_data + 'LBMA-GOLD.csv')
gold_df['Date'] = pd.to_datetime(gold_df['Date'])
gold_df = gold_df[['Date', 'USD (PM)']]
gold_df.columns = ['Date', 'GLD']
gold_df['GLD'] = pd.to_numeric(gold_df['GLD'], errors='coerce')
print(np.min(gold_df['Date']), np.max(gold_df['Date']))
gold_df = gold_df.sort_values('Date', ascending=True)
gold_df = gold_df.dropna(how='any')
gold_df.head()
# Price chart
fig, ax = plt.subplots(figsize=(16, 8))
plt.plot(gold_df['Date'], gold_df['GLD'], label='GLD', color='gold')
plt.title('Gold ' + str(np.min(gold_df['Date'])) + ' - ' + str(np.max(gold_df['Date'])))
plt.legend(loc='upper left')
plt.grid()
plt.show()
def split_seq(seq, num_pieces):
    # https://stackoverflow.com/questions/54915803/automatically-split-data-in-list-and-order-list-elements-and-send-to-function
    start = 0
    for i in range(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop
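A quick sanity check of split_seq (redefined below so the snippet runs on its own): the stride slice len(seq[i::num_pieces]) sizes each piece, so ten items split three ways come out 4/3/3 rather than failing on the remainder:

```python
# split_seq repeated here so the snippet runs standalone
def split_seq(seq, num_pieces):
    start = 0
    for i in range(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop

pieces = list(split_seq(list(range(10)), 3))
print(pieces)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```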
def pearson(s1, s2):
    """Take two pd.Series objects and return a Pearson correlation."""
    s1_c = s1 - np.mean(s1)
    s2_c = s2 - np.mean(s2)
    return np.sum(s1_c * s2_c) / np.sqrt(np.sum(s1_c ** 2) * np.sum(s2_c ** 2))
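To sanity-check the hand-rolled correlation (repeated below so the snippet runs standalone), try it on small made-up vectors where the answer is known, and compare against NumPy's built-in np.corrcoef:

```python
import numpy as np

def pearson(s1, s2):
    s1_c = s1 - np.mean(s1)
    s2_c = s2 - np.mean(s2)
    return np.sum(s1_c * s2_c) / np.sqrt(np.sum(s1_c ** 2) * np.sum(s2_c ** 2))

a = np.array([1.0, 2.0, 3.0, 4.0])
b = 2 * a      # perfectly correlated with a
c = a[::-1]    # perfectly anti-correlated

print(pearson(a, b))            # 1.0
print(pearson(a, c))            # -1.0
print(np.corrcoef(a, b)[0, 1])  # NumPy's built-in agrees: 1.0
```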
# we don't need to do this, just emphasizing
gold_df = gold_df.sort_values('Date', ascending=True)

lookback = 30
dates = gold_df['Date']
prices = list(gold_df['GLD'].values)

counter_ = -1
price_series = []
for day in dates:
    counter_ += 1
    # if counter_ % 1000 == 0: print(counter_)
    if counter_ >= lookback:
        price_series.append(prices[counter_-lookback:counter_])
timeseries_df = pd.DataFrame(price_series)
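The windowing loop above can also be sketched without an explicit counter, using NumPy's sliding_window_view (NumPy 1.20+). Toy prices stand in for gold_df['GLD'] here, and the result is named windows_df so it doesn't clobber the real timeseries_df:

```python
import numpy as np
import pandas as pd

# toy prices standing in for gold_df['GLD'] (hypothetical values)
prices = np.arange(100.0, 140.0)  # 40 "days"
lookback = 30

# each row is one lookback window; dropping the final window matches the
# loop above, which slices up to (not including) the current day
windows = np.lib.stride_tricks.sliding_window_view(prices, lookback)[:-1]
windows_df = pd.DataFrame(windows)
print(windows_df.shape)  # (10, 30)
```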
Look for rises and build outcome
timeseries_df.shape
timeseries_df.head()
counter = 5
for index, row in timeseries_df.iterrows():
    counter -= 1
    # look for desired shape
    plt.plot(row.values)
    plt.grid()
    plt.show()
    if counter < 0:
        break
Pattern simplifier
Here we break each long window into a number of smaller chunks set by 'complexity', then average each chunk - reducing the series to a handful of points that describe its overall shape.
counter = 5
complexity = 5
for index, row in timeseries_df.iterrows():
    counter -= 1
    # look for desired shape
    plt.plot([np.mean(r) for r in split_seq(list(row.values), complexity)])
    plt.grid()
    plt.show()
    if counter < 0:
        break
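To see what the simplifier does numerically, here is a toy example (split_seq repeated so it runs standalone): a steadily rising 30-point window collapses to five ascending means:

```python
import numpy as np

# split_seq repeated so this toy example runs standalone
def split_seq(seq, num_pieces):
    start = 0
    for i in range(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop

# a steadily rising 30-point "window" collapses to 5 ascending means
row = list(range(30))
simplified = [np.mean(piece) for piece in split_seq(row, 5)]
print(simplified)  # [2.5, 8.5, 14.5, 20.5, 26.5]
```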
Create an ideal shape pattern
Play around with the shape - you can select ups, downs, u's or v's; anything goes.
# let's single out the shape we want
correlate_against = [0,0,0,0,1,2]
plt.plot(correlate_against)
plt.grid()
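A few other templates you might try (hypothetical shapes; each must have the same number of points as the complexity value used in the matching loop, 6 here):

```python
complexity = 6
# hypothetical shape templates to correlate against
shapes = {
    'rise': [0, 0, 0, 0, 1, 2],
    'drop': [2, 1, 0, 0, 0, 0],
    'v':    [2, 1, 0, 0, 1, 2],
    'u':    [2, 0, 0, 0, 0, 2],
}
```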
Using the Pearson correlation function to find the best matching shape
complexity = 6  # must match the number of points in correlate_against
outcome_list = []
for index, row in timeseries_df.iterrows():
    simplified_values = []
    for r in split_seq(list(row.values), complexity):
        simplified_values.append(np.mean(r))
    correz = pearson(simplified_values, correlate_against)
    if correz > 0.5:
        outcome_list.append(1)
    else:
        outcome_list.append(0)
np.mean(outcome_list)
timeseries_df['outcome'] = outcome_list
timeseries_df.head(20)
timeseries_df_tmp = timeseries_df[timeseries_df['outcome']==1]
timeseries_df_tmp.tail()
timeseries_df_tmp = timeseries_df_tmp.tail()
# pull one example and remove the outcome variable
example = timeseries_df_tmp.values[0][:-1]
plt.plot(example)
simplified_values = []
for r in split_seq(list(example), complexity):
    simplified_values.append(np.mean(r))
plt.plot(simplified_values)
vals = [np.mean(r) for r in split_seq(list(example), complexity)]
np.min(vals)
vals2 = [val - np.min(vals) for val in vals]
plt.plot(vals2)