### My New Udemy Class: Fundamental Market Analysis with Python

Fundamental Market Analysis with Python - Find Your Own Answers On What Is Going on in the Financial Markets

### Hot off the Press!

My New Book: "The Little Book of Fundamental Analysis: Hands-On Market Analysis with Python" is Out!

### Grow Your Web Brand, Visibility & Traffic Organically

5 Years of amunategui.github.io and the Lessons I Learned from Growing My Online Community from the Ground Up

### Introduction

Let's analyze time-series data and assign outcome variables depending on pattern types. If you are looking to model raw time series for classification, this video is for you.

### Code

ViralML-Hands-On-Time-Series-Pattern-Recognition-and-Assigning-Outcome-Variables
In [231]:
from IPython.display import Image
Image(filename='viralml-book.png')

Out[231]:

# Fundamental and Technical Indicators - Hands-On Market Analysis

Companion book: "The Little Book of Fundamental Market Indicators":

https://amzn.to/2DERG3d

More at:

https://www.viralml.com/

# Pattern Recognition on Time Series Data - Finding Outcomes using Matching Shapes

ViralML-Hands-On-Time-Series-Pattern-Recognition-and-Assigning-Outcome-Variables

### Gold Price: London Fixing

https://www.quandl.com/data/LBMA/GOLD-Gold-Price-London-Fixing

In [187]:
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import io, base64, os, json, re
import pandas as pd
import numpy as np
import datetime
import warnings
warnings.filterwarnings('ignore')

In [188]:
path_to_market_data = '/Users/manuel/Documents/financial-research/market-data/2019-08-03/'


In [189]:
# https://www.quandl.com/data/LBMA/GOLD-Gold-Price-London-Fixing
# load the Quandl CSV export (filename assumed; adjust to match your download)
gold_df = pd.read_csv(path_to_market_data + 'LBMA-GOLD.csv')
gold_df['Date'] = pd.to_datetime(gold_df['Date'])

gold_df = gold_df[['Date', 'USD (PM)']]
gold_df.columns = ['Date', 'GLD']
gold_df['GLD'] = pd.to_numeric(gold_df['GLD'], errors='coerce')

print(np.min(gold_df['Date']), np.max(gold_df['Date']))
gold_df = gold_df.sort_values('Date', ascending=True)
gold_df = gold_df.dropna(how='any')
gold_df.head()


1968-01-02 00:00:00 2019-07-30 00:00:00

Out[189]:
Date GLD
12982 1968-04-01 37.70
12981 1968-04-02 37.30
12980 1968-04-03 37.60
12979 1968-04-04 36.95
12978 1968-04-05 37.00
In [190]:
# Price chart
fig, ax = plt.subplots(figsize=(16, 8))
plt.plot(gold_df['Date'], gold_df['GLD'], label='GLD', color='gold')
plt.title('Gold ' + str(np.min(gold_df['Date'])) + ' - ' + str(np.max(gold_df['Date'])))
plt.legend(loc='upper left')
plt.grid()
plt.show()


In [172]:
def split_seq(seq, num_pieces):
    # https://stackoverflow.com/questions/54915803/automatically-split-data-in-list-and-order-list-elements-and-send-to-function
    start = 0
    for i in range(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop

def pearson(s1, s2):
    """Take two pd.Series objects and return a Pearson correlation."""
    s1_c = s1 - np.mean(s1)
    s2_c = s2 - np.mean(s2)
    return np.sum(s1_c * s2_c) / np.sqrt(np.sum(s1_c ** 2) * np.sum(s2_c ** 2))
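
As a quick sanity check (a toy example, not from the notebook), the two helpers can be exercised on small lists before pointing them at real windows:

```python
import numpy as np

def split_seq(seq, num_pieces):
    # yield num_pieces contiguous, near-equal chunks of seq
    start = 0
    for i in range(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop

def pearson(s1, s2):
    # plain Pearson correlation of two equal-length sequences
    s1_c = s1 - np.mean(s1)
    s2_c = s2 - np.mean(s2)
    return np.sum(s1_c * s2_c) / np.sqrt(np.sum(s1_c ** 2) * np.sum(s2_c ** 2))

chunks = list(split_seq(list(range(10)), 3))  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
means = [np.mean(c) for c in chunks]          # [1.5, 5.0, 8.0]
r = pearson(np.array([1, 2, 3, 4]), np.array([2, 4, 6, 8]))  # 1.0 (perfectly linear)
```

Note that when the pieces don't divide evenly, the earlier chunks simply end up one element longer.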


# Build time series out of daily data

In [191]:
# not strictly needed (already sorted above), just emphasizing that order matters
gold_df = gold_df.sort_values('Date', ascending=True)

lookback = 30
dates = gold_df['Date']
prices = list(gold_df['GLD'].values)
counter_ = -1
price_series = []
for day in dates:
    counter_ += 1
    # if counter_ % 1000 == 0: print(counter_)
    if counter_ >= lookback:
        price_series.append(prices[counter_-lookback:counter_])

timeseries_df = pd.DataFrame(price_series)
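
As an aside, the same rolling windows can be built without an explicit loop. A sketch using NumPy's `sliding_window_view` (which assumes NumPy 1.20 or newer):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

lookback = 30
prices = np.arange(40, dtype=float)  # stand-in for gold_df['GLD'].values

# sliding_window_view yields every contiguous 30-value window; dropping the
# last one reproduces the loop above, which ends each window the day *before*
# the current day (prices[counter_-lookback:counter_])
windows = sliding_window_view(prices, lookback)[:-1]
print(windows.shape)  # (10, 30)
```

`pd.DataFrame(windows)` then matches `timeseries_df` row for row.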



# Look for rises and build outcome

In [194]:
timeseries_df.shape

Out[194]:
(12867, 30)
In [195]:
timeseries_df.head()

Out[195]:
0 1 2 3 4 5 6 7 8 9 ... 20 21 22 23 24 25 26 27 28 29
0 37.70 37.30 37.60 36.95 37.00 37.05 37.5 37.70 38.00 38.00 ... 39.20 39.45 39.10 39.75 39.30 39.75 39.70 39.60 39.50 39.80
1 37.30 37.60 36.95 37.00 37.05 37.50 37.7 38.00 38.00 37.80 ... 39.45 39.10 39.75 39.30 39.75 39.70 39.60 39.50 39.80 40.25
2 37.60 36.95 37.00 37.05 37.50 37.70 38.0 38.00 37.80 37.55 ... 39.10 39.75 39.30 39.75 39.70 39.60 39.50 39.80 40.25 41.25
3 36.95 37.00 37.05 37.50 37.70 38.00 38.0 37.80 37.55 37.65 ... 39.75 39.30 39.75 39.70 39.60 39.50 39.80 40.25 41.25 41.50
4 37.00 37.05 37.50 37.70 38.00 38.00 37.8 37.55 37.65 38.00 ... 39.30 39.75 39.70 39.60 39.50 39.80 40.25 41.25 41.50 42.30

5 rows × 30 columns

In [196]:
counter = 5
for index, row in timeseries_df.iterrows():
    counter -= 1
    # look for desired shape
    plt.plot(row.values)
    plt.grid()
    plt.show()
    if counter < 0:
        break


# Pattern simplifier

Here we break each long window into a number of smaller chunks set by 'complexity', then average each chunk to get a simplified shape

In [197]:
counter = 5
complexity = 5
for index, row in timeseries_df.iterrows():
    counter -= 1
    # plot the simplified shape: the mean of each chunk
    plt.plot([np.mean(r) for r in split_seq(list(row.values), complexity)])
    plt.grid()
    plt.show()
    if counter < 0:
        break

In [205]:
# simplified chunk means for a single window (reusing leftover loop state)
[np.mean(t) for t in split_seq(list(r), complexity)]

Out[205]:
[37.26666666666667,
37.75833333333333,
38.208333333333336,
39.225,
39.60833333333333]

# Create an ideal shape pattern

Play around with the shape: you can select ups, downs, U's, or V's; anything goes

In [207]:
# let's single out the shape we want
correlate_against = [0,0,0,0,1,2]
plt.plot(correlate_against)
plt.grid()
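
A few other templates you might try (illustrative values I made up; each must have `complexity` entries, six here, to line up with the simplified windows):

```python
# each template is just a rough sketch of the shape to hunt for
late_rise = [0, 0, 0, 0, 1, 2]  # flat, then a sharp late rise (the one used above)
steady_up = [0, 1, 2, 3, 4, 5]  # straight climb
v_shape   = [2, 1, 0, 0, 1, 2]  # dip and recovery
u_shape   = [3, 1, 0, 0, 1, 3]  # deeper, rounder dip
sell_off  = [5, 4, 3, 2, 1, 0]  # straight decline

templates = [late_rise, steady_up, v_shape, u_shape, sell_off]
assert all(len(t) == 6 for t in templates)
```

Since only the shape matters to the correlation, the absolute numbers in a template are irrelevant; only their relative pattern counts.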


# Using the Pearson correlation function to find the best matching shape

In [212]:
complexity = 6
outcome_list = []
for index, row in timeseries_df.iterrows():
    simplified_values = []
    for r in split_seq(list(row.values), complexity):
        simplified_values.append(np.mean(r))
    correz = pearson(simplified_values, correlate_against)
    if correz > 0.5:
        outcome_list.append(1)
    else:
        outcome_list.append(0)

In [213]:
np.mean(outcome_list)

Out[213]:
0.35571617315613585
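
The 0.5 cutoff is a judgment call: raise it and fewer, more template-like windows get flagged. The labeling step can be sketched end to end on toy windows (toy prices; helper functions repeated so this runs standalone):

```python
import numpy as np

def split_seq(seq, num_pieces):
    # yield num_pieces contiguous, near-equal chunks of seq
    start = 0
    for i in range(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop

def pearson(s1, s2):
    s1_c = s1 - np.mean(s1)
    s2_c = s2 - np.mean(s2)
    return np.sum(s1_c * s2_c) / np.sqrt(np.sum(s1_c ** 2) * np.sum(s2_c ** 2))

complexity = 6
correlate_against = [0, 0, 0, 0, 1, 2]  # flat, then a sharp late rise

def label_window(window, threshold=0.5):
    # simplify the window into `complexity` chunk means, then flag a match
    simplified = [np.mean(c) for c in split_seq(list(window), complexity)]
    return 1 if pearson(simplified, correlate_against) > threshold else 0

flat_then_pop = [100.0] * 25 + [110.0, 120.0, 130.0, 140.0, 150.0]  # matches
falling = [float(p) for p in range(53, 23, -1)]                     # does not
print(label_window(flat_then_pop), label_window(falling))  # 1 0
```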
In [219]:
timeseries_df['outcome'] = outcome_list
timeseries_df.head(20)

Out[219]:
0 1 2 3 4 5 6 7 8 9 ... 21 22 23 24 25 26 27 28 29 outcome
0 37.70 37.30 37.60 36.95 37.00 37.05 37.50 37.70 38.00 38.00 ... 39.45 39.10 39.75 39.30 39.75 39.70 39.60 39.50 39.80 1
1 37.30 37.60 36.95 37.00 37.05 37.50 37.70 38.00 38.00 37.80 ... 39.10 39.75 39.30 39.75 39.70 39.60 39.50 39.80 40.25 1
2 37.60 36.95 37.00 37.05 37.50 37.70 38.00 38.00 37.80 37.55 ... 39.75 39.30 39.75 39.70 39.60 39.50 39.80 40.25 41.25 1
3 36.95 37.00 37.05 37.50 37.70 38.00 38.00 37.80 37.55 37.65 ... 39.30 39.75 39.70 39.60 39.50 39.80 40.25 41.25 41.50 1
4 37.00 37.05 37.50 37.70 38.00 38.00 37.80 37.55 37.65 38.00 ... 39.75 39.70 39.60 39.50 39.80 40.25 41.25 41.50 42.30 1
5 37.05 37.50 37.70 38.00 38.00 37.80 37.55 37.65 38.00 38.40 ... 39.70 39.60 39.50 39.80 40.25 41.25 41.50 42.30 42.40 1
6 37.50 37.70 38.00 38.00 37.80 37.55 37.65 38.00 38.40 38.25 ... 39.60 39.50 39.80 40.25 41.25 41.50 42.30 42.40 41.55 1
7 37.70 38.00 38.00 37.80 37.55 37.65 38.00 38.40 38.25 38.30 ... 39.50 39.80 40.25 41.25 41.50 42.30 42.40 41.55 41.40 1
8 38.00 38.00 37.80 37.55 37.65 38.00 38.40 38.25 38.30 38.65 ... 39.80 40.25 41.25 41.50 42.30 42.40 41.55 41.40 41.75 1
9 38.00 37.80 37.55 37.65 38.00 38.40 38.25 38.30 38.65 38.75 ... 40.25 41.25 41.50 42.30 42.40 41.55 41.40 41.75 41.50 1
10 37.80 37.55 37.65 38.00 38.40 38.25 38.30 38.65 38.75 39.10 ... 41.25 41.50 42.30 42.40 41.55 41.40 41.75 41.50 41.50 1
11 37.55 37.65 38.00 38.40 38.25 38.30 38.65 38.75 39.10 39.20 ... 41.50 42.30 42.40 41.55 41.40 41.75 41.50 41.50 41.60 1
12 37.65 38.00 38.40 38.25 38.30 38.65 38.75 39.10 39.20 39.45 ... 42.30 42.40 41.55 41.40 41.75 41.50 41.50 41.60 41.95 1
13 38.00 38.40 38.25 38.30 38.65 38.75 39.10 39.20 39.45 39.10 ... 42.40 41.55 41.40 41.75 41.50 41.50 41.60 41.95 41.95 1
14 38.40 38.25 38.30 38.65 38.75 39.10 39.20 39.45 39.10 39.75 ... 41.55 41.40 41.75 41.50 41.50 41.60 41.95 41.95 41.15 1
15 38.25 38.30 38.65 38.75 39.10 39.20 39.45 39.10 39.75 39.30 ... 41.40 41.75 41.50 41.50 41.60 41.95 41.95 41.15 41.20 1
16 38.30 38.65 38.75 39.10 39.20 39.45 39.10 39.75 39.30 39.75 ... 41.75 41.50 41.50 41.60 41.95 41.95 41.15 41.20 41.20 1
17 38.65 38.75 39.10 39.20 39.45 39.10 39.75 39.30 39.75 39.70 ... 41.50 41.50 41.60 41.95 41.95 41.15 41.20 41.20 41.25 1
18 38.75 39.10 39.20 39.45 39.10 39.75 39.30 39.75 39.70 39.60 ... 41.50 41.60 41.95 41.95 41.15 41.20 41.20 41.25 41.30 0
19 39.10 39.20 39.45 39.10 39.75 39.30 39.75 39.70 39.60 39.50 ... 41.60 41.95 41.95 41.15 41.20 41.20 41.25 41.30 41.55 0

20 rows × 31 columns

In [226]:
timeseries_df_tmp = timeseries_df[timeseries_df['outcome']==1]
timeseries_df_tmp.tail()

Out[226]:
0 1 2 3 4 5 6 7 8 9 ... 21 22 23 24 25 26 27 28 29 outcome
12862 1332.35 1335.90 1351.25 1341.30 1341.35 1344.05 1379.50 1397.15 1405.70 1431.40 ... 1413.75 1407.60 1412.40 1409.85 1410.35 1417.45 1439.70 1427.75 1425.55 1
12863 1335.90 1351.25 1341.30 1341.35 1344.05 1379.50 1397.15 1405.70 1431.40 1403.95 ... 1407.60 1412.40 1409.85 1410.35 1417.45 1439.70 1427.75 1425.55 1426.95 1
12864 1351.25 1341.30 1341.35 1344.05 1379.50 1397.15 1405.70 1431.40 1403.95 1402.50 ... 1412.40 1409.85 1410.35 1417.45 1439.70 1427.75 1425.55 1426.95 1416.10 1
12865 1341.30 1341.35 1344.05 1379.50 1397.15 1405.70 1431.40 1403.95 1402.50 1409.00 ... 1409.85 1410.35 1417.45 1439.70 1427.75 1425.55 1426.95 1416.10 1420.40 1
12866 1341.35 1344.05 1379.50 1397.15 1405.70 1431.40 1403.95 1402.50 1409.00 1390.10 ... 1410.35 1417.45 1439.70 1427.75 1425.55 1426.95 1416.10 1420.40 1419.05 1

5 rows × 31 columns

In [227]:
timeseries_df_tmp = timeseries_df_tmp.tail()
# pull one example and remove the outcome variable
example = timeseries_df_tmp.values[0][:-1]
plt.plot(example)

Out[227]:
[<matplotlib.lines.Line2D at 0x12bb9a7b8>]
In [229]:
simplified_values = []
for r in split_seq(list(example), complexity):
    simplified_values.append(np.mean(r))  # mean of each chunk, not of the whole example
plt.plot(simplified_values)

Out[229]:
[<matplotlib.lines.Line2D at 0x12bc6bac8>]
In [230]:
vals = [np.mean(r) for r in split_seq(list(example), complexity)]
# shift the simplified shape so it starts from a zero baseline
vals2 = [val - np.min(vals) for val in vals]
plt.plot(vals2)

Out[230]:
[<matplotlib.lines.Line2D at 0x12bfd1908>]
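
Note that the shift above is cosmetic: Pearson correlation is invariant to adding a constant (and to positive scaling), so the shifted series matches any template exactly as well as the original. A quick check with made-up chunk means:

```python
import numpy as np

def pearson(s1, s2):
    s1_c = s1 - np.mean(s1)
    s2_c = s2 - np.mean(s2)
    return np.sum(s1_c * s2_c) / np.sqrt(np.sum(s1_c ** 2) * np.sum(s2_c ** 2))

vals = [37.27, 37.76, 38.21, 39.23, 39.61]    # simplified chunk means
vals2 = [v - np.min(vals) for v in vals]      # shifted to a zero baseline

template = [0, 0, 0, 1, 2]
print(np.isclose(pearson(vals, template), pearson(vals2, template)))  # True
```

This is why the shift helps with plotting comparable shapes side by side without changing any of the match results.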

### Show Notes

(pardon typos and formatting - these are the notes I use to make the videos)
