Finding Patterns and Outcomes in Time Series Data - Hands-On with Python

Introduction

Let's analyze time-series data and assign outcome variables depending on pattern types. If you are looking to model raw time series for classification, this video is for you.

Code

ViralML-Hands-On-Time-Series-Pattern-Recognition-and-Assigning-Outcome-Variables
In [231]:
from IPython.display import Image
Image(filename='viralml-book.png')
Out[231]:

Fundamental and Technical Indicators - Hands-On Market Analysis

Companion book: "The Little Book of Fundamental Market Indicators":

https://amzn.to/2DERG3d

More at:

https://www.viralml.com/

Pattern Recognition on Time Series Data - Finding Outcomes using Matching Shapes

ViralML-Hands-On-Time-Series-Pattern-Recognition-and-Assigning-Outcome-Variables

Gold Price: London Fixing

https://www.quandl.com/data/LBMA/GOLD-Gold-Price-London-Fixing

In [187]:
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import io, base64, os, json, re 
import pandas as pd
import numpy as np
import datetime
import warnings
warnings.filterwarnings('ignore')
In [188]:
path_to_market_data = '/Users/manuel/Documents/financial-research/market-data/2019-08-03/'

Load Data

In [189]:
# https://www.quandl.com/data/LBMA/GOLD-Gold-Price-London-Fixing
gold_df = pd.read_csv(path_to_market_data + 'LBMA-GOLD.csv')
gold_df['Date'] = pd.to_datetime(gold_df['Date'])

gold_df = gold_df[['Date', 'USD (PM)']]
gold_df.columns = ['Date', 'GLD']
gold_df['GLD'] = pd.to_numeric(gold_df['GLD'], errors='coerce')

print(np.min(gold_df['Date'] ),np.max(gold_df['Date'] ))
gold_df = gold_df.sort_values('Date', ascending=True) 
gold_df = gold_df.dropna(how='any')

gold_df.head()
1968-01-02 00:00:00 2019-07-30 00:00:00
Out[189]:
Date GLD
12982 1968-04-01 37.70
12981 1968-04-02 37.30
12980 1968-04-03 37.60
12979 1968-04-04 36.95
12978 1968-04-05 37.00
In [190]:
# Price chart
fig, ax = plt.subplots(figsize=(16, 8))
plt.plot(gold_df['Date'], gold_df['GLD'], label='GLD', color='gold')
plt.title('Gold ' + str(np.min(gold_df['Date'])) + ' - ' + str(np.max(gold_df['Date'])))
plt.legend(loc='upper left')
plt.grid()
plt.show()
 
In [172]:
def split_seq(seq, num_pieces):
    # https://stackoverflow.com/questions/54915803/automatically-split-data-in-list-and-order-list-elements-and-send-to-function
    start = 0
    for i in range(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop
        
        
def pearson(s1, s2):
    """Take two pd.Series objects (or equal-length lists) and return their Pearson correlation."""
    s1_c=s1-np.mean(s1)
    s2_c=s2-np.mean(s2)
    return np.sum(s1_c * s2_c) / np.sqrt(np.sum(s1_c ** 2) * np.sum(s2_c ** 2))
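
As a quick sanity check on the pearson helper, here is a minimal sketch that compares it with NumPy's built-in np.corrcoef. The window values are made up for illustration and are not taken from the gold data. It also shows two properties that matter later: the score ignores scale and offset, and a perfectly flat window produces NaN rather than a number.

```python
import numpy as np

def pearson(s1, s2):
    """Pearson correlation of two equal-length sequences (same formula as the notebook helper)."""
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    s1_c = s1 - s1.mean()
    s2_c = s2 - s2.mean()
    return np.sum(s1_c * s2_c) / np.sqrt(np.sum(s1_c ** 2) * np.sum(s2_c ** 2))

template = [0, 0, 0, 0, 1, 2]
window = [37.3, 37.8, 38.2, 39.2, 39.6, 40.4]   # made-up values for illustration

# The hand-rolled formula should agree with NumPy's correlation matrix entry
r_manual = pearson(window, template)
r_numpy = np.corrcoef(window, template)[0, 1]
print(r_manual, r_numpy)

# Rescaling and shifting the window leaves the correlation unchanged,
# so only the shape matters, not the price level
shifted = [10 * v + 1000 for v in window]
r_shifted = pearson(shifted, template)

# A flat window has zero variance, so the formula divides 0 by 0 and yields NaN
with np.errstate(invalid='ignore', divide='ignore'):
    r_flat = pearson([5, 5, 5, 5, 5, 5], template)
print(np.isnan(r_flat), r_flat > 0.5)   # True False
```

Because NaN compares as False against any threshold, a flat window would simply be labeled as a non-match.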

Build time series out of daily data

30 trading day series

In [191]:
# gold_df was already sorted by date above; repeated here for emphasis
gold_df = gold_df.sort_values('Date', ascending=True) 

lookback = 30
dates = gold_df['Date']
prices = list(gold_df['GLD'].values)
price_series = []
for counter_, day in enumerate(dates):
    # if counter_ % 1000 == 0: print(counter_)
    # each window holds the `lookback` prices before this day (the day itself is excluded)
    if counter_ >= lookback:
        price_series.append(prices[counter_-lookback:counter_])
                
timeseries_df = pd.DataFrame(price_series)              
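
The same rolling windows can also be built without a Python loop. The sketch below is an alternative, not what the notebook uses: it relies on np.lib.stride_tricks.sliding_window_view, which needs NumPy 1.20 or later, and runs on a synthetic price array standing in for gold_df['GLD'].

```python
import numpy as np

lookback = 30
prices = np.arange(100, dtype=float)   # synthetic stand-in for gold_df['GLD'].values

# Loop version, using the same slicing as the notebook cell
loop_windows = [prices[i - lookback:i] for i in range(lookback, len(prices))]

# Vectorized version: one row per window of `lookback` consecutive prices
all_windows = np.lib.stride_tricks.sliding_window_view(prices, lookback)

# sliding_window_view also returns the window that ends on the final day,
# which the loop skips, so drop the last row to get the same result
windows = all_windows[:-1]

print(windows.shape)   # (70, 30)
```

Wrapping the result in pd.DataFrame(windows) would give the same table as timeseries_df for the real data, under the assumption that the prices are already sorted by date.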
 

Look for rises and build outcome

In [194]:
timeseries_df.shape
Out[194]:
(12867, 30)
In [195]:
timeseries_df.head()
Out[195]:
0 1 2 3 4 5 6 7 8 9 ... 20 21 22 23 24 25 26 27 28 29
0 37.70 37.30 37.60 36.95 37.00 37.05 37.5 37.70 38.00 38.00 ... 39.20 39.45 39.10 39.75 39.30 39.75 39.70 39.60 39.50 39.80
1 37.30 37.60 36.95 37.00 37.05 37.50 37.7 38.00 38.00 37.80 ... 39.45 39.10 39.75 39.30 39.75 39.70 39.60 39.50 39.80 40.25
2 37.60 36.95 37.00 37.05 37.50 37.70 38.0 38.00 37.80 37.55 ... 39.10 39.75 39.30 39.75 39.70 39.60 39.50 39.80 40.25 41.25
3 36.95 37.00 37.05 37.50 37.70 38.00 38.0 37.80 37.55 37.65 ... 39.75 39.30 39.75 39.70 39.60 39.50 39.80 40.25 41.25 41.50
4 37.00 37.05 37.50 37.70 38.00 38.00 37.8 37.55 37.65 38.00 ... 39.30 39.75 39.70 39.60 39.50 39.80 40.25 41.25 41.50 42.30

5 rows × 30 columns

In [196]:
counter = 5
for index, row in timeseries_df.iterrows():
    counter -= 1
    # plot the raw 30-day window
    plt.plot(row.values)
    plt.grid()
    plt.show()
    if counter < 0:
        break

Pattern simplifier

Here we split each 30-day window into 'complexity' consecutive chunks and replace each chunk with its average. This smooths out the daily noise and leaves only the overall shape of the window.

In [197]:
counter = 5
complexity = 5
for index, row in timeseries_df.iterrows():
    counter -= 1
    # plot the window reduced to `complexity` averaged segments
    plt.plot([np.mean(r) for r in split_seq(list(row.values), complexity)])
    plt.grid()
    plt.show()
    if counter < 0:
        break
In [205]:
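# `r` was not defined in the exported cells; the output below matches the first window
r = timeseries_df.iloc[0, :lookback].values
[np.mean(t) for t in split_seq(list(r), complexity)]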
Out[205]:
[37.26666666666667,
 37.75833333333333,
 38.208333333333336,
 39.225,
 39.60833333333333]
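
To make the simplifier concrete, here is a small worked example on synthetic numbers (1 through 30 instead of real prices). It shows how split_seq chunks a 30-point window into 5 pieces of 6, what the averaged shape looks like, and how leftover elements are handed out when the length does not divide evenly. The comparison with np.array_split is just a cross-check on the chunk sizes, not something the notebook relies on.

```python
import numpy as np

def split_seq(seq, num_pieces):
    """Yield num_pieces consecutive chunks of seq (same helper as in the notebook)."""
    start = 0
    for i in range(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop

window = list(range(1, 31))      # 30 synthetic "prices": 1, 2, ..., 30
complexity = 5

pieces = list(split_seq(window, complexity))
simplified = [float(np.mean(p)) for p in pieces]
print([len(p) for p in pieces])   # [6, 6, 6, 6, 6]
print(simplified)                 # [3.5, 9.5, 15.5, 21.5, 27.5]

# Uneven case: 32 points into 5 pieces, the earlier pieces get the extra elements
uneven = [len(p) for p in split_seq(list(range(32)), complexity)]
print(uneven)                     # [7, 7, 6, 6, 6]

# np.array_split distributes the remainder the same way
np_sizes = [len(p) for p in np.array_split(np.arange(32), complexity)]
```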

Create an ideal shape pattern

Play around with the shape: you can select ups, downs, U's or V's. Anything goes, as long as the template has as many points as the 'complexity' you simplify to.

In [207]:
# let's single out the shape we want
correlate_against = [0,0,0,0,1,2] 
plt.plot(correlate_against)
plt.grid()
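
Here is a rough sketch of what "playing around" could look like. The template names and the simplified window are hypothetical, made up for this example; only the late-rise shape comes from the notebook. Each template has 6 points so it lines up with a window simplified to 6 segments.

```python
import numpy as np

# Hypothetical template names; each has 6 points to match complexity = 6
templates = {
    'late_rise':   [0, 0, 0, 0, 1, 2],   # the shape used in this notebook
    'steady_up':   [0, 1, 2, 3, 4, 5],
    'steady_down': [5, 4, 3, 2, 1, 0],
    'v_shape':     [2, 1, 0, 0, 1, 2],
    'inverted_v':  [0, 1, 2, 2, 1, 0],
}

# Made-up simplified window that dips and then recovers
simplified = [40.0, 38.5, 37.0, 37.2, 38.4, 40.1]

# Pearson correlation of the window against every template
scores = {name: float(np.corrcoef(simplified, shape)[0, 1])
          for name, shape in templates.items()}
best = max(scores, key=scores.get)
print(best)   # v_shape
for name, score in scores.items():
    print(name, round(score, 3))
```

In this made-up case the V template wins clearly, but the late-rise template still scores roughly 0.54, which is above a 0.5 cutoff. That suggests a loose threshold can let in windows that only share the ending of the shape, so it may be worth tightening it if the matches look too varied.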

Using the Pearson correlation function to find the best matching shape

In [212]:
complexity = 6
outcome_list = []
for index, row in timeseries_df.iterrows():
    simplified_values = []
    for r in split_seq(list(row.values), complexity):
        simplified_values.append(np.mean(r))
    correz = pearson(simplified_values,correlate_against)
    if correz > 0.5:
        outcome_list.append(1)
    else:
        outcome_list.append(0)
In [213]:
np.mean(outcome_list)
Out[213]:
0.35571617315613585
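
The iterrows loop is easy to follow but slow on roughly 13,000 rows. Below is a sketch of a vectorized version of the same labeling step. The function name label_windows is my own, it assumes the window length is an exact multiple of the template length (30 and 6 here), and the random walks are synthetic stand-ins for the rows of timeseries_df.

```python
import numpy as np

def label_windows(windows, template, threshold=0.5):
    """Return a 0/1 label per row: 1 if the simplified row correlates
    with the template above the threshold.

    Assumes the window length is an exact multiple of len(template).
    """
    windows = np.asarray(windows, dtype=float)
    template = np.asarray(template, dtype=float)
    n_rows, n_cols = windows.shape
    pieces = len(template)
    if n_cols % pieces != 0:
        raise ValueError("window length must be a multiple of the template length")

    # Average each consecutive chunk, e.g. (n, 30) -> (n, 6, 5) -> (n, 6)
    simplified = windows.reshape(n_rows, pieces, n_cols // pieces).mean(axis=2)

    # Row-wise Pearson correlation against the template
    s_c = simplified - simplified.mean(axis=1, keepdims=True)
    t_c = template - template.mean()
    num = s_c @ t_c
    den = np.sqrt((s_c ** 2).sum(axis=1) * (t_c ** 2).sum())
    with np.errstate(invalid='ignore', divide='ignore'):
        corr = num / den

    # NaN (a flat window) compares False, so it gets label 0, as in the loop
    return (corr > threshold).astype(int)

rng = np.random.default_rng(0)
# Synthetic random walks standing in for the rows of timeseries_df
windows = 100 + rng.normal(size=(200, 30)).cumsum(axis=1)
labels = label_windows(windows, [0, 0, 0, 0, 1, 2])
print(labels.mean())   # share of windows labeled as a match
```

On the real data this would be called with the 30 price columns only, for example timeseries_df.iloc[:, :30].values, so that an existing 'outcome' column is not treated as a price.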
In [219]:
timeseries_df['outcome'] = outcome_list
timeseries_df.head(20)
Out[219]:
0 1 2 3 4 5 6 7 8 9 ... 21 22 23 24 25 26 27 28 29 outcome
0 37.70 37.30 37.60 36.95 37.00 37.05 37.50 37.70 38.00 38.00 ... 39.45 39.10 39.75 39.30 39.75 39.70 39.60 39.50 39.80 1
1 37.30 37.60 36.95 37.00 37.05 37.50 37.70 38.00 38.00 37.80 ... 39.10 39.75 39.30 39.75 39.70 39.60 39.50 39.80 40.25 1
2 37.60 36.95 37.00 37.05 37.50 37.70 38.00 38.00 37.80 37.55 ... 39.75 39.30 39.75 39.70 39.60 39.50 39.80 40.25 41.25 1
3 36.95 37.00 37.05 37.50 37.70 38.00 38.00 37.80 37.55 37.65 ... 39.30 39.75 39.70 39.60 39.50 39.80 40.25 41.25 41.50 1
4 37.00 37.05 37.50 37.70 38.00 38.00 37.80 37.55 37.65 38.00 ... 39.75 39.70 39.60 39.50 39.80 40.25 41.25 41.50 42.30 1
5 37.05 37.50 37.70 38.00 38.00 37.80 37.55 37.65 38.00 38.40 ... 39.70 39.60 39.50 39.80 40.25 41.25 41.50 42.30 42.40 1
6 37.50 37.70 38.00 38.00 37.80 37.55 37.65 38.00 38.40 38.25 ... 39.60 39.50 39.80 40.25 41.25 41.50 42.30 42.40 41.55 1
7 37.70 38.00 38.00 37.80 37.55 37.65 38.00 38.40 38.25 38.30 ... 39.50 39.80 40.25 41.25 41.50 42.30 42.40 41.55 41.40 1
8 38.00 38.00 37.80 37.55 37.65 38.00 38.40 38.25 38.30 38.65 ... 39.80 40.25 41.25 41.50 42.30 42.40 41.55 41.40 41.75 1
9 38.00 37.80 37.55 37.65 38.00 38.40 38.25 38.30 38.65 38.75 ... 40.25 41.25 41.50 42.30 42.40 41.55 41.40 41.75 41.50 1
10 37.80 37.55 37.65 38.00 38.40 38.25 38.30 38.65 38.75 39.10 ... 41.25 41.50 42.30 42.40 41.55 41.40 41.75 41.50 41.50 1
11 37.55 37.65 38.00 38.40 38.25 38.30 38.65 38.75 39.10 39.20 ... 41.50 42.30 42.40 41.55 41.40 41.75 41.50 41.50 41.60 1
12 37.65 38.00 38.40 38.25 38.30 38.65 38.75 39.10 39.20 39.45 ... 42.30 42.40 41.55 41.40 41.75 41.50 41.50 41.60 41.95 1
13 38.00 38.40 38.25 38.30 38.65 38.75 39.10 39.20 39.45 39.10 ... 42.40 41.55 41.40 41.75 41.50 41.50 41.60 41.95 41.95 1
14 38.40 38.25 38.30 38.65 38.75 39.10 39.20 39.45 39.10 39.75 ... 41.55 41.40 41.75 41.50 41.50 41.60 41.95 41.95 41.15 1
15 38.25 38.30 38.65 38.75 39.10 39.20 39.45 39.10 39.75 39.30 ... 41.40 41.75 41.50 41.50 41.60 41.95 41.95 41.15 41.20 1
16 38.30 38.65 38.75 39.10 39.20 39.45 39.10 39.75 39.30 39.75 ... 41.75 41.50 41.50 41.60 41.95 41.95 41.15 41.20 41.20 1
17 38.65 38.75 39.10 39.20 39.45 39.10 39.75 39.30 39.75 39.70 ... 41.50 41.50 41.60 41.95 41.95 41.15 41.20 41.20 41.25 1
18 38.75 39.10 39.20 39.45 39.10 39.75 39.30 39.75 39.70 39.60 ... 41.50 41.60 41.95 41.95 41.15 41.20 41.20 41.25 41.30 0
19 39.10 39.20 39.45 39.10 39.75 39.30 39.75 39.70 39.60 39.50 ... 41.60 41.95 41.95 41.15 41.20 41.20 41.25 41.30 41.55 0

20 rows × 31 columns

In [226]:
timeseries_df_tmp = timeseries_df[timeseries_df['outcome']==1]
timeseries_df_tmp.tail()
Out[226]:
0 1 2 3 4 5 6 7 8 9 ... 21 22 23 24 25 26 27 28 29 outcome
12862 1332.35 1335.90 1351.25 1341.30 1341.35 1344.05 1379.50 1397.15 1405.70 1431.40 ... 1413.75 1407.60 1412.40 1409.85 1410.35 1417.45 1439.70 1427.75 1425.55 1
12863 1335.90 1351.25 1341.30 1341.35 1344.05 1379.50 1397.15 1405.70 1431.40 1403.95 ... 1407.60 1412.40 1409.85 1410.35 1417.45 1439.70 1427.75 1425.55 1426.95 1
12864 1351.25 1341.30 1341.35 1344.05 1379.50 1397.15 1405.70 1431.40 1403.95 1402.50 ... 1412.40 1409.85 1410.35 1417.45 1439.70 1427.75 1425.55 1426.95 1416.10 1
12865 1341.30 1341.35 1344.05 1379.50 1397.15 1405.70 1431.40 1403.95 1402.50 1409.00 ... 1409.85 1410.35 1417.45 1439.70 1427.75 1425.55 1426.95 1416.10 1420.40 1
12866 1341.35 1344.05 1379.50 1397.15 1405.70 1431.40 1403.95 1402.50 1409.00 1390.10 ... 1410.35 1417.45 1439.70 1427.75 1425.55 1426.95 1416.10 1420.40 1419.05 1

5 rows × 31 columns

In [227]:
timeseries_df_tmp = timeseries_df_tmp.tail()
# pull one example and remove the outcome variable
example = timeseries_df_tmp.values[0][:-1]
plt.plot(example)
Out[227]:
[<matplotlib.lines.Line2D at 0x12bb9a7b8>]
In [229]:
simplified_values = []
for r in split_seq(list(example), complexity):
    # average each chunk (r), not the whole window, otherwise the plot is a flat line
    simplified_values.append(np.mean(r))
plt.plot(simplified_values)
Out[229]:
[<matplotlib.lines.Line2D at 0x12bc6bac8>]
In [230]:
vals = [np.mean(r) for r in split_seq(list(example), complexity)]
# shift the simplified shape so its lowest point sits at 0, like the template
vals2 = [val - np.min(vals) for val in vals]
plt.plot(vals2)
Out[230]:
[<matplotlib.lines.Line2D at 0x12bfd1908>]
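
Subtracting the minimum puts the shape on a zero baseline, but the height still depends on the price level. If you want to overlay a matched window directly on the template, one option is to min-max scale it onto the template's range. This is a sketch of that idea, not a step from the notebook: rescale_to is a hypothetical helper and the values are made up.

```python
import numpy as np

def rescale_to(values, template):
    """Min-max scale values onto the range spanned by template."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    t_lo, t_hi = min(template), max(template)
    if hi == lo:
        # A flat window has no shape to stretch; pin it to the template's floor
        return np.full_like(values, t_lo, dtype=float)
    return (values - lo) / (hi - lo) * (t_hi - t_lo) + t_lo

template = [0, 0, 0, 0, 1, 2]
vals = [1340.0, 1342.0, 1341.0, 1345.0, 1380.0, 1420.0]   # made-up simplified window

scaled = rescale_to(vals, template)
print(scaled)   # now runs from 0 to 2, like the template

# Rescaling is a linear transform, so the correlation with the template is unchanged
r_before = np.corrcoef(vals, template)[0, 1]
r_after = np.corrcoef(scaled, template)[0, 1]
```

Plotting scaled and template on the same axes then shows how closely the window follows the ideal shape.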

Show Notes

(Pardon typos and formatting: these are the notes I use to make the videos.)