Project 4: Deep Learning

Posted by Edward-Beck on October 14, 2019
    Faced with the age-old question of how to get rich without exerting oneself, I turned to the stock market. My hope was that neural networks and other machine learning techniques could uncover patterns in the market worth acting on. The exploration began with daily market data, the open, close, high, and low prices of each stock along with its trading volume, collected from February 2013 to February 2018. The S&P 500, a stock market index of the 500 largest US companies, was chosen to represent the market as a whole. To expand the dataset, additional features were engineered: a binary flag for whether the stock closed higher than it opened, a similar binary flag measured against the market, a rolling average of the past five days of performance, and a seven-day lag for each stock and for the market.
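A rough sketch of how that feature engineering might look in pandas is below. The file name and column names ("close", "open", "sp500_close") are assumptions for illustration, not the exact data layout used in the project.

```python
import pandas as pd

# Hypothetical daily OHLCV data for one ticker plus the S&P 500 close.
df = pd.read_csv("stock_prices.csv", parse_dates=["date"]).sort_values("date")

# Binary flag: did the stock close above its open?
df["closed_up"] = (df["close"] > df["open"]).astype(int)

# Binary flag relative to the market: did the stock's daily return beat the index?
df["beat_market"] = (
    df["close"].pct_change() > df["sp500_close"].pct_change()
).astype(int)

# Five-day rolling average of the closing price.
df["close_5d_avg"] = df["close"].rolling(window=5).mean()

# Seven days of lagged closes for the stock and for the market index.
for lag in range(1, 8):
    df[f"close_lag_{lag}"] = df["close"].shift(lag)
    df[f"sp500_lag_{lag}"] = df["sp500_close"].shift(lag)

# Drop the early rows where the rolling window and lags are undefined.
df = df.dropna()
```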
    The next task was to see whether any meaningful information could be pulled from this fairly basic dataset using classical, or "shallow", machine learning. The goal, simply put: given the features already established, predict whether a stock would be worth more tomorrow than it was today. Logistic regression reached 52.3% accuracy, only slightly better than flipping a coin to decide whether to buy or sell, though still a better bet than a seat at the blackjack table. An interesting discovery was that the model was biased toward predicting a sell, although there was still some hedging. Applying synthetic minority over-sampling (SMOTE) to the training data produced a balanced class distribution. For the sake of variety, several decision tree classifiers were also employed; test accuracy across five different classifiers ranged from 53.3% to 59.6%, with the Random Forest Classifier performing best.
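A minimal sketch of that shallow-learning pipeline with scikit-learn and imbalanced-learn is shown here. It runs on random placeholder data rather than the real feature set, and the model settings are illustrative assumptions, not the project's exact configuration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from imblearn.over_sampling import SMOTE

# Placeholder features and target standing in for the engineered lag/rolling
# columns and the "closes higher tomorrow" flag.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 20))
y = (rng.random(1000) > 0.55).astype(int)  # deliberately imbalanced classes

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

# Baseline: logistic regression on the raw (imbalanced) training set.
logit = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("logistic regression:", accuracy_score(y_test, logit.predict(X_test)))

# Rebalance the training set with synthetic minority over-sampling (SMOTE).
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
logit_smote = LogisticRegression(max_iter=1000).fit(X_res, y_res)
print("logistic + SMOTE:", accuracy_score(y_test, logit_smote.predict(X_test)))

# One of several tree-based classifiers; random forest did best in the project.
forest = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_res, y_res)
print("random forest:", accuracy_score(y_test, forest.predict(X_test)))
```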
    These findings from basic machine learning techniques make sense in light of economics, specifically the no-arbitrage assumption behind the Black-Scholes equation. Since machine learning is not banned from being applied to the stock market, more likely than not someone has come before me, done just that, and traded on it until the market corrected itself. Still, these results span five years, and perhaps better predictions can be made over a much shorter time frame.
    The power of neural networks to find patterns in large datasets comes from "training": the model iterates repeatedly over a portion of the data. Each input neuron carries a set of weights and has a one-to-many relationship with the neurons in the next layer; each neuron in that second layer, in turn, has a many-to-one relationship with the neurons feeding into it. The process from the first layer to the second is repeated until there are no more hidden layers and a final output is reached. Training, in essence, adjusts the weights to best predict a target feature of the dataset, in our case whether the stock will be worth more or less tomorrow. Training data is kept separate from testing data: after the network has optimized its weights between neurons and layers, a brand-new set of data, the testing data, is used to determine the accuracy of the model. Just as there are multiple machine learning models, with decision trees being one family, there is a spectrum of choices for neural networks.
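As a concrete picture of that layered structure and the train/test separation, here is a minimal Keras sketch on toy stand-in data. The layer sizes, epoch count, and input shape are assumptions for illustration, not the project's actual architecture.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy stand-in for the engineered features; real inputs would be the lag columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20)).astype("float32")
y = (rng.random(1000) > 0.5).astype("float32")

# Hold out the last 20% as test data the network never trains on.
split = int(0.8 * len(X))
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

# Each Dense layer holds a weight matrix connecting every neuron in one layer
# to every neuron in the next; training iteratively adjusts those weights.
model = keras.Sequential([
    layers.Input(shape=(20,)),
    layers.Dense(32, activation="relu"),    # hidden layer 1
    layers.Dense(16, activation="relu"),    # hidden layer 2
    layers.Dense(1, activation="sigmoid"),  # output: P(stock closes higher)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

model.fit(X_train, y_train, epochs=10, batch_size=32, verbose=0)
loss, acc = model.evaluate(X_test, y_test, verbose=0)
print(f"test accuracy: {acc:.3f}")
```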
    A Long Short-Term Memory (LSTM) neural network was the best choice here because the data is strongly associated with time. The prominent feature of LSTMs is their ability to place greater emphasis on recent information while forgetting some of the information from further in the past. There are several ways a neural network can be configured to best predict the target values; when predicting a true/false question, binary cross-entropy is the most reasonable loss function. Two optimizers were employed: first Adam, a hybrid of the Adaptive Gradient Algorithm and Root Mean Square Propagation, and second stochastic gradient descent. SGD had a slightly lower accuracy of 53.1% compared to 55.9% for the Adam optimizer, and both optimizers settled at a similar loss of about 0.68.
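A hedged sketch of such an LSTM in Keras follows, again on toy stand-in data: each sample is a seven-day window of five features, the loss is binary cross-entropy, and the two optimizers are compared side by side. The window size, layer width, and learning rate are illustrative assumptions, not the project's exact setup.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy sequences: 7-day windows of 5 features (e.g. OHLCV), target = "up tomorrow".
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 7, 5)).astype("float32")
y = (rng.random(1000) > 0.5).astype("float32")
split = int(0.8 * len(X))

def build_lstm(optimizer):
    model = keras.Sequential([
        layers.Input(shape=(7, 5)),
        layers.LSTM(32),                        # keeps/forgets information across the window
        layers.Dense(1, activation="sigmoid"),  # probability the stock closes higher
    ])
    model.compile(optimizer=optimizer,
                  loss="binary_crossentropy",   # natural loss for a true/false target
                  metrics=["accuracy"])
    return model

# Compare the two optimizers discussed above: Adam and plain SGD.
for name, opt in [("adam", keras.optimizers.Adam()),
                  ("sgd", keras.optimizers.SGD(learning_rate=0.01))]:
    model = build_lstm(opt)
    model.fit(X[:split], y[:split], epochs=10, batch_size=32, verbose=0)
    loss, acc = model.evaluate(X[split:], y[split:], verbose=0)
    print(f"{name}: loss={loss:.2f}, accuracy={acc:.3f}")
```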
    In conclusion, both shallow and deep learning theoretically offer a statistically significant approach to modeling the movement of the stock market. Additional data should be collected to determine whether learning rates and accuracies can be improved over time. Another aspect worth exploring is volatility, as well as the variance of that volatility. Perhaps the most effective methodology for predicting future outcomes is moving from a discrete model to a continuous one.