Investment Decisions, A Discussion on Bayes


Note: This post is an exploration, as it is my first time programming in R. Please do not use these examples as professional resources or as a basis for actual investment decisions.

Cryptocurrency is a high-risk investment. There are many unknowns and the market is volatile. Savvy investors, however, can have huge wins in this emerging market. While Bitcoin and Litecoin are the big players in the sector, it is the mystical Ethereum (ETH) that caught my eye. It was brought to my attention by my husband, who is quite successful at managing our portfolio.


About 3-4 years ago, my husband wanted money to gamble aggressively in the stock market for our retirement. We made a deal that the money had to be over and above our regular income, since this would be a risky endeavor. To maximize profits, we bought a hybrid car for about $12,000 and began driving it in our free time for a variety of services: Uber, Amazon, Lyft, etc. Just driving for these services in his free time, my husband made roughly $58,000 a year. This money was used for investing.

Our early investments were less risky and included NVIDIA, AMD, Amazon, Microsoft and other tech companies. My husband is a Network Security specialist and I work in the Virtual Reality sector, so tech is where we are comfortable. One morning, my husband asked me if we should invest in Ethereum. I did not know what Ethereum was at the time, so I immediately set out to do some research. After some reading, I knew I was seeing something amazing, but I didn’t quite understand it. We discussed the purchase at length and made a decision, scooping up a sizable quantity of the up-and-coming cryptocurrency Ethereum (ETH).

What is Ethereum (ETH)?

Ethereum is a decentralized blockchain platform and cryptocurrency. Ethereum runs smart contracts, which are simply applications that run exactly as programmed. This means smart contracts, in theory, run without the possibility of crashes, loss of time, censorship, fraud or third party intrusion [1]. The benefits of using smart contracts have led banks to start vetting various platforms, like Ethereum. The caveat is that there are fewer protections against the types of mistakes that hackers can exploit. This means applications created on the platform require a certain level of skill, and if a team member lacks that skill, the potential for exploits can be quite severe.

Image 01: Percent of Change (ETH)


The first year was quite successful and we ran a 40% return on the portfolio based entirely on intuition. Ethereum (ETH), however, was a standout with some of the most successful returns in the whole portfolio. In March of 2016, when we bought ETH, it was $7.60 per coin; today the price is holding at around $90.44. Any investment with a 1090% change in your favor is definitely worthwhile. My surgery this year was $89,000+, so the money we made paid important medical bills, gave us some retirement savings… and as a bonus bought a (used) Mercedes SL-500. We also paid a pretty penny in taxes, which felt good as a way to give back to our country. For illustrative purposes, a $25,000 investment made at the time would have yielded a $272,500 return [Image 01].
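The arithmetic behind those figures can be checked with a few lines. This is a minimal sketch in Python (not part of my R work); the function name is my own, and the prices are the two quoted above.

```python
def percent_change(start_price, end_price):
    """Percent change from start_price to end_price."""
    return (end_price - start_price) / start_price * 100.0

buy_price = 7.60       # USD per ETH in March 2016 (from the post)
current_price = 90.44  # USD per ETH at the time of writing (from the post)

change = percent_change(buy_price, current_price)
gain_on_25k = 25_000 * change / 100.0  # gain on an illustrative $25,000 stake

print(f"Percent change: {change:.0f}%")
print(f"Gain on a $25,000 stake: ${gain_on_25k:,.0f}")
```

Note that the $272,500 is the gain on top of the original stake, not the final value of the position.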

Some analysts are saying ETH’s value will continue to climb, topping 3000% over its starting value, but these are changing times. Since its inception, ETH has been hit with hacking scandals and new challengers in the sector. We cashed out some of the coins, but a large percentage is still riding. The current challenge we face is whether to keep the coins and make the long play or to sell them now and cash out.


During the time of the initial investment, several apps and websites, such as Coinbase, were used to monitor and purchase ETH coins. Almost all transactions occurred on a mobile device, typically on the latest Android rollout. Other data was gathered in Quickbooks and a Turkish accounting program. This data was mostly transactional and related to taxes and banking. The price was the single most studied factor, along with crude web scraping of text for news relating to the cryptocurrency market.

Pulling Historical Data for Analysis

For analysis, ETH historical .csv datasets were pulled from Kaggle [1]. The epoch, or Unix time code, was converted to a human-readable format in Excel and the data was prepared for use in RapidMiner. These are the attributes we will be using for this analysis:

  • Address – total amount of addresses in the blockchain on any given day,
  • Blocksize – total Blocksize on any given day,
  • Price ($USD) – price per Ether (In USD) on any given day,
  • Total Ether Growth – total amount of Ethereum in the Ether,
  • Hashrate – total hashrate of everyone on the blockchain,
  • Market Cap Value – total cap of the Ethereum cryptocurrency, also has ETH -> USD and USD cap total.
  • Transactions – number of transactions made on any given day [1].
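I did the epoch conversion in Excel, but the same step is easy to express in code. The sketch below is Python, purely as an illustration; the function name and the sample timestamp are my own and do not come from the Kaggle file.

```python
from datetime import datetime, timezone

def unix_to_date(timestamp):
    """Convert a Unix timestamp (seconds since 1970-01-01 UTC) to YYYY-MM-DD."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m-%d")

example_timestamp = 1451606400  # illustrative value, not taken from the dataset
print(unix_to_date(example_timestamp))
```

Passing an explicit UTC time zone keeps the result the same no matter where the script is run.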

We are going to start by assembling the raw data in Excel, do some preliminary exploration in RapidMiner and create the entire forecast in R. We will use Bayes’ Theorem in several ways. First, we will use Naïve Bayes to support validation functions and for our early prediction models. Then, we will use Bayesian statistical forecasting methods in tandem with ARIMA models to create a forecast.


This week I really needed time to think and put together my thoughts. We are trying to get to the patterns in the data. I thought… how do we do that? When I look at Bayes’ Theorem, I see a formula, and it feels easy to just start plugging in some sort of data and looking at results, but in this case we are talking about historical time series data, which has its own idiosyncrasies and nuances. First, it’s important to reduce the noise. Our data is pretty streamlined; however, removing the information we don’t need will help to provide clarity. Then, we need to find the regularities. What is common? Where are the hidden patterns? How exactly is Bayes going to help us get to an accurate prediction? We can use Bayes to ensure accuracy in validation and for the prediction, but one of the most interesting ways to think of Bayes in this problem set is for higher-level decision making. We can use Bayes to determine which ARIMA model will have the best accuracy. We could use Bayes again for classification of upward and downward trends on the back end of our model as well, but due to time, we will stop once we achieve the ARIMA model forecast.
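To make the "plugging data into the formula" idea concrete, here is a minimal Python sketch of Bayes' Theorem for a two-outcome question. The probabilities are invented for illustration only; they are not estimated from the ETH data, and the function name is my own.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """P(H | E) from P(H), P(E | H) and P(E | not H), via Bayes' Theorem."""
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical inputs: H = "price rises today", E = "transaction count is high".
p_up = 0.5               # assumed prior probability of an up day
p_high_given_up = 0.8    # assumed P(high transactions | up day)
p_high_given_down = 0.4  # assumed P(high transactions | down day)

p_up_given_high = posterior(p_up, p_high_given_up, p_high_given_down)
print(f"P(up | high transactions) = {p_up_given_high:.3f}")
```

With these made-up inputs, seeing a high transaction count moves the belief in an up day from one half to two thirds.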

Initially, a preliminary forecast was put together in Excel to predict future values of the price over time. According to the Microsoft Excel Help documents, Excel uses the AAA version of the Exponential Smoothing (ETS) algorithm in its forecasting model. Our preliminary forecast indicates that the price should continue to do quite well until the close of 2017 [Image 02]. Quick series charts were created for some of the attributes for visual reference [Image 03]. This, of course, is not telling us the whole story, so we will need to dig a bit deeper.
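"AAA" stands for additive error, additive trend and additive seasonality, which is the additive Holt-Winters method. The Python sketch below shows the general shape of that method under my own simplifying assumptions: fixed smoothing weights, a simple initialisation from the first two seasons, and a made-up series. Excel chooses its parameters itself, so this is not a reproduction of Excel's forecast.

```python
def holt_winters_additive(series, season_len, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecast with fixed smoothing weights (a sketch)."""
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons of data")
    first = series[:season_len]
    second = series[season_len:2 * season_len]
    level = sum(first) / season_len
    trend = (sum(second) - sum(first)) / season_len ** 2
    seasonals = [y - level for y in first]
    for t, y in enumerate(series):
        s = seasonals[t % season_len]
        prev_level = level
        # Update level, trend and the seasonal offset for this position.
        level = alpha * (y - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals[t % season_len] = gamma * (y - level) + (1 - gamma) * s
    n = len(series)
    return [level + h * trend + seasonals[(n + h - 1) % season_len]
            for h in range(1, horizon + 1)]

# Made-up series with a repeating four-step pattern and no trend.
toy_series = [1, 2, 3, 4] * 3
toy_forecast = holt_winters_additive(toy_series, season_len=4,
                                     alpha=0.5, beta=0.3, gamma=0.2, horizon=4)
print([round(v, 3) for v in toy_forecast])
```

On a perfectly repeating series like this one, the forecast simply continues the pattern, which is a quick sanity check before trying it on noisy price data.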


Hopping into RapidMiner, we analyze Moving Average and Fit Trend. In stocks, the Moving Average is a measure that is tracked closely by investors and traders [2]. Breaks above and below the moving average are often considered to be important trading signals [2]. Moving Averages are based on past prices and therefore lag the current price.

Even though they are not current, Moving Averages can be important trading signals. With this in mind, we construct a process to calculate the Moving Average [Image 04]. We want to see if there are any points where the two lines cross over. It is easy to see that the rising Moving Average on the following chart [Image 05] indicates that ETH is in an overall uptrend.
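The calculation itself is simple enough to sketch outside RapidMiner. The Python below computes a trailing simple moving average and lists the points where the price crosses it. The prices are made up, and the function names and the three-day window are my own choices, not the settings used in my RapidMiner process.

```python
def moving_average(values, window):
    """Trailing simple moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out

def crossovers(prices, averages):
    """Indices where the price moves from one side of its average to the other."""
    points = []
    prev_side = None
    for i, (price, avg) in enumerate(zip(prices, averages)):
        if avg is None or price == avg:
            continue
        side = price > avg
        if prev_side is not None and side != prev_side:
            points.append(i)
        prev_side = side
    return points

prices = [10, 11, 12, 13, 9, 8, 12, 14]  # made-up prices
ma3 = moving_average(prices, 3)
print(ma3)
print(crossovers(prices, ma3))
```

A break below the average followed by a break back above it is the kind of signal traders watch for.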

Fit Trend shows trends in series data and is also useful for our purposes. In order to use the Fit Trend operator, we will need to create an inner regression learner [Image 06]. The Fit Trend operator will generate a trend for ETH Price based on the linear regression model used in our cross-validation operator. The generalized linear regression will build a model based on our current training set [Image 07]. Under the hood of the Fit Trend, we will run a neural net to build out our trend [Image 08]. In the chart, we can see that while things look good on the surface, at a deeper level ETH could be heading for some problems. The trendline dips under the price line, which is a bit concerning and something to watch moving forward [Image 09].
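For readers without RapidMiner, the simplest version of a fitted trend is a straight line through the prices by ordinary least squares. The Python sketch below shows only that plain linear case, with made-up prices and my own function name. It is a stand-in to illustrate the idea, not what the Fit Trend operator with a neural net learner computes.

```python
def fit_linear_trend(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

prices = [8.0, 9.0, 11.0, 15.0, 20.0]  # made-up prices
slope, intercept = fit_linear_trend(prices)
trend = [intercept + slope * x for x in range(len(prices))]

# Days on which the fitted trend sits below the actual price.
below = [x for x, (t, p) in enumerate(zip(trend, prices)) if t < p]
print(slope, intercept, below)
```

Comparing the trendline against the price day by day is the same comparison I am making by eye in the chart.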


Correlation is a good way to identify and eliminate irrelevant and redundant attributes. We want our model to be as efficient and accurate as possible, so this is a necessary step. Starting a new process in RapidMiner, I use the Retrieve operator to bring in the data from the ETH Historical Data Excel file. Once loaded, the data is prepped: attributes are selected and renamed, roles are set and labels are discretized by frequency. Next, the Correlation Matrix and Select by Weights operators are applied [Image 10]. Then a Split Validation operator is put in place and a Naïve Bayes operator is added to generate a classification model [Image 11]. Finally, the trained model is applied to the example set and the performance of the process is evaluated.


As we can see from the pairwise table, Total ETH Growth and Market Cap Value are revealed to have a correlation of 1 to Price ($USD) [Image 12]. These are redundant and are filtered out. Moving forward, we will focus on Total Addresses and Transactions, which have a fairly strong correlation to Price ($USD). Further, the K-Means Cluster Model also demonstrates that Total Addresses and Transactions are the attributes to focus on [Image 15]. This is the first pattern to emerge.
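The number behind each cell of a correlation matrix is the Pearson correlation coefficient. Here is a minimal Python sketch with made-up values; the function name and the fixed coin supply are my own assumptions. It also shows why an attribute like market cap, which is price multiplied by the number of coins, comes out almost perfectly correlated with price.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

price = [7.6, 10.2, 12.5, 48.0, 90.4]       # made-up daily prices
supply = 90_000_000                         # assumed constant coin supply
market_cap = [p * supply for p in price]    # market cap is price times supply
transactions = [30, 41, 38, 75, 110]        # made-up transaction counts (thousands)

print(round(pearson(price, market_cap), 4))
print(round(pearson(price, transactions), 4))
```

An attribute that is just a rescaled copy of the label adds no information, which is why it is filtered out.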

By using the Naïve Bayes operator embedded in our Split Validation, we can now look at a performance measure for the attributes our correlation step selected. Using the Naïve Bayes performance vector, we get an accuracy of 76.74% [Image 13]. Additionally, for exploratory purposes, I re-ran the process using a Neural Net in place of Naïve Bayes [Image 14]. This yielded an exciting accuracy of 87.60%. The correlations themselves were the same; what improved was the classification accuracy of the model built on the selected attributes.
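For anyone curious what a Naïve Bayes classifier inside a split validation is doing, here is a small Gaussian Naïve Bayes sketch in Python. Everything in it is an illustration: the function names are mine, the rows are invented (address count, transaction count) pairs, and the labels stand in for the discretized price classes. It does not reproduce my RapidMiner accuracy figures.

```python
from math import log, pi

def train_gaussian_nb(rows, labels):
    """Per class: the prior, plus a (mean, variance) pair for each feature."""
    model = {}
    for cls in set(labels):
        members = [r for r, l in zip(rows, labels) if l == cls]
        stats = []
        for col in zip(*members):
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9  # avoid zero
            stats.append((mean, var))
        model[cls] = (len(members) / len(rows), stats)
    return model

def predict(model, row):
    """Pick the class with the highest log posterior (features treated as independent)."""
    def log_posterior(cls):
        prior, stats = model[cls]
        score = log(prior)
        for value, (mean, var) in zip(row, stats):
            score += -0.5 * log(2 * pi * var) - (value - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

def accuracy(model, rows, labels):
    return sum(predict(model, r) == l for r, l in zip(rows, labels)) / len(rows)

# Invented data: split into a training part and a held-out test part.
train_rows = [(1.0, 2.0), (1.2, 1.8), (0.9, 2.2), (5.0, 6.0), (5.2, 5.8), (4.8, 6.1)]
train_labels = ["low", "low", "low", "high", "high", "high"]
test_rows = [(1.1, 2.1), (5.1, 5.9)]
test_labels = ["low", "high"]

nb_model = train_gaussian_nb(train_rows, train_labels)
print(accuracy(nb_model, test_rows, test_labels))
```

The key point of split validation is that accuracy is measured only on rows the model never saw during training.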

Now that we know that the number of Transactions is correlated with a rise in the price of ETH, let’s say, theoretically, that we modify our text mining parameters to search for all news relating to transactions and events that impact the level of transactions. We modify the parameters and search for any events that will dramatically increase or decrease the number of Transactions on a given day. Armed with the knowledge gained during the correlation, we might be worried about an emerging news story that pops up from our web scraping and text mining efforts. A change to net neutrality, for instance, could lead internet service providers to change formerly reliable bandwidth and inadvertently slow down the number of transactions. This might be a good reason to consider a “shorting the market” type play.


If we want to calculate a forecast on Time Series data using Bayesian statistical forecasting methods, we are going to need to move to RStudio. Unfortunately, none of Excel, RapidMiner or Tableau is set up to handle this type of task. Here we will use some R programming to illustrate the Bayesian approach to model order selection for the class of ARIMA time series models. An ARIMA model is a special type of dynamic linear model, or state space model [3]. ARIMA models have a sequential context and are built on a Bayesian framework [3].

The first issue we face is that we need to decompose the data so we can fit it to a formula. We can think of it like this, see below [Image 21].

Image 21

Luckily for us, we have taken care of this already in RapidMiner. So, the first item to take care of is updating the Excel sheet with the Moving Average (Lag Variable), Fit Trend (Trend Variable) and, additionally, Differentiate (Difference Variable) (not shown). Once these are added to the Excel sheet, we cross over into RStudio.
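Lag and difference variables are just shifted and subtracted copies of the series. A minimal Python sketch with made-up prices follows; the function names are mine, and this is not the RapidMiner Differentiate operator itself.

```python
def lag(values, k=1):
    """Series shifted back by k steps (k >= 1); the first k entries have no value."""
    return [None] * k + list(values[:-k])

def difference(values, k=1):
    """Change from k steps earlier (k >= 1); the first k entries have no value."""
    return [None] * k + [values[i] - values[i - k] for i in range(k, len(values))]

prices = [10, 12, 15, 14, 20]  # made-up prices
print(lag(prices))         # yesterday's price aligned with today
print(difference(prices))  # day-over-day change
```

Differencing is the usual first step toward making a trending series stationary, which is the "I" (integrated) part of ARIMA.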

In order to fit our data to the formula, we have many things to consider… for instance, what is the lag of the data? Is the data stationary? And… what is the best ARIMA model to use? The process and code for our forecast in R is somewhat long, so from here on please take a look at the code and the output attached as a separate file below or on my site. Each line is commented with helpful notes about what the code is actually doing, where it is pulling from, etc. Note that, as I am not a programmer, this solution is heavily based on pre-existing code written by Ani Katchova [4] for her tutorial Time Series ARIMA Models in R, and it relies on the tseries package for R.

The important takeaway is that the code follows a Bayesian framework: it looks at the data, pulls what it needs to create and assess an ARIMA model (trend, seasonal, regression and error data) and creates several different ARIMA models [5]. Then the code generates probabilities and statistics about those models. Armed with those figures, we can identify which model is most accurate for predicting the future price of ETH [6].
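One common way to turn fit statistics for several candidate models into approximate model probabilities is the BIC approximation with equal prior weight on each model. The Python sketch below shows that calculation only. The BIC values and model orders are hypothetical, the function name is mine, and I am not claiming this is the exact computation the attached R code performs.

```python
from math import exp

def bic_model_probabilities(bic_by_model):
    """Approximate posterior model probabilities from BIC, assuming equal priors."""
    best = min(bic_by_model.values())
    # Subtracting the best BIC first keeps the exponentials numerically stable.
    weights = {name: exp(-0.5 * (bic - best)) for name, bic in bic_by_model.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical BIC values for three candidate models (lower is better).
candidate_bics = {
    "ARIMA(1,1,0)": 1012.4,
    "ARIMA(0,1,1)": 1010.1,
    "ARIMA(1,1,1)": 1015.8,
}

model_probs = bic_model_probabilities(candidate_bics)
for name, prob in sorted(model_probs.items(), key=lambda item: -item[1]):
    print(f"{name}: {prob:.3f}")
```

The model with the lowest BIC gets the largest share of the probability, and the gap between shares shows how decisive the choice really is.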


ETH has been a great investment, but the era of cryptocurrency is still very much in its infancy. Although it looks favorable in the near future, only time will tell its overall fate. Looking through the data in this way is difficult because of the general volatility of coin investments. The oscillation of the Price ($USD) over time is evidence of this: despite aggressive measures to make the data stationary, it is still fairly wonky. Stability in the market means less opportunity, however, so moves made today may be riskier, but tend to have more beneficial outcomes. Using these insights to predict short-term plays seems to be where the immediate application will be. In the future, changing the seasonality and the length of time the data set covers might yield more telling results.


Up until now, the only time the data was reviewed, it was done rather informally by looking at the averages in the Price ($USD) from week to week. This lines up with what we are seeing in the Moving Average. While we have completed the forecast model, the data is not lining up perfectly [Image 28]. This is because we still need to work on getting the data to be stationary. From what we can see based on the Forecast, we are heading for a bumpy ride as the Price ($USD) sharply increases and drops off again [Image 27].

ETH is part of a dynamic emerging market with major up and down swings, so there are probably better ways to handle this type of prediction and assessment than I am familiar with. However, based on the Forecast and the Fit Trend, the safest play is to either cash out or to plan to stay in for the long term. So in this respect, we are acting as conservatively as we can, without cashing out completely.


Prior to this assessment, our text mining efforts did not focus in on Total Addresses or Transactions at all. Now we are exploring tracking issues relating to the volume and frequency of these two factors. We are hoping that moving in this direction will provide valuable insights and new avenues of research.

The dip in the Fit Trend was another intriguing pattern that was surprising. Is this a sign that we should prepare to make a strategic play? Should we look for opportunities to short the market? This is an area we will have to dig further into in the future.

At the end of the day, would these insights have changed my decision? Probably not too much. ETH is a risky investment and will continue to be. Analysts much more advanced than myself debate it regularly. Based on what I knew back then versus now, my only regret is that I did not invest more and pull out less.



[4]. Econometrics Academy. (2013, December 28). Retrieved May 16, 2017, from

[1]. Larsen, L. (2017, April 18). Ethereum Historical Data. Retrieved May 17, 2017, from

[2]. Picardo, E., CFA. (2013, November 28). Moving Average – MA. Retrieved May 17, 2017, from

[5]. West, M. (n.d.). BAYESIAN DYNAMIC MODELLING (Working paper). Retrieved May 15, 2017, from Department of Statistical Science, Duke University website:

[3]. West, M. (2002, June 5). BAYESIAN TIME SERIES. Retrieved May 15, 2017, from
