Chapter 2 Data Sources

2.1 Dataset Overview

All data collected cover the period from 1 January 2020 to 30 April 2022.

Nota bene: Cryptocurrency markets have no set trading hours and do not close for weekends or holidays, so the daily cryptocurrency data cover every calendar day, from 12:00 am to 11:59 pm. The S&P500 and DJI, on the other hand, are traded only Monday through Friday during fixed market hours.
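The difference in trading calendars matters whenever the two kinds of series are compared day by day. The sketch below shows one possible way to reconcile them, by keeping only the dates present in both series. It is an illustration rather than the method used in this project: the function names and the toy values are assumptions, and other choices (for example, forward-filling index prices over weekends) are equally plausible.

```python
from datetime import date, timedelta


def is_weekday(d: date) -> bool:
    """Return True for Monday to Friday (date.weekday() gives 0..4)."""
    return d.weekday() < 5


def align_on_common_dates(crypto: dict, index: dict) -> list:
    """Keep only the dates present in both series, in chronological order.

    Intersecting on dates drops weekends and also any market holidays,
    because the index series simply has no row for those days.
    """
    common = sorted(set(crypto) & set(index))
    return [(d, crypto[d], index[d]) for d in common]


# Toy data (made-up values): one calendar week starting Monday 3 January 2022.
start = date(2022, 1, 3)
crypto_close = {start + timedelta(days=i): 100.0 + i for i in range(7)}
index_close = {
    d: 4000.0 + 10 * i
    for i, d in enumerate(sorted(crypto_close))
    if is_weekday(d)
}

aligned = align_on_common_dates(crypto_close, index_close)
print(len(crypto_close), len(index_close), len(aligned))
```

With this toy week, the cryptocurrency series has seven rows, the index series has five, and the aligned result keeps the five weekdays.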

2.2 Data Sources

2.2.1 Cryptocurrency Daily Price

Obtained from https://www.investing.com/ in CSV format.

Contains volume, opening price, high, and low for each cryptocurrency.
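As a rough illustration of how such a file could be read, the sketch below parses a small in-memory sample with Python's standard csv module. The column names ("Date", "Open", "High", "Low", "Vol."), the date format, the thousands separators and the K/M/B volume suffixes are all assumptions about the export layout, and the sample values are made up. The actual files should be checked before reusing this.

```python
import csv
import io
from datetime import datetime


def parse_number(text: str) -> float:
    """Convert a string such as '7,196.4' to 7196.4."""
    return float(text.replace(",", ""))


def parse_volume(text: str) -> float:
    """Convert suffixed volumes such as '420.28K' or '1.2M' to plain numbers."""
    multipliers = {"K": 1e3, "M": 1e6, "B": 1e9}
    text = text.strip().replace(",", "")
    if text in ("", "-"):          # assumed placeholder for a missing volume
        return float("nan")
    if text[-1] in multipliers:
        return float(text[:-1]) * multipliers[text[-1]]
    return float(text)


def load_prices(handle) -> list:
    """Read daily rows from a CSV handle and return them oldest first."""
    rows = []
    for rec in csv.DictReader(handle):
        rows.append({
            # Assumed date format, e.g. "Jan 02, 2020".
            "date": datetime.strptime(rec["Date"], "%b %d, %Y").date(),
            "open": parse_number(rec["Open"]),
            "high": parse_number(rec["High"]),
            "low": parse_number(rec["Low"]),
            "volume": parse_volume(rec["Vol."]),
        })
    rows.sort(key=lambda r: r["date"])
    return rows


# Made-up sample in the assumed layout (newest row first).
sample = io.StringIO(
    '"Date","Open","High","Low","Vol."\n'
    '"Jan 02, 2020","7,199.7","7,209.6","6,901.4","632.78K"\n'
    '"Jan 01, 2020","7,196.4","7,259.4","7,180.0","420.28K"\n'
)
prices = load_prices(sample)
print(prices[0]["date"], prices[0]["open"], prices[0]["volume"])
```

To read a real export, the same function can be called on a file opened with open(path, newline="", encoding="utf-8").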

2.2.2 S&P500 and Dow Jones Index Daily Price and Volume

Retrieved from NASDAQ using the Quandl library. NASDAQ, being a marketplace for financial and economic data, was the most reliable and efficient source.

Contains volume, opening price, closing price, high, and low for each index.

2.2.4 Twitter Data

The academicTwitter library is available only upon request to some users, and the regular Twitter developer API doesn't provide historical data older than 7 days.

We tried scraping Twitter mentions per keyword, but each request returned at most 1500 tweets and we were limited to 1 million tweets per day, which would not have allowed us to obtain tweet_count by keyword for the last 2 years.

The data was finally obtained by scraping a publicly available graph on bitinfocharts using a Python script (added separately to the Git repository). The scraped data was written to a CSV file, which was then used to analyse the tweet counts for the keywords ethereum, dogecoin and bitcoin.
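The scraping script itself lives in the repository and is not reproduced here. As a minimal sketch of the general idea, the code below extracts a daily series from a chart's page source and writes it to CSV. It assumes the page embeds the series as pairs of the form [new Date("YYYY/MM/DD"),value]. That format, the function names and the sample string (with made-up counts) are assumptions for illustration and may differ from what the project script does.

```python
import csv
import io
import re

# Assumed shape of one embedded data point: [new Date("2020/01/01"),21043]
PAIR = re.compile(
    r'\[new Date\("(\d{4})/(\d{2})/(\d{2})"\),\s*(null|[\d.]+)\]'
)


def extract_series(page_source: str) -> list:
    """Return (ISO date, count) pairs; a missing value becomes None."""
    series = []
    for year, month, day, value in PAIR.findall(page_source):
        count = None if value == "null" else int(float(value))
        series.append((f"{year}-{month}-{day}", count))
    return series


def write_csv(series: list, handle, keyword: str) -> None:
    """Write one row per day with the keyword and its tweet count."""
    writer = csv.writer(handle)
    writer.writerow(["date", "keyword", "tweet_count"])
    for day, count in series:
        writer.writerow([day, keyword, "" if count is None else count])


# Made-up page fragment in the assumed format.
sample_source = (
    'new Dygraph(document.getElementById("container"),'
    '[[new Date("2020/01/01"),21043],'
    '[new Date("2020/01/02"),null],'
    '[new Date("2020/01/03"),25871]], {});'
)

series = extract_series(sample_source)
buffer = io.StringIO()
write_csv(series, buffer, keyword="bitcoin")
print(buffer.getvalue())
```

Keeping the parsing separate from the download step means the extraction can be checked offline against a saved copy of the page, once per keyword.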