Chapter 2 Data sources
2.1 Dataset Overview
All data collected covers the period from 1st January 2020 to 30th April 2022.
Nota bene: Cryptocurrency markets have no set trading hours and do not close for holidays or weekends, so their daily data spans the full day, from 12:00 am to 11:59 pm, on every calendar day. The S&P500 and DJI, on the other hand, are only traded Monday through Friday during fixed market hours, so they have no observations on weekends or market holidays.
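The mismatch between the two trading calendars matters whenever a cryptocurrency series is compared with an index series. The following is a minimal sketch of one possible way to line them up: keep only the dates present in both series. It is an illustration under stated assumptions, not necessarily the method used in this analysis. The function names and the numeric values are hypothetical placeholders.

```python
from datetime import date, timedelta


def daily_dates(start, end):
    """Return every calendar day from start to end, inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def align_on_common_dates(crypto, index):
    """Keep only dates present in both series (an inner join on date).

    Both arguments are dicts mapping a date to a value. Weekends and
    market holidays drop out because the index has no entry for them.
    """
    common = sorted(set(crypto) & set(index))
    return [(day, crypto[day], index[day]) for day in common]


# Placeholder values only (not real prices): Friday 1 April to Monday 4 April 2022.
days = daily_dates(date(2022, 4, 1), date(2022, 4, 4))
crypto_close = {day: float(i) for i, day in enumerate(days)}  # all 7 weekdays
index_close = {
    day: float(i) for i, day in enumerate(days) if day.weekday() < 5  # Mon-Fri only
}

aligned = align_on_common_dates(crypto_close, index_close)
for day, crypto_value, index_value in aligned:
    print(day.isoformat(), crypto_value, index_value)
```

An inner join discards weekend cryptocurrency observations. An alternative design would carry the last index value forward over non-trading days, which keeps every cryptocurrency observation at the cost of repeating stale index values.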
2.2 Data Sources
2.2.1 Cryptocurrency Daily Price
Obtained from https://www.investing.com/ in CSV format.
Contains volume, opening price, high and low for each cryptocurrency.
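As an illustration, a CSV export of this kind could be loaded as sketched below. The column names ("Date", "Open", "High", "Low", "Vol."), the MM/DD/YYYY date format and the K/M/B volume suffixes are assumptions about the export layout and should be checked against the actual files. The helper names are hypothetical, and the sample rows contain placeholder values, not real prices.

```python
import csv
import io
from datetime import datetime

# Assumed abbreviations for thousands, millions and billions in the volume column.
_SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_number(text):
    """Convert strings such as '45,000.5', '1.5K' or '2M' to float.

    Returns None for blank cells or a lone dash.
    """
    text = text.strip().replace(",", "")
    if text in ("", "-"):
        return None
    multiplier = _SUFFIXES.get(text[-1].upper())
    if multiplier is not None:
        return float(text[:-1]) * multiplier
    return float(text)


def load_daily_prices(handle):
    """Read one exported CSV and return rows sorted by ascending date."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append({
            "date": datetime.strptime(record["Date"], "%m/%d/%Y").date(),  # assumed format
            "open": parse_number(record["Open"]),
            "high": parse_number(record["High"]),
            "low": parse_number(record["Low"]),
            "volume": parse_number(record["Vol."]),
        })
    rows.sort(key=lambda row: row["date"])
    return rows


# Placeholder rows standing in for a downloaded file.
sample = io.StringIO(
    '"Date","Price","Open","High","Low","Vol."\n'
    '"01/02/2020","2.0","1.0","3.0","0.5","1.5K"\n'
    '"01/01/2020","1.0","1.0","2.0","0.5","2M"\n'
)
for row in load_daily_prices(sample):
    print(row)
```

With a real file, the same function could be called as load_daily_prices(open(path, newline="", encoding="utf-8")), where path is the location of the downloaded export.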
2.2.2 S&P500 and Dow Jones Index Daily Price and Volume
Retrieved from NASDAQ using the Quandl library. NASDAQ, being the official marketplace for financial and economic data, was the most reliable and efficient source.
Contains volume, opening price, closing price, high and low for each index.
2.2.3 Market Cap Data of Popular Cryptocurrencies
Obtained from https://www.coingecko.com/ using the geckor library.
There were several sources of market cap data, the most popular being CoinMarketCap. However, its API calls were expensive and not freely available: CoinMarketCap is a one-stop source for crypto data, but only for paid (premium) users. CoinGecko is a close alternative, but it does not provide global market capitalization data.
Because of the price point, and because our analysis required global market capitalization data, I manually obtained that data from CoinMarketCap and saved it as a CSV file, which has been uploaded separately to the git repository.
2.2.4 Twitter Data
The academicTwitter library is only available upon request to some users, and the regular Twitter developer API does not provide historical data beyond 7 days.
We tried scraping mentions on Twitter per keyword, but each callback only allowed for 1500 tweets and we were limited to 1 million tweets per day, which would not have allowed us to obtain tweet_count by keyword for the last 2 years.
The data was finally obtained by scraping a publicly available graph on bitinfocharts using Python (the script has been added separately to the git repository). The scraped data was saved as a CSV file, which was then used to analyse the count of tweets for the keywords ethereum, dogecoin and bitcoin.
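The actual scraping script lives in the repository and is not reproduced here. As a rough sketch of the general idea, the snippet below assumes that the chart page embeds its data points in the page source as JavaScript entries of the form [new Date("YYYY/MM/DD"),value]. That layout, the function names and the sample string are all assumptions made for illustration. Downloading the page is left out so that the example runs offline.

```python
import csv
import io
import re
from datetime import datetime

# Assumed shape of one embedded data point: [new Date("2020/01/01"),123]
POINT_PATTERN = re.compile(
    r'\[new Date\("(\d{4}/\d{2}/\d{2})"\),\s*(null|\d+(?:\.\d+)?)\]'
)


def extract_series(page_source):
    """Return a list of (date, value) pairs found in the page source.

    Points whose value is the literal null are kept with value None.
    """
    series = []
    for day, value in POINT_PATTERN.findall(page_source):
        parsed_day = datetime.strptime(day, "%Y/%m/%d").date()
        series.append((parsed_day, None if value == "null" else float(value)))
    return series


def write_series_csv(series, handle, value_column="tweet_count"):
    """Write the series as a two-column CSV; missing values become empty cells."""
    writer = csv.writer(handle)
    writer.writerow(["date", value_column])
    for day, value in series:
        writer.writerow([day.isoformat(), "" if value is None else value])


# Placeholder page fragment with made-up counts, not real scraped data.
sample_source = (
    'new Dygraph(document.getElementById("container"),'
    '[[new Date("2020/01/01"),100],[new Date("2020/01/02"),null]], {});'
)
output = io.StringIO()
write_series_csv(extract_series(sample_source), output)
print(output.getvalue())
```

Parsing the embedded series with a regular expression keeps the sketch dependency-free, but it is brittle: any change to the page's markup would require updating the pattern.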