FOSS Trading

xts_0.13.2 on CRAN

Mon, 22 Jan 2024 12:41:00 -0500

An updated version of xts is now on CRAN. The most notable change is that plot.xts() now supports a log scale y-axis. This involved a significant refactor of the plot.xts() internals, so it’s possible that the refactor introduced some bugs.

Features

Add ability to log scale the y-axis in plot.xts(). (#103)
Significantly refactor the internals of plot.xts(). This made it a lot easier to add the y-axis log scale. (#408)
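A minimal sketch of the new log scale feature, using made-up prices. It assumes the y-axis log scale is requested with an argument named log; check ?plot.xts in your installed version before relying on it.

```r
library(xts)

# Synthetic prices that grow 1% per day (hypothetical data for illustration)
dates <- as.Date("2023-01-02") + 0:249
prices <- xts(100 * cumprod(rep(1.01, 250)), order.by = dates)

# Assumption: the y-axis log scale is requested with log = TRUE
plot(prices, log = TRUE, main = "Synthetic prices (log scale)")
```

A series with a constant growth rate like this one should appear as a straight line on a log scale, which makes it a handy visual check.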

Enhancements

Print a message when period.apply() is called with FUN = mean because it calculates the mean for each column, not all the data in the subset like it does for all other functions. The message says to use FUN = colMeans for current behavior and FUN = function(x) mean(x) to calculate the mean for all the data. This information is also included in the help files. The option xts.message.period.apply.mean = FALSE suppresses the message. (#124)
Actually change the underlying index values when ‘tclass’ is changed from a class with a timezone (e.g. POSIXct) to one without a timezone (e.g. Date). Add a warning when this happens, with a global option to always suppress the warning. Thanks to Daniel Palomar for the report and suggestion! (#311)
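To make the period.apply() distinction concrete, here is a small sketch on toy data. The series and endpoints are made up; only colMeans and function(x) mean(x) come from the message described above.

```r
library(xts)

# Toy two-column series (hypothetical data)
x <- xts(cbind(a = 1:6, b = 11:16),
         order.by = as.Date("2024-01-01") + 0:5)
ep <- c(0, 3, 6)  # endpoints: rows 1-3 form one period, rows 4-6 the next

# Mean of each column within each period (one value per column)
by_column <- period.apply(x, ep, colMeans)

# Mean of all the data within each period (one value per period)
all_data <- period.apply(x, ep, function(x) mean(x))
```

For the first period, by_column holds 2 for column a and 12 for column b, while all_data holds 7, the mean of all six values in that subset.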

Bug Fixes

Fix error when print.xts() is called with ‘quote’ or ‘right’ arguments. Thanks to Willem Maetens for the report and patch! (#401)
Fix addPolygon() so it renders when observation.based = TRUE. (#403)
Print trailing zeros for index value with fractional seconds, so every index value has the same number of characters. (#404)

I look forward to your questions and feedback! If you have a question, please ask on Stack Overflow and use the [r] and [xts] tags. Or you can send an email to the R-SIG-Finance mailing list (you must subscribe to post). Open an issue on GitHub if you find a bug or want to request a feature. Please read the contributing guide first! It will help save time for both of us. ;-)

Adaptive Asset Allocation Extended

Wed, 17 Jan 2024 13:02:00 -0500

This post extends the replication from the Adaptive Asset Allocation Replication post by running the analysis on OOS (out-of-sample) data from 2015 through 2023. Thanks to Dale Rosenthal for helpful comments.

The paper uses the 5 portfolios below. Each section of this post will give a short description of the portfolio construction and then focus on comparing the OOS results with the replicated and original results. See the other post for details on the data and portfolio construction methodologies.

Equal weight of all asset classes
Equal risk contribution of all asset classes
Equal weight of highest momentum asset classes
Equal risk contribution of highest momentum asset classes
Minimum variance of highest momentum asset classes

The table below summarizes the date ranges for each sample period in this post.

Period	Date Range
Replication	Feb 1996 - Dec 2014
OOS	Jan 2015 - Dec 2023
2015-2021	Jan 2015 - Dec 2021
Full	Feb 1996 - Dec 2023
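The sample periods in the table map onto xts range-based subsetting. The sketch below uses a placeholder daily series (not the ftblog data) to show how strings like "/2014" and "2015/2021" select each window.

```r
library(xts)

# Placeholder daily series covering 2014-2023 (hypothetical values)
idx <- seq(as.Date("2014-01-01"), as.Date("2023-12-31"), by = "day")
r <- xts(rep(0.001, length(idx)), order.by = idx)

rep_part <- r["/2014"]      # everything through the end of 2014
oos_part <- r["2015/"]      # everything from the start of 2015
sub_part <- r["2015/2021"]  # 2015 through the end of 2021

# First and last date in each window
range(index(rep_part))
range(index(oos_part))
range(index(sub_part))
```

Both ends of a range are inclusive, so "/2014" and "2015/" split the series without overlapping or dropping any observations.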

This post uses the ftblog package. You can install it using the remotes::install_github() function in the code block below. First we need to set up our environment with the necessary packages, data, and functions.

# remotes::install_github("joshuaulrich/ftblog")
suppressPackageStartupMessages({
 library(ftblog)
 library(PerformanceAnalytics)
})

data(aaa_returns, package = "ftblog")
returns <- aaa_returns[, -1] # no cash
r_rep <- returns["/2014"]
r_oos <- returns["2014-07/"] # need 6 months for 120-day lags (this is 128 days)
r_full <- returns

# calculate strategy statistics
strat_summary <-
function(returns,
         original_results = NULL)
{
    stats <- table.AnnualizedReturns(returns)
    stats <- rbind(stats,
                   "Worst Drawdown" = -maxDrawdown(returns))

    if (!is.null(original_results)) {
        stats <- cbind(original_results, stats)
        colnames(stats)[1] <- "Original"
    }

    stats <- round(stats, 3)

    return(stats)
}

chart_performance <-
function(R,
         title = "Performance")
{
    stopifnot(all(c("Replication", "OOS") %in% colnames(R)))
    r <- R[, c("Replication", "OOS")]
    p <- chart.CumReturns(r,
                          main = title,
                          main.timespan = FALSE,
                          yaxis.right = TRUE)
    p <- addLegend("topleft", lty = 1, lwd = 1)
    p <- addSeries(r[,1], type = "h", main = "Return")
    p <- addSeries(r[,2], type = "h", on = 0, col = "red")
    p <- addSeries(Drawdowns(r), main = "Drawdown")
    p
}
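For readers who want to see what the rows in the summary tables mean, here is a base R sketch of the same statistics for a vector of monthly returns. It assumes the PerformanceAnalytics defaults as I understand them (geometric chaining, 12 periods per year, Rf = 0); annual_stats() is a hypothetical helper, not part of ftblog or PerformanceAnalytics.

```r
# Hypothetical helper: summary statistics for a vector of monthly returns
annual_stats <- function(r, scale = 12) {
    n <- length(r)

    # Geometric annualized return and annualized standard deviation
    ann_return <- prod(1 + r)^(scale / n) - 1
    ann_sd <- sd(r) * sqrt(scale)

    # Drawdown: wealth relative to its running peak (starting wealth is 1)
    wealth <- cumprod(1 + r)
    peak <- cummax(c(1, wealth))[-1]
    drawdown <- wealth / peak - 1

    c("Annualized Return"  = ann_return,
      "Annualized Std Dev" = ann_sd,
      "Annualized Sharpe"  = ann_return / ann_sd,
      "Worst Drawdown"     = min(drawdown))
}

# Toy example: up 10%, down 50%, up 100%
toy_stats <- annual_stats(c(0.10, -0.50, 1.00))
round(toy_stats, 3)
```

In the toy example the wealth index goes from 1.1 to 0.55 and back to 1.1, so the worst drawdown is -50% even though the series ends at its peak.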

1. Equal weight portfolio of all asset classes

This portfolio assumes no knowledge of expected relative asset class performance, risk, or correlation. It holds each asset class in equal weight and is rebalanced monthly.

rr_equal_weight <- as.xts(apply(returns["/2014"], 1, mean))
ro_equal_weight <- as.xts(apply(returns["2015/"], 1, mean))
rf_equal_weight <- as.xts(apply(returns, 1, mean))

monthly_returns <-
 merge(Replication = to_monthly_returns(rr_equal_weight),
 OOS = to_monthly_returns(ro_equal_weight),
 "2015-2021" = to_monthly_returns(ro_equal_weight["2015/2021"]),
 Full = to_monthly_returns(rf_equal_weight),
 check.names = FALSE)

stats <- strat_summary(monthly_returns)
chart_performance(monthly_returns, "All Assets - Equal Weight")

	Replication	OOS	2015-2021	Full
Annualized Return	0.079	0.049	0.072	0.069
Annualized Std Dev	0.115	0.107	0.091	0.112
Annualized Sharpe (Rf=0%)	0.684	0.456	0.794	0.614
Worst Drawdown	-0.377	-0.210	-0.136	-0.377

The OOS annualized return is significantly less than the prior results. This is largely due to the -21.0% drawdown that started in 2022 and is still ongoing. Note that the full-period results are very similar to the replication results, though the 2022 drawdown did decrease the annualized return by ~1%.

Note that this portfolio’s results from 2015-2021 are very similar to the replication results through the end of 2014. That suggests the 2022 bear market is the main cause for the lower return in the OOS results.

2. Equal risk contribution using all asset classes

The next portfolio assumes the investor has some knowledge of each asset’s risk, but still no knowledge of relative performance or correlations. So each asset in this portfolio is given a weight proportional to its historical relative risk, with the hope that each asset will contribute the same amount of risk to the overall portfolio in the future.

rr_equal_risk <- portf_return_equal_risk(r_rep, 120, 60)
ro_equal_risk <- portf_return_equal_risk(r_oos, 120, 60)
rf_equal_risk <- portf_return_equal_risk(r_full, 120, 60)

monthly_returns <-
 merge(to_monthly_returns(rr_equal_risk),
 to_monthly_returns(ro_equal_risk["2015/"]),
 to_monthly_returns(ro_equal_risk["2015/2021"]),
 to_monthly_returns(rf_equal_risk))
colnames(monthly_returns) <- c("Replication", "OOS", "2015-2021", "Full")

stats <- strat_summary(monthly_returns)
chart_performance(monthly_returns, "All Assets - Equal Risk")

	Replication	OOS	2015-2021	Full
Annualized Return	0.086	0.034	0.056	0.069
Annualized Std Dev	0.073	0.082	0.061	0.076
Annualized Sharpe (Rf=0%)	1.177	0.411	0.908	0.903
Worst Drawdown	-0.142	-0.194	-0.071	-0.194

Like the equal weight portfolio, this portfolio’s OOS annualized return is significantly lower than the replication results. This methodology only slightly reduced the 2022 drawdown to -19.4% from -21.0%. The maximum drawdown is now in 2022 instead of during the 2008 financial crisis.

In the replication, the equal risk contribution portfolio results are better than the equal weight portfolio, but the OOS equal risk portfolio did not show similar improvement. Even when 2022 is excluded, the OOS equal risk portfolio didn’t show improvement over the equal weight portfolio.

3. Equal weight portfolio of highest momentum asset classes

The next portfolio assumes the investor has some knowledge of each asset’s returns, but still no knowledge of risk or correlations. Asset returns are based on 6-month momentum (approximately 120 days). Momentum is re-estimated every month and only the top 5 assets are included in the portfolio.

rr_momo_eq_wt <- portf_return_momo(r_rep, 5, 120)
ro_momo_eq_wt <- portf_return_momo(r_oos, 5, 120)
rf_momo_eq_wt <- portf_return_momo(r_full, 5, 120)

monthly_returns <-
 merge(to_monthly_returns(rr_momo_eq_wt),
 to_monthly_returns(ro_momo_eq_wt["2015/"]),
 to_monthly_returns(ro_momo_eq_wt["2015/2021"]),
 to_monthly_returns(rf_momo_eq_wt))
colnames(monthly_returns) <- c("Replication", "OOS", "2015-2021", "Full")

stats <- strat_summary(monthly_returns)
chart_performance(monthly_returns, "Top 5 Momentum Assets - Equal Weight")

	Replication	OOS	2015-2021	Full
Annualized Return	0.142	0.051	0.081	0.112
Annualized Std Dev	0.114	0.104	0.092	0.111
Annualized Sharpe (Rf=0%)	1.243	0.488	0.884	1.001
Worst Drawdown	-0.199	-0.213	-0.114	-0.213

Again, the OOS annualized return is significantly worse than the replicated results. The OOS results for this portfolio show a modest improvement in the Sharpe ratio versus the equal risk contribution portfolio (2): 0.488 versus 0.411. The replicated results for this portfolio showed a similarly modest improvement versus portfolio (2): 1.243 versus 1.177.

In the replication, equal weight momentum results are better than the equal risk portfolio. But the OOS equal weight momentum portfolio did not show significant improvement versus the equal risk portfolio (2), and is roughly the same as the equal weight portfolio (1).

4. Equal risk contribution portfolio of highest momentum asset classes

rr_momo_eq_risk <- portf_return_momo_equal_risk(r_rep, 5, 120, 60)
ro_momo_eq_risk <- portf_return_momo_equal_risk(r_oos, 5, 120, 60)
rf_momo_eq_risk <- portf_return_momo_equal_risk(r_full, 5, 120, 60)

monthly_returns <-
 merge(to_monthly_returns(rr_momo_eq_risk),
 to_monthly_returns(ro_momo_eq_risk["2015/"]),
 to_monthly_returns(ro_momo_eq_risk["2015/2021"]),
 to_monthly_returns(rf_momo_eq_risk))
colnames(monthly_returns) <- c("Replication", "OOS", "2015-2021", "Full")

stats <- strat_summary(monthly_returns)
chart_performance(monthly_returns, "Top 5 Momentum Assets - Equal Risk")

	Replication	OOS	2015-2021	Full
Annualized Return	0.137	0.050	0.081	0.108
Annualized Std Dev	0.102	0.095	0.081	0.100
Annualized Sharpe (Rf=0%)	1.335	0.528	0.991	1.076
Worst Drawdown	-0.119	-0.204	-0.086	-0.204

It’s clear that the major cause of the poorer OOS performance of this portfolio is how it handled the 2022 bear market. This portfolio handled the 2008 financial crisis very well, but it offered almost no protection in 2022. This indicates there was a fundamental difference in 2008 versus 2022 in the asset classes held by this portfolio.

Similar to the replicated results, the reduction in risk is the main benefit of this portfolio versus the equal weight momentum portfolio (3). That said, the OOS performance of this portfolio only showed marginal improvement versus portfolio (3). Even more notable, this portfolio didn’t improve returns versus the simple equal weight portfolio (1) during the OOS period like it did for the replication period.

5. Minimum variance portfolio of highest momentum asset classes

rr_momo_min_var <- portf_return_momo_min_var(r_rep, 5, 120, 60, "above average")
ro_momo_min_var <- portf_return_momo_min_var(r_oos, 5, 120, 60, "above average")
rf_momo_min_var <- portf_return_momo_min_var(r_full, 5, 120, 60, "above average")

monthly_returns <-
 merge(to_monthly_returns(rr_momo_min_var),
 to_monthly_returns(ro_momo_min_var["2015/"]),
 to_monthly_returns(ro_momo_min_var["2015/2021"]),
 to_monthly_returns(rf_momo_min_var))
colnames(monthly_returns) <- c("Replication", "OOS", "2015-2021", "Full")

stats <- strat_summary(monthly_returns)
chart_performance(monthly_returns, "Above Average 6mo Momentum - Min Var")

	Replication	OOS	2015-2021	Full
Annualized Return	0.130	0.058	0.082	0.106
Annualized Std Dev	0.099	0.095	0.081	0.098
Annualized Sharpe (Rf=0%)	1.315	0.612	1.008	1.086
Worst Drawdown	-0.112	-0.162	-0.083	-0.162

Recall that the original results for portfolio (5) showed improved return and lower maximum drawdown versus portfolio (4), while the replicated results were almost the same for both portfolios. The OOS results for these two portfolios are also very similar. In the 2015-2021 period, portfolio (5) has a slightly higher return and Sharpe ratio and lower max drawdown than portfolio (4).

Conclusion

For all 5 portfolios, the OOS results are not as good as the replicated results. This is largely due to the 2022 bear market, but the 2015-2021 results still aren’t as good as the replicated results.

Allocate Smartly has a great post about 2022 bear market performance of tactical asset allocation (TAA) strategies like this one. They find that TAA strategies did poorly in the 2022 bear market if they assumed intermediate and long-term bonds provide diversification from risky assets. Both risk assets and longer duration bonds performed poorly in 2022, and the correlation between bonds and equities was positive instead of negative like it has been historically.

In a future post, I may investigate how these portfolios would have performed if they were allowed to allocate to short-term Treasuries.

Portfolio Results by Sample Period

This section contains tables with results for all portfolios in a particular sample period.

Replication Period

	Equal Weight	Equal Risk	Momo Eq Weight	Momo Eq Risk	Momo Min Var
Ann. Return	0.079	0.086	0.142	0.137	0.130
Ann. Std Dev	0.115	0.073	0.114	0.102	0.099
Ann. Sharpe	0.684	1.177	1.243	1.335	1.315
Max Drawdown	-0.377	-0.142	-0.199	-0.119	-0.112

Out-of-Sample: 2015-2023

	Equal Weight	Equal Risk	Momo Eq Weight	Momo Eq Risk	Momo Min Var
Ann. Return	0.049	0.034	0.051	0.050	0.058
Ann. Std Dev	0.107	0.082	0.104	0.095	0.095
Ann. Sharpe	0.456	0.411	0.488	0.528	0.612
Max Drawdown	-0.210	-0.194	-0.213	-0.204	-0.162

Out-of-Sample: 2015-2021

	Equal Weight	Equal Risk	Momo Eq Weight	Momo Eq Risk	Momo Min Var
Ann. Return	0.072	0.056	0.081	0.081	0.082
Ann. Std Dev	0.091	0.061	0.092	0.081	0.081
Ann. Sharpe	0.794	0.908	0.884	0.991	1.008
Max Drawdown	-0.136	-0.071	-0.114	-0.086	-0.083

Adaptive Asset Allocation Replication

Fri, 08 Dec 2023 13:00:00 -0500

The paper, “Adaptive Asset Allocation: A Primer” by Adam Butler, Mike Philbrick, Rodrigo Gordillo, and David Varadi addresses flaws in the traditional application of Modern Portfolio Theory related to Strategic Asset Allocation. It shows that estimates of return and (co)variance parameters over shorter time horizons are superior to estimates over long-term horizons because the parameters vary substantially over time. Longer-term estimates do not account for this variability in the short term. The authors propose an Adaptive Asset Allocation portfolio construction methodology that uses the shorter-horizon parameter estimates to substantially improve performance relative to Strategic Asset Allocation.

Data

The original paper creates portfolios from 10 major global asset classes using data between 1995 and 2014. It uses ETFs when possible, and uses passive no-load mutual funds, underlying indexes, and no-load active mutual funds as proxies for asset class returns prior to ETF inception. The paper doesn’t list the actual instruments used at each point in time, so this post attempts to replicate their described methodology using publicly available data.

The table below shows the asset class, instruments, and time horizons used for each asset class in this replication. The data start in late 1997 instead of 1995 as in the original paper. Returns in this analysis are adjusted for splits and dividends.

	ETF	ETF Start	Fund	Fund Start
US Equity	VTI	2001-05	VTSMX	1992-04
European Equity	VGK	2005-03	VEURX	1990-06
Japanese Equity	EWJ	1996-03	FJPNX	1992-12
Emerging Market Equity	EEM	2003-04	VEIEX	1994-12
US Real Estate	ICF	2001-02	VGSIX	1996-05
International Real Estate	RWX	2006-12	XRFIX	1997-09
Intermediate Term Treasury	IEF	2002-07	VFITX	1991-12
Long Term Treasury	TLT	2002-07	VUSTX	1986-12
Commodities	DBC	2006-02	QRACX	1997-03
Gold	GLD	2004-11	SGGDX	1993-08

# remotes::install_github("joshuaulrich/ftblog")
suppressPackageStartupMessages({
 library(ftblog)
 library(PerformanceAnalytics)
})

data(aaa_returns, package = "ftblog")
# Only use data through the end of 2014, and no cash
returns <- aaa_returns["/2014", -1]

# calculate strategy statistics
strat_summary <-
function(returns,
         original_results = NULL)
{
    stats <- table.AnnualizedReturns(returns)
    stats <- rbind(stats,
                   "Worst Drawdown" = -maxDrawdown(returns))
    colnames(stats) <- "Replication"

    if (!is.null(original_results)) {
        stats <- cbind(stats, original_results)
        colnames(stats)[2] <- "Original"
    }

    stats <- round(stats, 3)

    return(stats)
}

Replication

This analysis attempts to replicate all 5 portfolios in the original paper.

Equal weight of all asset classes
Equal risk contribution of all asset classes
Equal weight of highest momentum asset classes
Equal risk contribution of highest momentum asset classes
Minimum variance of highest momentum asset classes

The original paper showed results as a monthly series. The to_monthly_returns() function converts the strategy returns from daily to monthly. The strat_summary() function uses the PerformanceAnalytics package to calculate summary statistics to compare with the original paper. The file adaptive-asset-allocation-replication.R contains all the code used in this analysis, and the returns.rds file contains the data.
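As a rough illustration of the daily-to-monthly conversion, the sketch below compounds daily returns within each calendar month using apply.monthly() from xts. This is only one plausible way to do it; the actual to_monthly_returns() implementation in ftblog may differ, and the daily series here is made up.

```r
library(xts)

# Placeholder daily returns of 0.1% for January and February 2014
idx <- seq(as.Date("2014-01-01"), as.Date("2014-02-28"), by = "day")
daily <- xts(rep(0.001, length(idx)), order.by = idx)

# Compound the daily returns within each month
monthly <- apply.monthly(daily, function(r) prod(1 + coredata(r)) - 1)
monthly
```

Each monthly value is the product of (1 + daily return) over the month, minus 1, so the January value equals 1.001^31 - 1.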

1. Equal weight portfolio of all asset classes

The baseline example assumes no knowledge of expected relative asset class performance, risk, or correlation. The results below are based on a portfolio that holds each asset class in equal weight and is rebalanced monthly. The table compares the results of this replication with the original results. Some of the differences in results are due to the different instruments and time frames used for the analysis. Despite the differences in the data between the two approaches, the results are very similar.

The returns_equal_weight object contains the portfolio returns for each day. Then we convert those returns to monthly and evaluate the portfolio results.

returns_equal_weight <-
 as.xts(apply(returns, 1, mean)) |>
 to_monthly_returns()

title <- "All Assets - Equal Weight"
stats <- strat_summary(returns_equal_weight,
 original_results = c(0.081, 0.112, 0.72, -0.392))
charts.PerformanceSummary(returns_equal_weight, main = title, wealth.index = TRUE)

	Replication	Original
Annualized Return	0.079	0.081
Annualized Std Dev	0.115	0.112
Annualized Sharpe (Rf=0%)	0.684	0.720
Worst Drawdown	-0.377	-0.392

2. Equal risk contribution using all asset classes

The next portfolio assumes the investor has some knowledge of each asset’s risk, but still no knowledge of relative performance or correlations. So each asset in this portfolio is given a weight proportional to its relative risk, and each asset contributes the same amount of risk to the overall portfolio. That way no asset’s risk will dominate the risk of the overall portfolio.

The portf_return_equal_risk() function estimates the equal risk contribution portfolio using the PERC() function from the FRAPO package. It calculates the portfolio weights at the end of each month using estimated portfolio risk from the returns over the last 60 days. Then those weights are used to calculate the portfolio returns for the following month.
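To build intuition for risk-based weights, here is a deliberately simplified sketch that uses naive inverse-volatility weighting over a 60-day window. This is not the PERC() optimizer: inverse-volatility weights only equalize risk contributions when all pairwise correlations are the same. The returns are simulated and the asset names are hypothetical.

```r
set.seed(42)
n_days <- 60  # lookback window used to estimate risk

# Simulated daily returns for three hypothetical assets
toy_returns <- cbind(low_vol  = rnorm(n_days, sd = 0.005),
                     mid_vol  = rnorm(n_days, sd = 0.010),
                     high_vol = rnorm(n_days, sd = 0.020))

# Estimate each asset's volatility over the window
vol <- apply(toy_returns, 2, sd)

# Naive inverse-volatility weights, scaled to sum to 1
weights <- (1 / vol) / sum(1 / vol)

# Stand-alone risk (weight times volatility) is equal across assets
risk_share <- weights * vol
round(weights, 3)
```

The least volatile asset gets the largest weight, so no single asset's risk dominates the portfolio.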

returns_equal_risk <-
 portf_return_equal_risk(returns, 120, 60) |>
 to_monthly_returns()

title <- "All Assets - Equal Risk Contribution"
stats <- strat_summary(returns_equal_risk,
 original_results = c(0.085, 0.086, 0.99, -0.242))
charts.PerformanceSummary(returns_equal_risk, main = title, wealth.index = TRUE)

	Replication	Original
Annualized Return	0.086	0.085
Annualized Std Dev	0.073	0.086
Annualized Sharpe (Rf=0%)	1.177	0.990
Worst Drawdown	-0.142	-0.242

In this case, the replicated results are better than the original results. The data differences and the method used to estimate the equal risk contribution portfolio weights are the most likely explanation for the differences in the replicated results.

3. Equal weight portfolio of highest momentum asset classes

The next portfolio assumes the investor has some knowledge of each asset’s returns, but still no knowledge of risk or correlations. The original paper uses momentum to estimate each asset’s returns because momentum (also known as long-term memory) is a well-known property of financial market returns. Assets that have increased (decreased) in price are likely to continue increasing (decreasing) in price in the next period. Academic research shows that instruments with higher (lower) momentum over the past 1-12 months exhibit better (worse) performance over short-term future periods. See the original paper for a description of some reasons why momentum exists in financial markets.

The estimates of each asset’s returns are based on 6-month momentum (approximately 120 days). Momentum is re-estimated every month and only the top 5 assets are included in the portfolio, rather than including every asset in the portfolio. Note that this doesn’t mean every asset included in the portfolio has a positive return over the 6-month period. All of them could have negative returns over the period, in which case the 5 assets with the smallest losses would be included in the portfolio. Each of the top 5 assets included in the portfolio is held with equal weight.
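The selection step can be sketched in a few lines of base R. The example below ranks 10 hypothetical assets by their cumulative return over a 120-day lookback and gives the top 5 equal weight. It illustrates the idea only and is not the portf_return_momo() implementation.

```r
n_assets <- 10
lookback <- 120  # approximately 6 months of trading days
top_n <- 5

# Hypothetical assets with constant daily returns from -0.04% to +0.05%
daily <- (seq_len(n_assets) - 5) * 1e-4
toy_returns <- matrix(daily, nrow = lookback, ncol = n_assets, byrow = TRUE,
                      dimnames = list(NULL, paste0("A", seq_len(n_assets))))

# Momentum: cumulative return over the lookback window
momentum <- apply(1 + toy_returns, 2, prod) - 1

# Keep the top 5 assets by momentum and weight them equally
selected <- names(sort(momentum, decreasing = TRUE))[seq_len(top_n)]
weights <- setNames(numeric(n_assets), colnames(toy_returns))
weights[selected] <- 1 / top_n
weights
```

Because the ranking is relative, the same code would still select 5 assets if every asset had a negative return over the lookback.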

returns_momo_equal_weight <-
 portf_return_momo(returns, 5, 120) |>
 to_monthly_returns()

title <- "Top 5 Assets by 6-month Momentum - Equal Weight"
stats <- strat_summary(returns_momo_equal_weight,
 original_results = c(0.130, 0.110, 1.17, -0.217))
charts.PerformanceSummary(returns_momo_equal_weight, main = title, wealth.index = TRUE)

	Replication	Original
Annualized Return	0.142	0.130
Annualized Std Dev	0.114	0.110
Annualized Sharpe (Rf=0%)	1.243	1.170
Worst Drawdown	-0.199	-0.217

The replicated results have slightly better returns and maximum drawdown, but similar standard deviation. That said, the results are similar enough to suggest this analysis replicates the approach accurately.

4. Equal risk contribution portfolio of highest momentum asset classes

The previous two portfolios estimated asset weights using either risk-based or momentum-based weights. This next portfolio combines estimates of momentum-based performance and accounts for asset class risk differences. It includes the top 5 asset classes based on 6-month returns and weights them using the estimate_equal_risk_portf() function defined earlier. Recall that the function weights each asset class so they each contribute the same amount of risk to the portfolio. Note that the asset volatilities are calculated on 60 days (approximately 3 months) of returns while the momentum is based on 120 days (approximately 6 months).

returns_momo_equal_risk <-
 portf_return_momo_equal_risk(returns, 5, 120, 60) |>
 to_monthly_returns()

title <- "Top 5 Assets by 6-month Momentum - Equal Risk Contribution"
stats <- strat_summary(returns_momo_equal_risk,
 original_results = c(0.140, 0.099, 1.41, -0.148))
charts.PerformanceSummary(returns_momo_equal_risk, main = title, wealth.index = TRUE)

	Replication	Original
Annualized Return	0.137	0.140
Annualized Std Dev	0.102	0.099
Annualized Sharpe (Rf=0%)	1.335	1.410
Worst Drawdown	-0.119	-0.148

The original and replicated results for this portfolio are very similar. Note that in the original paper this portfolio’s overall return increased to 14.0% versus 13.0% for the momentum-based equal weight portfolio, while in the replication it was slightly lower (13.7% versus 14.2%). Again, the data differences and portfolio weight estimation differences likely cause differences in the results. Also, an un-scientific comparison of the two cumulative return graphs suggests some difference may be due to performance in 2012, when the replicated results have a significant drawdown while the original results showed positive performance.

5. Minimum variance portfolio of highest momentum asset classes

The final portfolio takes the above concepts and adds correlation estimates to the portfolio optimization. The previous portfolios only accounted for the relative risk between the asset classes, but not the correlation between the assets’ returns. This portfolio accounts for the correlations between asset classes by finding the minimum variance portfolio using modern portfolio theory. The asset selection for these portfolios in the original paper differs slightly from the previous portfolios. Instead of taking the top 5 assets with the highest momentum, the original paper selects “assets with above average 6-month momentum”. So it’s not clear how many assets are held in the portfolio each month.
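The two steps described above can be sketched with toy numbers: pick the assets with above average momentum, then compute minimum variance weights from their covariance matrix. The sketch uses the textbook closed-form solution with no constraints other than weights summing to 1, so it can produce negative weights in general. The portf_return_momo_min_var() function may work differently (for example, with long-only constraints), and all inputs here are hypothetical.

```r
# Hypothetical 6-month momentum for five assets
momentum <- c(A = 0.08, B = 0.02, C = 0.05, D = -0.03, E = 0.10)

# Step 1: keep the assets with above average momentum
selected <- names(momentum)[momentum > mean(momentum)]

# Hypothetical volatilities and a constant 0.2 correlation for those assets
vols <- c(A = 0.10, C = 0.15, E = 0.20)[selected]
corr <- matrix(0.2, length(vols), length(vols))
diag(corr) <- 1
sigma <- outer(vols, vols) * corr  # covariance matrix

# Step 2: unconstrained minimum variance weights, proportional to solve(sigma, 1)
w <- solve(sigma, rep(1, length(vols)))
w <- w / sum(w)

# Compare the portfolio variance with an equal weight portfolio
ew <- rep(1 / length(vols), length(vols))
var_mv <- drop(t(w) %*% sigma %*% w)
var_ew <- drop(t(ew) %*% sigma %*% ew)
round(w, 3)
```

The number of selected assets depends on the momentum values, which is why the number of holdings can change from month to month.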

returns_momo_min_var <-
 portf_return_momo_min_var(returns, 5, 120, 60, "above average") |>
 to_monthly_returns()

title <- "Assets With Above Average 6-month Momentum - Minimum Variance"
stats <- strat_summary(returns_momo_min_var,
 original_results = c(0.150, 0.094, 1.60, -0.088))
charts.PerformanceSummary(returns_momo_min_var, main = title, wealth.index = TRUE)

	Replication	Original
Annualized Return	0.130	0.150
Annualized Std Dev	0.099	0.094
Annualized Sharpe (Rf=0%)	1.315	1.600
Worst Drawdown	-0.112	-0.088

The replicated results still show worse performance than the original results, which also seems to be related to performance during 2012. The replicated results also do not show significant improvement relative to the top 5 momentum equal risk portfolio like the original paper shows.

Conclusion

Despite using returns from different instruments in the same asset classes, over a slightly different time period, this analysis closely replicates the results from Adaptive Asset Allocation: A Primer. The differences in the data sets seem to create a significant difference in performance during 2012, but otherwise produce similar results on monthly data.

See Adaptive Asset Allocation Extended for an analysis of the out-of-sample performance of these portfolios. Thanks for reading!

quantmod_0.4.25 on CRAN

Tue, 22 Aug 2023 11:41:57 -0500

An updated version of quantmod is now on CRAN. It includes an awesome new feature that allows you to import up to 7 days of intraday data from Yahoo Finance!

New Features

getSymbols.yahoo() can import up to 7 days of intraday data! Thanks to @kapsner for the report and patch! (#351, #381) It will throw a warning if you try to request more than 7 days of intraday data, but you can suppress the warning (thanks to Dirk Eddelbuettel). (#399)
Add warning if getSymbols() is called with tickers that are reserved words because accessing them requires back-quotes (e.g. NA). (#401)

Bug Fixes

Fix getQuote.yahoo() for API changes. Thanks to Ethan B. Smith for the report and patch! Also add error message for users in GDPR countries, since we cannot automatically consent to GDPR and the request fails without consent. (#392, #393, #395)
Fix getQuote.yahoo() when the user only requested metrics that do not have a value for ‘regularMarketTime’. Set the value to NA in these cases so the output remains the same regardless of whether the endpoint returns a ‘regularMarketTime’ or not. Thanks to @mehdiMBH for the report! (#255)
Add fields to getQuote.yahoo() that are returned when no fields are explicitly requested. Thanks to @Courvoisier13 for the report! (#335)
Fix allReturns() when ‘subset’ is specified. Thanks to @Panagis1980 for the report! (#402)

I look forward to your questions and feedback! If you have a question, please ask on Stack Overflow and use the [r] and [quantmod] tags. Or you can send an email to the R-SIG-Finance mailing list (you must subscribe to post). Open an issue on GitHub if you find a bug or want to request a feature. Please read the contributing guide first! It will help save time for both of us. ;-)

Running TimeBase in Docker

Sat, 24 Jun 2023 10:21:00 -0500

This is the second post in the series on using TimeBase to stream real-time market data. This post covers using Docker to run TimeBase and the TimeBase Web Administrator.

Getting Started

Docker installation and configuration is outside the scope of this post. Docker has a Get Started page to help you get set up.

I’ll be using Docker via Ubuntu and the command line interface. I know this approach is probably very advanced for most readers, but please bear with me. The concepts should be the same if you’re using the Docker Desktop GUI instead.

If you’re not familiar with Docker, an “image” contains all the necessary information to run an instance of the application. A “container” is a running instance of an image.

Running the Docker Containers

First we need to get the container images from Docker Hub. I will use the docker pull command.

# pull the TimeBase image
docker pull finos/timebase-ce-server:6.1.16

# pull the Web Admin image
docker pull epam/timebase-ws-server:1.0

The TimeBase documentation has a page on deploying with Docker. This post closely follows those examples.

One major difference is that TimeBase documentation uses the --link option to allow the containers to talk to each other, but the Docker documentation says that is a legacy feature that may be removed. The currently preferred method is to create a user-defined network.

# create a user-defined network so the containers can see each other
docker network create --driver bridge timebase-net

# make sure the network was created
docker network ls

By default, the Docker containers will write data to a disk that’s only available to the container. You won’t be able to access the data from your computer, or at all when the container isn’t running. There’s a --volume option to specify a mapping between a directory in the container and a directory on your local machine. I’m not going to use that here because you have to run the container as root in order for it to write to your local file system.

docker run --rm --detach \
 --publish 8011:8011 \
 --name timebase-server \
 --user deltix:deltix \
 --ulimit=nofile=65536:65536 \
 --network timebase-net \
 finos/timebase-ce-server:6.1.16

Here are the descriptions of the options used:

--rm: Automatically remove the container when it exits
--detach: Run the container in the background
--publish: Map the container’s port 8011 to my computer’s port 8011
--name: Assign a name to the container
--user: Run the container as the ‘deltix’ user instead of ‘root’
--ulimit: Set user process resource limits
--network: Use the ‘timebase-net’ network

Now we can run the Web Admin container. Make sure to use the same network (timebase-net) and make sure the host name (dxtick://timebase-server) is the same as the name you used for the TimeBase container (i.e. ‘timebase-server’ in this case). The --env option allows us to set environment variables. These are used to control various configuration options (note that these configuration options are for running a local instance, but they give you an idea of what you can change).

docker run --rm --detach \
 --publish 8099:8099 \
 --name timebase-admin \
 --env "JAVA_OPTS=-Dtimebase.url=dxtick://timebase-server:8011" \
 --network timebase-net \
 epam/timebase-ws-server:1.0

We can use the Web Admin’s interface once it’s running. Open a browser and enter localhost:8099 in the address bar. That should take you to the login page. The default username/password is admin/admin. After you log in you should see a page like the image below. The only stream in the database is events#.

Although this post is about running TimeBase and the Web Admin, I’m not going to end it without showing a little functionality.

Import Data

Let’s import some data and take a quick look. Download this data file. Then click the ‘Import from QSMSG’ button highlighted in red below.

Navigate to the file you just downloaded. Name the stream ‘bars’. You can leave the Description and Symbols fields blank, and leave Periodicity set to irregular.

Now click the ‘bars’ stream in the navigation bar on the left. You should see the data you just imported. It looks like the image below.

Okay, that’s all for now. In the next post, we’ll set up a data connector and watch some data stream into the database!

Streaming Market Data with TimeBase

Mon, 19 Jun 2023 12:14:00 -0500

This is the first post of a series on using TimeBase to stream real-time market data. TimeBase is a high performance event-based time series database and message broker. I used it on a proprietary trading desk that made markets in futures, and currently use it to build and test equity trading strategies. It was released as open-source in February 2021.

Preface

I am not affiliated with the company that created and maintains TimeBase (Deltix, now EPAM). I’m not currently compensated by them in any way for promoting their product. I’m merely a happy user who is excited that TimeBase is open source and I can show you how to do some cool stuff with it.

What’s Your Problem?

TimeBase addresses several key needs in automated trading. You need to process large amounts of real-time market data. This includes trades, best bid/offer, and/or the entire limit order book. You use this data to calculate indicators/features, determine when and where to place orders, monitor unrealized P&L, and monitor/manage the risk of your positions.

All data in the trading system needs to be processed quickly and absolutely must be processed in strict time order. This is not trivial when you need to interleave data from multiple sources (e.g. exchanges and the trading system itself). The System Architecture section of “The Algorithmic Trading Platform” by Prerak Sanghvi describes the benefits of using a strictly time-sequenced stream of events. To summarize:

Synchronized: every system component always receives the same data in the same order.
Observable: the system is deterministic and can be debugged offline by replaying the data.
Auditable: You can re-create the state of the system at any point in time.
Streamlined: Tasks like logging and persisting to disk can be delegated to components that are off the critical path.

The data also needs to be stored for analysis and debugging. Analysis includes things like running backtests, post-trade evaluation, and investigating market behavior. “Selecting a Database for an Algorithmic Trading System” by Prerak Sanghvi discusses the necessary components of a time-series database for algorithmic trading. To summarize:

Fast data ingest: millions of records per second (quote data can be 100+ million records per day)
Ability to process large amounts of historical data for patterns and trends
Time series operations and real-time analytics (e.g. window functions, aggregations, as-of joins)
Expressive query language
Optimized on-disk layout
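As a small illustration of one of the time series operations listed above, here is a sketch of an as-of join in base R. The quote and trade data are made up, and this is not how TimeBase (or any particular database) implements the operation. It only shows what the result of an as-of join looks like.

```r
# Hypothetical quotes and trades; 'time' is seconds since the open.
quotes <- data.frame(time = c(1, 3, 6, 10),
                     bid  = c(99.0, 99.5, 99.4, 99.8))
trades <- data.frame(time  = c(2, 6, 9),
                     price = c(99.2, 99.6, 99.7))

# For each trade, findInterval() gives the index of the last quote
# with a time at or before the trade time (0 if there is none).
idx <- findInterval(trades$time, quotes$time)

# Trades that occur before the first quote get NA.
idx[idx == 0] <- NA
trades$bid_asof <- quotes$bid[idx]
trades
```

Each trade is paired with the most recent quote that was known at the time of the trade, never a later one.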

TimeBase vs Alternatives

Why use TimeBase instead of other open-source projects like RabbitMQ, Kafka, InfluxDB, TimeScaleDB, or ClickHouse? The main reason is that TimeBase is both a message broker and a time-series database. The TimeBase website has its own “Why TimeBase” page and pages that compare popular time-series databases and message brokers. Here’s a summary of the benefits of TimeBase from those pages:

Based on configuration, it supports microsecond latencies or the ability to handle millions of messages per second on commodity hardware.
Enforces stream schemas with heterogeneous and potentially complex message structures.
The same APIs can be used to stream real-time data and replay historical data.
Able to replicate data to other TimeBase instances or applications.
The open-source community edition has multiple crypto exchange data connectors. The enterprise edition has 50+ built-in data connectors.

TimeBase Structure

This is a high-level summary of the TimeBase architecture page.

Data connectors handle connecting to external data sources and translating their data into the TimeBase format. There are many open source crypto exchange data connectors. The enterprise edition has another 50+ data connectors to all major exchanges and many data vendors.

The message broker provides a publish/subscribe pattern to write/read streaming data. The data is processed via readers and writers to streams.

Writers can only write to one stream. Readers can consume multiple streams simultaneously and the messages from every stream are interleaved so that every message consumed is always in guaranteed time order regardless of which stream they come from. It is extremely important that every consumer receives data strictly sequenced by time!
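To make the ordering guarantee concrete, here is a toy sketch in base R. The two 'streams' are made-up data frames, and sorting a combined table is only a stand-in for what the message broker does internally. The point is the result a reader sees: one sequence, strictly ordered by time.

```r
# Two hypothetical streams, each already sorted by time.
trades <- data.frame(time = c(1, 4, 5), stream = "trades",
                     msg = c("t1", "t2", "t3"), stringsAsFactors = FALSE)
quotes <- data.frame(time = c(2, 3, 6), stream = "quotes",
                     msg = c("q1", "q2", "q3"), stringsAsFactors = FALSE)

# Interleave the messages so the combined sequence is in time order.
merged <- rbind(trades, quotes)
merged <- merged[order(merged$time), ]
merged$msg
```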

There are two types of streams, durable and transient. Durable streams are persisted to disk. Transient streams are only in memory and can be lossy or lossless.

Writers to lossy streams are not blocked by slow readers, so slow readers may not receive every message but always receive the next available message once they finish processing a message.
Writers to lossless streams are blocked by slow readers, so every reader always receives every message and every reader can only process data as fast as the slowest reader.

The database handles reading/writing data from/to disk, importing and exporting data, replicating data to other applications, and can aggregate data to regular bars. It has a query language (QQL) you can use to extract, filter, aggregate, and transform data in streams.

There’s also an open-source Web Administrator you can use to manipulate streams (create, delete, edit, import/export). It also allows you to view data, including monitoring live data streaming in to the database.

What’s Next?

Later posts in this series will cover at least the topics below. Please leave a comment or contact me with any other things you would like to see!

Building and running TimeBase from source/Docker
Building and running the Web Administrator from source/Docker
Setting up a data connector
Introduction to the Web Administrator (viewing/monitoring data, import/export)
Introduction to QQL, the quant query language

Thanks to TheRobotJames for helpful feedback, and to Adam Butler for encouraging me to write more!

getSymbols Rebooted

Mon, 22 May 2023 10:14:00 -0500

quantmod and getSymbols() have been a core part of the R/Finance ecosystem for over 15 years. We want to change some things, but they would break existing code. We can make these changes in the new ‘rfimport’ package instead.

Background

The quantmod package has been a core part of the R/Finance ecosystem for over 15 years. It’s awesome that the package is so popular, but that also comes with responsibility to maintain backward compatibility. Breaking changes may break code used for making business decisions, research, production, blog posts, books, courses, answers on stackoverflow, and much more. We take this responsibility seriously, and do our best to keep functions backward compatible. Sometimes breaking changes are necessary (e.g. bug fixes, changes to external data sources, etc.), but we do our best to make them carefully, with plenty of warning and lead time for users to adjust their code.

Motivation

There are things in quantmod that we want to change, but they would certainly break existing code. No matter how much we’d like to make those changes, we can’t justify breaking a large portion of the code our community has written in the past 15+ years.

We can create a new package instead of making these changes in quantmod. ‘rfimport’ is where we will work on new implementations that improve on the pieces in getSymbols() that we would like to change. This code is extremely alpha. This is the time to provide feedback, suggestions, feature requests, etc. Know that we will break things, maybe without warning. You should consider the API unstable until the 1.0.0 release.

Refresher on how `getSymbols()` works

By default getSymbols() creates objects in the environment it’s called from, and it returns the value of the Symbols argument. It’s good practice for functions to avoid changing anything in the user’s environment (this is called having side-effects). It’s better for functions to only return a value, like getSymbols(..., auto.assign = FALSE) does. getSymbols() does not support auto.assign = FALSE for more than one symbol.
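Here is a minimal sketch of the difference between the two styles. The functions are hypothetical stand-ins that use made-up data instead of downloading anything. They are not quantmod code.

```r
# Hypothetical stand-in for a data import; no network access needed.
fake_prices <- function(symbol) {
  data.frame(symbol = symbol, close = c(100, 101, 102))
}

# Side-effect style: creates an object in the caller's environment
# and only returns the symbol name.
import_assign <- function(symbol, env = parent.frame()) {
  assign(symbol, fake_prices(symbol), envir = env)
  invisible(symbol)
}

# Pure style: returns the data and leaves the environment alone.
import_return <- function(symbol) {
  fake_prices(symbol)
}

import_assign("SPY")           # an object named SPY now exists
aapl <- import_return("AAPL")  # the caller chooses the object name
```

With the second style, nothing appears in your workspace unless you assign it yourself.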

getSymbols() also uses functionality that was formerly provided by the archived ‘Defaults’ package. This functionality allows users to set default values for getSymbols() source method arguments (e.g. return.class = "data.frame"). This is also a side-effect because it makes getSymbols() depend on something other than argument values.

getSymbols() specifies the data sources via its src argument, and uses the src argument to determine which source method to use (e.g. getSymbols("SPY", src= "yahoo") will call getSymbols.yahoo("SPY") behind the scenes). This is essentially method dispatch, but done manually rather than using R’s built-in S3 functionality.
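The manual dispatch pattern looks roughly like the sketch below. The function names are hypothetical and the bodies only return a string. The actual getSymbols() code does a lot more.

```r
# Hypothetical source functions, named in the same style as the
# getSymbols() source methods.
fetch.yahoo  <- function(symbol) paste("yahoo data for", symbol)
fetch.tiingo <- function(symbol) paste("tiingo data for", symbol)

# Manual dispatch: build the function name from 'src' and call it.
fetch <- function(symbol, src = "yahoo") {
  do.call(paste0("fetch.", src), list(symbol))
}

result <- fetch("SPY", src = "tiingo")
result
```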

What we’ve learned

We should avoid the side-effect of creating objects in the calling environment.
Data sources should use S3 method dispatch, and documentation needs to be easier to find.
Stock ticker symbology is a pain and we need a better way to handle it.
We need a way to provide functionality like the ‘Defaults’ package did, but without side-effects.

1. Automatically creating objects

getSymbols() creates an object for every value in the Symbols argument. This isn’t an issue for a few symbols, but it clutters the environment when there are several hundred symbols. You can load all the symbols into a separate environment, but that’s not a pattern most users are familiar with.

We wanted to remove the ability to load objects into the calling environment. We even created a warning that the getSymbols() default would change to auto.assign = FALSE, which recommended that users replace their getSymbols() call with the loadSymbols() function that already exists. But we ultimately decided breaking the community’s code wasn’t worth it.

Automatically creating objects makes it cumbersome to put prices for all symbols into one object. This is a common use case and there are several steps. It should be possible with one or two function calls. Here’s an example.

symbols <- c("SPY", "AAPL")
getSymbols(symbols)

# Put all the prices into one xts object,
prices <- do.call(merge, lapply(symbols, get))
# or
prices <- do.call(merge, mget(symbols))

# Extract only the Close prices
close_prices <- Cl(prices)

# Remove ".Close" suffix so close_prices[, "SPY"] works
colnames(close_prices) <- sub(".Close", "", colnames(close_prices), fixed = TRUE)

Automatically creating objects also makes passing all the data to another function awkward. It causes users to do things like:

Call getSymbols() in any function that needs data, which may mean the same data is imported multiple times.
Pass the same symbols object to getSymbols() and the other function. Then the other function searches through environments to find the objects named with those symbols.
Users could put all the data in an environment and use that as an argument to the function, but I haven’t seen many people use this pattern.

2. Data source methods

Different getSymbols() source methods can (or may need to) have different arguments. Ideally the source methods wouldn’t be exported because users shouldn’t call them directly (users should call getSymbols("SPY", src= "yahoo") instead of getSymbols.yahoo("SPY")). It’s hard to find documentation for unexported functions, which means it’s hard to know what arguments are available for each source method.

The source methods are named like S3 methods even though getSymbols() isn’t a generic function and the source methods aren’t actual S3 methods. This has the potential to create odd behavior that would confuse users.

3. Ticker symbology

There are two major issues with ticker symbols.

Exchanges and data providers sometimes use different ticker symbols for the same security.
Some ticker symbols are not valid R object names.

Another issue is when the ticker symbol is similar to the name of one of the price columns. This has come up several times with Lowe’s (LOW). The Lo() and OHLC() functions think all of the columns with the ticker symbol in the column name are the low price for the period.
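The sketch below shows why this happens with any name-based column matching. It is a simplified illustration using grep() on made-up column names. The logic in quantmod is more involved than this.

```r
# Column names as they would look for Lowe's (LOW).
cols <- c("LOW.Open", "LOW.High", "LOW.Low", "LOW.Close")

# A loose, case-insensitive match on "Low" hits every column,
# because the ticker itself contains "LOW".
loose <- grep("Low", cols, ignore.case = TRUE)

# Anchoring the pattern to the ".Low" suffix only hits the low price.
strict <- grep("\\.Low$", cols)

loose
strict
```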

Same security, different ticker

This isn’t getSymbols()’s fault and it’s out of our control, but it could be handled better. Exchange and data source symbology is awful. Identifiers for the same series are often different across exchanges and data providers. For example: the symbol for Berkshire Hathaway B-class shares is “BRK-B” for Yahoo Finance, “BRK/B” for the SIP (Securities Information Processor), “BRK B” for ICE, and probably “BRK.B” somewhere else.

This is a difficult problem and will likely take a lot of effort to get right. Therefore it won’t be a high priority initially.

Invalid R object names

getSymbols() tries to create objects with valid R names, but only does so for some symbols that aren’t valid R object names. For example, BRK-B, BRK B, and BRK/B aren’t valid R object names because valid names start with a letter or a dot (.), and can only contain letters, numbers, a dot, or an underscore.
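You can check how base R would sanitize a ticker with make.names(). This is base R’s own convention, which is not necessarily the name getSymbols() uses for the object it creates.

```r
tickers <- c("BRK-B", "BRK/B", "^DJI", "000001.SZ", "SPY")

# make.names() replaces invalid characters with "." and prepends
# "X" when the name doesn't start with a letter or a dot.
valid <- make.names(tickers)

data.frame(ticker = tickers, r_name = valid, changed = tickers != valid)
```

Note that "BRK-B" and "BRK/B" map to the same name. make.names(tickers, unique = TRUE) avoids the collision, but then the names are even further from what you typed.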

Here are some common examples of ticker symbology woes:

^DJI isn’t a valid R object name because it starts with ^. So getSymbols() creates an object with the ^ removed. But then you can’t use the code below to put all the prices into one object. Also notice that getSymbols() returns "^DJI" even though it creates an object with a different name.

symbols <- c("^DJI", "BRK-B")
getSymbols(symbols)
## [1] "^DJI" "BRK-B"

prices <- do.call(merge, mget(symbols))
## Error: value for '^DJI' not found

You have to remove the leading ^ manually. And you have to set fixed = TRUE in the call to sub() because ^ is a special character in regular expressions. Sigh.

prices <- do.call(merge, mget(sub("^", "", symbols, fixed = TRUE)))

Recall that BRK-B also isn’t a valid R name because of the -. But it wasn’t an issue in the code above because getSymbols() made an object named BRK-B, not an object with a valid R name. This is confusing for users because they can’t easily access that object (i.e. head(BRK-B) is an error). This is a pervasive issue for several foreign exchanges with tickers that begin with numbers (e.g. 000001.SZ).
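For reference, this is how you can access an object with a name that isn’t syntactically valid. The sketch uses made-up numbers instead of downloaded prices.

```r
# Create an object with a name that is not a valid R name.
assign("BRK-B", c(300, 301, 302))

# head(BRK-B) is parsed as 'BRK minus B', so it fails.
# Backticks or get() refer to the object by its literal name.
first_two <- head(`BRK-B`, 2)
first_one <- get("BRK-B")[1]
```

It works, but most users don’t know about backticks, and it’s easy to forget them.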

Another issue with symbols that aren’t valid R object names is that many R functions will convert column names into valid R object names, including merge.xts(). So you can’t use the input symbol to subset the resulting xts object. Here’s an example:

# Extract the close prices and remove ".Close" suffix
close_prices <- Cl(prices)
colnames(close_prices) <- sub(".Close", "", colnames(close_prices), fixed = TRUE)

# Extract the close price for "BRK-B"
close_prices[, "BRK-B"]
## Error in `[.xts`(close_prices, , "BRK-B") : subscript out of bounds
colnames(close_prices)
## [1] "DJI" "BRK.B"

setSymbolLookup() exists to help with things like this, but it’s another function users have to learn to use and my experience is that most users don’t know about setSymbolLookup(). I just had to look at the source to figure out how to use it to make getSymbols() return a valid R object for "BRK-B".

setSymbolLookup(BRK.B = list(name = "BRK-B", src = "yahoo"))

If I have to look at the source code to figure out how to do this, users don’t have a chance. You may think, “but you could document how to do this”, but writing documentation isn’t fun. And who reads the documentation anyway? ;-)

4. ‘Defaults’ functionality

The ‘Defaults’ functionality in quantmod comes from the archived ‘Defaults’ package. This functionality allows users to set new default argument values to any getSymbols() source function. This is helpful because it makes importing easier. But it means getSymbols() relies on something other than its parameter values, and it’s good practice to avoid side-effects like this.

This gave users the ability to set preferences like return class, periodicity (e.g. hourly, daily, monthly), connection settings (e.g. credentials, API keys).

‘rfimport’ design and features

The design of ‘rfimport’ is influenced by the DBI package, which provides a set of generic ‘database interface’ functions. Users create connection objects by creating a ‘driver’ object for the specific database and passing that to dbConnect(). Then you pass that connection object to the other DBI functions. For example, to execute a query against a PostgreSQL database:

library(RPostgreSQL)

driver <- PostgreSQL()
conn <- DBI::dbConnect(driver)
student_count <- DBI::dbGetQuery(conn, "select count(*) from students")

The ‘rfimport’ sym_yahoo() function corresponds to the PostgreSQL() function in the example above. And the import_ohlc() function pulls the data like DBI::dbGetQuery(). For example:

library(rfimport)

# The sym_* functions are a combination of the
# driver, connection, and query in DBI
sym <- sym_yahoo("SPY")

# Import some data from Yahoo Finance
spy <- import_ohlc(sym)

Symbol specification

The package introduces a new virtual S3 class "symbol_spec" as the basis for creating sub-classes that hold all necessary information to connect to a data source. This virtual class allows users to combine symbols from different data sources into a single vector. For example: import_ohlc_collection(c(sym_yahoo("SPY"), sym_tiingo("DIA"))) will import data for “SPY” from Yahoo Finance and data for “DIA” from Tiingo.

Each data source will have its own symbol_spec constructor. The constructor will have an argument for the vector of symbols and other arguments for all the other data source connection settings. It will return an object that inherits from the new virtual symbol_spec class. For example, sym_yahoo() will return a c("yahoo", "symbol_spec") class vector.
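A constructor along these lines could be sketched as below. The function name and the 'settings' attribute are hypothetical and only illustrate the class structure. The real sym_yahoo() may look quite different.

```r
# Hypothetical constructor sketch; not the rfimport implementation.
sym_yahoo_sketch <- function(symbols, ...) {
  stopifnot(is.character(symbols))
  structure(symbols,
            settings = list(...),   # assumed place for connection settings
            class = c("yahoo", "symbol_spec"))
}

spec <- sym_yahoo_sketch(c("SPY", "DIA"))
class(spec)
inherits(spec, "symbol_spec")
```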

The help page for the symbol spec constructors can also document the import methods that the data source supports. So help("sym_yahoo") would also contain information about import_ohlc.yahoo() and import_ohlc_collection.yahoo(). That way, users don’t need to know the name of the data source method in order to find its documentation.

Ticker symbology

The package would standardize how index tickers are specified. One possibility is to prefix the ticker with an i or i_ (e.g. iDJI or i_DJI).

It would also standardize how to specify share classes, warrants, preferred, etc. One possibility is to use an underscore to identify share classes, a lowercase ‘w’ for warrants, and a lowercase ‘p’ for preferred. For example, BRK_B for Berkshire Hathaway B shares, FOOw for warrants, BARp for preferred. We could also include a translation table and/or function. This would take a lot of effort to do correctly.

An easier alternative would be creating a way to map source symbols to user-defined values. It makes the most sense to do this in the sym_*() constructor. But how should the mapping be specified? Some possibilities:

sym_yahoo(BRK.B = list(symbol = "BRK-B"))
sym_yahoo(c(BRK.B = "BRK-B", "DIA"))
sym_yahoo(c("BRK.B", "DIA"), sym_db = list(BRK.B = "BRK-B"))

Generic import functions

The package will have generic functions import_ohlc() and import_ohlc_collection() to dispatch on symbol_spec sub-classes. import_ohlc() only handles a single symbol and returns one xts object. import_ohlc_collection() will return a list of xts objects for one or more symbols.

Other generic import functions may be added in the future. It may make sense to include generic import functions that return specific types of data. For example: import_statements() for financial statements, and import_bbo() for best bid and offer.

The generics will have symbol_spec, dates, periodicity, and ... arguments.

dates can be either an ISO 8601 date interval (e.g. dates = "2021-01-01/2021-12-31") or a two-element vector with the start and end dates (e.g. dates = c("2021-01-01", "2021-12-31")). The vector can be Date, POSIXct, or a character that is coercible to one of those two classes.

For example:

# one symbol returned as an xts object
spy <- import_ohlc(sym_yahoo("SPY"), dates = "2021/2022")

# two symbols returned as a list of xts objects
stocks <- sym_tiingo(c("AAPL", "NFLX")) |>
 import_ohlc_collection(dates = "2021-03-01/2022-11-30")
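The two forms of the dates argument could be normalized with something like the sketch below. parse_dates_sketch() is a hypothetical helper, not part of ‘rfimport’. It only handles full dates, so a reduced-precision interval like "2021/2022" would need extra parsing.

```r
# Hypothetical helper: turn either form of 'dates' into a
# two-element Date vector with the start and end dates.
parse_dates_sketch <- function(dates) {
  if (is.character(dates) && length(dates) == 1L) {
    # ISO 8601 interval: "start/end"
    dates <- strsplit(dates, "/", fixed = TRUE)[[1]]
  }
  stopifnot(length(dates) == 2L)
  out <- as.Date(dates)
  stopifnot(!anyNA(out), out[1] <= out[2])
  out
}

a <- parse_dates_sketch("2021-01-01/2021-12-31")
b <- parse_dates_sketch(c("2021-01-01", "2021-12-31"))
identical(a, b)
```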

The periodicity argument specifies the interval between data points (e.g. daily, monthly, 15-minute). The data source determines the possible periodicity values, so the data source method is responsible for ensuring the requested periodicity value is available from the data source. ‘rfimport’ will provide a standard way to specify the periodicity values. Then the source methods can translate those values into the value the source needs. For example, one data source may use “monthly” for monthly data and another may use “months”. Users would set periodicity = "months" for either source and the source method would translate the value to “monthly” when needed.
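One way a source method could do that translation is a simple lookup table. The source names and values below are made up for illustration.

```r
# Hypothetical mapping from the standard value to each source's spelling.
periodicity_map <- list(
  source_a = c(days = "daily", months = "monthly"),
  source_b = c(days = "days",  months = "months")
)

translate_periodicity <- function(periodicity, source) {
  value <- periodicity_map[[source]][periodicity]
  if (is.null(value) || is.na(value)) {
    stop("periodicity '", periodicity, "' is not available from ", source)
  }
  unname(value)
}

translate_periodicity("months", "source_a")
translate_periodicity("months", "source_b")
```

The lookup also gives the source method a natural place to throw an error when the requested periodicity isn’t available.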

Data source methods

Each data source will have a S3 method for the relevant import generics, rather than a src argument like getSymbols(). Calling import_ohlc(sym_yahoo("SPY")) will call the corresponding import_ohlc.yahoo() method to import data from Yahoo Finance. import_ohlc(sym_tiingo("DIA")) will call import_ohlc.tiingo() to import data from Tiingo.
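Here is a toy version of that dispatch. The generic and methods use hypothetical names and only return a string, so nothing is downloaded. It shows how the class of the symbol specification selects the method.

```r
# Hypothetical generic; dispatches on the class of 'symbol_spec'.
import_sketch <- function(symbol_spec, ...) UseMethod("import_sketch")

import_sketch.yahoo <- function(symbol_spec, ...) {
  paste("would import", unclass(symbol_spec), "from Yahoo Finance")
}
import_sketch.tiingo <- function(symbol_spec, ...) {
  paste("would import", unclass(symbol_spec), "from Tiingo")
}
import_sketch.default <- function(symbol_spec, ...) {
  stop("no import method for this data source")
}

spy <- structure("SPY", class = c("yahoo", "symbol_spec"))
dia <- structure("DIA", class = c("tiingo", "symbol_spec"))
import_sketch(spy)
import_sketch(dia)
```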

Returned data

The built-in data source methods will automatically include dividends and splits (when available) for daily OHLCV data. They will be included as attributes on the returned OHLCV object. This will allow users to switch between adjusted and unadjusted prices without having to re-download the data.

The built-in data source methods also will not include the series symbol in the OHLCV column names like getSymbols() currently does. It may make sense to include an attribute with the “source symbol” and the “R symbol” on the returned xts object (e.g. src_symbol = "^DJI" and r_symbol = "iDJI"). Then that attribute can be used later as part of the column names.

Providing ‘Defaults’ functionality

Though we want to avoid side-effects, we probably want to provide a way to set credentials so they do not have to be provided for every import call.

We could provide this functionality in a pure way by creating an options object that holds a list of values. Users would create this object once and pass it to the relevant ‘rfimport’ function (either sym_*() or import_ohlc()). The default options could be created by a function like sym_*_options(). This would be similar to the ‘control’ arguments to many optimization routines (e.g. DEoptim.control()).
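A sketch of the ‘control’ pattern is below. All of the function and argument names are hypothetical, since none of this is decided yet.

```r
# Hypothetical options constructor, in the style of a 'control' function.
import_options_sketch <- function(api_key = NULL,
                                  return_class = "xts",
                                  periodicity = "days") {
  list(api_key = api_key,
       return_class = return_class,
       periodicity = periodicity)
}

# Hypothetical import function: everything it depends on is an argument.
describe_import <- function(symbol, options = import_options_sketch()) {
  sprintf("%s: %s data as %s",
          symbol, options$periodicity, options$return_class)
}

opts <- import_options_sketch(return_class = "data.frame")
describe_import("SPY", options = opts)
describe_import("SPY")
```

The options are created once and passed explicitly, so the result of a call only depends on its arguments.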

Open questions and considerations

How should we specify the class of the returned object?

Set it via a return_class argument in the symbol_spec constructor
- PRO: each source is likely to have a specific data structure, and it wouldn’t require creating a generic import function for each return type.
- CON: allows the potential for one call to an import_*_collection() function to return a list of heterogeneous objects.
Set it via a return_class argument in the import method
- PRO: the method would return a list of objects that are all the same class.
- CON: the generic and/or the default import method would need a return_class argument.
Create new generic import functions for each return class
- PRO: makes it clear what the import function returns.
- CON: namespace clutter, don’t want generics for every class. Possibly provide generics for most widely used non-xts classes: data.frame, data.table, tibble, tsibble.
The symbol specification can store a function that controls what data is returned. This doesn’t seem appealing because it adds complexity and the user could call that function after the data is returned. For example: sym_yahoo("SPY", return_func = as.data.table).

How can we make it easier to manipulate results?

The most common use case is making a wide xts object with close prices from a list of xts objects. This currently requires several steps that are likely unfamiliar to most users. It should be possible with one or two function calls. We can consider Garrett See’s qmao package for inspiration. For example, use ‘price frames’ to replace do.call(merge, list_of_xts_objects).
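To show the shape of the result, here is a base R sketch that uses data frames as stand-ins for xts objects. close_wide_sketch() is a hypothetical helper and the prices are made up. It isn’t part of ‘rfimport’ or qmao.

```r
# Hypothetical list of per-symbol price data.
price_list <- list(
  SPY = data.frame(date = as.Date("2021-01-04") + 0:2,
                   Close = c(368, 371, 373)),
  DIA = data.frame(date = as.Date("2021-01-04") + 0:2,
                   Close = c(302, 304, 308))
)

# Keep the close for each symbol, name the column after the symbol,
# then merge everything by date into one wide object.
close_wide_sketch <- function(x) {
  cols <- lapply(names(x), function(nm) {
    setNames(x[[nm]][, c("date", "Close")], c("date", nm))
  })
  Reduce(function(a, b) merge(a, b, by = "date", all = TRUE), cols)
}

wide <- close_wide_sketch(price_list)
wide
```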

There are lots of other common manipulations, like aggregating to a higher periodicity or applying a function to many symbols’ data. The import functions will return something list-like, so users can use lapply() to apply any other function to each series.

I need your help!

I don’t want to do this in a vacuum. Please try the new package, provide feedback, suggestions, feature requests, and help clarify documentation.

I need to know how you’re using getSymbols() and how you would use the new package. I’m not omniscient, so your feedback will be extremely valuable!

xts_0.13.1 on CRAN

Mon, 24 Apr 2023 10:54:00 -0500

An updated version of xts is now on CRAN. This release patches a few issues with the features added in version 0.13.0 and addresses a few maintenance issues that popped up recently.

Patches for features added in 0.13.0

Format each column individually before printing. The top/bottom rows could have a different number of decimal places and there are often multiple varying spaces between columns. For example:
```
 close volume ma
2022-01-03 09:31:00 476.470 803961.000 NA
2022-01-03 09:32:00 476.700 179476.000 NA
2022-01-03 09:33:00 476.540 197919.000 NA
 ...
2023-03-16 14:52:00 394.6000 46728.0000 392.8636
2023-03-16 14:53:00 394.6500 64648.0000 392.8755
2023-03-16 14:54:00 394.6500 69900.0000 392.8873
```
There are 4 spaces between the index and the ‘close’ column, 2 between ‘close’ and ‘volume’, and 4 between ‘volume’ and ‘ma’. There should be a consistent number of spaces between the columns. Most other classes of objects print with 1 space between the columns. The top rows have 3 decimals and the bottom rows have 4. These should also be the same. (#321)
Make column names based on number of columns. The original code was a lot more complicated because it tried to account for truncating the number of printed columns. That functionality was removed because of how complicated it was. So now we can simply create printed column names from the number of columns. (#395)
Only convert printed index values to character. Converting the entire index to character is time-consuming for xts objects with many observations. It can take more than a second to print an xts object with 1 million observations.
Reduce instances when dplyr::lag() warning is shown. The warning was shown whenever it detected dplyr is installed, even if the user wasn’t actively using dplyr. That caused an excessive amount of noise when other packages attached xts (e.g. quantmod). Thanks to Duncan Murdoch for the report and suggested fix! (#393)

Bug Fixes

Return ‘POSIXct’ if object has no ’tclass’. An empty string is not a valid ’tclass’, so it can cause an error.
Fix xts() for zero-row data.frames. The xts() constructor would create an object with a list for coredata when x is a data.frame with no rows. xts objects can’t have lists as coredata, so it should convert x to a matrix and throw an error if x is a list. (#394)
Fix as.data.frame() when converting a data.frame with column names to xts when there’s only one non-time-based column. Previously the xts object would not have the data.frame column name. (#391)
Treat NA the same as NULL for ‘start’ or ’end’ in window.xts(). NULL represents an undefined index value and NA represents an unknown or missing index value. xts does not allow NA as index values, so subsetting an xts or zoo object by NA returns a zero-length object. Therefore a NA (unknown) index value is essentially the same as an undefined index value. (#383, #345)
Warn and remove NA when periodicity() called on date-time vector with any NA values. Previously it threw the uninformative error below. (#289)
```
Error in try.xts(x, error = "'x' needs to be timeBased or xtsible") :
 'x' needs to be timeBased or xtsible
```
Account for timezones when making names for the list split.xts() creates. This was specifically a problem if the xts object’s index was yearmon because as.yearmon.POSIXct() always sets tz = "GMT" when calling as.POSIXlt(), regardless of the xts’ index ’tzone’ attribute. That can cause the as.yearmon() results to be different days for GMT and the index’s timezone. Use format.POSIXct() for “months” because it checks for a ’tzone’ attribute before converting to POSIXlt and calling format.POSIXlt(). The conversion to POSIXlt is important because it checks and uses the ’tzone’ attribute before considering the ’tz’ argument. So it effectively ignores the tz = "GMT" setting in as.yearmon(). This is also the reason for calling as.POSIXlt() before calling as.yearqtr(). (#392)
Ignore attribute order in all.equal(). Attribute order shouldn’t matter. That can be checked with identical().
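The timezone issue behind the split.xts() fix is easy to reproduce in base R. The sketch below formats one instant near a month boundary in two timezones. It assumes the "America/Chicago" timezone is available on your system.

```r
# 8pm in Chicago on Jan 31 is already Feb 1 in GMT.
x <- as.POSIXct("2023-01-31 20:00:00", tz = "America/Chicago")

# format() uses the object's 'tzone' attribute when 'tz' isn't given.
local_month <- format(x, "%Y-%m")

# Forcing GMT puts the same instant in a different month.
gmt_month <- format(x, "%Y-%m", tz = "GMT")

c(local = local_month, gmt = gmt_month)
```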

Chores

Add notes on plot.xts() nomenclature and structure. Also add ASCII art to illustrate definitions and layout. (#103)
Register missing S3 methods and update signatures. With R-devel (83995-ish), R CMD check notes these S3 methods are not registered. It also notes that the signatures for as.POSIXct.tis() and str.replot_xts() do not match the respective generics. R CMD check also thinks time.frequency() is a S3 method because time() is a generic. The function isn’t exported, so renaming won’t break any external code. Thanks to Kurt Hornik for the report. Issues with functionality for ’tis’ were also identified. I removed ’tis’ support entirely because the implementation was not even a bare minimum, and it’s not clear it even worked correctly. (#398)
Add instructions to update old objects. Old xts objects do not have ’tclass’ and ’tzone’ attributes on the index. Add a function to update the object attributes and add a note to the warning to show how to use it. Also, only call tzone() and tclass() once in check.TZ(). Calling these functions multiple times throws multiple warnings for xts objects created before the ’tclass’ and ’tzone’ were attached to the index instead of the xts object. (#306)

quantmod_0.4.22 on CRAN

Sun, 16 Apr 2023 03:41:57 -0500

An updated version of quantmod is now on CRAN. It adds functions HL(), is.HL(), and has.HL() to check for ‘high’ and ’low’ price columns. It also makes accessing Yahoo Finance price, dividend, and split data more robust. getSymbols.FRED() got to and from arguments, like other getSymbols() methods. The remaining changes are bug fixes and maintenance chores.

This was mainly a maintenance and bug fix release, but it does include a couple nice features. quantmod versions 0.4.17 through 0.4.21 included several relevant features that weren’t highlighted in any previous posts. They’re included in a separate section below.

New Features

Exported HL(), is.HL(), and has.HL() functions and added documentation. These were added in 0.4.20 but not exported or included in the documentation.
Switched to the Yahoo Finance v8 JSON endpoint and removed the v7 CSV endpoint. There seems to be a rate limit for the number of tickers you can request via the CSV endpoint. The yfinance python library uses the JSON endpoint and doesn’t seem to have rate limit issues. (#360, #362, #364)
getSymbols.FRED() now supports to and from arguments. So users can set the ‘from’ and ’to’ arguments for FRED data like they can for other data sources like Yahoo. Those values had been ignored and the entire series was always returned. (#368)

Bug Fixes

Fixed getDividends() and getSplits() for stocks that issue monthly dividends. (#372)
Added error handling to getSplits() and getDividends(). getDividends() didn’t handle cases where the download failed, or when dividends needed to be split-adjusted but there were no splits. It also tried to set colnames on the empty xts object that’s returned when there are no dividends. getSplits() had the same colnames issue. Thanks to Chris Cheung for the report! (#366)
Remove “^” prefix from getSymbols() return value. The name of the object getSymbols() created and the symbol value it returned were inconsistent when the ‘Symbols’ argument has a “^” prefix and auto.assign = TRUE:
- getSymbols() removed the “^” from the object it creates, but
- returned the ‘Symbols’ argument unchanged, and
- removed the “^” from the column names of the object it creates.
The example below will create an object named IXIC but the value of sym will be “^IXIC”.
```
 sym <- getSymbols("^IXIC")
```
That means x <- get(sym) will not work because an object named ^IXIC doesn’t exist. (#371)
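
If you're stuck on an older version, one possible workaround is to strip the prefix yourself before calling get(). The sketch below runs offline: it fakes the object that getSymbols() would have created (the values are made up), and obj_name is just an illustrative variable name.

```r
# the value getSymbols("^IXIC") returned before this fix
sym <- "^IXIC"

# stand-in for the object getSymbols() would have created (made-up values)
IXIC <- c(100, 101, 102)

exists(sym)  # is there an object named "^IXIC"?

# strip the leading "^" to get the name of the object that does exist
obj_name <- sub("^\\^", "", sym)
x <- get(obj_name)
```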

Chores

Moved jsonlite from Suggests to Imports so it doesn’t cause a problem for a package that uses getSymbols() but doesn’t also Suggest jsonlite. Thanks to Kurt Hornik for the report and fix! (#380)
Fixed S3 method issues. R-devel (83995-ish) added a check that found methods that were not registered (str.replot(), seriesHi.timeSeries(), and seriesLo.timeSeries()). It was also confused by range.bars() and unique.formula.names() because they are named like S3 methods. Neither was exported, so they didn’t affect users. Thanks to Kurt Hornik for the report! (#375)

Changes in Prior Versions

New Features

Added HL() and supporting functions. These are analogues to HLC(), OHLC(), etc. Thanks to Karl Gauvin for the nudge to implement them.
Added adjusted close price to getSymbols.tiingo() output. Thanks to Ethan Smith for the suggestion and patch! (#289, #345)
Updated getSymbols.tiingo() to use a Date index for daily data. Thanks to Ethan Smith for the report! (#350)
Updated getOptionChain() to return all the fields that Yahoo Finance provides. Thanks to Adam Childers (@rhizomatican) for the patch! (#318, #336)
Added orats as a source for getOptionChain(). Thanks to Steve Bronder (@SteveBronder) for the suggestion and implementation! (#325)
Added “Defaults” handling to getQuote() and getQuote.yahoo(). Thanks to Ethan Smith for the report. (#291)
Added Bid and Ask fields to the output from getQuote(). Thanks to @jrburl for the report and PR. (#302)

Bug Fixes

Removed check for Yahoo Finance cookies because the site no longer responds with a cookie, and that caused the connection attempt to fail. This affected getSymbols(), getDividends(), and getSplits(). Thanks to several users for reporting, and especially to @pverspeelt and @alihru for investigating potential fixes! (#358)
Updated getSymbols.yahooj() for changes to the web page. (#312)
Removed unneeded arguments to the getSymbols.tiingo() implementation. Thanks to Ethan Smith for the suggestion and patch! (#343, #344)
Load dividends and splits data into the correct environment when the user provides a value for the env argument. The previous behavior always loaded the data into the environment the function was called from. Thanks to Stewart Wright for the report and patch! (#33)
Improved the error message when getSymbols() cannot import data for a symbol because the symbol is not valid or does not have historical data. Thanks to Peter Carl for the report. (#333)
Fixed the getMetals() example in the documentation. The example section previously had an example of getFX(). Thanks to Gerhard Nachtmann (@nachti) for the report and patch! (#330)
Fixed getQuote() so it returns data when the ticker symbol contains an “&”. Thanks to @pankaj3009 for the report! (#324)
Fixed addMACD() when col is specified. Thanks to @nvalueanalytics for the report! (#321)
Fixed issues handling https:// in getSymbols.yahooj(). Thanks to @lobo1981 and @tchevri for the reports and Ethan Smith for the suggestion to move from XML to xml2. (#310, #312)
Fixed getSymbols.yahoo(), getDividends(), and getSplits() so they all handle download errors and retry. Thanks to @helgasoft for the report on getSymbols.yahoo() and @msfsalla for the report on getDividends() and getSplits(). (#307, #314)
Added implied volatility and last trade date to getOptionChain() output. Thanks to @hd2581 and @romanlelek for the reports. And thanks to @rjvelasquezm for noticing the error when lastTradeDate is NULL. (#224, #304)
Fixed getOptionChain() to throw a warning and return NULL for every expiry that doesn’t have data. (#299)
Fixed “Defaults” to handle unexported functions (e.g. getQuote.av()). Thanks to @helgasoft for the report. (#316)
importDefaults() didn’t call get() on vector with length > 1. Thanks to Kurt Hornik for the report. (#319)
chartTheme() now works when quantmod is not attached. Thanks to Kurt Hornik for the report.

xts_0.13.0 on CRAN

Tue, 21 Feb 2023 14:58:00 -0500

An updated version of xts is now on CRAN. This release adds several exciting changes: open-ended time-of-day subsetting, smarter conversions to xts from data.frames/data.tables/tibbles, to.period() handles custom endpoint values, print() truncates rows like data.table, and str() provides more informative output. There are also changes to make xts more consistent with zoo, some minor speed improvements, and the usual smattering of bug fixes.

For some reason, I decided it was a good idea to go through the oldest GitHub issues and determine whether they should be fixed or closed without being fixed. Some of the GitHub issues are open issues from when xts was still on R-Forge! The oldest issue fixed in this release was opened on 2013-09-15, and another one was opened on 2014-03-09! Better late than never I guess. ;-)

New Features

The coolest new feature is the ability to use open-ended ranges for time-of-day subsetting. So you can subset by time of day from the start/end of the day without providing the start/end times (i.e. 00:00:00.000/23:59:59.999). Thanks to Chris Katsulis for the suggestion! (#243)

Here’s an example:

# an hourly sequence of times, and an xts object using them
times <- timeBasedSeq("2023-02-01/2023-02-05/H")
x <- xts(seq_along(times), times)

# function to show the first and last index values for each day
index_range_by_day <- function(x) {
 by_day <- split(x, "days")
 index_range <- function(y) {
 paste(start(y), end(y), sep = " / ")
 }
 lapply(by_day, index_range)
}

# between the start of the day and 6pm
index_range_by_day(x["/T1800"])
## $`2023-02-01`
## [1] "2023-02-01 / 2023-02-01 18:00:00"
## 
## $`2023-02-02`
## [1] "2023-02-02 / 2023-02-02 18:00:00"
## 
## $`2023-02-03`
## [1] "2023-02-03 / 2023-02-03 18:00:00"
## 
## $`2023-02-04`
## [1] "2023-02-04 / 2023-02-04 18:00:00"
## 
## $`2023-02-05`
## [1] "2023-02-05 / 2023-02-05 18:00:00"

# between 5am and the end of the day
index_range_by_day(x["T0500/"])
## $`2023-02-01`
## [1] "2023-02-01 05:00:00 / 2023-02-01 23:00:00"
## 
## $`2023-02-02`
## [1] "2023-02-02 05:00:00 / 2023-02-02 23:00:00"
## 
## $`2023-02-03`
## [1] "2023-02-03 05:00:00 / 2023-02-03 23:00:00"
## 
## $`2023-02-04`
## [1] "2023-02-04 05:00:00 / 2023-02-04 23:00:00"
## 
## $`2023-02-05`
## [1] "2023-02-05 05:00:00 / 2023-02-05 23:00:00"

You can now pass custom endpoints to to.period() using the ‘period’ argument. So you can aggregate on something other than the times that endpoints() supports. Thanks to Ethan B. Smith for the suggestion! (#302)

data(sample_matrix)
x <- as.xts(sample_matrix)

# aggregate to OHLC by week, using the default weekly endpoints
# (the output below shows these weeks ending on Sunday)
week_sun <- to.period(x, endpoints(x, "weeks"))
head(week_sun)
## x.Open x.High x.Low x.Close
## 2007-01-07 50.03978 50.42188 49.95041 49.99185
## 2007-01-14 50.03555 50.62395 49.80454 50.60145
## 2007-01-21 50.61724 50.77336 50.02142 50.42090
## 2007-01-28 50.36008 50.43875 49.87468 49.88096
## 2007-02-04 49.85624 50.55509 49.76308 50.55509
## 2007-02-11 50.52389 50.91776 50.45977 50.91160

# aggregate to OHLC by week ending on Wednesday
wednesdays <- which(.indexwday(x) == 3)
week_wed <- to.period(x, wednesdays)
head(week_wed)
## x.Open x.High x.Low x.Close
## 2007-01-03 50.03978 50.42188 49.95041 50.39767
## 2007-01-10 50.42096 50.42096 49.80454 49.97246
## 2007-01-17 49.88529 50.77336 49.88529 50.48644
## 2007-01-24 50.48051 50.60712 50.02142 50.23145
## 2007-01-31 50.20738 50.28268 49.76308 50.22578
## 2007-02-07 50.22448 50.71661 50.19101 50.60611

Enhancements

The release also contains some quality of life changes to print() and str(). Now print() only shows the first and last ‘show.rows’ rows (default 10) if the number of rows is > ‘max.rows’ (default 100), similar to data.table (#321).

data(sample_matrix)
x <- as.xts(sample_matrix)
x
## Open High Low Close
## 2007-01-02 50.03978 50.11778 49.95041 50.11778
## 2007-01-03 50.23050 50.42188 50.23050 50.39767
## 2007-01-04 50.42096 50.42096 50.26414 50.33236
## 2007-01-05 50.37347 50.37347 50.22103 50.33459
## 2007-01-06 50.24433 50.24433 50.11121 50.18112
## 2007-01-07 50.13211 50.21561 49.99185 49.99185
## 2007-01-08 50.03555 50.10363 49.96971 49.98806
## 2007-01-09 49.99489 49.99489 49.80454 49.91333
## 2007-01-10 49.91228 50.13053 49.91228 49.97246
## 2007-01-11 49.88529 50.23910 49.88529 50.23910
## ... 
## 2007-06-21 47.71012 47.71012 47.61106 47.62921
## 2007-06-22 47.56849 47.59266 47.32549 47.32549
## 2007-06-23 47.22873 47.24771 47.09144 47.24771
## 2007-06-24 47.23996 47.30287 47.20932 47.22764
## 2007-06-25 47.20471 47.42772 47.13405 47.42772
## 2007-06-26 47.44300 47.61611 47.44300 47.61611
## 2007-06-27 47.62323 47.71673 47.60015 47.62769
## 2007-06-28 47.67604 47.70460 47.57241 47.60716
## 2007-06-29 47.63629 47.77563 47.61733 47.66471
## 2007-06-30 47.67468 47.94127 47.67468 47.76719
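
You can also change both thresholds for a single call. This is a sketch that assumes your installed xts (0.13.0 or later) accepts the two names quoted above as print() arguments. The values 3 and 20 are arbitrary.

```r
library(xts)
data(sample_matrix)
x <- as.xts(sample_matrix)

# truncate when there are more than 20 rows, and only
# show the first and last 3 rows when truncating
print(x, show.rows = 3, max.rows = 20)
```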

Now str() outputs more descriptive information for xts objects. It differentiates between xts objects that are empty (no data and zero-length index), zero-width (no data and has index values), or zero-length (no data–but has a column dimension and may have column names–and zero-length index). It also adds column names to the output. (#168, #378)

empty <- numeric()

# empty -- no data and zero-length index
str(.xts(NULL, empty))
## An empty xts object 
## Data: double [0, 0]
## Index: POSIXct,POSIXt [0] (TZ: "")

# zero length -- no rows of data and a zero-length index,
# but has a column dimension and may have column names
str(.xts(empty, empty))
## A zero-length xts object 
## Data: double [0, 1]
## Index: POSIXct,POSIXt [0] (TZ: "")

zero_length_with_colnames <-
 .xts(matrix(empty, dimnames = list(NULL, "zero")), empty)
str(zero_length_with_colnames)
## A zero-length xts object 
## Data: double [0, 1]
## Columns: zero
## Index: POSIXct,POSIXt [0] (TZ: "")

# zero width -- no data and has index values
str(xts(NULL, Sys.Date()))
## A zero-width xts object on 2023-02-21 / 2023-02-21 containing:
## Data: double [0, 0]
## Index: Date [1] (TZ: "UTC")

There’s a nice improvement to as.xts() for data.frame and similar objects (e.g. data.table, tibble). It will look for a time-based column in the data.frame if it cannot create an index from the row names. (#381)

d <- data.frame(as.Date("2023-02-21"), A = 21, B = 42)
as.xts(d)
## A B
## 2023-02-21 21 42

This release also includes a new xts method for na.fill() that significantly increases performance when ‘fill’ is a scalar. And it adds a startup warning that dplyr::lag() breaks method dispatch, which means calls to lag(my_xts) won’t work any more, and suggests a couple ways to work around that breakage.
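
Here's a small sketch of both points, using made-up data. Calling stats::lag() explicitly is one way to sidestep the dplyr::lag() conflict. That's my assumption about a reasonable workaround, not a quote from the startup warning.

```r
library(xts)

x <- xts(c(1, NA, 3, NA, 5), as.Date("2023-02-01") + 0:4)

# replace every NA with a scalar
filled <- na.fill(x, 0)
filled

# calling stats::lag() explicitly avoids dplyr::lag(),
# even if dplyr is attached later in the session
lagged <- stats::lag(x, 1)
lagged
```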

Bug Fixes

Fixed a typo in the Description section of the documentation for period.apply() (#205), and added detail to the argument definitions. The original Description has:

the data from INDEX[k] to INDEX[k+1]

But that’s not consistent with the code. It should be:

the data from INDEX[k]+1 to INDEX[k+1]
Made merge.xts() results consistent with merge.zoo() for zero-width objects. Previously, merge.xts() returned an empty xts object if called on two or more zero-width xts objects. merge.zoo() would return a zero-width object with the correct index. (#227, #379)
Also made merge.xts() results consistent with merge.zoo() for zero-length xts objects that have columns. The result of merge.xts() did not include the columns of any objects that had one or more columns, but zero rows. A join should include all the columns of the joined objects, regardless of the number of rows in the object. This is consistent with merge.zoo(). Thanks to Ethan B. Smith for the report and testing! (#222)
Fixed a long-standing issue with Ops.xts(). Now it always returns an object with the same class as the first (left-hand side) argument. It previously returned an xts object even if the first argument was a subclass of xts. (#49)
Squashed a bug in reclass() that did not copy the tclass, tzone, or tformat from ‘match.to’ to the result object. Now it always copies those index attributes. (#43)
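
To make the corrected period.apply() description concrete, here's a tiny sketch with made-up data. The INDEX values are arbitrary and only chosen so the groups are easy to check by hand.

```r
library(xts)

x <- xts(1:10, as.Date("2023-01-01") + 0:9)

# endpoints()-style INDEX: starts at 0 and ends at the number of rows
INDEX <- c(0, 3, 7, 10)

# each group runs from INDEX[k]+1 to INDEX[k+1]: rows 1-3, 4-7, and 8-10
sums <- period.apply(x, INDEX, sum)
sums
```

The three sums are 6, 22, and 27, which only works out if each group starts one row after the previous endpoint.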

Other

Migrated unit tests from RUnit (which is actively maintained, but no longer actively developed) to tinytest. Thanks Mark van der Loo!
Added to the endpoints() documentation to make it clearer that the result is based on the UNIX epoch (midnight 1970, UTC) and not the first observation in the xts index. Thanks to GitHub user Eluvias for the suggestion! (#299)
Removed an unnecessary check in na.locf() (which is not user-facing). Thanks to GitHub user @cgiachalis for the suggestion! (#307)
Updated C entry points so they’re not able to accidentally be found via dynamic lookup (i.e. .Call("foo", ...)). This makes each call to the C code a few microseconds faster, which is nice. (#260)

xts_0.12.2 on CRAN

Sat, 15 Oct 2022 10:21:00 -0500

An updated version of xts is now on CRAN. This release is a big one, with lots of changes. Plotting functionality got a lot of attention. Another notable change is that merge.xts() now supports suffixes. Plus the obligatory bug fixes and refinements to make xts more robust.

Plotting functionality enhancements and bug fixes

You can now omit the data time range from the upper-right portion of a plot by setting main.timespan = FALSE. (#247)
plot.xts() gained a yaxis.ticks argument to control the number of y-axis grid lines, instead of always drawing 5 y-axis grid lines. Thanks to Fredrik Wartenberg for the feature request and patch! (#374)
Fixed addEventLines() when plotted objects have a ‘yearmon’ index. The ISO-8601 range string was not created correctly. Thanks to @paessens for the report. (#353)
The ‘ylim’ argument is now robust against numerical precision issues. Thanks to @bollard for the report, PR, and a ton of help debugging intermediate solutions! (#368)
Series added to a panel now extend the panel’s y-axis. Previously the y-axis limits were based on the first series’ values and not updated when new series were added. So values of the new series did not appear on the plot if they were outside of the original series’ min/max. Thanks to Vitalie Spinu for the report and help debugging and testing! (#360)
All series added to any panel of a plot now update the x-axis of all panels. So the entire plot’s x-axis will include every series’ time index values within the original plot’s time range. This behavior is consistent with chart_Series(). Thanks to Vitalie Spinu for the report and help debugging and testing! (#360, #216)
All y-values are now plotted for series that have duplicate index values, but different data values. Thanks to Vitalie Spinu for the report and help debugging and testing! (#360)
Adding a series can now extend the x-axis before/after the plot’s existing time index range by setting extend.xaxis = TRUE. That ensures all of the new series’ time index values are included in the plot. extend.xaxis = FALSE by default to maintain backward compatibility. Thanks to Vitalie Spinu for the report and help debugging and testing! (#360)

Other enhancements and bug fixes

Ops.xts() no longer changes column names (via make.names()) when the two objects do not have identical indexes. This makes it consistent with Ops.zoo(). Thanks to Anton Antonov for the report! (#114)
Subsetting a zero-length xts object now returns an object with the same storage type as the input. It previously always returned a ‘logical’ xts object. (#376)
tclass() and tzone() now return the correct values for zero-length xts objects, instead of the defaults from the .xts() constructor. Thanks to Andre Mikulec for the report and suggested patch! (#255)
first() and last() now return a zero-length xts object when n = 0. They previously returned the entire object. This is consistent with the default head() and tail() functions, and data.table’s first() and last() functions. Thanks to Ethan B. Smith for the report and patch! (#350)
Subsetting a zero-width xts now returns an object with the same class, tclass, tzone, and xtsAttributes as the input. Thanks to @shikokuchuo for the report! (#359)
Now endpoints() always returns the last observation. Thanks to GitHub user Eluvias for the report. (#300)
Now endpoints() errors for every on value when k < 1. Previously it did not throw an error for k < 1 when on was “years”, “quarters”, or “months”. Thanks to Eluvias for the report. (#301)
Fixed a breaking change (introduced in 0.11.0) in window() for yearmon and yearqtr indexes. In xts < 0.11.0, window.zoo() was dispatched when window() was called on an xts object because there was no window.xts() method. window.zoo() supports additional types of values for the start argument, and possibly other features. Thanks to @annaymj for the report. (#312)
Clarified documentation for axTicksByTime() to say that it returns index locations (e.g. 1, 2, 3) and not timestamps. Thanks to Gabor Grothendieck for the suggestion and feedback. (#354)
Fixed merge.xts() on xts objects containing complex types when fill is provided. It previously threw an error because it treated fill as double instead of complex. Thanks to Gabor Grothendieck for the report. (#346)
Added a message to tell the user how to disable the “object timezone is different from the system timezone” warning (set options(xts_check_TZ = FALSE)). Thanks to Jerzy Pawlowski for the nudge. (#113)
rbind() now handles xts objects without dim attribute. It previously threw an obscure error if one of the xts objects did not have a dim attribute. (#361)
split.xts() now always returns a named list, making it consistent with split.zoo(). Thanks to Gabor Grothendieck for the report. (#357)
xts objects with a zero-length POSIXct index now have a zero-length POSIXct vector instead of a zero-length integer vector for the index. Thanks to Jasper Schelfhout for the report and PR! (#363, #364)
Added support for suffixes in merge.xts() results. The suffixes are consistent with merge.default() and not merge.zoo(), because merge.zoo() automatically uses “.” as a separator between column names, but the default method doesn’t. Thanks to Alex Chernyakov for the initial report, QiuxiaoMu for testing, and Pierre Lamarche for the nudge. Better late than never? (#38, #371)
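
Here's a quick sketch of two of the changes above, using made-up data and assuming xts 0.12.2 or later. The exact column names from the merge depend on how the suffixes are applied, so the sketch only prints them.

```r
library(xts)

x <- xts(1:5, as.Date("2022-10-01") + 0:4)

# n = 0 now returns a zero-length object instead of the entire object
NROW(first(x, n = 0))
NROW(last(x, n = 0))

# merge two objects with the same column name and ask for suffixes
a <- xts(cbind(price = 1:3), as.Date("2022-10-01") + 0:2)
b <- xts(cbind(price = 4:6), as.Date("2022-10-01") + 0:2)
merged <- merge(a, b, suffixes = c("a", "b"))
colnames(merged)
```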

You may have noticed that several of these issues have been open a long time. I’ve been revisiting historical issues and deciding whether to implement them or close them. I’ve already implemented some cool ones in the development version of xts.

I’m most excited about open-ended time-of-day subsetting. Now you can do things like:

x["/T1700"] # start of the day until 5pm
x["T0500/"] # 5am until the end of the day

Mean rolling correlation of XLF constituents

Sat, 19 Sep 2020 07:23:00 -0500

I follow Quantocracy on Twitter, and I found Rolling mean correlation in the tidyverse by Robot Wealth. They say to let them know if you’d approach it differently. I would, so I thought it would be interesting to replicate the analysis using tools I’m familiar with: xts and TTR.

The xts package is an extension of the very excellent zoo package. zoo objects are for ordered observations. Underneath, they are a matrix that can be ordered by anything: numbers, letters, dates, times, and more. xts objects are a special type of zoo object that can only be ordered by a date-time. They are the most common data structure used for working with financial time series, and are used in many of the major time series packages. You can find more details about xts objects in the xts vignette.

I like xts so much, I took over as maintainer when Jeff Ryan started working at a hedge fund that didn’t let him continue open source work.

TTR is the first R package I wrote, all the way back in 2007, before R was cool! TTR has a collection of over 50 technical indicators for creating technical trading rules. The package also provides fast implementations of common rolling-window functions, and several volatility calculations. We’re going to use its ROC() (rate-of-change) function to calculate returns.

Okay, now on to the code!

First, you need to download the data set. I’ve saved a copy of the data that was provided in the Robot Wealth post. You can download it here. The Robot Wealth post used the tidyverse, so data is saved in the preferred tidyverse data structure, a tibble (it’s like a data frame).

Then we need to load() the data into our R session. This creates an object named prices_xlf. The data has 10 columns: ticker, date, open, high, low, close, volume, dividends, closeunadj, inSPX. We’re only going to use the date, ticker, and close columns.

Now we need to convert the prices_xlf tibble into an xts object. There’s no standard way to do this, because tibbles don’t have a pre-defined structure for financial time series. We can ‘read’ and convert the data from a tibble into a zoo object using read.zoo(). We’re using read.zoo() because there isn’t a read.xts() function, and we can easily convert from zoo to xts using as.xts().

load("xlfprices.RData")

library(xts)
x <- read.zoo(prices_xlf[, c("date", "ticker", "close")],
 index.column = "date", split = "ticker")
## Warning in zoo(rval4[[i]], ix[[i]]):
## some methods for "zoo" objects do not work
## if the index entries in 'order.by' are not unique
## Error in merge.zoo(AFL = structure(c(30.54, 29.74, 29.475, 29.66, 29.95, :
## series cannot be merged with non-unique index entries in a series

We pass prices_xlf to read.zoo(), but only with the columns we need for our analysis. The index.column argument tells read.zoo() which column in the data has the ordered index. The split argument allows us to reshape the data from a long format into a wide format, where each ticker is in its own column. This is the standard format for xts objects, because it makes working with financial time series a lot easier.

Now to run the code. Hmm… it throws an error. The error means there are duplicate dates for at least one of the tickers in the prices_xlf object. Depending on how many duplicates there are, this may or may not bias the results, but we should remove them anyway so the analysis is correct. Catching problems like this is another benefit of xts/zoo objects.

Let’s take a look at the duplicates, and then remove them.

# find the duplicates
duplicate_rows <- duplicated(prices_xlf)

# view the duplicates
head(prices_xlf[duplicate_rows, ])
## ticker date open high low close volume dividends
## 61839 CB 2016-01-29 110.18 113.17 110.00 113.07 4205800 0
## 61904 CB 2016-01-28 108.55 109.84 108.42 109.52 3313800 0
## 61969 CB 2016-01-27 108.15 109.91 107.07 108.10 3433200 0
## 62034 CB 2016-01-26 108.51 109.76 108.00 108.58 2669500 0
## 62099 CB 2016-01-25 109.95 109.99 107.82 108.00 2985000 0
## 62164 CB 2016-01-22 109.70 110.97 109.48 110.04 2296000 0
## closeunadj inSPX
## 61839 113.07 TRUE
## 61904 109.52 TRUE
## 61969 108.10 TRUE
## 62034 108.58 TRUE
## 62099 108.00 TRUE
## 62164 110.04 TRUE

# remove the duplicates
prices <- unique(prices_xlf)

The duplicated() function returns a logical (true/false) vector as long as the number of rows in your data. Any row it finds that matches a previous row in the data will be TRUE in the vector. Note that only the duplicated rows are TRUE. The first rows found will be FALSE. Subsetting prices_xlf by the duplicated() result will return the rows that exist somewhere in previous rows in the data.
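
If that's hard to picture, here's a toy version. The table below is made up and has nothing to do with the XLF data. It just has one row that repeats an earlier row.

```r
# a tiny long-format table with one repeated row (made-up values)
toy <- data.frame(
  ticker = c("AAA", "AAA", "BBB", "AAA"),
  date   = as.Date(c("2020-01-02", "2020-01-03", "2020-01-02", "2020-01-03")),
  close  = c(10, 11, 20, 11)
)

dup <- duplicated(toy)
dup         # only the second copy of the repeated row is TRUE
toy[dup, ]  # the rows that repeat an earlier row
```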

You can probably guess what the unique() function does. It removes all the duplicated rows. Now that we removed the duplicates, we can try the read.zoo() call again.

# reshape data into wide format
x <- read.zoo(prices[,c("date", "ticker", "close")],
 index.column = "date", split = "ticker")

Great, that worked! Now we will convert from zoo to xts, because xts gives us a handful of fancy features in addition to all the awesomeness that comes with zoo. We’re not going to use those fancy features in this post, but I promise, they’re fancy.

Next we will calculate returns using the ROC() (rate-of-change) function from the TTR package.

# convert from zoo to xts
x <- as.xts(x)

# calculate returns
library(TTR)
returns <- ROC(x) # log returns
returns <- ROC(x, type = "discrete") # arithmetic returns

The single call to as.xts() is all you need to convert from zoo to xts. ROC() calculates log returns by default, but it will calculate discrete (or arithmetic) returns if you set type = "discrete". We’re going to use discrete returns to keep things consistent with the Robot Wealth post.
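
Here's a small check of the difference between the two return types, using three made-up prices. Nothing here comes from the XLF data.

```r
library(xts)
library(TTR)

# made-up prices
p <- xts(c(100, 110, 99), as.Date("2020-01-01") + 0:2)

log_ret  <- ROC(p)                     # log(p[t] / p[t-1])
disc_ret <- ROC(p, type = "discrete")  # p[t] / p[t-1] - 1

# the two are related by: discrete = exp(log) - 1
all.equal(as.numeric(disc_ret), exp(as.numeric(log_ret)) - 1)
```

The first value of each series is NA because there's no prior price to compare against.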

Next we will create a function to calculate the mean pairwise correlation for each pair of columns in our xts object. We can get all the pairwise correlations from the correlation matrix.

Once we calculate the correlation matrix, we can calculate the mean correlation by taking the mean of the entire matrix. You may be thinking that this will take the mean of each correlation value two times (once for the upper triangle of the matrix, and another time for the lower triangle). But the values in both triangles are the same, because the matrix is symmetric about the diagonal. So this will not affect the mean calculation.

mean_cor <-
 function(returns)
{
 # calculate the correlation matrix
 cor_matrix <- cor(returns, use = "pairwise.complete")

 # set the diagonal to NA (may not be necessary)
 diag(cor_matrix) <- NA

 # calculate the mean correlation, removing the NA
 mean(cor_matrix, na.rm = TRUE)
}

Since our data are in a wide format, calculating the correlation matrix takes a single call to the cor() function that comes with your R installation.

Then we set the diagonal of the matrix to NA because they are all equal to 1. That may not be necessary, but it could bias the results, and I’m not ready to spend time thinking about it. :)

Finally, we take the mean of the entire correlation matrix.
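
A quick check on random data (made up, not the XLF returns) shows what the diagonal and the symmetry do to the mean. This is only a sketch with a 3-column matrix, and the variable names are mine.

```r
set.seed(1)
m <- cor(matrix(rnorm(300), ncol = 3))

# mean of the unique pairs only (upper triangle)
mean_upper <- mean(m[upper.tri(m)])

# mean of the whole matrix with the diagonal left in
mean_with_diag <- mean(m)

# mean of the whole matrix with the diagonal set to NA
diag(m) <- NA
mean_no_diag <- mean(m, na.rm = TRUE)

# symmetry counts every pair exactly twice, so the mean is unchanged
all.equal(mean_upper, mean_no_diag)

# the 1s on the diagonal pull the mean up if they are left in
mean_with_diag > mean_no_diag
```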

Now that we have a handy-dandy mean_cor() function to calculate the mean pairwise correlations, we can call the function on a rolling, 60-day period. We can do this with the rollapply() function from the zoo package.

# calculate the rolling mean correlation over 60 periods
cors <- rollapply(returns, 60, mean_cor, by.column = FALSE, align = "right")

We set align = "right" in order to ‘right-align’ the result. That means the timestamp for each rolling window will be the right-most (or last/largest) value in the window. This is important because we do not know the value for the rolling period until the end of the window. We would severely bias our results if we used the ‘left’ (first) or ‘center’ (middle) timestamp for our window calculation.

We also need to set by.column = FALSE. Otherwise, the rollapply() function will run the function on each column of the xts object individually. And it doesn’t make sense to try to calculate the correlation matrix of a single series.
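
Here's the same pattern on a toy scale, so you can check one window by hand. The data are random, and pair_cor() is a hypothetical helper for two columns, not the mean_cor() function above.

```r
library(xts)

set.seed(42)
r <- xts(matrix(rnorm(20), ncol = 2, dimnames = list(NULL, c("A", "B"))),
         as.Date("2020-01-01") + 0:9)

# correlation between the two columns in one window
pair_cor <- function(w) {
  w <- coredata(w)
  cor(w[, 1], w[, 2])
}

# by.column = FALSE passes all columns of each 5-row window to pair_cor()
rolling <- rollapply(r, 5, pair_cor, by.column = FALSE, align = "right")

# the last value only uses the last 5 rows, and is stamped with the last date
last_window <- coredata(r)[6:10, ]
all.equal(as.numeric(last(rolling)),
          cor(last_window[, 1], last_window[, 2]))
```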

Now, let’s plot our rolling 60-day correlations. We only need to call the plot() function to get a quick look.

plot(cors, main = "Rolling mean XLF correlations")

The main thing I like about my approach is how few lines of code it takes. The most complicated piece is the mean_cor() function, but even that is fairly straightforward.

You can do similar analysis using this same pattern. You need to create another function to calculate the metric, but everything else will be the same.

Like the Robot Wealth version, you can easily do this entire analysis in memory. You don’t have to bother with chunking it up into smaller pieces and piecing it back together.

One difference is that the data in their version is just under 3 million (!) rows, and 6 columns (~18 million data points). This version is 1346 rows and 65 columns (less than 100,000 data points). So you could quickly do the analysis in memory this way on ETFs or indexes with many more constituents (e.g. the Russell 3000). I’ve worked on xts data sets with ~1 billion rows of tick data on my machine with 32GB of RAM.

xts_0.12.1 on CRAN

Sun, 13 Sep 2020 10:49:00 -0500

An updated version of xts reached CRAN on 2020-09-09. Time-of-day subsetting (e.g. x["T10:00/T13:00"]) is 200x faster! (This post includes some notes on some nifty changes in 0.12.0 too, since I didn’t post about 0.12.0 when it was released.)

This is a long-overdue post. I’m trying to get into the habit of posting and announcing each of my package releases. So I’m writing posts this morning for the most recent release of the most popular packages I maintain. I released an updated version of xts to CRAN on 2020-09-09.

I’m going to highlight a handful of the changes that involve:

moving index class and index timezone from the xts object itself to the index,
improvements to time-of-day subsetting (x["T10:00/T13:00"]),
user contributions, and
several bug fixes.

Changes to index attributes

The most significant user-facing change in this release is a bug-fix for the functions that would change the tclass of the xts index. This would happen in calls to reclass(), period.apply(), and logical operations on POSIXct indexes. Thanks to Tom Andrews for the report and testing, and to Panagiotis Cheilaris for contributing test cases (#322, #323).

This was a regression due to the main change in version 0.12-0. All the index-attributes were removed from the xts object and are now only attached to the index itself (#245). We took great care to maintain backward compatibility, and throw warnings when deprecated functions are called and when index-attributes are found on the xts object. I apologize for taking this long to get the fix on CRAN.

Time-of-day subsetting

Another change in 0.12-0 is a significant (~200x!) performance improvement to time-of-day subsetting, thanks to StackOverflow user3226167 (#193).

Then Claymore Marshall added many examples of time-of-day subsetting to ?subset.xts. He also fixed a bug in time-of-day subsetting where subsetting by hour only returned wrong results (#304, #326, #328).
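
Here's a minimal sketch of time-of-day subsetting on two days of made-up hourly data. I set the index timezone to UTC so the hours don't depend on your system settings.

```r
library(xts)

# two days of hourly data in UTC (the values are just a counter)
times <- as.POSIXct("2020-09-01", tz = "UTC") + 3600 * (0:47)
x <- xts(seq_along(times), times)

# observations between 10:00 and 13:00 on every day
mid_day <- x["T10:00/T13:00"]
unique(.indexhour(mid_day))
```

The subset keeps the same hours from every day in the object, so you don't have to loop over dates.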

User contributions

There were also several more user-contributed changes. I love when the community that uses open source software contributes to the project! It’s so much more fun than working on it by myself. :)

These are in a bulleted list in order to highlight each user’s contribution.

Jasen Macike updated plot.xts() to support y-axis labels via the ylab argument (#333, #334).
Michael Chirico added an internal isUTC() function that recognizes many UTC-equivalent time zones (#319).
Dirk Eddelbuettel updated the C API header to fix the signatures of do_merge_xts() and is_xts, which did not return the required type to be called via .Call(). Thanks to Tomas Kalibera for the report (#317), and to Dirk for the PR (#337).
This is a breaking change, but it’s only in the C API, and is required to avoid the potential to crash your R session.
Harvey Smith fixed the possible values for the major.ticks, minor.ticks, and grid.ticks.on arguments to plot.xts() in the Details section of the documentation (#291).
Performance for the period.XYZ() functions (sum, prod, min, max) is much faster (#278). Thanks to Chris Katsulis for the report, and Harvey Smith for several examples.

Bug fixes

first() now operates correctly on non-xts objects when n = -1. Previously it would always return the last two values. Thanks to GitHub user vxg20 for the report (#325).

The .xts() constructor would create an xts object with row names if x had row names. This shouldn’t happen, because xts objects do not have or support row names (#298).

Several binary operations (e.g. +, -, !=, <, etc.) on variations of uncommon xts objects with other xts, matrix, or vector objects, could result in malformed xts objects (#295). Some examples of the types of uncommon xts objects: no dim attribute, zero-width, zero-length.

Calling as.matrix() on an xts object without a dim attribute no longer throws an error (#294).

merge.xts() now honors check.names = FALSE (#293). It also creates shorter column names when passed unnamed objects, consistent with zoo (#248).

as.zoo.xts() is now only registered for zoo versions prior to 1.8-5. Methods to convert an object to another class should reside in the package that implements the target class. Thanks to Kurt Hornik for the report (#287).

.parseISO8601() no longer has a potential length-1 logical error. Thanks to Kurt Hornik for the report (#280).

endpoints() now honors k > 0 when on = "quarters". Thanks to @alkment for the report (#279).

TTR_0.24.2 on CRAN

Sun, 13 Sep 2020 08:41:00 -0500

An updated version of TTR is on CRAN now. This is mainly a bug-fix release. There were several bugs in the underlying C code that caused various issues. I’ll spare you the gory details. If you’re really interested, you can find them in the CHANGES file.

[This is another one of my long-overdue posts. I’m trying to get into the habit of posting and announcing each of my package releases. So I’m writing posts this morning for the most recent release of the most popular packages I maintain.]

I released an updated version of TTR to CRAN on 2020-09-01.

Now for the bug fixes you might actually notice.

ALMA() could return an object whose length didn’t match the length of the input when the input was not an xts object. This bug has been around for years. I’m sorry I just now got to it.

The bug was caused by the differences in rollapply.default() in zoo and rollapply.xts(). The xts method pads with NA by default, whereas the default version does not. Thanks to GitHub user marksimmonds for the report! (#29)
MFI() has been fixed for the case where money flow is always > 0. The denominator of the money ratio is zero if there is no negative money flow for n consecutive observations (e.g. during a strong up-trend). This causes the money flow index to be Inf. Now the money flow index is set to 100 in this case.

Also, the money ratio will be NaN if there’s no money flow for n consecutive observations (e.g. if there are no trades). This causes the money flow index to be NaN. Now the money flow index is set to 50 in this case. Thanks to GitHub user jgehw for the report, reproducible example, and suggested patch! (#81)
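The two MFI edge cases can be sketched in base R. This is only an illustration of the special-case handling described above, not TTR's actual code: mfi_from_flows() and its arguments are hypothetical names, and the inputs are assumed to be the positive and negative money flow already summed over n observations.

```r
# Hypothetical helper (not part of TTR): money flow index from the
# n-period sums of positive and negative money flow.
mfi_from_flows <- function(pos_flow, neg_flow) {
  money_ratio <- pos_flow / neg_flow   # Inf or NaN when neg_flow is 0
  mfi <- 100 - 100 / (1 + money_ratio)
  # No negative money flow, but some positive flow: set the index to 100
  mfi[neg_flow == 0 & pos_flow > 0] <- 100
  # No money flow at all (e.g. no trades): set the index to 50
  mfi[neg_flow == 0 & pos_flow == 0] <- 50
  mfi
}

pos <- c(300, 500, 0)
neg <- c(100, 0, 0)
result <- mfi_from_flows(pos, neg)
```

The first element is an ordinary case, the second has no negative money flow, and the third has no money flow at all.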

I look forward to your questions and feedback! If you have a question, please ask on Stack Overflow and use the [r] and [ttr] tags. Or you can send an email to the R-SIG-Finance mailing list (you must subscribe to post). Open an issue on GitHub if you find a bug or want to request a feature. Please read the contributing guide first! It will help save time for both of us. ;-)

quantmod_0.4-16 on CRAN

Tue, 10 Mar 2020 07:23:00 -0500

A new version of quantmod is on CRAN! One really cool thing about this release is that almost all the changes are contributions from the community.

Ethan Smith made more excellent contributions to getQuote() in this release. It no longer throws an error if one or more symbols are missing. And it handles multiple symbols in a semicolon-delimited string, just like getSymbols(). For example, you can get quotes for multiple symbols by calling getQuote("SPY;AAPL").

@jrburl made a great enhancement to getOptionChain(). Now, instead of throwing an error, it sets volume and open interest to NA if those columns are missing from the Yahoo Finance data. They also submitted a pull request to handle cases where Bid and/or Ask data are missing too. Unfortunately, that pull request came after I had already pushed to CRAN.

Unfortunately, Yahoo! Finance continues to make changes to how they return data. Thankfully, quantmod users are diligent and catch these changes. @helgasoft noticed the split ratio delimiter changed from / to :. So, for example, a 2-for-1 split was 1/2 but is now 2:1.
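To make the delimiter change concrete, here is a minimal base R sketch that turns either format into the same numeric adjustment ratio. It is not quantmod's code. parse_split_ratio() is a hypothetical helper, and reading the new format as "new:old" is an assumption based only on the 2-for-1 example above.

```r
# Hypothetical helper (not part of quantmod): convert a split-ratio string
# into a numeric price-adjustment factor, accepting both delimiters.
parse_split_ratio <- function(x) {
  vapply(x, function(s) {
    if (grepl(":", s, fixed = TRUE)) {
      # Assumed newer format "new:old", e.g. "2:1" for a 2-for-1 split
      parts <- as.numeric(strsplit(s, ":", fixed = TRUE)[[1]])
      parts[2] / parts[1]
    } else {
      # Older format "old/new", e.g. "1/2" for a 2-for-1 split
      parts <- as.numeric(strsplit(s, "/", fixed = TRUE)[[1]])
      parts[1] / parts[2]
    }
  }, numeric(1), USE.NAMES = FALSE)
}

ratios <- parse_split_ratio(c("1/2", "2:1"))
```

Both strings describe the same 2-for-1 split, so both elements of ratios are 0.5.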

@helgasoft also noticed that Alpha Vantage discontinued their “batch quote” functionality, which broke getQuote(). Thankfully, they provided a patch that used the single-quote request, so getQuote() works with Alpha Vantage again!

@matiasandina noticed that I had incorrectly labelled the dividend pay date as the ex-dividend date in the data getQuote() returned from Yahoo Finance. Whoops!

See the news file for the other bug fixes. Thanks for using quantmod!

microbenchmark_1.4-7 on CRAN

Thu, 10 Oct 2019 06:21:00 -0500

I pushed an updated microbenchmark to CRAN a couple weeks ago. There were two noteworthy changes, thanks to great contributions from @MichaelChirico and @harvey131.

Michael fixed a bug in the check for whether the unit argument was a character string (#9, #10). The prior behavior was an uninformative error.

Harvey added a feature to allow you to use a string for common checks: “equal”, “equivalent”, and “identical” (#16). So you don’t need to create a custom function to use all.equal(), all.equal(..., check.attributes = FALSE), and identical(), respectively.
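The difference between the three checks is easiest to see with the underlying base R functions. The sketch below uses only base R, with made-up vectors x and y. It shows what each comparison accepts and does not call microbenchmark itself.

```r
x <- c(a = 1, b = 2)   # named vector
y <- c(1, 2)           # same values, no names

# "equal": all.equal() compares values (within tolerance) and attributes
is_equal <- isTRUE(all.equal(x, y))
# "equivalent": all.equal() ignoring attributes such as names
is_equivalent <- isTRUE(all.equal(x, y, check.attributes = FALSE))
# "identical": exact comparison, no tolerance
is_identical <- identical(x, y)
```

Only the "equivalent" comparison passes here, because the two vectors differ in their names attribute. With microbenchmark installed, the string form would be passed as check = "equivalent" in the microbenchmark() call.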

I also converted the unit tests to use RUnit, and made some changes to the repo, including adding a contributing guide and issue/pull-request templates.

I look forward to your questions and feedback! If you have a question, please ask on Stack Overflow and use the [r] and [microbenchmark] tags. Or you can send an email to the R-SIG-Finance mailing list (you must subscribe to post). Open an issue on GitHub if you find a bug or want to request a feature.

quantmod_0.4-14 on CRAN

Mon, 25 Mar 2019 06:53:00 -0500

I just pushed a new release of quantmod to CRAN! getSymbols() no longer stops if there’s a problem with a ticker symbol. And getQuote() can now import quotes from Tiingo.

I’m most excited about the update to getSymbols() so it doesn’t throw an error and stop processing if there’s a problem with one ticker symbol. Now getSymbols() will import all the data it can, and provide an informative error message for any ticker symbols it could not import.

At a close second, I’m also excited about being able to import quotes from Tiingo using getQuote()! But don’t thank me; thank Ethan Smith for the feature request [#247] and pull request [#250].

There are also several bug fixes in this release. The most noticeable are fixes to getDividends() and getSplits(). Yahoo! Finance continues to have stability issues. Now it returns raw dividends instead of split-adjusted dividends (thanks to Douglas Barnard for the report [#253]), and the actual split adjustment ratio instead of the inverse (e.g. now 1/2 instead of 2/1). I suggest using a different data provider. See my post: Yahoo! Finance Alternatives for some suggestions.

See the news file for the other bug fixes. Please let me know what you think about these changes. I need your feedback and input to make quantmod even better!

xts 0.11-2 on CRAN

Tue, 06 Nov 2018 06:35:00 -0600

xts version 0.11-2 was published to CRAN yesterday. This is a quick bug-fix release.

Notable changes are below:

The xts method for shift.time() is now registered. Thanks to Philippe Verspeelt for the report and PR (#268, #273).
An if-statement in the xts constructor will no longer try to use a logical vector with length > 1. Code like if (c(TRUE, TRUE)) will throw a warning in an upcoming R release, and this patch will prevent that warning. Thanks to Hugh Parsonage for the report and PR (#270, #272).
Fix subset when index(i) and i contain duplicates. Observations were being incorrectly dropped, and behavior is now consistent with zoo. Thanks to Stack Overflow user scs for the report, and Philippe Verspeelt for the help debugging (#275).
Make column names for merge() results with unnamed objects shorter and more like zoo (#248). Previously, column names could be hundreds, even thousands, of characters. This change has the added benefit of making na.fill() much faster (#259). NOTE: This may BREAK existing code for integer unnamed objects.
The to.period() family of functions now use the index timezone when converting intraday index values to daily values (or lower frequency). Previously, the dates would be calculated as UTC dates, instead of dates in the local timezone (as they are now). Thanks to Garrett See and Gabor Grothendieck for the reports (#53, #277).
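The timezone item above is easier to see with a single timestamp. This base R sketch is not the to.period() implementation. It only shows how the calendar date of an intraday observation depends on the timezone used for the conversion. The timestamp and timezone are made-up illustrations.

```r
# An intraday timestamp late in the evening, Chicago time
x <- as.POSIXct("2018-11-05 20:00:00", tz = "America/Chicago")

# Date computed in UTC: the timestamp already falls on the next day in UTC
utc_date <- as.Date(x, tz = "UTC")
# Date computed in the index timezone: the local calendar day
local_date <- as.Date(x, tz = "America/Chicago")
```

The same observation lands on 2018-11-06 when dates are calculated in UTC, and on 2018-11-05 when the index timezone is used.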

xts 0.11-1 on CRAN

Wed, 12 Sep 2018 13:36:00 -0500

xts version 0.11-1 was published to CRAN this morning. xts provides data structures and functions to work with time-indexed data. This release contains some awesome features that will transparently make your xts code even faster!

There’s a new window.xts() method, thanks to Corwin Joy (#100, #240). Corwin also refactored and improved the performance of the binary search algorithm used to subset xts objects. Tom Andrews reported and fixed a few related regressions (#251, #263, #264).
The na.locf.xts() method loops over columns of multivariate objects in C code, for improved speed and memory performance. Thanks to Chris Katsulis and Tom Andrews for their reports and patches (#232, #233, #234, #235, #237).
After many years, merge.xts() can finally handle multiple character or complex xts objects. Thanks to Ken Williams for the report (#44).
You can use “quarters” to specify tick/grid mark locations on plots. Thanks to Marc Weibel for the report (#256).

There are also a few notable bug fixes:

make.index.unique() always returns a unique and sorted index. Thanks to Chris Katsulis for the report and example (#241).
Plots have better axis tick mark locations, thanks to Dirk Eddelbuettel (#246).
periodicity() now warns instead of errors if the xts object contains fewer than 2 observations (#230).
first() and last() now keep dims when they would otherwise be dropped by a regular row subset. This is consistent with head() and tail(). Thanks to Davis Vaughan for the report (#226).
An invalid ISO8601 range subset now returns no data instead of all rows (#96).
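The first()/last() item above refers to the way a regular row subset can drop dimensions. A small base R sketch, using a plain matrix rather than an xts object, shows the difference between a regular subset and head():

```r
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))

dropped <- m[1, ]    # regular row subset: dims are dropped, result is a vector
kept <- head(m, 1)   # head() keeps the result as a 1 x 2 matrix
```

Keeping the dims means code that expects a matrix-like object (column names, ncol(), and so on) keeps working on a one-row result.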

As always, I’m looking forward to your questions and feedback! If you have a question, please ask on Stack Overflow and use the [r] and [xts] tags. Or you can send an email to the R-SIG-Finance mailing list (you must subscribe to post). Open an issue on GitHub if you find a bug or want to request a feature, but please read the contributing guide first!

Learning to code is worth it

Fri, 24 Aug 2018 13:03:00 -0500

Someone recently shared this great talk by Chris Allen from LambdaConf 2017. The title of the talk is “Why Johnny Can’t Code Good,” but the content is more about how to grow as a programmer. His points are true whether you’re just starting out, or have been coding for years.

My notes from Chris’ talk are below, in the order they appear in the presentation. My thoughts are in parentheses.

He’s not talking about people who can’t code, but rather those who haven’t learned to code. They usually work in the industry, but only know just enough to get things done. They aren’t super-independent, and may have trouble taking on new things. All of us have been here before (often multiple times in different disciplines; whenever we start something new).

The problem is when you get stuck working on pre-defined tasks in a well-defined space. The computer science industry’s priorities are now wrapped around accommodating people who are comfortable staying there.

For example, nodejs, Go, and other new languages optimize for “zero-to-blog”, not for something maintainable, or that allows people to build useful abstractions. There’s more focus on what is marketable in a short blog post.

(An interesting point on hiring:) you can’t “hire only the best”, unless you’re able to attract talent. You can’t select for it, because other employers are trying to “hire only the best” too.

Using new tools is not learning new skills. Learning how to learn, without someone feeding you pre-digested material, is how you grow.

Don’t say something is easy. It’s always going to be harder for some; it depends on their context. Either something is worth learning, or it’s not. There may be some return-on-investment cutoff point, but it’s either worth trying or it’s not.

We’re an amnesiac culture. We don’t remember 5 years ago (and we don’t like learning history).

Don’t train how you play. Train harder, be more focused and structured. (This is a sports analogy. The idea is that you work harder in training than you will work during a game. That makes the game seem easy. I have a habit of doing harder things when I’m doing something of lesser consequence. People often ask me how I learned something, and the answer is often “I broke something and learned until I understood what I broke and what was needed to fix it.” This is sub-optimal at work, when a faster solution that isn’t fully understood is preferable to a slower solution that is.)

R/Finance 2018 Registration

Fri, 20 Apr 2018 13:26:00 -0500

This year marks the 10th anniversary of the R/Finance Conference! As in prior years, we expect more than 250 attendees from around the world. R users from industry, academia, and government will be joining 50+ presenters covering all areas of finance with R. The conference will take place on June 1st and 2nd, at UIC in Chicago.

You can find registration information on the conference website, or you can go directly to the Cvent registration page.

Note that registration fees will increase by 50% at the end of early registration on May 21, 2018.

We are very excited about keynote presentations by JJ Allaire, Li Deng, and Norm Matloff. The conference agenda (currently) includes 18 full presentations and 33 shorter “lightning talks”. As in previous years, several (optional) pre-conference seminars are offered on Friday morning. We’re still working on the agenda, but we have another great lineup of speakers this year!

There is also an (optional) conference dinner at Wyndham Grand Chicago Riverfront in the 39th Floor Penthouse Ballroom and Terrace. Situated directly on the riverfront, it is a perfect venue to continue conversations while dining and drinking.

We would like to thank our 2018 Sponsors for the continued support enabling us to host such an exciting conference:

UIC Liautaud Master of Science in Finance
Microsoft
R Consortium
RStudio
William Blair
Citadel
Quasardb

On behalf of the committee and sponsors, we look forward to seeing you in Chicago!

Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson, Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich

Goodbye Google, Hello Tiingo!

Fri, 13 Apr 2018 11:14:00 -0500

First, the bad news:

Google Finance no longer provides data for historical prices or financial statements, so we say goodbye to getSymbols.google() and getFinancials.google(). (#221) They are now defunct as of quantmod 0.4-13.

Now, the good news:

Thanks to Steve Bronder, getSymbols() can now import data from Tiingo! (#220) This feature is part of quantmod 0.4-13, which is now on CRAN. Windows and Mac binaries should be built in a day or two.

Tiingo is a web service that provides tools and data for financial analysis. They provide daily price history for US stocks and ADRs, Chinese stocks, Mutual Funds, and ETFs. There is up to 30+ years of history, including raw prices and split/dividend adjusted prices.

All this data is accessible for free, with reasonable symbol and bandwidth limits. All you need to get started is a one-time registration for an API token. You should see your API token just above the beginning of the metadata section, after logging in, of course. Tiingo has a well-documented daily price data API that returns either JSON or CSV.

To get started, install the latest quantmod from CRAN. Then you call:

getSymbols("MSFT", src = "tiingo", api.key = "[your key]")

Where you replace "[your key]" with the API key you receive after registration. You can use setDefaults() to set your API key one time, and use it for all getSymbols.tiingo() calls.

setDefaults("getSymbols.tiingo", api.key = "[your key]")

Other notable changes:

There is now a getQuote.alphavantage() that allows you to pull real-time quotes from Alpha Vantage. Thanks to Ethan Smith! (#213, #223)
Speaking of Alpha Vantage, getSymbols.av() can now pull weekly and monthly adjusted prices. (#212)
The URL in getSymbols.oanda() and getFX() has been updated, so they work again. (#225)
getQuote.yahoo() no longer errors when a field has no data for all requested tickers. (#208)
saveChart() actually saves charts now (#154). Brilliant!

xts 0.10-2 on CRAN

Mon, 19 Mar 2018 05:30:00 -0500

This xts release contains mostly bugfixes, but there are a few noteworthy features. Some of these features were added in version 0.10-1, but I forgot to blog about it. Anyway, in no particular order:

endpoints() gained sub-second accuracy on Windows (#202)!
na.locf.xts() now honors x and xout arguments by dispatching to the next method (#215). Thanks to Morten Grum for the report.
na.locf.xts() and na.omit.xts() now support character xts objects. Thanks to Ken Williams and Samo Pahor for the reports (#42).

Many of the bug fixes were related to the new plot.xts() introduced in 0.10-0. And a handful of bug fixes were to make xts more consistent with zoo in some edge cases.

R/Finance 2018: Call for Papers

Tue, 09 Jan 2018 11:32:00 -0600

R/Finance 2018: Applied Finance with R

June 1 and 2, 2018

University of Illinois at Chicago

Call For Papers

The tenth annual R/Finance conference for applied finance using R will be held June 1 and 2, 2018 in Chicago, IL, USA at the University of Illinois at Chicago. The conference will cover topics including portfolio management, time series analysis, advanced risk tools, high-performance computing, market microstructure, and econometrics. All will be discussed within the context of using R as a primary tool for financial risk management, portfolio construction, and trading.

Over the past nine years, R/Finance has included attendees from around the world. It has featured presentations from prominent academics and practitioners, and we anticipate another exciting line-up for 2018.

We invite you to submit complete papers in pdf format for consideration. We will also consider one-page abstracts (in txt or pdf format) although more complete papers are preferred. We welcome submissions for both full talks and abbreviated lightning talks. Both academic and practitioner proposals related to R are encouraged.

All slides will be made publicly available at conference time. Presenters are strongly encouraged to provide working R code to accompany the slides. Data sets should also be made public for the purposes of reproducibility (though we realize this may be limited due to contracts with data vendors). Preference may be given to presenters who have released R packages.

Please submit proposals online at http://go.uic.edu/rfinsubmit . Submissions will be reviewed and accepted on a rolling basis with a final submission deadline of February 2, 2018. Submitters will be notified via email by March 2, 2018 of acceptance, presentation length, and financial assistance (if requested).

Financial assistance for travel and accommodation may be available to presenters. Requests for financial assistance do not affect acceptance decisions. Requests should be made at the time of submission. Requests made after submission are much less likely to be fulfilled. Assistance will be granted at the discretion of the conference committee.

Additional details will be announced via the conference website as they become available. Information on previous years’ presenters and their presentations are also at the conference website. We will make a separate announcement when registration opens.

For the program committee:

Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson, Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich

RQuantLib 0.4.4 for Windows

Fri, 05 Jan 2018 14:02:00 -0600

I’m pleased to announce that the RQuantLib Windows binaries are now up to 0.4.4! The RQuantLib pre-built Windows binaries have been frozen on CRAN since 0.4.2, but now you can get version 0.4.4 binaries on Dirk’s ghrr drat repo.

Installation is as simple as:

drat::addRepo("ghrr") # maybe use 'install.packages("drat")' first 
install.packages("RQuantLib", type="binary")

I will be able to create Windows binaries for future RQuantLib versions too, now that I have a Windows QuantLib build (version 1.11) to link against.

Dirk and I plan to talk with CRAN about getting the new binaries hosted there. Regardless, they will always be available via the drat repo.

getSymbols and Alpha Vantage

Fri, 06 Oct 2017 16:12:00 -0500

Thanks to Paul Teetor, getSymbols() can now import data from Alpha Vantage! This feature is part of the quantmod 0.4-11 release, and provides another data source to avoid any Yahoo Finance API changes*.

Alpha Vantage is a free web service that provides real-time and historical equity data. They provide daily, weekly, and monthly history for both domestic and international markets, with up to 20 years of history. Dividend and split adjusted close prices are available for daily data. They also provide near real-time price bars at a resolution of 1 minute or more, for up to 10 recent days.

All you need to get started is a one-time registration for an API key. Alpha Vantage has a clean, documented, public API that returns either JSON-encoded data or a CSV file. The arguments to getSymbols.av() closely follow the native API, so be sure to use their documentation!

To get started, install the latest quantmod from CRAN. Then you call:

getSymbols("MSFT", src = "av", api.key = "[your key]")

Where you replace "[your key]" with the API key you receive after registration. You can use setDefaults() to set your API key one time, and use it for all getSymbols.av() calls.

setDefaults("getSymbols.av", api.key = "[your key]")

* Speaking of API changes, this release also includes a fix for a Yahoo Finance change (#174).

xts 0.10-0 on CRAN!

Fri, 07 Jul 2017 14:10:00 -0500

A new, and long overdue, release of xts is now on CRAN! The major change is the completely new plot.xts() written by Michael Weylandt and Ross Bennett, which is based on Jeff Ryan’s quantmod::chart_Series() code.

Do note that the new plot.xts() includes breaking changes to the original (and rather limited) plot.xts(). However, we believe the new functionality more than compensates for the potential one-time inconvenience. And I will no longer have to tell people that I use plot.zoo() on xts objects!

This release also includes more bug fixes than you can shake a stick at. We squashed several bugs that could have crashed your R session. We also fixed some (always pesky and tricky) timezone issues. We’ve also done more sanity checking (e.g. for NA in the index), and provide more informative errors when things aren’t right. And last, but not least, unit tests are running again!

I’m sure you were hoping to see some examples of the new plot.xts() functionality. Rather than clutter up this blog post with code, check out the basic examples, and the panel functionality examples that Ross Bennett created.

Importing and managing financial data

Wed, 21 Jun 2017 07:07:00 -0500

I’m excited to announce my DataCamp course on importing and managing financial data in R! I’m also honored that it is included in DataCamp’s Quantitative Analyst with R Career Track!

You can explore the first chapter for free, so be sure to check it out!

Course Description

Financial and economic time series data come in various shapes, sizes, and periodicities. Getting the data into R can be stressful and time-consuming, especially when you need to merge data from several different sources into one data set. This course covers importing data from local files as well as from internet sources.

Course Outline

Chapter 1: Introduction and downloading data

A wealth of financial and economic data are available online. Learn how getSymbols() and Quandl() make it easy to access data from a variety of sources.

Chapter 2: Extracting and transforming data

You’ve learned how to import data from online sources, now it’s time to see how to extract columns from the imported data. After you’ve learned how to extract columns from a single object, you will explore how to import, transform, and extract data from multiple instruments.

Chapter 3: Managing data from multiple sources

Learn how to simplify and streamline your workflow by taking advantage of the ability to customize default arguments to getSymbols(). You will see how to customize defaults by data source, and then how to customize defaults by symbol. You will also learn how to handle problematic instrument symbols.

Chapter 4: Aligning data with different periodicities

You’ve learned how to import, extract, and transform data from multiple data sources. You often have to manipulate data from different sources in order to combine them into a single data set. First, you will learn how to convert sparse, irregular data into a regular series. Then you will review how to aggregate dense data to a lower frequency. Finally, you will learn how to handle issues with intra-day data.

Chapter 5: Importing text data, and adjusting for corporate actions

You’ve learned the core workflow of importing and manipulating financial data. Now you will see how to import data from text files of various formats. Then you will learn how to check data for weirdness and handle missing values. Finally, you will learn how to adjust stock prices for splits and dividends.

quantmod 0.4-9 on CRAN

Wed, 07 Jun 2017 12:25:00 -0500

A new release of quantmod is now on CRAN! The only change was to address changes to Yahoo! Finance and their effects on getSymbols.yahoo(). GitHub issue #157 contains some details about the fix implementation.

Unfortunately, the URL wasn’t the only thing that changed. The actual data available for download changed as well.

The most noticeable difference is that the adjusted close column is no longer dividend-adjusted (i.e. it’s only split-adjusted). Also, only the close price is unadjusted; the open, high, and low are split-adjusted.

There also appear to be issues with the adjusted prices in some instruments. For example, users reported issues with split data for XLF and SPXL in GitHub issue #160. For XLF, there is a split and a dividend on 2016-09-16, even on the Yahoo! Finance historical price page for XLF. As far as I can tell, there was only a special dividend. The problem with SPXL is that the adjusted close price isn’t adjusted for the 4/1 split on 2017-05-01, which is also reflected on the Yahoo! Finance historical prices page for SPXL.

Another change is that the downloaded data may contain rows where all the values are “null”. These appear on the website as “0”. This is a major issue for some instruments. Take XLU for example; 188 of the 624 days of data are missing between 2014-12-04 and 2017-05-26 (ouch!). You can see this is even true on the Yahoo! Finance historical price page for XLU.
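If you need to work around the all-"null" rows yourself, one option is to drop any row where every value is missing. The base R sketch below assumes the "null" values have already been imported as NA. The prices data.frame and its numbers are made up purely for illustration.

```r
# Toy data with made-up numbers; the middle row mimics an all-"null" row
prices <- data.frame(
  Open  = c(10.0, NA, 10.4),
  High  = c(10.5, NA, 10.9),
  Low   = c( 9.8, NA, 10.1),
  Close = c(10.2, NA, 10.6),
  row.names = c("2017-05-24", "2017-05-25", "2017-05-26")
)

# Keep rows that have at least one non-missing value
all_missing <- rowSums(is.na(prices)) == ncol(prices)
cleaned <- prices[!all_missing, ]
```

Dropping the rows hides the gap rather than filling it, so it is worth checking how many rows were removed before relying on the result.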

If these changes have made you look for a new data provider, see my post: Yahoo! Finance Alternatives.

Yahoo Finance Alternatives

Wed, 07 Jun 2017 12:20:00 -0500

I assume that you’re reading this because you are one of many people who were affected by the changes to Yahoo Finance data in May (2017). Not only did the URL change, but the actual data changed as well!

The most noticeable difference is that the adjusted close column is now only split-adjusted, whereas it used to be split- and dividend-adjusted. Another oddity is that only the close price is unadjusted (strangely, the open, high, and low are split-adjusted).

All these issues can be dealt with using tools that are currently available. For example, you can unadjust the open, high, and low prices using the ratio of close to adjusted close prices. And you can adjust for both splits and dividends using quantmod::adjustOHLC().
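As a rough sketch of the first suggestion, the ratio of close to adjusted close can be used to scale the split-adjusted open, high, and low back to raw prices. The numbers below are made up for a single day before a later 2-for-1 split, and the column names are only illustrative.

```r
# Toy numbers (made up) for one day before a later 2-for-1 split:
# Open/High/Low and Adjusted are split-adjusted, Close is the raw price.
bar <- data.frame(Open = 49, High = 51, Low = 48, Close = 100, Adjusted = 50)

# The ratio of close to adjusted close recovers the split factor
ratio <- bar$Close / bar$Adjusted

# Scale the split-adjusted columns back to raw prices
raw <- data.frame(
  Open  = bar$Open * ratio,
  High  = bar$High * ratio,
  Low   = bar$Low  * ratio,
  Close = bar$Close
)
```

This only works while the adjusted close is split-adjusted and nothing else. If it were also dividend-adjusted, the ratio would no longer be a pure split factor.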

Unfortunately, there also appear to be issues with data quality. Some instruments have rows where all the prices and volume are zeros (e.g. XLU). The adjusted close in some instruments is incorrect because of missing split events, or double-counting splits and special dividends.

So, what are your alternatives? If you’re just tinkering, you can try other free data sources like Google Finance or Quandl. Note that Google Finance data is already split-adjusted, so you might need to adjust for dividends, or un-adjust for splits, depending on your needs. Quandl has a wiki of end-of-day stock prices curated by the community. You only need a free account to access the data.

If you’re using the data to make actual investment decisions, you should really be using a professional data provider. At the very least, you get someone to yell at when the data have errors. :) First, you should check if your broker provides the historical data you need (e.g. Interactive Brokers provides historical and real-time data to account-holders).

If your broker doesn’t provide historical data, here are a few providers you may want to consider:

Tiingo

Free historical end-of-day data (registration and API key required)
Up to 50+ years of daily data (split and dividend adjusted) for over 65,000 equities, mutual funds, and ETFs
Free historical and real-time crypto data (4+ years of daily prices, 2+ years of intraday data)
Free historical and real-time intraday data from IEX, beginning in August 2017
Available via getSymbols()

Alpha Vantage

Free historical and intraday equity data (registration and API key required)
Up to 20 years of daily data (split and dividend adjusted available)
Up to 10 days of intraday data (1min, 5min, 15min, 30min, 60min)
Available via getSymbols()

eoddata

Provides limited historical data for free
For a one-time fee:
- $20-$50 for 10 years of daily data
- $40-$100 for 20 years of daily data

CSI Data *

Massive historical equity database
$600 annually for 30 years of daily data
Ability to adjust for splits and dividends

IQFeed

Mainly a real-time data provider, but also has historical data
Features
Pricing, starts at $78/month

Leave a comment if you know of another end-of-day data provider that I didn’t list!

* FULL DISCLOSURE: I receive a referral fee for annual subscriptions to CSI products if you use the FOSS coupon code.

quantmod 0.4-8 on CRAN

Wed, 19 Apr 2017 10:45:00 -0500

I pushed a bug-fix release of quantmod to CRAN last night. The major changes were to

getSymbols.FRED() (#141)
getSymbols.oanda() (#144)
getSymbols.yahoo() (#149)

All three providers made breaking changes to their URLs/interfaces.

getSymbols.google() also got some love. It now honors all arguments set via setSymbolLookup() (#138), and it correctly parses the date column in non-English locales (#140).

There’s a handy new argument to getDividends(): split.adjust. It allows you to request dividends unadjusted for splits (#128). Yahoo provides split-adjusted dividends, so you previously had to manually unadjust them for splits if you wanted the original raw values. To import the raw unadjusted dividends, just call:

rawDiv <- getDividends("IBM", split.adjust = FALSE)

Note that the default is split.adjust = TRUE to maintain backward-compatibility.

Stack Financials: Analyze Financial Statement Data

Tue, 14 Feb 2017 10:49:00 -0600

A quantmod user asked an interesting question on StackOverflow: Looping viewFinancials from quantmod. Basically, they wanted to create a data.frame that contained financial statement data for several companies for several years. I answered their question, and thought others might find the function I wrote useful… hence, this post!

I called the function stackFinancials() because it would use getFinancials() and viewFinancials() to pull financial statement data for multiple symbols, and stack them together in long form. I chose a long data format because I don’t know whether the output of viewFinancials() always has the same number of rows and columns for a given type and period. The long format makes it easy to put all the data in one object.

stackFinancials <-
function(symbols, type = c("BS", "IS", "CF"), period = c("A", "Q")) {
  # Ensure the type and period arguments match viewFinancials
  type <- match.arg(toupper(type[1]), c("BS", "IS", "CF"))
  period <- match.arg(toupper(period[1]), c("A", "Q"))

  # Simple function to get financials for one symbol
  getOne <- function(symbol, type, period) {
    gf <- getFinancials(symbol, auto.assign = FALSE)
    vf <- viewFinancials(gf, type = type, period = period)
    # Put viewFinancials output into a data.frame
    df <- data.frame(vf, line.item = rownames(vf), type = type,
                     period = period, symbol = symbol,
                     stringsAsFactors = FALSE, check.names = FALSE)
    # Reshape data.frame into long format
    long <- reshape(df, direction = "long", varying = seq(ncol(vf)),
                    v.names = "value", idvar = "line.item",
                    times = colnames(vf))
    # Reset row.names to "automatic"
    rownames(long) <- NULL
    # Return data
    long
  }
  # Loop over all symbols
  allData <- lapply(symbols, getOne, type = type, period = period)
  # rbind() all into one data.frame
  do.call(rbind, allData)
}
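To see what the reshape step does in isolation, here’s a small sketch that runs without any downloads. The matrix is a made-up stand-in for viewFinancials() output (line items in rows, period end dates in columns), and the symbol “XYZ” is hypothetical.

```r
# Toy stand-in for viewFinancials() output
vf <- matrix(c(100, 40, 90, 35), nrow = 2,
             dimnames = list(c("Revenue", "Net Income"),
                             c("2016-12-31", "2015-12-31")))

# Same steps as getOne(): wide data.frame first, then long format
df <- data.frame(vf, line.item = rownames(vf), symbol = "XYZ",
                 stringsAsFactors = FALSE, check.names = FALSE)
long <- reshape(df, direction = "long", varying = seq(ncol(vf)),
                v.names = "value", idvar = "line.item",
                times = colnames(vf))
rownames(long) <- NULL
long
```

Each (line item, period) pair becomes its own row, so statements with different numbers of rows or columns can still be stacked with rbind().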

Here’s a simple example of how to use stackFinancials() to pull the quarterly (period = "Q") income statements (type = "IS") for General Electric and Apple:

library(quantmod)
Data <- stackFinancials(c("GE", "AAPL"), type = "IS", period = "Q")
head(Data, 4)
## line.item type period symbol time value 
## 1 Revenue IS Q GE 2016-12-31 33088 
## 2 Other Revenue, Total IS Q GE 2016-12-31 NA 
## 3 Total Revenue IS Q GE 2016-12-31 33088 
## 4 Cost of Revenue, Total IS Q GE 2016-12-31 24775

Now that we have the output in Data, let’s do something with it. You could simply subset Data to extract the components you want. For example, if you wanted to look at Apple’s quarterly revenue, you could subset Data where symbol == "AAPL" and line.item == "Total Revenue". But if you’re going to be slicing and dicing a lot, it often helps to write a general function to simplify things. So I wrote extractLineItem(). It takes the output of stackFinancials() and a regular expression for the line item you want, and it returns an xts object that contains the matching line items for all symbols in the data.

extractLineItem <- function(stackedFinancials, line.item) {
  if (missing(stackedFinancials) || missing(line.item)) {
    stop("You must provide output from stackFinancials(), ",
         "and the line.item to extract")
  }
  # Select line items matching user input
  match.rows <- grepl(line.item, stackedFinancials$line.item, ignore.case = TRUE)
  sfSubset <- stackedFinancials[match.rows,]
  getItem <- function(x) {
    # Create xts object
    output <- xts(x$value, as.yearmon(x$time))
    # Ensure column names are syntactically valid
    valid.names <- make.names(paste(x$symbol[1], x$line.item[1]))
    # Remove repeating periods
    colnames(output) <- gsub("\\.+", "\\.", valid.names)
    output
  }
  # Split subset by line.item and symbol
  symbol.item <- split(sfSubset, sfSubset[, c("symbol", "line.item")])
  # Apply getItem() to each chunk, and merge into one object
  do.call(merge, lapply(symbol.item, getItem))
}

Let’s use extractLineItem() to compare total revenue for GE and AAPL.

totalRevenue <- extractLineItem(Data, "total revenue")
totalRevenue
## AAPL.Total.Revenue GE.Total.Revenue 
## Dec 2015 75872 24654 
## Mar 2016 50557 27845 
## Jun 2016 42358 61339 
## Sep 2016 46852 90605 
## Dec 2016 78351 33088
plot(totalRevenue, main = "Quarterly Total Revenue, AAPL (black) vs GE (red)")
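Because line.item is treated as a regular expression, it’s worth checking what a pattern will match before you rely on it. Here’s a quick sketch using the four line items from the head(Data, 4) output; it also shows the column-name cleanup on a name that contains a comma.

```r
lineItems <- c("Revenue", "Other Revenue, Total",
               "Total Revenue", "Cost of Revenue, Total")

# The pattern used above matches only one line item
exact <- lineItems[grepl("total revenue", lineItems, ignore.case = TRUE)]
# A looser pattern matches all four
loose <- lineItems[grepl("revenue", lineItems, ignore.case = TRUE)]
# Anchors restrict the match to the whole string
anchored <- lineItems[grepl("^revenue$", lineItems, ignore.case = TRUE)]

# make.names() turns spaces and commas into periods; gsub() collapses repeats
rawName <- make.names(paste("GE", "Cost of Revenue, Total"))
cleanName <- gsub("\\.+", ".", rawName)
```

A loose pattern would return one column per matching line item and symbol, so anchoring the expression is the safer choice when you want a single series.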

You could also combine multiple calls to extractLineItem() to calculate ratios not included in the output from viewFinancials(). For example, you could divide operating income by total revenue to calculate operating margin.

operatingIncome <- extractLineItem(Data, "operating income")
operatingIncome
## AAPL.Operating.Income GE.Operating.Income 
## Dec 2015 24171 2863 
## Mar 2016 13987 545 
## Jun 2016 10105 4736 
## Sep 2016 11761 6138 
## Dec 2016 23359 2892
plot(operatingIncome / totalRevenue, main = "Quarterly Operating Margin, AAPL (black) vs GE (red)")
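If you want to sanity-check the ratio by hand, here’s the same arithmetic on plain named vectors. The numbers are copied from the AAPL columns printed above; the variable names are just for this sketch.

```r
periods <- c("Dec 2015", "Mar 2016", "Jun 2016", "Sep 2016", "Dec 2016")
aaplRevenue <- setNames(c(75872, 50557, 42358, 46852, 78351), periods)
aaplOpIncome <- setNames(c(24171, 13987, 10105, 11761, 23359), periods)

# Operating margin = operating income / total revenue, period by period
aaplMargin <- aaplOpIncome / aaplRevenue
round(100 * aaplMargin, 1)  # in percent
```

With xts objects the division is aligned on the index rather than on position, which is why the one-liner in the plot() call works without any extra bookkeeping.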

R/Finance 2017: Call for Papers

Wed, 04 Jan 2017 08:11:00 -0600

R/Finance 2017: Applied Finance with R

May 19 and 20, 2017

University of Illinois at Chicago

The ninth annual R/Finance conference for applied finance using R will be held on May 19 and 20, 2017 in Chicago, IL, USA at the University of Illinois at Chicago. The conference will cover topics including portfolio management, time series analysis, advanced risk tools, high-performance computing, market microstructure, and econometrics. All will be discussed within the context of using R as a primary tool for financial risk management, portfolio construction, and trading.

Over the past eight years, R/Finance has included attendees from around the world. It has featured presentations from prominent academics and practitioners, and we anticipate another exciting line-up for 2017.

We invite you to submit complete papers in pdf format for consideration. We will also consider one-page abstracts (in txt or pdf format) although more complete papers are preferred. We welcome submissions for both full talks and abbreviated “lightning talks.” Both academic and practitioner proposals related to R are encouraged.

Financial assistance for travel and accommodation may be available to presenters, however requests must be made at the time of submission. Assistance will be granted at the discretion of the conference committee.

Please submit proposals online at http://go.uic.edu/rfinsubmit. Submissions will be reviewed and accepted on a rolling basis with a final deadline of February 28, 2017. Submitters will be notified via email by March 31, 2017 of acceptance, presentation length, and financial assistance (if requested).

Additional details will be announced via the conference website www.RinFinance.com as they become available. Information on previous years’ presenters and their presentations is also at the conference website. We will make a separate announcement when registration opens.

For the program committee:

Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson, Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich

quantmod 0.4-6 on CRAN

Mon, 29 Aug 2016 11:38:00 -0500

CRAN just accepted a bugfix release of quantmod. The most pertinent changes were to fix getSymbols.oanda (#36) and getOptionChain.yahoo (#92). It also includes a fix to addTRIX (#72).

Oanda changed their URL format from http to https, and getSymbols.oanda did not follow the redirect. Yahoo Finance changed the HTML for displaying options data, which broke getOptionChain.yahoo. The fix downloads JSON instead of scraping HTML, so hopefully it will be less likely to break. For more information, see the links to the GitHub issues above.

I added documentation for getPrice (#77), and removed the unused unsetSymbolLookup function and corresponding documentation (#115).

DataCamp course: Importing and managing financial data

Fri, 17 Jun 2016 11:32:00 -0500

The team at DataCamp announced a new R/Finance course series in a recent email:

Subject: Data Mining Tutorial, R/Finance course series, and more!

R/Finance - A new course series in the works
We are working on a whole new course series on applied finance using R. This new series will cover topics such as time series (David S. Matteson), portfolio analysis (Kris Boudt), the xts and zoo packages (Jeffrey Ryan), and much more. Start our first course Intro to Credit Risk Modeling in R today.

I’m excited to announce that I’m working on a course for this new series! It will provide an introduction to importing and managing financial data.

If you’ve ever done anything with financial or economic time series, you know the data come in various shapes, sizes, and periodicities. Getting the data into R can be stressful and time-consuming, especially when you need to merge data from several different sources into one data set. This course will cover importing data from local files as well as from internet sources.

The tentative course outline is below. I’d really appreciate your feedback on what should be included in this introductory course! So let me know if I’ve omitted something, or if you think any of the topics are too advanced.

Introduction to importing and managing financial data

Introduction and downloading data
getSymbols design overview, Quandl
Finding and downloading data from internet sources
E.g. getSymbols.yahoo, getSymbols.FRED, Quandl
Loading and transforming multiple instruments
Checking for errors (i.e. summary stats, visualizing)
Managing data from multiple sources
Setting per-instrument sources and default arguments
setSymbolLookup, saveSymbolLookup, loadSymbolLookup, setDefaults
Handling instruments names that clash or are not valid R object names
Aligning data with different periodicities
Making irregular data regular
Aggregating to lowest frequency
Combining monthly with daily
Combining daily with intraday
Storing and updating data
Creating an initial RData-backed storage
Adjusting financial time-series
Handling errors during update process

Registration for R/Finance 2016 is open!

Mon, 11 Apr 2016 08:14:00 -0500

You can find registration information and agenda details on the conference website. Or you can go directly to the Cvent registration page.

Note that registration fees will increase by 50% at the end of early registration on May 6, 2016.

The conference will take place on May 20 and 21, at UIC in Chicago. Building on the success of the previous conferences in 2009-2015, we expect more than 250 attendees from around the world. R users from industry, academia, and government will be joining 50 presenters covering all areas of finance with R.

We are very excited about the four keynote presentations given by Patrick Burns, Frank Diebold, Tarek Eldin, and Rishi Narang. The conference agenda (currently) includes 17 full presentations and 33 shorter “lightning talks”. As in previous years, several (optional) pre-conference seminars are offered on Friday morning.

There is also an (optional) conference dinner at The Riverside Room and Gallery at Trump. Situated directly on the hotel’s new River Walk, it is a perfect venue to continue conversations while dining and drinking.

We would like to thank our 2016 sponsors for the continued support enabling us to host such an exciting conference:

UIC Liautaud Master of Science in Finance

Microsoft
MS-Computational Finance and Risk Management at University of Washington

Charles Schwab
Hull Investments
Interactive Brokers
OneMarketData
RStudio

On behalf of the committee and sponsors, we look forward to seeing you in Chicago!

Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson, Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich

Comment on Overnight SPY Anomaly

Mon, 16 Nov 2015 11:25:00 -0600

This post is in response to Michael Harris’ Price Action Lab post, where he uses some simple R code to evaluate the asymmetry of returns from the day’s close to the following day’s open. I’d like to respond to his 3 notes, which I’ve included below.

The R backtest assumes fractional shares. This means that equity is fully invested at each new position. This is important because it affects drawdown calculations.

When calculating the Sharpe ratio, the “geometric = FALSE” option must be used otherwise the result may not be correct. It took some time to figure that out.

The profit factor result in R does not reconcile with results from other platforms or even from excel. PF in R is shown as 1.23 but the correct value is 1.17. Actually, the profit factor is calculated on a per share basis in R, although returns are geometric.

I completely agree with the first point. I’m not sure why Mike considers the output of SharpeRatio.annualized with geometric=TRUE to be suspect (he doesn’t elaborate). The overnightRets are calculated as arithmetic returns, so it’s proper to aggregate them using geometric chaining (i.e. multiplication).
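Here’s a tiny sketch of why chaining matters for arithmetic returns. The two returns are made up for illustration.

```r
rets <- c(0.10, -0.10)                 # hypothetical arithmetic returns

arithmeticTotal <- sum(rets)           # adding them suggests break-even
geometricTotal <- prod(1 + rets) - 1   # chaining shows a 1% loss

c(arithmetic = arithmeticTotal, geometric = geometricTotal)
```

A 10% gain followed by a 10% loss leaves you with 99% of your starting equity, which only the multiplicative version captures.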

I also agree with the third point, because the R code used to calculate profit factor is wrong. My main impetus to write this post was to provide a corrected profit factor calculation. The calculation (with slightly modified syntax) in Mike’s post is:

require(quantmod)
getSymbols('SPY', from = '1900-01-01')
SPY <- adjustOHLC(SPY, use.Adjusted=TRUE)
overnightRets <- na.omit(Op(SPY)/lag(Cl(SPY)) - 1)
posRet <- overnightRets > 0
profitFactor <- -sum(overnightRets[posRet])/sum(overnightRets[!posRet])

Note that profit factor in the code above is calculated by summing positive and negative returns, when it should be calculated using positive and negative P&L. In order to do that, we need to calculate the equity curve and then take its first difference to get P&L. The corrected calculation is below, and it provides the correct result Mike expected.

grossEquity <- cumprod(overnightRets+1)
grossPnL <- diff(grossEquity)
grossProfit <- sum(grossPnL[grossPnL > 0])
grossLoss <- sum(grossPnL[grossPnL < 0])
profitFactor <- grossProfit / abs(grossLoss)
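To make the difference concrete without downloading anything, here’s a sketch of both calculations on a handful of made-up returns. Starting the equity curve at 1 is my assumption so the first period has a P&L; the variable names are only for this example.

```r
rets <- c(0.01, -0.02, 0.015, -0.005, 0.02)   # hypothetical overnight returns

# Returns-based profit factor (the calculation being corrected)
pfReturns <- sum(rets[rets > 0]) / abs(sum(rets[rets < 0]))

# P&L-based profit factor: build the equity curve, then difference it
equity <- cumprod(1 + rets)
pnl <- diff(c(1, equity))                     # assumed starting equity of 1
pfPnL <- sum(pnl[pnl > 0]) / abs(sum(pnl[pnl < 0]))

c(returns = pfReturns, pnl = pfPnL)
```

The two numbers differ because each period’s P&L is its return scaled by the equity at the start of that period, so wins and losses are weighted by how much capital was at risk.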

I’d also like to respond to Mike’s comment:

Since in the past I have identified serious flaws in commercially available backtesting platforms, I would not be surprised if some of the R libraries have some flaws.

I’m certain all of the backtesting R packages have flaws/bugs. All software has bugs because all software is written by fallible humans. One nice thing about (most) R packages is that they’re open source, which means anyone/everyone can check the code for bugs, and fix any bugs that are found. With closed-source software, commercial or not, you depend on the vendor to deliver a patched version at their discretion and in their timing.

Now, I’m not making an argument that open source software is inherently better. I simply wanted to point out this one difference. As much as I love open source software, there are times where commercial vendor-supported software presents a more appealing set of tradeoffs than using open source software. Each situation is different.

New quantmod and TTR on CRAN

Fri, 24 Jul 2015 16:04:00 -0500

I just sent quantmod_0.4-5 to CRAN, and TTR_0.23-0 has been there for a couple weeks. I’d like to thank Ivan Popivanov for many useful reports and patches to TTR. He provided patches to add HMA() (Hull MA), ALMA(), and ultimateOscillator() functions.

James Toll provided a patch to the volatility() function that uses a zero mean (instead of the sample mean) in close-to-close volatility. The other big change is that moving average functions no longer return objects with column names based on the input object column names. There are many other bug fixes (see the CHANGES file in the package).
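For anyone wondering what the zero-mean option changes, here’s a simplified sketch on made-up prices. It computes a single, non-annualized estimate over the whole sample; the rolling window, annualization, and exact denominators in volatility() are not reproduced here, so treat the formulas as assumptions.

```r
close <- c(100, 101, 103, 102, 105, 107)   # hypothetical closing prices
r <- diff(log(close))                       # close-to-close log returns
n <- length(r)

# Usual estimate: deviations from the sample mean (same as sd(r))
volSampleMean <- sqrt(sum((r - mean(r))^2) / (n - 1))

# Zero-mean estimate: assume the true mean return is zero
volZeroMean <- sqrt(sum(r^2) / (n - 1))

c(sampleMean = volSampleMean, zeroMean = volZeroMean)
```

With the same denominator, the zero-mean estimate can never be smaller than the sample-mean one, and the gap grows when the window contains a strong trend.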

The biggest changes in quantmod were to fix getSymbols.MySQL() to use the correct dbConnect() call based on changes made in RMySQL_0.10 and to fix getSymbols.FRED() to use https:// instead of http:// when downloading FRED data. getSymbols.csv() also got some much-needed love.

I’d also like to mention that development has moved to GitHub for both TTR and quantmod.

plot.xts RFC

Mon, 20 Apr 2015 12:45:00 -0500

We have been working on a new charting engine for xts::plot.xts for the past couple years. It started with Michael Weylandt’s work during the 2012 Google Summer of Code, and Ross Bennett took up the torch during the 2014 GSoC.

This new engine improves the functionality, modularity, and flexibility of plot.xts by building off the framework Jeff Ryan began with quantmod::chart_Series. The modular framework allows users to plot an xts object and incrementally build custom charts by adding panels of new data (including transformations of the original xts object).

The main objective was to provide functionality similar to chartSeries and addTA for xts objects. The current code includes support for:

Basic time series plots with sensible defaults
Plotting xts objects by column “automagically” as separate panels
Small multiples with multiple pages
“Layout-safe” so multiple specifications/panels can be charted in a single device
Easily add data to an existing plot or add panels similar to quantmod::add*
Event lines

The xts team would greatly appreciate any comments, feedback, and bug reports before the upcoming CRAN release at the end of April.

The new version of plot.xts is in the main xts development code base, which is available on GitHub in the develop branch. GitHub is also the place to submit bug reports and feature requests.

Note that the new plot.xts includes breaking changes to the original (and rather limited) plot.xts. However, we believe the new functionality more than compensates for the potential one-time inconvenience.

Registration Open for R/Finance 2015!

Tue, 31 Mar 2015 10:35:00 -0500

You can find registration information and agenda details (as they become available) on the conference website. Or you can go directly to the registration page. Note that there’s an early-bird registration deadline of May 15.

The conference will take place on May 29 and 30, at UIC in Chicago. Building on the success of the previous conferences in 2009-2014, we expect more than 250 attendees from around the world. R users from industry, academia, and government will be joining 30+ presenters covering all areas of finance with R.

We are very excited about the four keynote presentations given by Emanuel Derman, Louis Marascio, Alexander McNeil, and Rishi Narang. The main agenda (currently) includes 18 full presentations and 19 shorter “lightning talks”. As in previous years, several (optional) pre-conference seminars are offered on Friday morning.

There is also an (optional) conference dinner that will once again be held at The Terrace at Trump Hotel. Overlooking the Chicago river and skyline, it is a perfect venue to continue conversations while dining and drinking.

We would like to thank our 2015 sponsors for the continued support enabling us to host such an exciting conference:

International Center for Futures and Derivatives at UIC

Revolution Analytics
MS-Computational Finance at University of Washington

OneMarketData
Ketchum Trading
RStudio
SYMMYS

On behalf of the committee and sponsors, we look forward to seeing you in Chicago!

For the program committee:
Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson, Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich

Import Japanese equity data into R with quantmod 0.4-4

Tue, 10 Mar 2015 12:09:00 -0500

I pushed quantmod 0.4-4 to CRAN this weekend. It adds a getSymbols.yahooj function to pull stock data from Yahoo Finance Japan, and fixes issues in getOptionChain.yahoo and getSymbols.oanda.

Changes to the Yahoo Finance and Oanda websites broke the getOptionChain.yahoo and getSymbols.oanda functions, respectively. I didn’t use getOptionChain.yahoo much, so I’m not certain I restored all the prior functionality. Let me know if there’s something I missed. I’d be glad to add a test case for that, or to add a test you’ve written.

The getSymbols.yahooj function is a major enhancement provided by Wouter Thielen. It allows quantmod users to pull stock data from Yahoo Finance Japan.

Japanese ticker symbols usually start with a number and it is cumbersome to use variable names that start with a number in the R environment, so the string “YJ” will be prepended to each of the Symbols. I recommend using setSymbolLookup to prepend the ticker symbols with “YJ” yourself, so you can just use the main getSymbols function.

For example, if you want to pull Sony data, you would run:

require(quantmod)
setSymbolLookup(YJ6758.T='yahooj')
getSymbols('YJ6758.T')
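Here’s a quick base R sketch of why the prefix helps. It only checks whether the names are syntactically valid; it doesn’t download anything.

```r
ticker <- "6758.T"                    # Sony's ticker, without the prefix

# make.names() has to alter the ticker, so it isn't a valid object name
validAsIs <- make.names(ticker) == ticker

# With the "YJ" prefix the name is valid and needs no backticks
prefixed <- paste0("YJ", ticker)
validPrefixed <- make.names(prefixed) == prefixed

c(asIs = validAsIs, prefixed = validPrefixed)
```

Without the prefix you would have to wrap the object name in backticks every time you referred to it.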

The full list of supported data sources for quantmod is now: Yahoo Finance-US, Yahoo Finance-Japan, Google Finance, csv, RData (including rds and rda), FRED, SQLite, MySQL, and Oanda.

Contributions to add support for additional data sources are welcomed. The existing getSymbols functions are good templates to start from.

Google Summer of Code 2015

Tue, 03 Mar 2015 09:04:00 -0600

The R Project has once again been selected as a mentoring organization for this year’s Google Summer of Code (GSoC). If you’re not familiar with GSoC, it’s a global program that offers students a stipend to write code for open source projects, under the direction of a mentor. Mentors get code written for their project, but no money. Students get something like a paid summer internship, with open-source contributions they can reference on their resume.

If you’re interested in participating as a student or a mentor, there’s an overview of the GSoC program on The R Project GSoC 2015 Wiki. The wiki also includes a timeline and links to prior years’ projects.

Several mentors from various backgrounds have already proposed projects for students to work on this summer. Mentors have until March 9th to submit projects they would be willing to support, and student applications begin on March 16th.

Updated quantmod on CRAN

Mon, 15 Dec 2014 09:43:00 -0600

An updated version of quantmod has just been released on CRAN. This is my first submission as the new maintainer. The major change was removing the dependency on the now-archived Defaults package. End-users shouldn’t notice a difference, since I basically copied the necessary functionality from Defaults and added it to quantmod.

There are also several bug fixes. A few worth noting are:

R/Finance 2015 Call for Papers

Tue, 18 Nov 2014 13:52:00 -0600

Call for Papers:

R/Finance 2015: Applied Finance with R
May 29 and 30, 2015
University of Illinois at Chicago

The seventh annual R/Finance conference for applied finance using R will be held on May 29 and 30, 2015 in Chicago, IL, USA at the University of Illinois at Chicago. The conference will cover topics including portfolio management, time series analysis, advanced risk tools, high-performance computing, market microstructure, and econometrics. All will be discussed within the context of using R as a primary tool for financial risk management, portfolio construction, and trading.

Over the past six years, R/Finance has included attendees from around the world. It has featured presentations from prominent academics and practitioners, and we anticipate another exciting line-up for 2015. This year will include invited keynote presentations by Emanuel Derman, Louis Marascio, Alexander McNeil, and Rishi Narang.

We invite you to submit complete papers in pdf format for consideration. We will also consider one-page abstracts (in txt or pdf format) although more complete papers are preferred. We welcome submissions for both full talks and abbreviated “lightning talks.” Both academic and practitioner proposals related to R are encouraged.

The conference will award two (or more) $1000 prizes for best papers. A submission must be a full paper to be eligible for a best paper award. Extended abstracts, even if a full paper is provided by conference time, are not eligible for a best paper award. Financial assistance for travel and accommodation may be available to presenters, however requests must be made at the time of submission. Assistance will be granted at the discretion of the conference committee.

Please make your submission online at: http://www.cvent.com/d/t4qy73. The submission deadline is January 31, 2015. Submitters will be notified via email by February 28, 2015 of acceptance, presentation length, and financial assistance (if requested).

Additional details will be announced via the conference website as they become available. Information on previous years’ presenters and their presentations is also at the conference website.

For the program committee:
Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson, Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich

R/Finance 2014 Review

Mon, 30 Jun 2014 07:40:00 -0500

It’s been more than a month since R/Finance 2014, and my job has finally slowed down enough to allow me to write down my thoughts (though I’m writing this over two days during my train to and from Chicago).

The comments below are based on my personal experience. If I don’t comment on a seminar or presentation, it doesn’t mean I didn’t like it or it wasn’t good; it may have been over my head or I may have been distracted with my duties as a committee member. All the currently available conference slides are available on the website.

Friday morning seminar:
I went to Dirk Eddelbuettel’s seminar because I may be writing an R package to query Deltix’s TimeBase database. Deltix provides a C++ API, so this is a perfect opportunity to use Rcpp.

Friday talks:
The first presentation was given by keynote Luke Tierney, who discussed recent and upcoming performance improvements to R, and introduced some new profiling tools in his proftools package (and a new proftools-GUI package).

Yang Lu explored the low-risk anomaly on high/low volatility portfolios with similar industry, size, and volume. Avery Moon discussed how they use R at Wealthfront to run cashflow simulations for their tax-loss harvesting strategy. Steven Pav used math and memes to discuss portfolio inference. Tobias Setz used the Bayesian Change Point method to analyze time series stability.

Paul Teetor and Matthew Clegg discussed different aspects of pairs trading. Kent Hoxsey demonstrated a simple way to explore trading signal expectation. Matthew Barry introduced the pbo package, which implements some of the ideas in the paper, The Probability of Backtest Overfitting.

Alexios Ghalanos was the day’s second keynote, and he discussed smooth transition autoregressive models and his new package, twinkle. Alexios wrote a post discussing his presentation, which you should definitely read.

Friday food/networking:
During the two-hour conference reception at UIC, I had some drinks and hors d’oeuvres, talked with speakers, and met people I had encouraged to attend and/or present. Next was the (optional) dinner at The Terrace at Trump. It was cold and windy again this year, so we were inside the entire night. Same as last year, the food was fantastic, but the conversations were even better.

Saturday talks:
The first presentation was a lightning talk by Chirag Anand, where he introduced the eventstudies package, which is very well done. Casey King gave an incredibly informative and entertaining presentation on anti-money laundering and suspicious activity reporting in penny stocks using message board posts. Bryan Lewis introduced his IRL package and ran a 16 million node network analysis in < 2 minutes on his Chromebook, during his talk. Stephen Rush discussed his work on VPIN (volume synchronized probability of informed trading), while competing with Steven Pav for the “presentation with the most memes”.

Bob McDonald gave the third keynote presentation, where he discussed using R to teach derivatives in MBA classes. He also explained his decision to adopt R in terms of valuing an option. Eric Zivot discussed his upcoming book, “Modeling Financial Time Series with S-Plus R”. Rohini Grover measured the imprecision of implied volatility estimates in volatility indexes using the ifrogs package.

Bill Cleveland gave the final keynote and talked about the “divide-and-recombine” method for large, complex data, using R and Hadoop. Gregor Kastner introduced his stochvol package, and Matthew Dixon showed how to calibrate stochastic volatility models using his “alpha” gpusvcalibration package. Dirk Eddelbuettel closed the conference with a lightning talk on his recently-released RcppRedis package.

The committee also presented the awards for best papers. The winners were:

Portfolio inference with this one weird trick, Steven E. Pav
Dealing with Stochastic Volatility in Time Series Using the R Package stochvol, Gregor Kastner
Re-Evaluation of the Low-Risk Anomaly in Finance via Matching, Yang Lu, Daniel Wu, Kwok Yu
All words are not equal: Sentiment dynamics and information content within CEO letters, Kris Boudt, James Thewissen

Saturday food/networking:
As always, the conference ended with one more trip to Jaks Tap. I spent some time giving college students some advice about starting their careers, and discussed the presentation I gave earlier in the week at the Chicago R User Group on Profiling for Speed.

Last, but not least: none of this would be possible without the support of fantastic sponsors:
International Center for Futures and Derivatives at UIC, Revolution Analytics, MS-Computational Finance at University of Washington, OneMarketData, RStudio, TIBCO, SYMMYS, and paradigm4.

Introduction to PortfolioAnalytics

Sat, 29 Mar 2014 12:25:00 -0500

This is a guest post by Ross Bennett. Ross is currently enrolled in the University of Washington Master of Science in Computational Finance & Risk Management program with an expected graduation date of December 2014. He worked on the PortfolioAnalytics package as part of the Google Summer of Code 2013 project and continues to work on the package as a Research Assistant at the University of Washington.

His work on the package focused on implementing a portfolio specification to separate and modularize assets, constraints, and objectives. Support for additional constraints including group, diversification, and factor exposure constraints was also added. The random portfolio solver was expanded to include two additional methods of generating random portfolios. The optimization backends were further standardized for sets of constraints and objectives that can be solved via linear and quadratic programming solvers using the ROI package. Charts including risk budget and efficient frontiers were added as well as standardizing the charting across all optimization engines.

This post is meant to provide a very basic introduction to the PortfolioAnalytics package. PortfolioAnalytics is an R package designed to provide numerical solutions and visualizations for portfolio problems with complex constraints and objectives. A key feature of PortfolioAnalytics is the ability to specify a portfolio with assets, constraints, and objectives that is solver agnostic, where the objective can be comprised of any valid R function. PortfolioAnalytics utilizes multiple solvers as backends for the optimization: linear programming, quadratic programming, differential evolution, random portfolios, particle swarm, and generalized simulated annealing. Optimization problems that can be formulated as linear or quadratic problems can be solved quickly and efficiently using the appropriate linear or quadratic solver supported by PortfolioAnalytics. For optimization problems with more complex constraints and objectives, a global solver such as differential evolution or random portfolios can be used to find a solution.

Ross will be giving a tutorial on PortfolioAnalytics at the R/Finance 2014 Conference. The tutorial will cover the key features of PortfolioAnalytics along with several comprehensive examples. Those who want to learn more about how R is used in finance are encouraged to attend.

The primary functions in PortfolioAnalytics to specify a portfolio with constraints and objectives are portfolio.spec(), add.constraint(), and add.objective().

library(PortfolioAnalytics)
data(edhec)
returns <- edhec[, 1:6]
funds <- colnames(returns)

Here we create a portfolio object with portfolio.spec(). The assets argument is a required argument to the portfolio.spec() function. assets can be a character vector with the names of the assets, a named numeric vector, or a scalar value specifying the number of assets. If a character vector or scalar value is passed in for assets, equal weights will be created for the initial portfolio weights.

init.portfolio <- portfolio.spec(assets = funds)

The portfolio object is an S3 class that contains portfolio level data as well as the constraints and objectives for the optimization problem. You can see that the constraints and objectives lists are currently empty, but we will add sets of constraints and objectives with add.constraint() and add.objective().

print.default(init.portfolio)
## $assets 
## Convertible Arbitrage CTA Global Distressed Securities 
## 0.1667 0.1667 0.1667 
## Emerging Markets Equity Market Neutral Event Driven 
## 0.1667 0.1667 0.1667 
## 
## $category_labels 
## NULL 
## 
## $weight_seq 
## NULL 
## 
## $constraints 
## list() 
## 
## $objectives 
## list() 
## 
## $call 
## portfolio.spec(assets = funds) 
## 
## attr(,"class") 
## [1] "portfolio.spec" "portfolio"

Here we add the full investment constraint. The full investment constraint is a special case of the leverage constraint that specifies the weights must sum to 1 and is specified with the alias type="full_investment" as shown below.

init.portfolio <- add.constraint(portfolio = init.portfolio, type = "full_investment")

Now we add a box constraint to specify a long only portfolio. The long only constraint is a special case of a box constraint where the lower bound of the weights of each asset is equal to 0 and the upper bound of the weights of each asset is equal to 1. This is specified with type="long_only" as shown below. The box constraint also allows for per asset weights to be specified.

init.portfolio <- add.constraint(portfolio = init.portfolio, type = "long_only")
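
The two aliases used above can also be written out in their general form. The following is a sketch only: the four-asset portfolio and the bounds are hypothetical, chosen to show a leverage constraint with explicit min_sum and max_sum and a box constraint with per asset bounds.

```r
library(PortfolioAnalytics)

# Hypothetical four-asset portfolio, for illustration only
pspec <- portfolio.spec(assets = c("A", "B", "C", "D"))

# Leverage constraint written out explicitly; weights summing to exactly 1
# is the case that the "full_investment" alias stands for
pspec <- add.constraint(portfolio = pspec, type = "weight_sum",
                        min_sum = 1, max_sum = 1)

# Box constraint with per asset bounds (one value per asset, in asset order)
pspec <- add.constraint(portfolio = pspec, type = "box",
                        min = c(0.05, 0.00, 0.10, 0.05),
                        max = c(0.40, 0.30, 0.60, 0.50))

print(pspec)
```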

The following constraint types are supported:

leverage
box
group
position_limit (1)
turnover (2)
diversification
return
factor_exposure
transaction_cost (2)

(1) Not supported for problems formulated as quadratic programming problems solved with optimize_method="ROI".
(2) Not supported for problems formulated as linear programming problems solved with optimize_method="ROI".

Below we create two new portfolio objects. Note that we are assigning new names for the portfolios. This re-uses the constraints from init.portfolio and adds the objectives specified below to minSD.portfolio and meanES.portfolio while leaving init.portfolio unchanged. This is useful for testing multiple portfolios with different objectives using the same constraints because the constraints only need to be specified once and several new portfolios can be created using an initial portfolio object.

# Add objective for portfolio to minimize portfolio standard deviation 
minSD.portfolio <- add.objective(portfolio=init.portfolio,
 type="risk",
 name="StdDev")

# Add objectives for portfolio to maximize mean per unit ES 
meanES.portfolio <- add.objective(portfolio=init.portfolio,
 type="return",
 name="mean")

meanES.portfolio <- add.objective(portfolio=meanES.portfolio,
 type="risk",
 name="ES")

Note that the name argument in add.objective() can be any valid R function. Several functions are provided in the PerformanceAnalytics package that can be specified as the name argument such as ES/ETL/CVaR, StdDev, etc.

The following objective types are supported:

return
risk
risk_budget
weight_concentration

As demonstrated above, the add.constraint() and add.objective() functions were designed to be very flexible and modular so that constraints and objectives can easily be specified and added to portfolio objects.

PortfolioAnalytics provides a print() method so that we can easily view the assets, constraints, and objectives that we have specified for the portfolio.

print(minSD.portfolio)
## ************************************************** 
## PortfolioAnalytics Portfolio Specification 
## ************************************************** 
## 
## Call: 
## portfolio.spec(assets = funds) 
## 
## Assets 
## Number of assets: 6 
## 
## Asset Names 
## [1] "Convertible Arbitrage" "CTA Global" "Distressed Securities" 
## [4] "Emerging Markets" "Equity Market Neutral" "Event Driven" 
## 
## Constraints 
## Number of constraints: 2 
## Number of enabled constraints: 2 
## Enabled constraint types 
## - full_investment 
## - long_only 
## Number of disabled constraints: 0 
## 
## Objectives 
## Number of objectives: 1 
## Number of enabled objectives: 1 
## Enabled objective names 
## - StdDev 
## Number of disabled objectives: 0

print(meanES.portfolio)
## ************************************************** 
## PortfolioAnalytics Portfolio Specification 
## ************************************************** 
## 
## Call: 
## portfolio.spec(assets = funds) 
## 
## Assets 
## Number of assets: 6 
## 
## Asset Names 
## [1] "Convertible Arbitrage" "CTA Global" "Distressed Securities" 
## [4] "Emerging Markets" "Equity Market Neutral" "Event Driven" 
## 
## Constraints 
## Number of constraints: 2 
## Number of enabled constraints: 2 
## Enabled constraint types 
## - full_investment 
## - long_only 
## Number of disabled constraints: 0 
## 
## Objectives 
## Number of objectives: 2 
## Number of enabled objectives: 2 
## Enabled objective names 
## - mean 
## - ES 
## Number of disabled objectives: 0

Now that we have portfolios set up with the desired constraints and objectives, we use optimize.portfolio() to run the optimizations. The examples below use optimize_method="ROI", but several other solvers are supported including the following:

DEoptim (differential evolution)
random portfolios
- sample
- simplex
- grid
GenSA (generalized simulated annealing)
pso (particle swarm optimization)
ROI (R Optimization Infrastructure)
- Rglpk
- quadprog

The objective to minimize standard deviation can be formulated as a quadratic programming problem and can be solved quickly with optimize_method="ROI".

# Run the optimization for the minimum standard deviation portfolio 
minSD.opt <- optimize.portfolio(R = returns, portfolio = minSD.portfolio,
 optimize_method = "ROI", trace = TRUE)

print(minSD.opt)
## *********************************** 
## PortfolioAnalytics Optimization 
## *********************************** 
## 
## Call: 
## optimize.portfolio(R = returns, portfolio = minSD.portfolio, 
## optimize_method = "ROI", trace = TRUE) 
## 
## Optimal Weights: 
## Convertible Arbitrage CTA Global Distressed Securities 
## 0.0000 0.0652 0.0000 
## Emerging Markets Equity Market Neutral Event Driven 
## 0.0000 0.9348 0.0000 
## 
## Objective Measure: 
## StdDev 
## 0.008855
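
To make the quadratic programming formulation concrete, here is a rough sketch that solves the same kind of problem (minimize portfolio variance subject to full investment and long only constraints) directly with the quadprog package. This is not how optimize.portfolio() is implemented internally; it only illustrates the problem structure. The returns are simulated, so the weights will not match the edhec output above.

```r
library(quadprog)

# Simulated monthly returns for four hypothetical assets (not the edhec data)
set.seed(42)
n_assets <- 4
R_sim <- matrix(rnorm(120 * n_assets, mean = 0.005, sd = 0.02), ncol = n_assets)
R_sim <- sweep(R_sim, 2, c(0.5, 1, 1.5, 2), FUN = "*")  # different volatilities
Sigma <- cov(R_sim)

# solve.QP() minimizes 1/2 * w' Dmat w - dvec' w
# subject to t(Amat) %*% w >= bvec, where the first `meq` constraints
# are treated as equalities
Dmat <- 2 * Sigma                 # objective becomes w' Sigma w
dvec <- rep(0, n_assets)          # no linear term
Amat <- cbind(rep(1, n_assets),   # full investment: sum(w) == 1
              diag(n_assets))     # long only: w >= 0
bvec <- c(1, rep(0, n_assets))

qp <- solve.QP(Dmat, dvec, Amat, bvec, meq = 1)
w <- qp$solution
port_sd <- sqrt(drop(t(w) %*% Sigma %*% w))

round(w, 4)
port_sd
```

The upper bound of 1 on each weight is left out because it is already implied by the other two constraints.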

The objective to maximize mean return per ES can be formulated as a linear programming problem and can be solved quickly with optimize_method="ROI".

# Run the optimization to maximize mean per unit ES 
meanES.opt <- optimize.portfolio(R = returns, portfolio = meanES.portfolio,
 optimize_method = "ROI", trace = TRUE)

print(meanES.opt)
## *********************************** 
## PortfolioAnalytics Optimization 
## *********************************** 
## 
## Call: 
## optimize.portfolio(R = returns, portfolio = meanES.portfolio, 
## optimize_method = "ROI", trace = TRUE) 
## 
## Optimal Weights: 
## Convertible Arbitrage CTA Global Distressed Securities 
## 0.0000 0.2940 0.2509 
## Emerging Markets Equity Market Neutral Event Driven 
## 0.0000 0.4552 0.0000 
## 
## Objective Measure: 
## mean 
## 0.006635 
## 
## 
## ES 
## 0.01837

The PortfolioAnalytics package provides functions for charting to better understand the optimization problem through visualization. The plot() function produces a plot of the optimal weights and the optimal portfolio in risk-return space. The optimal weights and chart in risk-return space can be plotted separately with chart.Weights() and chart.RiskReward().

plot(minSD.opt, risk.col="StdDev", chart.assets=TRUE,
 main="Min SD Optimization",
 ylim=c(0, 0.0083), xlim=c(0, 0.06))

plot(meanES.opt, chart.assets=TRUE,
 main="Mean ES Optimization",
 ylim=c(0, 0.0083), xlim=c(0, 0.16))

This post demonstrates how to construct a portfolio object, add constraints, and add objectives for two simple optimization problems; one to minimize portfolio standard deviation and another to maximize mean return per unit expected shortfall. We then run optimizations on both portfolio objects and plot the results of each portfolio optimization. Although this post demonstrates fairly simple constraints and objectives, PortfolioAnalytics supports complex constraints and objectives as well as many other features that will be covered in subsequent posts.

The PortfolioAnalytics package is part of the ReturnAnalytics project on R-Forge. For additional examples and information, refer to the several vignettes and demos provided in the package.

R/Finance 2014 Registration Open

Sat, 29 Mar 2014 12:24:00 -0500

As announced on the R-SIG-Finance mailing list, registration for R/Finance 2014 is now open! The conference will take place May 16 and 17 in Chicago.

Building on the success of the previous conferences in 2009-2013, we expect more than 250 attendees from around the world. R users from industry, academia, and government will be joining 30+ presenters covering all areas of finance with R.

We are very excited about the four keynote presentations given by Bob McDonald, Bill Cleveland, Alexios Ghalanos, and Luke Tierney. The main agenda (currently) includes 16 full presentations and 21 shorter “lightning talks”. We are also excited to offer four optional pre-conference seminars on Friday morning.

The (optional) conference dinner will once again be held at The Terrace at Trump Hotel. Overlooking the Chicago River and skyline, it is a perfect venue to continue conversations while dining and drinking.

More details of the agenda are available at:
http://www.RinFinance.com/agenda/

Registration information is available at:
http://www.RinFinance.com/register/

and can also be directly accessed by going to:
http://www.regonline.com/RFinance2014

We would like to thank our 2014 Sponsors for the continued support enabling us to host such an exciting conference:

International Center for Futures and Derivatives at UIC

Revolution Analytics
MS-Computational Finance at University of Washington

OneMarketData
RStudio
TIBCO
SYMMS
paradigm4

On behalf of the committee and sponsors, we look forward to seeing you in Chicago!

Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson, Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich

Optimizing quantstrat: from 30 hours to 30 minutes

Mon, 04 Nov 2013 06:37:00 -0600

The complaint I hear most frequently about quantstrat is that it’s slow, especially for large data. Some of this slow performance is due to quantstrat treating all strategies as path-dependent by default. Path dependence requires rules to be re-evaluated for each timestamp with a signal. More signals mean longer run-times.

If your strategy is not path-dependent, you can get a fairly substantial performance improvement by turning path-dependence off. If your strategy truly is path-dependent, keep reading…

I started working with Brian Peterson in late August of this year, and we’ve been working on a series of very large backtests over the last several weeks. Each backtest consisted of ~7 months of 5-second data on 72 instruments with 15-20 different configurations for each.

These backtests really pushed quantstrat to its limits. The longest-running job took 32 hours. I had some time while they were running, so I decided to profile quantstrat. I was able to make some substantial improvements, so I thought I’d write a post to tell you what’s changed and highlight some of the performance gains we’re seeing.

The biggest improvement came from changing how we subset the xts object in ruleOrderProc and ruleSignal. We were using the POSIXct timestamp to get the current row, even though we know the row number. It’s much faster to subset an xts object by the row number than a POSIXct object. For example:

library(xts)
x <- .xts(1:1e8, 1:1e8) # 100mm obs
n <- 1e8/2 # row number
i <- index(x)[n] # timestamp for row 'n'

# by POSIXct timestamp
system.time(x[i,])
## user system elapsed
## 0.181 0.299 0.481

# by integer (basically instantaneous)
system.time(x[n,])
## user system elapsed
## 0.001 0.000 0.000

This change alone got the job runtime down to just over 2 hours (from 32 hours). If you think I would be happy enough with that, you don’t know me. Several other changes helped get that 2-hour run down to under 30 minutes.

We now calculate periodicity(mktdata) in applyRules and pass that value to ruleOrderProc. This avoids re-calculating that value for every order, since mktdata doesn’t change inside applyRules.
The dimension reduction algorithm has to look for the first time the price crosses the limit order price. We were doing that with a call to which(sigThreshold(...))[1]. The relational operators (<, <=, >, >=, and ==) and which operate on the entire vector, but we only need the first value, so I replaced that code with a C-based .firstThreshold function that stops as soon as it finds the first cross.
applyStrategy only accumulates values returned from applyIndicators, applySignals, and applyRules if debug=TRUE. This saves a little time, but can save a lot of memory for large mktdata objects.
The internal ruleProc function now constructs the rule function call using the mktdata symbol instead of its deparsed values. So the rule function call now looks like
```
ruleFunction(mktdata, ...)
```
instead of
```
ruleFunction(c(50.04, 50.23, 50.42, 50.37, 50.24, 50.13, 50.12,
 50.37, 50.24, 50.22, 49.95, 50.23, 50.26, 50.22, 50.11, 49.99,
 50.33, 50.33, 50.18, 49.99), ...)
```
You can imagine how large the old call would be for a 10-million-row mktdata object.
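
Two of the changes above can be illustrated in base R. This is only a sketch of the ideas, not quantstrat's code: the real .firstThreshold is written in C and its internals are not shown in this post, and first_cross() and show_call() are hypothetical names used for the example. A plain R loop will not by itself beat a vectorized which() call; the gain in quantstrat comes from doing the early exit in compiled code.

```r
# (a) Early-exit search for the first cross.
# which(x > threshold)[1] compares every element before taking the first hit;
# this loop stops at the first one.
first_cross <- function(x, threshold) {
  for (k in seq_along(x)) {
    if (isTRUE(x[k] > threshold)) return(k)
  }
  NA_integer_  # no cross found
}

prices <- c(50.04, 50.23, 50.42, 50.37, 50.24)
first_cross(prices, 50.3)  # same position as which(prices > 50.3)[1]

# (b) Building a call from a symbol instead of from values.
# show_call() just returns the call it was invoked with.
show_call <- function(x, ...) match.call()
mktdata <- prices

by_value  <- do.call("show_call", list(mktdata))         # values embedded in the call
by_symbol <- do.call("show_call", list(quote(mktdata)))  # call refers to the object by name

by_value
by_symbol
```

Printing by_value shows every element of the vector inside the call, while by_symbol stays the same size no matter how many rows mktdata has.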

All these changes are most significant for large data sets. The small demo strategies included with quantstrat are also faster, but the net performance gains increase as the size of the data, the number of signals (and therefore the number of rule evaluations), and the number of instruments increase.

You’re still reading? What are you waiting for? Go install the latest from GitHub and try it for yourself!

R/Finance 2014 Call for Papers

Thu, 17 Oct 2013 20:28:00 -0500

We’re getting ready for this year’s R/Finance conference. Here’s the call for papers. I hope to see you there!

R/Finance 2014: Applied Finance with R
May 16 and 17, 2014
University of Illinois at Chicago

The sixth annual R/Finance conference for applied finance using R will be held on May 16 and 17, 2014 in Chicago, IL, USA at the University of Illinois at Chicago. The conference will cover topics including portfolio management, time series analysis, advanced risk tools, high-performance computing, market microstructure, and econometrics. All will be discussed within the context of using R as a primary tool for financial risk management, portfolio construction, and trading.

Over the past five years, R/Finance has included attendees from around the world. It has featured presentations from prominent academics and practitioners, and we anticipate another exciting line-up for 2014.

We invite you to submit complete papers in pdf format for consideration. We will also consider one-page abstracts (in txt or pdf format), although more complete papers are preferred. We welcome submissions for both full talks and abbreviated “lightning talks”. Both academic and practitioner proposals related to R are encouraged.

Presenters are strongly encouraged to provide working R code to accompany the presentation/paper. Data sets should also be made public for the purposes of reproducibility (though we realize this may be limited due to contracts with data vendors). Preference may be given to presenters who have released R packages.

The conference will award two (or more) $1000 prizes for best papers. A submission must be a full paper to be eligible for a best paper award. Extended abstracts, even if a full paper is provided by conference time, are not eligible for a best paper award. Financial assistance for travel and accommodation may be available to presenters at the discretion of the conference committee. Requests for assistance should be made at the time of submission.

Please submit your papers or abstracts online at: goo.gl/OmKnu7. The submission deadline is January 31, 2014. Submitters will be notified via email by February 28, 2014 of acceptance, presentation length, and decisions on requested funding.

For the program committee:
Gib Bassett, Peter Carl, Dirk Eddelbuettel, Brian Peterson,
Dale Rosenthal, Jeffrey Ryan, Joshua Ulrich

R/Finance 2013 Review

Tue, 28 May 2013 08:00:00 -0500

It’s been one week since the 5th Annual R/Finance conference, and I finally feel sufficiently recovered to share my thoughts. The conference is a two-day whirlwind of applied quantitative finance, fantastic networking, and general geekery.

Friday morning seminar:
I went to (and live-tweeted) Jeff Ryan’s seminar because I wanted to learn more about how he uses mmap+indexing with options data. There I realized that POSIXlt components use a zero-based index because they mirror the underlying tm struct, and that mmap+indexing files can be shared across cores and you can read them from other languages (e.g. Python).
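
For anyone who has not run into the zero-based POSIXlt components before, here is a quick base R illustration. The date is arbitrary and chosen only for the example.

```r
# An arbitrary date, used only for illustration
lt <- as.POSIXlt("2013-05-17 09:30:00", tz = "UTC")

lt$mon    # 4: months are counted 0-11, so May is 4
lt$wday   # 5: days of the week are counted 0-6 from Sunday, so Friday is 5
lt$yday   # 136: days of the year are counted from 0
lt$year   # 113: years since 1900 (an offset rather than a zero-based index)
lt$mday   # 17: day of the month is the exception and starts at 1
```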

Friday talks:
The first presentation was by keynote Ryan Sheftel, who talked about how he uses R on his bond trading desk. David Ardia showed how expected returns can be estimated via the covariance matrix. Ronald Hochreiter gave an overview of modeling optimization via his modopt package. Tammer Kamel gave a live demo of the Quandl package and said, “Quandl hopes to do to Bloomberg what Wikipedia did to Britannica.”

I had the pleasure of introducing both Doug Martin, who talked about robust covariance estimation, and Giles Heywood, who discussed several ways of estimating and forecasting covariance, and proposed an “open source equity risk and backtest system” as a means of matching talent with capital.

Ruey Tsay was the next keynote, and spoke about using principal volatility components to simplify multivariate volatility modeling. Alexios Ghalanos spoke about modeling multivariate time-varying skewness and kurtosis. Unfortunately, I missed both Kris Boudt’s and David Matteson’s presentations, but I did get to see Winston Chang’s live demo of Shiny.

Friday food/networking:
The two-hour conference reception at UIC was a great time to have a drink, talk with speakers, and say hello to people I had never met in person. Next was the (optional) dinner at The Terrace at Trump. Unfortunately, it was cold and windy, so we only spent 15-20 minutes on the terrace before moving inside. The food was fantastic, but the conversations were even better.

Saturday talks:

I missed the first block of lightning talks. Samantha Azzarello discussed her work with Blu Putnam, which used a dynamic linear model to evaluate the Fed’s performance vis-a-vis the Taylor Rule. Jiahan Li used constrained least squares on 4 economic fundamentals to forecast foreign exchange rates. Thomas Harte talked about regulatory requirements of foreign exchange pricing (and wins the award for most slides, 270); basically documentation is important, Sweave to the rescue!

Sanjiv Das gave a keynote on 4 applications: 1) network analysis on SEC and FDIC filings to determine banks that pose systemic risk, 2) determining which home mortgage modification is optimal, 3) portfolio optimization with mental accounting, 4) venture capital communities.

I had the pleasure of introducing the following speakers: Dirk Eddelbuettel showed how it’s easy to write fast linear algebra code with RcppArmadillo. Klaus Spanderen showed how to use QuantLib from R, and even how to call C++ from R from C++. Bryan Lewis talked about SciDB and the scidb package (SciDB contains fast linear algebra routines that operate on the database!). Matthew Dowle gave an introduction to data.table (in addition to a seminar).

Attilio Meucci gave his keynote on visualizing advanced risk management and portfolio optimization. Immediately following, Brian Peterson gave a lightning talk on implementing Meucci’s work in R (Attilio works in Matlab), which was part of a Google Summer of Code project last year.

Thomas Hanson presented his work with Don Chance (and others) on computational issues in estimating the volatility smile. Jeffrey Ryan showed how to manipulate options data in R with the greeks package.

The conference wrapped up by giving away three books, generously donated by Springer, to three random people who submitted feedback surveys. I performed the random drawing live on stage, using my patent-pending TUMC method (I tossed the papers up in the air).

The committee also presented the awards for best papers. The winners were:

Regime switches in volatility and correlation of financial institutions, Boudt et al.
A Bayesian interpretation of the Federal Reserve’s dual mandate and the Taylor Rule, Putnam & Azzarello
Nonparametric Estimation of Stationarity and Change Points in Finance, Matteson et al.
Estimating High Dimensional Covariance Matrix Using a Factor Model, Sun (best student paper)

Saturday food/networking:

The whirlwind came to a close at Jaks Tap. I was finally able to ask speed-obsessed Matthew Dowle about potential implementations of a multi-type xts object (a Google Summer of Code project this year). I also spoke to a few people about how to add options strategy backtesting to quantstrat.

Last, but not least: none of this would be possible without the support of fantastic sponsors: International Center for Futures and Derivatives at UIC, Revolution Analytics, MS-Computational Finance at University of Washington, Google, lemnica, OpenGamma, OneMarketData, and RStudio.
