Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Which behavior should be expected from merge.xts regarding suffixes? #371

Closed
pierre-lamarche opened this issue Jul 28, 2022 · 2 comments
Closed
Labels
enhancement Enhancement to existing feature
Milestone

Comments

@pierre-lamarche
Copy link
Contributor

Description

As a newcomer to the xts package, I truly appreciate the intuitive way this package makes it possible to deal with time series in R. Yet I am still puzzled by the behavior of the merge.xts function, although I understand some way has been walked since the latest release of the package. Especially regarding suffixes, the function adds an extra dot separator between the former name of the column(s) and the suffix, even though you specify an empty suffix (which should be regarded as the will to leave the name of the column unchanged in my opinion). I am aware this is consistent with the merge.zoo function, yet as a naive user, I would rather expect a behaviour of such function consistent with the merge.default or merge.data.frame, especially for features that are set through same parameters.

Expected behavior with minimal reproductible example

Taking example from previous issues, here's a piece of code and its current behavior with the dev version of the package:

remotes::install_github("https://github.com/joshuaulrich/xts")

library(xts) 
a <- data.frame(alpha=1:10, beta=2:11) 
xts1 <- xts(x=a, order.by=Sys.Date() - 1:10) 
b <- data.frame(alpha=3:12, beta=4:13) 
xts2 <- xts(x=b, order.by=Sys.Date() - 1:10) 
c <- data.frame(alpha=5:14, beta=6:15) 
xts3 <- xts(x=c, order.by=Sys.Date() - 1:10)                                               

merge(xts1, xts2, xts3, suffixes=c("", "t_1", "t_2"))

Here I obtain columns with suffixes ., .t_1 and .t_2, although it may not be my wish to get such a dot (especially for the first set of columns). I would expect to be able to specify the whole suffix the way merge.default does (in this case, I choose to specify _ rather than . just for the sake of the example):

merge(xts1, xts2, xts3, suffixes=c("", "_t_1", "_t_2"))

In my opinion, it would be a more intuitive way of dealing with suffixes, also assuming you would like to implement a treatment on time series mimicking as much as possible the way dataframes are processed in R.

Browsing the code, it looks like I can see which lines would have to be adapted to address the issue the way I am suggesting. I would be more than happy to have a try and submit a PR, would you find this opinion agreeable to some point.

@joshuaulrich
Copy link
Owner

Thanks for the report/comment and pointing out that the merge() generic doesn't add a '.' between the column name and the suffix. I understand expecting merge.xts() (and merge.zoo()) to be consistent with that.

I think we can make merge.xts() consistent with the generic because the functionality didn't work until recently, and it hasn't been released to CRAN yet.

That said, it's way too late to make that change to merge.zoo(). I'll talk with the authors about adding an optional argument (e.g. suffix.sep = ".") that will allow uses to remove the '.' separator.

It would be great if you could submit a PR! Change the 2 unit tests first and that will help ensure your changes do what you expect.

@zeileis
Copy link
Collaborator

zeileis commented Aug 21, 2022

Thanks for the suggestion. I have just added an argument merge(..., sep = ".") to the development version of zoo (revision 1176) on R-Forge. You can install it via:

install.packages("zoo", repos = "https://R-Forge.R-project.org")

Then we can set up data similar to that in the post above:

library("zoo")
a <- data.frame(alpha=1:5, beta=2:6) 
b <- data.frame(alpha=3:7, beta=4:8) 
c <- data.frame(alpha=5:9, beta=6:10) 
z1 <- zoo(a, Sys.Date() - 1:5)
z2 <- zoo(b, Sys.Date() - 1:5)
z3 <- zoo(c, Sys.Date() - 1:5)

And then we can assign custom suffixes with a custom separator like this:

merge(z1, z2, z3, suffixes = c("", "_t_1", "_t_2"), sep = "")
##            alpha beta alpha_t_1 beta_t_1 alpha_t_2 beta_t_2
## 2022-08-17     5    6         7        8         9       10
## 2022-08-18     4    5         6        7         8        9
## 2022-08-19     3    4         5        6         7        8
## 2022-08-20     2    3         4        5         6        7
## 2022-08-21     1    2         3        4         5        6

In this application I agree it would be more convenient to just set the suffixes with no need to set the sep as well. But as Josh @joshuaulrich already pointed out, it is too late to change that now. I think the solution should be tolerable, though.

Let me know if you notice anything that doesn't work as intended and/or advertised.

@joshuaulrich joshuaulrich added this to the 0.12.2 milestone Oct 6, 2022
@joshuaulrich joshuaulrich added the enhancement Enhancement to existing feature label Oct 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement to existing feature
Projects
None yet
Development

No branches or pull requests

3 participants