Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merge.xts drops column on join with an xts with no rows #222

Closed
ethanbsmith opened this issue Jan 8, 2018 · 9 comments
Closed

merge.xts drops column on join with an xts with no rows #222

ethanbsmith opened this issue Jan 8, 2018 · 9 comments
Assignees
Labels
Milestone

Comments

@ethanbsmith
Copy link
Contributor

Description

a join should always produce the union of column form the joined tables. The various types of joins alter the rows produced. specifically, a left join should always output the number of rows in the left table, and the number of columns form both the left and right tables

merge.xts does not produce the correct output columns when one of the input xts object contains no rows

I tried to find the problem, but it looks like the bulk of the code is implemented in C and beyond my abilities. let me know if i can help

Expected behavior

Expected output is that the the columns from the empty table are included, with the value specified by the fill parameter

Minimal, reproducible example

x <- xts(x = matrix(1:9, 3, 3), order.by = as.Date(17167:17169))
colnames(x) <- c("a","b","c")
### this join works correctly
y <- xts(coredata(x), order.by = index(x)+5)
merge(x,y,join="left")

### these join on the empty table produce the correct rows, but do not include columns from y
y <- x[1 == 2]
merge(x, y, join = "left")
merge(y,x,  join = "right")
merge(y,x)
merge(x,y)

Session Info

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 16299)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] XML_3.98-1.9         jsonlite_1.4         curl_2.6             data.table_1.10.5    quantmod_0.4-12      TTR_0.23-2           xts_0.10-0           zoo_1.8-0            RODBC_1.3-15         doParallel_1.0.10   
[11] iterators_1.0.8      foreach_1.4.4        plotrix_3.6-6        RevoUtilsMath_10.0.0 RevoUtils_10.0.5     RevoMods_11.0.0      MicrosoftML_1.5.0    mrsdeploy_1.1.2      RevoScaleR_9.2.1     lattice_0.20-35     
[21] rpart_4.1-11         checkpoint_0.4.0    

loaded via a namespace (and not attached):
[1] codetools_0.2-15       CompatibilityAPI_1.1.0 grid_3.4.1             R6_2.2.0               tools_3.4.1            compiler_3.4.1         rtvs_1.0.0.0           mrupdate_1.0.1     
@ethanbsmith ethanbsmith changed the title merge.xts drops column on join with an xts with no wors merge.xts drops column on join with an xts with no rows Jan 8, 2018
@joshuaulrich
Copy link
Owner

joshuaulrich commented Mar 14, 2018

Thanks for the report! I would also note that the current behavior differs from merge.zoo(). Is zoo's behavior what you expect?

x <- zoo(x = matrix(1:9, 3, 3), order.by = as.Date(17167:17169))
colnames(x) <- c("a","b","c")
y <- zoo(coredata(x), order.by = index(x)+5)
z <- y[FALSE]

merge(x, z, all = c(TRUE, FALSE))
           a.x b.x c.x a.z b.z c.z
2017-01-01   1   4   7  NA  NA  NA
2017-01-02   2   5   8  NA  NA  NA
2017-01-03   3   6   9  NA  NA  NA

merge(z, x, all = c(FALSE, TRUE))
           a.z b.z c.z a.x b.x c.x
2017-01-01  NA  NA  NA   1   4   7
2017-01-02  NA  NA  NA   2   5   8
2017-01-03  NA  NA  NA   3   6   9

merge(z, x)
           a.z b.z c.z a.x b.x c.x
2017-01-01  NA  NA  NA   1   4   7
2017-01-02  NA  NA  NA   2   5   8
2017-01-03  NA  NA  NA   3   6   9

merge(x, z)
           a.x b.x c.x a.z b.z c.z
2017-01-01   1   4   7  NA  NA  NA
2017-01-02   2   5   8  NA  NA  NA
2017-01-03   3   6   9  NA  NA  NA

@ethanbsmith
Copy link
Contributor Author

yes, the zoo merge function produces the expected output shape (number of rows and columns with NA padding). however, i think xts uses different rules for handling duplicate column names.

@joshuaulrich joshuaulrich self-assigned this Oct 10, 2022
@joshuaulrich joshuaulrich added this to the 0.12.3 milestone Nov 3, 2022
@ethanbsmith
Copy link
Contributor Author

edge case test for all inputs having zero rows

x <- xts(x = matrix(1:9, 3, 3), order.by = as.Date(17167:17169))
colnames(x) <- c("a","b","c")
merge(x[F,], x[F,]) #expect [0,6] output
merge(x[F,], x[F,], x[F,]) #expect [0,9] output

seg faults

x <- xts(x = matrix(1:9, 3, 3), order.by = as.Date(17167:17169))
merge(x[F,], x, x[F,]) #faults if run multiple times

@joshuaulrich
Copy link
Owner

joshuaulrich commented Nov 4, 2022

Your first example matches the output from merge.zoo() (except merge.zoo() converts the result from integer to numeric).

x <- xts(matrix(1:9, 3, 3), as.Date(17167:17169), dimnames = list(NULL, c("a","b","c")))
(x0 <- x[FALSE,])
##  a b c

z <- as.zoo(x)
(z0 <- z[FALSE,])
##  a b c

merge(z0, z0)
## Data:
## numeric(0)
## 
## Index:
## Date of length 0

merge(x0, x0)
## Data:
## integer(0)
## 
## Index:
## Date of length 0

@jaryan, @ggrothendieck, @zeileis, what do you think about this edge case where we merge two zoo objects with columns but zero observations? The result is an empty zoo object, but Ethan expects an object with zero observations and 6 columns.

Merging this type of zoo object with at least one zoo object that has observations returns a zoo object with the same number of columns as the sum of the columns of the inputs. So I can understand Ethan's expectation.

Thoughts?


Your second example errors for me, but it doesn't crash the R session even under valgrind and gctorture(TRUE). What's your sessionInfo()?

EDIT: Of course, it crashed right after I posted this... never mind.

merge(x0, x0, x0) 
## Error in merge.xts(x0, x0, x0) : 
##   INTEGER() can only be applied to a 'integer', not a 'double'

sessionInfo()
## R version 4.2.1 (2022-06-23)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices datasets  utils     methods   base     
## 
## other attached packages:
## [1] tinytest_1.3.1 xts_0.12.2.2   zoo_1.8-11    
## 
## loaded via a namespace (and not attached):
## [1] compiler_4.2.1  parallel_4.2.1  tools_4.2.1     bspm_0.3.10     grid_4.2.1      lattice_0.20-45

@ethanbsmith
Copy link
Contributor Author

some color on my perspective:

  1. it conforms to relational algebra, which is how i think about R matrix merging/subsetting
  2. it requires less exception handling, as the empty output can still pass through subsequent subsetting/transform code. similar to the idea behind run* functions optionally return all NA if nrow(x) < n TTR#68

joshuaulrich added a commit that referenced this issue Nov 4, 2022
This was wrong for a couple edge cases. Merging two or more zero-length
objects with column names returned an empty object. That's consistent
with zoo, but it's inconsistent with set algebra.

This change keeps the number of columns for zero-length objects, so we
have to process column names. (I also made sure that the result's
index has the attributes from the first argument, 'x').

See #222.
@ethanbsmith
Copy link
Contributor Author

testing on a3e9fb2

x <- xts(matrix(1:9, 3, 3), as.Date(17167:17169), dimnames = list(NULL, c("a","b","c")))
merge(x, 1:3) #works fine
merge(x, x, 1:3) #errors

rest looks great!

joshuaulrich added a commit that referenced this issue Nov 5, 2022
Also check both the coredata and the index, not just the coredata.

See #222.
joshuaulrich added a commit that referenced this issue Nov 5, 2022
This would return 0 columns for zero-length xts objects that had dims,
which could cause a segfault.

See #222.
@joshuaulrich
Copy link
Owner

Thanks for the testing and feedback again! These are fixed now.

I added the examples in your previous comment to the unit tests, and I'll add these too. Can you share more of the tests you're running? It would be great if you could add them to inst/tinytest/test-merge.R.

@ethanbsmith
Copy link
Contributor Author

those issues all came up running through my analytics suite. will add tests if i think of anything else

current patch works for all my scenarios! Thank you!

@joshuaulrich
Copy link
Owner

Awesome news! Thanks for testing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants