last.default() isn't working properly for data.frames with indices as row names #226

DavisVaughan · 2018-03-01T23:48:52Z

Super quick reprex:

library(xts)
ex <- data.frame(x = c(1,2), row.names = c("2017-01-01", "2017-01-02"))
last(ex)
#> Error in `[.data.frame`(x, ((NROW(x) - n + 1):NROW(x))): undefined columns selected

Theoretically, if a data.frame has row names that correspond to indices it can be converted to xts with no issue. Because of this, this block of the if statement in xts:::last.default() is run:

xx <- try.xts(x, error = FALSE)
if (is.xts(xx)) {
    xx <- last.xts(x, n = n, keep = keep, ...)
    return(reclass(xx))
}

However, inside last.xts(x, ...) the row-wise subsetting of xts is used. This causes problems with data.frames as you can see in the above error.
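
The difference between the two subsetting forms can be reproduced in base R alone. This is a minimal sketch, not the xts source: on a data.frame, the single-index form selects columns, so asking for "row 2" of a one-column data.frame fails with the same message as in the reprex.

```r
# Base R only; xts is not needed to see the behaviour.
ex <- data.frame(x = c(1, 2), row.names = c("2017-01-01", "2017-01-02"))

# Single-index form: a data.frame is a list of columns, so 2 means "column 2".
# ex has only one column, so this raises an error; capture its message.
res <- tryCatch(ex[2], error = function(e) conditionMessage(e))

# Two-index form with an empty column slot selects rows.
# drop = FALSE keeps the result a data.frame even with a single column.
row2 <- ex[2, , drop = FALSE]
```

Here `res` holds the error message and `row2` is a one-row data.frame that still carries its date row name.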

Is there any reason you can't just pass xx to last.xts() rather than x? The reclass() should bring it back to a data.frame() in the end. There may be issues with other objects that can be converted to xts that I am not immediately seeing.

So maybe this?

xx <- try.xts(x, error = FALSE)
if (is.xts(xx)) {
    xxx <- last.xts(xx, n = n, keep = keep, ...)
    return(reclass(xxx))
}

DavisVaughan · 2018-03-01T23:56:04Z

FWIW, I forked and changed that bit and your unit tests all still pass. However, I don't know if anything was specifically testing for this.

Yay

library(xts)
ex <- data.frame(x = c(1,2), row.names = c("2017-01-01", "2017-01-02"))
ex_last <- last(ex, n = "day")
ex_last
#>            x
#> 2017-01-02 2

class(ex_last)
#> [1] "data.frame"

joshuaulrich · 2018-03-02T00:15:06Z

Thanks for the report! I don't believe there are any unit tests for this... so I'll investigate the code and change history to see if I can uncover any lurking dragons.

DavisVaughan · 2018-03-02T00:15:55Z

I appreciate it!

joshuaulrich · 2018-03-02T00:37:04Z

Notes to future me: This first appears in xts in commit 17ca5ea, which appears to be a direct move from quantmod somewhere close to commit joshuaulrich/quantmod@8f9c1bc. Code was finally deleted in joshuaulrich/quantmod@5b2197c.

joshuaulrich · 2018-03-06T12:15:39Z

first.default() is also broken, but in an insidious way:

ex <- data.frame(x = 1:9, y = 9:1, row.names = format(Sys.Date()-9:1))
first(ex)
#            x
# 2018-02-25 1
# 2018-02-26 2
# 2018-02-27 3
# 2018-02-28 4
# 2018-03-01 5
# 2018-03-02 6
# 2018-03-03 7
# 2018-03-04 8
# 2018-03-05 9
first(as.xts(ex))
#            x y
# 2018-02-25 1 9

DavisVaughan · 2018-03-06T23:06:30Z

Seems to be from xx <- x[1:n] in first.xts(). So the same rowwise subsetting is the culprit.

I guess the "easiest" fix is to convert all x[1:n] style calls in first.xts and last.xts to the explicit x[1:n,] style. This would work on xts and data.frames. Not sure if you guys get performance boosts from using the x[1:n] style call though?

Update: Speed looks about the same.

library(xts)
data("sample_matrix")
ex <- as.xts(sample_matrix)

idx <- 1:30

identical(ex["2007-01"], ex[idx])
#> [1] TRUE

microbenchmark::microbenchmark(
  ex["2007-01",],
  ex["2007-01"],
  ex[idx,],
  ex[idx],
  times = 1000
)
#> Unit: microseconds
#>             expr     min       lq      mean  median       uq       max neval
#>  ex["2007-01", ] 652.028 713.3365 833.68885 752.507 856.4175  6895.690  1000
#>    ex["2007-01"] 650.126 711.9830 900.25484 749.143 843.0755 74472.160  1000
#>        ex[idx, ]  14.795  19.0500  25.52245  24.656  27.2795   122.928  1000
#>          ex[idx]  14.997  18.8765  25.11938  24.213  26.7725   161.784  1000

joshuaulrich · 2018-03-07T13:08:59Z

> I guess the "easiest" fix is to convert all x[1:n] style calls in first.xts and last.xts to the explicit x[1:n,] style. This would work on xts and data.frames. Not sure if you guys get performance boosts from using the x[1:n] style call though?

While hypotheses about various fixes are potentially helpful, it's even more helpful to have unit tests that validate the intended functionality. For example, see my commits on the 226-first-last branch. The tests for ts came from failing examples after my first attempted fix idea. So even though that fix didn't work, the attempt did provide a new unit test case.

These tests show that first() and last() work as expected for zoo objects and vectors. The tests for data.frame and matrix objects are deactivated because they either error or fail. Note that first(x, -n) returns the same result as tail(x, -n), and last(x, -n) returns the same result as head(x, -n). See #226.

Make first() and last() keep dims when a regular row subset would normally drop them. This is consistent with head() and tail(). Re-activate all unit tests for first() and last(). See #226.
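
The head() and tail() behaviour that these commit messages refer to can be checked in base R. The sketch below only illustrates the reference semantics (kept dims and negative n); it does not call first() or last() themselves.

```r
# A one-column matrix: a plain row subset drops its dims, head() does not.
m <- matrix(1:6, ncol = 1, dimnames = list(NULL, "x"))
dropped <- m[1, ]      # dims are dropped, the result is a plain vector
kept <- head(m, 1)     # still a 1 x 1 matrix

# Negative n means "everything except n elements".
v <- 1:5
drop_first_two <- tail(v, -2)  # all but the first two: 3 4 5
drop_last_two <- head(v, -2)   # all but the last two: 1 2 3
```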

The fix for #226 introduced a regression for the case when 'x' is an xtsible vector and 'n' is a character period (e.g. "1 day"). This occurred because the default methods for first() and last() call the xts methods--on the *original* object--if 'x' can be coerced to xts. For example, a call to first(.Date(1:3), "1 day") would call first.xts(.Date(1:3), "1 day"). This was a problem because first.xts() and last.xts() assumed 'x' would always have dims. Fixes #303.
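
One way to picture the dims assumption described above is a helper that branches on whether the object has dims. `first_n` is a hypothetical name used only for illustration; it is not the code in the xts commits and it handles integer n only, not character periods.

```r
# Hypothetical helper, not the xts implementation.
first_n <- function(x, n = 1L) {
  idx <- seq_len(min(n, NROW(x)))
  if (is.null(dim(x))) {
    x[idx]                   # vectors (e.g. Date): single-index subset
  } else {
    x[idx, , drop = FALSE]   # matrix / data.frame: row subset, keep dims
  }
}

d <- as.Date(c("2018-03-01", "2018-03-02", "2018-03-03"))
ex <- data.frame(x = c(1, 2), row.names = c("2017-01-01", "2017-01-02"))
m <- matrix(1:6, ncol = 1)

first_dates <- first_n(d, 2)   # Date vector of length 2
first_row <- first_n(ex, 1)    # one-row data.frame
first_mat <- first_n(m, 1)     # 1 x 1 matrix
```

Using the two-index form unconditionally would fail on `d`, because a Date vector has no dims to index by row.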

joshuaulrich self-assigned this Mar 6, 2018

joshuaulrich added the bug label Mar 6, 2018

joshuaulrich closed this as completed in 5f6476b Jul 9, 2018

joshuaulrich added this to the Release 0.11-0 milestone Jul 30, 2018
