HTTP Error 401 when called on large amount of tickers #360

rpfreitasxyz · 2022-04-30T00:33:52Z

Hi, thank you very much for the fix and comments here. getSymbols.yahoo() now works for me, but I have a different problem. When I run getSymbols.yahoo() successfully in a loop for more than about 300-400 tickers, I start to get "HTTP error 401" for all subsequent downloads. There are some failed downloads of invalid tickers in between, though. Does anyone know what the issue is? Maybe I have the same problem as @rhamo.

Here is an example of the subsequent download:

> getSymbols.yahoo("AAPL",from="2022-04-27",to="2022-04-29",auto.assign=FALSE)
Warning: AAPL download failed; trying again.
Warning: Unable to import "AAPL".
AAPL download failed after two attempts. Error message:
HTTP error 401.
[1] "Error in open.connection(file, \"rt\") : HTTP error 401.\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in open.connection(file, "rt"): HTTP error 401.>

Thank you!

Originally posted by @edwinhung in #358 (comment)

Thank you all for the quick fix in regards to tq_get()! However, I believe that this new issue has arisen.


Knxd3 · 2022-04-30T11:39:51Z

Is a fix planned for tq_get? getSymbols works with many tickers because of the 1 second pause. Thanks.
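For readers who want the same behaviour in their own loop, here is a minimal sketch of sequential downloads with a pause between calls. The helper name `fetch_with_pause` and its `fetch` and `pause` parameters are hypothetical and not part of quantmod or tidyquant; the download function is passed in so the pattern can be tried without network access.

```r
# Hypothetical helper: call `fetch` once per ticker, sleeping between calls.
# Failures are kept as condition objects so one bad ticker does not stop the loop.
fetch_with_pause <- function(tickers, fetch, pause = 1) {
  out <- lapply(seq_along(tickers), function(i) {
    if (i > 1) Sys.sleep(pause)  # wait before every call except the first
    tryCatch(fetch(tickers[[i]]), error = function(e) e)
  })
  stats::setNames(out, tickers)
}

# Possible usage with quantmod (not run here, needs network access):
# prices <- fetch_with_pause(
#   c("AAPL", "MSFT"),
#   function(s) quantmod::getSymbols(s, auto.assign = FALSE)
# )
# failed <- names(Filter(function(x) inherits(x, "error"), prices))
```

Whether a one-second pause is enough to stay under Yahoo's limit is not established in this thread; treat the value as something to tune.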

msperlin · 2022-05-01T12:20:03Z

The limits do not seem to be too restrictive. After hitting the 401 error, I was able to make successful API calls again within a few minutes.

After that, I downloaded all sp500 stocks (2010-today) in a single call:
df_sp500 <- yfR::yf_collection_get("SP500", first_date = '2010-01-01')

If anyone can confirm this in your own R session, please do.

It looks as if the restrictions are based on the time between API calls from the same IP. This could rule out any parallel computation, which is what I'm testing now.

msperlin · 2022-05-01T13:01:03Z

As expected, any parallel use of quantmod::getSymbols() reaches the limit very easily. As such, I'm removing the parallel option from yfR and BatchGetSymbols.

When using a single session (non-parallel), yfR runs fine for any large sample of stocks.

msperlin · 2022-05-01T13:01:41Z

If you can, please confirm if the code below runs fine:

remotes::install_github("msperlin/yfR")

df_sp500 <- yfR::yf_collection_get("SP500", first_date = '2010-01-01')

rpfreitasxyz · 2022-05-01T13:49:52Z

> If you can, please confirm if the code below runs fine:
> remotes::install_github("msperlin/yfR")
>
> df_sp500 <- yfR::yf_collection_get("SP500", first_date = '2010-01-01')

The code runs fine, with a caveat:

The issue really seems to be the number of API calls. Even though they're not in parallel, they are still 500+ sequential calls to the Yahoo! Finance API, so it is quite inconsistent whether or not the whole data frame gets downloaded. I believe this has already been worked around with your implementation of a cache system.

Thus, if it hasn't been downloaded completely at once, I suggest users wait for a few minutes, then run the code again, until completion.
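The "wait a few minutes, then run again" advice can be sketched as a retry loop that only re-requests the tickers that failed. This is an illustration only: `fetch_until_complete`, `wait`, and `max_rounds` are hypothetical names, the five-minute default is a guess rather than a documented limit, and it does not use yfR's cache.

```r
# Hypothetical helper: retry failed tickers in rounds, waiting between rounds.
# A NULL result or an error from `fetch` counts as a failure.
fetch_until_complete <- function(tickers, fetch, wait = 300, max_rounds = 3) {
  done <- list()
  pending <- tickers
  for (round in seq_len(max_rounds)) {
    failed <- character(0)
    for (sym in pending) {
      res <- tryCatch(fetch(sym), error = function(e) NULL)
      if (is.null(res)) {
        failed <- c(failed, sym)   # try again in the next round
      } else {
        done[[sym]] <- res         # keep successful downloads
      }
    }
    pending <- failed
    if (length(pending) == 0) break
    if (round < max_rounds) Sys.sleep(wait)  # give the rate limit time to reset
  }
  list(data = done, failed = pending)
}
```

The returned `failed` vector lists tickers that never succeeded, which also covers invalid symbols that will fail no matter how long you wait.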

Either way, thank you, @msperlin!

msperlin · 2022-05-02T11:10:31Z

Yes, the lack of consistency between equivalent calls to BatchGetSymbols/yf_get can be troublesome. I'll see how I can control for this, at least letting the user know about the 401 error.

There seems to be a rate limit for the number of tickers you can request via the CSV endpoint. The yfinance python library [1] uses the JSON endpoint and doesn't seem to have rate limit issues.

[1] https://github.com/ranaroussi/yfinance

Closes #362. See #360.

joshuaulrich · 2022-05-22T18:44:30Z

I just added an option to use the JSON endpoint instead of the CSV endpoint. Can you try that and see if you still get the 401 responses? You can install the patch via: remotes::install_github("joshuaulrich/quantmod@362-yahoo-json-endpoint"). Then call quantmod::getSymbols("SPY", use.json.api = TRUE)

msperlin · 2022-05-22T18:52:29Z

Sure, let me try..

msperlin · 2022-05-22T19:12:07Z

> I just added an option to use the JSON endpoint instead of the CSV endpoint. Can you try that and see if you still get the 401 responses? You can install the patch via: remotes::install_github("joshuaulrich/quantmod@362-yahoo-json-endpoint"). Then call quantmod::getSymbols("SPY", use.json.api = TRUE)

I changed the call to getSymbols and tried my best to trigger the 401 response, with no success. I ran yfR with parallel execution (14 cores) and it worked as expected.

msperlin · 2022-05-22T19:12:17Z

looks good..

msperlin · 2022-05-22T20:03:33Z

anyone can test it here:

remotes::install_github("msperlin/yfR@testing-json-entry")

library(yfR)

n_workers <- parallel::detectCores() - 1
future::plan(future::multisession, workers = n_workers)
available_collections <- yf_get_available_collections()

df <- yf_collection_get(collection = "SP500",
                          first_date = Sys.Date() - 10*365,
                          last_date = Sys.Date(),
                          do_parallel = TRUE)

dplyr::n_distinct(df$ticker)

@joshuaulrich please let me know if and when you're incorporating these changes.. I'll wait for your update in CRAN.

thanks.

joshuaulrich · 2022-05-22T20:10:11Z

I'm considering whether or not to make the JSON endpoint the default for getSymbols.yahoo(). It seems like it's a better endpoint, but I don't have any experience with it. @msperlin, what do you think about switching from the CSV endpoint to the JSON endpoint for the default?

msperlin · 2022-05-22T20:18:59Z

Being honest, I'm not sure. I'll have to think about that. Quality-wise, I suspect the YF data comes from the same source and, whether it is JSON or CSV, the output should be the same.

But the CSV endpoint is restricted by IP, which forces users to behave better, which is good. The restriction is also not that bad (I can still download everything I need for my classes, for example). While I would prefer to allow parallel computing with yfR, I also know that we should be thankful to YF for still keeping the API open...

what do you think?

ethanbsmith · 2022-05-22T20:25:05Z

some thoughts:

since you've done the bulk of the work of moving to the v8 api, may as well loosen the validation on period to support intra-day and kill #351

also, why not just remove the v7 code path entirely? in principle, i think supporting code to work around throttling (if that's what yahoo is doing) is not really a worthwhile battle. moving to a higher rev makes sense to me, but if yahoo is really throttling and serious about it, it's going to come up again sooner or later. i'd just see this as another notch in the growing list of issues w/ yahoo data in general

joshuaulrich · 2022-05-23T14:48:19Z

> I suspect the YF data comes from the same source and, whether it is json or csv, the output should be the same.

I would hope so, but I wouldn't be surprised if there are some differences... because data is awful. ;)

> Also, why not just remove the v7 code path entirely.

I'm thinking the same thing. Thanks for mentioning the intra-day issue. That's a great point.

The v7 endpoint seems to be rate-limited, and the v8 endpoint includes intra-day data. See #360. See #362.

### Changes in 0.4.22 (2023-04-05)

1. Move jsonlite from Suggests to Imports so it doesn't cause a problem when a package that doesn't also Suggest jsonlite uses getSymbols(). Thanks to Kurt Hornik for the report and fix! [#380](joshuaulrich/quantmod#380)

### Changes in 0.4.21 (2023-03-29)

1. Fix S3 method issues. R-devel (83995-ish) added a check for possible S3 method issues. Register methods it found that were not registered: `str.replot()`, `seriesHi.timeSeries()`, and `seriesLo.timeSeries()`. It was also confused by `range.bars()` and `unique.formula.names()`. Remove `unique.formula.names()` because it wasn't exported or used internally. Rename `range.bars()` to `rangeBars()`, which isn't exported. Thanks to Kurt Hornik for the report! [#375](joshuaulrich/quantmod#375)

1. Remove "^" prefix from `getSymbols()` return value. When the 'Symbols' argument has a "^" prefix and `auto.assign = TRUE`:

   * `getSymbols()` removes the "^" from the object it creates, but
   * returns the 'Symbols' argument unchanged, and
   * removes the "^" from the column names of the object it creates.

   The example below will create an object named `IXIC` but the value of `sym` will be "^IXIC".

        sym <- getSymbols("^IXIC")

   That means `x <- get(sym)` will not work because an object named `^IXIC` doesn't exist. [#371](joshuaulrich/quantmod#371)

1. Add 'from' and 'to' arguments to `getSymbols.FRED()`. Users expect to be able to set the 'from' and 'to' arguments for FRED data like they can for Yahoo data. Those values were ignored and the entire series was always returned. [#368](joshuaulrich/quantmod#368)

1. Change interval to 1d for `getDividends()` and `getSplits()`. The "3mo" setting caused some dividends to be missing for companies that issued monthly dividends. Note that the response to this request also includes all the OHLCV data. But it's small (less than 1MB for 60+ years of daily data). [#372](joshuaulrich/quantmod#372)

1. Handle errors in `getSplits()` and `getDividends()`. `getDividends()` didn't handle cases where the download failed, or when dividends needed to be split-adjusted but there were no splits. It also tried to set colnames on the empty xts object that's returned when there are no dividends. `getSplits()` had the same colnames issue. Check for no splits by testing for `NULL` because that's more explicit. Thanks to Chris Cheung for the report! [#366](joshuaulrich/quantmod#366)

1. Export `HL()`, `is.HL()`, and `has.HL()` functions and add documentation. These were added in 0.4.18 but not exported or included in the documentation.

1. Use Yahoo Finance v8 JSON endpoint and remove the v7 CSV endpoint. There seems to be a rate limit for the number of tickers you can request via the CSV endpoint. The [yfinance python library](https://github.com/ranaroussi/yfinance) uses the JSON endpoint and doesn't seem to have rate limit issues. [#360](joshuaulrich/quantmod#360) [#362](joshuaulrich/quantmod#362) [#364](joshuaulrich/quantmod#364)
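On the "^" prefix item above: for quantmod versions before 0.4.21, where the returned name still carries the caret, one workaround is to strip it yourself before calling `get()`. The sketch below assumes that behaviour as described in the changelog; `symbol_object_name` is a hypothetical helper, not a quantmod function.

```r
# Hypothetical helper: drop a single leading "^" from index symbols so the
# result matches the object name that getSymbols() creates.
symbol_object_name <- function(symbols) {
  sub("^\\^", "", symbols)
}

# Possible usage (not run here, needs network access):
# sym <- quantmod::getSymbols("^IXIC")
# x <- get(symbol_object_name(sym))
```

Symbols without a caret pass through unchanged, so the helper is safe to apply to a mixed vector of tickers.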

joshuaulrich mentioned this issue Apr 30, 2022

getSymbols.yahoo() always throws "Error in new.session()" #358

Closed

msperlin mentioned this issue May 1, 2022

yf api limits? ropensci/yfR#5

Closed

rpfreitasxyz closed this as completed May 1, 2022

joshuaulrich mentioned this issue May 22, 2022

Can only download data from Yahoo Finance for 100 symbols in one session #363

Closed

joshuaulrich mentioned this issue May 22, 2022

Add ability for getSymbols.yahoo() to import from JSON endpoint #362

Closed

msperlin mentioned this issue May 25, 2022

Error in download ropensci/yfR#8

Closed

joshuaulrich added a commit that referenced this issue May 29, 2022

Remove Yahoo Finance v7 endpoint

e5fdd6c

The v7 endpoint seems to be rate-limited, and the v8 endpoint includes intra-day data. See #360. See #362.

joshuaulrich added this to the Release 0.4.21 milestone Mar 27, 2023
