I almost wish Hadley had forked R to make the tidyverse. What I usually see are people that start using tidy functions and coding style, but at some point they realize they don’t know how to do something the tidy way or something hasn’t been implemented in a tidy package yet, so they fall back to base R.
Imho, transitioning from tidy to base R makes your code less readable than just using base R throughout.
If the tidyverse were forked and base R functions weren’t available then people would be forced to come up with a different solution and maybe they would stay committed to being tidy. I realize that probably won’t ever happen; there is too much work to reimplement all the missing base R functions.
There's more to the R ecosystem than tidyverse packages. There's a whole suite of absolutely amazing R packages in the Bioconductor ecosystem that rival tidyverse in speed and ease of use but target other data structures.
Some of the tidyverse packages are overkill and contain lots of footguns.
I've seen code that was clean get butchered because someone had no idea how to do something basic in base R.
There's also another separate ecosystem for doing stats with their own flavors.
I worked in a job a decade back where I was the only tech guy and had a special 128 GB RAM machine. All the 'Big data' work for the team was done by me using R tidyverse, data.table and a few libraries, and they thought of it as magic as there were few tech people there.
I still feel a lot of enterprises and industries overlooked its capabilities back then.
With LLMs the challenge of R syntax is a little easier for data analysts to climb, especially the new ones.
As a daily R user, hard disagree. With the exception of ggplot (and this is directly related to why I don't use ggplot and instead use base plotting), most of tidyverse is pretty similar to and consistent with most base R functions.
Tidyverse standalone would be borderline useless, as most of what it's best at is manipulating, transforming, and re-arranging your data in various ways. You still need to _do_ something with your data at the end, at which point, the entire rest of the R ecosystem comes into play.
Tidyverse is valuable specifically because it's the best at doing what it does, and what it does makes everything else easier, more legible, and faster.
Forking it would simultaneously make both R and tidyverse worse off.
> most of tidyverse is pretty similar to and consistent with most base R functions
What? The main tidyverse packages are popular because they are different from base R. If the packages duplicated base R functionality and usage was the same then nobody would use them.
> You still need to _do_ something with your data at the end, at which point, the entire rest of the R ecosystem comes into play.
This is exactly my point. You could use tidymodels or any number of packages to keep your code tidy, but people just bail after wrangling their data a little and then their code is disconnected. You might as well have done all your data cleaning with base R if you were going to fit a model outside the tidyverse anyway.
What I meant was that they are syntactically similar. They work the way that default base R functions work. They _look_ like base R functions. They aren't the same as base R functions. They fit smoothly into base R, often filling holes that base R has. One can (and I do) use base R and tidyverse functions with each other all the time.
This is as opposed to ggplot. Which legitimately seems like a completely different language. It looks, reads, and acts differently than base R plotting. It sticks out like a sore thumb, and, in my opinion, does not have enough functionality to justify the departure from standard R conventions. Which is why I don't use it.
As to restating your point: Your original comment combined with what you have said here makes me completely confused. The fact that people don't "stay" in the tidyverse is evidence that it is well integrated and _shouldn't_ be forked. You can use it for what it's good for, and then go use other things that are better at what they are doing.
If people regularly did the entire pipeline of import > data manipulation > data analysis and never left the tidyverse, then you would have an argument that it should be forked.
The fact that people don't do this is evidence that it belongs how it is: a package.
I don't really understand your comment about "disconnected". My code doesn't feel disconnected other than that different portions of it are doing different things. But then again, I also think that tidyverse functions don't look that different from base R functions (which, again, is not the same thing as being the same as already existent R functions).
There's a school of thought of using mostly base R, for all the flaws it already had before Hadley, and selectively using some tidyverse packages. Base R has been the de facto coding standard for academic statisticians for decades, with all the wealth of open source packages that that entails, and some of the tidyverse packages are just a godsend. ggplot2 is probably the most powerful plotting library I've seen, while being fairly accessible. You don't have to subscribe to an entire philosophy for data wrangling or plotting (and may even frown at the syntax overloading) to get a huge amount of utility out of it.
This. They're basically two languages sitting on top of each other. It's fascinating seeing students who have been taught using the tidyverse try to switch gears.
As someone that's been using R for 20 years, I don't necessarily wish that had happened, but I think the trend to teach intro R using the tidyverse is a bad development. People that use the tidyverse don't realize that it's complex. There are no doubt complex and frustrating parts of base R. For the most basic things, base R is natural. The tidyverse has you piping and using advanced concepts from the start.
From what I saw in the latest "language" surveys or whatever, R does seem to be making a slight comeback. I was actually surprised at its place above Ruby, iirc. Again, not that these surveys are the end-all-be-all, but I've also started to see a lot more data science postings that have R or Python be a requirement, where I feel like for a few years it was ALL Python.