A framework for reproducible tables #69
Comments
I so hear this. I think the formatting aspect of summary tables in R is quite tedious and a barrier to winning people over from Excel for routine analyses. I took a crack at this with the janitor package, specifically creating tabulations and 2-way/crosstab/contingency tables and formatting them with percentages, rounding, etc. for quick publication. I have focused on simple counting and percentages, not any statistics, but maybe the formatting aspect could be leveraged? I'm rethinking the approach to janitor's tabulations and formatting, making the functions more modular and coherent and less a set of utilities. If this comes kind of close, maybe something could be built into janitor, or those functions or ideas could be extended. Or if it should be something separate, that's great too and I'd love to help ⛏ |
This is a great idea! Creating these tables is something I find so frustrating when writing a paper or report. It's totally one of those things where I've just gone:
Except it's almost never just this once, and it adds to reproducibility hell. Having a tool (or tools) that makes it easier to create these sorts of tables would for sure ease a pressure point in reproducibility.
|
I don't know if this might be of interest, but I met the guy who built this the other day. I was impressed with the level of docs: https://cran.r-project.org/web/packages/pivottabler/index.html |
@njtierney I'm digging |
Ah, good to know! It's great to gather all these resources together! Maybe we can work together on some examples of tables we have made for papers/reports, try all these different methods/pkgs out, and then work out what was great and what could be improved? |
Wow, it's nice to hear other people are having similar thoughts (well said @njtierney!). @sfirke and @haozhu233 - really appreciate the tools you've built and the fact that you've already spent so much time thinking about this problem. If any of these tools can be leveraged or extended, that would be amazing. It would be great to figure out a way to incorporate more of the statistical/modeling aspect of the analysis. More specifically, in the case of a table that contains many models/tests, a potential tool could pair nicely with the purrr workflow. I will mention the tangram package, which came onto my radar yesterday; I don't know much about it, but it seems to have a unique table-building model. |
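A rough sketch of how a "many models" table could be assembled, in case it helps make the idea concrete. It uses base R only (`split()`, `lapply()`, `lm()`) on the built-in `mtcars` data as a dependency-free stand-in for the purrr/broom workflow; the choice of model and the column names of `model_table` are purely illustrative assumptions, not anything from an existing package.

```r
# Fit the same model within each subgroup (here: one lm per cylinder count).
fits <- lapply(split(mtcars, mtcars$cyl), function(d) lm(mpg ~ wt, data = d))

# Turn each fit's coefficient matrix into a small data frame, tagged by group.
coef_rows <- lapply(names(fits), function(g) {
  est <- coef(summary(fits[[g]]))  # columns: Estimate, Std. Error, t value, Pr(>|t|)
  data.frame(
    cyl = g,
    term = rownames(est),
    estimate = est[, "Estimate"],
    std_error = est[, "Std. Error"],
    p_value = est[, "Pr(>|t|)"],
    row.names = NULL,
    stringsAsFactors = FALSE
  )
})

# Stack everything into one long, table-ready data frame.
model_table <- do.call(rbind, coef_rows)
print(model_table, digits = 3)
```

Each row is one coefficient from one subgroup model, which is the long shape a formatting package could then turn into a printed table.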
@njtierney Great idea! I think this type of "literature review" will be very useful for our community. After that we will have a better understanding of what we have right now and what exactly we need. I can imagine during the unconf, we can easily generate a blog post that @stefaniebutland would like to see. ;) |
So many interesting things to work on at the unconf! |
Agree with all that this is needed, and this thread helps summarize a lot. Just an idea: what about a gallery of tables with the code to produce them? Something similar to @haozhu233's example above, but for different typical table types. |
I feel like having a gallery thing @jhollist just mentioned will definitely be super helpful. We can also borrow some ideas from the design of |
These are great ideas! I agree the lit review and gallery concept will both be very helpful and great resources for the broader community. It would be nice to take stock of what tools are out there and what types of tables should be covered. @haozhu233 - yes!! to your idea of structuring like |
This is great, I actually mentioned something like this to @stefaniebutland in my talk with her. It's a huge issue in sociology because we make crosstabs a lot and they are really a pain overall in R especially with multiple variables. This is something I wrote to make crosstab making easier for my students (and me) https://github.com/elinw/lehmansociology/blob/master/R/crosstab.R but the print function is really painful. Even what should be simple frequency tables are hard in base R, this is what we came up with just to illustrate https://github.com/elinw/lehmansociology/blob/master/R/frequency.R. @sfirke I'm going to have a look at janitor! |
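For comparison, here is a minimal base R sketch of a two-way crosstab with counts and row percentages in each cell. It uses the built-in `mtcars` data as an assumed example and is not the code from the linked `crosstab.R`; the cell format `n (pct%)` is just one illustrative choice.

```r
# Two-way table of counts: cylinders (rows) by gears (columns).
counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# Row percentages: each row sums to 100.
row_pct <- prop.table(counts, margin = 1) * 100

# Combine counts and percentages into "n (pct%)" cells, keeping the table's shape.
cells <- sprintf("%d (%.1f%%)", as.vector(counts), as.vector(row_pct))
crosstab <- matrix(cells, nrow = nrow(counts), dimnames = dimnames(counts))

print(crosstab, quote = FALSE)
```

Even this small case needs three separate steps (count, percentage, format), which is roughly the friction being described for tables with more variables.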
Wow, if we are making a literature review then I think formattable should be in there. And of course tables. |
Someone gave a lightning talk at the Seattle useR meetup on this topic a few months ago. He showed a few examples, one of which was tableone |
Just saw desctable on my github timeline. It seems to be another good fit for this issue. |
Nice, lots of examples :) desctable is really interesting! It seems to focus more on ease of process and content than on styling (my impression is that some of these packages emphasize one or the other). I'll throw another one into the mix: arsenal, which gets more into stats and models. (@elinw this may be of interest to you for frequency tables...) |
There are also older-school table printing options, like |
Summary of this thread: There are lots of existing packages/functions for creating and/or formatting tables of various types. There seems to be a consensus that more work may be needed in this area, but we first need to understand all that is available right now. The great discussion in #78 could inform this process. From there, we can determine what is needed going forward. Potential ideas for the unconf, summarized from discussion above:
|
I will be following along with the unconf remotely (via Slack, issues, Twitter, etc.). I'll keep my eye on this, and if there is anything I can do remotely, I would be happy to do so. If it makes sense, we could chat via appear.in (I'm not on Skype).
I do really like this idea of "lit reviews" for packages. It feels like a more targeted/granular version of a task view, and I think it could be very useful. We've had fits and starts of a discussion on what to do with https://github.com/ropensci/maptools. It was intended to be a Task View, but we got some pushback due to the overlap with the Spatial Task View. Anyway, I think this general idea of targeted reviews could fill the void between "packages useful for a broad area" and "use package X to do Y".
And thanks for the interesting discussion!
On Wed, May 17, 2017 at 3:58 PM, Becca Krouse wrote:
Summary of this thread:
There are lots of existing packages/functions for creating and/or formatting tables of various types. There seems to be a consensus that more work may be needed in this area, but we first need to understand all that is available right now. The great discussion in #78 could inform this process. From there, we can determine what is needed going forward.
Potential ideas for the unconf, summarized from discussion above:
1. Perform "lit review" of existing packages
   - perform as a case study for #78
   - compare existing packages by trying them out on a set of common table types
   - create a gallery of tables with the code to produce them
   - create blog post
   - present results of lit review to benefit the community
   - reference WireCutter <http://thewirecutter.com/leaderboard/headphones/> for ideas
2. Are there improvements to be made? If so, planning the future of tables in R:
   - would be informed by the lit review
   - consider extending existing packages or creating a new one
   - borrow ideas from ggplot2
--
Jeff W. Hollister
|
In case it wasn't in your list I just saw this https://gdemin.github.io/expss/ |
I was digging into the huxtable docs and found a vignette which compares the features of many table making packages: https://cran.r-project.org/web/packages/huxtable/vignettes/design-principles.html |
I work for the Federal Reserve Board (FRB). My duties include reading data from various sources (including PDF, Excel, and XML), processing these data, and recompiling them to produce tables and charts in LaTeX and FAME for publication purposes. I am currently searching for a similar tool (or tools) in R to replicate these processes, including creating tables. Very interested in this topic. |
This issue reminds me a little of this silly joke flowchart I made. But perhaps a (slightly) more serious flowchart would be helpful to people? I also wonder if it would be possible to create some kind of DSL for making tables that works with the pipe operator? Similar to @haozhu233's suggestion to use something like a |
A grammar of tables with modular piping functions, à la ggplot2, would be wonderful. I have been stumbling toward something similar, though in a limited use case (simple one-way and two-way tabulations) - so far I have (on a dev branch):
But this is hardly a grammar - just a vote of enthusiasm for going in that direction 😀 |
|
@gshotwell I like that idea a lot, in a strange way it's like |
"Combining the two issues, we set out to create a guide that could help users navigate package selection, using the case of reproducible tables as a case study." Repo: https://github.com/ropenscilabs/packagemetrics |
In my work (clinical research), we make a lot of tables, usually comparing 2 or more groups. It's nice to format the table programmatically so that it is reproducible and ready for publication. The process to do so usually looks something like this:
With tidy tools like dplyr, broom, and purrr, it is easier than ever before to create the self-contained data frame. However, getting all the necessary pieces and working the data frame into a table-ready format is a process that seems to be recreated from scratch each time. It would be great to have a tool that helps to automate this process a bit. Here are some vague thoughts on what this could look like:
Does anyone have any interest or thoughts about this topic? Are there any tools already out there that help with this? If not an unconf project, would love a related discussion about people’s workflows!
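To make the kind of workflow described above a bit more concrete, here is a minimal sketch of a two-group comparison table built as a plain data frame. It is written in base R rather than dplyr/broom/purrr to stay dependency-free, uses the built-in `mtcars` data as an assumed example, and `compare_groups()` is a hypothetical helper invented for illustration, not a function from any existing package.

```r
# Hypothetical helper (not from any package): one row per variable with
# "mean (SD)" for each of two groups and a Welch t-test p-value.
compare_groups <- function(data, vars, group) {
  rows <- lapply(vars, function(v) {
    parts <- split(data[[v]], data[[group]])
    stopifnot(length(parts) == 2)  # this sketch only handles two groups

    # Format each group's summary as a publication-style "mean (SD)" string.
    cells <- vapply(parts, function(x) sprintf("%.1f (%.1f)", mean(x), sd(x)),
                    character(1))
    names(cells) <- paste0(group, " = ", names(parts))

    # t.test() with two vectors defaults to the Welch (unequal variance) test.
    p <- t.test(parts[[1]], parts[[2]])$p.value

    data.frame(variable = v, t(cells),
               p_value = format.pval(p, digits = 2, eps = 0.001),
               check.names = FALSE, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Compare automatic (am = 0) and manual (am = 1) cars on three variables.
table_df <- compare_groups(mtcars, c("mpg", "hp", "wt"), "am")
print(table_df, row.names = FALSE)
```

The result is already "table-ready" in the sense that every cell is a formatted string, so it could be handed to a printing or styling package as-is; the part that tends to get rewritten each time is the body of the helper.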