
 
5 hours later…
 
1 hour later…
7:11 AM
Good morning @all
 
Hello
 
@Queen 4cv
 
zx8754 in r scanned 1011 questions between Jun 16 09:05 and Nov 22 08:28 filtered and ordered: 20 in batch 35
 
@Queen Done
 
zx8754 Thank you for your effort, you reviewed 20 questions, I counted 17 (85%) close votes and 13 questions closed
 
hello hello :-)
 
8:01 AM
library(dplyr)  # old dplyr (< 0.7), whose SE verbs rely on lazyeval

licz <- function(dat, trait) {
  dat %>%
    group_by_(~type) %>%
    summarise_(sr = lazyeval::interp(~mean(x, na.rm = TRUE), x = as.name(trait)))
}

licz(df, 'D10')
 
@Axeman phew, that looks easy :)
please post as a comment, OP must be dying too.
Did you have to look up the manuals, or ?
 
I really like the simplicity of dplyr NSE, but the SE variants can be painful (this is relatively ok).
No, I've done this many times
 
good morning everyone
 
Once you realize you should build the call with interp it's quite ok. It becomes more challenging when you want your function to also use NSE
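(A minimal sketch of the pattern Axeman describes, using the old dplyr (< 0.7) SE verbs plus lazyeval; licz2 is a hypothetical name and the type column is carried over from the example above, not code from the thread:)

library(dplyr)
library(lazyeval)

# like licz() above, but also callable with a bare column name (NSE)
licz2 <- function(dat, trait) {
  trait <- as.character(substitute(trait))  # captures D10 and "D10" alike
  dat %>%
    group_by_(~type) %>%
    summarise_(sr = interp(~mean(x, na.rm = TRUE), x = as.name(trait)))
}

# licz2(df, D10) and licz2(df, 'D10') then give the same result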
 
@Axeman if you can improve the question then this post can stay open and deserves an answer?
 
8:09 AM
@zx8754 I've answered a very similar question before, I can look it up for a dupe
found it
 
Heh, we can go through all NSE posts and link them all to each other as dupes
 
morning peeps
 
9:28 AM
Hello
 
10:32 AM
hello
 
hi
 
For crying out loud... close
 
Just killed a meta question :p
 
11:32 AM
Please reopen: stackoverflow.com/q/40735175/1412059 OP has improved their question.
 
12:13 PM
@Roland open
 
 
1 hour later…
1:21 PM
@DavidArenburg so that's what NA^... does
 
NA^ part is awesome.
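(For anyone puzzled by the trick: NA^0 is defined as 1 in R, while NA^1 is NA, so NA^cond maps a logical vector to 1/NA without an ifelse(). A tiny illustration, not code from the thread:)

cond <- c(TRUE, FALSE, TRUE)
NA^cond   # NA  1 NA  -- NA^TRUE is NA, NA^FALSE (i.e. NA^0) is 1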
 
2:58 PM
@Cath Ok, removed that part. Now remove your comments and go away — akrun 1 min ago
flagged as rude
 
@RonakShah lol that's nice of you (so you're the upvote on my first comment :-) ) it is so annoying to know he was waiting to understand what OP wanted, then he just sees my answer and finds in 2 sec another way of doing the same thing, because this is what he's good at
@RonakShah your flag probably returned helpful :-)
 
I kind of want to rollback his edits to leave the obvious copying/pasting part...
 
@Cath yes, he obviously is good at it...Try doing the same with him and you'll see how furious he becomes.
btw I am also the upvote on your answer ;-)
 
@RonakShah that's for sure !
@RonakShah thanks :-) (I'm like 2 points from my own hammer now I think ! :-) )
 
3:06 PM
@Cath that would be one big achievement...Would definitely be helpful to avoid answers on duplicate questions like these
Q: R data frame subsetting based on a column value frequency threshold (score: 0)
danbret: I am a new R user and this is my first question submission (hopefully in compliance with the protocol). I have a data frame with two columns. df <- data.frame(v1 = c("A", "A", "B", "B", "B", "B", "C", "D", "D", "E" )) dfc <- df %>% count(v1) df$n <- with(dfc, n[match(df$v1,v1)]) v1 n 1 ...
 
yes, that's a "generic" Q/A... I think he just cannot help it... (still answering dupes)
he would be madder once I join the hammer team :-)
 
Well, I think we should try to unfocus from him, just let him be and act however he pleases. Just flag inappropriate content (comments) and do the usual close/dupe when needed. And ignore him if he calls after you for a closure.
I feel like stepping into David's shoes
 
@Tensibai hey @Tens :-)
 
Hello @Cath :)
 
yep, problem is, when akrun asks OP for desired output, then I answer, and now that he knows what needs to be computed, he posts a different answer, but with a part copied/pasted, it just annoys me so much... argh... ;-)
 
3:15 PM
I saw that, just ignore him, that won't change the face of the internet
and even less of the world
 
lol indeed and it will make me live less long :-/
 
Just don't give him what he wants...
Attention/Upvotes :-P
 
Wooohoo, I should not have used 1e7 as the source for this microbenchmark...
 
hmm comments were cleaned, a mod must have been passing by
 
Unit: milliseconds
               expr        min         lq       mean     median         uq        max neval    cld
          akrun(df)   487.2707   510.8357   546.1012   521.3872   537.4420   700.1267    10 a
 GGrothendieck4(df)   561.5427   576.1987   604.9359   589.0564   609.5453   696.4785    10 ab
 GGrothendieck3(df)   574.0275   584.4956   636.6241   612.6072   695.6640   729.3624    10 ab
 GGrothendieck2(df)   660.0946   712.2549   731.0311   741.4083   758.0153   759.0544    10  b
 NineHeightNine(df)  1204.6817  1223.2428  1253.4177  1250.8204  1277.2377  1340.0811    10   c
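(Roughly how such a table is produced: each candidate solution is wrapped as a function of df — their bodies aren't shown in the thread — and passed to microbenchmark:)

library(microbenchmark)
microbenchmark(
  akrun(df),
  GGrothendieck4(df),
  GGrothendieck3(df),
  GGrothendieck2(df),
  NineHeightNine(df),
  times = 10
)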
 
3:18 PM
who is nineightnine ?
 
989 is his/her nickname; as that wouldn't work as a function name, I spelled out the numbers
From the NA^ above
 
another new answer ? I don't think I saw this one
hmm actually I did :-/
@Tens nice benchmarks, have you tried with more columns ?
 
Nope
 
Pardon for the language but I feel like I can't work anymore today
so what's up peeps
 
@Tensibai I think that would be interesting, with a varying proportion of NAs (something like the sketch below)
@DavidArenburg hey Dave :-)
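(One way to build such test data — a hedged sketch; make_na_df is a made-up helper name, not something from the thread:)

# nrow x ncol numeric data frame with a given share of NAs
make_na_df <- function(nrow, ncol, na_prop = 0.1) {
  m <- matrix(runif(nrow * ncol), ncol = ncol)
  m[sample(length(m), round(na_prop * length(m)))] <- NA
  as.data.frame(m)
}

df <- make_na_df(1e4, 26, na_prop = 0.25)  # e.g. 25% NAs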
 
3:22 PM
@Cath hey
 
3 answers already
 
@RonakShah already did :-/
 
Wow: alexis_laz's version with complete.cases beat everything
 
@Tensibai but he didn't post...
 
He did comment under G. Grothendieck's post
 
3:25 PM
@Tensibai yep I saw that but the answer wasn't added (or was it ?)
 
yeah, his versions are always the best
 
@Tensibai you didn't put the benchmarks for complete.cases did you ?
 
though many of these answers don't reach the desired output exactly
 
Just added to the benchmark (on 1e6 rows, too lazy to rerun on 1e7)
@DavidArenburg I did think about using as.integer to get a numeric output but, even with it, it won't match the exact desired output indeed
 
@Tensibai as you have all functions etc, you mind checking with like 10^5 columns ?
 
3:28 PM
I actually hadn't expected this answer/question to draw so much attention. I posted it in comments because I thought it was just a dupe
 
@Tensibai nvm I can have it too ;-p
 
@Tensibai If that top_n answer got an upvote I really think it's unfair nurka got two downvotes
 
@DavidArenburg ???
 
@Tensibai I'm just agreeing with you...
 
Oh, I didn't check which message you were referring to
 
3:34 PM
@DavidArenburg it would probably have had a DV too if nurka's downvoter had seen it, but the other answers were posted some time after nurka's
 
@Cath on 26 columns of 1e6 rows:
Unit: milliseconds
               expr        min         lq       mean     median         uq        max neval     cld
     alexis_laz(df)   90.93923   91.91509   93.42776   92.75552   95.31161   97.16838    10 a
 GGrothendieck2(df)  248.88182  260.67500  316.89156  318.07621  365.61113  391.87275    10  b
 GGrothendieck3(df)  470.91760  484.56498  504.83493  491.27278  520.24358  608.75102    10   c
 GGrothendieck4(df)  484.63140  498.03247  519.47379  505.21580  526.98018  630.48404    10   c
 
now compare outputs
 
@Tensibai hmm I bet things would change a bit more with like 10^3 rows and 10^4 columns
c(NA, 1)[(complete.cases(df))+1] works I think. Why did alexis negate the complete.cases ?
 
or NA^!complete.cases(df)
I'm on fire with the NA abuse today
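(The two constructions give the same per-row flag; a tiny check, not code from the thread:)

df <- data.frame(a = c(1, NA, 3), b = c(1, 2, NA))
cc <- complete.cases(df)   # TRUE FALSE FALSE
c(NA, 1)[cc + 1]           # 1 NA NA -- complete rows index element 2
NA^!cc                     # 1 NA NA -- same values, no indexing needed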
 
@DavidArenburg probably more time-greedy, no ?
 
3:40 PM
@Cath time-greedy?
maybe because of the !, don't know
Elections end in 4 hours btw
so stay tuned
 
@DavidArenburg yeah I have no idea what I'm talking about (I just thought your sol would be the fastest and as it's not, I just guessed ^ must take some time...)
 
@erasmortg Hi, mate. Good to see you
 
@DavidArenburg yep, voted for Bhargav, ArtOf and Andy...
 
@Cath my solution has both matrix conversion and unnecessary rowSums
 
@Cath Any hint to produce such a DF in a few characters?
 
3:43 PM
As I said, I thought it was going to be closed as a dupe within seconds so didn't put much thinking into it.
 
@Tensibai I did:
df <- data.frame(matrix(round(runif(1000*1000, 1, 100)), ncol=1000))
df <- as.data.frame(apply(df, 2, function(x) {x[sample(1:1000, 10, replace=FALSE)] <- NA; x}))
but it's quite lame...
and 10 NAs per column is terrible (I put 1000x1000 because I have just 4GB of RAM on my "regular" pc)
 
df <- as.data.frame(matrix(sample(1:5, 1e7, replace = TRUE), ncol = 1e4)) ?
Or am I missing something
ok, going home, cya lads
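(David's one-liner contains no NAs; if the benchmark data should have some, a hedged variant is to sample NA in with the values:)

df <- as.data.frame(matrix(sample(c(1:5, NA), 1e7, replace = TRUE), ncol = 1e4))
# roughly 1 cell in 6 becomes NA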
 
hi all, bye David
what's this thing y'all are benchmarking on?
 
@DavidArenburg nice evening :-)
I get an error for do.call(pmax...) (and pmin... as well)
 
3:48 PM
ah ok thanks
yeesh, that's a lot of benchmarkables in Tens' answer
 
hi @Frank
 
hiya Cath
 
Just killed my rstudio and Chrome at the same time running:
 
@Tens, I get:
 
@Frank you came late...you can join them now and show us some other approach :P
 
3:51 PM
Maybe I should turn this answer community wiki
 
@user2100721 heh, i think complete.cases is fast enough :)
fyi, similar q from yesterday:
Q: fastest way to count the number of rows in a data frame that has at least one NA (score: -1)
ftxx: When you have the data set, usually you want to see what is the fraction of rows that has at least one NA (or missing value) in the data set. In R, what I did is the following: TR = apply(my_data,1,anyNA) sum(TR)/length(TR) But I found that if my data set has 1 million rows, it takes some tim...
 
             expr         min          lq        mean      median          uq         max neval cld
     alexis_laz(df)    5.032864    5.041418    5.823552    5.054080    5.229963   12.327915    10 a
          David(df)    5.777459    5.826049    6.763159    6.317257    6.668168    9.989077    10 a
 GGrothendieck2(df)    5.995431    6.766718    7.262373    7.175116    7.735102    9.140458    10 a
 GGrothendieck5(df)   35.914074   38.091742   48.423107   43.997178   47.047077   99.019229    10 ab
@Tens, to make your answer "perfect", you also need a benchmark for "lots of columns" and probably more than 10 repeats
 
@Cath working on it
 
:-)
@Tensibai btw you misspelled "eight" (you put "height" instead) and now 989 is complaining ;-)
and I'm off
have a nice evening all :-) (or full day for some !!)
 
cya Cath, you too
 
4:22 PM
It takes forever to benchmark on a 1e5 columns by 1e3 rows df
 
@DavidArenburg hi! Yeah, work has been crazy lately, barely enough time for code/SO, mostly about answering emails, how about you? :)
 
@Tensibai if you know that one of the benchmarkables is gonna be a lot slower than the others, could drop it, only comparing the top contenders, eh
 
@Frank I've no idea what the result would be with 1e5 columns
 
ok
 
