Cross Validated is a question and answer site for people interested in statistics, machine learning, data analysis, data mining, and data visualization.

I have just carried out an analysis of my data using logistic regression; however, I am also required to have a descriptive statistics part in my report. I honestly don't see the point in this, and I was hoping that someone might be able to explain why it is necessary.

For example, if I plot a histogram of one of my continuous independent variables and it shows normality or skewness, how will this add any value to the report?

My data consist of a binary dependent variable (whether or not the person got a job) and three independent variables: mid-term grades, final exam grades, and sex (male or female).

If you can't see any value in plotting a histogram of your IVs then maybe you shouldn't do that, but is there any data which you've collected that you do think is of some value to the work you're presenting in the report? – Ian_Fin 20 hours ago
    
Hi Ian, I have added some more detail regarding my problem. I am fairly new to statistics and I was just wondering whether there is a general approach that we take before we carry out logistic regression. – user3223190 20 hours ago
Accepted answer, 22 votes

In my field, the descriptive part of the report is extremely important because it sets the context for the generalisability of the results. For example, a researcher wishes to identify the predictors of traumatic brain injury following motorcycle accidents in a sample from a hospital. Her dependent variable is binary and she has a series of independent variables. Multivariable logistic regression allowed her to produce the following findings:

  • no helmet use adjusted OR = 4.5 (95% CI 3.6, 5.5) compared to helmet use.
  • all other variables were not included in the final model.

To be clear, there were no issues with the modelling. We focus on the value that the descriptive statistics can add.

Without the descriptive statistics, a reader cannot put these findings in perspective. Why? Let me show you the descriptive statistics:

age, years, mean (SD)                  54 (2)
males, freq (%)                       490 (98)
blood alcohol level, %, mean (SD)    0.10 (0.01)
...

You can see from the above that her sample consisted of older, intoxicated males. With this information the reader is able to judge what, if anything, these results can say about injuries in young males, in non-intoxicated riders, or in female riders.

Please don't ignore descriptive statistics.

Nice example. Is it real or made-up? – amoeba 16 hours ago
Thanks, @amoeba. The numbers and stats are real. However, I changed the topic into traumatic brain injury to protect the innocent. – Elmer Villanueva 8 hours ago
So, drunk men riding motorcycles without helmets... Who would have thought you could wind up with a traumatic brain injury? – gung 8 hours ago
    
I was enjoying a glass of nice Australian red at the time and Bob's your uncle... – Elmer Villanueva 7 hours ago

The point of providing descriptive statistics is to characterise your sample so that people in other centres or countries can assess whether your results generalise to their situation. So in your case, tabulating the sex, grades and so on would be a beneficial addition to the logistic regression. It is not there to enable people to check your assumptions, although they may try to do that too.
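
As a rough illustration of tabulating the sex, grades and so on, here is a minimal Python sketch using only the standard library. The records and the helper names (`mean_sd`, `freq_pct`, `describe`) are made up for illustration; they are not the asker's data, and a real report would likely use whatever package the analysis was done in.

```python
import statistics

# Hypothetical records: (got_job, midterm_grade, final_grade, sex)
records = [
    (True, 72, 80, "F"),
    (False, 55, 60, "M"),
    (True, 88, 91, "M"),
    (False, 64, 58, "F"),
    (True, 79, 85, "F"),
    (False, 50, 62, "M"),
]


def mean_sd(values):
    """Format a continuous variable as 'mean (SD)' using the sample SD."""
    return f"{statistics.mean(values):.1f} ({statistics.stdev(values):.1f})"


def freq_pct(flags):
    """Format a binary variable as 'count (percent)'."""
    count = sum(flags)
    return f"{count} ({100 * count / len(flags):.0f})"


def describe(rows):
    """Build one descriptive table as a list of (label, summary) pairs."""
    return [
        ("mid-term grade, mean (SD)", mean_sd([r[1] for r in rows])),
        ("final grade, mean (SD)", mean_sd([r[2] for r in rows])),
        ("females, freq (%)", freq_pct([r[3] == "F" for r in rows])),
        ("got a job, freq (%)", freq_pct([r[0] for r in rows])),
    ]


table = describe(records)
for label, summary in table:
    print(f"{label:<30}{summary:>14}")
```

A reader from another school or country can then compare these rows with their own students before deciding whether the regression results apply to them.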

Thank you mdewey. So when we do the various descriptive plots and notice normality or skewness, why do we merely comment on it? And so basically the only real use of the descriptive statistics is to inform the reader of what data you are working with? Really sorry if this seems elementary. – user3223190 20 hours ago
    
That is the way it works in the health field which is the one with which I am most familiar. – mdewey 17 hours ago
+1. At first I misread "in other centres or countries" as "in other centuries". – amoeba 16 hours ago

Another thing is to show how well behaved your variables are. If, for example, one of your variables is salary, and you have interviewed exactly one billionaire, then when you input his salary into the logistic regression it is going to dominate over everything else, so you will likely learn to ignore the salary, regardless of how much actual information it may hold.

Some methods are more sensitive than others to skewness and extreme values, and logistic regression is rather on the sensitive side. Of course, the final proof is in the pudding, and you can compare the results obtained with the raw data, or with each feature transformed towards normality.
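
To make the billionaire example concrete, here is a minimal Python sketch using only the standard library. The salary figures and the `zscores` helper are invented for illustration. It does not fit a logistic regression; it only shows how a single extreme value squeezes everyone else together on the standardised scale, and how a log transform (one possible transformation, chosen here as an assumption) restores the spread.

```python
import math
import statistics


def zscores(values):
    """Standardise values to mean 0 and sample SD 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical salaries: nine ordinary earners and one billionaire (last entry).
salaries = [30_000, 35_000, 40_000, 45_000, 50_000,
            55_000, 60_000, 70_000, 80_000, 1_000_000_000]

raw_z = zscores(salaries)
log_z = zscores([math.log10(s) for s in salaries])

# How far apart are the nine ordinary earners once the variable is standardised?
raw_spread = max(raw_z[:-1]) - min(raw_z[:-1])
log_spread = max(log_z[:-1]) - min(log_z[:-1])

print(f"spread among ordinary earners, raw salary:   {raw_spread:.6f}")
print(f"spread among ordinary earners, log10 salary: {log_spread:.6f}")
```

On the raw scale the nine ordinary earners are almost indistinguishable from one another, so the variable carries little usable contrast between them. A histogram or a simple min/median/max row in the descriptive table would flag this before any model is fitted.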

A descriptive part helps the reader to understand your dataset. In applied economics it is usually highly recommended, as it may reveal the first potential flaws in your analysis.

You may use data from different sources to expand your descriptives.

One table should be enough. The one you attached is not very intuitive.
