
10 Ways to Do Data Analysis Well

by bents 2021. 2. 22.

1. Use statistical methods that fit the data so that they can answer the "question."

- Novices jump straight to applying a statistical method.

- Experts focus on the scientific goal of the analysis.

Experts step back and consider many aspects of data collection in the context of overall goals, and may start by asking, “What would be the ideal outcome of your experiment, and how would you interpret it?”

 

Novice: “Which test should I use?” vs. expert: “Where are the differentiated genes?”

After learning about the questions, statistical experts discuss with their scientific collaborators the ways that data might answer these questions and, thus, what kinds of studies might be most useful. Together, they try to identify potential sources of variability and what hidden realities could break the hypothesized links between data and scientific inferences; only then do they develop analytic goals and strategies. This is a major reason why collaborating with statisticians can be helpful, and also why the collaborative process works best when initiated early in an investigation.

2. Identify the "noise" (variability) surrounding the quantity you care about, so that you can "simplify" the phenomenon.

 

Variability shows up in many forms. Describe it with distributions (theoretical quantities of interest).

Probability distributions are used in statistical models, with the model specifying the way signal and noise get combined in producing the data we observe, or would like to observe. This fundamental step makes statistical inferences possible.
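As a minimal sketch of how a model combines signal and noise, the snippet below generates observations as a fixed signal plus random noise and then recovers the signal by averaging. The Gaussian noise, the values `true_signal=5.0` and `noise_sd=2.0`, and the function name `simulate_observations` are all assumptions made for illustration; none of them come from the source article.

```python
import random
import statistics

def simulate_observations(true_signal, noise_sd, n, seed=0):
    """Return n observations generated as signal + Gaussian noise."""
    rng = random.Random(seed)
    return [true_signal + rng.gauss(0.0, noise_sd) for _ in range(n)]

# Hypothetical values chosen only for illustration.
data = simulate_observations(true_signal=5.0, noise_sd=2.0, n=1000)

estimate = statistics.mean(data)         # estimate of the signal
spread = statistics.stdev(data)          # estimate of the noise level
std_error = spread / len(data) ** 0.5    # uncertainty of the signal estimate

print(f"estimated signal: {estimate:.2f} +/- {std_error:.2f}")
print(f"estimated noise sd: {spread:.2f}")
```

Writing the model down this way makes the inference step explicit: the mean estimates the signal, and the standard error says how far the noise could have pushed that estimate.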

3. Always plan and design before you collect data.

 

Asking questions at the design stage saves headaches at the analysis stage.

Careful data collection can greatly simplify the analysis and make it more rigorous (more trustworthy).
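One design-stage question that can be answered before any data exist is "how many samples do I need?" The sketch below is one possible way to estimate that for a mean, using the normal approximation n = (z * sd / margin)^2. The function name `required_sample_size` and the planning numbers (a pilot standard deviation of 2.0 and a target precision of 0.5) are hypothetical, and the approximation assumes the standard deviation guess is reasonable.

```python
import math
from statistics import NormalDist

def required_sample_size(sd_guess, margin, confidence=0.95):
    """Samples needed so the confidence-interval half-width for a mean is <= margin.

    Uses the normal approximation n = (z * sd / margin)^2, rounded up.
    """
    if sd_guess <= 0 or margin <= 0:
        raise ValueError("sd_guess and margin must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * sd_guess / margin) ** 2)

# Hypothetical planning numbers: pilot sd of 2.0, target precision of +/- 0.5.
n_needed = required_sample_size(sd_guess=2.0, margin=0.5)
print(n_needed)
```

Halving the margin roughly quadruples the required sample size, which is the kind of trade-off that is much cheaper to discover before the experiment than after it.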

 

4. Worry about data quality.

 

5. State the reasoning behind the analysis technique you applied.

 

6. Start with a simple model and increase its complexity step by step.

The common assumption of independence is often incorrect and nearly always needs careful examination.
Large numbers of measurements, interactions among explanatory variables, nonlinear mechanisms of action, missing data, confounding, and sampling biases all require an increase in model complexity.
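A rough sketch of the "simple first, then more complex" workflow: fit the simplest possible model (a constant), then a slightly richer one (a straight line), and compare how much unexplained variation remains. The data and the helper names `sse_constant` and `sse_line` are made up for illustration, and residual sum of squares is just one convenient yardstick.

```python
def sse_constant(y):
    """Residual sum of squares for the simplest model: y = mean."""
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)

def sse_line(x, y):
    """Residual sum of squares for a least-squares line: y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Made-up data with an obvious trend, for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8]

simple = sse_constant(y)
richer = sse_line(x, y)
print(f"constant model SSE: {simple:.3f}")
print(f"line model SSE:     {richer:.3f}")
```

A least-squares line never fits the same data worse than a constant, so a lower SSE alone does not justify the extra parameter; the drop has to be large enough to matter.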

7. Assess the uncertainty of the model.

When a large number of measurements is taken on a small number of samples, the standard errors must be estimated very carefully.
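As a small illustration of estimating a standard error, the sketch below compares the textbook formula for the standard error of a mean with a bootstrap estimate. The bootstrap is a technique swapped in here for illustration and is not prescribed by the source; the eight measurements and the function name `bootstrap_standard_error` are made up. It covers only the basic case of a single measurement per sample.

```python
import random
import statistics

def bootstrap_standard_error(sample, n_resamples=2000, seed=0):
    """Estimate the standard error of the mean by resampling with replacement."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(sample, k=len(sample))
        means.append(statistics.mean(resample))
    return statistics.stdev(means)

# Made-up measurements from only eight samples, for illustration.
sample = [4.1, 5.3, 3.8, 6.0, 4.7, 5.1, 4.4, 5.6]

se_boot = bootstrap_standard_error(sample)
se_formula = statistics.stdev(sample) / len(sample) ** 0.5

print(f"bootstrap SE: {se_boot:.3f}")
print(f"formula SE:   {se_formula:.3f}")
```

With so few samples, either estimate of the standard error is itself noisy, which is part of why small-sample uncertainty deserves extra care.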

8. Test the model's assumptions.

 

9. 

 

10. Run the analysis and produce the results in a way that can be used again.

- Just do it in Jupyter... heh

Modern reproducible research tools like Sweave, knitr, and iPython notebooks take this a step further
and combine the research report with the code.
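Whatever tool holds the report, one habit that tends to help is fixing the random seed and saving the settings next to the result. The sketch below shows one way to do that; `run_analysis` is a hypothetical toy analysis, and the seed value and record layout are assumptions made for illustration.

```python
import json
import random
import statistics

def run_analysis(seed, n=50):
    """Run a toy analysis and return the result together with its settings."""
    rng = random.Random(seed)    # fixed seed makes the run repeatable
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return {
        "seed": seed,
        "n": n,
        "mean": round(statistics.mean(data), 6),
    }

first = run_analysis(seed=42)
second = run_analysis(seed=42)

# Identical settings give an identical record, which can be saved with the report.
print(json.dumps(first, sort_keys=True))
print(first == second)
```

Because the record carries its own seed and sample size, anyone rerunning the code with those settings can check that they get the same number.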

 

Source: journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004961