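Overview
A good research paper must be reproducible; results on their own mean little. You need to tell others how to follow your steps, one by one, to arrive at the same analysis results.
- Other researchers can check whether your results and process are rigorous and scientifically sound
- Other researchers can build on your work and extend it at particular stages
- Other researchers can follow the thread of your whole analysis and understand the content better
The following is a summary of the Reproducible Research course on Coursera.
What a reproducible paper contains
- Title/Author list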
- Abstract
- Body/Results
- Supplementary Materials/the gory details
- Code/Data/really gory details
What you need to do to ensure reproducibility
- Are we doing good science? ## good data, a good team, focus, interest
- Was any part of this analysis done by hand? ## do not process the data by hand
- If so, are those parts precisely documented?
- Does the documentation match reality?
- Have we taught a computer to do as much as possible? ## put the data-processing steps into code that the computer runs
- Don't point and click ## avoid GUIs (graphical user interfaces)
- Are we using a version control system? ## use version control (for example Git with GitHub) to follow how the analysis is refined
- Have we documented our software environment? ## record your software environment (in R, sessionInfo())
- Have we saved any output that we cannot reconstruct from original data + code? ## do not keep only the results
- How far back in the analysis pipeline can we go before our results are no longer (automatically) reproducible?
## how the whole path from raw data to report is carried out
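The "software environment" item in the checklist above can be sketched in code. The course material points to R's sessionInfo(); the block below is only a rough Python analogue built on the standard library, shown for illustration. The function name `describe_environment` and the package name passed to it are assumptions, not part of the course.

```python
import json
import platform
import sys
from importlib import metadata


def describe_environment(packages=()):
    """Collect interpreter, OS, and package versions as a plain dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap instead of failing, so the report is always written.
            versions[name] = "not installed"
    return {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # "pip" is only an illustrative name; list the packages your analysis imports.
    print(json.dumps(describe_environment(["pip"]), indent=2, sort_keys=True))
```

Saving this output next to the report, under version control, gives readers a record of what the analysis was run with.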
Where reproducibility falls short
- Reproducible research is important, but does not necessarily solve the critical question of whether a data analysis is trustworthy
- Reproducible research focuses on the most “downstream” aspect of research dissemination
- Evidence-based data analysis would provide standardized best practices for given scientific areas and questions
- Gives reviewers an important tool without dramatically increasing the burden on them
- More effort should be put into improving the quality of “upstream” aspects of scientific research
An example of a good reproducible paper
http://www.rpubs.com/rdpeng/13396
Detailed criteria for checking whether a report meets the reproducibility standard
- Has either a (1) valid RPubs URL pointing to a data analysis document for this assignment been submitted; or (2) a complete PDF file presenting the data analysis been uploaded?
- Is the document written in English?
- Does the analysis include description and justification for any data transformations?
- Does the document have a title that briefly summarizes the data analysis?
- Does the document have a synopsis that describes and summarizes the data analysis in less than 10 sentences?
- Is there a section titled “Data Processing” that describes how the data were loaded into R and processed for analysis?
- Is there a section titled “Results” where the main results are presented?
- Is there at least one figure in the document that contains a plot?
- Are there at most 3 figures in this document?
- Does the analysis start from the raw data file (i.e. the original .csv.bz2 file)?
- Does the analysis address the question of which types of events are most harmful to population health?
- Does the analysis address the question of which types of events have the greatest economic consequences?
- Do all the results of the analysis (i.e. figures, tables, numerical summaries) appear to be reproducible?
- Do the figure(s) have descriptive captions (i.e. there is a description near the figure of what is happening in the figure)?
- As far as you can determine, does it appear that the work submitted for this project is the work of the student who submitted it?
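Two of the criteria above, starting from the raw .csv.bz2 file and ranking event types by harm to population health, could be scripted roughly as follows. The course assignment is done in R; this is a minimal Python sketch for illustration only. The column names `EVTYPE`, `FATALITIES`, and `INJURIES` and the function name `harm_by_event_type` are assumptions, since this summary does not list the columns of the data file.

```python
import bz2
import csv
from collections import defaultdict


def harm_by_event_type(raw_path, event_col="EVTYPE",
                       harm_cols=("FATALITIES", "INJURIES")):
    """Sum the harm columns per event type, reading the compressed raw file directly."""
    totals = defaultdict(float)
    # bz2.open in text mode decompresses on the fly, so no hand-unpacked copy is needed.
    with bz2.open(raw_path, mode="rt", newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Blank cells are skipped; this choice should be stated in the report.
            totals[row[event_col]] += sum(
                float(row[col]) for col in harm_cols if row[col].strip()
            )
    # Sort by total (largest first), then by name so ties come out in a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))
```

Because the function reads the compressed file as delivered, every number in the report can be traced back to the raw data by rerunning the script.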