Difference between revisions of "Documentation/Labs/TransitionToGit"

Line 57: Line 57:

== Removing binary data from history ==

+ Motivation:
+ * Speedup time required to checkout Slicer source code.
+ * Quote "Because of the decentralized nature of Git, which means every developer has the full change history on his or her computer, changes in large binary files cause Git repositories to grow by the size of the file in question every time the file is changed and the change is committed. The growth directly affects the amount of data end users need to retrieve when they need to clone the repository." See https://www.perforce.com/blog/storing-large-binary-files-in-git-repositories
+
+ Tasks:
* Filter the history
- * Remove file above <= X MB
+ * Remove file above <= X MB that have a poor compression ratio. See [https://gist.github.com/jcfr/4348af13d2c8931daeab4ff9ab73e14b git_list_largest_file_from_history.sh] and [https://gist.github.com/jcfr/93fe51974d9db8ef55a6d3172c1de68d slicer_git_history_350_largest_files.txt]
- * Integrate a NOTE with location of files and crypto hash
+ * Integrate a NOTE with location of files and checksum

Some stats:

Line 67: Line 72:

Todo:
- * Distribution of binary files across revision
+ * Distribution of binary files across revision.
+ ** Leverage CMake ExternalData module. Existing modules need to be updated to leverage it. Work-in-progress in [https://nbviewer.jupyter.org/github/jcfr/jupyter-notebooks/blob/master/02_Update_Slicer_CLI_buildsystem_to_download_test_data_from_midas.ipynb 02_Update_Slicer_CLI_buildsystem_to_download_test_data_from_midas.ipynb] notebook to update buildsystem.
+ ** Improve the SampleData module. See https://discourse.slicer.org/t/improving-testing-data-management-for-self-test/
+ ** Copy testing data to https://github.com/Slicer/SlicerTestingData and work with Kitware system admin to set redirect for existing datasets.
* experiment with history filter and compare with above stats
Revision as of 17:08, 8 May 2019


This page keeps track of the progress toward converting (1) the Slicer svn repository mirrored on GitHub into (2) a git-only repository.

Related issues

Authorship

Since contributions have been integrated using git svn dcommit --add-author-from, the history should be filtered so that the author of each commit is the one specified in its From: statement.

In addition, the following updates should be considered:

  • r22869 -> Franklinwk <franklin.king@queensu.ca>
  • r23080 -> Co-authored-by: Andras Lasso <lasso@queensu.ca>
  • r23506 -> Andras Lasso <lasso@queensu.ca>
  • r23758 -> Andras Lasso <lasso@queensu.ca>
  • r24061 -> Alireza Mehrtash <mehrtash@bwh.harvard.edu>
  • r24309 -> Andras Lasso <lasso@queensu.ca>
  • r24631 -> Co-authored-by: Max Smolens <max.smolens@kitware.com>
  • r24720 -> Change author email to zhangfanmark@gmail.com
  • r25140 -> Hastings Greer <hastings.greer@kitware.com>
  • r25744 -> Add Suggested-by: Alexis Girault <alexis.girault@kitware.com>
  • r26841 -> From: Gregory C. Sharp <gregsharp.geo@yahoo.com>, Co-authored-by: Jean-Christophe Fillion-Robin <jchris.fillionr@kitware.com>
  • r26856 -> Adam Rankin <adam.rankin@gmail.com>
  • r26987 -> Fernando Perez-Garcia <fepegar@gmail.com>
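As a rough illustration of the author filtering described above, here is a minimal sketch that picks the author for one commit from its message. It assumes the message carries a "From: Name <email>" line and a "git-svn-id:" trailer of the form "URL@REVISION UUID". The names resolve_author and OVERRIDES are hypothetical, and OVERRIDES holds only two entries copied from the list above. How such a function is wired into a history-rewriting tool is not covered here.

```python
import re

# "From: Name <email>" on its own line inside the commit message.
FROM_RE = re.compile(
    r"^From:[ \t]*(?P<name>[^<\n]+?)[ \t]*<(?P<email>[^>\n]+)>[ \t]*$",
    re.MULTILINE,
)

# Assumed trailer format: "git-svn-id: <url>@<revision> <uuid>".
SVN_ID_RE = re.compile(r"^git-svn-id:[ \t]*\S+@(?P<rev>\d+)[ \t]", re.MULTILINE)

# Hypothetical manual overrides keyed by svn revision (sample entries only).
OVERRIDES = {
    23506: ("Andras Lasso", "lasso@queensu.ca"),
    25140: ("Hastings Greer", "hastings.greer@kitware.com"),
}


def resolve_author(message, current):
    """Return (name, email) for a commit.

    Priority: manual override for the svn revision, then the last
    "From:" line in the message, then the existing author.
    """
    rev_match = SVN_ID_RE.search(message)
    if rev_match is not None:
        revision = int(rev_match.group("rev"))
        if revision in OVERRIDES:
            return OVERRIDES[revision]
    from_matches = FROM_RE.findall(message)
    if from_matches:
        name, email = from_matches[-1]
        return (name, email)
    return current
```

Entries such as "Co-authored-by:" or "Suggested-by:" change the commit message rather than the author, so they would need a separate step.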

Commit message

The following commit message could be fixed:

STYLE: Add test for CLI Node status update

This commit adds a test checking that CLI status has the expected
value following StatusModifiedEvent.

Possible status are the following:

    Idle
    Scheduled
    Running
    Cancelling
    Cancelled
    Completing
    Completed
    ErrorsMask
    CompletedWithErrors
    BusyMask

Note that the status is inconsistently update between synchronous and asynchronous mode. This
will be addressed in a follow up commit.

Removing binary data from history

Motivation:

  • Reduce the time required to check out the Slicer source code.
  • Quote "Because of the decentralized nature of Git, which means every developer has the full change history on his or her computer, changes in large binary files cause Git repositories to grow by the size of the file in question every time the file is changed and the change is committed. The growth directly affects the amount of data end users need to retrieve when they need to clone the repository." See https://www.perforce.com/blog/storing-large-binary-files-in-git-repositories

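Tasks:

  • Filter the history
  • Remove files above X MB that have a poor compression ratio. See git_list_largest_file_from_history.sh (https://gist.github.com/jcfr/4348af13d2c8931daeab4ff9ab73e14b) and slicer_git_history_350_largest_files.txt (https://gist.github.com/jcfr/93fe51974d9db8ef55a6d3172c1de68d)
  • Integrate a NOTE with the location of the files and their checksum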
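For the task of removing large files that compress poorly, the selection criterion could look like the following minimal sketch. It uses zlib (the compression Git applies to its objects) to estimate a ratio. The function names and the 0.9 threshold are assumptions chosen for illustration, and the size limit "X MB" is left as a parameter because the page does not fix it.

```python
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (1.0 for empty input)."""
    if not data:
        return 1.0
    return len(zlib.compress(data)) / len(data)


def is_removal_candidate(data: bytes, min_size_bytes: int, max_ratio_kept: float = 0.9) -> bool:
    """A blob is a candidate when it is large and barely shrinks under zlib.

    min_size_bytes stands for the undecided "X MB" threshold; max_ratio_kept
    is an arbitrary illustrative cut-off, not a value from the page.
    """
    return len(data) >= min_size_bytes and compression_ratio(data) > max_ratio_kept
```

Already-compressed formats tend to have a ratio close to 1.0 and would be flagged, while large text-like files would not.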

Some stats:

  • Hina, Jc (from Kitware Carrboro): 3min30sec
  • Andrey (SPL): 2m13s
  • Steve (Cambridge): 21s (full history), 8s (with --depth=1)
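The page does not say how these timings were taken. Assuming they are wall-clock times of git clone, a comparable measurement could be sketched as follows. The helper names are hypothetical, and the repository URL must be supplied by the caller.

```python
import os
import subprocess
import tempfile
import time


def build_clone_command(url, dest, depth=None):
    """Build the git clone command, optionally as a shallow clone."""
    cmd = ["git", "clone", "--quiet"]
    if depth is not None:
        cmd.append(f"--depth={depth}")
    cmd += [url, dest]
    return cmd


def time_clone(url, depth=None):
    """Clone into a temporary directory and return the elapsed seconds."""
    with tempfile.TemporaryDirectory() as tmp:
        dest = os.path.join(tmp, "repo")
        start = time.perf_counter()
        subprocess.run(build_clone_command(url, dest, depth), check=True)
        return time.perf_counter() - start
```

Calling time_clone(url) and time_clone(url, depth=1) against the same repository would give the full-history and shallow figures. Results depend heavily on network location, as the numbers above suggest.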

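Todo:

  • Distribution of binary files across revisions.
      ◦ Leverage the CMake ExternalData module. Existing modules need to be updated to use it. Work in progress in the 02_Update_Slicer_CLI_buildsystem_to_download_test_data_from_midas.ipynb notebook (https://nbviewer.jupyter.org/github/jcfr/jupyter-notebooks/blob/master/02_Update_Slicer_CLI_buildsystem_to_download_test_data_from_midas.ipynb) to update the build system.
      ◦ Improve the SampleData module. See https://discourse.slicer.org/t/improving-testing-data-management-for-self-test/
      ◦ Copy testing data to https://github.com/Slicer/SlicerTestingData and work with the Kitware system admin to set up redirects for existing datasets.
  • Experiment with the history filter and compare with the above stats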
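For the checksum NOTE and the ExternalData-based distribution of test data, a small helper could compute a checksum for each removed file and write it next to the file's path. This is only a sketch: the helper names are hypothetical, SHA-512 is an assumed choice, and the "<file>.<algorithm>" naming mimics the content-link idea of CMake ExternalData and should be checked against the CMake documentation before use.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha512", chunk_size=1 << 20):
    """Hash a file in chunks so large binaries are not loaded at once."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_content_link(path, algorithm="sha512"):
    """Write '<file>.<algorithm>' containing the hex digest; return its path."""
    path = Path(path)
    link = path.with_name(path.name + "." + algorithm)
    link.write_text(file_checksum(path, algorithm) + "\n")
    return link
```

The small text file can stay in the repository in place of the binary, while the data itself is served from a separate location.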