Difference between revisions of "Documentation/Labs/TransitionToGit"

From Slicer Wiki

Revision as of 17:10, 8 May 2019


This page keeps track of the progress made toward converting (1) the Slicer svn repository mirrored on GitHub into (2) a git-only repository.

Related issues

  • 2059: Transition to Git (http://na-mic.org/Mantis/view.php?id=2059)

Removing binary data from history

Motivation:

  • Reduce the time required to check out the Slicer source code.
  • Quote "Because of the decentralized nature of Git, which means every developer has the full change history on his or her computer, changes in large binary files cause Git repositories to grow by the size of the file in question every time the file is changed and the change is committed. The growth directly affects the amount of data end users need to retrieve when they need to clone the repository." See https://www.perforce.com/blog/storing-large-binary-files-in-git-repositories

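Tasks:

  • Filter the history
      • Fix
      • Experiment with the history filter and compare the results with the stats below
  • Remove files above X MB that have a poor compression ratio. See git_list_largest_file_from_history.sh (https://gist.github.com/jcfr/4348af13d2c8931daeab4ff9ab73e14b) and slicer_git_history_350_largest_files.txt (https://gist.github.com/jcfr/93fe51974d9db8ef55a6d3172c1de68d)
  • Integrate a NOTE with the location of the files and their checksums
  • Distribution of binary files across revisions:
      • Leverage the CMake ExternalData module. Existing modules need to be updated to use it. Work in progress: the 02_Update_Slicer_CLI_buildsystem_to_download_test_data_from_midas.ipynb notebook (https://nbviewer.jupyter.org/github/jcfr/jupyter-notebooks/blob/master/02_Update_Slicer_CLI_buildsystem_to_download_test_data_from_midas.ipynb) updates the build system.
      • Improve the SampleData module. See https://discourse.slicer.org/t/improving-testing-data-management-for-self-test/
      • Copy testing data to https://github.com/Slicer/SlicerTestingData and work with the Kitware system administrators to set up redirects for existing datasets.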
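One possible way to find candidates for removal is to list the blobs reachable from any ref together with their uncompressed and on-disk sizes. The sketch below is not the script linked from this page; it is a minimal illustration that relies on git rev-list --objects --all and on the %(objectsize), %(objectsize:disk) and %(rest) atoms of git cat-file --batch-check. The function names and both thresholds (1 MB, on-disk/uncompressed ratio of 0.9) are assumptions chosen for illustration. Note that the on-disk size of a blob stored as a delta reflects the delta, so the ratio is only a rough indicator.

```python
import subprocess


def batch_check_output(repo="."):
    """Return one 'type sha size disk_size path' line per object in the history."""
    # List every object reachable from any ref as 'sha path'.
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    # Ask git for the type and sizes of each object; %(rest) echoes the path.
    fmt = "%(objecttype) %(objectname) %(objectsize) %(objectsize:disk) %(rest)"
    return subprocess.run(
        ["git", "-C", repo, "cat-file", "--batch-check=" + fmt],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout.splitlines()


def large_poorly_compressed(lines, min_size_mb=1.0, min_ratio=0.9):
    """Select blobs of at least min_size_mb whose on-disk size is close to
    their uncompressed size. Both thresholds are hypothetical defaults."""
    results = []
    for line in lines:
        parts = line.split(" ", 4)  # the path may itself contain spaces
        if len(parts) < 5 or parts[0] != "blob":
            continue
        _, sha, size, disk, path = parts
        size, disk = int(size), int(disk)
        if size > 0 and size >= min_size_mb * 1024 * 1024 and disk / size >= min_ratio:
            results.append((size, disk, sha, path))
    # Largest blobs first.
    return sorted(results, reverse=True)
```

Running large_poorly_compressed(batch_check_output("/path/to/Slicer")) on a local clone would print nothing by itself; the returned tuples can be written to a text file for review before any history filtering is attempted.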

Some stats on the time required to check out the source code:

  • Hina, Jc (from Kitware Carrboro): 3min30sec
  • Andrey (SPL): 2m13s
  • Steve (Cambridge): 21s (full history), 8s (with --depth=1)
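To repeat such measurements in a comparable way, the clone can be timed from a small script. This is a minimal sketch, assuming git is available on the PATH; the helper names are hypothetical, and the repository URL in the commented usage is an assumption, not taken from this page.

```python
import subprocess
import tempfile
import time


def clone_command(url, destination, depth=None):
    """Build the git clone command; depth=None requests the full history."""
    command = ["git", "clone", "--quiet"]
    if depth is not None:
        command.append(f"--depth={depth}")
    return command + [url, destination]


def time_command(command):
    """Run a command and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True, capture_output=True)
    return time.perf_counter() - start


def time_clone(url, depth=None):
    """Clone into a throwaway directory and return the elapsed time."""
    with tempfile.TemporaryDirectory() as workdir:
        return time_command(clone_command(url, workdir, depth))


# Example usage (URL is an assumption; requires network access):
#   full = time_clone("https://github.com/Slicer/Slicer")
#   shallow = time_clone("https://github.com/Slicer/Slicer", depth=1)
```

Timings depend heavily on network location, as the figures above suggest, so before/after comparisons of a filtered history are only meaningful when taken from the same site.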

Authorship

Since contributions have been integrated using git svn dcommit --add-author-from, the history should be filtered to change the commit author to the one specified in the From: statement of each commit message.

In addition, the following updates should be considered:

  • r22869 -> Franklinwk <franklin.king@queensu.ca>
  • r23080 -> Co-authored-by: Andras Lasso <lasso@queensu.ca>
  • r23506 -> Andras Lasso <lasso@queensu.ca>
  • r23758 -> Andras Lasso <lasso@queensu.ca>
  • r24061 -> Alireza Mehrtash <mehrtash@bwh.harvard.edu>
  • r24309 -> Andras Lasso <lasso@queensu.ca>
  • r24631 -> Co-authored-by: Max Smolens <max.smolens@kitware.com>
  • r24720 -> Change author email to zhangfanmark@gmail.com
  • r25140 -> Hastings Greer <hastings.greer@kitware.com>
  • r25744 -> Add Suggested-by: Alexis Girault <alexis.girault@kitware.com>
  • r26841 -> From: Gregory C. Sharp <gregsharp.geo@yahoo.com>, Co-authored-by: Jean-Christophe Fillion-Robin <jchris.fillionr@kitware.com>
  • r26856 -> Adam Rankin <adam.rankin@gmail.com>
  • r26987 -> Fernando Perez-Garcia <fepegar@gmail.com>
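The author lookup described above can be sketched as a small function that a history-rewriting tool could call for each commit. This is an illustration under stated assumptions, not the filter actually used: the names FROM_RE, OVERRIDES and resolve_author are hypothetical, the override table shows only two of the entries listed above, and the svn revision number is assumed to be supplied by the caller (for example, parsed from the git-svn-id line that git-svn appends to commit messages).

```python
import re

# Matches a 'From: Name <email>' line anywhere in the commit message.
FROM_RE = re.compile(
    r"^From:\s*(?P<name>[^<\n]+?)\s*<(?P<email>[^>\n]+)>\s*$",
    re.MULTILINE,
)

# Per-revision overrides; only two entries from the list above are shown.
OVERRIDES = {
    22869: ("Franklinwk", "franklin.king@queensu.ca"),
    26987: ("Fernando Perez-Garcia", "fepegar@gmail.com"),
}


def resolve_author(message, revision=None):
    """Return (name, email) for the commit, or None to keep the current author.

    An explicit per-revision override wins; otherwise the From: statement
    in the commit message is used, if there is one.
    """
    if revision in OVERRIDES:
        return OVERRIDES[revision]
    match = FROM_RE.search(message)
    if match is None:
        return None
    return match.group("name"), match.group("email")
```

Entries such as Co-authored-by: or Suggested-by: change the commit message rather than the author, so they would need a separate step that this sketch does not cover.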

Commit message

The following commit message could be fixed:

STYLE: Add test for CLI Node status update

This commit adds a test checking that CLI status has the expected
value following StatusModifiedEvent.

Possible status are the following:

    Idle
    Scheduled
    Running
    Cancelling
    Cancelled
    Completing
    Completed
    ErrorsMask
    CompletedWithErrors
    BusyMask

Note that the status is inconsistently update between synchronous and asynchronous mode. This
will be addressed in a follow up commit.