CK: Collective Knowledge
This blog post gives a brief introduction to CK and its basic concepts. There is a lot of existing documentation in the CK wiki on GitHub, and all of it can easily feel overwhelming. This is why I wrote this deliberately short and lightweight introduction to some of the fundamental concepts of CK, which helped me a lot in understanding CK.
I assume that you have the CK tool installed on your machine, which you can easily check by running ck version. If this returns an error, you will want to install CK by running pip install ck [1].
So what is CK?
Put generically, CK is a tool which helps you organise and work with stuff you care about. Stuff can be a lot of different things, such as research data, programs or scripts analysing this data, as well as the resulting data obtained by the analysis, just to give a typical research workflow as an example.
CK helps you to organise this stuff by assigning unique identifiers (so-called ‘UIDs’) to every entry registered with CK. Entries are stored in repositories, which facilitate sharing. Modules are a special type of entry: they implement the functionality of CK. CK comes with a set of built-in modules, but you can also write custom modules yourself.
Entries, repositories, and modules are the basic vocabulary of CK. Let’s start talking more about them.
CK Entries
CK tracks entries by assigning them unique identifiers. Each entry is stored in a separate directory, and CK also stores additional metadata for each entry in a few JSON files. These files are stored in the .cm subdirectory of the entry. There are three metadata files:
- .cm/info.json stores information such as the author and the license of the entry.
- .cm/meta.json stores arbitrary meta information about the entry, which CK modules use to process this entry. One important example is tags: identifying words which can be used to filter entries.
- .cm/desc.json is intended for a documentary description of the entry, but is currently mostly empty.
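To make the layout above concrete, here is a minimal Python sketch that reads whichever of the three metadata files exist in an entry's .cm subdirectory. It does not use CK itself. The helper names (read_entry_metadata, build_demo_entry) and the JSON contents written by the demo are my own placeholders, not something CK defines; only the file names and the tags example come from the description above.

```python
import json
import tempfile
from pathlib import Path

# File names as described in the text; everything else here is hypothetical.
METADATA_FILES = ("info.json", "meta.json", "desc.json")


def read_entry_metadata(entry_dir):
    """Return {file name: parsed JSON} for the files found in <entry_dir>/.cm."""
    cm_dir = Path(entry_dir) / ".cm"
    metadata = {}
    for name in METADATA_FILES:
        path = cm_dir / name
        if path.is_file():
            metadata[name] = json.loads(path.read_text(encoding="utf-8"))
    return metadata


def build_demo_entry(root):
    """Create a throwaway entry directory with made-up metadata for the demo."""
    entry = Path(root) / "program" / "my-awesome-program"
    cm_dir = entry / ".cm"
    cm_dir.mkdir(parents=True)
    # Placeholder contents: real CK entries will contain different keys.
    (cm_dir / "meta.json").write_text(
        json.dumps({"tags": ["demo", "hello"]}), encoding="utf-8"
    )
    (cm_dir / "info.json").write_text(
        json.dumps({"author": "someone"}), encoding="utf-8"
    )
    return entry


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        entry = build_demo_entry(root)
        for name, content in read_entry_metadata(entry).items():
            print(name, content)
```

The demo deliberately leaves out desc.json to show that the reader simply skips files that are missing.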
CK Repositories
In CK a repository is a collection of entries which are meant to be shared with other people. CK uses a tool called git, which makes it incredibly easy to share repositories among team members or make them publicly available. Websites such as GitHub or Bitbucket can be used to host CK repositories online.
CK stores all of the repositories in one central folder. On Linux and macOS this is by default: $HOME/CK_REPOS.
CK Modules
Modules in CK group entries as well as actions to operate on these entries. CK entries which are operated on by a particular module are put in a directory which has the same name as the module. For example:
- Programs, which are compiled and run by the program module, are put in a directory called program.
- Datasets, which are extended by the dataset module, are put in a directory called dataset.
- Experiments, which are added, browsed, and rerun by the experiment module, are put in a directory called experiment.
This leads to a familiar directory structure where the top-level directories are named after CK modules, e.g., program, dataset, and experiment. The second-level directories store the actual programs, datasets, and experiments you care about, e.g., program/my-awesome-program, dataset/my-awesome-dataset, and experiment/my-awesome-experiment. These are themselves CK entries with their own metadata and UIDs.
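The two-level layout described above can be explored with a few lines of plain Python. This is only a sketch that walks directories; it does not call CK, and the function names (list_entries, build_demo_repo) are hypothetical. I assume that hidden directories such as .cm or .git should be skipped.

```python
import tempfile
from pathlib import Path


def list_entries(repo_dir):
    """Return sorted (module, entry) pairs found under repo_dir."""
    pairs = []
    for module_dir in Path(repo_dir).iterdir():
        # Skip plain files and hidden directories such as .cm or .git.
        if not module_dir.is_dir() or module_dir.name.startswith("."):
            continue
        for entry_dir in module_dir.iterdir():
            if entry_dir.is_dir() and not entry_dir.name.startswith("."):
                pairs.append((module_dir.name, entry_dir.name))
    return sorted(pairs)


def build_demo_repo(root):
    """Create an empty directory tree that mimics the layout from the text."""
    repo = Path(root) / "demo-repo"
    for relative in (
        "program/my-awesome-program",
        "dataset/my-awesome-dataset",
        "experiment/my-awesome-experiment",
        ".cm",
    ):
        (repo / relative).mkdir(parents=True)
    return repo


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for module, entry in list_entries(build_demo_repo(root)):
            print(f"{module}:{entry}")
```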
Actions in CK are functionalities offered by modules to operate on CK entries. Let’s look at a few concrete examples:
- The program module offers actions for compiling and running programs (compile and run).
- The dataset module offers an action for adding new files to an existing dataset (add_file_to).
- The experiment module offers actions for adding new experiments (add), browsing existing ones (browse), or rerunning experiments (rerun).
Every command line in CK has the same basic form to perform an action of a particular module:
ck action module
Therefore, we write: ck compile program, ck add_file_to dataset, ck rerun experiment, and so on.
This style is deliberately designed so that the commands read like sentences. I call this ck action module structure the grammar of CK.
CK commands which talk about particular entries specify them by using the following notation:
ck action module:entry
Sometimes it is required to help CK distinguish between entries in different repositories. In these cases we have to write:
ck action repository:module:entry
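The colon notation above is easy to take apart by hand. The following Python sketch splits such an address into its repository, module, and entry parts. It is purely illustrative and is not how CK parses its command line; the names Address and parse_address are my own.

```python
from typing import NamedTuple, Optional


class Address(NamedTuple):
    """Parts of a 'module', 'module:entry' or 'repository:module:entry' address."""
    repository: Optional[str]
    module: str
    entry: Optional[str]


def parse_address(text):
    """Split an address in the notation used in this post into its parts."""
    parts = text.split(":")
    if any(part == "" for part in parts):
        raise ValueError(f"empty component in address: {text!r}")
    if len(parts) == 1:
        return Address(None, parts[0], None)
    if len(parts) == 2:
        return Address(None, parts[0], parts[1])
    if len(parts) == 3:
        return Address(parts[0], parts[1], parts[2])
    raise ValueError(f"too many components in address: {text!r}")


if __name__ == "__main__":
    for text in ("program", "program:my-awesome-program",
                 "my-repo:program:my-awesome-program"):
        print(text, "->", parse_address(text))
```

Here my-repo is a made-up repository name used only for the demo.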
Many modules accept additional options as command line flags. You can get a full list of the actions a particular module supports by calling:
ck help module
CK modules for managing repositories and modules
There are CK modules for managing repositories and modules themselves. These are called repo and module and are briefly described here.
repo
Repositories are a central concept in CK (as we have seen above) and are managed by the repo module.
Here are some things one can do with this module:
- ck info repo lists information about the repo module itself
- ck help repo lists all possible actions one can perform with a CK repository
- ck list repo lists all installed repositories
There are a number of things one can do with a particular repository. We take the ck-autotuning repository as an example:
- ck pull repo:ck-autotuning installs the ck-autotuning repository or updates it to the latest version on the remote server (it performs a git pull on the GitHub repository: https://github.com/ctuning/ck-autotuning)
- ck info repo:ck-autotuning lists information about the ck-autotuning repository
- ck find repo:ck-autotuning lists the path where the ck-autotuning repository is installed
module
Modules are managed by a module called module.
Similarly to the actions on repositories one can:
- ck info module lists information about the module module itself
- ck help module lists all possible actions one can perform with a CK module
- ck list module lists all installed modules, across all installed repositories
To list only the modules of a particular repository, for example ck-autotuning, one can execute:
ck list module --repo_uoa=ck-autotuning
The --repo_uoa=ck-autotuning part is an input argument passed to the list action of the module module. To list all the possible input arguments of an action, call:
ck action module --help
So for example: ck list module --help. This will print a description of the action, the input arguments it processes, and the output it returns.
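To illustrate how a command line like the one above breaks down into an action, a module, and input arguments, here is a small Python sketch. It is my own toy parser, not CK's, and split_command is a hypothetical name. It only handles the simple ck action module --key=value form and rejects anything else, such as commands with two positional addresses.

```python
import shlex


def split_command(argv):
    """Split ['ck', action, module, '--key=value', ...] into its parts."""
    if len(argv) < 3 or argv[0] != "ck":
        raise ValueError("expected: ck action module [--key=value ...]")
    action, module = argv[1], argv[2]
    arguments = {}
    for token in argv[3:]:
        if not token.startswith("--"):
            raise ValueError(f"unsupported token in this sketch: {token!r}")
        # '--repo_uoa=ck-autotuning' becomes key 'repo_uoa', value 'ck-autotuning'.
        # A bare flag such as '--help' is stored with an empty string as value.
        key, _, value = token[2:].partition("=")
        arguments[key] = value
    return action, module, arguments


if __name__ == "__main__":
    command = "ck list module --repo_uoa=ck-autotuning"
    action, module, arguments = split_command(shlex.split(command))
    print("action:   ", action)
    print("module:   ", module)
    print("arguments:", arguments)
```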
Common CK actions
There are some actions which can be used on every module. These are called common actions. You can list all common actions by running: ck help.
Furthermore, you can always call ck action module --help to learn about the input arguments and return values of an action.
Many of the common actions are for managing CK entries. The most important of them are:
- ck add module:entry adds a new CK entry called entry to the module named module.
- ck cp module1:entry1 module2:entry2 copies the CK entry called entry1 from module1 into entry2 in module2.
- ck find module:entry prints the path of the CK entry named entry from the module named module.
- ck mv module1:entry1 module2:entry2 moves the CK entry called entry1 from module1 to entry2 in module2.
- ck rm module:entry removes (deletes) an existing CK entry called entry from the module named module.
Where to go from here?
I only scratched the surface of CK. I haven’t talked about the metadata format (which is JSON) or the implementation of your own custom modules (which is commonly done in Python).
As I said in the beginning, there is plenty of documentation available on the CK wiki. It is incredibly useful to keep the vocabulary (entries, repositories, modules) and the grammar (ck action module) of CK in mind while reading these documents and starting to play around with CK.
The two most appropriate starting points are the Getting Started Guide and the Portable Workflows page.
To see how to implement your own workflow with CK by following an example, read the Getting Started Guide.
To learn how to implement portable workflows with CK, by
- Describing and detecting existing software
- Setting up software environment
- Automating installation of a missing software
- and more …
read the corresponding sections in the Portable Workflows page.
Also, ask questions on the CK mailing list. The community is very open to answering your questions!
[1] If you have trouble installing CK this way, you can find more information in the CK wiki.