Making a Theory FAIR

1 Creating a FAIR Theory

This tutorial takes the user through the distinct steps involved in making a theory FAIR. It uses the R-package theorytools and other specific software and platforms. As open science infrastructure is an area of active development, the approach proposed here should not be considered definitive, but rather as one proposal for a FAIR-compliant implementation of theory using infrastructure available at the time of writing. The steps described in this tutorial are largely automated by the function theorytools::create_fair_theory(); expert users might use this function directly.

library(theorytools)

1.0.1 Time to Complete

Estimated time to complete: 45-60 minutes.

1.0.2 Learning Goals

1.1 Running Example: the Empirical Cycle

Given that we based our argument for the importance of FAIR theory on the empirical cycle, we use it as an example for this tutorial. The empirical cycle is a model of cumulative knowledge production through scientific research, described in De Groot and Spiekerman (1969, p. 28):

Phase 1: ‘Observation’: collection and grouping of empirical materials; (tentative) formation of hypotheses.
Phase 2: ‘Induction’: formulation of hypotheses.
Phase 3: ‘Deduction’: derivation of specific consequences from the hypotheses, in the form of testable predictions.
Phase 4: ‘Testing’: of the hypotheses against new empirical materials, by way of checking whether or not the predictions are fulfilled.
Phase 5: ‘Evaluation’: of the outcome of the testing procedure with respect to the hypotheses or theories stated, as well as with a view to subsequent, continued or related, investigations.

2 Completing All Steps Manually

2.1 Creating a Project Folder

In the spirit of modular publishing, this tutorial assumes that you’re creating your FAIR theory as a standalone project. While it is possible for a theory to be implemented in a programming language like R, it often is not; the empirical cycle described above, for example, is implemented in plain text. Therefore, we will not create an R project (with an .Rproj file and so on), but just a regular project folder. This starts with creating an empty project folder. If we want to create a new folder called empirical_cycle in the existing folder c:/theories/, we can call:

project_path <- file.path("c:/theories", "empirical_cycle")
dir.create(project_path)
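
Note that dir.create() only issues a warning when the parent folder does not exist or when the target folder is already there. The following is a minimal sketch of a more defensive variant, using only base R. The helper name ensure_project_dir() is hypothetical (it is not part of theorytools), and a temporary folder stands in for c:/theories so that the example runs on any system:

```r
# Hypothetical helper (not part of theorytools): create the project folder,
# including any missing parent folders, and fail loudly if that does not work.
ensure_project_dir <- function(path) {
  if (!dir.exists(path)) {
    dir.create(path, recursive = TRUE, showWarnings = FALSE)
  }
  if (!dir.exists(path)) stop("Could not create folder: ", path)
  invisible(normalizePath(path, winslash = "/"))
}

# A temporary folder stands in for "c:/theories" so the example runs anywhere
demo_root <- file.path(tempdir(), "theories")
demo_path <- ensure_project_dir(file.path(demo_root, "empirical_cycle"))
dir.exists(demo_path)
```

Calling the helper a second time on the same path is harmless, which is convenient when re-running a script.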

2.2 Version Controlling The Project Folder

We use ‘Git’ to version control the project folder. If you do not already have ‘Git’ installed on your computer, install it now. You can verify that ‘Git’ is installed and working by running:

worcs::check_git()
#> ✔ Check if Git is available on the command line. ... done
#> ✔ Checking if libgit2 is properly installed, required for connecting to Git rem…
#> ✔ Initiating Git repository. ... done
#> ✔ Git user is configured. ... done
#> ✔ Adding files with `gert::git_add()`. ... done
#> ✔ Committing with `gert::git_commit()`. ... done

If this function shows a green checkmark, you can initialize version control in your project repository by running:

gert::git_init(path = project_path)
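
To confirm that initialization worked, you can check for the hidden .git folder and ask gert to locate the repository. This is a small sketch, assuming the gert package is installed; a temporary folder stands in for project_path:

```r
library(gert)

# A temporary folder stands in for project_path
repo_dir <- tempfile("empirical_cycle_")
dir.create(repo_dir)
git_init(path = repo_dir)

# A freshly initialized repository contains a hidden .git folder ...
has_git_folder <- dir.exists(file.path(repo_dir, ".git"))
# ... and gert can find the repository root from a path inside it
found_repo <- git_find(repo_dir)

has_git_folder
```

If git_find() raises an error instead of returning a path, the folder is not (yet) under version control.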

2.3 Connecting to a Remote (‘GitHub’) Repository

To make your FAIR theory accessible to collaborators and discoverable by the wider community, you must connect your local ‘Git’ repository to a remote repository on a platform like ‘GitHub’.

Before proceeding, ensure you have a ‘GitHub’ account. Academics may qualify for a free upgrade. To authorize ‘R’ to interact with your ‘GitHub’ account, run usethis::create_github_token(), which takes you to a website where you can create a personal access token (PAT). Copy it, then run gitcreds::gitcreds_set() and paste the PAT when asked. If you still experience problems, try usethis::gh_token_help().

To check that you are ready to proceed, run:

worcs::check_github()
#> ℹ Active project has a remote repository that requires PAT authentication.
#> ✔ Check for PAT. ... done

If you see a green checkmark, you can create a new repository on ‘GitHub’ directly from ‘R’:

worcs::git_remote_create("empirical_cycle", private = FALSE)

This command will create a new public repository on ‘GitHub’ and link it to your local repository. The private = FALSE argument makes the repository public.

Alternatively, you may have already created a remote repository on the ‘GitHub’ website. Either way, assuming the name of that repository is empirical_cycle, you can connect it to your project folder as follows:

worcs::git_remote_connect(project_path, remote_repo = "empirical_cycle")
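
Whichever route you take, the result at the ‘Git’ level is a remote (conventionally named origin) registered in the local repository. The sketch below illustrates this with gert; it is not how worcs implements the connection. The account name your-user-name is a placeholder, and a temporary folder stands in for project_path. Registering a remote does not contact ‘GitHub’, so the example runs offline:

```r
library(gert)

# A temporary repository stands in for project_path
repo_dir <- tempfile("empirical_cycle_")
dir.create(repo_dir)
git_init(path = repo_dir)

# "your-user-name" is a placeholder; replace it with your GitHub account name
remote_url <- "https://github.com/your-user-name/empirical_cycle.git"
git_remote_add(url = remote_url, name = "origin", repo = repo_dir)

# List the remotes known to the local repository
remotes <- git_remote_list(repo = repo_dir)
remotes
```

Running gert::git_remote_list(repo = project_path) on your own project shows which remote repository your pushes will go to.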

2.4 Adding a Shareable Theory File to the Repository

Your theory should be represented as a digital artifact, such as a structured plain-text document or a machine-readable file (e.g., ‘DOT’, ‘JSON’, ‘YAML’, ‘R’ code). At this point, we offer two alternatives.

2.4.1 Creating a Plain-Text Theory

You could simply copy De Groot’s implementation of the empirical cycle into a plain text file, like so:

# paste() joins the parts of each phase with a single space, so that the
# resulting file contains exactly one line per phase
writeLines(
  c(paste("*Phase 1:* 'Observation': collection and grouping of empirical materials;",
          "(tentative) formation of hypotheses."),
    "*Phase 2:* 'Induction': formulation of hypotheses.",
    paste("*Phase 3:* 'Deduction': derivation of specific consequences",
          "from the hypotheses, in the form of testable predictions."),
    paste("*Phase 4:* 'Testing': of the hypotheses against new empirical materials,",
          "by way of checking whether or not the predictions are fulfilled."),
    paste("*Phase 5:* 'Evaluation': of the outcome of the testing procedure",
          "with respect to the hypotheses or theories stated, as well as",
          "with a view to subsequent, continued or related, investigations.")
), file.path(project_path, "theory.txt"))
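
As a quick sanity check, you can read the file back and count how many lines open a new phase. This is a base R sketch; the helper count_phases() is hypothetical, and the shortened stand-in text only serves to make the example self-contained:

```r
# Hypothetical helper: count the lines of a theory file that open a new phase
count_phases <- function(path) {
  lines <- readLines(path, warn = FALSE)
  sum(grepl("^\\*Phase [0-9]+:\\*", lines))
}

# Shortened stand-in content so that the example is self-contained
demo_file <- tempfile(fileext = ".txt")
writeLines(
  c("*Phase 1:* 'Observation'", "*Phase 2:* 'Induction'",
    "*Phase 3:* 'Deduction'", "*Phase 4:* 'Testing'",
    "*Phase 5:* 'Evaluation'"),
  demo_file
)
count_phases(demo_file)
```

On your own project, count_phases(file.path(project_path, "theory.txt")) should report one match for each of the five phases.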

2.4.2 Further Formalizing the Empirical Cycle

If we compare it to the levels of theory formalization (Guest and Martin 2021), De Groot’s theory is either at the “theory” or “specification” level. It consists of a series of natural language statements. We can increase the level of formalization, and present an “implementation” in the human- and machine-readable DOT language:

theory <- 
"digraph {

  observation;
  induction;
  deduction;
  test;
  evaluation;
  
  observation -> induction;
  induction -> deduction;
  deduction -> test;
  test -> evaluation;
  evaluation -> observation;
  
}"

This language describes the model as a directed graph. Note that the code has been organized so that the first half describes an ontology of the entities the theory postulates, and the second half describes their proposed interrelations. This follows the first two properties of good theory according to Meehl (1990).

We can now write this implementation of the empirical cycle to a text file, say empirical_cycle.dot.

cat(theory, file = file.path(project_path, "empirical_cycle.dot"), sep = "\n")
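
One benefit of the DOT implementation is that software can read the proposed relations directly. The base R sketch below extracts the edges with a regular expression and checks that every phase has exactly one outgoing and one incoming edge, as expected for a cycle. It only handles simple DOT code of the form used here and is not a general DOT parser; the object names are illustrative:

```r
theory_dot <- "digraph {
  observation; induction; deduction; test; evaluation;
  observation -> induction;
  induction -> deduction;
  deduction -> test;
  test -> evaluation;
  evaluation -> observation;
}"

# Extract every 'from -> to' pair with a regular expression
edge_text <- regmatches(theory_dot,
                        gregexpr("\\w+\\s*->\\s*\\w+", theory_dot))[[1]]
edge_parts <- strsplit(edge_text, "\\s*->\\s*")
edges <- data.frame(
  from = vapply(edge_parts, `[`, character(1), 1),
  to   = vapply(edge_parts, `[`, character(1), 2)
)

# Every phase should be left exactly once and entered exactly once
one_in_one_out <- !any(duplicated(edges$from)) &&
  !any(duplicated(edges$to)) &&
  setequal(edges$from, edges$to)

edges
one_in_one_out
```

If the DiagrammeR package is installed, DiagrammeR::grViz(theory_dot) should additionally render the graph as a figure.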

2.5 Adding a LICENSE File to the Repository

A license ensures that others know how they can legally reuse your work. For FAIR theory, we recommend using the CC0 (Creative Commons Zero) license, which places your work in the public domain. Add a license file to your repository:

worcs::add_license_file(path = project_path, license = "cc0")
#> ✔ Writing license file ... done

2.6 Adding a README File to the Repository

A README file describes the repository’s contents and purpose, making it easier for others to understand and reuse your theory. The theorytools package contains a function to generate a README file with appropriate sections for FAIR theory, which can be used like so:

theorytools::add_readme_fair_theory(title = "The Empirical Cycle",
                                    path = project_path)
#> ✔ Creating README.md ... done

We encourage users to edit the resulting README.md file, in particular, to add relevant information about X-interoperability.

2.7 Adding ‘Zenodo’ Metadata to the Repository

‘Zenodo’ uses metadata files to archive and index repositories. Create a .zenodo.json file with metadata about your theory so that it is indexed appropriately:

add_zenodo_json_theory(
  path = project_path,
  title = "The Empirical Cycle",
  keywords = c("philosophy of science", "methodology")
)
#> ✔ Add 'Zenodo' metadata ... done
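
Because .zenodo.json is a plain JSON file, you can inspect it from ‘R’ before pushing. The sketch below assumes the jsonlite package is installed. The stand-in file is hand-written for illustration, and the assumption that the generated file contains title and keywords fields is based on the arguments used above, not on the package internals; the real file may contain additional fields:

```r
library(jsonlite)

# Hand-written stand-in for the generated file (an assumption for illustration)
demo_json <- tempfile(fileext = ".json")
writeLines('{
  "title": "The Empirical Cycle",
  "keywords": ["philosophy of science", "methodology"]
}', demo_json)

# Parse the file; invalid JSON would raise an error here
metadata <- fromJSON(demo_json)
metadata$title
metadata$keywords
```

For your own project, point fromJSON() at file.path(project_path, ".zenodo.json").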

2.8 Pushing These Changes to the Remote Repository

Version control requires adding files to be tracked to the repository (gert::git_add()), committing changes to those files (gert::git_commit()), and pushing them to the remote repository (gert::git_push()). The worcs function worcs::git_update() combines these three actions, acting like a kind of “quick-save” function:

worcs::git_update("First commit of my theory", repo = project_path)
#> ✔ Identify local 'Git' repository at "C:\\Users\\vanlissa\\AppData\\Local\\Temp…
#> ✔ Adding files to staging area of 'Git' repository. ... done
#> ✔ Committed staged files to 'Git' repository. ... done
#> ✖ Push local commits to remote repository. ... failed
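
For readers who want to see the three underlying actions separately, the sketch below stages and commits a file with gert in a temporary repository. It illustrates the add, commit, push sequence and is not the implementation of worcs::git_update(). The author identity is a placeholder, passed explicitly so that the example does not depend on your ‘Git’ configuration. The push is commented out because it requires a configured remote and credentials:

```r
library(gert)

# A temporary repository stands in for project_path
repo_dir <- tempfile("empirical_cycle_")
dir.create(repo_dir)
git_init(path = repo_dir)
writeLines("*Phase 1:* 'Observation'", file.path(repo_dir, "theory.txt"))

# 1. Stage the file
git_add("theory.txt", repo = repo_dir)

# 2. Commit it; "Jane Doe" is a placeholder identity
git_commit("First commit of my theory",
           author = git_signature("Jane Doe", "jane@example.com"),
           repo = repo_dir)

# 3. Push to the remote (not run: needs a remote and credentials)
# git_push(repo = repo_dir)

# The commit history now contains one entry
commit_log <- git_log(repo = repo_dir)
commit_log$message
```

If the push step fails in your own project, gert::git_remote_list(repo = project_path) and worcs::check_github() are reasonable first checks.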

2.9 Check Your ‘GitHub’ Repository

Navigate to your repository on ‘GitHub’ and check that all committed files, including the theory file, license, README, and ‘Zenodo’ metadata, are now visible in the remote repository (green box in the image below).

Furthermore, the repository visibility must be set to “Public” to ensure that ‘Zenodo’ can discover and archive it. If you created the repository programmatically as shown above, it should already be public (see red box in the image above). If necessary, change the visibility setting to Public by clicking on “Settings” > “General” > “Change repository visibility.”

2.10 Log in to ‘Zenodo’

Head over to zenodo.org. ‘Zenodo’ is a platform where you can permanently archive your code and other project elements. ‘Zenodo’ does this by assigning projects a Digital Object Identifier (DOI), which also helps to make the work more citable. This is different from ‘GitHub’, which is where the actual work on a project takes place, rather than a long-term archive. On ‘GitHub’, content can be modified, deleted, rewritten, and irreversibly changed, which makes it unsuitable as a target for long-lasting references. ‘Zenodo’ offers more security and permanence for research outputs.

If you already have a ‘Zenodo’ account, log in. If not, follow the steps to create one; you can also log in using your ‘GitHub’ account.

2.11 Authorize ‘GitHub’ to connect with ‘Zenodo’

On the ‘Zenodo’ website, authorize ‘Zenodo’ to connect to your ‘GitHub’ account in the “Using ‘GitHub’” section. ‘Zenodo’ will redirect you to ‘GitHub’ to ask for permission to use ‘webhooks’ on your repositories. Grant ‘Zenodo’ the permissions it needs to form those links.

2.12 Select the Repository to Archive

If you have got this far, ‘Zenodo’ is now authorized to configure the repository webhooks that it needs to archive the repository and issue it a DOI. To do this, navigate to the ‘GitHub’ repository listing page on the ‘Zenodo’ website and “flip the switch” next to your repository. If your repository does not show up in the list, you may need to press the ‘Synchronize now’ button. At the time of writing, we noticed that it can take quite a while (possibly hours) for ‘Zenodo’ to detect new ‘GitHub’ repositories. If so, take a break or come back to this last step tomorrow!

2.13 Optional: Check repository settings

If you were successful, you have now set up a new webhook between ‘Zenodo’ and your repository.

Optionally, you can verify this. In ‘GitHub’, open the settings for your repository and click the Webhooks tab in the left-hand menu. This should display the new webhook pointing to ‘Zenodo’. Note that it may take a little time for the webhook to show up.

2.14 Create a New Release

To archive a repository on ‘Zenodo’, you must create a new release. You can do this using the following code:

worcs::git_release_publish(repo = project_path)

If you have not previously published any releases, this function will assume that you want to use semantic versioning for both the release tag and the release title. This means that the first release will be labeled with version number “0.1.0”. Each subsequent release will automatically increment the trailing digit, e.g., “0.1.1”, “0.1.2”. If you make a more substantial change to the theory, you may want to manually increment the middle digit like so:

worcs::git_release_publish(repo = project_path,
                           tag_name = "0.2.0",
                           release_name = "0.2.0")
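
The numbering scheme described above can be sketched in base R as follows. The helper bump_version() is hypothetical and only illustrates the arithmetic; it is not the code worcs uses to determine the next tag:

```r
# Hypothetical helper: increment the trailing ("patch") or middle ("minor")
# digit of a three-part version number
bump_version <- function(version, part = c("patch", "minor")) {
  part <- match.arg(part)
  digits <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  stopifnot(length(digits) == 3, !anyNA(digits))
  if (part == "patch") {
    digits[3] <- digits[3] + 1L
  } else {
    digits[2] <- digits[2] + 1L
    digits[3] <- 0L  # the trailing digit restarts at zero
  }
  paste(digits, collapse = ".")
}

bump_version("0.1.0")           # "0.1.1"
bump_version("0.1.2", "minor")  # "0.2.0"
```

The result of such a helper could be passed as tag_name and release_name in the call shown above.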

2.15 Verify on ‘Zenodo’

To verify that your release was archived on ‘Zenodo’ and assigned a DOI, you need to visit the Uploads tab.

2.16 Entering Metadata

We can further document our ‘Zenodo’ archive as a FAIR theory by adding some extra information on ‘Zenodo’. On ‘Zenodo’, click the Uploads tab in the main menu, where you should find your newly uploaded repository.

Click the orange Edit button, and verify/supply the following information:

To save these changes, click ‘Publish’.

2.17 Verifying That ‘Zenodo’ Mints a DOI for Your Theory

After publishing a release, ‘Zenodo’ will archive the repository and mint a DOI. Verify this by checking the ‘Zenodo’ entry for your repository, where the DOI will be displayed. Include this DOI in any citations or references to your theory to enhance its discoverability and reusability.

The ‘GitHub’/‘Zenodo’ integration will assign one “mother-DOI” to the project, as well as a unique DOI to each version/release of the FAIR theory. This enables users to refer to and cite specific versions of the theory. The list of authors for the citation is automatically determined by the ‘GitHub’ user account names used by the repository - this can be edited on ‘Zenodo’, as explained above. DOIs used in ‘Zenodo’ are registered through the DataCite service.
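
Once you know the DOI, you can turn it into a resolvable link and a README badge. The base R sketch below uses the ‘Zenodo’ DOI of Tennant et al. (2018) from the reference list as an example. The helper doi_links() is hypothetical, the format check is a rough heuristic, and the badge address follows the pattern ‘Zenodo’ used at the time of writing (an assumption; copy the snippet offered by ‘Zenodo’ itself to be safe):

```r
# Hypothetical helper: build a resolvable URL and a Markdown badge for a DOI
doi_links <- function(doi) {
  # Rough format check: "10.", a registrant code, a slash, and a suffix
  stopifnot(grepl("^10\\.[0-9]{4,9}/[^[:space:]]+$", doi))
  url <- paste0("https://doi.org/", doi)
  # Badge address pattern is an assumption based on Zenodo's own snippets
  badge <- sprintf("[![DOI](https://zenodo.org/badge/DOI/%s.svg)](%s)", doi, url)
  list(url = url, badge = badge)
}

links <- doi_links("10.5281/zenodo.1434288")
links$url
links$badge
```

Use the “mother-DOI” if the link should always resolve to the latest version, and a version-specific DOI if readers need the exact release you cite.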

Pro-tip: Check the Citation field on the ‘Zenodo’ page, and copy-paste it into the README file of your ‘GitHub’ repo to make cross-linking even easier (or refer users to the ‘Zenodo’ page to find the citation, which obviates the need to manually update this information). Click the DOI badge in the Details field to get instructions on how to add a clearly highlighted DOI badge to your ‘GitHub’ repository, so that users can see and use your DOI.

2.18 CONGRATULATIONS!

Your FAIR theory is now archived in ‘Zenodo’ with a DOI, and each new release of the repository receives its own versioned DOI. You should be able to see details of this on the ‘GitHub’ page on ‘Zenodo’, under the entry for your repository. This also means that your archived projects can be picked up by other indexing services and search engines that use DOIs.

Providing a long-term archive and a DOI for your work is required for others to be able to properly cite it, as this provides basic citation metadata. For Open Science, it is important to be able to comprehensively cite the resources that you use in your research, including theory, and this workflow enables that to happen, in line with best practices. Making theory FAIR also helps raise theory to the standard of other research outputs, like papers and software.

Pro-tip: Is your research funded by an EU grant? Now you can directly connect your FAIR theory to your grant by updating the grant section of the metadata on the project’s ‘Zenodo’ record. This massively helps to increase its discoverability!

2.19 Checklist for citing your project

So now you have a sustainably archived ‘GitHub’ repository in ‘Zenodo’ that is ready to be re-used and cited! Before continuing, make sure that you have:

3 Optional: Everything in One Step

The function theorytools::create_fair_theory() automates most of the preceding steps, up to step 2.8. Assuming you have already created a shareable theory file called theory.txt which resides in the currently active directory (getwd()), you can create your FAIR theory as follows:

create_fair_theory(
  path = file.path("c:/theories", "empirical_cycle"),
  title = "The Empirical Cycle, Again",
  theory_file = "theory.txt",
  remote_repo = "empirical_cycle2",
  add_license = "cc0")

You should still complete steps 2.12 - 2.17 manually.

4 References

This tutorial is partly adapted from Module 5, Task 2 of Tennant et al. (2018).

De Groot, Adriaan D., and J. A. A. Spiekerman. 1969. Methodology: Foundations of Inference and Research in the Behavioral Sciences. De Gruyter Mouton. https://doi.org/10.1515/9783112313121.
Guest, Olivia, and Andrea E. Martin. 2021. “How Computational Modeling Can Force Theory Building in Psychological Science.” Perspectives on Psychological Science 16 (4): 789–802. https://doi.org/10.1177/1745691620970585.
Meehl, Paul E. 1990. “Appraising and Amending Theories: The Strategy of Lakatosian Defense and Two Principles That Warrant It.” Psychological Inquiry 1 (2): 108–41. https://doi.org/10.1207/s15327965pli0102_1.
Tennant, Jon, Simon Worthington, Tania Allard, Philipp Zumstein, Daniel S. Katz, Alexander Morley, Stephan Druskat, et al. 2018. “OpenScienceMOOC/Module-5-Open-Research-Software-and-Open-Source: Second Release.” Zenodo. https://doi.org/10.5281/zenodo.1434288.