diff --git a/.gitignore b/.gitignore
index bcc1406..4d83dbe 100644
--- a/.gitignore
+++ b/.gitignore
@@ -4,3 +4,6 @@
# Additional
__pycache__
+# MkDocs build output
+site/
+
diff --git a/.markdownlint.json b/.markdownlint.json
index 8d238b9..dd39775 100644
--- a/.markdownlint.json
+++ b/.markdownlint.json
@@ -1,5 +1,8 @@
{
"MD007": { "indent": 4 },
"no-hard-tabs": false,
- "MD013": false
-}
\ No newline at end of file
+ "MD013": false,
+ "MD026": { "punctuation": ".,;:" },
+ "MD040": false,
+ "MD046": false
+}
diff --git a/.markdownlintignore b/.markdownlintignore
new file mode 100644
index 0000000..50e524a
--- /dev/null
+++ b/.markdownlintignore
@@ -0,0 +1,2 @@
+docs/wiki-guide/HF_*_Template*.md
+mkdocs.yaml
diff --git a/CITATION.cff b/CITATION.cff
index 76eeb1d..10bb984 100644
--- a/CITATION.cff
+++ b/CITATION.cff
@@ -1,4 +1,4 @@
-abstract: "Imageomics-focused guide to collaborative work, including GitHub and Hugging Face workflows."
+abstract: "Template guide site for collaborative work, including GitHub and Hugging Face workflows."
authors:
- family-names: "Campolongo"
given-names: "Elizabeth G."
@@ -6,17 +6,26 @@ authors:
- family-names: "Thompson"
given-names: "Matthew J."
orcid: "https://orcid.org/0000-0003-0583-8585"
-- family-names: "Zoe"
- given-names: "Duan"
+- family-names: Zhang
+ given-names: Net
+ orcid: "https://orcid.org/0000-0003-2664-451X"
+- family-names: "Duan"
+ given-names: "Zoe"
orcid: "https://orcid.org/0000-0002-8547-5907"
- family-names: "Bradley"
given-names: "John"
orcid: "https://orcid.org/0000-0003-3858-848X"
+- family-names: Eyriay
+ given-names: Iuliia
+ orcid: "https://orcid.org/0009-0007-1597-8684"
+- family-names: Taylor
+ given-names: Graham
+ orcid: "https://orcid.org/0000-0001-5867-3652"
- family-names: "Lapp"
given-names: "Hilmar"
orcid: "https://orcid.org/0000-0001-9107-0714"
cff-version: 1.2.0
-date-released: "2024-11-DD"
+date-released: "2025-06-DD" # to update on release
identifiers:
- description: "The GitHub release URL of tag v1.0.0."
type: url
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000..76864a9
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,234 @@
+# Contributing to the Imageomics Guide
+
+Thank you for your interest in contributing to the Imageomics Guide!
+
+This document outlines the standards and guidelines for contributing to the Imageomics Guide. Before you begin, please review the information provided here.
+
+First, is your contribution specific to Imageomics, or would it be more broadly applicable? If more general, please consider instead directing the update or suggestion to the [Collaborative Distributed Science Guide](https://github.com/Imageomics/Collaborative-distributed-science-guide); updates to the template repository will be incorporated both here and in other guides developed from it. If it _is_ Imageomics-specific, please continue to review this document; we look forward to your input!
+
+## Overview
+
+The Imageomics Guide is built with [MkDocs Material](https://squidfunk.github.io/mkdocs-material/) and deployed via GitHub Pages. All documentation is written in Markdown and follows specific formatting standards to ensure consistent rendering and maintainability.
+
+## Getting Started
+
+### Local Development Setup
+
+1. Clone the repository
+2. Set up a virtual environment (recommended):
+
+ ```bash
+ python -m venv .venv
+ source .venv/bin/activate # On Windows: .venv\Scripts\activate
+ ```
+
+ For more detailed environment setup options (including conda), see our [Virtual Environments guide](docs/wiki-guide/Virtual-Environments.md).
+
+3. Install dependencies:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+4. Serve the site locally:
+
+ ```bash
+ mkdocs serve
+ ```
+
+5. View the site at <http://127.0.0.1:8000/Imageomics-guide/>
+
+### Testing Changes
+
+Always test your changes locally with `mkdocs serve` before submitting a PR to ensure:
+
+- Content renders correctly
+- Links work properly
+- Formatting appears as intended
+- No build errors occur
+
+## Documentation Standards
+
+### Markdown Formatting
+
+#### Indentation for Nested Lists
+
+- **Use 4 spaces for nested list items** (not 2 spaces)
+- This requirement exists due to Python-Markdown compatibility issues with MkDocs
+- 2-space indentation causes nested lists to not render properly in the final HTML
+
+**Correct:**
+
+```markdown
+- [ ] Main item
+ - [ ] Nested item
+ - [ ] Another nested item
+```
+
+**Incorrect:**
+
+```markdown
+- [ ] Main item
+ - [ ] Nested item (will not render as nested)
+```
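As an illustration alongside this patch (not part of `CONTRIBUTING.md` itself), the 4-space rule above can be checked with a small script. This is a minimal sketch using only the Python standard library; the function name `find_bad_indents` and the regular expression are assumptions made for this example, and it is a heuristic rather than a replacement for markdownlint's `MD007` check.

```python
import re

# A Markdown list item: leading spaces, then a bullet or "1." style marker.
LIST_ITEM = re.compile(r"^( *)(?:[-*+]|\d+\.) ")


def find_bad_indents(markdown_text, indent=4):
    """Return (line_number, leading_spaces) for every list item whose
    indentation is not a multiple of `indent`."""
    problems = []
    in_fence = False
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # ignore list-like lines inside code fences
            continue
        if in_fence:
            continue
        match = LIST_ITEM.match(line)
        if match and len(match.group(1)) % indent != 0:
            problems.append((number, len(match.group(1))))
    return problems


sample = (
    "- [ ] Main item\n"
    "  - [ ] Nested with 2 spaces\n"
    "    - [ ] Nested with 4 spaces\n"
)
print(find_bad_indents(sample))  # [(2, 2)]
```

Only the 2-space item is reported; the top-level item and the 4-space item pass.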
+
+#### General Formatting
+
+- Remove trailing whitespace
+- Use consistent line breaks
+- Follow the project's `.markdownlint.json` configuration
+- Ensure proper heading hierarchy (don't skip heading levels)
+
+### License Format Requirements
+
+#### Hugging Face YAML Frontmatter
+
+When specifying licenses in Hugging Face dataset/model card YAML sections, **always use lowercase**:
+
+**Correct:**
+
+```yaml
+license: cc0-1.0
+```
+
+**Incorrect:**
+
+```yaml
+license: CC0-1.0 # Will cause issues with Hugging Face platform
+```
+
+This is a platform-specific requirement for Hugging Face compatibility.
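As an illustration alongside this patch (not part of `CONTRIBUTING.md` itself), the lowercase rule above can be checked before pushing a card. This is a minimal sketch using only the Python standard library; the helper name `license_is_lowercase` is an assumption for this example, and it does not check the value against Hugging Face's list of accepted license identifiers.

```python
import re


def license_is_lowercase(card_text):
    """Return True if the `license:` value in the YAML front matter is
    lowercase, False if it is not, and None if no license field is found."""
    # Front matter is the block between the first two `---` lines.
    front_matter = re.match(r"^---\n(.*?)\n---", card_text, flags=re.DOTALL)
    if front_matter is None:
        return None
    field = re.search(
        r"^license:[ \t]*(\S+)", front_matter.group(1), flags=re.MULTILINE
    )
    if field is None:
        return None
    value = field.group(1).strip("\"'")
    return value == value.lower()


good_card = "---\nlicense: cc0-1.0\n---\n# Dataset Card\n"
bad_card = "---\nlicense: CC0-1.0\n---\n# Dataset Card\n"
print(license_is_lowercase(good_card))  # True
print(license_is_lowercase(bad_card))   # False
```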
+
+#### License References in Text
+
+In prose text, you may use standard capitalization (e.g., "CC0", "MIT"), but YAML frontmatter must be lowercase.
+
+### File Organization
+
+- Documentation content goes in `docs/`
+- Wiki-style guides go in `docs/wiki-guide/`
+- Images and assets are organized in subdirectories within `docs/`
+- Templates use descriptive filenames with consistent naming patterns
+
+### Custom Macros
+
+The project includes custom MkDocs macros defined in `main.py`:
+
+- `include_file_as_code()` - Embeds file content as code blocks
+- When using macros, ensure proper syntax and test rendering locally
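For readers unfamiliar with how such a macro is registered, here is a minimal sketch, assuming the site uses the mkdocs-macros plugin, which loads a `define_env(env)` hook from `main.py`. The project's actual `main.py` is not shown in this diff, so the signature (`path`, `language`) and the body below are hypothetical and may differ from the real `include_file_as_code()`.

```python
from pathlib import Path


def define_env(env):
    """Hook called by the mkdocs-macros plugin to register macros."""

    @env.macro
    def include_file_as_code(path, language=""):
        # Hypothetical implementation: read a file and wrap it in a fence.
        fence = "`" * 3  # built at runtime so this example nests cleanly
        content = Path(path).read_text(encoding="utf-8")
        return f"{fence}{language}\n{content.rstrip()}\n{fence}"
```

Under these assumptions, a page would call the macro with Jinja syntax, for example `{{ include_file_as_code("path/to/file.yaml", "yaml") }}`, where the path is a placeholder.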
+
+## Contribution Process
+
+1. **Create an issue** describing the change (for significant additions)
+2. **Create a feature branch** from `dev`
+3. **Make your changes** following the standards above
+4. **Test locally** with `mkdocs serve`
+5. **Run linting (OPTIONAL)** to ensure formatting consistency
+ - See instructions in [Linting](#linting)
+6. **Submit a pull request** with:
+ - Clear description of changes
+ - Reference to related issue
+ - Screenshots if UI changes are involved
+
+### Pull Request Guidelines
+
+- Keep PRs focused on a single topic when possible
+- Follow commit message conventions (see below)
+- Update navigation in `mkdocs.yaml` if adding new pages
+- Ensure all links work correctly
+- Test that the site builds without errors
+
+### Commit Message Guidelines
+
+The most important aspects of good commit messages are that they should be **descriptive** and **atomic** (each commit should represent a single logical change). Additionally:
+
+- **Keep the first line short**: Limit the subject line to 50 characters or less
+- **Use the imperative mood**: "Add feature" not "Added feature" or "Adds feature"
+- **Separate subject from body**: Use a blank line between the subject line and detailed description
+
+#### Conventional Commits Recommendation
+
+We recommend following the [Conventional Commits](https://www.conventionalcommits.org/) format for commit messages:
+
+**Format:** `type(scope): description`
+
+**Common types:**
+
+- `feat`: New feature or content addition
+- `fix`: Bug fix or correction
+- `docs`: Documentation updates
+- `style`: Formatting changes (no content changes)
+- `refactor`: Code/content restructuring without changing functionality
+- `chore`: Maintenance tasks, tooling updates
+
+**Examples:**
+
+```bash
+feat(fair-guide): add data repository checklist
+fix(templates): correct license format in HF dataset card
+docs(contributing): add conventional commit guidelines
+style(checklists): fix markdown formatting and indentation
+chore: update mkdocs dependencies
+```
+
+**Scope** is optional but helpful for larger changes. Use the guide section or file type being modified.
+
+**Note:** Since we use squash merges, strict adherence to this format isn't required, but descriptive and atomic commits help maintain a clear project history.
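As an illustration alongside this patch (not part of `CONTRIBUTING.md` itself), the subject-line guidance above can be expressed as a small check. This is a minimal sketch using only the Python standard library; the name `check_subject` and the regular expression are assumptions for this example, and the type list mirrors only the "Common types" listed above (Conventional Commits permits others).

```python
import re

# type, optional "(scope)", then ": " and a description.
SUBJECT = re.compile(r"^(feat|fix|docs|style|refactor|chore)(\([\w-]+\))?: \S.*$")


def check_subject(subject, max_length=50):
    """Return a list of problems found in a commit subject line."""
    problems = []
    if len(subject) > max_length:
        problems.append(
            f"subject is {len(subject)} characters (limit {max_length})"
        )
    if not SUBJECT.match(subject):
        problems.append("subject does not match type(scope): description")
    return problems


print(check_subject("feat(fair-guide): add data repository checklist"))  # []
print(check_subject("Updated some stuff"))
```

The first subject passes both checks; the second is short enough but lacks a `type(scope):` prefix, so one problem is reported.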
+
+## Quality Assurance
+
+### Linting
+
+The project uses [markdownlint](https://github.com/DavidAnson/markdownlint) with configuration in `.markdownlint.json`. Key settings:
+
+- 4-space indentation for lists (`MD007`).
+- Hard tab restrictions disabled (`no-hard-tabs`).
+- Line length restrictions disabled (`MD013`).
+- Restrict punctuation in headers (`MD026`); allow `!` and `?`.
+- Allow code blocks without language specification (`MD040`).
+- Code block style checks disabled (`MD046`), as fenced code blocks commonly raise errors when indented (see [discussion](https://github.com/DavidAnson/markdownlint/issues/327)).
+
+For faster PR review, you may want to run linting locally; we do have a PR Action in place as well. First install markdownlint, then run
+
+```console
+markdownlint -c .markdownlint.json -f docs/wiki-guide/
+```
+
+The `-f` resolves simple formatting issues, and alerts will be raised for more complicated linter style rules (e.g., referencing a link as `[here](URL)` will produce the line: `.md:191:2 MD059/descriptive-link-text Link text should be descriptive [Context: "[here]"]`).
+
+### Content Review
+
+When reviewing content:
+
+- Verify accuracy of technical information
+- Check for consistency with existing guides
+- Ensure proper cross-referencing between related pages
+- Validate that external links are current and working
+
+## Platform-Specific Considerations
+
+### Hugging Face Integration
+
+- Dataset and model card templates must follow HF specifications
+- YAML frontmatter formatting is critical for platform compatibility
+- License identifiers must match HF's expected format
+
+### MkDocs/Python-Markdown
+
+- Nested list rendering requires specific indentation
+- Some Markdown extensions may behave differently than GitHub Flavored Markdown
+- Always test complex formatting locally
+
+## Getting Help
+
+- Open an [issue](https://github.com/Imageomics/Imageomics-guide/issues) for questions or problems
+- Reference existing guides and templates for examples
+- Check the [MkDocs Material documentation](https://squidfunk.github.io/mkdocs-material/) for advanced features
+
+## Code of Conduct
+
+All contributors must adhere to our [Code of Conduct](docs/CODE_OF_CONDUCT.md) and organizational principles of engagement.
+
+---
+
+Thank you for helping improve the Imageomics Guide! Your contributions help make collaborative scientific computing more accessible and effective.
diff --git a/README.md b/README.md
index bf7b3b2..734f65f 100644
--- a/README.md
+++ b/README.md
@@ -2,22 +2,39 @@
Welcome to the Imageomics Guide!
-Just joining or starting a new project?
-Checkout the [Imageomics Guide](https://imageomics.github.io/Imageomics-guide/) for guidance on conventions and best practices.
+Just joining or starting a new project?
+Check out the [Imageomics Guide](https://imageomics.github.io/Imageomics-guide/) for guidance on conventions and best practices.
## About the Guide
-This guide started as an Institute-internal wiki, focused on providing guidance and best practices for collaborative and interdisciplinary (computer science + biology) work. Recognizing that the topics and suggestions are broadly applicable to anyone working in similar or adjacent fields, we moved the vast majority to this [guide](https://imageomics.github.io/Imageomics-guide/). To increase accessibility for those less familiar with GitHub, we generated the [web site](https://imageomics.github.io/Imageomics-guide/) from our Markdown documents (which used to be wiki pages) with [Material for MkDocs](https://squidfunk.github.io/mkdocs-material/).
+This guide started as an Imageomics Institute-internal wiki, focused on providing guidance and best practices for collaborative and interdisciplinary (computer science + biology) work. Recognizing that the topics and suggestions are broadly applicable to anyone working in similar or adjacent fields, we moved the vast majority to this [guide](https://imageomics.github.io/Imageomics-guide/). To increase accessibility for those less familiar with GitHub, we generated the website from our Markdown documents (which used to be wiki pages) with [Material for MkDocs](https://squidfunk.github.io/mkdocs-material/).
-Please feel free to open an [issue](https://github.com/Imageomics/Imageomics-guide/issues) with any questions regarding the content fo this guide or if you would like to contribute to the [Glossary](https://imageomics.github.io/Imageomics-guide/wiki-guide/Glossary-for-Imageomics/) or [Helpful Tools page](https://imageomics.github.io/Imageomics-guide/wiki-guide/Helpful-Tools-for-your-Workflow/).
+Please feel free to open an [issue](https://github.com/Imageomics/Imageomics-guide/issues) with any questions regarding the content of this guide.
+
+## Contributing
+
+If you'd like to contribute to this guide, please read our [Contributing Guidelines](CONTRIBUTING.md) for information about our standards, development workflow, and submission process.
### Testing
+
To test this site locally, first clone this repository, then create an environment with `requirements.txt`
+
```
pip install -r requirements.txt
```
+
and run `mkdocs serve`:
+
```
mkdocs serve
```
-Then the site will run at http://127.0.0.1:8000/Imageomics-guide/.
+
+Then the site will run at <http://127.0.0.1:8000/Imageomics-guide/>.
+
+### History
+
+This guide houses the information needed to get started with and use institute resources readily available to all members. However, most of its content is applicable to anyone working more broadly in the field of [_imageomics_](https://imageomics.github.io/Imageomics-guide/wiki-guide/Glossary-for-Imageomics.md/#imageomics) or adjacent fields of computer and data science, and it is tailored to help domain scientists bridging that gap. We further expanded development to include a more general template guide, the [Collaborative Distributed Science Guide](https://imageomics.github.io/Collaborative-distributed-science-guide/), for others wishing to develop a similar organization-specific guide (please see the [template repository](https://github.com/Imageomics/Collaborative-distributed-science-guide) for more information). This solution was born out of the desire to do so for the [AI and Biodiversity Change (ABC) Global Center](http://abcresearchcenter.org) while limiting duplicative updates between guides (Imageomics and ABC share some team members on this project).
+
+## Acknowledgments
+
+This work was supported by both the [Imageomics Institute](https://imageomics.org) and the [AI and Biodiversity Change (ABC) Global Center](http://abcresearchcenter.org). The Imageomics Institute is funded by the US National Science Foundation's Harnessing the Data Revolution (HDR) program under [Award #2118240](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2118240) (Imageomics: A New Frontier of Biological Information Powered by Knowledge-Guided Machine Learning). The ABC Global Climate Center is funded by the US National Science Foundation under [Award No. 2330423](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2330423&HistoricalAwards=false) and Natural Sciences and Engineering Research Council of Canada under [Award No. 585136](https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=782440). This guide draws on research supported by the Social Sciences and Humanities Research Council. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, Natural Sciences and Engineering Research Council of Canada, or Social Sciences and Humanities Research Council.
diff --git a/docs/CODE_OF_CONDUCT.md b/docs/CODE_OF_CONDUCT.md
index b096ac8..b27ef6d 100644
--- a/docs/CODE_OF_CONDUCT.md
+++ b/docs/CODE_OF_CONDUCT.md
@@ -6,13 +6,13 @@ To this end, we agree as individuals and as a group to:
- **Listen to understand.** When one person talks, others listen.
- **Speak to be understood.** We use lay terms and are patient with people who are not experts in our specific field. We are all learning, no matter who we are.
-- Embrace **“Yes and…”** Focus on possibilities instead of obstacles. Be inclusive of other people’s ideas. Honor divergence.
+- Embrace **“Yes and…”** Focus on possibilities instead of obstacles. Be inclusive of other people’s ideas. Honor divergence.
- **Take space / make space.** Those who tend to talk a lot are intentional about letting others talk first, while those who tend to hold back are intentional about contributing.
-- **Beware of blind spots.** We do not know what we do not know. We are vigilant for differences among our experiences and positions.
-- **Respect time.** When a session is over, we need to move on. There is designated time for in-depth follow up and continuing conversations.
-- **Care** for each other. We bring our full selves to the community, and we look out for each other wholeheartedly.
+- **Beware of blind spots.** We do not know what we do not know. We are vigilant for differences among our experiences and positions.
+- **Respect time.** When a session is over, we need to move on. There is designated time for in-depth follow up and continuing conversations.
+- **Care** for each other. We bring our full selves to the community, and we look out for each other wholeheartedly.
-We abide by these principles in all Imageomics spaces, including but not limited to digital and in-person meetings, formal and informal gatherings, online discussion forums and chat spaces, and field and lab work.
+We abide by these principles in all Imageomics spaces, including but not limited to digital and in-person meetings, formal and informal gatherings, online discussion forums and chat spaces, and field and lab work.
Acts of misconduct are prohibited. Those found to engage in misconduct will be subject to dismissal from the project and further actions as directed by the guidelines of the employers and the place of incidence.
@@ -25,11 +25,19 @@ If you believe you have experienced or witnessed misconduct in an Imageomics set
Privacy will be protected to the greatest extent possible.
## VALUES
+
### TRANSPARENCY
+
We ensure our efforts are clear about assumptions, uncertainty, and limits, and provide open sources of information, processes, and discovery.
+
### ACCOUNTABILITY
+
We are responsible, individually and collectively, for the outcomes we produce and ensure, to the best of our abilities, that the methods outcome matches intended use.
+
### COLLABORATION
+
We create and nurture collaborative environments and welcome, value, and affirm all members of our community. We also consider how and for whom solutions are created and promote the heterogeneity of perspectives in the creation process. We actively engage others’ perspectives, recognize everyone’s potential to contribute new ideas, and work together to find creative solutions to complex problems.
+
### SAFETY
+
We ensure our practices are ethical and impartial to the best of our ability. We address ethical issues when we discover them and practice good data governance. We strive to enhance practices while openly addressing those that harm people or the environment.
diff --git a/docs/index.md b/docs/index.md
index a77dfa6..326bed3 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,23 +1,25 @@
-# Welcome to the Imageomics Institute Guide!
+# Welcome to the Imageomics Guide!
-This website hosts guides to Imageomics workflows, documentation, and general best-practices for collaborative science. We aim to provide a helpful resource for a broad base of scientists working in the field of [_imageomics_](wiki-guide/Glossary-for-Imageomics.md/#imageomics) and beyond.
+This website hosts Imageomics-focused guides to FAIR (findable, accessible, interoperable, reusable) and reproducible workflows, documentation, and general best practices for collaborative science. We aim to provide a helpful resource for scientists working in [_imageomics_](wiki-guide/Glossary-for-Imageomics.md/#imageomics) and related interdisciplinary fields.
-It houses the information needed to get started with and use institute resources readily available to all members. However, most of this guide is applicable to anyone working more broadly in the field of imageomics or adjacent fields of computer and data science, and it is tailored to help domain scientists bridging that gap.
+This guide houses the information needed to get started with and use Imageomics Institute resources readily available to all members. However, most of this guide is applicable to anyone working more broadly in the field of imageomics or adjacent fields of computer and data science, and it is tailored to help domain scientists bridging that gap.
## Highlights
There are many pages of useful information contained in this guide covering a range of topics from project management and workflows, to repositories and archives, to a glossary of _imageomics-related_ terms for improved interdisciplinary communication.
### Just starting a project?
+
Check out our guides to get your project off on the right foot!
-- [The GitHub Repo Guide](wiki-guide/GitHub-Repo-Guide.md): This page reviews expected and suggested GitHub repository contents, as well as structural considerations.
+- [The GitHub Repo Guide](wiki-guide/GitHub-Repo-Guide.md): This page reviews expected and suggested GitHub repository contents, as well as structural considerations.
- [The Hugging Face Repo Guide](wiki-guide/Hugging-Face-Repo-Guide.md): Analogous expected and suggested repository contents for Hugging Face repositories; there are notable differences from GitHub in both content and structure.
- [FAIR Guide](wiki-guide/FAIR-Guide.md): Guide to producing FAIR digital products, from metadata collection through product documentation and publication. This builds on the content in both the GitHub and Hugging Face Repository Guides, providing checklists to ensure [code](wiki-guide/Code-Checklist.md), [data](wiki-guide/Data-Checklist.md), and [model](wiki-guide/Model-Checklist.md) repositories are FAIR. The latter two closely follow our [HF Templates](wiki-guide/About-Templates.md).
### Project repo up, what's next?
+
Check out our workflow guides for how to interact with your new repo:
- [The GitHub Workflow](wiki-guide/The-GitHub-Workflow.md): This page mainly focuses on branching and the PR process.
@@ -25,6 +27,7 @@ Check out our workflow guides for how to interact with your new repo:
- [The Hugging Face Workflow](wiki-guide/The-Hugging-Face-Workflow.md): Analogous workflow directions for Hugging Face; there are notable differences from GitHub in how this process works practically, though the concept is the same.
### Project management or organization got you down?
+
Discover new tools to help:
- [Guide to GitHub Projects](wiki-guide/Guide-to-GitHub-Projects.md): This page focuses on GitHub's project management tool, Projects, which integrates issues and pull requests into a unified task board to keep tabs on how your project is progressing. Labels, milestones, and assignee tags provide improved organization, and allow for more focused views.
@@ -33,39 +36,34 @@ Discover new tools to help:
- [Virtual Environments](wiki-guide/Virtual-Environments.md): Summary of `conda` and `pip` environments: how to make, use, and share them.
-
## Collaborative Infrastructure We Use
- GitHub
- - [Institute Code Repositories](https://github.com/Imageomics) where we store our code (software + tools).
+ - [Imageomics Code Repositories](https://github.com/Imageomics), where we store our code (software + tools).
- GitHub's [Docs](https://docs.github.com/en)
- [Repositories](https://docs.github.com/en/repositories)
- [GitHub Projects](https://docs.github.com/en/issues/planning-and-tracking-with-projects)
- Hugging Face
- - [Imageomics Organization page](https://huggingface.co/imageomics) where we store our datasets and models (and their metadata).
+ - [Imageomics Organization Page](https://huggingface.co/imageomics), where we store our datasets and models (and their metadata).
- Additionally, use [Hugging Face Spaces](https://huggingface.co/docs/hub/spaces) to run demos of models and other projects.
- Hugging Face's [Docs](https://huggingface.co/docs)
- [Model Hub](https://huggingface.co/docs/hub/models-the-hub)
- [Datasets](https://huggingface.co/docs/hub/datasets-overview)
### Collaborative Infrastructure Diagram
-
+
## Imageomics Branding (Logos)
-We have two versions of the logo, a [fish](logos/Imageomics_logo_fish.png) and a [butterfly](logos/Imageomics_logo_butterfly.png), which should be used for scientific posters, conference, workshop, and meeting marketing materials, etc. Choice of logo is based on user preference.
+We have two versions of the logo, a [fish](https://github.com/Imageomics/Imageomics-guide/blob/main/docs/logos/Imageomics_logo_fish.png) and a [butterfly](https://github.com/Imageomics/Imageomics-guide/blob/main/docs/logos/Imageomics_logo_butterfly.png), which should be used for scientific posters, conference, workshop, and meeting marketing materials, etc. Choice of logo is based on user preference.
{: style="width:45%"}
{: style="width:45%"}
-
## Other pages of note
+
- [Glossary for Imageomics](wiki-guide/Glossary-for-Imageomics.md): Collection of terms used in imageomics. The goal is to ensure all participating domains are represented, thus facilitating interdisciplinary communication. This is a group effort, please check it out and add terms you think should be there!
- [Command Line Cheat Sheet](wiki-guide/Command-Line-Cheat-Sheet.md): Collection of useful bash and git commands with some git tips.
-
-
-
-
!!! question "[Questions, Comments, or Concerns?](https://github.com/Imageomics/Imageomics-guide/issues)"
diff --git a/docs/stylesheets/extra.css b/docs/stylesheets/extra.css
index 1a2a978..05db76f 100644
--- a/docs/stylesheets/extra.css
+++ b/docs/stylesheets/extra.css
@@ -9,3 +9,17 @@
.md-footer-generator {
order: 3;
}
+
+/*
+Make logo larger to be legible,
+based on this suggestion: https://github.com/squidfunk/mkdocs-material/discussions/2933#discussioncomment-1168075
+*/
+ .md-header__button.md-logo {
+ margin: 0;
+ padding: 1;
+}
+
+.md-header__button.md-logo img, .md-header__button.md-logo svg {
+ height: 2.8rem;
+ width: 2.8rem;
+}
diff --git a/docs/wiki-guide/About-Digital-Product-Policies.md b/docs/wiki-guide/About-Digital-Product-Policies.md
new file mode 100644
index 0000000..0502c6a
--- /dev/null
+++ b/docs/wiki-guide/About-Digital-Product-Policies.md
@@ -0,0 +1,3 @@
+# Imageomics Digital Product Policies
+
+This section contains the Imageomics digital policy documents. The Digital Products Release and Licensing Policy sets expectations for the publication of all digital products (e.g., data, models, code). The Digital Product Life Cycle is a project life cycle guide intended to clarify the best practices—described throughout this guide—that will aid in compliance with the Digital Products Release and Licensing Policy. It is also generally a good reference for project planning.
diff --git a/docs/wiki-guide/About-Templates.md b/docs/wiki-guide/About-Templates.md
index 0531296..5a707ba 100644
--- a/docs/wiki-guide/About-Templates.md
+++ b/docs/wiki-guide/About-Templates.md
@@ -1,15 +1,13 @@
# Using Dataset and Model Card Templates
-We provide Dataset and Model Card templates for both Imageomics and ABC, adapted from Hugging Face's templates. The Imageomics and ABC templates include guidance and examples for the various metadata sections, reference information for Hugging Face's particular flavor of markdown, and the appropriate NSF & NSERC grant acknowledgment.
+We provide Imageomics-specific Dataset and Model Card templates, adapted from Hugging Face's templates. These templates include guidance and examples for the various metadata sections, reference information for Hugging Face's particular flavor of markdown, and the appropriate NSF grant acknowledgment.
-To use a template for a new dataset or model repository on Hugging Face (HF), simply copy and paste the contents of the appropriate template ([Dataset Card](HF_DatasetCard_Template_mkdocs.md) or [Model Card](HF_ModelCard_Template_mkdocs.md)) into your `README.md` file.[^1]
+To use a template for a new dataset or model repository on Hugging Face (HF), simply copy and paste the contents of the appropriate template ([Dataset Card](HF_DatasetCard_Template_mkdocs.md) or [Model Card](HF_ModelCard_Template_mkdocs.md)) into your `README.md` file.[^1]
Then, follow the descriptions under each section to fill in the appropriate information. This is meant to be an iterative process throughout the life of your project, so do not worry if you cannot answer all parts at the beginning—that's to be expected!
[^1]: The templates can also be added to your repository thorugh the website user interface (UI): Navigate to the "Model/Dataset Card" tab on your repo, select "Create Model/Dataset Card", copy and paste the template contents into the `README.md` file, and add your content.
-
!!! tip "Practice makes perfect!"
If you have never filled out a dataset card before, or are unsure of how to find the answers to fill in the sections, we ran a [workshop](https://github.com/Imageomics/data-workshop-AH-2024) to help familiarize our members with this process. In particular, the portion where we walked through filling out part of a dataset card as we did exploratory data analysis (EDA) was recorded and is available on the [Imageomics YouTube Channel](https://www.youtube.com/@ImageomicsInstitute/videos). Read the [story of the workshop](https://github.com/Imageomics/data-workshop-AH-2024/#story-of-the-workshop) and clone the [repo](https://github.com/Imageomics/data-workshop-AH-2024) to follow along with the 1 hour and 15 minute lesson!
!!! note "Note"
The Dataset and Model cards have incorporated some of Hugging Face's January 2024 updates (following their [Dataset Card Overhaul](https://github.com/huggingface/huggingface_hub/commit/6dd7ee829bd1b1216663a9993c1943c29b64690a)). It doesn't appear they will be updated more and we do not currently anticipate further large updates on our end as our overall template formats have diverged. Nevertheless, you may wish to check HF for extra information or tagging updates ([HF Dataset Card](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/datasetcard_template.md), [HF Model Card](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md)).
-
diff --git a/docs/wiki-guide/Code-Checklist.md b/docs/wiki-guide/Code-Checklist.md
index b3c3569..12d98d2 100644
--- a/docs/wiki-guide/Code-Checklist.md
+++ b/docs/wiki-guide/Code-Checklist.md
@@ -19,7 +19,7 @@ This checklist provides an overview of essential and recommended elements to inc
- [ ] Acknowledge source code dependencies and contributors.
- [ ] Reference related datasets used in training or evaluation.
- [ ] **Requirements File**: Provide a [file detailing software requirements](GitHub-Repo-Guide.md/#software-requirements-file), such as a `requirements.txt` or `pyproject.toml` for Python dependencies.
-- [ ] **Gitignore File**: GitHub has premade `.gitignore` files ([here](https://github.com/github/gitignore)) tailored to particular languages (eg., [R](https://github.com/github/gitignore/blob/main/R.gitignore) or [Python](https://github.com/github/gitignore/blob/main/Python.gitignore)), operating systems, etc.
+- [ ] **Gitignore File**: GitHub has premade `.gitignore` files (see [github/gitignore](https://github.com/github/gitignore)) tailored to particular languages (e.g., [R](https://github.com/github/gitignore/blob/main/R.gitignore) or [Python](https://github.com/github/gitignore/blob/main/Python.gitignore)), operating systems, etc.
- [ ] **CITATION CFF**: This facilitates citation of your work, follow guidance provided in the [Repo Guide](GitHub-Repo-Guide.md/#citation).
### Data-Related
@@ -81,7 +81,7 @@ The [Repo Guide](GitHub-Repo-Guide.md/) provides general guidance on repository
### Documentation
- [ ] **API Documentation**: Generate API documentation (e.g., [`MkDocs`](https://www.mkdocs.org) for Python or wiki pages in the repo).
-- [ ] **Docstrings**: Add comprehensive docstrings for all functions, classes, and modules. These can be incorporated to help generate documentation. Note that generative AI tools with access to your code, such as GitHub Copilot, can be quite accurate in generating these, especially if you are using type annotations.
+- [ ] **Docstrings**: Add comprehensive docstrings for all functions, classes, and modules. These can be incorporated to help generate documentation. Note that generative AI tools with access to your code, such as GitHub Copilot, can be quite accurate in generating these, especially if you are using type annotations.
- [ ] **Example Scripts**: Include example scripts for common use cases.
- [ ] **Configuration Files**: Use `yaml`, `json`, or `ini` for configuration settings.
diff --git a/docs/wiki-guide/Command-Line-Cheat-Sheet.md b/docs/wiki-guide/Command-Line-Cheat-Sheet.md
index 4e0250b..9c1ad45 100644
--- a/docs/wiki-guide/Command-Line-Cheat-Sheet.md
+++ b/docs/wiki-guide/Command-Line-Cheat-Sheet.md
@@ -1,8 +1,13 @@
# Command Line Cheat Sheet
-
+
+
See also [GitHub's Markdown Guide](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax).
## Useful bash and git
+
+
+
| Command | Action |
| --- | --- |
| ` -h` | print the help documentation for a command, showing usage information and options |
@@ -11,12 +16,14 @@ See also [GitHub's Markdown Guide](https://docs.github.com/en/get-started/writin
| `pwd` | current working directory |
| `ls` | list everything in current directory (use `-a` to also show **a**ll files including hidden, `-l` for a **l**ong list including permissions and ownership info, `-1` ("dash one") to display the output with **1** item on each line) |
| `wc -l ` | use the **w**ord **c**ount command with the `-l` **l**ines option to list the number of lines in a file |
-| `du /`| calculate and show how much **d**isk **u**sage is consumed by a directory (use `-h` to make it **h**uman-readable, i.e. report in MB, GB or whatever units are most appropriate, and `-s` for **s**ummary of all the contents together rather than each item individually) |
+| `du /`| calculate and show how much **d**isk **u**sage is consumed by a directory (use `-h` to make it **h**uman-readable, i.e. report in MB, GB or whatever units are most appropriate, and `-s` for **s**ummary of all the contents together rather than each item individually) |
| ++ctrl+r++ | search for command (will pop up `bck-i-search:`) |
| `rm ` | remove a file (or folder with `-r`). Beware when using `rm -rf ` to **f**orce the **r**ecursive removal of all contents in a folder, which cannot be undone unless there is a backup. |
-| ` | ` | The "pipe" operator (++pipe++) feeds the output of the first command (`cmd1`) to the input of the second command (`cmd2`). For example, show the total number of files in a directory with `ls -1 | wc -l`|
+| ` | ` | The "pipe" operator (++pipe++) feeds the output of the first command (`cmd1`) to the input of the second command (`cmd2`). For example, show the total number of files in a directory with `ls -1 | wc -l` |
+
### Git-Specific
+
| Command | Action |
| --- | --- |
| `git log` | list of commits with author, date, time (type `q` to leave) |
@@ -30,10 +37,11 @@ See also [GitHub's Markdown Guide](https://docs.github.com/en/get-started/writin
| `git branch -d ` | delete branch |
!!! tip "Pro tip: Simplify your git history"
- - Use `git mv` to rename a file so that it is tracked as a rename (with or without changes).
+ - Use `git mv` to rename a file so that it is tracked as a rename (with or without changes).
- If you rename a file then `git add` its parent directory, the diff will show the deletion of the original file and addition of a "completely new" file, even if nothing has changed. This makes reviewing changes much more complicated than necessary.
#### Usual Process
+
After making changes to a file on a branch, check the status of your current working branch (with `git status`). Then, you "add" the file, state what is new about the file ("commit the change"), and `push` the file from your local copy of the repo to the remote copy:
```bash
@@ -42,7 +50,6 @@ git add
git commit -m "Changed x,y,z"
git push
-
```
!!! tip "Pro tip: Check the stage"
@@ -50,6 +57,7 @@ git push
!!! note Note
If you need to update your branch with changes from the remote `main`, first switch to the branch, then set pull from `main` instead of the current branch, as below.
+
```bash
git checkout
diff --git a/docs/wiki-guide/Digital-Product-Lifecycle.md b/docs/wiki-guide/Digital-Product-Lifecycle.md
new file mode 100644
index 0000000..7153ac6
--- /dev/null
+++ b/docs/wiki-guide/Digital-Product-Lifecycle.md
@@ -0,0 +1,55 @@
+# Digital Product Life Cycle
+
+The Imageomics Institute is committed to FAIR and Reproducible data, software, ML models, and computational workflows, as demonstrated and defined by the [Digital Products Release and Licensing Policy](Digital-products-release-licensing-policy.md) that the Institute adopted. To achieve full and consistent adherence to this policy, to promote integration of requisite best practices into the research project lifecycle, and to ensure that the Institute’s limited data science support resources are used efficiently, the Digital Product Life Cycle establishes a framework with designated, regular points at which research project teams are expected to engage with the Institute’s data science support team about the digital artifacts and practices that support adherence to our digital product commitments.
+
+Although most of the engagement from the side of research teams is expected to (and arguably should) primarily involve NextGens, responsibility for awareness of this policy and a team’s commitment to follow it lies with project[^1] PI(s). By following these guidelines, it will be easier to meet these requirements before paper submission deadlines without requiring major revisions on a truncated schedule (i.e., most—if not all—of the FAIR requirements will have been resolved prior to conference submissions). Below is a project life cycle diagram outlining the expected process of this policy, followed by an enumeration of the expectations organized by development phase.
+
+
+**Figure 1:** _Visual representation of the AI/ML project life cycle underpinning this policy. After the Setup phase, both Exploration and Model Development are ongoing iterative processes that build to the ultimate goal of a published paper, following Publication Preparation where the work of the previous iterative phases is reviewed and polished for **F**indability, **A**ccessibility, **I**nteroperability, **R**eusability (FAIR) and Reproducibility. Key Stages for Project Consultations are highlighted, along with Helpful Resources to guide work at each stage and checklists to ensure FAIR and Reproducibility are always in mind._
+
+## Responsibilities and Actions
+
+The following adds additional context and direction to supplement the diagram, organized by project lifecycle stage.
+
+### Setup Phase
+
+* NextGens and/or project[^1] PIs schedule a project consultation with the Senior Data Scientist. This will include scope and intended data usage for improved research convergence and to ensure projects start with all available resources in mind.
+* In the GitHub project repo, create an issue for each digital product repository, using the appropriate checklist:
+ * **Code and workflows:** GitHub Repository ([Code checklist](Code-Checklist.md)).
+ * **Datasets:** Hugging Face Dataset Repository ([Data checklist](Data-Checklist.md)).
+ * For already published data usage, see the [Metadata Checklist](Metadata-Checklist.md).
+ * **ML Models:** Hugging Face Model Repository ([Model checklist](Model-Checklist.md)).
+
+### Exploration Phase
+
+* Maintain record of any and all data utilized (source, license, citation, etc.).
+ * See [Data Sources Template](https://docs.google.com/spreadsheets/d/1r4-_Ytg2bwGMxLpYrk4GVhx61JSOYXANsSFjryNmsDE/edit?usp=drive_link).
+* Document exploration of data.
+ * This establishes an understanding of what the data is and how it can be used. For an example and guidance, consider the exploration and documentation done in the [Data Workshop](https://github.com/Imageomics/data-workshop-AH-2024).
+* Record processing steps applied—maintained in a well-documented code repository (following [GitHub Guidance](GitHub-Repo-Guide.md))—and update Dataset Card(s) with information and links back to GitHub repository.
+* Establish and update contributor list—follow the [Imageomics Author Guide](https://docs.google.com/spreadsheets/d/1GwlCukfoQPL8JI2yyWRD3g4uiMTO3tlGNE_qeb_xBCs/edit?usp=sharing).[^2]
+ * Authors and author order for the paper and codebase (and/or dataset) may differ; all should be discussed.
+
+### Model Development Phase
+
+* Maintain a record of any and all base models utilized (source, license, citation, etc.).
+* Record model experiments—scripts or Jupyter Notebooks, _documented_[^3] and maintained in GitHub for version control as different approaches are tried.
+* Document model experiments and evaluation—record results of various tests performed and overall evaluation and comparison of these runs in Model Card(s) with links back to GitHub repository.
+* Add all code used to generate figures to the project GitHub repository, including documentation for reproduction (e.g., package requirements, data info, instructions).
+* Review (and revise as necessary) the Author/Contributor list(s).
+
+### Preparing for Publication Phase
+
+* Project components should align with FAIR and Reproducibility principles:
+ * Completed and fully documented GitHub Repository for code (recall [Code checklist](Code-Checklist.md)).
+ * Completed and fully documented Hugging Face Dataset Repository for data products (recall [Data checklist](Data-Checklist.md)).
+ * If using an already published dataset, all requisite metadata and provenance information included (recall [Metadata checklist](Metadata-Checklist.md)). Specifically, ensure that all attribution requirements and/or expectations have been appropriately met.
+ * Completed and fully documented Hugging Face Model Repository for ML models (recall [Model checklist](Model-Checklist.md)).
+* Schedule Review by Senior Data Scientist of data, model, and code repositories 3 weeks prior to camera-ready deadline (approval required for DOI generation).
+* Review (and revise as necessary) the Author/Contributor list(s).
+
+[^1]: Here we use the term project at a smaller scale to mean any endeavor resulting in a digital product (dataset, ML model, code) and/or paper (e.g., for the purposes of this policy [SST](https://github.com/Imageomics/SST) is a _project_, while Butterflies is not).
+
+[^2]: Contributor lists should be started as early as possible and are subject to change as a project progresses; this is expected and the reason to review during each phase of development.
+
+[^3]: Notebooks allow for Markdown explanations and descriptions throughout your process and the demonstration of results without requiring others to run your code.
diff --git a/docs/wiki-guide/Digital-products-release-licensing-policy.md b/docs/wiki-guide/Digital-products-release-licensing-policy.md
index 15316f6..670237b 100644
--- a/docs/wiki-guide/Digital-products-release-licensing-policy.md
+++ b/docs/wiki-guide/Digital-products-release-licensing-policy.md
@@ -10,7 +10,7 @@ This means the following policy applies for digital products of the Imageomics I
2. Code is to be released under an [OSI-approved open source license](https://opensource.org/licenses/), or to the public domain (for example, by applying a [CC-Zero](https://creativecommons.org/publicdomain/zero/1.0/) waiver).
- - This should be in a well-documented GitHub repository that follows the format specified in the [Institute GitHub Repo Guide](GitHub-Repo-Guide.md).
+ - This should be in a well-documented GitHub repository that follows the format specified in the [Institute GitHub Repo Guide](GitHub-Repo-Guide.md).
- If associated with a publication, code should be versioned with a release linked to a DOI that can be referenced in the publication.
diff --git a/docs/wiki-guide/GitHub-Repo-Guide.md b/docs/wiki-guide/GitHub-Repo-Guide.md
index bc8b0a7..86f6b71 100644
--- a/docs/wiki-guide/GitHub-Repo-Guide.md
+++ b/docs/wiki-guide/GitHub-Repo-Guide.md
@@ -2,16 +2,14 @@
Just joining or starting a new project and need a repository to store your work? You've come to the right place! Below we have compiled guidance on conventions and best practices for maintaining a shared (or shareable) repository of your work.
-
## Setting up a New Organization Repository
!!! note "Note"
We recommend doing development in a public repo, or at least publishing the repo in which development was done at the time of publication/release. However, if you're looking to have a public-facing repo _and_ a private repo for development, please be sure to read our guidance on the [Two Repo Problem](Two-Repo-Problem.md) before proceeding.
-
-
## Standard Files
-For each repository, include the following files in the root directory as soon as possible; they can (and should) be instantiated when you create a new repository.
+
+For each repository, include the following files in the root directory as soon as possible; they can (and should) be instantiated when you create a new repository.
* [README.md](#readme)
* [LICENSE.md](#license)
@@ -21,24 +19,27 @@ For each repository, include the following files in the root directory as soon a
More [recommendations](#recommended-files) are discussed below.
### README
-The README.md file is what everyone will notice first when they open your repository on GitHub. When creating your repo be sure to include a brief description, as this will populate the `About` field in the top right of your repo, as well as start your README with some text.
+
+The README.md file is what everyone will notice first when they open your repository on GitHub. When creating your repo be sure to include a brief description, as this will populate the `About` field in the top right of your repo, as well as start your README with some text.
Once you've created your repo, populate your README (you can do this by clicking on the file "README.md", then clicking the pencil at the top left to edit). Editing your README in the browser allows you to preview the formatting of the file before committing changes. The content of your README may vary based on the purpose or goal of your repo, but there are key elements that should always be included.
-- Summary of the repo:
- - This could be a simple explanation of what the package or tool developed in your repo is intended to do,
- - Or an abstract describing your research.
-- Detailed documentation on how to access and use the project software (User Guide).
- - Including installation of [dependencies](Virtual-Environments.md).
- - If your tool requires input be in a particular format, this would be included in the README. It would also help to include an example file demonstrating the format.
-- Information about the sources you've used (links and what they were used for), such as:
- - Tools from other repos
- - Data for analysis
+* Summary of the repo:
+ * This could be a simple explanation of what the package or tool developed in your repo is intended to do,
+ * Or an abstract describing your research.
+* Detailed documentation on how to access and use the project software (User Guide).
+ * Including installation of [dependencies](Virtual-Environments.md).
+ * If your tool requires input be in a particular format, this would be included in the README. It would also help to include an example file demonstrating the format.
+* Information about the sources you've used (links and what they were used for), such as:
+ * Tools from other repos
+ * Data for analysis
For more inspiration on making an awesome README, check out [this list](https://github.com/matiassingers/awesome-readme).
### LICENSE
-#### 1. Select a license.
+
+#### 1. Select a license
+
Alongside the appropriate stakeholders, select a license that is [Open Source Initiative](https://opensource.org/licenses) (OSI) compliant.
!!! note "Remember"
@@ -46,36 +47,43 @@ Alongside the appropriate stakeholders, select a license that is [Open Source In
For more information on how to choose a license and why it matters, see [Choose A License](https://choosealicense.com) and [A Quick Guide to Software Licensing for the Scientist-Programmer](https://doi.org/10.1371/journal.pcbi.1002598) by A. Morin, et al.
-#### 2. Add LICENSE.md to the repository.
-Once a license has been chosen, add a LICENSE.md file to the root of the repository. An easy way to do this is using a GitHub-provided [license template](https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/adding-a-license-to-a-repository). Do not forget to update necessary fields in the template.
+#### 2. Add LICENSE.md to the repository
+
+Once a license has been chosen, add a LICENSE.md file to the root of the repository. An easy way to do this is using a GitHub-provided [license template](https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/adding-a-license-to-a-repository). Do not forget to update necessary fields in the template.
### GITIGNORE
-The `.gitignore` file is an important tool for maintaining a clean repository by ensuring that git will not track temp files of any and all your collaborators (no pesky `pycache` or `.DS_Store` files floating around).
-GitHub has premade `.gitignore` files which can be selected from a dropdown when creating a repo. They are available for review [here](https://github.com/github/gitignore) and are generally tailored to particular languages (eg., [R](https://github.com/github/gitignore/blob/main/R.gitignore) or [Python](https://github.com/github/gitignore/blob/main/Python.gitignore)), operating systems, etc. The initial choice can be updated as needed. In particular, we recommend selecting a template based on the primary language used for your work.
+The `.gitignore` file is an important tool for maintaining a clean repository: it ensures that git does not track temporary files created by you or any of your collaborators (no pesky `pycache` or `.DS_Store` files floating around).
+
+GitHub has premade `.gitignore` files which can be selected from a dropdown when creating a repo. They are available for review at [github/gitignore](https://github.com/github/gitignore) and are generally tailored to particular languages (e.g., [R](https://github.com/github/gitignore/blob/main/R.gitignore) or [Python](https://github.com/github/gitignore/blob/main/Python.gitignore)), operating systems, etc. The initial choice can be updated as needed. In particular, we recommend selecting a template based on the primary language used for your work.
+
+If you or anyone on your team uses a Mac (or if you intend to encourage outside collaboration on this repo), add
-If you or anyone on your team uses a Mac (or if you intend to encourage outside collaboration on this repo), add
```
# Mac system
.DS_Store
```
+
at the end of the `.gitignore` file.
### Software Requirements File
+
It is also advisable to include a machine-readable file with minimal software requirements for your project. For Python projects, this often takes the form of a `requirements.txt` file containing the packages and their versions that were used (eg., `pandas==2.0.1`). If you use `conda`, you may instead opt for an `environment.yml`. These are essential to ensuring the reproducibility and interoperability of your work (by yourself and others). Note that they should _**not**_ be listed in the README.
-For more information on managing these environments and generating such files programmatically, see the wiki entry [Virtual Environments](Virtual-Environments.md).
+For more information on managing these environments and generating such files programmatically, see the wiki entry [Virtual Environments](Virtual-Environments.md).
## Recommended Files
Though the following files are not included in every repository and do not have a simple selection process integrated into GitHub, they are extremely important (if not essential) to maintaining FAIR principles and reproducibility in projects, as well as ensuring proper attribution for your work.
### CONTRIBUTING
+
If you are looking to open your project to more public contributions, it is a good idea to include contributing guidelines. This could take the form of a "CONTRIBUTING.md" file or a subsection of your README.
Contributing guidelines are important to maintain consistency across the way people work on a project. It is important to establish conventions about the important things while avoiding excessive constraints and bureaucracy that would make contributing a pain. Important things include efficient and effective communication.
### CITATION
+
Make it easier for people to cite your project by including a [CITATION.cff file](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-citation-files); you can copy-paste the template below.
As with journal publications, we expect to be cited when someone uses our code. To facilitate proper attribution, GitHub will automatically read a [CITATION.cff file](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-citation-files) and display a link to "cite this repository". Providing this file is as simple as filling your information into one of their example files and uploading it to your repo. More examples and information about the Citation File Format can be found on the [citation-file-format repo](https://github.com/citation-file-format/citation-file-format), including helpful [related tools](https://github.com/citation-file-format/citation-file-format#tools-to-work-with-citationcff-files-wrench).
@@ -138,36 +146,38 @@ references:
### Formatting and Naming Conventions
-**Dates and Times**
+#### Dates and Times
For interoperability and to avoid ambiguity, [dates and times should be reported](https://dataoneorg.github.io/Education/bestpractices/describe-formats-for) in [ISO 8601 format](https://en.wikipedia.org/wiki/ISO_8601).
- - For dates, this means `YYYY-MM-DD` (for ISO 8601 compliance, the dashes are required).
- - For times, use `THHMMSS` in 24-hour format.
- - For example, the moment when there were 60 seconds left before New Year 2000 would be `1999-12-31T235900`.
-
-**Branches**
-
- - Primary branch: `main`
- - Other branches follow the pattern `category/reference/description`:
- - **category**: `feature`, `bugfix`, `experiment`
- - `feature` is for new functionality
- - `bugfix` is for fixing errors
- - `experiment` is for more open-ended work
- - the associated issue (if no issue, put `no-ref`), formatted as `issue-NN`
- - description: brief description, e.g., `solve-world-hunger`
- - Example: `git branch feature/issue-1/general-ai`
+* For dates, this means `YYYY-MM-DD` (for ISO 8601 compliance, the dashes are required).
+* For times, use `THHMMSS` in 24-hour format.
+* For example, the moment when there were 60 seconds left before New Year 2000 would be `1999-12-31T235900`.
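As a minimal sketch of the date and time convention above, the following Python snippet formats a timestamp as `YYYY-MM-DD` followed by `THHMMSS` using only the standard library. The helper name `iso_basic_timestamp` is hypothetical and introduced purely for illustration.

```python
from datetime import datetime


def iso_basic_timestamp(moment: datetime) -> str:
    """Format a datetime as YYYY-MM-DD followed by THHMMSS (24-hour clock)."""
    # %Y-%m-%d keeps the dashes in the date part;
    # T%H%M%S gives the compact 24-hour time used in this guide.
    return moment.strftime("%Y-%m-%dT%H%M%S")


# 60 seconds before New Year 2000, the example used above.
print(iso_basic_timestamp(datetime(1999, 12, 31, 23, 59, 0)))  # 1999-12-31T235900
```

Generating timestamps this way, rather than typing them by hand, helps keep file names and log entries sortable and unambiguous.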
+
+#### Branches
+
+* Primary branch: `main`
+* Other branches follow the pattern `category/reference/description`:
+ * **category**: `feature`, `bugfix`, `experiment`
+ * `feature` is for new functionality
+ * `bugfix` is for fixing errors
+ * `experiment` is for more open-ended work
+ * the associated issue (if no issue, put `no-ref`), formatted as `issue-NN`
+ * description: brief description, e.g., `solve-world-hunger`
+* Example: `git branch feature/issue-1/general-ai`
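The branch pattern above could be checked automatically, for example in a local script. The sketch below is one possible reading of `category/reference/description`. The function name `is_valid_branch_name` is hypothetical, and the rule that the description is lowercase words joined by hyphens is an assumption inferred from the examples, not something the guide states.

```python
import re

# category/reference/description, where the reference is issue-NN or no-ref.
# Assumption: the description is lowercase alphanumeric words joined by hyphens.
BRANCH_PATTERN = re.compile(
    r"(feature|bugfix|experiment)/(issue-\d+|no-ref)/[a-z0-9]+(-[a-z0-9]+)*"
)


def is_valid_branch_name(name: str) -> bool:
    """Return True for `main` or a name matching category/reference/description."""
    return name == "main" or BRANCH_PATTERN.fullmatch(name) is not None


print(is_valid_branch_name("feature/issue-1/general-ai"))  # True
print(is_valid_branch_name("feature/general-ai"))          # False (no reference part)
```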
-**Commits**
+#### Commits
To combine human- and computer-readability into commit messages, follow the [Conventional Commits specification](https://www.conventionalcommits.org/en/v1.0.0/#summary).
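To illustrate what computer-readable means here, the sketch below parses the first line of a commit message in the general shape `type(scope)!: description` described by Conventional Commits. The regular expression is a deliberate simplification, not the full specification (it ignores bodies and footers), and the function name `parse_commit_header` is hypothetical.

```python
import re
from typing import Optional

# Simplified header shape: type, optional (scope), optional "!", then ": description".
HEADER_PATTERN = re.compile(
    r"(?P<type>[A-Za-z]+)"
    r"(\((?P<scope>[^()\s]+)\))?"
    r"(?P<breaking>!)?"
    r": (?P<description>\S.*)"
)


def parse_commit_header(message: str) -> Optional[dict]:
    """Parse the first line of a commit message; return None if it does not match."""
    lines = message.splitlines()
    if not lines:
        return None
    match = HEADER_PATTERN.fullmatch(lines[0])
    if match is None:
        return None
    return {
        "type": match.group("type"),
        "scope": match.group("scope"),  # None when no scope is given
        "breaking": match.group("breaking") is not None,
        "description": match.group("description"),
    }


print(parse_commit_header("fix(parser): handle empty input"))
print(parse_commit_header("Changed x,y,z"))  # None: no "type: description" header
```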
### Workflow
-Do not conduct routine work in the `main` branch. Only do one thing on a branch at a time. Prune a branch once its purpose is fulfilled and it is merged (i.e., delete it).
+
+Do not conduct routine work in the `main` branch. Only do one thing on a branch at a time. Prune a branch once its purpose is fulfilled and it is merged (i.e., delete it).
For more information on creating, merging, and deleting branches, see the [GitHub Workflow Guide](The-GitHub-Workflow.md).
## General Repository Structure
+
In addition to the [standard files](#standard-files) recommended for every repo, you will likely have some code, notebooks, and data. For an easily accessible and readable repo, it is good to organize these files within a clear directory (folder) structure, such as
```
@@ -177,27 +187,28 @@ Project_Directory
- src
- data
```
-
+
!!! note "Note"
Depending on the size of your data, `data` may only be local on your machine in which case it is good to include instructions to access the data where appropriate.
***
-# Working on GitHub
-After the initial creation of a repo on the GitHub website, there are two primary modes of interacting with it.
+
+## Working on GitHub
+
+After the initial creation of a repo on the GitHub website, there are two primary modes of interacting with it.
1. Through git on the Command Line
This requires a `bash` or `zsh` shell on your computer. On Mac you can use terminal, while Windows requires installing git and a bash emulator.
-
+
2. Through the GitHub Desktop App, [GitHub Desktop](https://desktop.github.com/)
- GitHub provides documentation to get started on [Mac](https://docs.github.com/en/desktop/overview/getting-started-with-github-desktop?platform=mac) or [Windows](https://docs.github.com/en/desktop/overview/getting-started-with-github-desktop?platform=windows), as well as extensive documentation on use cases we discuss throughout the wiki [here](https://docs.github.com/en/desktop/contributing-and-collaborating-using-github-desktop).
-
+ GitHub provides documentation to get started on [Mac](https://docs.github.com/en/desktop/overview/getting-started-with-github-desktop?platform=mac) or [Windows](https://docs.github.com/en/desktop/overview/getting-started-with-github-desktop?platform=windows), as well as extensive documentation on use cases we discuss throughout the wiki in [GitHub's guide to contributing and collaborating using GitHub Desktop](https://docs.github.com/en/desktop/contributing-and-collaborating-using-github-desktop).
+
!!! note "Note"
- The bulk of our step-by-step guides will outline interaction through the command line, but the same principles apply to using GitHub Desktop.
+ The bulk of our step-by-step guides will outline interaction through the command line, but the same principles apply to using GitHub Desktop.
-
-## Cloning a Repository
+### Cloning a Repository
Navigate to the main ("<> Code") page of your repository and click the green button at the top right corner (as shown below) and copy the link (for command line) or select "Open with GitHub Desktop". For command line interaction, navigate within the `bash` shell to the directory where you would like to place your local copy of the repo (`cd `), then clone the repo into that folder (`git clone `), this will generate a local copy of the repo on your computer.
@@ -207,7 +218,8 @@ Navigate to the main ("<> Code") page of your repository and click the green but
If you would like a specific branch, use `git clone -b `.
-## Workflow Summary
+### Workflow Summary
+
Generally, repositories are organized around an Imageomics Project/Topic/Team, eg., butterflies. These broader topics may contain various projects organized under a GitHub [Team](https://github.com/orgs/Imageomics/teams) focused on that topic. Both [projects](https://github.com/orgs/Imageomics/projects?query=is%3Aopen) and [repositories](https://github.com/orgs/Imageomics/repositories) may be linked to teams, providing an organizational structure upon which to plan and manage tasks while maintaining a clear link/connection to the work being done on those tasks. Note that a project may encapsulate multiple repositories just as a repository may be referenced by multiple projects.
-Ideally, each task will be linked to an issue in the relevant repository. Team members may then be assigned tasks, and asynchronous discussions about the task can be recorded on its issue page in the repository. To accomplish the task, a new branch should be created following the [branch naming conventions](#formatting-and-naming-conventions); do not work directly on the `main` branch. Once the task is completed, a pull request can be opened to merge the changes into the main branch (see the [GitHub Workflow Guide](The-GitHub-Workflow.md) and the [PR Guide](The-GitHub-Pull-Request-Guide.md) for more details on this process). Reviewers may be assigned to each pull request to ensure compatibility and that the proposed solution functions as expected/needed; this is an opportunity for more dialogue.
+Ideally, each task will be linked to an issue in the relevant repository. Team members may then be assigned tasks, and asynchronous discussions about the task can be recorded on its issue page in the repository. To accomplish the task, a new branch should be created following the [branch naming conventions](#formatting-and-naming-conventions); do not work directly on the `main` branch. Once the task is completed, a pull request can be opened to merge the changes into the main branch (see the [GitHub Workflow Guide](The-GitHub-Workflow.md) and the [PR Guide](The-GitHub-Pull-Request-Guide.md) for more details on this process). Reviewers may be assigned to each pull request to ensure compatibility and that the proposed solution functions as expected/needed; this is an opportunity for more dialogue.
diff --git a/docs/wiki-guide/Glossary-for-Imageomics.md b/docs/wiki-guide/Glossary-for-Imageomics.md
index 0e06b68..55957cc 100644
--- a/docs/wiki-guide/Glossary-for-Imageomics.md
+++ b/docs/wiki-guide/Glossary-for-Imageomics.md
@@ -1,25 +1,25 @@
# Imageomics Glossary
-This glossary is designed as a resource for members of the Imageomics Institute from various backgrounds to familiarize themselves with key terms and concepts encountered in our work.
+This glossary is designed as a resource for members of the Imageomics Institute from various backgrounds to familiarize themselves with key terms and concepts encountered in our work.
-It includes concepts in biology, ecology, genetics, machine learning and artificial intelligence, computer science, and software engineering.
+It includes concepts in biology, ecology, genetics, machine learning and artificial intelligence, computer science, and software engineering.
Definitions are not meant to be comprehensive. Ideally, they will be tailored to our institute's context.
It is meant to be a collaborative effort, so please [contribute](https://github.com/Imageomics/Imageomics-guide/issues) terms you would like defined, definitions you know, or corrections for errors you notice!
## A
-#### Application Programming Interface (API)
+### Application Programming Interface (API)
-#### Autoencoder
-
+### Autoencoder
## B
## C
-#### CARE Principles for Indigenous Data Governance
+### CARE Principles for Indigenous Data Governance
+
"People and purpose-oriented" to complement [FAIR Principles](#fair-data-principles).
**C**ollective Benefit
@@ -32,14 +32,15 @@ It is meant to be a collaborative effort, so please [contribute](https://github.
For more information, see [CARE Principles for Indigenous Data Governance](https://www.gida-global.org/care).
-#### Contrastive Language-Image Pre-training (CLIP)
+### Contrastive Language-Image Pre-training (CLIP)
## D
-#### Decoder
+### Decoder
-#### Dimensionality Reduction
-Used in machine learning and data analysis to refer to a set of methods used to reduce the number of variables or features under consideration to a smaller subset with the greatest explanatory power without drastically reducing the accuracy of the model or analysis. The purpose is to exclude irrelevant, redundant, and noisy information, thereby improving computational complexity and model interpretability.
+### Dimensionality Reduction
+
+Used in machine learning and data analysis to refer to a set of methods used to reduce the number of variables or features under consideration to a smaller subset with the greatest explanatory power without drastically reducing the accuracy of the model or analysis. The purpose is to exclude irrelevant, redundant, and noisy information, thereby improving computational complexity and model interpretability.
That is, it seeks to preserve the "most important" variables or features of the data based on some quantitative metric, such as variance, while removing "less important" variables or features. This is especially helpful when using high-dimensional data such as images or genomes.
@@ -48,23 +49,22 @@ Dimensionality reduction techniques can be subdivided into two main categories:
- [Feature Extraction](#feature-extraction)
- [Feature Selection](#feature-selection)
-#### Docker
-
+### Docker
## E
-#### Ecology
+### Ecology
-#### Epoch (in machine learning)
+### Epoch (in machine learning)
+### Encoder
-#### Encoder
+### Experiment (in machine learning)
+## F
-#### Experiment (in machine learning)
+### FAIR Data Principles
-## F
-#### FAIR Data Principles
**F**indable -- metadata and data easily found by both humans and machines
**A**ccessible -- clear indication of how to access data once it is found.
@@ -75,32 +75,37 @@ Dimensionality reduction techniques can be subdivided into two main categories:
For more information, see [FAIR principles](https://www.go-fair.org/fair-principles/).
-#### Feature
+### Feature
+
In machine learning and data science, a feature is a single measurable property or characteristic of the phenomenon under observation. With tabular data, a feature is a column in the dataset used by a model to make predictions. In genomics, a feature could be, for example, gene expression levels, the presence (or absence) of certain genetic variants (such as [SNPs](#single-nucleotide-polymorphism-snp), insertions and deletions (indels), and others), or epigenetic markers.
#### Feature Extraction
+
A set of [dimensionality reduction](#dimensionality-reduction) techniques used to map raw data to a smaller set of features. Example techniques include [PCA](#principal-component-analysis-pca), [MDS](#multidimensional-scaling-mds), [t-SNE](#t-distributed-stochastic-neighbor-embedding-t-sne), [autoencoders](#autoencoder), and Fourier or wavelet transforms.
The key difference from feature selection is that feature extraction generates a new set of features from the original dataset by projecting or mapping the data into a new feature space rather than selecting from existing features.
#### Feature Selection
+
A method to select a subset of relevant features for use in model construction.
-The key difference from feature extraction is that feature selection does not generate new features but rather identifies the most meaningful existing features in a dataset by excluding redundant or irrelevant features. For example, in genomics, feature selection would involve selecting the most important gene(s) relevant to a certain phenotype among thousands of genes.
+The key difference from feature extraction is that feature selection does not generate new features but rather identifies the most meaningful existing features in a dataset by excluding redundant or irrelevant features. For example, in genomics, feature selection would involve selecting the most important gene(s) relevant to a certain phenotype among thousands of genes.
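
To make the contrast between the two entries above concrete, here is a minimal sketch using only the Python standard library. The toy data, the variance criterion, the hand-picked weights, and the helper names `select_features` and `extract_features` are all illustrative assumptions, not part of any particular library; a real feature extraction method such as PCA would learn its weights from the data rather than take them as input.

```python
from statistics import pvariance

# Toy dataset: 4 samples (rows) x 3 features (columns); values are made up.
data = [
    [1.0, 10.0, 0.5],
    [2.0, 10.1, 0.4],
    [3.0, 9.9, 0.6],
    [4.0, 10.0, 0.5],
]


def select_features(rows, k):
    """Feature selection: keep the k existing columns with the highest variance."""
    n_cols = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)
    keep = sorted(ranked[:k])  # indices of the original columns that survive
    return keep, [[row[j] for j in keep] for row in rows]


def extract_features(rows, weights):
    """Feature extraction: build new features as weighted sums of all columns."""
    return [[sum(w * x for w, x in zip(ws, row)) for ws in weights] for row in rows]


# Selection returns a subset of the original columns, unchanged.
kept, selected = select_features(data, k=1)

# Extraction returns a new column that mixes all three originals
# (weights are arbitrary here; PCA and similar methods would compute them).
extracted = extract_features(data, weights=[[0.5, 0.25, 0.25]])
```

In this sketch `selected` still contains values that appear in `data`, while `extracted` contains values that exist in no original column, which is the distinction the two definitions draw.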
#### Feature Space
-
## G
-#### Genome-Wide Association Study (GWAS)
+### Genome-Wide Association Study (GWAS)
## H
-#### Hyperparameter Tuning
+
+### Hyperparameter Tuning
+
The process of selecting the best hyperparameters for a machine learning model by minimizing the [loss function](#loss-function). This can be done through [experiments](#experiment-in-machine-learning) or in some cases, using optimization techniques. Hyperparameters are parameters that are set by the researcher before training and are not learned during the training process. Some examples of common hyperparameters are [learning rate](#learning-rate), number of [epochs](#epoch-in-machine-learning), number of clusters (k) in [k-means clustering](#k-means-clustering), and many others.
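
As a rough illustration of tuning through experiments, the sketch below runs a small grid search over two of the hyperparameters named above (learning rate and number of epochs). The toy loss, the grid values, and the `train` function are assumptions made up for this example; in practice the loss would be measured on held-out validation data rather than on the training objective itself.

```python
def train(learning_rate, epochs):
    """Run gradient descent on the toy loss (w - 3)^2 and return the final loss."""
    w = 0.0
    for _ in range(epochs):
        grad = 2 * (w - 3.0)  # derivative of (w - 3)^2 with respect to w
        w -= learning_rate * grad
    return (w - 3.0) ** 2


# Hyperparameter grid (values chosen arbitrarily for illustration).
grid = [(lr, ep) for lr in (0.01, 0.1, 0.5) for ep in (5, 20)]

# One "experiment" per setting; keep the setting with the lowest final loss.
results = {setting: train(*setting) for setting in grid}
best = min(results, key=results.get)
```

The hyperparameters are fixed before each call to `train`, and only `w` is learned inside it, which mirrors the distinction between hyperparameters and learned parameters in the definition.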
## I
-#### Imageomics
+
+### Imageomics
i-'mi-j**ə**-'**ō**-miks
@@ -109,27 +114,28 @@ A new scientific field in which computational (machine learning) tools built aro
## J
## K
-#### K-Means Clustering
+### K-Means Clustering
## L
-#### Latent Space
-
-#### Learning Rate
+### Latent Space
+### Learning Rate
-#### Loss Function
-
+### Loss Function
## M
-#### Multidimensional Scaling (MDS)
+
+### Multidimensional Scaling (MDS)
## N
-#### Nucleotide
-The fundamental building blocks of DNA and RNA. A nucleotide is composed of a base and a sugar-phosphate backbone.
-Bases for DNA: adenine (A), guanine (G), cytosine (C), and thymine (T).
+### Nucleotide
+
+The fundamental building blocks of DNA and RNA. A nucleotide is composed of a base and a sugar-phosphate backbone.
+
+Bases for DNA: adenine (A), guanine (G), cytosine (C), and thymine (T).
Bases for RNA: adenine (A), guanine (G), cytosine (C), and uracil (U).
@@ -142,55 +148,55 @@ The bases A, G, and C are the same molecule for DNA and RNA. T and U are incorpo
A DNA or RNA molecule consists of a chain of the four relevant nucleotides in a sequence, where the order of A, G, C, and T in the DNA sequence determines the "blueprint" for the organism, and the order and length of A, G, C, and U in an RNA sequence determines the purpose and function of the RNA molecule, which can be a messenger RNA (mRNA) that encodes a protein, a microRNA (miRNA) which are short RNAs that help regulate gene expression by binding to other mRNAs, and many others.
## O
-#### Ontology
+### Ontology
## P
-#### Phenotype
-
-
-#### Phylogeny
-
-#### Pre-training
+### Phenotype
+### Phylogeny
-#### Principal Component Analysis (PCA)
+### Pre-training
+### Principal Component Analysis (PCA)
## Q
## R
## S
-#### Single Nucleotide Polymorphism (SNP)
-A SNP (pronounced "snip") is a variation in the [nucleotide](#nucleotide) present at a single position in a DNA sequence among individuals in a species. For example, a SNP may be the replacement of a cytosine (C) by a thymine (T) at the same location in a stretch of DNA, where C is observed in a subset of individuals and T is observed in the others.
-#### Snakemake
+### Single Nucleotide Polymorphism (SNP)
+
+A SNP (pronounced "snip") is a variation in the [nucleotide](#nucleotide) present at a single position in a DNA sequence among individuals in a species. For example, a SNP may be the replacement of a cytosine (C) by a thymine (T) at the same location in a stretch of DNA, where C is observed in a subset of individuals and T is observed in the others.
+### Snakemake
-#### Subspecies
+### Subspecies
+### Supervised Learning
-#### Supervised Learning
As opposed to [unsupervised learning](#unsupervised-learning), supervised learning methods learn from labeled data. That is, it is trained using input data that is labeled with corresponding outputs, such as the input of an image and the output of a classification.
## T
-#### Taxonomy
-
-#### t-Distributed Stochastic Neighbor Embedding (t-SNE)
+### Taxonomy
+### t-Distributed Stochastic Neighbor Embedding (t-SNE)
-#### Trait
+### Trait
-#### Transfer Learning
+### Transfer Learning
## U
-#### Unsupervised Learning
+
+### Unsupervised Learning
+
As opposed to [supervised learning](#supervised-learning), unsupervised learning detects patterns or structures within the input data without any labels. Clustering and dimensionality reduction techniques are some examples.
## V
+
VLMs (Vision-Language Models)
## W
@@ -200,5 +206,5 @@ VLMs (Vision-Language Models)
## Y
## Z
-#### Zero-Shot Prediction
+### Zero-Shot Prediction
diff --git a/docs/wiki-guide/Guide-to-GitHub-Projects.md b/docs/wiki-guide/Guide-to-GitHub-Projects.md
index c86dc83..5f132aa 100644
--- a/docs/wiki-guide/Guide-to-GitHub-Projects.md
+++ b/docs/wiki-guide/Guide-to-GitHub-Projects.md
@@ -1,23 +1,25 @@
# Guide to GitHub Projects
-When starting a new project, it can be helpful to have a shared tracker or project board to keep track of who is responsible for which tasks, what has and has not yet been done, which tasks are necessary for various goals of the project, and so on. Note that many of these items are also helpful when working on a project by oneself. GitHub provides a very useful tool for just this purpose: [GitHub Projects](https://docs.github.com/en/issues/planning-and-tracking-with-projects/learning-about-projects/about-projects). GitHub projects can be linked with one or more GitHub repos to automatically keep track of issues and PRs associated with your project.
+When starting a new project, it can be helpful to have a shared tracker or project board to keep track of who is responsible for which tasks, what has and has not yet been done, which tasks are necessary for various goals of the project, and so on. Note that many of these items are also helpful when working on a project by oneself. GitHub provides a very useful tool for just this purpose: [GitHub Projects](https://docs.github.com/en/issues/planning-and-tracking-with-projects/learning-about-projects/about-projects). GitHub projects can be linked with one or more GitHub repos to automatically keep track of issues and PRs associated with your project.
-## Some advantages of working with GitHub Projects:
- - Different view options that sync automatically.
- - Easy to see who's doing what and keep track of progress.
+## Some advantages of working with GitHub Projects
+
+- Different view options that sync automatically.
+- Easy to see who's doing what and keep track of progress.
- Profile images show up for assignees to various tasks.
Clicking on an assignee's profile image will show only that person's assigned tasks (similarly for labels and milestones attached to tasks).
- - More columns/categories can be added for different aspects of the project.
- - Multiple repos can be linked to a single project.
+- More columns/categories can be added for different aspects of the project.
+- Multiple repos can be linked to a single project.
- Access to issues _is_ controlled by repo access permissions; only the existence of issues is universal to the project.
- - Closing an issue will automatically move the task to "Done".
- - Tasks can be reordered within their columns/categories to keep most pressing tasks at the top.
+- Closing an issue will automatically move the task to "Done".
+- Tasks can be reordered within their columns/categories to keep most pressing tasks at the top.
## Interacting with GitHub Projects
-To help you get started working with [GitHub Projects](https://docs.github.com/en/issues/planning-and-tracking-with-projects/learning-about-projects/about-projects), we have a [General Project Template](https://github.com/orgs/Imageomics/projects/31/views/1) with both a [Taskboard](https://github.com/orgs/Imageomics/projects/31/views/1) and [Table](https://github.com/orgs/Imageomics/projects/31/views/2) view initialized, along with label and milestone displays turned on.
-Both of these views will automatically stay updated so that each member of the project can utilize whichever version they find most informative.
-Issues can be added directly to the project board/table or on the repo. If added through the repo, they must be linked to the project and have status assigned. Milestones must be created on the repo (under the Issues tab, select "Milestones" to create one), similarly for labels.
+To help you get started working with [GitHub Projects](https://docs.github.com/en/issues/planning-and-tracking-with-projects/learning-about-projects/about-projects), we have a [General Project Template](https://github.com/orgs/Imageomics/projects/31/views/1) with both a [Taskboard](https://github.com/orgs/Imageomics/projects/31/views/1) and [Table](https://github.com/orgs/Imageomics/projects/31/views/2) view initialized, along with label and milestone displays turned on.
+Both of these views will automatically stay updated so that each member of the project can utilize whichever version they find most informative.
+
+Issues can be added directly to the project board/table or on the repo. If added through the repo, they must be linked to the project and have status assigned. Milestones must be created on the repo (under the Issues tab, select "Milestones" to create one), similarly for labels.
!!! note "Note"
Issues on a project board that are linked to a repository to which a user does not have access will not be visible to them, even if they have access to the project. They will show up (for that user) as unidentified issues with no status.
diff --git a/docs/wiki-guide/HF_DatasetCard_Template_ABC.md b/docs/wiki-guide/HF_DatasetCard_Template_ABC.md
deleted file mode 100644
index 3a4ce39..0000000
--- a/docs/wiki-guide/HF_DatasetCard_Template_ABC.md
+++ /dev/null
@@ -1,271 +0,0 @@
----
-license: cc0-1.0
-language:
-- en
-pretty_name:
-task_categories: # ex: image-classification, see key list at https://github.com/huggingface/huggingface.js/blob/main/packages/tasks/src/pipelines.ts
-tags:
-- biology
-- image
-- animals
-- CV
-size_categories: # ex: n<1K, 1K
-
-
-
-
-
-# Dataset Card for [dataset pretty_name]
-
-
-
-## Dataset Details
-
-### Dataset Description
-
-- **Curated by:** list curators (authors for _data_ citation, moved up)
-- **Language(s) (NLP):** [More Information Needed]
-
-- **Homepage:**
-- **Repository:** [related project repo]
-- **Paper:**
-
-
-
-[More Information Needed]
-
-
-
-
-### Supported Tasks and Leaderboards
-[More Information Needed]
-
-
-
-
-## Dataset Structure
-
-
-
-
-
-### Data Instances
-[More Information Needed]
-
-
-
-### Data Fields
-[More Information Needed]
-
-
-### Data Splits
-[More Information Needed]
-
-
-## Dataset Creation
-
-### Curation Rationale
-[More Information Needed]
-
-
-### Source Data
-
-
-
-#### Data Collection and Processing
-[More Information Needed]
-
-
-#### Who are the source data producers?
-[More Information Needed]
-
-
-
-### Annotations
-
-
-#### Annotation process
-[More Information Needed]
-
-
-#### Who are the annotators?
-[More Information Needed]
-
-
-### Personal and Sensitive Information
-[More Information Needed]
-
-
-
-## Considerations for Using the Data
-[More Information Needed]
-
-
-### Bias, Risks, and Limitations
-[More Information Needed]
-
-
-
-
-### Recommendations
-[More Information Needed]
-
-
-## Licensing Information
-[More Information Needed]
-
-
-
-## Citation
-[More Information Needed]
-
-**BibTeX:**
-
-
-
-
-
-## Acknowledgements
-
-This work was supported by the [AI and Biodiversity Change (ABC) Global Center](http://abcresearchcenter.org/), which is funded by the US National Science Foundation under [Award No. 2330423](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2330423&HistoricalAwards=false) and Natural Sciences and Engineering Research Council of Canada under [Award No. 585136](https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=782440). This dataset draws on research supported by the Social Sciences and Humanities Research Council. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, Natural Sciences and Engineering Research Council of Canada, or Social Sciences and Humanities Research Council.
-
-Ce travail a été soutenu par le centre de recherche [AI and Biodiversity Change (ABC)](http://abcresearchcenter.org/), financé conjointement par la National Science Foundation des États-Unis ([Financement #2330423](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2330423&HistoricalAwards=false)) et par le Conseil de recherches en sciences naturelles et en génie du Canada ([Financement #85136](https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=782440)). Ce jeu de données repose également en partie sur des travaux de recherche financés par le Conseil de recherches en sciences humaines du Canada. Les opinions, conclusions ou recommandations exprimées dans ce document sont celles de(s) auteur(s) et ne reflètent pas nécessairement celles de la National Science Foundation, du Conseil de recherches en sciences naturelles et en génie du Canada, ou du Conseil de recherches en sciences humaines du Canada.
-
-
-
-## Glossary
-
-
-
-## More Information
-
-
-
-## Dataset Card Authors
-
-[More Information Needed]
-
-## Dataset Card Contact
-
-[More Information Needed--optional]
-
diff --git a/docs/wiki-guide/HF_DatasetCard_Template_mkdocs.md b/docs/wiki-guide/HF_DatasetCard_Template_mkdocs.md
index 146d712..3c72672 100644
--- a/docs/wiki-guide/HF_DatasetCard_Template_mkdocs.md
+++ b/docs/wiki-guide/HF_DatasetCard_Template_mkdocs.md
@@ -1,8 +1,8 @@
# Dataset Card Template
-Below are the Dataset Card templates for Imageomics and ABC. You can download or copy the appropriate dataset card content and paste it into a new Markdown file to create a README for your dataset.
+Below is the Dataset Card template for Imageomics. You can download or copy the dataset card content and paste it into a new Markdown file to create a README for your dataset.
-
+
Imageomics
Download template from GitHub
@@ -10,12 +10,3 @@ Below are the Dataset Card templates for Imageomics and ABC. You can download or
{{ include_file_as_code("docs/wiki-guide/HF_DatasetCard_Template_Imageomics.md") }}
-
-
-ABC
-
-Download template from GitHub
-
-{{ include_file_as_code("docs/wiki-guide/HF_DatasetCard_Template_ABC.md") }}
-
-
\ No newline at end of file
diff --git a/docs/wiki-guide/HF_ModelCard_Template_ABC.md b/docs/wiki-guide/HF_ModelCard_Template_ABC.md
deleted file mode 100644
index 1ddf79d..0000000
--- a/docs/wiki-guide/HF_ModelCard_Template_ABC.md
+++ /dev/null
@@ -1,286 +0,0 @@
----
-license: # See note below on choosing a license.
-language:
-- en
-library_name: # Allows for Inference API widget on sidebar of model card
-tags:
-- biology
-- CV
-- images
-- animals
-datasets: # Adds link if on HF & shows up on sidebar. Ex: Imageomics/TreeOfLife-10M
-metrics: # key list: https://hf.co/metrics
----
-
-
-
-
-
-
-
-
-# Model Card for [Model Name]
-
-
-
-## Model Details
-
-### Model Description
-
-
-
-- **Developed by:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed -- choose a license (see above notes)]
-- **Fine-tuned from model:** [More Information Needed]
-
-### Model Sources
-
-
-
-- **Repository:** [Project Repo]
-- **Paper:** [More Information Needed--optional]
-- **Demo:** [More Information Needed--encouraged]
-
-## Uses
-
-
-
-### Direct Use
-
-
-
-[More Information Needed]
-
-### Downstream Use
-
-
-
-[More Information Needed]
-
-### Out-of-Scope Use
-
-
-
-[More Information Needed]
-
-## Bias, Risks, and Limitations
-
-
-
-[More Information Needed]
-
-### Recommendations
-
-
-
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
-## How to Get Started with the Model
-
-Use the code below to get started with the model.
-
-
-
-[More Information Needed]
-
-## Training Details
-
-### Training Data
-
-
-
-[More Information Needed]
-
-### Training Procedure
-
-
-
-#### Preprocessing
-
-[More Information Needed--encouraged]
-
-
-#### Training Hyperparameters
-
-- **Training regime:** [More Information Needed]
-
-#### Speeds, Sizes, Times
-
-
-
-[More Information Needed]
-
-## Evaluation
-
-
-
-[More Information Needed]
-
-### Testing Data, Factors & Metrics
-
-#### Testing Data
-
-
-
-[More Information Needed]
-
-#### Factors
-
-
-
-[More Information Needed]
-
-#### Metrics
-
-
-
-[More Information Needed]
-
-### Results
-
-[More Information Needed]
-
-#### Summary
-
-[More Information Needed]
-
-## Model Examination
-
-
-
-[More Information Needed]
-
-## Environmental Impact
-
-
-
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://doi.org/10.48550/arXiv.1910.09700).
-
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-
-## Technical Specifications
-[More Information Needed--optional]
-
-### Model Architecture and Objective
-
-[More Information Needed]
-
-### Compute Infrastructure
-
-[More Information Needed]
-
-#### Hardware
-
-[More Information Needed: hardware requirements]
-
-#### Software
-
-[More Information Needed]
-
-## Citation
-
-
-
-**BibTeX:**
-
-[More Information Needed]
-
-
-
-## Acknowledgements
-
-This work was supported by the [AI and Biodiversity Change (ABC) Global Center](http://abcresearchcenter.org/), which is funded by the US National Science Foundation under [Award No. 2330423](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2330423&HistoricalAwards=false) and Natural Sciences and Engineering Research Council of Canada under [Award No. 585136](https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=782440). This model draws on research supported by the Social Sciences and Humanities Research Council. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation, Natural Sciences and Engineering Research Council of Canada, or Social Sciences and Humanities Research Council.
-
-Ce travail a été soutenu par le centre de recherche [AI and Biodiversity Change (ABC)](http://abcresearchcenter.org/), financé conjointement par la National Science Foundation des États-Unis ([Financement #2330423](https://www.nsf.gov/awardsearch/showAward?AWD_ID=2330423&HistoricalAwards=false)) et par le Conseil de recherches en sciences naturelles et en génie du Canada ([Financement #85136](https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=782440)). Ce jeu de données repose également en partie sur des travaux de recherche financés par le Conseil de recherches en sciences humaines du Canada. Les opinions, conclusions ou recommandations exprimées dans ce document sont celles de(s) auteur(s) et ne reflètent pas nécessairement celles de la National Science Foundation, du Conseil de recherches en sciences naturelles et en génie du Canada, ou du Conseil de recherches en sciences humaines du Canada.
-
-## Glossary
-
-
-
-## More Information
-
-
-
-## Model Card Authors
-
-[More Information Needed]
-
-## Model Card Contact
-
-[More Information Needed--optional]
-
\ No newline at end of file
diff --git a/docs/wiki-guide/HF_ModelCard_Template_mkdocs.md b/docs/wiki-guide/HF_ModelCard_Template_mkdocs.md
index c63cd11..66471ef 100644
--- a/docs/wiki-guide/HF_ModelCard_Template_mkdocs.md
+++ b/docs/wiki-guide/HF_ModelCard_Template_mkdocs.md
@@ -1,8 +1,8 @@
# Model Card Template
-Below are the Model Card templates for Imageomics and ABC. You can download or copy the appropriate model card content and paste it into a new Markdown file to create a README for your model repo.
+Below is the Model Card template for Imageomics. You can download or copy the model card content and paste it into a new Markdown file to create a README for your model repo.
-
+
Imageomics
Download template from GitHub
@@ -11,12 +11,3 @@ Below are the Model Card templates for Imageomics and ABC. You can download or c
{{ include_file_as_code("docs/wiki-guide/HF_ModelCard_Template_Imageomics.md") }}
-
-
-ABC
-
-Download template from GitHub
-
-{{ include_file_as_code("docs/wiki-guide/HF_ModelCard_Template_ABC.md") }}
-
-
\ No newline at end of file
diff --git a/docs/wiki-guide/Handling-API-Keys.md b/docs/wiki-guide/Handling-API-Keys.md
index afc27d8..bee6a63 100644
--- a/docs/wiki-guide/Handling-API-Keys.md
+++ b/docs/wiki-guide/Handling-API-Keys.md
@@ -9,13 +9,17 @@ If you are using a web service with API keys, there are a few things to keep in
- Unique for different environments
## Key Storage
+
Our recommended way of storing and using API keys is within `.env` (dotenv) files.
A `.env` file is a simple text file that stores key-value pairs that set local environment variables. Its contents would look something like the following:
+
```
RESOURCE_API_KEY=your_api_key
```
+
For instance, if your API key for OpenAI is `sk-AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz`, you would put the following in your `.env` file.
+
```
OPENAI_API_KEY=sk-AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz
```
@@ -29,7 +33,9 @@ OPENAI_API_KEY=sk-AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz
The `.env` file is a simple text file, so you can use any text editor to create and edit it.
## Key Usage
+
If you are using Python, the `dotenv` package will enable you to use this approach. First, install with [pip](https://pypi.org/project/python-dotenv/) or [conda](https://anaconda.org/conda-forge/python-dotenv). In your work, the following will get you access to your API key as a Python variable `RESOURCE_API_KEY` (you may name it whatever you like; the Python variable may be different from the environment variable):
+
```python { py linenums="1" }
import os
from dotenv import load_dotenv
@@ -40,7 +46,8 @@ RESOURCE_API_KEY = os.getenv("RESOURCE_API_KEY")
```
## Keys for a Shared Resource
+
If you are part of a group with access to the same API:
-- Create a unique API key for each application you use and for each environment you work in.
+- Create a unique API key for each application you use and for each environment you work in.
- Avoid sharing API keys with other users or between different applications/scripts.
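
One way to apply the guidance above is to read a separate key per environment and fail early when it is missing. This is a minimal sketch, not a prescribed implementation: the helper names `require_api_key` and `key_name`, and the `<RESOURCE>_API_KEY_<ENVIRONMENT>` naming convention, are hypothetical. It uses only the standard library and assumes the variables have already been loaded into the environment (for example by calling `load_dotenv()` as shown in Key Usage).

```python
import os


def key_name(resource: str, environment: str) -> str:
    """Build a per-environment variable name, e.g. RESOURCE_API_KEY_DEV.

    The naming convention is an assumption for this sketch, not a standard.
    """
    return f"{resource.upper()}_API_KEY_{environment.upper()}"


def require_api_key(name: str) -> str:
    """Return the API key stored in environment variable `name`, or raise."""
    value = os.getenv(name)
    if not value:
        # Report only the variable name; never print or log a key value.
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check your .env file."
        )
    return value
```

A script would then call something like `require_api_key(key_name("resource", "dev"))`, so that development and production runs never share a key and a missing key is reported before any request is made.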
diff --git a/docs/wiki-guide/Helpful-Tools-for-your-Workflow.md b/docs/wiki-guide/Helpful-Tools-for-your-Workflow.md
index c09a7a7..a4d6a72 100644
--- a/docs/wiki-guide/Helpful-Tools-for-your-Workflow.md
+++ b/docs/wiki-guide/Helpful-Tools-for-your-Workflow.md
@@ -11,26 +11,29 @@ This makes it easier to see the differences between versions as you work through
Notebooks can be [paired](https://github.com/mwouts/jupytext#paired-notebooks) individually, or you can set a [global config](https://jupytext.readthedocs.io/en/latest/config.html) in your notebooks folder to generate a pairing automatically. Unfortunately, this automated pairing only works if you use Jupyter Lab (i.e., run notebooks through the terminal), not if you work in VS Code or other IDEs. [Manual pairing](https://github.com/mwouts/jupytext/blob/main/docs/faq.md#can-i-use-jupytext-with-jupyterhub-binder-nteract-colab-saturn-or-azure) code is given below.
-#### Jupytext commands in terminal for VS Code:
+#### Jupytext commands in terminal for VS Code
+
```bash
jupytext --set-formats ipynb,py:percent .ipynb # Pair a notebook to a py script
jupytext --sync .ipynb # Sync the two representations
```
#### But wait! ...There's another way to automate it!
-There is a [jupytext pre-commit hook](https://jupytext.readthedocs.io/en/latest/using-pre-commit.html) that can be used to sync your paired files automatically when updating your GitHub repo. To learn more about pre-commit hooks in general, see the [git docs on pre-commit hooks](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks).
-
+There is a [jupytext pre-commit hook](https://jupytext.readthedocs.io/en/latest/using-pre-commit.html) that can be used to sync your paired files automatically when updating your GitHub repo. To learn more about pre-commit hooks in general, see the [git docs on pre-commit hooks](https://git-scm.com/book/en/v2/Customizing-Git-Git-Hooks).
## Ruff
[Ruff](https://github.com/astral-sh/ruff) is a fast Python formatter and linter. You can install it with `pip install ruff` or `conda install ruff` in your virtual/conda environment. They also have extensions for [VS Code](https://github.com/astral-sh/ruff-vscode) and [other editors supporting LSP](https://github.com/astral-sh/ruff-lsp).
To format a file, run:
+
```bash
ruff format
```
+
and to lint it, run:
+
```bash
ruff check
```
diff --git a/docs/wiki-guide/Hugging-Face-Repo-Guide.md b/docs/wiki-guide/Hugging-Face-Repo-Guide.md
index cf4c84a..126640b 100644
--- a/docs/wiki-guide/Hugging-Face-Repo-Guide.md
+++ b/docs/wiki-guide/Hugging-Face-Repo-Guide.md
@@ -2,11 +2,11 @@
Need a repository to store your data or model? You've come to the right place! Below we have compiled guidance on conventions and best practices for maintaining a shared (or shareable) Hugging Face repository of your work.
-
## Setting up a New Organization Repository
### Standard Files
-For each repository, include the following files in the root directory as soon as possible; a license can (and should) be instantiated when you create a new repository, and the standard `.gitattributes` will be generated for you. On the [Imageomics HF](https://huggingface.co/imageomics) select `New` and pick which type of repository you need.
+
+For each repository, include the following files in the root directory as soon as possible; a license can (and should) be instantiated when you create a new repository, and the standard `.gitattributes` will be generated for you. On the [Imageomics HF](https://huggingface.co/imageomics) select `New` and pick which type of repository you need.
- [README.md](#readme)
- [LICENSE.md](#license)
@@ -14,12 +14,15 @@ For each repository, include the following files in the root directory as soon a
- [.gitattributes](#gitattributes)
#### README
-The README.md file is generally referred to as either a Dataset or Model Card and is what everyone will notice first when they open your repository on Hugging Face. Choose the appropriate Imageomics-specific HF template ([model](HF_ModelCard_Template_mkdocs.md) or [dataset](HF_DatasetCard_Template_mkdocs.md)) to get started. Be sure to include a brief description and as much information as possible at the beginning. You can update this file as you go, so don't remove the recommended sections prior to completion. The templates include descriptions of many fields, Imageomics grant information, citation formatting, and some notes on HF-flavored markdown to get you started.
-Once you've created your repo, populate your README (you can do this online by selecting "Create Dataset/Model Card" and pasting in the appropriate Imageomics HF template, then filling in your info). Editing your README in the browser allows you to preview the formatting of the file before committing changes.
+The README.md file is generally referred to as either a Dataset or Model Card and is what everyone will notice first when they open your repository on Hugging Face. Choose the appropriate Imageomics-specific HF template ([model](HF_ModelCard_Template_mkdocs.md) or [dataset](HF_DatasetCard_Template_mkdocs.md)) to get started. Be sure to include a brief description and as much information as possible at the beginning. You can update this file as you go, so don't remove the recommended sections prior to completion. The templates include descriptions of many fields, Imageomics grant information, citation formatting, and some notes on HF-flavored markdown to get you started.
+
+Once you've created your repo, populate your README (you can do this online by selecting "Create Dataset/Model Card" and pasting in the appropriate Imageomics HF template, then filling in your info). Editing your README in the browser allows you to preview the formatting of the file before committing changes.
#### LICENSE
-##### 1. Select a license.
+
+##### 1. Select a license
+
Alongside the appropriate stakeholders, select a license that is [Open Source Initiative](https://opensource.org/licenses) (OSI) compliant.
!!! note "Remember"
@@ -27,33 +30,43 @@ Alongside the appropriate stakeholders, select a license that is [Open Source In
For more information on how to choose a license and why it matters, see [Choose A License](https://choosealicense.com) and [A Quick Guide to Software Licensing for the Scientist-Programmer](https://doi.org/10.1371/journal.pcbi.1002598) by A. Morin, et al.
-##### 2. Add LICENSE.md to the repository.
-Once a license has been chosen (if not initialized with one), add the appropriate license label in the `yaml` portion of the README (the web UI generates a dropdown of recommendations under "Edit dataset/model card").
+##### 2. Add LICENSE.md to the repository
+
+Once a license has been chosen (if not initialized with one), add the appropriate license label in the `yaml` portion of the README (the web UI generates a dropdown of recommendations under "Edit dataset/model card").
#### gitignore
-As with GitHub, the `.gitignore` file is an important tool for maintaining a clean repository by ensuring that git will not track temp files of any and all your collaborators (no pesky `pycache` or `.DS_Store` files floating around).
-The same [options for GitHub](https://github.com/github/gitignore) are usable here, and if you or anyone on your team uses a Mac (or if you intend to encourage outside collaboration on this repo), add
+As with GitHub, the `.gitignore` file is an important tool for maintaining a clean repository: it ensures that git will not track temporary files from you or any of your collaborators (no pesky `__pycache__` or `.DS_Store` files floating around).
+
+The same [options for GitHub](https://github.com/github/gitignore) are usable here, and if you or anyone on your team uses a Mac (or if you intend to encourage outside collaboration on this repo), add
+
```
# Mac system
.DS_Store
```
+
at the end of the `.gitignore` file.
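
As a minimal sketch of the step above, the following session appends the Mac entry only when it is missing and then asks git to confirm the file is ignored. It runs in a throwaway directory so it cannot touch a real repository; the `demo_dir` variable is a placeholder introduced for illustration.

```sh
# Sandbox demo: work in a temporary directory, not your real repo.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# Append the Mac entry only if that exact line is not already present.
touch .gitignore
grep -qxF ".DS_Store" .gitignore || printf '\n# Mac system\n.DS_Store\n' >> .gitignore

# Create a stray file and ask git which rule (if any) ignores it.
touch .DS_Store
git check-ignore -v .DS_Store
```

`git check-ignore` exits with a non-zero status when no rule matches, so it can double as a quick check after editing `.gitignore`.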
#### gitattributes
+
The `.gitattributes` file determines file patterns to be tracked by [`git LFS`](https://git-lfs.com/) (Git Large File Storage). The preset `.gitattributes` file includes many binary file types, but you may need to add particular files if they get too large (e.g., a large CSV, but do **NOT** store all CSV files with `git LFS`; add only the particular file or pattern). Pattern-matching can be done using `*`. You can either add the file (and appropriate pattern description) to the `.gitattributes` file, or add it in the command line:
+
```
git lfs track "my-big-list.csv"
```
+
Then add and commit the `.gitattributes` file as described below.
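
To illustrate the manual route, here is a hedged sketch that writes the same kind of line `git lfs track` would add and then uses `git check-attr` to confirm which paths it covers. It runs in a temporary sandbox and does not require Git LFS to be installed; `demo_dir` and `other.csv` are hypothetical names used only for this example.

```sh
# Sandbox demo: work in a temporary directory, not your real repo.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# Manual equivalent of: git lfs track "my-big-list.csv"
echo 'my-big-list.csv filter=lfs diff=lfs merge=lfs -text' >> .gitattributes

# Ask git which filter applies to each path (the files need not exist).
git check-attr filter -- my-big-list.csv other.csv
```

Only the named file reports the `lfs` filter; other CSV files stay as regular git objects, which matches the advice not to store every CSV with `git LFS`.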
## Hugging Face Pull Requests With Local Edits
-Hugging Face also has a pull request (PR) feature, though the process is a bit different from GitHub.
+
+Hugging Face also has a pull request (PR) feature, though the process is a bit different from GitHub.
As with GitHub, you can interact through the web browser or a command line interface (e.g., terminal on Mac). However, instead of the `create new branch` option, there is a `create new pull request` option. It is still preferable to avoid committing everything directly to main. To make further changes to a particular PR created in the browser, you must first clone the repo:
+
```
git clone
```
+
Then, navigate to that folder `cd `, and fetch the PR files:
```
@@ -62,6 +75,7 @@ git checkout pr/
```
You can then make your updates, add and commit them, then push those back to the remote. Note that the push is the one line that differs from GitHub and must be used each time:
+
```
git add
git commit -m ""
@@ -71,4 +85,5 @@ git push origin pr/:refs/pr/
For more information on Hugging Face Pull Requests and Discussions, see their [documentation](https://huggingface.co/docs/hub/repositories-pull-requests-discussions).
## Templates for Model and Dataset Cards
+
See [About Templates](About-Templates.md) for guidelines on using templates for these important pieces of documentation.
diff --git a/docs/wiki-guide/Technical-Infrastructure.md b/docs/wiki-guide/Technical-Infrastructure.md
index fe0ed53..82e49fd 100644
--- a/docs/wiki-guide/Technical-Infrastructure.md
+++ b/docs/wiki-guide/Technical-Infrastructure.md
@@ -1,6 +1,7 @@
# Compute Infrastructure We Use
## Overview
+
Overall [Infrastructure Chart](https://docs.google.com/spreadsheets/d/1JSOi5pp2Y8Utj_npzKcYmvAxGncgmvP2gait5H0oYKk/edit?usp=sharing) with system specifications and notes.
- [The Ohio Supercomputing Center (OSC)](https://www.osc.edu/): Large compute resource accessible through `ssh` or OnDemand (web) platform.
@@ -15,7 +16,7 @@ Overall [Infrastructure Chart](https://docs.google.com/spreadsheets/d/1JSOi5pp2Y
- Used sparingly for urgent deadlines when other compute is not available (generally hasn't been available at those times either, though) or to host projects that cannot be hosted effectively through a Hugging Face Space.
- [AWS usage guidelines (_internal_)](https://github.com/Imageomics/internal-guidelines/wiki/AWS-@-Imagomics)
-# Other Compute Resources We've Used or Considered
+## Other Compute Resources We've Used or Considered
- OpenAI Researcher Access Program ([_internal_ info](https://github.com/Imageomics/internal-guidelines/wiki/OpenAI-Researcher-Access-Program))
- NAIRR Pilot Program
diff --git a/docs/wiki-guide/The-GitHub-Pull-Request-Guide.md b/docs/wiki-guide/The-GitHub-Pull-Request-Guide.md
index 6ece8d7..f5e069c 100644
--- a/docs/wiki-guide/The-GitHub-Pull-Request-Guide.md
+++ b/docs/wiki-guide/The-GitHub-Pull-Request-Guide.md
@@ -6,19 +6,22 @@ This guide is divided into three essential sections to help you effectively mana
- [Review a Pull Request](#2-review-a-pull-request): Learn the best practices for providing constructive feedback, identifying potential issues, and ensuring code quality during the review process.
- [Respond to a Pull Request Review](#3-respond-to-a-pull-request-review): Understand how to address reviewer feedback, make necessary changes, and ensure your pull request meets the required standards for approval.
-By following these steps, you will contribute to a smooth and efficient workflow, ensuring collaboration and quality in your project.
-
+By following these steps, you will contribute to a smooth and efficient workflow, ensuring collaboration and quality in your project.
## 1. Create a Pull Request
-Before creating a pull request, first, please follow [the GitHub Workflow](The-GitHub-Workflow.md) to create and push your branch.
+
+Before creating a pull request, first, please follow [the GitHub Workflow](The-GitHub-Workflow.md) to create and push your branch.
### 1.1 Navigate to the Repository's Main Page
+
On GitHub, go to the main page of the repository where you’ve pushed your branch.
### 1.2 Select Your Branch
+
From the "Branch" menu, choose the branch that contains your changes (the one you just pushed).
### 1.3 Click 'Compare & pull request'
+
You’ll see a button labeled Compare & pull request. Click this to begin the process of creating a pull request for your changes.
{ loading=lazy, width="800" }
@@ -26,6 +29,7 @@ You’ll see a button labeled Compare & pull request. Click this to begin the pr
///
### 1.4 Add Title and Description
+
In the pull request form, type a descriptive title for your PR. Provide a detailed description of the changes you've made, why they are important, and any other relevant information.
{ loading=lazy, width="800" }
@@ -49,42 +53,46 @@ In the pull request form, type a descriptive title for your PR. Provide a detail
### 2.2 Select a Pull Request
-In the list of pull requests, click the pull request that you'd like to review.
+In the list of pull requests, click the pull request that you'd like to review.
{ loading=lazy, width="800" }
/// caption
///
### 2.3 Review Changes
+
On the pull request page, click **Files changed** to see the changes.
{ loading=lazy, width="600" }
/// caption
///
-2.3.1 by clicking { loading=lazy, width="20"}, you can choose the unified or split view.
+2.3.1 by clicking { loading=lazy, width="20"}, you can choose the unified or split view.
{ loading=lazy, width="600" }
/// caption
///
### 2.4 Add Comments or Suggestions
+
When hovering over the lines of code, you can click the blue comment icon to add your review comments.
{ loading=lazy, width="800" }
/// caption
///
-2.4.1 If you'd like to add a comment on multiple lines, please click the line number of the first line you want to add comments and drag down to select a range of lines.
+2.4.1 If you'd like to add a comment on multiple lines, click the line number of the first line you want to comment on and drag down to select a range of lines.
### 2.5 Suggest Changes
-If you'd like to suggest a specific change to the lines, click { loading=lazy, width="20"}, and then edit the text within the suggestion block.
+
+If you'd like to suggest a specific change to the lines, click { loading=lazy, width="20"}, and then edit the text within the suggestion block.
{ loading=lazy, width="600" }
/// caption
///
### 2.6 Comment on a File
+
If you'd like to comment on a file, click { loading=lazy, width="20"} at the right top of the file, then add your comments.
{ loading=lazy, width="500" }
@@ -92,13 +100,15 @@ If you'd like to comment on a file, click { loading=lazy, width="600" }
/// caption
///
### 2.8 Start or Add to a Review
+
When you're done, click Start a review. If you have already started a review, please click Add review comment.
!!! note "Notice"
All line comments are pending and only visible to you. You can edit the comments when needed. If you'd like to abandon your review, go to **Review changes** and click **Abandon review**.
@@ -122,11 +132,13 @@ Click Review changes, and then type comments to summarize your proposed changes.
- Select Request changes: Provide feedback indicating that revisions are needed before the changes can be approved.
### 2.11 Click Submit review
+
This completes the current review round and publishes your comments and suggestions. The PR can then either be merged or updated (depending on approval or comments). We generally expect that whoever submits the PR will merge it once all feedback has been incorporated or otherwise addressed.
## 3. Respond to a Pull Request Review
### 3.1 Navigate to the Repository's Main Page
+
Navigate to your repository and click **Pull requests**.
{ loading=lazy, width="600" }
@@ -138,6 +150,7 @@ Navigate to your repository name, click **Pull requests**
After receiving feedback on your pull request, you can apply the changes in one of two ways: either by committing each change individually or by grouping several changes into a single commit. The method you choose depends on whether you prefer fine-grained control over the commit history or a more streamlined approach.
#### 3.2.1 Apply a change in its own commit
+
If you agree with a suggested change, apply it by creating a separate commit for it. This approach helps keep your commit history clear and each change traceable.
{ loading=lazy, width="600" }
@@ -145,6 +158,7 @@ If you agree with at suggested change, qpply it by creating a separate commit fo
///
#### 3.2.2 Add multiple suggestions to a batch of changes
+
If you plan to include multiple changes in one commit, you can add suggestions to a batch. Once you've collected all the desired suggestions, click "Commit suggestions" to apply them in one go.
{ loading=lazy, width="600" }
@@ -152,13 +166,17 @@ If you plan to include multiple changes in one commit, you can add suggestions t
///
### 3.3 Add Commit Message
+
In the commit message field, enter a brief, descriptive message that clearly explains the changes made to the file(s).
### 3.4 Click Commit changes
+
After entering your commit message, click the "Commit changes" button to finalize and save your modifications to the repository. This step ensures that your changes are recorded and can be reviewed or merged into the main codebase.
### 3.5 Re-requesting a Review
+
If you’ve addressed all the requested changes and your pull request requires further review, re-request a review by notifying the reviewers. This action prompts them to evaluate your updated code and provide feedback or approval.
### 3.6 Out-of-scope Suggestion
+
If the suggested change falls outside the scope of your pull request, create a new issue to address the feedback separately. Issues can be created directly from a PR comment.
diff --git a/docs/wiki-guide/The-GitHub-Workflow.md b/docs/wiki-guide/The-GitHub-Workflow.md
index 1b55f12..8748ebe 100644
--- a/docs/wiki-guide/The-GitHub-Workflow.md
+++ b/docs/wiki-guide/The-GitHub-Workflow.md
@@ -2,7 +2,7 @@
Thank you for contributing!
-This document outlines guidelines for collaboratively contributing to a repository (repo).
+This document outlines guidelines for collaboratively contributing to a repository (repo).
This workflow is ideal for when:
- You are a member of the Imageomics Institute and have write access to the repository you're contributing to.
@@ -14,13 +14,16 @@ It follows a branch and pull request (PR) based workflow, which provides a contr
Importantly, this workflow suggests that **_contributions are created through PRs_** rather than directly committing to or merging into the `main` branch.
## Contribute as an Imageomics member with write access
-### 1. Clone the repo to your machine.
+
+### 1. Clone the repo to your machine
+
```sh
git clone https://github.com/Imageomics/.git
cd
```
-### 2. Create a new branch.
+### 2. Create a new branch
+
For example, if you want to add a feature to your code that simulates human vision, you could name the branch `feature/simulate-vision`.
!!! tip "Pro tip"
@@ -30,15 +33,19 @@ For example, if you want to add a feature to your code that simulates human visi
git branch feature/simulate-vision
git checkout feature/simulate-vision
```
+
or to create and switch to the new branch with a single command:
+
```sh
git checkout -b feature/simulate-vision
```
-### 3. Make your desired changes.
+### 3. Make your desired changes
+
For example, imagine you created three new files, each simulating a component of the human visual system: `retina.py`, `occipital.py`, and `visual_cortex.py`.
-### 4. Stage and commit changes to the new branch.
+### 4. Stage and commit changes to the new branch
+
Commit frequently with each commit based on a logical self-contained change using descriptive commit messages.
!!! tip "Pro tip"
@@ -50,25 +57,31 @@ git add retina.py occipital.py visual_cortex.py
git commit -m "Implement the retina, occipital, and visual cortex components of the human visual system."
```
-### 5. Update your local `main` branch.
+### 5. Update your local `main` branch
+
Ensure your local `main` branch is up-to-date with the remote to incorporate any changes other collaborators may have made.
!!! tip "Pro tip"
If you're unsure what branch you should have checked out, remember that the branch being merged to or committed to should be the branch that is active. Check with `git branch` and look for `*` next to what's active.
+
```sh
git checkout main
git pull origin main
```
-### 6. Merge changes made to `main` to your new branch.
+### 6. Merge changes made to `main` to your new branch
+
If updates were pulled into your local `main` branch, merge them into your new branch.
+
```sh
git checkout feature/simulate-vision
git merge main
```
-### 7. Push your new branch to the remote.
+### 7. Push your new branch to the remote
+
This should contain any updates made by others as well as your new changes. The first time this is done for a branch, you will need to map the branch on your local 'downstream' repo to the corresponding branch on the remote 'upstream' repo. Following this, simply push.
+
```sh
git push --set-upstream origin HEAD # to auto-match upstream branch name to your current branch name
# or
@@ -77,39 +90,45 @@ git push --set-upstream origin feature/simulate-vision # to specify the upstream
git push # subsequent pushes for this branch once the remote tracking branch is set
```
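
The upstream mapping described in this step can be tried safely in a sandbox. The sketch below is an illustration under assumptions: it uses a local bare repository (`origin.git`, a hypothetical stand-in for the GitHub remote) and a placeholder commit identity, then prints the tracking branch that `--set-upstream` configured.

```sh
# Sandbox demo: a local bare repo plays the role of the GitHub remote.
work="$(mktemp -d)"
cd "$work"
git init -q --bare origin.git
git init -q local
cd local
git symbolic-ref HEAD refs/heads/main
git config user.email "demo@example.com"   # placeholder identity
git config user.name "Demo"
git remote add origin ../origin.git

# One commit on main, then a feature branch.
echo "start" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"
git checkout -q -b feature/simulate-vision

# First push: map the local branch to a same-named branch on the remote.
git push -q --set-upstream origin HEAD

# Show which remote branch the current branch now tracks.
git rev-parse --abbrev-ref --symbolic-full-name "@{upstream}"
```

Once the tracking branch is set, a plain `git push` or `git pull` on this branch uses it without further arguments.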
-### 8. Make changes, commit, and push with this branch as needed.
+### 8. Make changes, commit, and push with this branch as needed
+
Repeat steps 3-7 until results are in a state suitable to merge with the project's `main` branch.
-### 9. Open a Pull Request.
-On the GitHub repo page, click the `Pull requests` tab, click the `New pull requests` button, select the new branch you pushed as the head branch and keep the base branch as `main` (where you want to merge your changes into). Click `Create pull request`.
+### 9. Open a Pull Request
+
+On the GitHub repo page, click the `Pull requests` tab, click the `New pull request` button, select the new branch you pushed as the head branch and keep the base branch as `main` (where you want to merge your changes into). Click `Create pull request`.
-You can also set the PR to draft status for visibility and discussion of ongoing work.
+You can also set the PR to draft status for visibility and discussion of ongoing work.
If you like doing everything from the command line, you can consider using the [GitHub CLI](https://cli.github.com/) for this step.
!!! tip "Pro tip"
Keep PRs small and manageable for review; the scope should be focused on the task, feature, or bug fix associated with the branch.
-### 10. Verify the repositories and branches in the PR.
-**Base Repository:** The original repo you are contributing into.
+### 10. Verify the repositories and branches in the PR
-**Head Repository:** The repo you are contributing from, which is the same as the base repo unless you are working from a fork.
+**Base Repository:** The original repo you are contributing into.
-**Base Branch:** `main` (or the branch you want to merge your changes into)
+**Head Repository:** The repo you are contributing from, which is the same as the base repo unless you are working from a fork.
+
+**Base Branch:** `main` (or the branch you want to merge your changes into)
**Compare Branch:** Your new branch with changes.
-### 11. Title and describe the PR.
+### 11. Title and describe the PR
+
Create a brief title describing the primary issue addressed in the PR.
In the PR description, give a consolidated overview of the motivation for the change(s) and description of choices made. It should briefly summarize the holistic effect resulting from the component commits.
Assign appropriate reviewer(s) and/or link the PR to a project.
-### 12. Submit the PR.
+### 12. Submit the PR
+
Click `Create pull request` to submit.
For more details and guidance on the GitHub pull request process, please see our [GitHub Pull Request Guide](The-GitHub-Pull-Request-Guide.md).
-### 13. Clean up branches.
+### 13. Clean up branches
+
After a branch is merged and a PR is closed, delete the branch from the remote and your local repository to keep things tidy.
!!! tip "Pro tip"
@@ -121,7 +140,8 @@ After a branch is merged and a PR is closed, delete the branch from the remote a
git fetch --prune # optionally, this removes any references to deleted remote branches
```
-### 14. Update your local main branch before starting new work.
+### 14. Update your local main branch before starting new work
+
```sh
git pull
```
diff --git a/docs/wiki-guide/The-Hugging-Face-Dataset-Upload-Guide.md b/docs/wiki-guide/The-Hugging-Face-Dataset-Upload-Guide.md
index fea4a61..a16b312 100644
--- a/docs/wiki-guide/The-Hugging-Face-Dataset-Upload-Guide.md
+++ b/docs/wiki-guide/The-Hugging-Face-Dataset-Upload-Guide.md
@@ -1,6 +1,7 @@
# Hugging Face Dataset Guide
## Create a New Dataset Repository
+
When creating a new dataset repository, you can make the dataset **Public** (accessible to anyone on the internet) or **Private** (accessible only to members of the organization).
{ loading=lazy, width=800 }
@@ -8,11 +9,13 @@ When creating a new dataset repository, you can make the dataset **Public** (acc
///
## Upload a Dataset with the Web Interface
+
In the **Files and versions** tab of the Dataset card, you can choose to add files through the Hugging Face web interface.
{ loading=lazy }
## Upload a Dataset with HfApi
+
``` py linenums="1"
from huggingface_hub import login
@@ -31,8 +34,11 @@ api.upload_file (
```
## Upload a Dataset with Git
+
### If the Dataset is Less Than 5GB
+
Navigate to the folder for the repository:
+
```
# Clone the repository
git clone https://huggingface.co/datasets/username/repo-name
@@ -43,27 +49,34 @@ git commit -m 'comments'
git push
```
+
### If the Dataset is Larger Than 5GB
+
#### Install Git LFS
-Follow instructions at https://git-lfs.com/
+
+Follow instructions at <https://git-lfs.com/>
#### Install the Hugging Face CLI
+
```
# With Homebrew (macOS):
brew install huggingface-cli
# Or with pip:
pip install -U "huggingface_hub[cli]"
```
#### Enable the repository to upload large files
+
```
huggingface-cli lfs-enable-largefiles
```
#### Initialize Git LFS
+
```
git lfs install
```
#### Track large files (e.g., .csv files)
+
```
# Adds a line to .gitattributes, which Git uses to determine files managed by LFS
git lfs track "*.csv"
@@ -72,6 +85,7 @@ git commit -m "Track large files with Git LFS"
```
#### Add, commit, and push the files
+
```
git add
git commit -m 'comments'
diff --git a/docs/wiki-guide/The-Hugging-Face-Workflow.md b/docs/wiki-guide/The-Hugging-Face-Workflow.md
index 222d2d9..8419117 100644
--- a/docs/wiki-guide/The-Hugging-Face-Workflow.md
+++ b/docs/wiki-guide/The-Hugging-Face-Workflow.md
@@ -1,12 +1,15 @@
# Hugging Face Workflow Guide
## Hugging Face Pull Requests With Local Edits
-Hugging Face also has a pull request (PR) feature, though the process is a bit different from GitHub.
+
+Hugging Face also has a pull request (PR) feature, though the process is a bit different from GitHub.
As with GitHub, you can interact through the web browser or a command line interface (e.g., terminal on Mac). However, instead of the `create new branch` option, there is a `create new pull request` option. It is still preferable to avoid committing everything directly to main. To make further changes to a particular PR created in the browser, you must first clone the repo:
+
```
git clone
```
+
Then, navigate to that folder `cd `, and fetch the PR files:
```
@@ -15,6 +18,7 @@ git checkout pr/
```
You can then make your updates, add and commit them, then push those back to the remote. Note that the push is the one line that differs from GitHub and must be used each time:
+
```
git add
git commit -m "
- Enter the public repo name
- Click the checkbox for `Add a README file`
@@ -45,6 +49,7 @@ After this step you should see a repo with commits similar to the following:
///
#### 2. Update Main Branch of Public Repo
+
Make changes to the [README](GitHub-Repo-Guide.md#readme) and [`.gitignore`](GitHub-Repo-Guide.md#gitignore) in the public repo such that no further changes will be needed until the private repo is merged.
After this step you should see a repo with at least 2 commits similar to the following:
@@ -54,23 +59,24 @@ After this step you should see a repo with at least 2 commits similar to the fol
///
#### 3. Add Branch Protections to Public Repo
-Once your repository is set up, only changes to the `ghpages` branch are recommended; establish branch protections on both `main` and `ghpages` that require review and approval (see [When to think about branch protections](When-to-think-about-branch-protections.md) for more information).
+
+Once your repository is set up, only changes to the `gh-pages` branch are recommended; establish branch protections on both `main` and `gh-pages` that require review and approval (see [When to think about branch protections](When-to-think-about-branch-protections.md) for more information).
There are two issues at play here:
1. There is potential to introduce merge conflicts when bringing in the development repo to merge with the `main` branch if it has been changed. Hence, it is important that you avoid making changes to the `main` branch after spin-off.
-2. The `ghpages` branch will generate the website for the publication. Hence, it is a "published" branch, requiring regular checks with protections like the `main` branch.
-
+2. The `gh-pages` branch will generate the website for the publication. Hence, it is a "published" branch, requiring regular checks with protections like the `main` branch.
#### 4. Create Private Repo
-First create a private repo __without__ commits.
-Visit https://github.com/organizations/Imageomics/repositories/new
+First create a private repo **without** commits.
+
+Visit <https://github.com/organizations/Imageomics/repositories/new>
- Enter the private repo name (ex: `-dev`)
-- __DO NOT__ check `Add a README file`
-- __DO NOT__ Choose a license
-- __DO NOT__ select a .gitignore template
+- **DO NOT** check `Add a README file`
+- **DO NOT** Choose a license
+- **DO NOT** select a .gitignore template
- Click `Create repository`
After this step you should see a repo without any commits with a box similar to the following:
@@ -80,32 +86,39 @@ After this step you should see a repo without any commits with a box similar to
///
#### 5. Push initial changes from public to private
+
In the following example we will clone the private repo: `johnbradley/research-project-x-private`.
And pull commits from the public repo: `johnbradley/research-project-x`.
##### 5a. Clone Private Repo
+
```console
git clone git@github.com:johnbradley/research-project-x-private.git
```
Output will have a warning similar to the following:
+
```
Cloning into 'research-project-x-private'...
warning: You appear to have cloned an empty repository.
```
##### 5b. Pull Commits to Private Repo
+
Switch to the private repo directory.
+
```console
cd research-project-x-private
```
Add a new remote repo named `upstream` that points to the public GitHub repo.
+
```console
git remote add upstream git@github.com:johnbradley/research-project-x.git
```
Pull commits from the public repo.
+
```console
git pull upstream main
```
@@ -113,11 +126,12 @@ git pull upstream main
!!! note "Note"
Running `git remote -v` will confirm where a standard git push (or git pull) will send (or receive) commits from.
-
##### 5c. Push Commits to Private Repo on GitHub
+
```
git push
```
+
After the above command you should be able to see commits in the private repo similar to the following:
{ loading=lazy, width=600 }
@@ -127,19 +141,23 @@ After the above command you should be able to see commits in the private repo si
Now you're ready to work on development in the private repo following the standard [GitHub Workflow](The-GitHub-Workflow.md) with the private repo as your remote.
### Merge Private to Public
+
Once your changes are done on the private repo (i.e., when you're ready to make your project public) you can push the changes to the public repo.
For this example the public repo will be at `johnbradley/research-project-x` and the private will be at `johnbradley/research-project-x-private`.
A branch named `v1` will be created on the public repo with changes from the private repo.
#### Create a branch on Public with Private commits
+
Clone the public repo and `cd` into the directory.
+
```console
git clone git@github.com:johnbradley/research-project-x.git
cd research-project-x
```
Ensure we are on the main branch and up to date with GitHub:
+
```console
git checkout main
git pull
@@ -147,41 +165,47 @@ git pull
Create a branch named `v1`. Checkout the branch.
This branch will hold the private repo changes.
+
```console
git branch v1
git checkout v1
```
Add an upstream remote pointing at the private repo.
+
```console
git remote add upstream git@github.com:johnbradley/research-project-x-private.git
```
Pull main branch changes from private repo into `v1` branch.
+
```console
git pull upstream main
```
At this point you could rebase the commits to reduce them to meaningful commits. However, keep in mind that this would result in different commit histories on the public and private repos after pushing `v1`, which may impact the ability to use this strategy for a `v2`. It would be preferable to use this strategy in [pull requests (PRs)](The-GitHub-Workflow.md#9-open-a-pull-request) during development.
-
Push `v1` branch to the public repo.
+
```console
git push --set-upstream origin v1
```
#### Next Steps
+
At this point the main branch of the public repo should match the main branch of the private repo.
Additional changes should be made only to the private repo, preferably using a branch.
See [Github-Workflow](The-GitHub-Workflow.md) for more details.
When you are ready to release a new version of the code in the private repo, follow the [Merge Private to Public instructions](#merge-private-to-public) again using a new version branch name (e.g., `v2`).
-
+***
## _What if I already have mismatched repos?_
+
If you find yourself with two repositories that have misaligned histories, please read the following and reach out to the Imageomics Informatics Team so we can help.
### Resolving Mismatched Public/Private Repos
+
If you already have a public and a private repo with unrelated histories, resolving this can be challenging.
Three approaches to resolve merging disparate public/private repos are documented here.
@@ -191,6 +215,7 @@ Three approaches to resolve merging disparate public/private repos are documente
- Cherry Pick - use when the same commits exist in both repos with different hashes.
### Merge
+
Merge commits from the `main` branch of the private repo into the `main` branch of the public repo.
!!! warning "Warning"
@@ -200,11 +225,13 @@ Merge the main branch of the private repo with the main branch of the public rep
As far as maintaining history this is the safest approach. Often this approach results in merge conflicts.
Resolving merge conflicts manually can take time and is challenging to learn.
The allow unrelated histories flag is necessary for this approach:
+
```
git merge --allow-unrelated-histories ...
```
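
For a feel of what the flag does, the following sandbox sketch builds two small repos with unrelated histories and merges one into the other. Everything here is illustrative: the `make_repo` helper, the `public`/`private` directory names, and the commit identity are assumptions made for the demo, and the two repos touch different files so no conflicts arise (real repos often will have some).

```sh
work="$(mktemp -d)"
cd "$work"

# Hypothetical helper: create a repo with one commit on `main`.
make_repo() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/main
  git -C "$1" config user.email "demo@example.com"
  git -C "$1" config user.name "Demo"
  echo "$2" > "$1/$2.txt"
  git -C "$1" add "$2.txt"
  git -C "$1" commit -q -m "Add $2.txt"
}
make_repo public readme
make_repo private analysis

cd public
git remote add upstream ../private
git fetch -q upstream

# A plain merge is refused because the histories share no common ancestor.
git merge upstream/main || echo "merge refused"

# With the flag, git creates a merge commit that joins both histories.
git merge -q --allow-unrelated-histories -m "Merge private history" upstream/main
git --no-pager log --oneline --graph
```

Both original commits are preserved under the new merge commit, which is why this approach is the safest for maintaining history.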
### Reset
+
Replace all commits on the `main` branch of the public repo with commits from the `main` branch of the private repo.
!!! danger "Danger"
@@ -212,11 +239,13 @@ Replace all commits on the `main` branch of the public rep with commits from the
This option is only safe when releasing the first version on the public repo.
After setting up the remote for upstream run a command similar to the following:
+
```
git reset --hard upstream/main
```
### Cherry Pick
+
This method is used when the same commits exist in both repos with different hashes.
This requires finding which commits are in the private repo but not in the public repo.
@@ -224,6 +253,7 @@ This requires finding which commits are in the private repo but not in the publi
If the range you cherry-pick includes commits that already exist in the public repo under different hashes, this will result in merge conflicts and duplicated commits.
After fetching your upstream branch you can cherry pick a range of commits to add like so:
+
```
git cherry-pick ..
```
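
One way to find which private commits are missing from the public repo is `git cherry`, which compares commits by the change they introduce rather than by hash. The sketch below is an illustration in a sandbox, not the guide's prescribed procedure; the repo names, file names, and the `set_identity` helper are hypothetical.

```sh
work="$(mktemp -d)"
cd "$work"

# Hypothetical helper that sets a commit identity for a sandbox repo.
set_identity() {
  git -C "$1" config user.email "demo@example.com"
  git -C "$1" config user.name "Demo"
}

# Public repo with a shared base commit.
git init -q public
git -C public symbolic-ref HEAD refs/heads/main
set_identity public
echo "base" > public/base.txt
git -C public add base.txt
git -C public commit -q -m "Base commit"

# Private repo: starts as a clone, then gains two commits.
git clone -q public private
set_identity private
echo "feature" > private/feature.txt
git -C private add feature.txt
git -C private commit -q -m "Add feature (private)"
echo "extra" > private/extra.txt
git -C private add extra.txt
git -C private commit -q -m "Add extra"

# Public repo: the same feature change committed separately (different hash).
echo "feature" > public/feature.txt
git -C public add feature.txt
git -C public commit -q -m "Add feature (public)"

cd public
git remote add upstream ../private
git fetch -q upstream

# "-" marks commits whose change already exists in main; "+" marks new ones.
git cherry -v main upstream/main

# Pick only the tip commit ("Add extra"), the one marked "+".
git cherry-pick upstream/main
```

Picking only the commits marked `+` avoids re-applying changes that are already present under another hash.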
diff --git a/docs/wiki-guide/Virtual-Environments.md b/docs/wiki-guide/Virtual-Environments.md
index 97f6bc1..5d16944 100644
--- a/docs/wiki-guide/Virtual-Environments.md
+++ b/docs/wiki-guide/Virtual-Environments.md
@@ -1,9 +1,11 @@
# Managing Dependencies and Environments
+
Recording dependencies and environment information is crucial for reproducibility and interoperability across different platforms. There are many options for this, and sometimes it is appropriate to use multiple within the same project.
The goal is to make it as easy as possible for others (including your future self) to run the code.
## Conda Environments
+
The following example commands will get you set up with a Conda environment that can be tracked and shared.
- Install [Miniconda](https://docs.conda.io/en/latest/miniconda.html).
@@ -12,22 +14,25 @@ The following example commands will get you set up with a Conda environment that
- Install packages you need: `conda install -c conda-forge python=3.9 pandas matplotlib`
- `-c conda-forge` specifies the channel to install from. ([more information](https://docs.conda.io/projects/conda/en/latest/user-guide/concepts/channels.html))
- You can specify the version of a package or omit this to get the latest available. ([more information](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-pkgs.html#id2))
-- Once the needed packages are installed, export the environment to a file:
+- Once the needed packages are installed, export the environment to a file:
+
```bash
conda env export --no-builds --from-history | grep -v "prefix" > environment.yml
```
+
!!! info "Command breakdown"
- `--no-builds` and `--from-history` flags will cause the environment file to only specify the packages and versions that you manually installed. This may help with cross-platform compatibility by giving conda the flexibility to find compatible sub-dependencies on another system.
- `| grep -v "prefix"` eliminates your system-specific environment storage location (what is called the `prefix`) from being added to the file
- - If you want to add the actual package versions that were installed (if you did not specify during installation) to the `environment.yml` file, you can check them with `conda list` and copy-paste them in manually.
+ - If you want to add the actual package versions that were installed (if you did not specify during installation) to the `environment.yml` file, you can check them with `conda list` and copy-paste them in manually.
- Don't forget to also add and track this new file with git!
- To install the dependencies somewhere else from this file, use `conda env create -f environment.yml`.
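
As a rough illustration of what the `grep -v "prefix"` step in the export command does, here is a minimal Python sketch. The function name `strip_prefix_lines` and the sample environment text (including the path) are hypothetical, made up for this example; they are not output from a real export.

```python
def strip_prefix_lines(env_yaml: str) -> str:
    """Drop every line containing 'prefix', mirroring `grep -v "prefix"`."""
    kept = [line for line in env_yaml.splitlines() if "prefix" not in line]
    return "\n".join(kept) + "\n"


# Hypothetical export text; the prefix path is invented for illustration.
exported = """name: demo-env
channels:
  - conda-forge
dependencies:
  - python=3.9
  - pandas
  - matplotlib
prefix: /home/user/miniconda3/envs/demo-env
"""

cleaned = strip_prefix_lines(exported)
print(cleaned)
```

Like the `grep` filter, this drops any line that contains the text `prefix`, so a package whose name includes that word would be removed as well.
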
## Pip Virtual Environment
+
For virtual environments using `pip` to install packages (Python environments), use `python -m pip freeze` to print a list of packages (and their versions) installed in the environment.
!!! info "Command extension"
- - `python -m pip freeze > requirements.txt` will populate a `requirements.txt` file with all these packages and versions listed (e.g., `pandas==2.0.1`).
- - **Note:** This does _not_ list only your minimum software requirements; it prints _all_ installed packages, including the dependencies of the packages you installed.
+ - `python -m pip freeze > requirements.txt` will populate a `requirements.txt` file with all these packages and versions listed (e.g., `pandas==2.0.1`).
+ - **Note:** This does _not_ list only your minimum software requirements; it prints _all_ installed packages, including the dependencies of the packages you installed.
- Install this machine-readable file with `pip install -r requirements.txt` when in the appropriate folder.
- For more information, see the [pip documentation](https://pip.pypa.io/en/stable/cli/pip_freeze/).
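
To make the note about _all_ dependencies concrete, the sketch below parses `name==version` lines of the kind `pip freeze` prints and separates the packages you asked for from everything else. The helper name `parse_pins`, the `requested` set, and the sample versions other than `pandas==2.0.1` are assumptions chosen for illustration.

```python
def parse_pins(freeze_text: str) -> dict[str, str]:
    """Map package name to pinned version for each `name==version` line."""
    pins = {}
    for line in freeze_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and entries that are not simple pins.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name] = version
    return pins


# Illustrative freeze output: only pandas was installed by hand,
# the other entries stand in for packages pulled in as its dependencies.
sample = "numpy==1.24.3\npandas==2.0.1\npython-dateutil==2.8.2\n"
requested = {"pandas"}

pins = parse_pins(sample)
extras = sorted(name for name in pins if name not in requested)
print(pins["pandas"])
print(extras)
```

Here `extras` holds the packages that appear in the file even though they were never installed directly, which is why a frozen `requirements.txt` is usually longer than the list of packages you chose.
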
diff --git a/docs/wiki-guide/When-to-think-about-branch-protections.md b/docs/wiki-guide/When-to-think-about-branch-protections.md
index ce9f1ed..02ef0d5 100644
--- a/docs/wiki-guide/When-to-think-about-branch-protections.md
+++ b/docs/wiki-guide/When-to-think-about-branch-protections.md
@@ -4,7 +4,7 @@ Is your project going public? Are you releasing a package or tool for general us
## What are branch protections and why do we need them?
-Branch protections are essentially a more formalized implementation of contributing guidelines for your repository. This could be anything from requiring a pull request before pushing or merging updates to `main`, to requiring approval by particular parties before merging a pull request. For more information on branch protections, see GitHub's docs on [branch protection rules](https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/managing-protected-branches/about-protected-branches).
+Branch protections are essentially a more formalized implementation of contributing guidelines for your repository. This could be anything from requiring a pull request before pushing or merging updates to `main`, to requiring approval by particular parties before merging a pull request. For more information on branch protections, see GitHub's docs on [branch protection rules](https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/managing-protected-branches/about-protected-branches).
Generally speaking, once the set of potential users exceeds that of repository developers (i.e., the repo goes public), it is wise to apply branch protections, especially for the `main` branch of the repo. The primary purpose is to--at a minimum--alert developers of changes prior to their implementation. For more information on potential branch protection rules, see GitHub's [docs](https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/managing-protected-branches/managing-a-branch-protection-rule).
@@ -14,7 +14,6 @@ From your repository, navigate to "Settings" and select "Branches" from the left
The example below shows the addition of branch protection rules for `main` that require a pull request and that it be approved prior to merging. It also will remove approval if other changes are added that require approval.
-
### Example Branch Protection Rules for `main`
{ loading=lazy }
@@ -23,7 +22,7 @@ The example below shows the addition of branch protection rules for `main` that
## How to Implement Rulesets (Newer Version of Branch Protections)
-From your repository, navigate to "Settings" and select "Rules" from the left toolbar. Click on "New ruleset" and select the type you wish to create ("New branch ruleset" is the ruleset equivalent to branch protections).
+From your repository, navigate to "Settings" and select "Rules" from the left toolbar. Click on "New ruleset" and select the type you wish to create ("New branch ruleset" is the ruleset equivalent to branch protections).
{ loading=lazy }
/// caption
@@ -35,7 +34,7 @@ Here we have selected "New branch ruleset", and named it "published-branch", as
/// caption
///
-We choose to apply these to the default branch (`main` or `master`).
+We choose to apply these to the default branch (`main` or `master`).
{ loading=lazy }
/// caption
@@ -44,7 +43,7 @@ We choose to apply these to the default branch (`main` or `master`).
As with branch protections, it is also possible to set the rules for branches matching a particular pattern (e.g., type `*release*` to apply the rules to any branch containing the word `release`). We will do this for `gh-pages`.
{ loading=lazy }
-/// caption
+/// caption
///
You can also edit branch rulesets from this page.
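
To get a feel for how a pattern such as `*release*` selects branches, here is a small sketch using Python's `fnmatch` module. This is only an approximation: GitHub documents its own `fnmatch`-style matching, which may differ in details (for example, how `*` treats `/` in branch names). The branch names and the helper `matching_branches` are hypothetical.

```python
from fnmatch import fnmatchcase


def matching_branches(pattern: str, branches: list[str]) -> list[str]:
    """Return the branch names that match a shell-style wildcard pattern."""
    return [name for name in branches if fnmatchcase(name, pattern)]


# Made-up branch names for illustration.
branches = ["main", "gh-pages", "release-1.0", "pre-release", "dev"]

print(matching_branches("*release*", branches))  # ['release-1.0', 'pre-release']
print(matching_branches("gh-pages", branches))   # ['gh-pages']
```

A pattern without wildcards, like `gh-pages`, matches only that exact branch name, while `*release*` matches any name containing `release`.
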
diff --git a/docs/wiki-guide/Why-use-the-Institute-GitHub.md b/docs/wiki-guide/Why-use-the-Institute-GitHub.md
index 84f02b8..2f5509c 100644
--- a/docs/wiki-guide/Why-use-the-Institute-GitHub.md
+++ b/docs/wiki-guide/Why-use-the-Institute-GitHub.md
@@ -2,31 +2,30 @@
The [Imageomics GitHub organization](https://github.com/Imageomics) exists to facilitate collaboration and version control among team members working on projects within the institute and make them available to the research community. You are encouraged to take advantage of the benefits of using this GitHub organization for your institute projects!
-## Centralization
+## Centralization
This is the main aspect that leads to other benefits. Whether you are running your code on your own computer, a GPU server, a supercomputing cluster, AWS, or the Matrix, maintaining a git repository of this code hosted on the institute GitHub org keeps everyone's work that is otherwise scattered around in a single place. Some of the benefits derived from this are ...
### Collaboration
-- You know where your team's work is, and your team knows where your work is.
+- You know where your team's work is, and your team knows where your work is.
- Code is simpler to find, share, access, review, and manage.
-- Access privileges can be granted and managed through [teams](https://github.com/orgs/Imageomics/teams), which Institute staff can administer for you, rather than access having to be managed on a per-individual basis.
-- Progress can be communicated readily, and help can be solicited when needed, including through [teams](https://github.com/orgs/Imageomics/teams).
+- Access privileges can be granted and managed through [Imageomics teams](https://github.com/orgs/Imageomics/teams), which Institute staff can administer for you, rather than access having to be managed on a per-individual basis.
+- Progress can be communicated readily, and help can be solicited when needed, including through [teams](https://docs.github.com/en/organizations/organizing-members-into-teams/about-teams).
### Knowledge Sharing
-- Projects documented by a well-written README file are much more accessible than combing through old Zoom recordings to find out or remember what someone else is working on.
-- When new members join, they can get up-to-speed quickly.
-- When members move on to new roles, their work is preserved and more easily continued and built upon.
-- Good practices can diffuse through and between teams more quickly.
+- Projects documented by a well-written README file are much more accessible than combing through old Zoom recordings to find out or remember what someone else is working on.
+- When new members join, they can get up-to-speed quickly.
+- When members move on to new roles, their work is preserved and more easily continued and built upon.
+- Good practices can diffuse through and between teams more quickly.
- These points enhance the research capacity and productivity of individuals as well as the overall institute.
### Visibility + Impact
-- Work hosted under the Imageomics GitHub organization is directly associated with and contributes to the institute's brand, showcasing the collective contributions of our teams and enhancing the visibility and impact of their work within the broader community.
+- Work hosted under the Imageomics GitHub organization is directly associated with and contributes to the institute's brand, showcasing the collective contributions of our teams and enhancing the visibility and impact of their work within the broader community.
- Your profile is featured alongside repositories you contribute to, providing opportunities for networking with those who find your work valuable.
-
## Professional Development
Despite its rough edges, the common standard for version control and code management is git. You'll get a competitive edge with experience using it on a team to collaborate.
@@ -42,4 +41,4 @@ While we encourage you to host your institute-related work on the Imageomics Git
- Personal projects or work not directly tied to the institute.
- Projects developed prior to joining the institute where transferring ownership might be complex or undesirable.
-We strongly encourage you to keep your institute-related work centrally organized in the Imageomics GitHub organization to maximize the benefits for you and your fellow researchers!
\ No newline at end of file
+We strongly encourage you to keep your institute-related work centrally organized in the [Imageomics GitHub organization](https://github.com/Imageomics) to maximize the benefits for you and your fellow researchers!
diff --git a/docs/wiki-guide/Why-use-the-Institute-Hugging-Face.md b/docs/wiki-guide/Why-use-the-Institute-Hugging-Face.md
index 8775ff1..2aa6f7e 100644
--- a/docs/wiki-guide/Why-use-the-Institute-Hugging-Face.md
+++ b/docs/wiki-guide/Why-use-the-Institute-Hugging-Face.md
@@ -54,5 +54,4 @@ While we encourage you to host your institute-related work on the Imageomics Hug
- Personal projects or work not directly tied to the institute.
- Projects developed prior to joining the institute where transferring ownership might be complex or undesirable.
-
-We strongly encourage you to leverage the Imageomics Hugging Face organization for your institute-related projects. This will help you and your fellow researchers maximize collaboration, knowledge sharing, and the overall impact of our collective work.
+We strongly encourage you to leverage the [Imageomics Hugging Face organization](https://huggingface.co/imageomics) for your institute-related projects. This will help you and your fellow researchers maximize collaboration, knowledge sharing, and the overall impact of our collective work.
diff --git a/docs/wiki-guide/images/digital-product-lifecycle/project_lifecycle-formal.png b/docs/wiki-guide/images/digital-product-lifecycle/project_lifecycle-formal.png
new file mode 100644
index 0000000..5442fba
Binary files /dev/null and b/docs/wiki-guide/images/digital-product-lifecycle/project_lifecycle-formal.png differ
diff --git a/docs/wiki-guide/images/index/382108831-1173cd79-db94-4326-8b6e-dcbdeb8939cd.png b/docs/wiki-guide/images/index/382108831-1173cd79-db94-4326-8b6e-dcbdeb8939cd.png
deleted file mode 100644
index 110d709..0000000
Binary files a/docs/wiki-guide/images/index/382108831-1173cd79-db94-4326-8b6e-dcbdeb8939cd.png and /dev/null differ
diff --git a/docs/wiki-guide/images/index/collaborative-infrastructure-diagram.png b/docs/wiki-guide/images/index/collaborative-infrastructure-diagram.png
new file mode 100644
index 0000000..a71b268
Binary files /dev/null and b/docs/wiki-guide/images/index/collaborative-infrastructure-diagram.png differ
diff --git a/mkdocs.yaml b/mkdocs.yaml
index c83a882..36419d9 100644
--- a/mkdocs.yaml
+++ b/mkdocs.yaml
@@ -1,11 +1,10 @@
site_name: "Imageomics Guide"
-site_description: "A guide to collaborative work for Imageomics, including GitHub and Hugging Face workflows."
+site_description: "An Imageomics-focused guide to collaborative work, including GitHub and Hugging Face workflows."
site_author: "Imageomics Institute"
site_url: "https://Imageomics.github.io/Imageomics-guide/"
-edit_uri: view/main/docs
repo_url: https://github.com/Imageomics/Imageomics-guide
-edit_uri: blob/main/docs/
+edit_uri: edit/main/docs/
theme:
name: material
@@ -60,13 +59,13 @@ extra:
link: https://imageomics.org
copyright: >
- This work was supported by both the Imageomics Institute and the AI and Biodiversity Change (ABC) Global Center.
+ This guide was developed alongside the Collaborative Distributed Science Guide, developed by the Imageomics Institute and the AI and Biodiversity Change (ABC) Global Center.
+
+ This work was supported by the Imageomics Institute.
The Imageomics Institute is funded by the US National Science Foundation's Harnessing the Data Revolution (HDR) program under Award #2118240 (Imageomics: A New Frontier of Biological Information Powered by Knowledge-Guided Machine Learning).
- The ABC Global Center is funded by the US National Science Foundation under Award No. 2330423 and the Natural Sciences and Engineering Research Council of Canada under Award No. 585136.
-
- Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or the Natural Sciences and Engineering Research Council of Canada.
+ Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
plugins:
- glightbox
@@ -96,6 +95,7 @@ markdown_extensions:
title: 📖 On This Page
nav:
+# These are all relative links within the repo, only update if adding or deleting pages
- Home: index.md
- GitHub Guide:
- "Repo Guide": wiki-guide/GitHub-Repo-Guide.md
@@ -124,7 +124,9 @@ nav:
- Command Line Cheat Sheet: wiki-guide/Command-Line-Cheat-Sheet.md
- Code of Conduct: CODE_OF_CONDUCT.md
- Digital Product Policy:
+ - "About Digital Product Policies": wiki-guide/About-Digital-Product-Policies.md
- "Release and Licensing Policy": wiki-guide/Digital-products-release-licensing-policy.md
+ - "Digital Product Life Cycle": wiki-guide/Digital-Product-Lifecycle.md
- Other Resources:
- "Technical Infrastructure": wiki-guide/Technical-Infrastructure.md
- "Virtual Environments": wiki-guide/Virtual-Environments.md