Automating ML Pipeline with ModelKits + GitHub Actions

  • MyrinNew
    Senior Member
    • Feb 2024
    • 5175



    Building machine learning (ML) applications doesn't end with training a model. Managing models means juggling multiple components, such as code, metadata, and documentation. Without a clear structure, this complexity slows development and creates bottlenecks during deployment. To tackle these challenges, you need tools and workflows that simplify the process, ensure consistency, and support automation.


    ModelKits offer a way to package models with their artifacts (like code, metadata, and documentation) into a single, consistent unit. When combined with GitHub Actions, these tools help you build CI/CD pipelines to automate key tasks like unpacking, testing, and deployment.


    This guide will walk you through integrating ModelKits with GitHub Actions to create reliable workflows for machine learning applications. By the end, you’ll know how to automate model operations and streamline deployment processes.


    Prerequisites

    To follow along in this tutorial, you need the following:

    1. A GitHub account: If you don't already have one, create a GitHub account, then create a GitHub repository. This article uses a repository called kitops-githubactions.
    2. A container registry: You can use Jozu Hub, the GitHub Packages container registry, or Docker Hub. This guide uses Jozu Hub.
    3. KitOps: Follow the KitOps installation guide to install the Kit CLI.
    4. Familiarity with GitHub Actions basics: You’ll be working with workflows, jobs, and runners to automate your pipeline. In particular:
      • Workflows define the pipeline and consist of event triggers, jobs, and steps.
      • Events trigger your GitHub Actions workflow. Examples include a push to a branch, a pull request, or a manual run via workflow_dispatch.
      • Jobs contain steps that execute specific tasks, such as building or pushing ModelKits.
      • Runners are virtual machines that execute your workflows. This guide uses GitHub-hosted runners with an Ubuntu environment.
      • Actions are reusable steps that you combine to build jobs. For example, actions/checkout checks out your repository, and the KitOps action jozu-ai/gh-kit-setup downloads the Kit CLI and adds it to the PATH.


    Install KitOps

    First, make sure the Kit CLI is installed locally. Then run the following command to verify the installation:




    kit version





    The output should look similar to the one shown in the image:





    Log in to your Jozu Hub account and create a repository. This article uses a Jozu Hub repository named llama3-githubactions.





    Unpack the LLAMA3 ModelKit

    On your local terminal, run the command below:


    kit unpack jozu.ml/jozu/llama3-8b:8B-instruct-q5_0


    This command downloads the ModelKit and creates the following files in your working directory: Kitfile, LICENSE, README.md, USE_POLICY.md, and llama3-8b-8B-instruct-q5_0.gguf.


    If you open the Kitfile, you should see the following contents:




    manifestVersion: 1.0.0
    package:
      name: llama3
      version: 3.0.0
      description: Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8 and 70B sizes.
      authors: ['Meta Platforms, Inc.']
    model:
      name: llama3-8b-8B-instruct-q5_0
      path: ./llama3-8b-8B-instruct-q5_0.gguf
      license: META LLAMA 3 COMMUNITY LICENSE AGREEMENT
      description: Llama 3 8B instruct model
    code:
      - path: LICENSE
        description: License file.
      - path: README.md
        description: Readme file.
      - path: USE_POLICY.md
        description: Use policy file.





    Let's organize these files into folders and update the Kitfile to match. An organized directory improves readability and makes collaboration easier. Your directory structure should look like this:




    |-- models
    |   |-- llama3-8b-8B-instruct-q5_0.gguf
    |-- docs
    |   |-- README.md
    |   |-- USE_POLICY.md
    |   |-- LICENSE
    |-- Kitfile





    To get there, create a models folder and move llama3-8b-8B-instruct-q5_0.gguf from the root directory into it. Similarly, create a docs folder and move README.md, USE_POLICY.md, and LICENSE into it.
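    If you prefer to script the reorganization, the following Python sketch performs the same moves with pathlib and shutil. The organize function and the two file lists are illustrative assumptions based on the files kit unpack produced above. Files that are already in place are skipped, so running it twice is harmless.

```python
from pathlib import Path
import shutil

# Assumed file names, taken from the output of
# `kit unpack jozu.ml/jozu/llama3-8b:8B-instruct-q5_0` shown above.
MODEL_FILES = ["llama3-8b-8B-instruct-q5_0.gguf"]
DOC_FILES = ["README.md", "USE_POLICY.md", "LICENSE"]


def organize(root: Path) -> None:
    """Move the model into models/ and the docs into docs/; the Kitfile stays at the root."""
    layout = {"models": MODEL_FILES, "docs": DOC_FILES}
    for folder, names in layout.items():
        target = root / folder
        target.mkdir(exist_ok=True)  # create models/ or docs/ if missing
        for name in names:
            source = root / name
            if source.exists():  # skip files that were already moved
                shutil.move(str(source), str(target / name))
```

    Call organize(Path.cwd()) from the folder where you ran kit unpack, then update the Kitfile paths as shown next.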


    Modify your Kitfile to reflect the new directory structure.




    manifestVersion: 1.0.0
    package:
      name: llama3
      version: 3.0.0
      description: Llama 3 family of large language models (LLMs), a collection of pretrained and instruction tuned generative text models in 8 and 70B sizes.
      authors: ['Meta Platforms, Inc.']
    model:
      name: llama3-8b-8B-instruct-q5_0
      path: models/llama3-8b-8B-instruct-q5_0.gguf
      license: META LLAMA 3 COMMUNITY LICENSE AGREEMENT
      description: Llama 3 8B instruct model
    code:
      - path: docs/LICENSE
        description: License file.
      - path: docs/README.md
        description: Readme file.
      - path: docs/USE_POLICY.md
        description: Use policy file.





    The Kitfile above has four top-level sections:
    • manifestVersion: Specifies the version of the Kitfile format.
    • package: Contains the metadata for your LLAMA3 package, such as its name, version, description, and authors.
    • model: Specifies the model metadata, such as the model's name, its path, its license, and a description.
    • code: Lists the supporting files (LICENSE, README.md, and USE_POLICY.md) to package alongside the model, each with a path and a description.
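    Every path in the Kitfile is relative to the folder that contains the Kitfile, so after moving files around it is easy to leave a path that no longer points at anything. The following Python sketch reports any path: entry that does not exist on disk. missing_paths is a hypothetical helper. It scans lines with a regular expression instead of a full YAML parser, so it assumes each path: key sits on its own line, as in the Kitfile above.

```python
import re
from pathlib import Path

# Matches `path: <value>` lines, with or without a leading "- " list marker.
PATH_LINE = re.compile(r"^\s*(?:-\s+)?path:\s*(\S+)\s*$")


def missing_paths(kitfile: Path) -> list[str]:
    """Return each `path:` value that does not exist relative to the Kitfile's folder."""
    root = kitfile.parent
    missing = []
    for line in kitfile.read_text().splitlines():
        match = PATH_LINE.match(line)
        if not match:
            continue
        value = match.group(1).strip("'\"")  # tolerate quoted YAML values
        if not (root / value).exists():
            missing.append(value)
    return missing
```

    Running missing_paths(Path("Kitfile")) in the reorganized project should return an empty list. Any entry it returns is a path to fix before packing.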


    Now that your Kitfile is ready, let's create a CI/CD pipeline that automatically packs, tags, and pushes the ModelKit to the Jozu Hub repository you created earlier.


    Integrate with GitHub Actions

    Before creating your workflow YAML file, you need to configure two secrets: your Jozu Hub email and password. The GitHub Actions runner will use them later to authenticate with your Jozu Hub account. To create them, go to Settings in your GitHub repository.





    Under the Security section, expand the Secrets and variables dropdown and click on Actions.





    Add two repository secrets: JOZU_EMAIL and JOZU_PASSWORD.





    With your secrets in place, the next step is to create your workflow file.


    Create your workflow file

    In the root directory of your repository, create a .github/workflows directory. Inside it, create a file named workflow.yml and paste in the following code:




    name: Deploy LLAMA3 to Jozu Hub

    on:
      push:
        branches:
          - master
      workflow_dispatch:

    permissions:
      id-token: write
      contents: read
      pull-requests: write
      issues: write
      actions: write

    # Workflow-level variables are visible to every job and step below
    env:
      ARTIFACT_NAME: jozu-artifact
      REPOSITORY_NAME: llama3-githubactions
      TAG: latest
      USERNAME: emmanueloffisongetim  # replace with your Jozu Hub username

    jobs:
      unpack-to-model:
        name: unpack-large-model
        runs-on: ubuntu-latest
        steps:
          - name: checkout repository
            uses: actions/checkout@v4

          - name: install kit
            uses: jozu-ai/gh-kit-setup@v1.0.0

          - name: kit version
            shell: bash
            run: kit version

          # Unpack only the model from the public ModelKit into ./models
          - name: unpack llama3 model to models folder
            shell: bash
            run: kit unpack jozu.ml/jozu/llama3-8b:8B-instruct-q5_0 --model -d models

          # Hand the repository files plus the unpacked model to the next job
          - name: upload-artifact
            id: upload-artifact
            uses: actions/upload-artifact@v4
            with:
              name: ${{ env.ARTIFACT_NAME }}
              path: .
              overwrite: true

      push-to-jozuhub:
        name: push-to-jozu
        runs-on: ubuntu-latest
        needs: unpack-to-model
        steps:
          - name: download-artifact
            uses: actions/download-artifact@v4
            with:
              name: ${{ env.ARTIFACT_NAME }}

          - name: Display structure of downloaded files
            run: ls -R

          - name: install kit
            uses: jozu-ai/gh-kit-setup@v1.0.0

          - name: login-to-jozuhub
            shell: bash
            env:
              JOZU_EMAIL: ${{ secrets.JOZU_EMAIL }}
              JOZU_PASSWORD: ${{ secrets.JOZU_PASSWORD }}
            run: kit login jozu.ml -u "$JOZU_EMAIL" -p "$JOZU_PASSWORD"

          # USERNAME, REPOSITORY_NAME, and TAG come from the workflow-level env block
          - name: pack-modelkit
            shell: bash
            run: kit pack . -t "jozu.ml/$USERNAME/$REPOSITORY_NAME:$TAG"

          - name: push-modelkit
            shell: bash
            run: kit push "jozu.ml/$USERNAME/$REPOSITORY_NAME:$TAG"





    This pipeline consists of two jobs: unpack-to-model and push-to-jozuhub.
    • unpack-to-model: Checks out the repository, installs the Kit CLI, unpacks the LLAMA3 model into the models folder, and uploads the result as a GitHub artifact.
    • push-to-jozuhub: Runs after unpack-to-model completes. It downloads the artifact, logs in to Jozu Hub, packs the files into a ModelKit, and pushes it to your Jozu Hub repository.
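    Under the hood, the two jobs come down to a handful of Kit CLI calls. To rehearse them on your own machine before pushing, the following Python sketch lists the same commands the workflow runs. It executes them with subprocess only when dry_run=False. pipeline_commands and run are hypothetical helpers, not part of KitOps. The sketch assumes the Kit CLI is on your PATH and that you have already run kit login jozu.ml. The login step is left out on purpose so credentials are never printed.

```python
import shlex
import subprocess

# Public ModelKit that the unpack-to-model job pulls from.
SOURCE_REF = "jozu.ml/jozu/llama3-8b:8B-instruct-q5_0"


def pipeline_commands(target_ref: str) -> list[list[str]]:
    """Kit CLI calls made by the two jobs, in order (login omitted)."""
    return [
        # unpack-to-model: unpack only the model into ./models
        ["kit", "unpack", SOURCE_REF, "--model", "-d", "models"],
        # push-to-jozuhub: pack the current directory under the target tag, then push it
        ["kit", "pack", ".", "-t", target_ref],
        ["kit", "push", target_ref],
    ]


def run(target_ref: str, dry_run: bool = True) -> None:
    """Print each command; execute it too when dry_run is False."""
    for argv in pipeline_commands(target_ref):
        print("$", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing command
```

    For example, run("jozu.ml/<your-username>/llama3-githubactions:latest") only prints the three commands. Passing dry_run=False runs them for real, which downloads the roughly 5 GB model.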


    The LLAMA3 model file is large (approximately 5 GB). Committing it to GitHub would bloat the repository, and GitHub rejects individual files larger than 100 MiB unless you use Git LFS. Instead, the pipeline unpacks the model on the runner and passes it to the next job as a workflow artifact, so the model never enters the repository.

    This approach optimizes storage and keeps the repository lightweight.
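    To find out which local files would cause trouble if committed, the following Python sketch walks the project, skips the .git folder, and lists files above GitHub's 100 MiB per-file limit. oversized_files is a hypothetical helper. Any file it reports, such as the model under models/, is one to keep out of Git. A .gitignore entry like models/*.gguf is one way to do that.

```python
from pathlib import Path

# GitHub rejects pushes that contain files larger than 100 MiB.
GITHUB_FILE_LIMIT = 100 * 1024 * 1024


def oversized_files(root: Path, limit: int = GITHUB_FILE_LIMIT) -> list[str]:
    """Return project-relative paths of files larger than `limit` bytes, ignoring .git/."""
    found = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if ".git" in rel.parts:  # skip Git's internal objects
            continue
        if path.is_file() and path.stat().st_size > limit:
            found.append(rel.as_posix())
    return found
```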


    The pipeline runs on every push to the master branch. Because of the workflow_dispatch trigger, you can also start it manually from the Actions tab. The workflow defines the following environment variables:
    • ARTIFACT_NAME: The name assigned to the GitHub artifact.
    • REPOSITORY_NAME: The name of your Jozu Hub repository.
    • USERNAME: Your Jozu username.
    • TAG: The tag assigned to the packed ModelKit.
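    The pack and push steps join these variables into a single reference of the form jozu.ml/USERNAME/REPOSITORY_NAME:TAG. A typo in any of them only shows up when the push fails. The following Python sketch builds the same reference and checks each part against the repository-name and tag grammars of the OCI Distribution Specification. modelkit_ref is a hypothetical helper, and it assumes Jozu Hub enforces those standard OCI rules.

```python
import os
import re
from collections.abc import Mapping

# Grammars from the OCI Distribution Specification.
# Assumption: Jozu Hub applies these standard rules to USERNAME, REPOSITORY_NAME, and TAG.
NAME_COMPONENT = re.compile(r"[a-z0-9]+(?:(?:\.|_|__|-+)[a-z0-9]+)*")
TAG_PATTERN = re.compile(r"[A-Za-z0-9_][A-Za-z0-9._-]{0,127}")


def modelkit_ref(env: Mapping[str, str] = os.environ, registry: str = "jozu.ml") -> str:
    """Build registry/USERNAME/REPOSITORY_NAME:TAG, raising ValueError on an invalid part."""
    user, repo, tag = env["USERNAME"], env["REPOSITORY_NAME"], env["TAG"]
    for part in (user, repo):
        if not NAME_COMPONENT.fullmatch(part):
            raise ValueError(f"invalid repository name component: {part!r}")
    if not TAG_PATTERN.fullmatch(tag):
        raise ValueError(f"invalid tag: {tag!r}")
    return f"{registry}/{user}/{repo}:{tag}"
```

    With the values from the workflow (USERNAME=emmanueloffisongetim, REPOSITORY_NAME=llama3-githubactions, TAG=latest), modelkit_ref returns "jozu.ml/emmanueloffisongetim/llama3-githubactions:latest". An uppercase repository name such as Llama3 raises ValueError.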


    Run your pipeline

    Pushing to the master branch triggers the pipeline automatically. To follow the run, open the Actions tab of your repository, which displays a visual graph of the workflow's jobs.





    When the run completes, the ModelKit appears in your Jozu Hub repository.





    Wrapping up

    Manually deploying your AI projects every time you make a change can be time-consuming and frustrating. With GitHub Actions and KitOps, you can automate packaging and deploying your models along with their dependencies.


    KitOps simplifies packaging models and managing dependencies, while GitHub Actions streamlines the deployment process by automatically triggering workflows whenever changes are pushed. This automation leads to faster, more reliable deployments and enhances team collaboration.


    If you have questions about integrating KitOps into your workflow, join the conversation on Discord and start using KitOps today!



