AI on Jose Gato Blog

Building Custom Claude Code Skills for Zero Touch Provisioning Operations

Sun, 18 Jan 2026 00:00:00 +0000

Note: If you’re not familiar with OpenShift, just think oc = kubectl in the examples below.

Note: This is not an official project and certainly nothing I would use in production environments. For now, it is just a personal project to learn about using “Claude Code” for these tasks.

Managing multiple OpenShift clusters through RHACM using Zero Touch Provisioning (ZTP) is highly repetitive, especially when deploying and destroying the same clusters frequently for testing new features, bug fixes, or workflow validation. In a nutshell, ZTP uses a Kustomize file (synced with ArgoCD) containing manifests to deploy OpenShift clusters through a hub cluster.

During a typical week, I perform these steps countless times:

Edit the kustomization file to add a cluster YAML
Git commit and push with a descriptive message
Refresh and sync the ArgoCD app, then wait for completion
Create secrets that shouldn’t live in Git repos
Monitor until everything is ready

Redeploying means reversing the process: comment out the kustomization entry, push, sync, wait, then repeat the deployment steps.
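To make the edit step concrete, here is a minimal Python sketch of toggling a cluster entry in the kustomization file. It assumes entries look like `- vsno6.yaml` and are disabled by prefixing them with `#`, as in the examples in this post. The function name `set_cluster_enabled` is hypothetical, and this is not the code the project itself uses.

```python
import re


def set_cluster_enabled(kustomization: str, manifest: str, enabled: bool) -> str:
    """Comment or uncomment the list entry for `manifest` in a kustomization file."""
    # Matches "- vsno6.yaml" as well as "#- vsno6.yaml" or "# - vsno6.yaml";
    # group 1 keeps the original indentation.
    pattern = re.compile(rf"^(\s*)#?\s*-\s*{re.escape(manifest)}\s*$")
    lines = []
    for line in kustomization.splitlines():
        match = pattern.match(line)
        if match:
            prefix = "" if enabled else "#"
            line = f"{match.group(1)}{prefix}- {manifest}"
        lines.append(line)
    return "\n".join(lines) + "\n"
```

Deploying would call it with `enabled=True` and redeploying would first call it with `enabled=False`, with a commit, push and sync after each change.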

While I had some scripts for this, I wanted to improve my AI skills by using Claude Code Skills and Commands instead of complex shell scripts full of conditionals and wait loops. The project aims to make my life easier while focusing on learning how to interact with these models effectively.

The project is available here, and I welcome feedback.

I won’t dive into every command detail since the Git repo has documentation and things are actively changing. Instead, I’ll walk through some typical workflows.

Deploying a Cluster

While my favorite command is “redeploy a cluster” (the one that inspired this project), I’ll demonstrate the simpler “deploy a cluster” workflow.

First, configure your environment by setting your KUBECONFIG and ArgoCD endpoint:

Note: Command outputs are simplified for readability. The actual execution is longer and requires some initial user interaction.

❯ /configure_environment 

● I'll help you configure the environment for ZTP operations.

 Let me start by checking the current KUBECONFIG status.

● The KUBECONFIG environment variable is not currently set. I need you to provide the path to your kubeconfig file.

● User declined to answer questions

❯ KUBECONFIG=~/Servers/EL8000-2/kubeconfig-el8k-hub-2 

 ⎿  🔍 Checking connectivity to OpenShift cluster...
    ✅ Connected to OpenShift cluster

● Excellent! The cluster is reachable with the KUBECONFIG: /home/jgato/Servers/EL8000-2/kubeconfig-el8k-hub-2

 Now let me check the available hubs to help you select one.

● Perfect! I found the available hubs. Based on your KUBECONFIG path (which contains "hub-2"), I'll present the hub options:
                                                                                     
● User answered Claude's questions:
 ⎿  · Which RHACM hub would you like to use for ZTP operations? → hub-2 (Recommended)

● Perfect! Environment configuration is complete.

 Configuration Summary:
 - KUBECONFIG: /home/jgato/Servers/EL8000-2/kubeconfig-el8k-hub-2
 - Selected Hub: hub-2 (Default hub)
     - ArgoCD Endpoint: openshift-gitops-server-openshift-gitops.example.com
 - Cluster Status: ✅ Connected and reachable

✻ Worked for 33s

Now we are ready to deploy with /deploy_clusters vsno6.

These are the tasks it will work through during the process:

  Todos · ctrl+t to hide todos
  ☐ Verify cluster manifest exists and check kustomization.yaml
  ☐ Prepare cluster pre-requirements (namespace and secrets)
  ☐ Add/uncomment cluster entry in kustomization.yaml
  ☐ Create git commit for cluster addition
  ☐ Push changes to origin/main
  ☐ Sync ArgoCD clusters application on hub-2
  ☐ Monitor cluster installation until ManagedCluster is available and joined (max 3 hours)
  ☐ Extract kubeadmin password
  ☐ Extract and save kubeconfig file

The only user intervention happens during “prepare cluster pre-requirements”, to enter some credentials. Here we go:

❯ /deploy_clusters vsno6 

● Let me read the cluster manifest to show you a summary.

● Read(vsno6.yaml)
  ⎿  Read 84 lines

● Cluster Summary for vsno6:
  - Type: Single Node OpenShift (SNO)
  - Namespace: vsno6
  - OpenShift Version: 4.18.5
...
...
  - Status in kustomization.yaml: Currently commented out (line 12: #- vsno6.yaml)

● Now I'll prepare the cluster pre-requirements (namespace and secrets).

● Skill(/prepare_clusters)
  ⎿  Successfully loaded skill

● I'll prepare the cluster pre-requirements for vsno6. Since we're in the context of cluster deployment, I'll proceed with creating the namespace and secrets.

● The namespace vsno6 already exists. Now I'll run the preparation script to create the required secrets.

● Bash(export KUBECONFIG=/home/jgato/Servers/EL8000-2/kubeconfig-el8k-hub-2 && .claude/commands/scripts/prepare_ztp_cluster_pre_reqs.sh vsno6)
  ⎿  vsno6                                         
     secret/assisted-deployment-pull-secret created
     Now introduce the credentials for the BMC
     … +4 lines (ctrl+o to expand)

● The secrets were created successfully. Let me verify both secrets exist in the vsno6 namespace.

● Bash(oc --kubeconfig /home/jgato/Servers/EL8000-2/kubeconfig-el8k-hub-2 get secret -n vsno6 assisted-deployment-pull-secret vsno6-bmc-secret)
  ⎿  NAME                              TYPE     DATA   AGE
     assisted-deployment-pull-secret   Opaque   1      15s
     vsno6-bmc-secret                  Opaque   2      6s

● Cluster preparation completed successfully!

  The namespace vsno6 now has the required secrets:
  - ✅ assisted-deployment-pull-secret (created 15s ago)
  - ✅ vsno6-bmc-secret (created 6s ago)

● Now I'll uncomment the vsno6.yaml entry in kustomization.yaml.

● Changes applied to kustomization.yaml:
  - vsno5.yaml
  -#- vsno6.yaml
  +- vsno6.yaml
  #- vsno7.yaml

● Now I'll create a git commit and push the changes.

● Commit created successfully. Now I'll push to origin/main.

● Push completed successfully. Now I'll sync the ArgoCD clusters application on hub-2.

● Skill(/synch_clusters)
  ⎿  Successfully loaded skill

● I'll sync the ArgoCD clusters application on hub-2.

  Sync Details:
  - Hub: hub-2 (openshift-gitops-server-openshift-gitops.apps.hub-2.example.com)
  - Application: clusters
  - Prune: No (standard sync without prune)

  Let me check if we're already logged in to the ArgoCD instance.

● We're already logged in to the ArgoCD instance. Now I'll refresh and sync the clusters application.

● Let me sync the application directly, which will also refresh it.

● Bash(argocd app sync clusters --server openshift-gitops-server-openshift-gitops.apps.hub-2.example.com --insecure --grpc-web) 

● ArgoCD Sync Completed Successfully!

  Key results:
  - ✅ ClusterInstance vsno6 created in namespace vsno6
  - ✅ Sync Status: Synced to main (ea48355)
  - ✅ Phase: Succeeded
  - ✅ Duration: 2s

● Now I'll begin monitoring the cluster installation. I'll check the status every 5 minutes until the ManagedCluster is Available and Joined, with a maximum wait of 3 hours.

  Starting installation monitoring at 12:32...

● Skill(visualize-cluster-status)
  ⎿  Successfully loaded skill

● I'll check the status of cluster vsno6 using the optimized data collection script.

The visualization command runs every 5 minutes until the cluster is fully deployed.

● # 🎯 vsno6 | Status: ✅ INSTALLED | Duration: ~41m (hub-2)

  ## ClusterInstance Status (Primary CR)
  **Created:** 11:31:58Z | **Generation:** 1 (observed: 1) | **Phase:** Deployed
  - ✅ ClusterInstanceValidated - Validation succeeded
  - ✅ RenderedTemplates - Rendered templates successfully (7 manifests)
  - ✅ RenderedTemplatesValidated - Rendered templates validation succeeded
  - ✅ RenderedTemplatesApplied - Applied site config manifests
  - ✅ Provisioned - Provisioning completed

  ## Core Resources
  +----------------------+--------+--------------------+----------------------+
  | Resource             | Status | State/Info         | Details              |
  +----------------------+--------+--------------------+----------------------+
  | 📦 BareMetalHost     | ✅     | provisioned/det.   | Power: Off           |
  | 💿 InfraEnv          | ✅     | Image ready        | Created: 11:31:59Z   |
  | 🚀 AgentClusterInst  | ✅     | adding-hosts       | Cluster is installed |
  | 🎮 ManagedCluster    | ✅     | Available & Joined | Ready                |
  +----------------------+--------+--------------------+----------------------+

  ## Agent Details (1 total, 1 approved)
  +---------+--------+-----------+
  | ID      | Role   | State     |
  +---------+--------+-----------+
  | ...0006 | master | installed |
  +---------+--------+-----------+

  **Conditions:** ✅ Validated ✅ Requirements ✅ Completed ✅ Not Failed

  Installation Complete! 🎉

  The cluster vsno6 has successfully completed installation in approximately 41 minutes:
  - ✅ ManagedCluster is Available and Joined
  - ✅ Agent installed and operational
  - ✅ All provisioning steps completed

Finally, it extracts credentials and shows the final results:

● Deployment Complete! 🎉

  The cluster vsno6 has been successfully deployed on hub-2.

  Deployment Summary

  Cluster Information:
  - Name: vsno6
  - Type: Single Node OpenShift (SNO)
  - OpenShift Version: 4.18.5
  - Installation Time: ~41 minutes
  - Status: ✅ Available and Joined

  Cluster Access:
  - Kubeadmin Password: ******************
  - Kubeconfig Location: /tmp/kubeconfig-vsno6

  Deployment Steps Completed:
  1. ✅ Verified cluster manifest (vsno6.yaml)
  2. ✅ Created namespace and secrets (pull-secret, BMC credentials)
  3. ✅ Updated kustomization.yaml (uncommmented vsno6.yaml entry)
  4. ✅ Committed and pushed changes to git (commit: ea48355)
  5. ✅ Synced ArgoCD clusters application on hub-2
  6. ✅ Monitored installation to completion
  7. ✅ Extracted kubeadmin password
  8. ✅ Extracted kubeconfig file

  Final Cluster Status:
  - ManagedCluster: Available & Joined
  - Agent: Installed (master role)
  - BareMetalHost: Provisioned and detached
  - All conditions: Validated ✅

The cluster is now ready for use.

The redeploy command is much more impressive, but a run takes a long time and produces lots of logs, which would not be very readable in a blog post. I invite you to try it and give me some feedback.

Key Learnings

The most valuable part of this project has been the lessons learned. LLM examples always show the final successful result, but let me be honest about the main challenges I faced:

Upfront User Prompting

I needed the model to handle the entire redeployment with error control and retries, but without asking for secrets 90 minutes into execution. I wanted to trigger it and walk away confidently.

Getting the model to gather all necessary context upfront and maintain it throughout the process proved challenging. The non-deterministic nature of LLMs made this harder than it sounds.
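One way to think about "gather everything upfront" is a preflight check that refuses to start until every input is present. The sketch below is only an illustration in Python: the variable names `BMC_USERNAME` and `BMC_PASSWORD` and the function `missing_inputs` are hypothetical, and the project itself relies on prompting rather than on code like this.

```python
from typing import Mapping, Sequence

# Hypothetical names: only KUBECONFIG appears in this post.
REQUIRED_INPUTS = ("KUBECONFIG", "BMC_USERNAME", "BMC_PASSWORD")


def missing_inputs(
    env: Mapping[str, str], required: Sequence[str] = REQUIRED_INPUTS
) -> list[str]:
    """Return the required inputs that are absent or blank, in declaration order."""
    return [name for name in required if not env.get(name, "").strip()]
```

Calling `missing_inputs(os.environ)` before the long-running part starts turns a prompt that would appear 90 minutes in into an immediate failure.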

Preventing Unwanted Decision-Making

ZTP operations are highly repeatable. In a controlled environment where I’ve deployed clusters hundreds of times, failures are rare and each run is expected to succeed. I needed the model to follow the task list patiently without making decisions that could break the process.

During long redeployments, the model would sometimes try to debug, investigate problems, or even attempt fixes. This wasn’t the goal. I don’t want debugging or problem-solving during these workflows. If something takes too long, just break and let me investigate when I return.
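The behavior I am after is essentially a plain polling loop: check, sleep, and give up at a deadline without trying to diagnose anything. A minimal Python sketch follows. The function `wait_until` and its parameters are hypothetical, and the `sleep` and `clock` arguments exist only so that the loop can be exercised without waiting for real.

```python
import time
from typing import Callable


def wait_until(
    check: Callable[[], bool],
    interval_s: float,
    timeout_s: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `check` every `interval_s` seconds; return False once the deadline cannot be met."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        # No debugging and no fixes: if the next poll would pass the deadline, stop.
        if clock() + interval_s > deadline:
            return False
        sleep(interval_s)
```

With the values from the transcript above, that would be `wait_until(cluster_is_ready, 300, 3 * 3600)`, where `cluster_is_ready` stands for whatever check reports the ManagedCluster as Available and Joined.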

Over-Engineering by the Model

The cluster status visualization command was particularly challenging. Even with a clear structure, the model would sometimes improvise its own inspection and show different information. This broke the consistent “application experience” I wanted.

Constraining the model to show only what I specified was painful. After displaying status, it would often investigate potential failures or provide unnecessary details. Sometimes, less is more.

Non-Deterministic Command Execution

I wanted to use natural language as much as possible. If I had to script everything, I might as well write a complete shell script—that wasn’t my goal. However, for some steps, I needed scripts to constrain exactly how actions execute.

Otherwise, the model would execute the same command in different ways. For example, getting a secret could be:

oc --kubeconfig get secret <params>
oc get secret --kubeconfig <params> && oc get secret <params>
Various combinations with echo, cat, pipes, etc.

While generally smart, this variability becomes a security concern. Claude Code uses permission mechanisms to allow approved commands without user intervention. I spent hours controlling my allow list:

  "permissions": {
    "allow": [
      "Skill(redeploy_clusters)",
...
...
      "Bash(.claude/commands/scripts/*)",
      "Bash(.claude/skills/visualize-cluster-status/scripts/*)",
...
...
      "Bash(mkdir -p tmp-*)",
      "Bash(oc --kubeconfig *)",
...
    ],

I wanted to allow operations based on your kubeconfig permissions. This config looked good, but after an hour of deployment, the model might use:

oc get secret -n vsno6

Instead of the allowed patterns:

oc --kubeconfig=.... get secret -n vsno6
# or
KUBECONFIG=.... oc get secret -n vsno6

This doesn’t match the allow list, triggering permission prompts. I don’t want user intervention during automation, but I also don’t want an allow list covering every command variation.

The non-deterministic nature of models made controlling this extremely challenging. I didn’t want to rely entirely on predefined scripts, but finding the right balance was difficult.
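To illustrate why small variations matter, here is a Python sketch that checks commands against glob patterns with the standard `fnmatch` module. The patterns are the strings inside `Bash(...)` from my allow list, and the kubeconfig path is a placeholder. This only approximates the idea: Claude Code applies its own matching rules, which may differ from plain glob matching.

```python
from fnmatch import fnmatchcase

# Patterns copied from the allow list above (the part inside "Bash(...)").
ALLOWED_PATTERNS = [
    ".claude/commands/scripts/*",
    "mkdir -p tmp-*",
    "oc --kubeconfig *",
]


def is_allowed(command: str, patterns=ALLOWED_PATTERNS) -> bool:
    """Return True if the whole command matches at least one glob pattern."""
    return any(fnmatchcase(command, pattern) for pattern in patterns)


# Three ways of asking for the same thing; only the first matches a pattern.
VARIANTS = [
    "oc --kubeconfig /path/to/kubeconfig get secret -n vsno6",
    "oc get secret -n vsno6 --kubeconfig /path/to/kubeconfig",
    "oc get secret -n vsno6",
]
```

Running `[is_allowed(v) for v in VARIANTS]` shows that argument order alone is enough to fall outside the allow list, which is why pinning the exact command form in a script helps.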

The Addictive Nature of AI-Assisted Development

Adding new features becomes surprisingly addictive. The initial implementation is easy and straightforward—getting a working first version takes minutes. Refining it to reliable quality is harder, but you can leverage the model itself to add new steps or make the workflow more robust.

This creates an “addiction loop”: “Could I also make it do…?” Why not? A couple of minutes later, you see results. The rapid feedback cycle is compelling.

Example: “Can I make ArgoCD login use KUBECONFIG instead of opening an interactive web browser?” (Currently working on this!)

Final Thoughts

This has been fascinating work with impressive results. However, I experienced the classic “AI effect”: the first successful run felt like magic, but moving from demo to active use revealed challenges—mainly non-deterministic command execution and security control.

I still have work to do. Time will tell whether this saves me more time than I spend refining prompts… or whether some prompts ultimately become scripts again. :)

Converting a Jira issue into a solution doc with AI

Mon, 14 Jul 2025 00:00:00 +0000

Jira bug to a document with a solution

In this short blog I will document a kind of “hypothetical procedure”: how to extract, from a Jira ticket, enough knowledge to create a document with the solution to a problem.

I will start with a very good example: a Jira ticket with a bug (open to the world, because bugs should be open to everyone), a fix and a couple of workarounds. Basically, there is a problem with a wrongly configured OpenShift ServiceAccount (we don’t need to go into details). This has been fixed in newer versions, but for those already hitting the bug there is a manual workaround.

So, let’s use a chatbot with a good LLM and an MCP agent to interact with Jira. My environment is composed of:

Cursor, used as the chatbot and as the agent that interacts with the MCP
Gemini Pro 2.5 as the LLM
A Jira MCP to provide new tools for the chatbot

Another combination of tools would work the same way.

Note: To focus on the procedure, I am not explaining how to create this environment. There is plenty of documentation out there, and I guess your LLM could even help you with the configuration.

I will start with a prompt to be used with Cursor. The idea is to create a first generic prompt that I would use in future similar scenarios. Something like:

you are an IT guy that works with support activities.
Write me the solution with the following format:
**Title**
**Abstract**
**Body **
The bug has the id ACM-15056, get all the comments 
for that ticket getting fields all and expand attribute
to comments.
You have found a bug related to one Openshift component.
More in concrete on Red Hat Advanced Cluster Management.
The bug is documented on a Jira ticket, that explains the problem.
It explains also that is already fixed on newer versions and it should not happen again.
But for the ones that already hit the bug, they have to fix it
with a manual procedure explained also in the Jira Ticket.

Note: I will use the Jira MCP to retrieve all the content. In previous tests, I just exported the Jira ticket and included it together with the prompt, but this way is more automatic. Keep in mind that the Jira API extracts everything and cannot filter out, for example, private comments, which is something you may not want to share with the model.

Note: It took me a while to instruct the agent correctly on how to use the Jira tool to retrieve all the comments. With Gemini Pro I had to give a lot of details, while claude-4 managed to figure it out on its own.

The prompt is composed of a generic part followed by a kind of form that I can fill in depending on the case. The agent will start using the MCP to interact with Jira, and you have to give it permission to do so.
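The split between a generic part and a form can be sketched with Python's `string.Template`. The wording is shortened from the prompt above, and the field names `ticket_id`, `component` and `extra_instructions`, as well as the function `build_prompt`, are hypothetical.

```python
from string import Template

# Generic part, reused across tickets (shortened from the prompt above).
GENERIC = (
    "you are an IT guy that works with support activities.\n"
    "Write me the solution with the following format:\n"
    "**Title**\n**Abstract**\n**Body**\n"
)

# The "form": fields that change from case to case.
FORM = Template(
    "The bug has the id $ticket_id, get all the comments for that ticket.\n"
    "You have found a bug related to $component.\n"
    "$extra_instructions\n"
)


def build_prompt(ticket_id: str, component: str, extra_instructions: str = "") -> str:
    """Combine the generic part with the filled-in form."""
    return GENERIC + FORM.substitute(
        ticket_id=ticket_id,
        component=component,
        extra_instructions=extra_instructions,
    )
```

For the ticket in this post, the call would be `build_prompt("ACM-15056", "Red Hat Advanced Cluster Management")`, with any case-specific guidance passed as `extra_instructions`.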

Once it has enough permissions, this is the result:

Not bad for a first attempt. It focuses on an existing KCS that points you to create the missing pull-secret, but that is not the workaround I want to document. I want the one where the pull-secret is no longer needed.

Second attempt. This time with more concrete instructions about the solution to document, plus some details I picked up over a few more iterations. Some additions:

Focus on the workaround I am interested in.
The pods need to be restarted.
Be brief.
How to format links.

I don’t want to make this too long by going through every interaction I had. But keep in mind that getting a better prompt takes time:

you are an IT guy that works with support activities.
Write me the solution with the following format:
**Title**
**Abstract**
**Body **
The bug has the id ACM-15056, get all the comments 
for that ticket getting fields all and expand attribute
to comments. 
You have found a bug related to one Openshift component.
More in concrete on Red Hat Advanced Cluster Management.
The bug is documented on a Jira ticket, that explains the problem.
It explains also that is already fixed on newer versions and it should not happen again.
But for the ones that already hit the bug, they have to fix it
with a manual procedure explained also in the Jira Ticket.
Focus on showing
the manual procedure of modifying the ServiceAccount to not use the pull-secret.
Be brief explaining how to remove the pull-secret from the ServiceAccount using an oc patch. And show an example on how it would look a filing ServiceAccount. When removig only remove the failing pull secret.
In the section where you do the patch, dont explain every parameter.
Remember you have to restart the affected pods.
Finally add a note that points to the other KCS with the second solution and recommend to upgrade as the best option.
Compose the links in the way of [name](link)

Now the solution is much more accurate. First, on how to check for the failure.

Second, on how to remove only the failing part and how to correctly restart the services:

Mastering prompts is essential to get the best out of the model.

We are happily done :) or not…

A final round using MCPs

Writing an article with a solution requires more than my own review. In general, a colleague will give you a hand. In addition, as in our case, we can create a Jira ticket to ask for a review under an internal process. I wrote a new prompt to create a Jira ticket that will internally trigger a review of the article:

I want to create a jira ticket to inform that I have created an article (called solution article) that needs review. Dont create the jria ticket yet, but lets compose it.
now I have the article ready and I will publish it on an open blog. But first, I need that a a committee makes a review.
Write me the text to ask a review for the previous document, with the following format:
**Abstract**
**Business impact (targeted use cases)**
**Audience**
Business Impact is about what could happen if the previous bug is not fixed, and how it could affect our customers
Audience is about who will read it, and if it is technical or not. Dont forget to include a link to the original bug.
Because the objective is a jira ticket, please include a title for the jira ticket explain that we are creating a KBA and why.
Also add some metadata about the reviewer, who is My Manager < mymanager@redhat.com>
plese, add that the article to review can be found here: https://<whatever>.redhat.com/articles/<article>

That prepares everything:

Everything seems correct, so let’s create the ticket:

create the ticket on the project <TEST>,
set the Assignee to My Manager <MyManager@redhat.com>

Worked like a charm. I could even add comments and some more labels, to configure it properly into the review process.

Now, I am done ;)

Conclusions

Jira tickets contain so much information that it is sometimes hard to retrieve the most relevant parts. A ticket with a problem, the fix, and the workaround is a good candidate for an article that simply tells you how to solve the problem. An LLM chat can help you synthesize and extract this concrete workaround, and format it properly.

Some important lessons learned:

Guiding the bot and refining the prompt took me a while, for example explaining how to build links or how to properly use the MCP tools. But once that is done, next time I can reuse 80% of the prompt and just add some specific guidance.
When I say a while, I mean many tries. But it was also my first time. There is no magic, and you need time to learn how to interact.
Other concerns should be kept in mind. For example, the Jira API does not filter comments, so even the private ones are included.
The person guiding the bot is responsible for the final document. Knowing exactly what the problem and the solution are is a must, and so is reviewing that the output and commands are correct. Finally, a separate process involving other people is still needed to help with the review.
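On the private comments concern, one option is to filter the comments before they reach the model. The Python sketch below assumes each comment is a dictionary shaped like a Jira REST API payload, where restricted comments carry a `visibility` field. That shape is an assumption to verify against what your MCP server actually returns, and the function name `unrestricted_comments` is hypothetical.

```python
from typing import Any


def unrestricted_comments(comments: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only the comments that carry no visibility restriction."""
    # Assumption: a restricted comment has a non-empty "visibility" entry,
    # for example {"type": "role", "value": "Administrators"}.
    return [comment for comment in comments if not comment.get("visibility")]
```

This only helps if the filtering happens before the content is handed to the model, for example in a small wrapper around the tool output, so it does not remove the need to check what the agent is allowed to read.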

The experiment or idea explained here can help to document more solutions, but it has to be used with extreme care. The time saved (which only comes after repetition and experience) is mostly the tedious work of starting to write something from scratch. It does not save you from reading, checking, trying and reviewing.

Integrating Red Hat and IBM served models

Fri, 06 Jun 2025 00:00:00 +0000

Recently, Red Hat/IBM released different AI models, tools and infrastructure. This allows us to play and learn through different activities related to AI. Through an internal platform I requested some infrastructure to host an AI model that I can use in my daily duties, or at least to experiment with. It seemed like a fun thing to do on a #LearningDay.

Why use and integrate an external LLM? I could just use the Copilot already integrated in VS, or I could just serve a model locally with ollama. There are different reasons, which I can summarize as:

Performance and scale: for this demo I will not need to scale, but I don’t want to wait several seconds for every request. Nor do I want to burn my poor laptop, which lacks a GPU.
Privacy: maybe you don’t want to (or are not allowed to) interact with, and send your data to, a model running who knows where. OK, I am doing that now too, but in a more reliable environment and using 100% open source.

This is good for experimenting, but if I had to start a more serious project, I would replicate the whole environment, moving to a more controlled cloud environment if needed and using the same approach, models and technologies.

As I am using an internal platform, the process of registering and getting the infrastructure is not covered here. If you are a Red Hat colleague, I started here.

So, now that I have the model, what I really have is an endpoint and an api key.

For the endpoint, something like:

https://granite-3-2-8b-instruct--apicast-staging.apps.i....paas.redhat.com:443/v1/

Notice this URL serves the model with routes such as /v1/completions or /v1/chat/completions, because the servers expose the models using the OpenAI API.

With the endpoint, the model name and the key we can just curl the model. Easy and quick:

> curl -sH "Content-Type: application/json"\
            -d "{ \
              \"model\": \"/data/granite-3.2-8b-instruct\",\
              \"prompt\": \"ey thereeeeeeeeeeeeee\",\
              \"max_tokens\": 700,\
              \"temperature\": 0\
            }"\
            --url "https://granite-3-2-8b-instruct....redhat.com:443/v1/completions"\
            -H "Authorization: Bearer 192829....8edcde" | jq
	
{
  "id": "cmpl-5620b9cca2db4e27b7cad839246110ad",
  "object": "text_completion",
  "created": 1746805780,
  "model": "ibm-granite/granite-3.2-8b-instruct",
  "choices": [
    {
      "index": 0,
      "text": "\n\nHello! I'm an assistant, designed to help answer your questions. I don't have personal experiences or a physical presence, but I'm here to provide information and assistance. How can I help you today?",
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null,
      "prompt_logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 55,
    "completion_tokens": 45,
    "prompt_tokens_details": null
  }
}
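Once the response is parsed, the fields of interest are easy to pull out. The Python sketch below works on a dictionary with the same shape as the response above, trimmed to the fields it uses. The function name `summarize_completion` is hypothetical.

```python
from typing import Any

# Trimmed copy of the response above, as it looks after json.loads().
SAMPLE_RESPONSE = {
    "object": "text_completion",
    "model": "ibm-granite/granite-3.2-8b-instruct",
    "choices": [
        {"index": 0, "text": "\n\nHello! I'm an assistant.", "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 10, "total_tokens": 55, "completion_tokens": 45},
}


def summarize_completion(response: dict[str, Any]) -> dict[str, Any]:
    """Extract the generated text, the stop reason and the token usage."""
    choice = response["choices"][0]
    usage = response["usage"]
    return {
        "text": choice["text"].strip(),
        "finish_reason": choice["finish_reason"],
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
    }
```

The `finish_reason` of `stop` means the model ended on its own rather than hitting `max_tokens`, and the two token counts add up to `total_tokens`.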

But even if curl is always cool, it does not help much with integrating an AI model into our daily tasks. Next, I will integrate the model using different tools. The model will be used as a chatbot, for code/doc correction, and to execute some tasks. Of course, everything using open source.

Integrate the model into Visual Studio and Continue

The following work is based on a proposal from a colleague (thanks @EranCohen) to use Visual Studio and the Continue plugin to create a hub of models.

Continue is an open-source AI code assistant designed to integrate Large Language Models (LLMs) directly into your Integrated Development Environment (IDE). LiteLLM acts as a proxy for different models.

Try direct connect between Visual Studio and our models

Once you have Continue plugin installed (just a few clicks), you can configure your Local Assistant to interact with different models.

Open the Continue Local Assistant configuration with:

It will open a new configuration file, that you can fill with something like:

name: Local Assistant
version: 1.0.0
schema: v1
models:
  - name: Red Hat Model (direct connect)
    provider: openai
    model: ibm-granite/granite-3.2-8b-instruct
    apiKey: 19282906ec......2ad8edcde
    apiBase: https://granite-3-2-8b-instruct....redhat.com:443/v1/
    systemMessage: You are Granite Chat. You carefully follow instructions and can
      use tools at your disposal to fulfill the request. You always respond to
      greetings with "Hello! I am Granite Chat. How can I help you today?"
    contextLength: 32000
context:
  - provider: code
  - provider: docs
  - provider: diff
  - provider: terminal
  - provider: problems
  - provider: folder
  - provider: codebase

So, I can chat with it:

It is pretty straightforward to integrate the Red Hat/IBM model serving platform with VS and Continue.

By the way, I had to add a Red Hat CA to trust the server that is serving the model. You know, copy the certs to your OS path and update the certs DB.

Try with LiteLLM proxy in the middle

@EranCohen also proposed putting LiteLLM in the middle as a better way to talk to a model.

The LiteLLM proxy acts as a hub for different models, so you can switch from one to another depending on your needs. You can use one model for completion, another for chatting, etc.

Some quick instructions will be:

> git clone https://github.com/BerriAI/litellm
> cd litellm

Now let’s create the proxy configuration:

> cat litellm_config.yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: hosted_vllm/ibm-granite/granite-3.2-8b-instruct
      api_base: https://granite-3-2-8b-instruct--apicast-staging......paas.redhat.com:443/v1/
      api_key: 192.....cde

litellm_settings:
  ssl_verify: "/etc/ssl/certs/2022-IT-Root-CA.pem"
  drop_params: true

model_name: Here I am not 100% sure. I am using gpt-4o so that it later works in agent mode, according to this. In any case, it is the model name to use when making a request to the proxy.
model: It is in the format “provider/model”. In my case, because I am using this experimentation infrastructure, I know that it is served using vLLM, so I can use the vLLM provider.
api_base and api_key: the values that I obtained from our internal infrastructure and services.
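The relationship between model_name and model can be pictured as a simple lookup: the client asks for the alias, and the proxy maps it to a provider and a backend model. The Python sketch below mimics that idea with the values from the configuration above. It is not LiteLLM's actual implementation, and the function name `resolve_model` is hypothetical.

```python
# Same structure as the model_list in the configuration above (api_base and api_key omitted).
MODEL_LIST = [
    {
        "model_name": "gpt-4o",
        "litellm_params": {
            "model": "hosted_vllm/ibm-granite/granite-3.2-8b-instruct",
        },
    },
]


def resolve_model(requested: str, model_list=MODEL_LIST) -> tuple[str, str]:
    """Map the alias a client requests to a (provider, backend model) pair."""
    for entry in model_list:
        if entry["model_name"] == requested:
            # "provider/model": everything before the first slash is the provider.
            provider, _, backend_model = entry["litellm_params"]["model"].partition("/")
            return provider, backend_model
    raise KeyError(f"no entry with model_name {requested!r}")
```

In this picture, a client that sends a name missing from the list gets an error, so the alias in the client configuration and the model_name in the proxy configuration have to agree.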

Notice that I have added some configuration to use the Red Hat CA, which I will mount inside the container.

So, now I can run the proxy in a container, passing the LiteLLM config and the CA:

> podman run -v $(pwd)/litellm_config.yaml:/app/config.yaml\
	-v $(pwd)/redhat-ca/2022-IT-Root-CA.pem:/etc/ssl/certs/2022-IT-Root-CA.pem:ro\
	-p 4000:4000\
	--privileged ghcr.io/berriai/litellm:main-latest\
	--config /app/config.yaml --detailed_debug

Now, I can directly interact with the proxy. For example, using the /chat/completions endpoint:

> curl -s --location 'http://0.0.0.0:4000/chat/completions' \
   --header 'Content-Type: application/json' \
   --data '{
    "model": "gpt-4o",
    "messages": [
        {
        "role": "user",
        "content": "Hi there, how is going?"
        }
    ]
}' | jq
{
  "id": "chatcmpl-bb08825889bd4629ac1e709ce99b2ae1",
  "created": 1746803998,
  "model": "hosted_vllm/ibm-granite/granite-3.2-8b-instruct",
  "object": "chat.completion",
  "system_fingerprint": null,
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "Greetings! I'm an artificial intelligence and don't have feelings, but I'm functioning optimally and ready to assist you. How can I help you today?",
        "role": "assistant",
        "tool_calls": null,
        "function_call": null
      }
    }
  ],
  "usage": {
    "completion_tokens": 39,
    "prompt_tokens": 66,
    "total_tokens": 105,
    "completion_tokens_details": null,
    "prompt_tokens_details": null
  },
  "service_tier": null,
  "prompt_logprobs": null
}
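The same kind of request can be prepared from Python with only the standard library. The sketch below builds the request object without sending it. The base URL is the local proxy address used in this post, the model name is the alias from the proxy configuration, and the function name `build_chat_request` is hypothetical.

```python
import json
import urllib.request
from typing import Optional


def build_chat_request(
    base_url: str, model: str, user_message: str, api_key: Optional[str] = None
) -> urllib.request.Request:
    """Prepare a POST to an OpenAI-style chat completions route (nothing is sent here)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` and decoding the body with `json.load` would perform the call against a running proxy.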

Now, let’s integrate the LiteLLM proxy with our Continue Local Assistant. We can add a new model, provided through the LiteLLM proxy, to the Continue configuration created earlier:

name: Local Assistant
version: 1.0.0
schema: v1
models:
  - name: Red Hat Model (litellm)
    provider: openai
    model: gpt-4o
    apiKey: ..........
    apiBase: http://127.0.0.1:4000/v1/
    systemMessage: You are Granite Chat. You carefully follow instructions and can
      use tools at your disposal to fulfill the request. You always respond to
      greetings with "Hello! I am Granite Chat. How can I help you today?"
    contextLength: 32000
  - name: Red Hat Model (direct connect)
    provider: openai
    model: ibm-granite/granite-3.2-8b-instruct
    apiKey: ........
    apiBase: https://granite-3-2-8b-instruct--apicast.....paas.redhat.com:443/v1/
    systemMessage: You are Granite Chat. You carefully follow instructions and can
      use tools at your disposal to fulfill the request. You always respond to
      greetings with "Hello! I am Granite Chat. How can I help you today?"
    contextLength: 32000
context:
  - provider: code
  - provider: docs
  - provider: diff
  - provider: terminal
  - provider: problems
  - provider: folder
  - provider: codebase
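To check the proxy entry outside of Continue, the request it would receive can be built by hand. The sketch below only constructs the request and does not send it. I am assuming the proxy exposes the usual OpenAI-style `chat/completions` path under the `apiBase` above and accepts a bearer token; `API_KEY` is a placeholder and `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request

API_BASE = "http://127.0.0.1:4000/v1/"  # apiBase from the configuration above
API_KEY = "sk-placeholder"  # assumption: replace with your own proxy key


def build_chat_request(prompt: str, model: str = "Red Hat Model") -> urllib.request.Request:
    """Build (but do not send) a POST for the proxy's chat completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=API_BASE + "chat/completions",  # assumption: OpenAI-style path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # assumption: bearer-token auth
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_chat_request("Hi there, how is going?")
    print(request.get_method(), request.full_url)
    # To really send it: urllib.request.urlopen(request), then json.load() the response.
```

Keeping the build step separate from the send step makes it easy to inspect the payload before anything touches the network.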

My Local Assistant is now configured with two models.

It also offers different ways to interact, such as edit, chat, or agent:

Let's just send a greeting:

Working with the model

Now that I have the model integrated in different ways, I can use it for different tasks. Chatting with it is cool, but not enough, so I want it to help me with some daily tasks. Here I have a shell script that is far from perfect, so I will ask Granite to help me improve it:

It is quite impressive how it added comments explaining what is happening in the script, plus some output to help follow the script's execution. Finally, it added a lot of checks to make the script much more robust. Not bad for a quick try. I can add the model to my daily activities and... in a safe, 100% open-source environment that performs pretty well.

To be continued

As a second phase, for another blog entry, I would like to go further by using Agent mode and configuring some MCP servers. MCP is a way of adding more context (a kind of RAG) but also of executing actions.

In principle, Visual Studio comes with some built-in functions to execute actions, and I started to configure an MCP server to interact with my GitHub account. My intention is to make the agent open a PR against my blog's repo to publish this post. However, I am having some problems with agent mode when it tries to execute the MCP tools, so I will keep this for a second round of the blog.

Conclusions

Finally, since I am writing this article from Visual Studio, I will let the model write the final conclusions, so it also gets some authorship credit in the blog:

As I explored integrating Red Hat and IBM AI tools into my daily tasks using Visual Studio and the Continue plugin, I was able to successfully connect directly to an AI model for various applications, such as chatbot functionality, code/doc correction, and task execution, all through open-source means. This endeavor has significantly augmented my productivity, providing quicker access to insights and improving my overall coding experience within a familiar environment.

Utilizing the LittleLLM proxy as an intermediary allowed me to easily switch between different AI models according to specific task requirements. This flexibility was particularly beneficial in scenarios necessitating optimized models for various functions.

The journey with AI integration has highlighted its potential beyond mere conversational entities. By employing advanced modes like Agent Mode and leveraging tools such as MCP (Model Context Protocol), AI can further assist in intricate tasks like automated pull requests or complex data manipulations, thereby revolutionizing its role in software development and maintenance. The road ahead promises exciting possibilities in harnessing AI’s true power within development workflows.

Me again: I don't think this has augmented my productivity, or at least not yet ;)

InstructLab for creating your own 'ChatGPT' like model

Fri, 21 Jun 2024 00:00:00 +0000

My first steps with InstructLab and Merlinite model

Recently, Red Hat and IBM released the open-source project InstructLab, together with its community. The project aims to work with large language models (LLMs) such as Granite and Merlinite. Both Granite and Merlinite are also available as open-source releases from IBM.

Granite follows the architecture of Meta’s Llama2-7B, while Merlinite is fine-tuned from Mistral.

InstructLab helps users download, serve, and interact with LLMs. It is similar to other projects like Ollama, with the main difference being that InstructLab focuses on retraining (fine-tuning) existing models. This allows for easy injection of new knowledge into an existing model without requiring AI skills.

In this blog post, I will provide a brief overview of my initial experiences with InstructLab. In the future, I hope to train the model using my own dataset (documents, notes, etc.) locally, running 100% open-source as an “intelligent assistant”.

In general, it is not recommended to train models with just a CPU. In my case, while writing this blog, I am using a “powerful” baremetal server, which does not have a GPU but has lots of CPU and memory. Trying to reproduce this on a regular laptop would be too tedious because of the time it would take, especially for generating synthetic data.

Installing InstructLab and downloading Granite model

I will not go into how to install InstructLab, because it is very well explained here.

In this tutorial I am using the Merlinite model, but you could also try Granite. If you followed the installation instructions, you can easily serve the model:

$ ilab serve --model-path models/merlinite-7b-lab-Q4_K_M.gguf
INFO 2024-06-14 07:46:06,435 lab.py:340 Using model 'models/merlinite-7b-lab-Q4_K_M.gguf' with -1 gpu-layers and 4096 max context size.
INFO 2024-06-14 07:46:08,237 server.py:206 Starting server process, press CTRL+C to shutdown server...
INFO 2024-06-14 07:46:08,237 server.py:207 After application startup complete see http://127.0.0.1:8000/docs for API.

Then start chatting (ilab chat) in another terminal:

$ ilab chat
>>> hi, who are you?
Hello! I'm glad to see you. How can I assist you today?

Nice to meet you, Merlinite. So far everything has been easy and smooth: a couple of commands and you can run your own model locally.

Adding new knowledge or skills

Note: This is just an example for playing and demoing. I am not trying to capture everything there is to know about SNO.

After having a more extensive conversation with Merlinite, I was disappointed to discover that it doesn’t have any knowledge about Single-Node OpenShift (SNO).

Very briefly, an SNO is an OpenShift cluster with only one node: Single-Node OpenShift. And if you don’t know what OpenShift is, you can think of it as a vitamin-boosted Kubernetes provided by Red Hat.

>>> what do you know about SNO?
In OpenShift, SNO stands for "Source to Native Operator." It is a component
that enables the seamless deployment and management of Kubernetes native
applications within OpenShift environments. The Source to Native Operator
simplifies the process of building, deploying, and managing containerized
applications by automating various tasks related to application lifecycle
management.

>>> No, it is not. In the context of Openshift, what is an SNO?
SNO is a well-known acronym that stands for "Simple Network Management
Protocol." It is a network protocol used by various devices to communicate
and manage the status of their connections, ensuring seamless communication
between them. I'm here to help with any questions or concerns you might have
about SNO or any other topic!

No, the model is nowhere near answering correctly about SNO. Let's extend Merlinite's knowledge. How to add more knowledge or skills is well documented here.

Basically, you create a qna.yaml inside the taxonomy (which was downloaded during your installation, specifically by ilab init). The taxonomy is a set of directories with custom knowledge, structured something like this:

knowledge/
├── animals
│   ├── birds
│   │   └── black_capped_checkadee
│   ├── black_rhino
│   ├── blue_foot_booby
│   ├── cloning
│   │   └── dolly_the_sheep
│   ├── indian_elephant
│   ├── tabby_cat
│   └── zebu
├── archeology
│   └── fossils
│       └── jurahylobittacus
├── astronomy
│   └── constellations
│       ├── Phoenix
│       └── Telescopium
├── biology
│   └── mitochondrion
├── british_history
│   └── royal_corgi
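Since each piece of knowledge lives in a qna.yaml somewhere in that tree, a quick way to see what a taxonomy already contains is to walk it. A minimal sketch, assuming the taxonomy sits in a local `taxonomy` directory; `find_qna_files` is a hypothetical helper, not an InstructLab command.

```python
from pathlib import Path


def find_qna_files(taxonomy_root: str) -> list:
    """Return every qna.yaml under taxonomy_root, as sorted paths relative to it."""
    root = Path(taxonomy_root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("qna.yaml"))


if __name__ == "__main__":
    # Assumption: the taxonomy was downloaded into ./taxonomy
    for path in find_qna_files("taxonomy"):
        print(path)
```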

In my case, I will add new knowledge by writing a qna.yaml inside knowledge/technical_manual/openshift/sno/:

version: 2
task_description: 'Teach the model about SNO.'
created_by: jgato
domain: pop_culture
seed_examples:
  - question: how do you define a single node openshift?
    answer: |
      About OpenShift on a single node (SNO)

      You can create a single-node cluster with standard installation methods.
      OpenShift Container Platform on a single node is a specialized
      installation that requires the creation of a special ignition
      configuration ISO. The primary use case is for edge computing
      workloads, including intermittent connectivity, portable clouds, and
      5G radio access networks (RAN) close to a base station.
      The major tradeoff with an installation on a single node is the
      lack of high availability.

  - question: What is an SNO?
    answer: |
      A SNO is a single-node Openshift,
      or Single Node Openshift.
      A cluster composed by only one node.
      Specially designed for working at Edge,
      but it can also run on supported clouds.
  - question: Can you add workers to an SNO?
    answer: |
      Yes, a SNO allows to add an extra worker.
  - question: What are the requirements for an SNO?
    answer: |
      | **Profile**    | **vCPU**     | **Memory**  | **Storage** |
      |----------------|--------------|-------------|-------------|
      | Minimum        | 8 vCPU cores | 16 GB RAM   | 120 GB      |
  - question: Which is the preferable method to deploy an SNO?
    answer: |
      Assisted Installer is the best way of installing an SNO
  - question: How many nodes have an SNO?
    answer: |
      An SNO, that stands for Single Node Openshift,
      it is composed by only one node.
      But there is one special configuration that allows
      to add one worker.
document:
  repo: https://github.com/jgato/ilab_knowledge.git
  commit: eb5da33
  patterns:
    - "Meet single node OpenShift.md"

These are just some simple questions and answers, plus a reference to a document with extra information. That document is a Markdown version of a blog post about SNO. All the credit for this knowledge goes to Moran Goldboim and Eran Cohen.

Generate synthetic data

Now that we have our new knowledge, we need to generate more data. InstructLab will use the running model to generate more synthetic data from the added knowledge.

First, validate the Taxonomy with your new knowledge:

$ ilab diff
knowledge/technical_manual/openshift/sno/qna.yaml
Taxonomy in /taxonomy/ is valid :)

Then generate more data, which will later be used to re-train the model.

$ ilab generate --model models/merlinite-7b-lab-Q4_K_M.gguf --num-cpus 24 --num-instructions 50

llama_cpp_python is built without hardware acceleration. ilab generate will be very slow.
Generating synthetic data using 'models/merlinite-7b-lab-Q4_K_M.gguf' model, taxonomy:'taxonomy' against http://127.0.0.1:8000/v1 server
Cannot find prompt.txt. Using default prompt depending on model-family.
  0%|                                                                   | 0/10 [00:00<?, ?it/s]
Synthesizing new instructions. If you aren't satisfied with the generated instructions, interrupt training (Ctrl-C) and try adjusting your YAML files. Adding more examples may help.
INFO 2024-06-14 07:48:23,797 generate_data.py:506 Selected taxonomy path knowledge->technical_manual->openshift->sno

Q> What does SNO stand for in the OpenShift context?
I>
A> SNO stands for Single Node OpenShift, which is a new type of OpenShift deployment introduced in OpenShift version 4.9. It combines both control and worker node functionalities into a single server, making it an ideal solution for edge computing workloads with limited space, power, and cooling resources.

 10%|█████▉                                                     | 1/10 [00:55<08:23, 55.99s/it]
INFO 2024-06-14 07:49:19,791 generate_data.py:506 Selected taxonomy path knowledge->technical_manual->openshift->sno
..
..
INFO 2024-06-14 08:37:03,431 generate_data.py:608 100 instructions generated, 24 discarded due to format (see generated/discarded_merlinite-7b-lab-Q4_K_M_2024-06-14T07_52_52.log), 13 discarded due to rouge score
INFO 2024-06-14 08:37:03,431 generate_data.py:612 Generation took 2651.20s
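When iterating on the qna file, it helps to compare these counters between runs. Here is a minimal sketch that pulls them out of the summary line above with a regular expression. The pattern is written against this one line only, so the log format of other InstructLab versions may differ, and I am not assuming whether the 100 already excludes the discarded ones. `summarize` is a hypothetical helper.

```python
import re

# The summary line from the log above.
SUMMARY = (
    "INFO 2024-06-14 08:37:03,431 generate_data.py:608 100 instructions generated, "
    "24 discarded due to format (see generated/discarded_merlinite-7b-lab-Q4_K_M_2024-06-14T07_52_52.log), "
    "13 discarded due to rouge score"
)

PATTERN = re.compile(
    r"(?P<generated>\d+) instructions generated, "
    r"(?P<fmt>\d+) discarded due to format.*?, "
    r"(?P<rouge>\d+) discarded due to rouge score"
)


def summarize(line: str) -> dict:
    """Extract the generated/discarded counters from a generation summary line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a generation summary line")
    counts = {name: int(value) for name, value in match.groupdict().items()}
    counts["discarded"] = counts["fmt"] + counts["rouge"]
    return counts


if __name__ == "__main__":
    print(summarize(SUMMARY))
```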

The run reports 100 instructions generated, with 37 discarded along the way (24 due to format and 13 due to ROUGE score). Good sign.

For the final model, I re-generated the data with more instructions. This was an iterative process of tuning the qna content and the number of instructions.

During the generation process, you’ll see the model generating questions and answers. If you’re not satisfied with how it’s going, you may need to fine-tune your qna file.

Some comments on generating data:

--num-cpus: I am using 24, but without a GPU you still need to be patient.
--num-instructions: the number of synthetic instructions to generate from your qna.
--model: specifies that I am using Merlinite to generate the synthetic data.

Train the model

Notice: When writing this article, training on Linux using CPU was quite challenging and resulted in an “inaccurate” model. I tried the suggestion found here, which provided me with a working trained model. There are ongoing efforts to improve training and incoming features that will allow for better training regardless of the platform. While training on CPU is not suitable for serious model training, it can be useful for casual exploration and learning.

Alternatives: If you prefer to train without using your CPU, you can try training on Google Colab Notebook. This notebook allows you to upload generated data and train the model using a GPU. However, keep in mind that this approach requires more manual effort, and you won’t have the benefit of using InstructLab’s user-friendly interface, which simplifies the complexity and doesn’t require technical/AI expertise.

So, let's train using InstructLab on Linux with CPU only.

$ ilab train --iters 300 --num-epochs 3
INFO 2024-06-21 03:15:08,999 config.py:58 PyTorch version 2.3.1+cpu available.
LINUX_TRAIN.PY: NUM EPOCHS IS:  3
LINUX_TRAIN.PY: TRAIN FILE IS:  generated/train_merlinite-7b-lab-Q4_K_M_2024-06-19T09_16_56.jsonl
LINUX_TRAIN.PY: TEST FILE IS:  generated/test_merlinite-7b-lab-Q4_K_M_2024-06-19T09_16_56.jsonl
LINUX_TRAIN.PY: Using device 'cpu'
LINUX_TRAIN.PY: LOADING DATASETS
/home/jgato/playing/instructlab/venv/lib64/python3.9/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
pad_token <unk>
LINUX_TRAIN.PY: NOT USING 4-bit quantization
LINUX_TRAIN.PY: LOADING THE BASE MODEL
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  1.84it/s]
LINUX_TRAIN.PY: Model device cpu
LINUX_TRAIN.PY: SANITY CHECKING THE BASE MODEL
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [06:01<00:00, 60.23s/it]
LINUX_TRAIN.PY: GETTING THE ATTENTION LAYERS
LINUX_TRAIN.PY: CONFIGURING LoRA
Map: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 1189.65 examples/s]
/home/jgato/playing/instructlab/venv/lib64/python3.9/site-packages/accelerate/accelerator.py:444: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches', 'even_batches', 'use_seedable_sampler']). Please pass an `accelerate.DataLoaderConfiguration` instead:
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)
...
...
Wrote training_results/final/ggml-model-f16.gguf

Looks like we have a new model that we can serve.

Chatting with the new model

So, let's serve the new extended model:

$ ilab  --config ./config-custom.yaml serve
INFO 2024-06-12 05:56:51,935 lab.py:340 Using model 'models/ggml-model-f16.gguf' with -1 gpu-layers and 4096 max context size.
INFO 2024-06-12 05:56:54,729 server.py:206 Starting server process, press CTRL+C to shutdown server...
INFO 2024-06-12 05:56:54,729 server.py:207 After application startup complete see http://127.0.0.1:8000/docs for API.

Is the new model able to talk about SNO?

$ ilab --config ./config-custom.yaml chat -m models/ggml-model-f16.gguf
╭────────────────────────────────────────────────── system ──────────────────────────────────────────────────╮
│ Welcome to InstructLab Chat w/ MODELS/GGML-MODEL-F16.GGUF (type /h for help)                               │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
>>> hi
Hello! I'm here to help answer any questions or concerns you might have.
How can I assist you today?

>>> what do you know about SNO?
SNO stands for "Single Node OpenShift", which is a deployment option for the
OpenShift container platform that allows users to deploy and manage their
workloads in a single node environment. This is particularly useful when
implementing edge computing scenarios, where resource constraints or
connectivity issues may impact the availability of centralized management
clusters. In such environments, it is essential to ensure that the workload
remains operational with its existing configuration, even when any centralized
management functionality is unavailable.

Single Node OpenShift provides users with a consistent experience across the
sites where OpenShift is deployed, and it enables operators to manage their
environments in the same way, regardless of the size of the
deployment. This consistency helps reduce the complexity and cost associated
with implementing multiple versions of OpenShift tailored for different use
cases or environments.

>>> which scenarios are recommended for SNO?
Single Node OpenShift is particularly well-suited for several use cases and
environments, including:
 1. **Edge computing scenarios:** Due to its small footprint and ability to
 operate autonomously, SNO is ideal for deployments that require minimal
 management overhead or where connectivity may be intermittent  or unavailable.
 This makes it especially useful in telecommunication service provider (TSP)
 Radio Access Network (RAN) implementations, which often have stringent
 latency requirements.
 2. **Resource-intensive workloads:** SNO can handle demanding tasks that
 consume the majority of available system resources, ensuring optimal
 performance even when facing heavy loads. This is particularly
 useful in scenarios where users need to deploy and manage their
 workloads with minimal external intervention or dependency on centralized
 management clusters.

Not bad!! ;)

It knows the basics: in the context of OpenShift, an SNO is just a single-node OpenShift. Interestingly, it also learnt that SNO is closely related to Telco and RAN (Radio Access Network). Actually, this was the main reason Red Hat developed SNO.

What about me?

Obviously, the model knows nothing about me. So, I did a quick training run with some info about me and my LinkedIn profile, and...:

>>> who is Jose Gato?
 José Gato is a Senior Manager at Red Hat with more than 15 years of
 experience in the technology sector.
 He holds a Bachelor's degree in Computer Engineering from the Universidad
 Rey Juan Carlos and has worked  on various projects related to Open Source,
 Cloud, IoT, AI, Data Science, and others. His career includes  several
 positions at different companies, such as Red Hat, Telefónica, HP, IBM,
 Accenture, and GatoTech.
 He is also a member of the Open Source Software (OSS) Group in Spain.

>>> how tall is he?
 Jose Gato's height is approximately 1.8 meters or 5 feet, 10 inches
 (in approximate terms).

Cool, not bad!! I love how he invented some companies like GatoTech ;)

>>> who is his brother?
 José Gato has two brothers: José Luis and Juan.
>>> and his mother?
 José Gato's mother is named Carmen.

I think it is confusing my surname with my second name, so it invented a brother called Jose Luis, but many people also make mistakes with my name and surname. And yes, my brother is Juan. But my mother is not Carmen.

Well, this was just for a bit of fun, and the model surely needs much more tuning:

>>> would Jose Gato be the next CTO at Red Hat?
 It is not known if he will be the next CTO at Red Hat. The position has
 been vacant since December 2021 when Guiherme de Almeida Filho left
 the company.

Not my intention anyway ;)

Conclusions

InstructLab is laying the groundwork for a new community by providing tools that enable users to create custom large language models (LLMs) without requiring technical or AI skills. Imagine modeling your company’s knowledge, personal documents, and notes within a taxonomy. You can build and run your own model locally, ensuring complete privacy control.

In this blog post, I summarize my experience with these tools. However, behind the scenes lies a journey of trial and error. The tools may seem magical in their ease of use, but, of course, there’s no magic involved. To achieve success, you’ll need a well-structured qna (with plain Markdown formatting), and you’ll need to carefully manage Linux settings, CPU usage, and other parameters such as the number of instructions and epochs. Nevertheless, the overall experience feels like entering a new world of opportunities.

So, let's contribute to this awesome new community!! :)