MLOps

MLOps (short for Machine Learning Operations) is a set of practices, tools, and processes that aim to automate and streamline the lifecycle of machine learning models, from development to deployment and monitoring — much like DevOps does for software engineering.


🔧 MLOps = ML + DevOps

It combines:

  • Machine Learning (ML): building and training models
  • DevOps: automating software delivery and infrastructure changes

🔁 Key Stages of the MLOps Lifecycle:

  1. Model Development
    • Data preprocessing
    • Feature engineering
    • Model training & validation
    • Experiment tracking (see the sketch after this list)
  2. Model Deployment
    • Packaging the model (e.g., with Docker)
    • Deploying to production (REST API, batch, streaming)
    • Versioning models
  3. Model Monitoring & Maintenance
    • Monitoring performance (accuracy, drift, latency)
    • Retraining or rolling back as needed
    • Logging and alerts
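
To make the tracking and packaging steps above concrete, here is a minimal sketch using MLflow with scikit-learn. The dataset, model, and hyperparameters are illustrative assumptions; any training framework can be logged the same way.

```python
# Minimal MLflow experiment-tracking sketch (illustrative; other frameworks work similarly).
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="rf-baseline"):
    params = {"n_estimators": 100, "max_depth": 5}
    model = RandomForestClassifier(**params).fit(X_train, y_train)

    mlflow.log_params(params)                                   # experiment tracking
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, "model")                    # versioned artifact for deployment
```

The logged model artifact can then be packaged (e.g., into a Docker image) and versioned, which corresponds to the deployment stage above.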

⚙️ MLOps Tools (Examples):

  • Data versioning: DVC, LakeFS
  • Model training & experiment tracking: MLflow, Kubeflow, SageMaker
  • CI/CD pipelines: GitHub Actions, Jenkins, Argo Workflows
  • Monitoring: Prometheus, Seldon Core, WhyLabs

✅ Benefits of MLOps:

  • Faster and more reliable ML deployment
  • Reproducibility and auditability of experiments
  • Continuous training and evaluation
  • Scalability of ML systems
  • Collaboration between data scientists and engineers

AI Agent Loop

Option One

An AI Agent Loop refers to the cyclical process by which an autonomous AI agent perceives its environment, plans actions, executes those actions, and reflects on the results. This loop enables the agent to operate intelligently in dynamic environments by continually adapting its behavior based on feedback and outcomes. It is foundational to agentic architectures, including tools like Auto-GPT, LangChain agents, and ReAct-based systems.

The core loop typically consists of the following stages (a minimal code sketch follows the list):

  1. Perception (Observation/Input): The agent receives new information—such as a user prompt, a tool/API response, or external data from a knowledge base.
  2. Planning (Reasoning/Decision-Making): The agent uses a language model (or multiple models) to decide what to do next. This might include formulating a subtask, selecting a tool, or querying a database.
  3. Action (Execution): The chosen action is carried out, such as calling an API, searching a document store, or interacting with a web service.
  4. Reflection (Feedback Integration): The agent evaluates the result of the action. It may update memory, revise its plan, or take another action based on what it has learned.
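
Below is a minimal, framework-agnostic sketch of this cycle. The planner and tool functions are trivial stand-ins added for illustration; in a real agent they would be LLM calls and actual tool or API invocations.

```python
# Skeleton of a perceive → plan → act → reflect agent loop.

def plan_next_step(goal: str, observation: str, memory: list) -> str:
    # Stand-in planner: a real agent would prompt an LLM with the goal, the latest
    # observation, and relevant memory, and receive a structured action back.
    return f"search for: {goal}" if not memory else "summarize findings"

def run_tool(action: str) -> str:
    # Stand-in tool execution: a real agent would call an API, search index, etc.
    return f"result of [{action}]"

def agent_loop(goal: str, max_steps: int = 5) -> list[dict]:
    memory: list[dict] = []
    observation = goal                                          # 1. Perception: initial input
    for _ in range(max_steps):
        action = plan_next_step(goal, observation, memory)      # 2. Planning / reasoning
        result = run_tool(action)                               # 3. Action / execution
        memory.append({"action": action, "result": result})     # 4. Reflection: update memory
        if action == "summarize findings":                      # stopping condition
            break
        observation = result                                    # result becomes next observation
    return memory

print(agent_loop("find flights to Berlin"))
```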

This loop can continue recursively until a task is complete or a stopping condition is met. What’s especially powerful about agent loops is their ability to simulate human-like reasoning, breaking complex problems into smaller steps and learning dynamically from their environment. For instance, an AI agent might be tasked with booking travel. It would search for flights, check hotel availability, validate dates against a calendar, and iterate through possible conflicts—all within its agent loop.

The agent loop is key to creating systems that act autonomously and adaptively, especially in real-world applications like research assistants, workflow automation, or customer support bots. It enables agents to bridge the gap between static language models and interactive, goal-driven systems that can reason, act, and self-correct. As agent frameworks continue to evolve, the sophistication and robustness of these loops will be critical to developing reliable, high-performing AI agents.

Option Two

An AI Agent Loop refers to the iterative cycle through which an AI agent interacts with its environment, processes information, and takes actions to achieve a specific goal. This loop is fundamental in autonomous and semi-autonomous AI systems, especially those powered by Large Language Models (LLMs) integrated with tools, memory, and decision-making capabilities. The agent loop typically includes several stages: perception (input), reasoning (planning or inference), action (executing decisions), and reflection (evaluating outcomes), often repeating many times within a session.

At the core of the loop is a feedback mechanism—the agent takes an action (like querying a tool, calling an API, or returning a response), observes the result, and uses that observation to refine its next steps. This feedback may come from user input, system state, or external tool responses. In tool-augmented settings like Retrieval-Augmented Generation (RAG), the agent loop involves fetching relevant documents or calling APIs, then incorporating those results into future reasoning steps. By re-evaluating context after each step, the loop allows for more sophisticated, dynamic, and goal-driven behaviors.

The memory component can be integrated into this loop to track past actions, outcomes, and user instructions, enabling the agent to maintain context over time. This persistent state helps the agent avoid repetition, follow long-term plans, or provide more personalized assistance. Agents with memory can summarize, categorize, and retrieve past events to influence current decisions, extending their usefulness in real-world applications such as personal assistants, customer service bots, or autonomous research tools.
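
As a rough illustration of such a memory component, the sketch below stores past steps and recalls the most relevant ones by simple keyword overlap; a production agent would more likely use embeddings and a vector store. The stored events are hypothetical examples.

```python
# Illustrative agent memory: store past steps, recall the most relevant ones by naive
# keyword overlap with the current query.
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    events: list[dict] = field(default_factory=list)

    def add(self, action: str, result: str) -> None:
        self.events.append({"action": action, "result": result})

    def recall(self, query: str, k: int = 3) -> list[dict]:
        words = set(query.lower().split())
        scored = [(len(words & set(f"{e['action']} {e['result']}".lower().split())), e)
                  for e in self.events]
        return [e for score, e in sorted(scored, key=lambda t: t[0], reverse=True)[:k] if score]

memory = AgentMemory()
memory.add("search flights", "3 options found for June 12")
memory.add("check hotel availability", "2 hotels free near the venue")
print(memory.recall("which flights were found?"))
```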

Ultimately, the AI Agent Loop allows a system to simulate intelligent, adaptive behavior by continuously refining its understanding and actions based on new data. It bridges static prompt-response behavior and dynamic problem-solving, making AI agents more capable in complex, evolving tasks. The design and tuning of this loop—how the agent thinks, what tools it uses, when it stops—are central to building effective and safe autonomous systems.

Disclaimer: Details above are ChatGPT generated.

Using RAG with existing LLMs

Retrieval-Augmented Generation (RAG) can be effectively used with existing language models (like OpenAI’s GPT-4, Anthropic’s Claude, Meta’s LLaMA, or open-source models via Hugging Face) without needing to retrain them. The core idea is to supplement the model’s knowledge with external documents retrieved at runtime—enhancing factual accuracy, domain relevance, and recency.


How RAG Works with an Existing Model

Here’s a typical RAG flow using an existing model (an end-to-end sketch follows these steps):

  1. User Query
    → e.g., “What are the latest techniques in cancer immunotherapy?”
  2. Retrieve Relevant Context (Knowledge)
    • Use a retriever to find top-k documents or chunks from a vector database or search index.
    • Retrieval is often done via vector similarity (e.g., with FAISS, Chroma, or Pinecone).
  3. Augment Prompt (Grounding)
    • Insert retrieved content into the prompt/context passed to the language model.
    • e.g., “Based on the following research papers: [chunk1] [chunk2]… answer the question…”
  4. Generate Answer (Using Existing Model)
    • Pass the augmented prompt to the model (e.g., OpenAI GPT-4 via API).
    • The model responds using both its internal knowledge and the provided documents.
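
Putting the four steps together, here is a minimal sketch assuming sentence-transformers for embeddings, FAISS for similarity search, and the OpenAI Python client for generation. The documents, chunking, and model names are illustrative choices, not requirements.

```python
# Minimal RAG sketch: embed documents, retrieve top-k by similarity, ground the prompt,
# and generate with an existing hosted model. Swap in any embedder / vector store / LLM.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer
from openai import OpenAI

docs = [
    "Checkpoint-inhibitor combinations are an active area of immunotherapy research.",
    "CAR-T therapies target specific antigens on tumor cells.",
    "Vector databases store embeddings for fast similarity search.",
]

# 2. Retrieve: embed documents and index them
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = embedder.encode(docs, normalize_embeddings=True)
index = faiss.IndexFlatIP(doc_vectors.shape[1])     # inner product ≈ cosine on normalized vectors
index.add(np.asarray(doc_vectors, dtype="float32"))

query = "What are the latest techniques in cancer immunotherapy?"
query_vec = embedder.encode([query], normalize_embeddings=True)
_, top_idx = index.search(np.asarray(query_vec, dtype="float32"), 2)
retrieved = [docs[i] for i in top_idx[0]]

# 3. Augment: ground the prompt with the retrieved chunks
prompt = ("Answer using only the context below.\n\nContext:\n"
          + "\n".join(retrieved) + f"\n\nQuestion: {query}")

# 4. Generate with an existing model (here the OpenAI chat API; requires OPENAI_API_KEY)
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```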


🛠️ Tools that Can be Used with Existing Models

| Component | Open Source Option | Hosted Option |
| --- | --- | --- |
| Retriever (embedding + vector store) | sentence-transformers, FAISS, Chroma | Pinecone, Weaviate |
| Text splitter / Document prep | LangChain, Haystack | N/A |
| Language Model | Hugging Face Transformers | OpenAI (GPT), Anthropic (Claude), Cohere |
| Framework (optional) | LangChain, LlamaIndex | LangChain cloud |

Benefits of Using RAG with Existing Models

  • No need to fine-tune: Leverages powerful pre-trained models with external context.
  • Better factual accuracy: Uses grounded data (your documents or latest info).
  • Custom domain knowledge: You can inject proprietary or domain-specific data.
  • Scalable: Just update your vector DB; no model retraining required.

Disclaimer: Details above are generated by ChatGPT

RAG pipeline building frameworks comparison

When building a Retrieval-Augmented Generation (RAG) pipeline, the “best” tool depends on your goals, the level of abstraction you prefer, and how much control you want over the individual components. Here’s a breakdown of LangChain, Hugging Face, and PyTorch to help you choose:


🧱 1. LangChain:

Best for: Rapid prototyping and production-ready apps with modular components

  • Pros:
    • High-level framework with built-in components for RAG: document loaders, text splitters, retrievers, and chains.
    • Supports OpenAI, Hugging Face models, vector stores like FAISS, Chroma, Pinecone.
    • Easy to build with memory, agents, and tools like web search or APIs.
    • Lots of integration examples and growing community.
  • Cons:
    • Less control over low-level model behavior.
    • Performance tuning and debugging can be tricky if you need custom logic.

Use LangChain if you want to build fast, integrate easily with LLMs and vector DBs, and focus on app logic rather than infrastructure.
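
As a rough idea of what this looks like in practice, the sketch below wires a text splitter, a FAISS vector store, and a chat model into a retrieval chain. Import paths and class names vary between LangChain versions, so treat it as an assumption-laden illustration rather than a copy-paste recipe; the source file name is hypothetical.

```python
# Rough LangChain-style RAG chain sketch (import paths depend on the installed version).
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA

raw_text = open("my_domain_notes.txt").read()                  # hypothetical source document
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_text(raw_text)

vector_store = FAISS.from_texts(chunks, OpenAIEmbeddings())    # embed + index the chunks
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-4o-mini"),
    retriever=vector_store.as_retriever(search_kwargs={"k": 4}),
)
print(qa_chain.invoke({"query": "Summarize the key points about X."}))
```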


🤗 2. Hugging Face (Transformers + Datasets)

Best for: Full control over models, fine-tuning, or self-hosted RAG pipelines

  • Pros:
    • Massive ecosystem of pre-trained models (retrievers, rerankers, generators).
    • You can mix and match dense retrievers (like DPR) and generators (like T5, LLaMA).
    • Great if you’re doing inference locally or deploying custom models.
  • Cons:
    • More work to build RAG logic from scratch (document splitting, indexing, memory).
    • No unified framework like LangChain for chaining components.

Use Hugging Face if you need flexibility, fine-tuning, or self-hosting without relying on OpenAI.
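
A small self-hosted sketch using only Hugging Face components: a sentence-transformers retriever and a local seq2seq generator. The document snippets and model choices are illustrative assumptions.

```python
# Self-hosted RAG sketch with Hugging Face components only (no hosted API).
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Shipping to the EU typically takes 5-7 business days.",
]

retriever = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = retriever.encode(docs, convert_to_tensor=True)

question = "How long do EU deliveries take?"
q_emb = retriever.encode(question, convert_to_tensor=True)
best = util.cos_sim(q_emb, doc_emb).argmax().item()            # pick the closest chunk

generator = pipeline("text2text-generation", model="google/flan-t5-base")
prompt = f"Context: {docs[best]}\n\nQuestion: {question}\nAnswer:"
print(generator(prompt, max_new_tokens=64)[0]["generated_text"])
```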


🔧 3. PyTorch

Best for: Researchers and ML engineers building everything from scratch

  • Pros:
    • Full low-level control over training and model internals.
    • Good for building custom retrievers or generators.
    • Essential if you’re training your own dense embeddings (e.g., with sentence transformers).
  • Cons:
    • Very low-level: no built-in RAG pipeline or easy integration with external tools.
    • Not ideal for quick prototyping.

Use PyTorch if you’re building or experimenting with novel RAG architectures or training your own models from scratch.
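
As a taste of the low-level work involved, the sketch below scores query-document pairs with a toy dual encoder and an in-batch-negative contrastive loss, the kind of objective used when training dense embeddings. The encoder and data are deliberately simplified stand-ins, not a real training setup.

```python
# Toy dual-encoder training step in PyTorch: similarity matrix + in-batch negatives.
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    def __init__(self, vocab_size: int = 1000, dim: int = 64):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, dim)   # bag-of-embeddings stand-in for a transformer

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return nn.functional.normalize(self.emb(token_ids), dim=-1)

encoder = ToyEncoder()
optimizer = torch.optim.AdamW(encoder.parameters(), lr=1e-3)

# Fake batch: 8 queries and their matching documents (token ids); positives lie on the diagonal.
queries = torch.randint(0, 1000, (8, 16))
documents = torch.randint(0, 1000, (8, 32))

optimizer.zero_grad()
scores = encoder(queries) @ encoder(documents).T                       # 8 x 8 similarity matrix
loss = nn.functional.cross_entropy(scores / 0.05, torch.arange(8))     # in-batch negatives
loss.backward()
optimizer.step()
```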


🔚 Conclusion:

| Use Case | Best Tool |
| --- | --- |
| Fast RAG prototyping | LangChain |
| Customizable, open RAG stack | Hugging Face |
| Full control / training models | PyTorch |

Disclaimer: Details above are ChatGPT generated.