New content weekly

Exploring Technology
One Video at a Time

Deep dives into self-hosting, AI tools, automation, and the future of personal infrastructure.

6k+
Subscribers
52+
Videos
Growing
Daily

About

Building the Future,
One Tutorial at a Time

I create in-depth content about self-hosting, local AI infrastructure, automation tools, and the technologies that empower individuals to take control of their digital lives.

From homelab setups to AI agent configurations, every video is designed to help you build something real and useful.

Self-Hosting · AI & ML · Automation · Open Source
SpeedyFoxAI

Latest Content

Featured Videos

Explore my latest tutorials, reviews, and deep dives into the tools and technologies shaping our digital future.

Local AI

OpenClaw Memory Upgrade: Jarvis-like Memory System | Full Tutorial

Build a complete Jarvis-like memory system for OpenClaw with Redis, Markdown logs, and Qdrant vector database. Includes blueprints and real-world examples.

Watch on YouTube
Live Coding

OpenClaw + Ollama: AI Agent Builds Website Via SSH (Live Coding)

Watch Kimi build SpeedyFoxAI.com via SSH using only local AI agents in this full live coding session.

Watch on YouTube
Proxmox

Proxmox LXC CUDA Install (The SAFE Way) — Won't Break Your Drivers! 2026

Safe method to install NVIDIA CUDA drivers in Proxmox LXC containers without breaking your system.

Watch on YouTube
Rob - SpeedyFoxAI Creator

About the Creator

Hi, I'm Rob. I create in-depth content about self-hosting, local AI infrastructure, and automation tools that help you take control of your digital life.

Last updated: February 2026

FAQ

Frequently Asked Questions

Everything you need to know about running AI locally.

What is local AI and why should I use it?

Local AI means running large language models (LLMs) on your own hardware instead of using cloud services like ChatGPT. Benefits include: complete privacy (your data never leaves your machine), no usage limits or API costs, no subscription fees, and full control over the AI. Tools like Ollama make this accessible even on consumer hardware.

What's the difference between Hugging Face and Ollama?

Hugging Face is a model repository and community platform with thousands of open-source AI models. Ollama is a purpose-built tool that makes running those models locally as simple as one command. Think of Hugging Face as the "app store" for models, and Ollama as the "player" that runs them. Many users download GGUF models from Hugging Face and run them through Ollama for the best of both worlds.

What do 7B, 13B, 70B mean in model names?

These numbers represent billions of parameters (weights) in the neural network. More parameters mean a more capable model with a bigger resource appetite: 7B models run well on 8GB RAM and handle basic tasks, 13B models offer balanced performance and need 16GB RAM, 30B+ models provide professional quality and require 24GB+ VRAM, and 70B+ models approach GPT-4 level but need server-grade hardware.
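As a rough back-of-envelope, memory needs scale with parameter count times bits per weight. Here is a minimal sketch; the 20% overhead factor for the KV cache and runtime buffers is an assumption for illustration, not a precise figure:

```python
def model_memory_gb(params_billion: float, bits_per_weight: float,
                    overhead: float = 1.2) -> float:
    """Rough estimate: parameters * (bits / 8) bytes, plus ~20% overhead
    for KV cache and runtime buffers (illustrative, not exact)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return round(bytes_total * overhead / 1e9, 1)

# A 7B model at 4-bit quantization fits comfortably in 8 GB of RAM:
print(model_memory_gb(7, 4))   # ~4.2
# A 13B model at 4-bit lands in the 16 GB tier:
print(model_memory_gb(13, 4))  # ~7.8
```

This is why the RAM tiers above roughly double with each model-size jump: the weights dominate, and everything else scales with them.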

What hardware do I need to run local AI?

Minimum: 8GB RAM for 7B models (Q4 quantized). Recommended: 16GB RAM for 13B models or 7B at higher quality. Ideal: 24GB+ VRAM for 30B+ models. A dedicated GPU (NVIDIA RTX series) dramatically speeds up inference, but modern CPUs and Apple Silicon Macs can run smaller models efficiently. Start with what you have—you can always upgrade.

What is GGUF and why does it matter?

GGUF (GPT-Generated Unified Format) is a model compression format that shrinks LLMs by 4x-8x with minimal quality loss. It uses quantization—storing weights with fewer bits. A 70B model that needs 140GB can run in 40GB with Q4 quantization. Tools like llama.cpp, Ollama, and LM Studio all support GGUF, making large models accessible on consumer hardware.
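A quick sanity check on the 70B figures above, assuming FP16 stores 2 bytes per weight and Q4 quantization averages roughly 4.5 bits per weight (an approximation for formats like Q4_K_M, not an exact spec):

```python
# Back-of-envelope check of the 70B numbers (illustrative).
params = 70e9
fp16_gb = params * 2 / 1e9        # 2 bytes per weight at FP16
q4_bits = 4.5                      # assumed average for Q4_K_M-style formats
q4_gb = params * q4_bits / 8 / 1e9

print(f"FP16: {fp16_gb:.0f} GB")   # 140 GB
print(f"Q4:   {q4_gb:.0f} GB")     # ~39 GB, in line with the ~40 GB figure
```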

What's the difference between Q4, Q6, and Q8 quantization?

These are compression levels for GGUF models: Q4_K_M (4-bit) is best for limited RAM at about 70% of original quality, good for most use cases; Q6_K (6-bit) hits the sweet spot at ~85% quality; Q8_0 (8-bit) has near-unnoticeable loss at ~95% quality if you have the VRAM to spare. Rule of thumb: start with Q4_K_M and upgrade if you notice quality issues.

Can I run local AI without a GPU?

Yes! Tools like Ollama and llama.cpp support CPU-only inference. It's slower (10-30 tokens/sec vs 50-100+ on GPU) but works fine for 7B models on modern CPUs, batch processing, and Apple Silicon Macs (which use unified memory efficiently). A GPU helps but isn't required to get started with local AI—many beginners start on CPU and upgrade later.
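In wall-clock terms, those throughput ranges translate roughly as follows; the 15 and 75 tok/s figures below are illustrative midpoints, not benchmarks:

```python
# Rough feel for CPU vs GPU generation speed on a ~500-token reply.
for label, tok_per_sec in [("CPU (~15 tok/s)", 15), ("GPU (~75 tok/s)", 75)]:
    seconds = 500 / tok_per_sec
    print(f"{label}: ~{seconds:.0f}s for a 500-token reply")
# CPU takes about half a minute vs a few seconds on GPU --
# slower, but perfectly usable for many tasks.
```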

What are the privacy benefits of local AI?

With local AI, your data never leaves your machine. No cloud logs, no training data retention, no API call records. This matters for sensitive business documents, personal health information, proprietary code, and any confidential work. You become the data controller, not a third-party AI company. For privacy-conscious users and organizations, local AI is the only way to guarantee data sovereignty.

Is local AI really free? What's the catch?

The models themselves are free (open source). Your costs are: hardware (a one-time cost, or use an existing machine), electricity (up to about $20/month depending on usage), and time (the learning curve). Compared to ChatGPT Plus ($20/month), local AI pays for itself in 3-6 months if you already own hardware. Plus, no usage limits means you can process unlimited documents, code, or creative projects.

Can I use local AI for coding like GitHub Copilot?

Absolutely! Several models excel at coding: DeepSeek-Coder (free, coding-focused), CodeLlama (Meta's specialized code model), Qwen 2.5 Coder (Alibaba's coding model), and Phi-4 (Microsoft's capable small model). Tools like Continue.dev (VS Code extension) or Ollama + aider integrate local AI directly into your IDE for a Copilot-like experience—completely free and private.

Is self-hosting AI tools difficult for beginners?

Modern tools have made self-hosting much more accessible. Platforms like OpenClaw, Ollama, and various Docker containers provide one-command installations. My tutorials break down each step for beginners while providing advanced tips for experienced users. Start with simpler tools and gradually build your infrastructure—you don't need to be a Linux expert to get started.

How does OpenClaw compare to other AI platforms?

OpenClaw is a fully open-source, self-hosted AI agent platform. Unlike cloud-based solutions, it runs entirely on your infrastructure with no external API dependencies. It supports multiple local LLM models through Ollama, offers team collaboration features, and provides a web interface for managing AI agents—making it ideal for privacy-conscious users, homelab enthusiasts, and organizations wanting complete control over their AI workflows.

Free Resources & Downloads

Access configs, scripts, templates, and tools mentioned in my videos. Everything is open source and ready to use.

Browse Downloads

Tutorials

Step-by-step guides for OpenClaw, local AI, and self-hosted infrastructure.

Google Services Setup

Available

Complete 10-step guide to integrating Gmail, Calendar, Drive, and other Google services with OpenClaw for seamless AI assistant functionality.

What You Will Learn

  • Create Google Cloud Project
  • Enable APIs (Gmail, Calendar, Drive)
  • OAuth configuration
  • OpenClaw integration
Est. time: 20-30 min
Read Tutorial

Jarvis-like Memory for Kimi

Available Now
jarvis-like-memory

Build a complete multi-layer memory system with Redis (short-term), Markdown (logs), and Qdrant (semantic search). Includes architecture diagrams and testing.

What You Will Learn

  • Three-layer memory architecture
  • Redis buffer + Qdrant vector DB
  • Heartbeat automation + Cron
  • Complete testing procedures
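The three layers can be sketched with in-memory stand-ins: a deque for the Redis short-term buffer, a list for the Markdown log, and naive keyword overlap in place of Qdrant's vector search. Everything here (class name, buffer size, scoring) is illustrative, not the tutorial's actual code:

```python
from collections import deque

class ThreeLayerMemory:
    """Toy sketch of the three-layer design. In the real setup, short_term
    is a Redis list, log is a Markdown file, and recall uses embeddings in
    a Qdrant collection rather than keyword overlap."""

    def __init__(self, short_term_size: int = 10):
        self.short_term = deque(maxlen=short_term_size)  # Layer 1: rolling buffer
        self.log = []                                    # Layer 2: append-only log
        self.semantic = []                               # Layer 3: searchable store

    def remember(self, text: str) -> None:
        self.short_term.append(text)   # recent context, evicted automatically
        self.log.append(text)          # permanent record
        self.semantic.append(text)     # indexed for later recall

    def recall(self, query: str, top_k: int = 3) -> list[str]:
        # Stand-in for vector similarity: rank by words shared with the query.
        q = set(query.lower().split())
        scored = sorted(self.semantic,
                        key=lambda t: len(q & set(t.lower().split())),
                        reverse=True)
        return scored[:top_k]

mem = ThreeLayerMemory()
mem.remember("User prefers dark mode in the dashboard")
mem.remember("Backup job runs nightly at 02:00")
print(mem.recall("what are the user's dashboard preferences?")[0])
```

The point of the split is that each layer trades durability for speed differently: the buffer is fast and forgetful, the log is complete but linear, and the semantic store makes old memories findable by meaning.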

Memory Architecture

Visual Guide
jarvis-like-memory

Visual reference diagrams showing the complete three-layer memory system architecture. Data flows, infrastructure topology, and command reference.

What You Will Learn

  • Layer-by-layer architecture
  • Data flow diagrams
  • Infrastructure topology
  • Command reference tables
Reference
View Diagrams

Memory Comparison

Side by Side
jarvis-like-memory

Side-by-side comparison of OpenClaw default (no persistence) vs. custom three-layer memory system. Architecture, features, and trade-offs.

Compare

  • Default vs. custom architecture
  • Feature comparison table
  • Pros and cons of each
  • When to use which

Connect

Let's Build Together

Have a question or want to collaborate? Reach out through any of these channels.