Why I Built My Own Data Pipeline Tool After 10+ Years as a Data Scientist, Then CTO
I started my career as a data scientist at Indosat Ooredoo, then moved to Eureka AI in Singapore where I spent 7+ years — from writing ML models to leading engineering as Head of Engineering & ML. Eventually I became CTO, responsible for the whole data stack from ingestion to insight.
And at every single company, the same thing happened.
The Pattern
It always starts simple. You need to move data from A to B. You write some SQL. You build a pipeline. Maybe you adopt dbt because everyone says you should. Then you need ML features, so you add Feast. Then you need metrics that everyone agrees on, so you build a semantic layer. Then someone wants “self-serve analytics,” so you add a BI tool. Then someone wants to ask questions in natural language, so you wire up an LLM to your database.
Suddenly your data team is maintaining 5-7 tools that barely talk to each other.
The Breaking Point
The last straw was watching a data team spend three days trying to answer a simple business question: “What’s our revenue by region this quarter?”
Three days. Not because the data wasn’t there. Not because the team wasn’t smart. But because:
- The pipeline that calculated revenue was in dbt
- The region mapping was in a different system
- The metric definition existed in two places with slightly different logic
- Nobody could agree which numbers were “right”
- And the AI chatbot they’d bolted on top had no context about any of this
Three days for a question that should take three minutes.
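To make the "two places with slightly different logic" problem concrete, here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the order rows, the two function names, and the refund rule are not taken from any real team's code. It only shows how two reasonable-looking definitions of "revenue by region" can disagree on the same data.

```python
# Hypothetical order rows; amounts and statuses are made up for illustration.
ORDERS = [
    {"region": "APAC", "amount": 100.0, "status": "paid"},
    {"region": "APAC", "amount": 40.0, "status": "refunded"},
    {"region": "EMEA", "amount": 70.0, "status": "paid"},
]


def revenue_by_region_pipeline(orders):
    """Definition A (assumed to live in the pipeline): sums every order."""
    totals = {}
    for order in orders:
        region = order["region"]
        totals[region] = totals.get(region, 0.0) + order["amount"]
    return totals


def revenue_by_region_dashboard(orders):
    """Definition B (assumed to live in the BI tool): skips refunded orders."""
    totals = {}
    for order in orders:
        if order["status"] == "refunded":
            continue
        region = order["region"]
        totals[region] = totals.get(region, 0.0) + order["amount"]
    return totals


def diff_metrics(a, b):
    """Return only the regions where the two definitions disagree."""
    regions = sorted(set(a) | set(b))
    return {r: (a.get(r), b.get(r)) for r in regions if a.get(r) != b.get(r)}


disagreements = diff_metrics(
    revenue_by_region_pipeline(ORDERS),
    revenue_by_region_dashboard(ORDERS),
)
print(disagreements)
```

Neither function is wrong on its own terms. The problem is that both are called "revenue," and nothing forces them to share one definition.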
What I Built
Seeknal is a CLI tool that handles the full loop: define pipelines, serve features, manage metrics, and analyze data with an AI agent — all from one unified graph.
pip install seeknal
seeknal init my-project
seeknal draft # scaffold your pipeline
seeknal dry-run # compile, preview, check data quality
seeknal apply # execute with incremental awareness
seeknal ask "show me revenue by region last quarter"
The workflow is inspired by Terraform and kubectl. Data engineers love dry-runs — you see exactly what will happen before it happens. No more “oops, I just overwrote the production table.”
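For readers who haven't used a plan-then-apply workflow, here is a minimal Python sketch of the general idea. It is not how Seeknal works internally; the Action class, the plan and apply functions, and the table names are all hypothetical. The point is only that the planning step computes what would change without touching anything, so you can review it first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str   # "create", "update", or "skip"
    table: str


def plan(desired, current):
    """Compare desired table definitions with current ones; change nothing."""
    actions = []
    for table, definition in sorted(desired.items()):
        if table not in current:
            actions.append(Action("create", table))
        elif current[table] != definition:
            actions.append(Action("update", table))
        else:
            actions.append(Action("skip", table))
    return actions


def apply(actions, desired, current):
    """Execute a reviewed plan and return the new state; the input is not mutated."""
    new_state = dict(current)
    for action in actions:
        if action.kind in ("create", "update"):
            new_state[action.table] = desired[action.table]
    return new_state


# Hypothetical state: table name -> definition version.
desired = {"revenue": "v2", "regions": "v1"}
current = {"revenue": "v1"}

steps = plan(desired, current)   # the "dry run": inspect before executing
for step in steps:
    print(step.kind, step.table)

updated = apply(steps, desired, current)
```

Because plan only reads state and apply only executes what the plan listed, the review step and the execution step cannot silently drift apart.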
The AI Agent That Knows Your Data
seeknal ask is an AI agent that has full context of your pipelines, schemas, and lineage. It can:
- Answer questions about your data in natural language
- Profile datasets and find anomalies
- Build new pipelines from a description
- Generate reports
- Ingest files (even Excel) directly into your pipeline
It supports Google Gemini, OpenAI, and Anthropic, or it can run fully locally with Ollama. Because sometimes your data shouldn’t leave your machine.
Why Open Source
I’ve been the person who couldn’t afford expensive data tools. I’ve been the team of one trying to build a data stack with zero budget. Open source leveled the playing field for me, so I’m doing the same.
Seeknal is Apache 2.0. No lock-in. Install it, use it, modify it, deploy it however you want.
Where It’s At
23 releases. Production-grade pipeline execution. Incremental processing. Column-level lineage. Data quality checks. AI agent with 16 tools and 11 built-in skills. PostgreSQL and Apache Iceberg support.
If any of this resonates — if you’ve ever felt the pain of managing too many data tools that don’t work together — I’d love to hear your story. Chances are, I’ve lived it too.