Sam Torres from Gray Dot Agency brought a practical message to Tech SEO Connect: stop using LLMs for everything. When you have structured, tabular data—the kind we work with constantly in SEO—machine learning models are faster, cheaper, more accurate, and won’t make things up.
“I’ve seen a lot of times where people are using LLMs to analyze their data,” she said. “It doesn’t seem to matter how many times we talk about the fact that they make things up. They always tell you that your question is good, even when it might not have been. And your data isn’t safe.”
Her solution: use LLMs to write the machine learning code, then run that code in Google Colab. Best of both worlds—conversational AI for coding, mathematical precision for analysis.
Why Machine Learning Over LLMs for Data Analysis
Torres laid out the case for machine learning models over LLMs when working with structured data:
They do specific things really well. Models like BERT for keyword clustering have been refined for their exact purpose. They’re not generalists trying to do everything.
They’ve been tested for 10-15+ years. Academic rigor, peer review, and widespread use have validated these models. There’s transparency in training data, so you can understand potential biases.
Results are reproducible. Same input, same output. Every time. Unlike ChatGPT, which Torres described as “well, today I feel like I like this.”
They’re more efficient. No API limits or token limits to worry about. More cost-effective for large datasets.
They’re honest. No “yes, that’s great” validation. The model just tells you what the data shows.
The rule of thumb: LLMs shine with messy, unstructured, language-heavy data. Machine learning wins with structured, tabular, numbers-based data. SEO analytics is usually the latter.
Google Colab: The Gateway to Machine Learning
Torres introduced Google Colab as the tool that makes machine learning accessible without the setup headaches. It’s a cloud-based Jupyter Notebook environment that builds your dev environment automatically. Free to use. Easy to share for collaboration and methodology review.
The key benefit for client work: “When I put GSC and GA4 data into Google Colab, it’s not being used to train anything else.” Unlike uploading data to ChatGPT or third-party GPTs (which she explicitly warned against), your data stays yours.
A Colab notebook has code cells (where the analysis runs) and text cells (for documentation). Torres emphasized documentation: “As I get older, I can’t remember what I was thinking about six days ago. What do you mean I’m supposed to remember what I decided six months ago?”
The Workflow: LLMs Write the Code, ML Runs It
Torres’s workflow combines the ease of LLMs with the precision of machine learning:
First, define what you’re trying to accomplish. What answer are you looking for? What data do you have? Then ask Claude (or your preferred LLM) which models fit your use case. Have the LLM build the entire notebook. Download it, upload to Google Colab, and run it.
She was honest about troubleshooting: “It doesn’t always work right out of the box. I usually get to step 7 or 8 and then something breaks.” Her trick, learned from a Shopify engineer: end every prompt with “are you sure?” The LLM rewrites the code, usually more efficiently.
Problem Types and Data Considerations
Using the right terminology in your prompts gets better results. Torres outlined the main problem types:
Classification: Grouping things together. Tagging URLs by content topic when you don’t have good folder structure.
Regression: Predicting the future. Finding patterns to forecast what happens next.
Clustering: Finding neighborhoods. Similar to classification but discovering natural groupings rather than assigning to predefined categories.
Dimensionality reduction: Removing noise. Finding the gems in a large dataset by filtering out the junk.
Anomaly detection: Identifying what’s weird in your data. Often the same models used for forecasting.
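To make the clustering case concrete, here is a minimal sketch using scikit-learn. This is an illustrative example, not one of Torres's notebooks: the keyword list, the TF-IDF vectorization, and the choice of two clusters are all assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical keyword list mixing two topics.
keywords = [
    "javascript seo", "javascript rendering", "js frameworks seo",
    "site migration checklist", "domain migration seo", "migration redirects",
]

# Vectorize, then let KMeans discover the natural groupings. No
# predefined categories are supplied, which is what separates
# clustering from classification.
X = TfidfVectorizer().fit_transform(keywords)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

for kw, label in zip(keywords, labels):
    print(label, kw)
```

In a prompt to an LLM, naming the problem type ("cluster these keywords, no predefined categories") is what steers it toward code like this rather than a classification setup.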
Data considerations that affect model choice: dataset size (30 days vs. 16 months), format (including images), seasonality and holidays (different by geography), number of dimensions you’re analyzing, and data quality. “Some models can deal with null or zero values, and some models just fall apart. Clean your data first.”
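The "clean your data first" point can be as simple as deciding what nulls and zeros mean before a model sees them. A tiny illustrative sketch (column names are hypothetical):

```python
# Drop rows with nulls before modeling; zeros need a deliberate
# decision (a real zero-click day, or missing data?) rather than
# being passed through blindly.
rows = [
    {"date": "2024-01-01", "clicks": 120,  "impressions": 3500},
    {"date": "2024-01-02", "clicks": None, "impressions": 3400},  # null: drop
    {"date": "2024-01-03", "clicks": 0,    "impressions": 0},     # zero: decide
]

clean = [r for r in rows if None not in r.values()]
print(len(clean))  # rows with nulls removed
```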
Three Models for Anomaly Detection and Forecasting
Torres revealed a “plot twist”: anomaly detection and forecasting use the same models. The difference is whether you’re asking the model to point out what’s weird or predict what’s coming next.
Isolation Forest
A great starting point. Fast, flexible, and works across multiple dimensions (though it struggles past eight). Very good at flagging anomalies—sometimes too many, but Torres prefers having more to wade through than nothing. Handles messy data well. Less effective at detecting gradual changes.
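A minimal sketch of what an Isolation Forest run looks like in scikit-learn, assuming that library is available in the Colab runtime. The synthetic clicks/impressions data and the injected spike are my own illustration, not Torres's notebook:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical daily GSC-style metrics for 60 days, with one injected
# traffic spike for the model to find.
rng = np.random.default_rng(42)
clicks = rng.normal(200, 15, 60)
impressions = rng.normal(5000, 300, 60)
clicks[45] = 600        # injected anomaly
impressions[45] = 14000

X = np.column_stack([clicks, impressions])

# contamination is the share of points you expect to be anomalous;
# tune it down if the model flags too many (its known tendency).
model = IsolationForest(contamination=0.05, random_state=0)
labels = model.fit_predict(X)  # -1 = anomaly, 1 = normal

anomalous_days = np.where(labels == -1)[0]
print(anomalous_days)
```

The `contamination` parameter is the practical knob for the "sometimes too many" behavior Torres described.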
LOF (Local Outlier Factor)
Finds neighbors and identifies who doesn’t belong. Excellent for keyword-related analysis where terms are related but distinct (like “JavaScript” vs. “migrations” vs. “JavaScript migrations”). Great for A/B test anomaly detection. Works well with word-based data. Less effective for overarching patterns or extremely diverse datasets.
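The "neighborhoods" intuition maps directly onto scikit-learn's `LocalOutlierFactor`. This is an illustrative sketch with made-up data: two tight groups of keyword metrics and one point that belongs to neither.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Two tight "neighborhoods" of hypothetical keyword metrics
# (e.g. CTR vs. average position).
cluster_a = rng.normal([0.05, 3.0], 0.01, size=(20, 2))
cluster_b = rng.normal([0.02, 9.0], 0.01, size=(20, 2))
outlier = np.array([[0.20, 6.0]])  # belongs to neither neighborhood
X = np.vstack([cluster_a, cluster_b, outlier])

# LOF scores each point against the density of its nearest neighbors;
# n_neighbors controls how local "local" is.
lof = LocalOutlierFactor(n_neighbors=10)
labels = lof.fit_predict(X)  # -1 = outlier, 1 = inlier

print(np.where(labels == -1)[0])  # index 40 (the lone point) should appear
```

Because LOF judges each point relative to its own neighborhood, it can flag a point that looks unremarkable globally, which is why it suits related-but-distinct keyword groups.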
Prophet (Forecast Residual Approach)
Torres’s favorite, though “highest maintenance.” Created by the Facebook team, Prophet handles holidays across many countries and languages. Excellent for time series data (which is most SEO data). Great at identifying gradual shifts—distinguishing algorithm updates from content drift or entity drift. Good for SERP volatility analysis.
The caveat: Prophet can be fragile. Torres shared that a model update broke her working notebook, requiring two hours to troubleshoot. It also needs sufficient data—30 days probably isn’t enough unless it’s log files.
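The forecast-residual idea itself is simple: fit a forecast, subtract it from the actuals, and flag days where the residual is extreme. In practice Prophet supplies the forecast (its `yhat` column after `pip install prophet`); the sketch below substitutes a naive same-day-last-week baseline so the residual logic is runnable without Prophet, and all the data is made up.

```python
from statistics import mean, stdev

# Hypothetical daily sessions with weekly seasonality and one injected dip.
actuals = [100 + (20 if d % 7 in (5, 6) else 0) for d in range(90)]
actuals[60] = 40  # injected anomaly (e.g. a bad deploy)

# Stand-in forecast: same day last week. With Prophet, this list would
# come from the model's yhat predictions instead.
forecast = actuals[:7] + [actuals[d - 7] for d in range(7, 90)]

residuals = [a - f for a, f in zip(actuals, forecast)]
mu, sigma = mean(residuals), stdev(residuals)

# Flag residuals more than 3 standard deviations from the mean. The
# naive baseline also flags day 67 (the anomaly echoing a week later),
# an artifact a real Prophet forecast would avoid.
anomalies = [d for d, r in enumerate(residuals)
             if sigma and abs(r - mu) > 3 * sigma]
print(anomalies)
```

Swapping the baseline for Prophet's forecast is what buys the holiday handling and gradual-shift detection Torres values, at the cost of the maintenance burden she warned about.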
Practical Application
The notebooks Torres demonstrated produce visualizations showing both negative and positive anomalies. Her advice: don’t just focus on what went wrong. “If you focus on what’s right, you can start doubling down and move forward that way.”
She’s sharing all notebooks and models for free—classification, anomaly detection, forecasting, and A/B test variant grouping. No sales pitch, just tools.
Learning Resources
Torres recommended three communities for continued learning:
ML for SEO — Lazarina Stoy’s community. “Everything I know about machine learning, I learned from Lazarina.” Free resources and guidance.
SEO Community — Started by Noah Lerner and others. A welcoming space with a rule: you can’t say “is this a dumb question?” You have to say “help me get smart.”
Women in Tech SEO — Founded by Areej AbuAli. Torres is deeply involved (“It’s literally tattooed on my body”). Great for learning and asking questions.
She also recommended following Lazarina Stoy and Britney Muller (who runs Orange Labs, a paid community).
My Takeaways
Torres’s talk was the most immediately practical of the conference. She handed us tools and said “go use them.” No theory, no speculation—just working notebooks you can download today.
What I’m implementing:
1. Stop uploading client data to LLMs. The data isn’t safe, and the analysis isn’t reliable. Google Colab keeps data private and produces reproducible results.
2. Use LLMs to write code, not analyze data. Have Claude build the notebook, then run it in Colab. Best of both worlds.
3. Learn the problem type vocabulary. Classification, regression, clustering, dimensionality reduction, anomaly detection—using these terms in prompts gets better model recommendations.
4. Start with Isolation Forest. Fast, flexible, handles messy data. Good training wheels for machine learning.
5. Use Prophet for time series. It’s higher maintenance but handles the gradual shifts that matter for traffic investigations. Just budget time for troubleshooting.
6. Document everything. You won’t remember your methodology in six months. Text cells in Colab exist for a reason.
The underlying message was empowering: machine learning isn’t as scary as it seems. With LLMs handling the code and Colab handling the environment, the barrier to entry has collapsed. You don’t need to be a data scientist—you just need to know what question to ask.