We Took Google's New AI Analyst for a Test Drive

I recently received a marketing email from Google talking up Analytics Advisor, an AI assistant baked into Google Analytics and powered by Gemini. The email opened with "meet your new, always-on data analyst."

That framing is showing up everywhere right now. Brands are asking harder questions about what their analysts actually do. A handful are already acting on those questions, reducing headcount and replacing the work with tools like this one. The pitch is familiar: faster, cheaper, always available, and just as good.

It's a fair question to ask, and it deserves a better answer than "of course you still need humans" or "AI will replace everyone in six months." So instead of speculating, we read past the hype, signed into a real enterprise GA4 property, and spent a few hours using the thing the way an analyst would.

This is what we found. It isn't a takedown. Google's team has built something genuinely impressive in places. It's also not an endorsement. The tool does a few things well and several things badly, and the difference matters enormously depending on what you're asking it to do.

How we tested it

We used Analytics Advisor against a real enterprise GA4 property belonging to a major North American cruise brand. Substantial traffic volume, a full funnel, real revenue, real seasonality.

We organized our questions around four archetypes, because that's a useful way to think about what an analyst actually does day to day:

  1. Reporting. What happened? ("What were my top traffic sources last month?")

  2. Diagnostic. Why did it happen? ("Why did purchases peak on January 28?")

  3. Strategic. What should we do? ("What should we prioritize to grow this business?")

  4. Anomaly. Is this signal or noise? ("Are there meaningful changes I should worry about?")

These aren't arbitrary categories. They're a rough hierarchy of complexity and of what we actually pay analysts for. The first one is dashboard work. The fourth one is where analysts earn their keep. We wanted to see where on that ladder the tool stopped being useful.

No cherry-picked prompts. Same property throughout.

What worked

Credit first, because there's some real credit to give.

The reporting layer is fast and accurate. We asked "What were my top traffic sources last month?" and got a clean prose summary, a top-ten channel table, and accurate session counts. Thirty seconds, no navigation, no report-building. For a busy marketer who just wants a number, that's useful.

It auto-identified the industry. When we asked the open-ended strategic question, the tool's visible reasoning pass mentioned it was "researching industry benchmarks and growth strategies for luxury travel and e-commerce." It got the category right from context alone.

It caught a real data integrity issue. This was the most analyst-like moment we observed. Mid-analysis, the tool noticed that total ecommerce purchases (352,361) equaled total revenue ($352,361), meaning an implied average order value of exactly $1.00. It flagged this explicitly as "likely a data configuration issue or a very high volume of micro-transactions" and recommended an implementation audit. That is a legitimate, non-trivial catch. A good analyst would have caught it. An average analyst might have missed it. The tool did not.
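
For readers who want to bolt the same sanity test onto their own reporting, here's a minimal sketch in Python. The function name and the way the totals are passed in are assumptions, not anything Analytics Advisor exposes; the figures are the ones the tool surfaced.

```python
def flag_aov_anomalies(total_purchases: int, total_revenue: float) -> list[str]:
    """Flag average-order-value patterns that usually indicate a broken ecommerce setup."""
    if total_purchases == 0:
        return ["No purchases recorded; AOV undefined."]
    aov = total_revenue / total_purchases
    flags = []
    # Revenue numerically equal to the purchase count implies an AOV of exactly $1.00,
    # which almost always means revenue is being sent as a count (or vice versa).
    if abs(aov - 1.00) < 0.01:
        flags.append(f"AOV is ${aov:.2f}; likely a tagging or configuration issue.")
    return flags

# The totals Analytics Advisor surfaced for this property:
print(flag_aov_anomalies(total_purchases=352_361, total_revenue=352_361.00))
# -> ['AOV is $1.00; likely a tagging or configuration issue.']
```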

So, when the job is "tell me what the data says, and flag when the data is obviously wrong," Analytics Advisor does a credible job.

The harder test is what happens when you move up the ladder.

Where analyst work actually begins

We asked, "Why did purchases peak on January 28, 2026, and what should we do to sustain that kind of performance?"

January 28 was a real spike, surfaced by the tool itself. Good grounds for a diagnostic test.

The tool came back with three drivers:

  1. One product accounted for 50% of purchases that day, "suggesting a highly effective promotion or specific demand."

  2. Direct traffic surged by +460 purchases, "which typically points to a direct marketing effort like an email campaign."

  3. Desktop users drove +341 purchases, "which often indicates a demographic that prefers a more detailed checkout experience or responded to a professional/work-hour outreach."

And four recommendations:

  1. Study the January 28 email and replicate its messaging.

  2. Retarget cart abandoners from that day.

  3. Optimize product landing pages for desktop, since desktop converted best.

  4. "Double down on Direct-to-Consumer channels (Email/Loyalty)" because Direct traffic outperformed Paid and Organic.

This is where a careful reader starts noticing things.

"Direct traffic" is not the same thing as "Direct-to-Consumer channels." In GA4, Direct is a technical label for traffic with no referrer like typed URLs, bookmarks, anything with our a referrer really. "Direct-to-Consumer" is a marketing strategy term covering Email, SMS, Loyalty, owned CRM. Those are completely different concepts. A recommendation to "double down on DTC because Direct performed best" is built on a vocabulary collision. A human analyst would not make that mistake. It's a structural tell that the tool is pattern-matching language rather than reasoning about the system.

"Optimize for desktop" is often the wrong instinct. Desktop already converted best. In analytics, that usually means desktop is the ceiling, not the opportunity. A human analyst typically flips this: understand what's working on desktop, then look for what's breaking the handoff to mobile, especially for high-consideration purchases where decisions happen collaboratively across devices. The tool recommended pouring resources into the channel with the least headroom.

The psychology is fabricated. "Desktop users are a demographic that prefers a detailed checkout experience or responded to professional/work-hour outreach." That's the shape of an insight without the substance of one. For a cruise brand, desktop actually peaks because cruises are $2,000 to $10,000+ purchases, high-consideration, and frequently planned collaboratively with a spouse on a shared home computer. Completely different story. Completely different implication.

The diagnosis is hedged in a way that hides its own weakness. "Suggesting a highly effective promotion or specific demand." Those are opposite diagnoses that require opposite strategies. A promotion means "replicate the creative." High demand means "build inventory around a hot product." A human analyst would go find out which it was before recommending anything. The tool hedged with "or" and still produced four confident recommendations.

The answer was articulate, specific, and would actively mislead anyone who didn't know the business well enough to catch the errors.

The strategic test

We asked the biggest question next. "What should we prioritize to grow this business over the next year?"

This is where you'd expect the most obvious failure. It's also where the tool did its best work.

Advisor entered a multi-step mode. It announced that it would conduct a comprehensive 12-month audit, analyze year-over-year trends, map the purchase funnel, identify hero products, and research industry benchmarks. It surfaced three strategic pillars.

Pillar 1: Reduce single-product dependency. The business was heavily concentrated in one SKU (roughly 50% of revenue). The tool recommended doubling down on that product's lifecycle and developing a second hero to diversify. Reasonable.

Pillar 2: Close the gap between traffic and conversions. This is where it found the $1 AOV anomaly and recommended an implementation audit. Real analyst-grade work.

Pillar 3: Data-driven funnel optimization. "Reduce the Abandonment Rate between Add to Cart and Purchase. If drop-off is at the payment stage, add more payment options or simplify the form."

And that's where the wheels came off for this specific business.

A cruise brand doesn't have a traditional "Add to Cart" flow. Cruise purchases involve itinerary selection, cabin category selection, passenger configuration, a deposit, and a final payment made weeks or months later. It's a multi-session, staged-payment model. The advice to "add Apple Pay and simplify the form" is a generic ecommerce playbook that doesn't apply to a cruise line the same way it would apply to a shoe company.

Look across all three pillars. Every recommendation could be repurposed for a SaaS company, an apparel retailer, or a consumer electronics brand with zero modification. Nothing in the roadmap depended on the business being a cruise line, and none of the actual cruise growth levers appeared in it. No mention of past-guest reactivation, despite cruise having a very high repeat rate. No mention of travel advisor enablement, despite advisors driving more than 60% of the industry's bookings. No mention of the booking curve, shoulder-season demand stimulation, or suite upsell, which is where real average-order-value lift actually lives for this category.

The tool knew it was a travel brand. It still delivered a generic ecommerce playbook.

There's one more subtlety worth flagging. In Pillar 2, the tool flagged the revenue data as potentially broken. In Pillar 3 and in its final "next year's priority" summary, it recommended a value-led strategy built on that same revenue data. A human analyst who finds a data integrity issue immediately questions every prior conclusion that rested on the data. The tool didn't. It kept generating.

The anomaly test

We saved the most revealing test for last. "Have there been any meaningful changes in user behavior in the last 30 days that I should be concerned about, or is everything within normal variance?"

The tool's opening:

"The last 30 days are highly stable and largely within normal variance, with some positive trends in user engagement and regional growth. There are no major causes for concern."

Good news. Except the same response included a landing page table with the following declines, all on revenue-critical pages:

  • The top-of-funnel destination discovery page: down 18.0%

  • The main product discovery tool: down 9.4%

  • The special offers page: down 12.4%

  • The logged-in customer dashboard: down 17.5%

Four of the top ten pages on the site, all in revenue-driving areas, all down double digits. For a cruise brand, those pages are where demand signal lives. The customer dashboard matters even more, because cruise has unusually high repeat rates and past-guest engagement is one of the most valuable metrics in the category.

The tool filed these under a section titled "Areas for Minor Attention." Its recommendation: "You may want to refresh the content or promotions on these pages to maintain their pull."

Refresh the content. For an 18% drop on destination pages.

A human analyst would open with these numbers, not bury them. They'd name the pages, rank them by revenue impact, and ask diagnostic questions: did we change anything, did a campaign end, did search rankings move, is this a tracking issue, is this seasonality. They would not frame a set of double-digit declines on the most important pages of the site as "minor attention."
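
As a sketch of what that first triage pass looks like in practice, the snippet below surfaces the declines and ranks them by how much revenue they sit in front of, instead of burying them. The field names and the revenue-share weighting are assumptions about how you'd pull and join the data, not anything the tool does.

```python
from dataclasses import dataclass

@dataclass
class PageTrend:
    path: str
    sessions_prev: int      # sessions in the prior 30-day window
    sessions_curr: int      # sessions in the last 30 days
    revenue_share: float    # this page's share of attributed revenue, 0..1

def triage(pages: list[PageTrend], threshold: float = 0.10) -> list[tuple[str, float, float]]:
    """Surface double-digit declines, ranked by the revenue they put at risk."""
    flagged = []
    for p in pages:
        change = (p.sessions_curr - p.sessions_prev) / p.sessions_prev
        if change <= -threshold:
            # Weight the decline by revenue share so an 18% drop on a destination
            # discovery page outranks an 18% drop on a low-value blog post.
            flagged.append((p.path, change, abs(change) * p.revenue_share))
    return sorted(flagged, key=lambda row: row[2], reverse=True)
```

Run that against the four pages above with whatever revenue attribution you trust, and "Areas for Minor Attention" stops being a defensible heading.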

The same response showed a few other patterns worth noting. Total sessions declined 2.6% overall, but the tool led with regional gains in Canada (+22,000) and Australia (+6,000). It never mentioned that the combined growth in those markets was smaller than the total decline, meaning the dominant market (the United States) fell more than the gains elsewhere. And it confidently asserted that "no configuration changes were detected, confirming that the data shifts are due to genuine user behavior rather than technical tracking issues," while ignoring the $1 AOV data-integrity issue it had flagged earlier in the same session.

The pattern across this response was not random error. It was a consistent tendency to lead with reassuring framings and de-emphasize contrary evidence. That's the opposite of what an analyst does. An analyst is paid to find what's wrong.

The pattern across four tests

Step back from any single question. A shape emerges.

Analytics Advisor is excellent at retrieval and summarization. It is credible at flagging data anomalies that are obvious within the data itself. It is unreliable at diagnostic reasoning that requires understanding the business. It is weak at strategy beyond generic ecommerce best practices. And it has a strong reassurance bias when asked to judge whether things are okay.

This isn't a Google problem. It's the architectural reality of vendor AI. A tool built on top of a platform's data can only see what the platform sees. Industry dynamics, organizational history, competitive posture, distribution model, seasonality that spans years, stakeholder politics, the last three campaigns the brand ran, the quirks of the category, none of it is visible to the tool. Everything that makes analysis analysis lives outside the data the tool has access to.

The same critique applies, with different logos, to Adobe Sensei, Optimizely's Opal, Salesforce Einstein, Amplitude's Ask Amplitude, and every other vendor AI that's been shipped into a platform in the last three years. This is not a Google critique. It's a category observation.

And there's a subtler structural issue. An AI trained on a platform's worldview will recommend doubling down on that worldview. Analytics Advisor's recommendations included "leverage Direct-to-Consumer channels" and "optimize your landing pages," both of which route into Google's own Ads and Optimize ecosystems. That's not a conspiracy; it's structural. Build a tool inside a platform and its advice will pull toward the platform.

A human analyst can look at your GA data and say "your data is telling you to stop buying Google Ads and invest in email." A Google-built AI will never tell you that. Not because it's malicious, but because the training, the UI, the suggested prompts, and the clickable actions all point the other way.

When the tool fits, and when a human still belongs

None of this means AI has no place in analytics. It absolutely does. The skill is knowing which job you're hiring the tool for.

Reasonable places to use it right now:

  • First-pass reporting. "What happened with X last month" is genuinely faster than building a report. For a busy marketer or executive who wants a number, this is a legitimate time-saver.

  • Data hygiene checks. The AOV catch we observed is a good example. Surface-level anomalies that are obvious within the data itself are well within range.

  • Hypothesis generation. Use it to surface candidate explanations, then have a human evaluate and pursue the ones that make sense.

  • Democratized access. Someone in a non-analytics role who wants to pull a quick number without bothering the analytics team: reasonable fit.

Places where a human analyst still belongs:

  • Diagnostic reasoning under business context. The Jan 28 test. You need someone who knows what was happening in the business, what campaigns ran, what the category does, and what the distribution model is.

  • Strategic prioritization. The growth-roadmap test. Generic playbooks don't work in businesses with unusual dynamics, and most real businesses have unusual dynamics.

  • Anomaly triage that matters. The reassurance bias test. When the question is "should I worry about this," you need someone who is paid to worry.

  • Any analysis whose answer will drive a real business decision. If someone's going to reallocate budget, hire, fire, shift strategy, or make a promise to a board, the analysis behind that decision needs a human in the room. Not because the AI is stupid. Because the stakes require accountability the tool cannot carry.

The framing we keep coming back to is that it's not either/or. The AI shortens the retrieval loop. The human does the reasoning. The best teams will use both well, and the worst teams will use the tool as a substitute for the judgment it can't actually provide.

The takeaway

Analytics Advisor is a useful tool. It is not an analyst. It’s not an advisor. The marketing language around it, from Google and from every competing product in this category, is conflating two different kinds of work and hoping nobody notices.

If your analytics questions are mostly "what happened," the tool will save you time.

If your analytics questions are mostly "why, what should we do, and is this a problem," the tool will give you articulate, plausible answers that will mislead you in proportion to how unusual your business is. The more distinctive the category, the worse the advice.

That's not a reason to avoid it. That's a reason to know what you're using it for. The teams who do that well will gain leverage. The teams who treat it as a replacement for the person asking the right questions will make worse decisions faster, and they'll have a harder time noticing, because the output will always sound confident.

Which, if you're thinking about firing your analyst, is worth sitting with for a minute.

Jason Thompson

Jason is CEO of 33 Sticks, a boutique analytics consultancy specializing in conversion optimization and analytics transformation. He works directly with Fortune 500 clients to maximize their use of data while helping team members reach their potential. He writes about data literacy, critical thinking, and why most "insights" aren't.

Subscribe to Hippie CEO Life for thoughts on doing good work in a world optimized for engagement.

https://www.hippieceolife.com/