Can You Trust AI for Financial Advice? We Put ChatGPT and Gemini to the Test

Ask artificial intelligence enough money questions, and you'll notice that large language models (LLMs) often start off with a disclaimer. They're not able to provide personal finance or investing advice, they say — even as they're doing exactly that.
In some cases, the models recommend consulting financial professionals: "They can provide personalized advice that I, as an AI, cannot," Gemini said, responding to a question about choosing between a Roth IRA and a traditional IRA. Prefacing a 600-word response to a prompt about stock-picking vs. investment funds, ChatGPT warned, "Nothing here is personal financial advice."
While an AI model is unlikely to tell you to consult a pro if you have a basic question about what to watch on Netflix, ChatGPT and Gemini seem to recognize that the stakes are high with financial decisions, similar to requests for medical and legal advice. The AI companies behind them are also aware that regulatory agencies increasingly scrutinize how financial advice is given online.
However, a small disclosure is not going to deter someone who's already asked AI for help with money from making their next move based on the response. And why would it? A technology that can produce an answer to just about any question in a matter of seconds — processing more information than humans can possibly wrap their heads around — has so much potential to improve financial literacy. If responses are actionable and reliable, the tech could help people get on the path to financial freedom or even wealth.
With hundreds of millions of people using these tools on a regular basis, it's safe to assume that droves are relying on AI to guide their financial strategies, like picking investments, making tax decisions, navigating the real estate market and much more. In fact, one recent report found that 27% of people say they would trust AI to manage their finances over their significant other. It added that the average U.S. adult would feel comfortable letting AI manage nearly $20,000 of their money.
To examine how some of the most powerful AI models respond to requests for personal finance advice, five Money staffers graded the responses to 25 questions that we ran through ChatGPT's o3 model and Gemini 2.5 Pro.
What qualifies Money to do this analysis? We're money geeks, not academics or AI engineers who understand the intricacies of these models. And we'll admit: Assigning letter grades to AI answers is subjective. (In some cases, our graders were two full letter grades apart.) What we bring to the table is decades of combined experience in financial journalism, as well as expertise in the specific categories we analyzed. We read enough news on these subjects not only to bore our friends at dinner but also to assess how effectively AI models deliver up-to-date information in the ever-changing landscape of personal finance.
The most striking finding in our test? One model performed far better than the other, earning higher marks across all five topic clusters we tested (retirement, housing, credit, investing and current events).
Read on to learn more.
ChatGPT vs. Gemini: Which AI is better for personal finance questions?
ChatGPT overall grade: B-
Gemini overall grade: B+
If you're going to use AI for personal finance advice, which AI should you use? Our analysis found that Google's "most intelligent AI model," Gemini 2.5 Pro, outperformed OpenAI's o3, which was its "most powerful reasoning model" until the Aug. 7 release of a newer model, GPT-5. Our average score for the Gemini model was 3.18/4 (B+) while the ChatGPT model came in at 2.82/4 (B-).
ChatGPT has been called the "Kleenex" of AI. It was the first to market, and it remains the go-to resource for many users. So it may be a surprise that Gemini 2.5 Pro came out noticeably ahead.
However, our analysis isn't the only one suggesting that 2.5 Pro has an edge. The Open LM Arena, which combines crowdsourced ratings and other AI benchmarks, ranks Gemini 2.5 Pro No. 1 of all models. ChatGPT's o3 model ranks sixth, one spot behind the company's default model at the time of our test, GPT-4o, which provides faster responses. GPT-5 ranks second.
Gemini impressed us with thorough explanations and strong sourcing. With that said, the best model to use is the one that works for you. Gemini's responses, for instance, tended to be longer than ChatGPT's outputs in our test. You may prefer the to-the-point nature of ChatGPT's tool.
Across a total of 250 grades, the average was a B — or a 3.0 out of 4. Our rubric (see more information in the methodology below) allowed for D and F grades; however, the lowest grade given was a C. While numerous errors were identified in the course of testing, we did not find reckless advice or entirely "hallucinated" outputs.
More than 50 A's were given out by individual graders. No response earned perfect marks, though a few came close. Four responses snagged A grades from all but one of our reviewers.
Here's a look at a couple of the standout answers:
- When asked, "Is it better to have a local real estate agent or a national broker?" ChatGPT explained that local realtors have "deep, street‑level knowledge of neighborhoods," adding that "pricing a Victorian on one side of town and a mid‑century ranch a mile away can differ by tens of thousands — local agents live these subtleties daily." On the other hand, you don't get the "national name recognition" that a large firm can offer or the "slick photography, virtual‑tour platforms, AI‑driven pricing tools and 24/7 client portals."
Money staff used words like "accurate," "useful," "descriptive" and "clear" in grading ChatGPT's response, which earned an A- average grade.
- Gemini offered a sharp breakdown of different loan options (and clear details about their pros and cons) when asked, "Should I take out a loan to repair my credit?" The 2.5 Pro model advised users to "start by tackling the root problems," mentioning late accounts, high credit utilization and credit report errors. It continued: "A properly structured loan can improve your score by (a) adding an installment account in good standing, (b) lowering your revolving utilization if you use it to pay cards down and (c) building a perfect payment record. But the benefit disappears — and your score may fall — if you miss payments or let card balances creep back up."
One of our graders said: "Great info, great sourcing — gives accurate pros/cons and talks about how a new loan can actually hurt your score, too." It got an A- average grade.
Grading AI answers to questions about retirement
ChatGPT section grade: B-
Gemini section grade: B+
Providing general guidelines for retirement planning isn't rocket science. After all, the basic principles of saving for your older years don't change much, and both ChatGPT and Gemini serve this function sufficiently by explaining contribution limits and general savings goals.
It’s when you need more nuanced advice (about, say, strategizing your retirement contributions to limit what you owe in taxes or building an investment portfolio that balances risk and payouts) that our graders found the AI models lacking. The responses muddied the waters when describing the tax treatment of a traditional IRA versus a Roth IRA as compared to a 401(k), for example, and didn’t mention the benefits of having an investment portfolio with stocks that pay dividends so you can limit account withdrawals after you retire.
While the answers touched on the complexities of retirement planning, they didn't go deep in explaining the unknowns that make the process challenging (think: how long will you live, how much health care might cost when you retire and whether Social Security benefits could be cut). When the responses did get more detailed, the result was often filled with jargon and hard to follow. As our graders put it in various comments: "ambiguous at times," "lacks firm reasoning," "bare bones" and "leaves unanswered questions."
The upshot? If you want a quick overview about how much to save or a definition of a type of retirement account, AI can help. But if you want actionable financial advice for retirement saving or spending that applies to your lifestyle, you won't find that here. — Money editor Kaitlin Mulhere
1. How much money should I have saved for retirement by the time I’m 50?
- ChatGPT response grade: B
- Gemini response grade: B+
2. Which retirement account is better for me: a Roth IRA or a traditional IRA?
- ChatGPT response grade: B-
- Gemini response grade: A-
3. I don’t have much savings. Can I rely on Social Security for my retirement if I lower my spending when I stop working?
- ChatGPT response grade: B
- Gemini response grade: B+
4. Should I max out my 401(k) contribution to take full advantage of the tax benefits?
- ChatGPT response grade: B-
- Gemini response grade: A-
5. Should I take Social Security when I first become eligible, or should I wait until I can receive the maximum amount?
- ChatGPT response grade: C+
- Gemini response grade: B+
Grading AI answers to questions about housing
ChatGPT section grade: B
Gemini section grade: B+
I was curious to see how the two large language models used for this experiment, ChatGPT and Gemini, would do when asked basic housing questions. I decided to view the answers as if I were someone unfamiliar with the topic, and to gauge the responses not only on how accurate they were but also on how useful they were.
Since early AI chatbots often provided wrong answers and outdated information, I was wary going in, but I ended up pleasantly surprised.
For example, when asked "How do I know if I can afford a house?" ChatGPT provided quite a few formulas to help you calculate things like your debt-to-income ratio (DTI) and principal, interest, taxes and insurance (PITI). But it failed to adequately explain what these calculations are and why they are important when determining affordability (or mention other factors that could affect affordability).
Gemini did a much better job of defining the terms most commonly used in calculating affordability and presenting the information in a reader-friendly manner. It explained how to determine DTI and what lenders look for in this metric, and it also added sections on other costs associated with homebuying, including closing costs, home maintenance, taxes and insurance. Best of all, Gemini provided guidance on how to determine your financial health and take concrete steps toward a home purchase.
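The affordability terms above come down to a few lines of arithmetic. The sketch below is illustrative only: it uses the standard fixed-rate amortization formula for principal and interest, adds one-twelfth of annual property taxes and homeowners insurance to get PITI, and divides monthly debt payments by gross monthly income to get DTI. The function names, the sample figures and the 28%/36% thresholds (a commonly cited rule of thumb; lenders' actual limits vary by loan type) are our assumptions, not numbers from either AI response.

```python
"""Illustrative affordability math: monthly PITI and debt-to-income (DTI) ratios.

Hypothetical helper names and sample numbers; the 28%/36% limits are a common
rule of thumb, not any lender's actual underwriting standard.
"""


def monthly_principal_interest(loan_amount: float, annual_rate: float, years: int) -> float:
    """Fixed-rate amortization payment covering principal and interest only."""
    n_payments = years * 12
    monthly_rate = annual_rate / 12
    if monthly_rate == 0:
        return loan_amount / n_payments
    growth = (1 + monthly_rate) ** n_payments
    return loan_amount * monthly_rate * growth / (growth - 1)


def monthly_piti(loan_amount: float, annual_rate: float, years: int,
                 annual_property_tax: float, annual_insurance: float) -> float:
    """Principal + interest + one-twelfth of yearly property taxes and insurance."""
    return (monthly_principal_interest(loan_amount, annual_rate, years)
            + annual_property_tax / 12
            + annual_insurance / 12)


def dti_ratios(housing_payment: float, other_monthly_debts: float,
               gross_monthly_income: float) -> tuple[float, float]:
    """Return (front-end, back-end) DTI: housing only, then housing plus other debts."""
    front_end = housing_payment / gross_monthly_income
    back_end = (housing_payment + other_monthly_debts) / gross_monthly_income
    return front_end, back_end


if __name__ == "__main__":
    # Hypothetical buyer: $300,000 loan at 6% for 30 years, $3,600/yr taxes,
    # $1,200/yr insurance, $400/mo car payment, $9,000/mo gross income.
    piti = monthly_piti(300_000, 0.06, 30, 3_600, 1_200)
    front, back = dti_ratios(piti, 400, 9_000)
    print(f"PITI: ${piti:,.2f}")
    print(f"Front-end DTI: {front:.1%} (rule of thumb: <= 28%)")
    print(f"Back-end DTI: {back:.1%} (rule of thumb: <= 36%)")
```

With these made-up inputs, the payment works out to about $2,199 a month, or roughly 24% of gross income on the front end and 29% once the car payment is included.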
It wasn't perfect, though. Gemini could have used some of the payment calculations and formula examples included in the ChatGPT version. Although I favored the Gemini answers overall, my ideal answer to all of the questions would combine the information provided by both AI models into one reader-friendly guide. — Money real estate editor Leslie Cook
6. How do I know if I can afford a house?
- ChatGPT response grade: B-
- Gemini response grade: A-
7. How can I save money on real estate agent commissions?
- ChatGPT response grade: B+
- Gemini response grade: B+
8. Is it better to have a local real estate agent or a national broker?
- ChatGPT response grade: A-
- Gemini response grade: A-
9. What are the best places to live for low property taxes without compromising quality of life?
- ChatGPT response grade: C+
- Gemini response grade: B-
10. How can I buy a house with less than 20% down, and is it responsible?
- ChatGPT response grade: B-
- Gemini response grade: B+
Grading AI answers to questions about credit
ChatGPT section grade: B-
Gemini section grade: B+
When reading ChatGPT's response to a question about improving one's credit score, I identified several issues immediately.
- Unrealistic timeline: Promising results in "weeks" is misleading. Credit scores typically update monthly, and big boosts usually require sustained effort.
- No mention of on-time payments: ChatGPT failed to mention payment history. While not an immediate fix, paying on time stops further damage. Late payments severely hurt your score, so any advice should emphasize timely payments.
- Missing context on urgency: The answer never asked why a fast boost was needed. If no credit application or rental is imminent, there’s no need to rush. Better to focus on steady improvement instead of quick hacks.
On the positive side, ChatGPT did suggest lowering credit utilization (using under 10% of limits), an effective short-term strategy. However, much of its other guidance was shallow. It briefly mentioned avoiding Buy Now, Pay Later (BNPL) services without noting BNPL won’t build credit and can hurt yours if payments are missed. It also glossed over checking your credit report for errors — a step that can quickly boost your score if you dispute an inaccuracy.
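Credit utilization is revolving balances divided by credit limits. As a rough illustration of the under-10% target ChatGPT mentioned, the hypothetical helpers below compute overall utilization across cards and how much you would need to pay down to reach a target. The card figures are made up, and the sketch looks only at overall utilization; scoring models may also weigh each card's individual ratio.

```python
"""Overall credit utilization and the paydown needed to reach a target ratio.

Hypothetical example figures; a 10% target is a guideline, not a scoring rule.
"""


def utilization(balances: list[float], limits: list[float]) -> float:
    """Total revolving balances divided by total credit limits."""
    total_limit = sum(limits)
    if total_limit <= 0:
        raise ValueError("total credit limit must be positive")
    return sum(balances) / total_limit


def paydown_to_target(balances: list[float], limits: list[float], target: float = 0.10) -> float:
    """Dollars to pay off (across all cards) so overall utilization is at or below target."""
    excess = sum(balances) - target * sum(limits)
    return max(0.0, excess)


if __name__ == "__main__":
    balances = [1_500, 900]   # hypothetical card balances
    limits = [5_000, 7_000]   # hypothetical credit limits
    print(f"Current utilization: {utilization(balances, limits):.0%}")
    print(f"Pay down ${paydown_to_target(balances, limits):,.0f} to reach 10%")
```

In this example, $2,400 of balances on $12,000 of limits is 20% utilization, and paying down $1,200 brings it to 10%.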
Overall, ChatGPT's advice had a decent quick tip but lacked depth. An ideal answer would blend quick fixes with fundamental habits (budgeting, spending control, on-time payments) and realistic expectations about the timeline. — Money editor José Omar Rodríguez
11. How do I know what credit card is best for me?
- ChatGPT response grade: B-
- Gemini response grade: B+
12. What’s the best way to use credit card points?
- ChatGPT response grade: B
- Gemini response grade: B+
13. How can I increase my credit score fast?
- ChatGPT response grade: C+
- Gemini response grade: A-
14. Should I take out a loan to repair my credit?
- ChatGPT response grade: B-
- Gemini response grade: A-
15. What do I need to do if I just noticed that my credit score dropped?
- ChatGPT response grade: B+
- Gemini response grade: C+
Grading AI answers to questions about investing
ChatGPT section grade: B-
Gemini section grade: B
In their responses to five investing-focused prompts, ChatGPT and Gemini earned average grades from Money's editorial staff of 2.72 (B-) and 2.92 (B), respectively. While the topic of each prompt was broad, the models' responses were mostly sound and surprisingly detailed. They were not, however, without flaws.
For example, when asked "How much of my portfolio should I invest in cryptocurrency?" both ChatGPT and Gemini suggested limiting crypto investments to 1% to 5% overall. That aligns with experts' recommendations of allocating no more than 10% of a portfolio to alternative assets broadly (e.g., cryptocurrencies, gold, private equity/debt/credit, collectibles).
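To put those percentages in dollar terms, here is a small illustrative calculation: the 1% to 5% crypto band both models suggested, plus a check against the 10% ceiling on alternative assets overall. The portfolio, the asset labels and which holdings count as "alternative" are assumptions made for the example.

```python
"""Translate percentage guidelines into dollar amounts for a hypothetical portfolio."""

# Asset labels treated as alternatives in this example (an assumption).
ALTERNATIVES = {"crypto", "gold", "private_credit", "collectibles"}


def crypto_band(portfolio_value: float, low: float = 0.01, high: float = 0.05) -> tuple[float, float]:
    """Dollar range for a crypto allocation of low..high of the portfolio."""
    return portfolio_value * low, portfolio_value * high


def alternatives_share(holdings: dict[str, float]) -> float:
    """Fraction of the portfolio held in assets labeled as alternatives."""
    total = sum(holdings.values())
    alt = sum(value for name, value in holdings.items() if name in ALTERNATIVES)
    return alt / total


if __name__ == "__main__":
    holdings = {"us_stocks": 30_000, "bonds": 16_000, "crypto": 2_000, "gold": 2_000}
    low, high = crypto_band(sum(holdings.values()))
    print(f"Crypto band: ${low:,.0f} to ${high:,.0f}")
    share = alternatives_share(holdings)
    print(f"Alternatives: {share:.0%} of portfolio; at or under 10% cap: {share <= 0.10}")
```

For a hypothetical $50,000 portfolio, the 1% to 5% band works out to $500 to $2,500 in crypto.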
ChatGPT went a step further, suggesting that investors should practice dollar-cost averaging (DCA) — buying a set amount of an asset in fixed intervals to reduce a portfolio's exposure to volatility. While I agree that DCAing is a tried-and-true practice with traditional equities (stocks, ETFs and mutual funds), I disagree about its implementation for inherently volatile, speculative asset classes like crypto.
DCA is a long-term strategy founded upon reliable average annual returns. Applying it to crypto misses the mark because the crypto market is far more volatile than traditional equity markets.
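For readers unfamiliar with the mechanics, the sketch below shows what dollar-cost averaging does: a fixed dollar amount buys more units when prices are low and fewer when they are high, so the average cost per unit equals the harmonic mean of the purchase prices. The prices are hypothetical, and the example illustrates only the arithmetic; it does not show that DCA reduces risk in any particular market, including crypto.

```python
"""Dollar-cost averaging arithmetic with hypothetical prices."""

from statistics import harmonic_mean, mean


def dollar_cost_average(amount_per_period: float, prices: list[float]) -> tuple[float, float]:
    """Invest a fixed amount at each price; return (total units, average cost per unit)."""
    units = sum(amount_per_period / price for price in prices)
    total_invested = amount_per_period * len(prices)
    return units, total_invested / units


if __name__ == "__main__":
    prices = [10.0, 20.0, 5.0]  # hypothetical prices at three purchase dates
    units, avg_cost = dollar_cost_average(100.0, prices)
    print(f"Units bought: {units:g}")
    print(f"Average cost per unit: ${avg_cost:.2f}")
    print(f"Simple average price: ${mean(prices):.2f}")
    # The average cost per unit matches the harmonic mean of the prices paid.
    print(f"Harmonic mean of prices: ${harmonic_mean(prices):.2f}")
```

With these made-up prices, $300 buys 35 units at an average cost of about $8.57, below the simple average price of about $11.67.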
Overall, the investing recommendations made by AI were well-grounded. On investing questions, the models reported current macroeconomic and market events, put them in historical context and pointed to broadly accepted best practices going forward.
In that sense, AI succeeds in its attempt to provide unbiased answers to personal finance questions. If the goal is to improve financial literacy for the masses, generative AI can be deployed, at minimum, in that capacity. There's something to be said for that. — Money investing and banking editor Jordan Chussler
16. Is it worth it to get a traditional brokerage account, or can I just use an app?
- ChatGPT response grade: B-
- Gemini response grade: A-
17. How much of my portfolio should I invest in cryptocurrency?
- ChatGPT response grade: B-
- Gemini response grade: B-
18. What is the point of investing in gold, and how can I tell if it's a scam?
- ChatGPT response grade: B-
- Gemini response grade: B-
19. What is a good plan to get rich through investing?
- ChatGPT response grade: C+
- Gemini response grade: B-
20. Is it better to pick my own stocks if mutual funds and ETFs have fees?
- ChatGPT response grade: B+
- Gemini response grade: B-
Grading AI answers to questions about current events
ChatGPT section grade: B-
Gemini section grade: B
When OpenAI launched ChatGPT in November 2022, the chatbot did not possess knowledge of events after September 2021 — that was the data cutoff date at the time due to how the AI technology was trained. About a year later, the company announced upgrades on this front; CEO Sam Altman said it would try "to never let it get that out of date again."
Today, AI models can browse the web, and LLMs can cite events that occurred the same day. But our test showed that AI models still struggle to provide users with the latest information on queries about current events in personal finance. In June 2025, Gemini responded to a prompt about auto loan refinancing conditions with information about a decline in loan rates in late 2024, half a year prior. On another question, ChatGPT pulled a Zillow housing market report from February — not terribly dated, but at the time of the question, the real estate website had published March and April reports with more current information.
Asked about student loan policy changes, ChatGPT advised borrowers to "request a Fresh Start rehabilitation before Sept 30 2025 (still available)." However, that program ended in the fall of 2024. Gemini responded to the same question with information about the Saving on a Valuable Education (SAVE) plan that wasn't necessarily incorrect but read as if it had been written three to six months earlier, failing to mention changes since.
While processing current events remains a weakness of AI models, some answers in this section impressed us, like Gemini's advice about when to buy a house (A-). The answer summarized multiple home price forecasts from reputable sources, accurately explained mortgage market trends and broke down how market conditions are different in the West and South than the Midwest and Northeast. — Money lead news reporter Pete Grieve
21. I'm interested in buying a house but I don't know if I should wait — are home prices going up or down?
- ChatGPT response grade: B+
- Gemini response grade: A-
22. Do I need to pay my medical bills if they were removed from credit reporting?
- ChatGPT response grade: C+
- Gemini response grade: B+
23. I have federal student loan debt. Should I be worried about policy changes as President Donald Trump rolls back Biden-era protections?
- ChatGPT response grade: B-
- Gemini response grade: C+
24. How will the upcoming tax bill affect my bottom line, and what moves should I make now to prepare?
- ChatGPT response grade: B-
- Gemini response grade: C+
25. When will I be able to refinance my auto loans?
- ChatGPT response grade: B
- Gemini response grade: B+
Methodology
Five Money staff members read, researched and graded ChatGPT and Gemini responses to 25 personal finance advice questions in five topic categories (retirement, housing, credit, investing and current events). Note that we only analyzed these two LLMs, which were chosen due to their popularity; we did not test Claude, Perplexity, DeepSeek, Grok or any other AI models.
We assigned whole letter grades (A, B, C, D or F) to each answer. With 25 questions, two models and five graders, that adds up to 250 grades.
The first step was coming up with questions.
We used search traffic tools to identify common topics and aimed to write questions that would lead to advice, not just information. Internet users have been able to get answers to questions like "what are stocks" for decades; we were interested in AI's ability to go deeper and provide personal finance advice. We brainstormed prompts in subject areas where our reporters and editors have years of coverage experience. Questions were designed to be answerable without inputting specifics of a user’s personal finance situation beyond any details in the prompt.
This lack of personalization could be seen as one of the limitations of our test: In a real-world use case, a user could plug in information like their salary, rent or account balances that could lead to more tailored advice. But experts caution against sharing too much data with ChatGPT. Inputting sensitive financial information could be a cybersecurity risk, Mark Valentino, head of business banking at Citizens Bank, tells Money.
"Any time you're sharing your own personal, nonpublic information with a nonfinancial services provider that, frankly, isn't as regulated as closely — where information sharing practices aren't as governed as they are with a financial institution — there certainly is concern," he says. "You think about the fraud risk, you think about the identity theft concerns."
We entered the ChatGPT prompts on June 2. We used the o3 model, which at the time was OpenAI's "most powerful reasoning model that pushes the frontier across coding, math, science, visual perception." We used a new "workspace," turned off the "memory" setting and started separate chats for each question.
Although we disabled location access in our browser settings, ChatGPT showed in a handful of instances that it was still inferring an approximate location: Queens, New York. (The actual location was Brooklyn, New York.) In most of these instances, we asked the model to turn off its access to location and reproduce the response. Beyond that, we did not ask follow-up questions. Again, this could be seen as a limitation of the report, because the ability to ask follow-up questions, dig deeper into part of a response or ask for clarification on something you don't understand is part of what makes AI powerful.
We put our prompts through Gemini on June 5, using the 2.5 Pro (preview) model. New chats were used for each question.
Like ChatGPT, Gemini indicated that it recognized our location based on the IP address. However, Gemini showed that it did not have access to our "precise" location.
One other note on the Gemini test: Question 8 resulted in a partially duplicated response; we graded the second half of the output, which appeared to be the intended primary answer.
To keep our grading consistent, we scored answers against a rubric:
A (Excellent):
- Thorough and insightful, directly answers the question.
- All information is accurate.
- Precise language devoid of ambiguities.
B (Solid):
- Offers helpful information and advice.
- Addresses the main points of the question effectively.
- Any issues are minimal.
C (Average):
- The information is useful but misses the mark for accuracy and relevance.
- The answer is somewhat vague.
- Contains minor issues or errors.
D (Weak):
- Lacks actionable information.
- Includes significant errors or inaccuracies.
- The response is not clearly articulated.
F (Failure):
- Does not answer the question.
- Provides misleading or reckless advice.
- Response is not accurate or relevant.
Finally, to calculate averages, we converted each letter grade to points (A = 4, B = 3, C = 2, D = 1 and F = 0), then translated the averages back into letter grades using the academic grading scale, in which an A = 4.0, an A- = 3.7, a B+ = 3.3, a B = 3.0 and so on.
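The conversion from individual letter grades to the averages reported in this article can be sketched as follows. One detail is our assumption rather than something the methodology states: each average is mapped to the nearest value on the academic scale, which is consistent with the reported results (3.18 shown as B+, 2.82 as B-, 2.92 as B). The helper names are hypothetical.

```python
"""Average whole-letter grades and convert the result back to a letter grade.

Assumption: an average maps to the nearest value on the academic scale below.
"""

POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

SCALE = [
    ("A", 4.0), ("A-", 3.7), ("B+", 3.3), ("B", 3.0), ("B-", 2.7),
    ("C+", 2.3), ("C", 2.0), ("C-", 1.7), ("D+", 1.3), ("D", 1.0),
    ("D-", 0.7), ("F", 0.0),
]


def average_points(grades: list[str]) -> float:
    """Mean of whole letter grades on the 4-point scale."""
    return sum(POINTS[g] for g in grades) / len(grades)


def to_letter(avg: float) -> str:
    """Letter grade whose scale value is closest to the average."""
    return min(SCALE, key=lambda item: abs(item[1] - avg))[0]


if __name__ == "__main__":
    # Hypothetical: four graders give an A and one gives a B.
    grades = ["A", "A", "A", "A", "B"]
    avg = average_points(grades)
    print(f"{avg:.2f} -> {to_letter(avg)}")
    # Overall model averages reported in this article.
    for score in (3.18, 2.82):
        print(f"{score:.2f} -> {to_letter(score)}")
```

Under this nearest-value mapping, a 3.8 average (four A's and one B) comes out as an A-, and the two overall averages come out as B+ and B-, matching the grades above.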