Do Large Language Models’ Reasoning Limitations Really Matter?

Konrad Hoppe
Nov 7, 2024

A recent study (https://arxiv.org/pdf/2410.05229) by an Apple AI team highlighted the challenges language models face in numerical reasoning tasks. This has led to a wave of skepticism regarding their utility.

However, I believe this perspective misses the bigger picture. The key is not to dismiss these models but to understand where they can be most effectively employed.

Key insights

  1. Strategic Deployment: Just as we wouldn’t refuse to staff working students on projects because they lack experience in negotiating complex client situations, we shouldn’t discount language models because of their limitations in numerical reasoning. Instead, we should leverage their strengths.
  2. Translation Mastery: Language models excel at translating human-readable text into other languages, including programming languages. This makes them invaluable for a wide range of business applications.
  3. Subject Matter Expertise: Language models require guidance from subject matter experts to perform optimally. Experts need to decide how to instruct the model and which language or domain to translate the problem into to achieve the best results.

Solve Problems in the Right Language: Mathematics

While large language models (LLMs) may not excel at reasoning like humans, their true strength lies in translating complex problems into more understandable or practical terms. In business, many problems can be addressed using applied mathematics.

Thus, the key to solving most business problems is to frame them in mathematical terms. This is where LLMs shine. They can convert complex issues into mathematical language, which, while precise, can be challenging to solve manually. This is why code is essential. LLMs can generate the necessary code to address these problems, making them powerful tools for experts in various fields.

Ultimately, the value of LLMs lies in their ability to transform problems into the right language: mathematics, implemented through code.

Practical Application: Route Optimization

To illustrate this point, let us consider a logistics problem where a company needs to find the shortest route between delivery points. Here’s how a language model can aid in this scenario.

Let us first consider an attempt where we describe the distances between delivery points on a map and ask the model to find a route between two cities:

Prompt to LLM:

By using one or several of the following routes: From Zuerich to Baden in 22 km, from Baden to Basel in 83km, from Zuerich to Rotkreuz in 39km, from Baden to Olten in 43km, from Olten to Bern in 67km, from Olten to Luzern in 57km, from Olten to Basel in 39km, from Rotkreuz to Luzern in 18km, from Rotkreuz to Schwyz in 26km, from Luzern to Altdorf in 39km, from Schwyz to Altdorf in 16km. How can I come from Zuerich to Altdorf and how long will it take?

Model reply:

To plan a route from Zürich to Altdorf using the given routes, we need to find a path that connects these two locations. Let’s explore the possible routes:

  1. Zürich to Rotkreuz (39 km)
  2. Rotkreuz to Luzern (18 km)
  3. Luzern to Altdorf (39 km)

Adding up the distances:

  • Zürich to Rotkreuz: 39 km
  • Rotkreuz to Luzern: 18 km
  • Luzern to Altdorf: 39 km

Total distance: 39 km + 18 km + 39 km = 96 km

So, you can travel from Zürich to Altdorf via Rotkreuz and Luzern, and the total distance will be 96 km.

While this solution is valid, it misses a shorter route from Zürich to Altdorf via Rotkreuz and Schwyz of only 81 km (39 + 26 + 16).

So this might be considered an example of a language model failing to reason logically: someone who searches for a route between two places does not want an arbitrary connection, but the shortest one.
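To check the model’s answer, a small brute-force search can list every cycle-free route between the two cities. The following is a minimal Python sketch, assuming that every road can be driven in both directions (the prompt does not say); the names ROUTES, neighbours and all_routes are our own and purely illustrative.

```python
# Road segments from the prompt: (from, to, km).
# Assumption: every road can be driven in both directions.
ROUTES = [
    ("zuerich", "baden", 22), ("baden", "basel", 83), ("zuerich", "rotkreuz", 39),
    ("baden", "olten", 43), ("olten", "bern", 67), ("olten", "luzern", 57),
    ("olten", "basel", 39), ("rotkreuz", "luzern", 18), ("rotkreuz", "schwyz", 26),
    ("luzern", "altdorf", 39), ("schwyz", "altdorf", 16),
]

def neighbours(city):
    """Yield (next_city, km) for every segment touching `city`, in either direction."""
    for a, b, km in ROUTES:
        if a == city:
            yield b, km
        elif b == city:
            yield a, km

def all_routes(city, goal, path, km=0):
    """Depth-first enumeration of all cycle-free routes from `city` to `goal`.
    `path` holds the cities visited so far, starting with `city`."""
    if city == goal:
        yield path, km
        return
    for nxt, d in neighbours(city):
        if nxt not in path:  # never visit a city twice
            yield from all_routes(nxt, goal, path + [nxt], km + d)

routes = sorted(all_routes("zuerich", "altdorf", ["zuerich"]), key=lambda r: r[1])
for path, km in routes:
    print(km, "km:", " -> ".join(path))
```

Under this assumption there are six cycle-free routes; sorted by length, the 81 km route via Rotkreuz and Schwyz comes first and the model’s 96 km route via Rotkreuz and Luzern second.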

Translating Problems into Programming Languages

Given their proficiency in translating between languages and domains, language models can convert our detailed route-finding problem statement into a programming language with little effort.

A natural choice would be a declarative programming language, where the language model only needs to translate the problem statement without creating an algorithm. Prolog (https://en.wikipedia.org/wiki/Prolog) is a suitable candidate for this task.

Back in our university days, discovering Prolog felt like uncovering a hidden gem in the world of computer science. Its declarative nature made it a powerful tool for solving complex problems without diving into procedural code.

As math nerds, we were thrilled by the idea that Prolog could serve as an automatic theorem-proving machine, potentially eliminating the need for tedious math homework.

The beauty of Prolog lies in its ability to express logical relations and rules, making it ideal for tasks such as computational linguistics or proving mathematical theorems. Additionally, Prolog allows for a clear separation between rules and data. This means that a set of axioms can be easily applied to different datasets by simply swapping the database.

For example, consider a Prolog program to determine if someone is an ancestor:

parent(john, michael).
parent(michael, sarah).
parent(sarah, emily).
ancestor(X, Y) :- parent(X, Y).
ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).

To query this knowledge base, you could ask:

?- ancestor(john, emily).
true.
?- ancestor(michael, emily).
true.
?- ancestor(sarah, john).
false.

This program defines that John is Michael’s parent, Michael is Sarah’s parent, and Sarah is Emily’s parent. The rule for ancestor states that X is an ancestor of Y if X is a parent of Y, or if X is a parent of Z and Z is an ancestor of Y. This example beautifully demonstrates how Prolog can handle recursive relationships and logical queries in an intuitive manner.
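The separation of rules and data can be mimicked outside Prolog as well. The following minimal Python sketch is our own rough analogue, not a description of how Prolog executes queries: it encodes the two ancestor clauses as one recursive function that receives the parent facts as an argument, so the same rule can be applied to a different fact base simply by passing a different set. The second fact base (anna, ben, clara) is hypothetical and only there for illustration.

```python
# Facts: parent(john, michael), parent(michael, sarah), parent(sarah, emily).
FAMILY = {("john", "michael"), ("michael", "sarah"), ("sarah", "emily")}

def ancestor(x, y, parents):
    """Mirror of the two Prolog clauses:
         ancestor(X, Y) :- parent(X, Y).
         ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
    Assumes the parent relation has no cycles, as in the example."""
    if (x, y) in parents:
        return True
    return any(ancestor(z, y, parents) for (p, z) in parents if p == x)

print(ancestor("john", "emily", FAMILY))     # True
print(ancestor("michael", "emily", FAMILY))  # True
print(ancestor("sarah", "john", FAMILY))     # False

# Same rule, different (hypothetical) data: only the fact base changes.
OTHER = {("anna", "ben"), ("ben", "clara")}
print(ancestor("anna", "clara", OTHER))      # True
```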

The excitement of leveraging Prolog for such tasks made us believe we had a magical tool at our disposal. However, our excitement was tempered by Gödel’s incompleteness theorems, which show that not all mathematical truths can be proven within a consistent formal system expressive enough for arithmetic. But that goes beyond the scope of this blog article…

Despite its limitations, Prolog’s potential to simplify and solve complex problems has been a captivating part of our academic journey, offering a glimpse into the possibilities of logic-based problem solving.

Naive attempt: General route finding using Prolog

So let’s try to solve the route-finding problem in Prolog:

Prompt to LLM:

I need to write a Prolog program that finds a route between two places on a map. The Prolog program should be able to deal with cycles in the graph and avoid infinite recursion. The program should output the identified route in correct order together with the total length of the trip

The following routes are available: From zuerich to baden in 22 km, from baden to basel in 83km, from zuerich to rotkreuz in 39km, from baden to olten in 43km, from olten to bern in 67km, from olten to luzern in 57km, from olten to basel in 39km, from rotkreuz to luzern in 18km, from rotkreuz to schwyz in 26km, from luzern to altdorf in 39km, from schwyz to altdorf in 16km.

I need to write a Prolog program that returns the route from zuerich to altdorf and also returns the total length of the trip.

How can I achieve this with Prolog?

The reply with the Prolog code can be found below in Appendix 1.

This initial Prolog code finds a route but not necessarily the shortest one. Interestingly, the route produced by the code is even longer than the one initially suggested by the LLM:

Route = [(zuerich, baden, 22), (baden, basel, 83), (basel, olten, 39), (olten, luzern, 57), (luzern, altdorf, 39)],

Length = 240

The reason lies in how Prolog searches. It tries the route facts depth-first, in the order in which they are listed, and returns the first route that reaches the destination; nothing in the program compares the lengths of alternative routes. However, by providing minimal guidance, such as suggesting a breadth-first search approach, we can improve the solution.
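This behaviour can be reproduced outside Prolog. The following minimal Python sketch imitates the search order of the Appendix 1 program under simplifying assumptions of our own: neighbours are tried forward facts first and then reversed facts, each in listing order (as the disjunction in connected/3 does); a direct connection to the goal is tried before any detour (the first path/5 clause); and no city is visited twice, a slightly stricter check than the Prolog code, which never marks the start city. The function names are hypothetical.

```python
# Road segments in the order of the route/3 facts in Appendix 1: (from, to, km).
ROUTES = [
    ("zuerich", "baden", 22), ("baden", "basel", 83), ("zuerich", "rotkreuz", 39),
    ("baden", "olten", 43), ("olten", "bern", 67), ("olten", "luzern", 57),
    ("olten", "basel", 39), ("rotkreuz", "luzern", 18), ("rotkreuz", "schwyz", 26),
    ("luzern", "altdorf", 39), ("schwyz", "altdorf", 16),
]

def connected(city):
    """Neighbours in the order connected/3 tries them: forward facts, then reversed facts."""
    for a, b, km in ROUTES:
        if a == city:
            yield b, km
    for a, b, km in ROUTES:
        if b == city:
            yield a, km

def prolog_order_routes(city, goal, visited):
    """Depth-first search in Prolog's clause order; yields routes as lists of legs."""
    # First path/5 clause: a direct connection to the goal.
    for nxt, km in connected(city):
        if nxt == goal:
            yield [(city, nxt, km)]
    # Second path/5 clause: step to an unvisited intermediate city, then recurse.
    for nxt, km in connected(city):
        if nxt != goal and nxt not in visited:
            for rest in prolog_order_routes(nxt, goal, visited | {nxt}):
                yield [(city, nxt, km)] + rest

first = next(prolog_order_routes("zuerich", "altdorf", {"zuerich"}))
print(first, sum(km for _, _, km in first))
```

Because the very first fact sends the search from Zürich to Baden, the first route found detours via Basel and Olten and totals 240 km, the same answer the Prolog query reports.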

Refined attempt: Minimal guidance for the LLM

Query to LLM:

I need to write a Prolog program that finds a route between two places on a map. The Prolog program should be able to deal with cycles in the graph and avoid infinite recursion. I would suggest to build a breadth-first approach.

The following routes are available: From zuerich to baden in 22 km, from baden to basel in 83km, from zuerich to rotkreuz in 39km, from baden to olten in 43km, from olten to bern in 67km, from olten to luzern in 57km, from olten to basel in 39km, from rotkreuz to luzern in 18km, from rotkreuz to schwyz in 26km, from luzern to altdorf in 39km, from schwyz to altdorf in 16km.

I need to write a Prolog program that returns the route from zuerich to altdorf and also returns the total length of the trip.

How can I achieve this with Prolog?

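The refined approach, detailed in Appendix 2, collects every cycle-free route with findall/3 and keeps the one with the smallest total distance, so it is an exhaustive search rather than a breadth-first one. It finds the shortest path: Zürich -> Rotkreuz -> Schwyz -> Altdorf, with a total length of 81 km. (As written, shortest_path/3 lists the legs in reverse order and leaves the length to path_length/2.)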
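Note that a plain breadth-first search minimises the number of legs, not the distance: the 81 km route and the model’s 96 km route both have three legs, so breadth-first search alone cannot decide between them. Weighted shortest-path search, for example Dijkstra’s algorithm (a technique we swap in here; it is not what the model produced), expands routes in order of accumulated distance instead. The following is a minimal Python sketch using the standard-library heapq module, assuming two-way roads; the function name dijkstra is our own.

```python
import heapq

# Road segments from the prompt: (from, to, km). Assumption: roads are two-way.
ROUTES = [
    ("zuerich", "baden", 22), ("baden", "basel", 83), ("zuerich", "rotkreuz", 39),
    ("baden", "olten", 43), ("olten", "bern", 67), ("olten", "luzern", 57),
    ("olten", "basel", 39), ("rotkreuz", "luzern", 18), ("rotkreuz", "schwyz", 26),
    ("luzern", "altdorf", 39), ("schwyz", "altdorf", 16),
]

def dijkstra(start, goal):
    """Return (km, route) for the shortest route from start to goal, or None."""
    graph = {}
    for a, b, km in ROUTES:
        graph.setdefault(a, []).append((b, km))
        graph.setdefault(b, []).append((a, km))
    queue = [(0, start, [start])]  # (distance so far, city, route)
    settled = set()
    while queue:
        km, city, route = heapq.heappop(queue)  # closest unsettled city first
        if city == goal:
            return km, route
        if city in settled:
            continue
        settled.add(city)
        for nxt, d in graph[city]:
            if nxt not in settled:
                heapq.heappush(queue, (km + d, nxt, route + [nxt]))
    return None

print(dijkstra("zuerich", "altdorf"))
```

The search stops as soon as Altdorf is taken off the queue, which happens at 81 km via Rotkreuz and Schwyz, before the 96 km candidate via Luzern is reached.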

LLMs: A Powerful Tool for Subject Matter Experts

The example above illustrates two key points: (1) Language models are excellent at translating problems into different languages, such as programming languages, and (2) they require subject matter expertise to achieve the best results.

While Prolog is effective for certain types of logical problem-solving, it might be seen as cumbersome for some users. For problems that are more directly related to optimization, a language specifically designed for mathematical formulations and constraint optimization might be more suitable.

AMPL (https://en.wikipedia.org/wiki/AMPL) is an excellent choice for such tasks. AMPL is an algebraic modeling language for describing large, complex mathematical optimization problems, which it then passes to external solvers. It provides a more intuitive and direct approach for defining and solving optimization problems, making it a natural candidate for tasks like route optimization.
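For readers who want to see the mathematics behind such a model: the program in Appendix 3 encodes the standard integer-programming formulation of the shortest-path problem. With N the set of places, A the set of directed arcs, d_ij the distance of arc (i, j), s the source, t the destination, and x_ij a binary variable that is 1 if arc (i, j) is used, it reads:

```latex
\begin{aligned}
\min_{x}\quad & \sum_{(i,j)\in A} d_{ij}\,x_{ij} \\
\text{s.t.}\quad & \sum_{(n,j)\in A} x_{nj} \;-\; \sum_{(i,n)\in A} x_{in} =
  \begin{cases} 1 & \text{if } n = s,\\ -1 & \text{if } n = t,\\ 0 & \text{otherwise,} \end{cases}
  \qquad \forall\, n \in N,\\
& x_{ij} \in \{0,1\} \qquad \forall\, (i,j) \in A.
\end{aligned}
```

One unit of flow leaves s and arrives at t, so the arcs with x_ij = 1 form a route from s to t. With positive distances, any extra loop would only add length, so an optimal solution is a shortest route.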

Let’s see how AMPL can be leveraged to solve the same route-finding problem:

Query to LLM:

I need to write an AMPL program that finds a route between two places on a graph.

The following routes are available: From zuerich to baden in 22 km, from baden to basel in 83km, from zuerich to rotkreuz in 39km, from baden to olten in 43km, from olten to bern in 67km, from olten to luzern in 57km, from olten to basel in 39km, from rotkreuz to luzern in 18km, from rotkreuz to schwyz in 26km, from luzern to altdorf in 39km, from schwyz to altdorf in 16km.

I need to write an AMPL program that returns the route from zuerich to altdorf.

How can I achieve this with AMPL?

The reply, reproduced in Appendix 3, gives the desired result. Solving the AMPL model also yields the shortest path, Zürich -> Rotkreuz -> Schwyz -> Altdorf, with a total length of 81 km.
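To see how the AMPL constraints pin down a route, one can check a candidate solution by hand. The following minimal Python sketch does this for the directed arcs in the model’s data section: it builds the decision vector x for a given route and tests the Flow_Conservation constraint and the Total_Distance objective. It only verifies candidates and does not solve the model; the helper names are hypothetical.

```python
# Directed arcs and distances (km) from the AMPL data section.
DISTANCE = {
    ("zuerich", "baden"): 22, ("baden", "basel"): 83, ("zuerich", "rotkreuz"): 39,
    ("baden", "olten"): 43, ("olten", "bern"): 67, ("olten", "luzern"): 57,
    ("olten", "basel"): 39, ("rotkreuz", "luzern"): 18, ("rotkreuz", "schwyz"): 26,
    ("luzern", "altdorf"): 39, ("schwyz", "altdorf"): 16,
}
NODES = {node for arc in DISTANCE for node in arc}
SOURCE, DESTINATION = "zuerich", "altdorf"

def indicator(route_arcs):
    """Decision vector x: 1 for arcs on the route, 0 for every other arc."""
    return {arc: int(arc in route_arcs) for arc in DISTANCE}

def flow_conserved(x):
    """Flow_Conservation: outflow - inflow is 1 at the source, -1 at the destination, 0 elsewhere."""
    for n in NODES:
        supply = (1 if n == SOURCE else 0) - (1 if n == DESTINATION else 0)
        outflow = sum(v for (i, _), v in x.items() if i == n)
        inflow = sum(v for (_, j), v in x.items() if j == n)
        if outflow - inflow != supply:
            return False
    return True

def total_distance(x):
    """Objective Total_Distance: sum of distance[i,j] * x[i,j]."""
    return sum(DISTANCE[arc] * v for arc, v in x.items())

shortest = indicator({("zuerich", "rotkreuz"), ("rotkreuz", "schwyz"), ("schwyz", "altdorf")})
via_luzern = indicator({("zuerich", "rotkreuz"), ("rotkreuz", "luzern"), ("luzern", "altdorf")})
for x in (shortest, via_luzern):
    print(flow_conserved(x), total_distance(x))
```

Both routes satisfy the constraints; a solver minimising Total_Distance prefers the first, whose objective value is 81 rather than 96.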

Conclusion

Language models are powerful tools that, when used correctly, can significantly boost productivity and solve complex business problems. They excel in translating problems into various languages and domains, making them invaluable for subject matter experts.

🔧 Takeaway: Language models are not about numerical reasoning. Their true strength lies in their ability to translate and facilitate problem-solving across different domains. With the right guidance and expertise, they can be a game-changer in optimizing business processes.

#AI #MachineLearning #BusinessOptimization #LanguageModels #Prolog #AMPL #ProductivityBoost

Appendix 1: Naive Prolog program

Model Reply:

route(zuerich, baden, 22).
route(baden, basel, 83).
route(zuerich, rotkreuz, 39).
route(baden, olten, 43).
route(olten, bern, 67).
route(olten, luzern, 57).
route(olten, basel, 39).
route(rotkreuz, luzern, 18).
route(rotkreuz, schwyz, 26).
route(luzern, altdorf, 39).
route(schwyz, altdorf, 16).
connected(X, Y, D) :- route(X, Y, D); route(Y, X, D).
path(X, Y, Visited, [(X,Y,D)], D) :- connected(X, Y, D), \+ member((Y,_), Visited).
path(X, Y, Visited, [(X,Z,D1)|Path], D) :-
    connected(X, Z, D1),
    Z \= Y,
    \+ member((Z,_), Visited),
    path(Z, Y, [(Z,_,_)|Visited], Path, D2),
    D is D1 + D2.
route(X, Y, Route, Length) :- path(X, Y, [], Route, Length).

Now, you can find a route from zuerich to altdorf and calculate the total length of the trip by querying:

?- route(zuerich, altdorf, Route, Length).

This should return the correct route and length.

Appendix 2: Refined Prolog program

Model Reply:

route(zuerich, baden, 22).
route(baden, basel, 83).
route(zuerich, rotkreuz, 39).
route(baden, olten, 43).
route(olten, bern, 67).
route(olten, luzern, 57).
route(olten, basel, 39).
route(rotkreuz, luzern, 18).
route(rotkreuz, schwyz, 26).
route(luzern, altdorf, 39).
route(schwyz, altdorf, 16).

Then, we define a predicate connected/3 that checks if two places are directly connected:

connected(X, Y, D) :- route(X, Y, D).
connected(X, Y, D) :- route(Y, X, D).

Next, we define a predicate path/4 that finds a path between two places:

path(X, Y, Visited, [(X,Y,D)|Visited]) :-
    connected(X, Y, D),
    \+ member((X,_), Visited),
    \+ member((_,X), Visited).
path(X, Y, Visited, Path) :-
    connected(X, Z, D),
    Z \= Y,
    \+ member((X,_), Visited),
    \+ member((_,X), Visited),
    path(Z, Y, [(X,Z,D)|Visited], Path).

Finally, we define a predicate shortest_path/3 that finds the shortest path between two places:

shortest_path(X, Y, Path) :-
    findall(P, path(X, Y, [], P), Paths),
    shortest(Paths, Path).
shortest([P], P).
shortest([P|Ps], Shortest) :-
    shortest(Ps, S),
    path_length(P, L1),
    path_length(S, L2),
    (L1 < L2 -> Shortest = P ; Shortest = S).
path_length(Path, Length) :-
    findall(D, (member((_,_,D), Path)), Ds),
    sum_list(Ds, Length).

You can find the shortest path from Zuerich to Altdorf by querying shortest_path(zuerich, altdorf, Path). The program will return the shortest path and its length.

Appendix 3: AMPL Program

Model Reply:

# Define the set of nodes (places)
set NODES;
# Define the set of arcs (routes) with distances
set ARCS within {NODES, NODES};
param distance {ARCS};
# Define the source and destination nodes
param source symbolic;
param destination symbolic;
# Decision variable: x[i,j] is 1 if the route from i to j is part of the shortest path, 0 otherwise
var x {ARCS} binary;
# Objective: Minimize the total distance of the selected routes
minimize Total_Distance: sum {(i,j) in ARCS} distance[i,j] * x[i,j];
# Flow conservation constraints
s.t. Flow_Conservation {n in NODES}:
    (if n = source then 1 else 0) - (if n = destination then 1 else 0) =
    sum {(i,j) in ARCS: i = n} x[i,j] - sum {(i,j) in ARCS: j = n} x[i,j];

and data is defined as:

# List of nodes
set NODES := zuerich baden basel rotkreuz olten bern luzern schwyz altdorf;
# List of arcs with distances
param: ARCS: distance :=
    zuerich   baden     22
    baden     basel     83
    zuerich   rotkreuz  39
    baden     olten     43
    olten     bern      67
    olten     luzern    57
    olten     basel     39
    rotkreuz  luzern    18
    rotkreuz  schwyz    26
    luzern    altdorf   39
    schwyz    altdorf   16;
# Define the source and destination
param source := zuerich;
param destination := altdorf;

Originally published at https://www.linkedin.com.

Written by Konrad Hoppe

applied AI / strategy consultant / aspiring XC rider on weekends // http://www.konrad-hoppe.com
