Observations from training geospatial AI agents

Tariro Mashongamhende
Jun 1

We have built the Eikon system to work as a dataset, a set of tools and models, and an AI agent system all in one. This experience has taught us a few things about the trade-offs needed to build a reliable and usable system.

This piece focuses on the training of the Eikon AI agent. It covers why we feel we need to build our own agent(s), how we think about building them, and whether model size, scale or type makes any difference for what we do.

What an AI agent is can mean different things to different people. Interpretations range from ideas from the late 80s, in works like Society of Mind, to more modern views which see an AI agent as a Large Language Model that is able to take actions in the real (read: digital) world using tools. In general we take inspiration from both views. We think that the model itself must have some ability to understand the current state of its world, as well as to plan and execute actions to meet its goals. Perhaps where we differ from some more limited interpretations is that we do not see the scope of these actions as bound solely to the digital world (code). As a geospatial AI company there is an intrinsic requirement for our systems to understand the real world, as any recommendations they make will also impact the real world. This is both helpful and unhelpful depending on the scope of your definition. For those who see an AI agent as something focused on using computers, this requirement dramatically changes the number of things the system needs to be able to do. For example, web search is unhelpful when the thing being searched for is not easily available or accessible on the open web. For us, this requirement to interact with some representation of reality is helpful, as it focuses our work on building systems that are concerned with reality and the world of atoms rather than the world of bits.

In order for our system to understand the current state of the world, it needs to have developed some overall understanding of what the world is (we are particularly influenced by the ideas in Kevin Lynch’s Image of the City). LLMs have some version of this. However, what they know and what they don’t know is still a guessing game for most people, including some of the people building them at frontier AI labs. Thus, we need a mechanism to add this understanding after the fact to any LLM we use. This immediately raises a problem. Proprietary models are by their very nature opaque to their users and, as mentioned, sometimes also to their creators. In addition, they typically offer limited opportunity for customisation and adjustment. This means that in order to build a system with the understanding of the world that we desire, we must build on top of systems that allow for adjustment and changes, which naturally points us towards open-source/open-weight alternatives. There are many open-source/open-weight models, and these can be either self-hosted or borrowed by using inference providers (check out Doubleword, they're a good one). Either option has costs, but both are generally cheaper than proprietary models over the long term or for large-scale processing. However, open models are typically associated with poorer instruction following and less reliable behaviour (particularly for smaller models), and they offer limited ability for customisation when hosted through a vanilla inference provider. Returning to our guiding constraint, the need to give our models an understanding of the world means that inference providers are also unlikely to offer us sufficient means to do this easily. So the remaining option is to use open-source models as a foundation.

We have a number of different ideas about how to add an understanding of the world, or of a specific part of the world, into an AI system. Most of these are not trade secrets: it can be done through some combination of knowledge system integration (through approaches like RAG), fine-tuning and reinforcement learning. We use a combination of these to make the open-source models we start with achieve the behaviour we want for our system. We assess our AI system across three main types of tasks:

1) Planning

2) Execution

3) Self-evaluation

We find these components are sufficient to make a well-functioning AI agent system regardless of model size, architecture or reasoning capabilities. Making a good planning system is relatively easy, as this is where we see the best instruction following, whether in form or in function. Self-evaluation is about twice as hard as planning. The main reason for this is that self-evaluation requires contrastive thought, i.e. identifying related and unrelated things, as well as inductive and deductive reasoning to form valid conclusions from premises. Reasoning models naturally do better at this task than non-reasoning models, but not without paying a penalty in the number of tokens required to reach the desired answer. Where decisions boil down to some form of classification, this can feel expensive in terms of both tokens and wall time. As such, we take steps to achieve outputs comparable to those of reasoning models without the token and wall time penalty associated with their use. One by-product of this is a reduction in the model’s overall generalisability. However, in this case the task is sufficiently constrained that the classification-based decisions can still be made. Lastly, the LLMs we have tested tend to perform most poorly on execution, particularly on tasks that require multiple sequential steps. This observation isn’t novel, but that it is 5 times more difficult for a model to produce the desired output steps was a surprise to us, as the ability to take actions is an important, if not defining, characteristic of an AI agent.
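To make the three task types concrete, here is a minimal sketch of how planning, execution and self-evaluation could fit together in a single loop. It is an illustration only and not the Eikon implementation: the names `run_agent`, `StepResult`, `toy_plan`, `toy_execute` and `toy_evaluate` are hypothetical, and the toy functions stand in for calls to a model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    step: str       # the planned step that was attempted
    output: str     # what the execution produced
    accepted: bool  # verdict of the self-evaluation


def run_agent(
    goal: str,
    plan: Callable[[str], List[str]],
    execute: Callable[[str], str],
    evaluate: Callable[[str, str], bool],
    max_retries: int = 1,
) -> List[StepResult]:
    """Plan once, then execute and self-evaluate each step in order."""
    results: List[StepResult] = []
    for step in plan(goal):
        # Try the step, retrying up to max_retries times if it is rejected.
        for _ in range(max_retries + 1):
            output = execute(step)
            accepted = evaluate(step, output)
            if accepted:
                break
        results.append(StepResult(step, output, accepted))
        if not accepted:
            # Later steps are assumed to depend on this one, so stop here.
            break
    return results


# Hypothetical stand-ins for model calls (assumptions for illustration).
def toy_plan(goal: str) -> List[str]:
    return [f"locate {goal}", f"summarise {goal}"]


def toy_execute(step: str) -> str:
    # Pretend the second step always fails to produce output.
    return "" if step.startswith("summarise") else f"done: {step}"


def toy_evaluate(step: str, output: str) -> bool:
    # A deliberately simple check: any non-empty output is accepted.
    return bool(output)


results = run_agent("flood risk", toy_plan, toy_execute, toy_evaluate)
for r in results:
    print(r.step, "accepted" if r.accepted else "rejected")
```

In this sketch the self-evaluation is a simple accept/reject classification, which is the kind of decision described above as not needing a full reasoning model.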

So what are the implications of these observations? First, AI agents that need to interact reliably with real-world systems must be able to execute multiple steps accurately. In areas which have strict requirements, those very same requirements are more likely to result in poorer outcomes than letting a model define its own approach to solving a problem. This is because approaches proposed by models may have built-in redundancy: they are less strict on requirements, so how the model gets there is less important than that it eventually gets there. So why don’t we just let AI agents figure out how to solve issues with no frameworks or strict requirements? The answer is probably more to do with the cost of such an approach, i.e. the number of tokens it consumes. Users of AI systems have no doubt noticed that AI models that use tools tend to use them at a much higher rate than a human would. We observe this across a range of models, from frontier to open-source and from large to small LLMs. Put another way, LLM-based AI agents are not frugal when it comes to using tools and taking actions to solve problems. Therefore, if models decide their own strategies for achieving an objective and they are predisposed to be profligate with tools along the way, it is no surprise that the people paying for these systems may eventually balk at the cost. One approach is to improve the instruction following of such a system but, as the reader will surely have noticed by now, this is not costless. Our chosen approach is instead to make our models extremely frugal: we curb the pattern of repeated tool use that yields no additional information, and we minimise the number of execution steps required to complete a task. This allows us to achieve the tasks our users want without burning through their budgets. Yes, this means it costs less to achieve similar tasks using our AI agent than a larger, more frontier system, and as a result our users get more bang for their buck using Eikon.
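One simple way to picture this kind of frugality is a guard around tool calls that skips exact repeats and stops once calls stop returning anything new. The sketch below is an assumption-laden illustration, not how Eikon enforces it: `frugal_tool_loop`, `max_calls`, `patience` and the fake tool are all hypothetical names, and "information gain" is approximated crudely as "the call returned at least one fact we had not seen".

```python
from typing import Callable, Iterable, List, Set, Tuple


def frugal_tool_loop(
    calls: Iterable[str],
    run_tool: Callable[[str], List[str]],
    max_calls: int = 5,
    patience: int = 1,
) -> Tuple[Set[str], int]:
    """Run proposed tool calls, stopping early when they stop paying off.

    Returns the set of facts gathered and the number of calls actually made.
    """
    known: Set[str] = set()   # facts gathered so far
    issued: Set[str] = set()  # calls already made
    used = 0                  # calls spent against the budget
    stale = 0                 # consecutive calls that added nothing new
    for call in calls:
        if used >= max_calls or stale >= patience:
            break
        if call in issued:
            # An identical call cannot yield new information: skip it for free.
            continue
        issued.add(call)
        facts = set(run_tool(call))
        used += 1
        new_facts = facts - known
        if new_facts:
            known |= new_facts
            stale = 0
        else:
            stale += 1
    return known, used


# Hypothetical tool and call sequence (assumptions for illustration).
FAKE_RESULTS = {
    "search:roads": ["A1"],
    "search:rivers": ["R1"],
    "search:roads-again": ["A1"],  # rephrased query, same answer
    "search:bridges": ["B1"],
}
proposed = [
    "search:roads",
    "search:roads",
    "search:rivers",
    "search:roads-again",
    "search:bridges",
]
known, used = frugal_tool_loop(proposed, lambda c: FAKE_RESULTS[c])
print(sorted(known), used)
```

With `patience=1` the loop stops after the first call that adds nothing, so the final proposed call is never made. A larger `patience` trades extra tokens for a lower risk of stopping too early.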