Autonomous Mobile Robots (AMRs) are self-navigating robots that use sensors, cameras, and onboard intelligence to move through environments without direct human control. Unlike traditional automated guided vehicles (AGVs), AMRs can adapt to their surroundings, making real-time navigation decisions to avoid obstacles and optimize routes. AMRs are highly flexible and can be quickly reprogrammed for different tasks, making them ideal for dynamic environments like warehouses and distribution centers.
Driven by increasing demand in ecommerce and manufacturing industries, along with advances in AMR technology, adoption of AMRs in warehouse and logistics operations is expected to increase exponentially over the next decade. Specifically, AMRs have proven to be cost-effective in four main use cases:
Picking and Sorting: efficiently retrieving and sorting items in warehouses
Cross-Docking: transferring goods directly from receiving to shipping
Inventory Management: tracking stock levels and locations
Material Transport: moving materials between production lines or storage areas

For warehouse and logistics managers, the objective is to leverage AMRs to handle repetitive tasks with precision and efficiency, according to the robot's specific training. For example, once trained, an autonomous forklift can respond to prompts from the organization's enterprise resource planning (ERP) system to perform designated actions. When a truck arrives at the warehouse yard, the ERP triggers the AMR to initiate unloading. The AMRs then move to the yard, using advanced sensors and cameras to determine the most effective method for unloading pallets and placing them in the Goods Receipt Area for further processing.
Now, consider a scenario where the same AMR needs to be retrained for cross-docking tasks. In this case, if there is an outstanding outbound delivery pending for those incoming items, the robot would move goods directly from the Goods Receipt Area to the Shipping Area.
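To make this flow concrete, here is a minimal sketch of the kind of dispatch logic involved, written in Python. Every name in it (ErpEvent, AmrTask, the event types, and the dock and area identifiers) is an illustrative placeholder, not part of any real ERP or fleet-management interface.

```python
from dataclasses import dataclass

# Hypothetical ERP event and AMR task types; all names are illustrative.
@dataclass
class ErpEvent:
    event_type: str    # e.g. "TRUCK_ARRIVED" or "OUTBOUND_DELIVERY_OPEN"
    dock_door: str
    delivery_id: str

@dataclass
class AmrTask:
    action: str        # "UNLOAD" or "CROSS_DOCK"
    source: str
    destination: str

def dispatch(event: ErpEvent) -> AmrTask:
    """Map an ERP trigger to an AMR task, mirroring the flows described above."""
    if event.event_type == "TRUCK_ARRIVED":
        # Default flow: unload pallets into the Goods Receipt Area.
        return AmrTask("UNLOAD", source=event.dock_door,
                       destination="GOODS_RECEIPT_AREA")
    if event.event_type == "OUTBOUND_DELIVERY_OPEN":
        # Cross-docking flow: move goods straight from receiving to shipping.
        return AmrTask("CROSS_DOCK", source="GOODS_RECEIPT_AREA",
                       destination="SHIPPING_AREA")
    raise ValueError(f"Unhandled ERP event: {event.event_type}")

print(dispatch(ErpEvent("TRUCK_ARRIVED", dock_door="DOCK_03", delivery_id="80001234")))
```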
Understanding Large Language Models
To better understand how AMRs are trained, it may be helpful to first explore the basics of a Large Language Model (LLM). LLMs are a type of generative artificial intelligence (Gen AI) that can create original text-based content. At its core, an LLM is powered by a complex network of nodes known as a neural network, in which the connections between nodes carry weights that can take on any real value, not just values between 0 and 1. These networks process vast amounts of data, breaking text down into smaller units called tokens, each of which is assigned a unique numeric representation. These tokens are then mapped to multi-dimensional vectors, known as embeddings, allowing the model to recognize and interpret relationships between words and concepts.
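As a toy illustration of these ideas, the sketch below maps words to token IDs and token IDs to embedding vectors. Real LLMs use subword tokenizers and learned embedding matrices with hundreds or thousands of dimensions, so the vocabulary and vector values here are entirely made up.

```python
# A toy vocabulary mapping each word to a numeric token ID. Real tokenizers
# operate on subwords and have vocabularies of tens of thousands of entries.
vocabulary = {"move": 0, "pallet": 1, "to": 2, "shipping": 3, "area": 4}

def tokenize(text: str) -> list[int]:
    """Map each word to its token ID (word-level, for illustration only)."""
    return [vocabulary[word] for word in text.lower().split()]

# A tiny embedding table: one 4-dimensional vector per token ID.
embedding_table = [
    [0.12, -0.40, 0.33, 0.05],   # "move"
    [0.81, 0.07, -0.22, 0.64],   # "pallet"
    [-0.05, 0.30, 0.11, -0.09],  # "to"
    [0.44, -0.18, 0.70, 0.21],   # "shipping"
    [0.39, -0.25, 0.66, 0.19],   # "area"
]

token_ids = tokenize("move pallet to shipping area")
vectors = [embedding_table[i] for i in token_ids]
print(token_ids)   # [0, 1, 2, 3, 4]
print(vectors[0])  # the embedding vector for "move"
```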
Without delving too deeply into technical details, it is essential to recognize that an LLM is primarily a tool for generating content based on a given prompt. Consistency in its responses is achieved by training on extensive and diverse datasets, while high-performance computing resources, typically graphics processing units (GPUs) or even more specialized processors like tensor processing units (TPUs), provide the computational power required for both training and operation.
LLMs excel at interpreting natural language prompts, enabling them to respond to human-written instructions and complete tasks in ways that resemble human reasoning, albeit based purely on learned patterns. The text input that guides the LLM's response is called a prompt, and the memory space available for processing this prompt is known as the context window. The size of the context window is measured in tokens (not words) and varies widely between models, from a few thousand tokens in earlier models to a hundred thousand or more in newer ones, enabling the model to handle complex instructions and context.
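A rough sketch of what respecting the context window looks like in practice follows. The 1.3-tokens-per-word heuristic is a common rule of thumb that varies by tokenizer, and the 4,096-token window is simply an assumed default.

```python
# Context windows are measured in tokens, not words. A rough heuristic of
# ~1.3 tokens per English word is assumed here; real counts vary by tokenizer.
def estimate_tokens(text: str) -> int:
    return int(len(text.split()) * 1.3)

def fits_in_context(prompt: str, completion_budget: int,
                    context_window: int = 4096) -> bool:
    """Reserve room for the model's completion as well as the prompt itself."""
    return estimate_tokens(prompt) + completion_budget <= context_window

prompt = "List the open outbound deliveries for dock door DOCK_03."
print(fits_in_context(prompt, completion_budget=256))  # True
```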
LLMs are a vast topic with many layers of complexity, but for our purposes we will examine how solution architects are harnessing them to enhance AMR operations in warehousing, focusing on the example of training an AMR to perform different tasks within the warehouse. In this case, the model is trained to recognize open deliveries within the ERP system and to dynamically direct the AMR to pick up material pallets from the Goods Receipt Area and transport them to the Shipping Area. This allows for efficient loading onto outbound trucks, ensuring timely delivery to customers.
In the following discussion, we will focus solely on adapting and aligning the model, fine-tuning it primarily with Parameter-Efficient Fine-Tuning (PEFT) using soft prompts, one of the most widely applied methodologies. We will exclude the evaluation aspect that assesses the model's learning outcomes, and we assume that application integration is already in place, as exploring its details would add significant scope to our discussion.
Figure 1: Typical Generative AI project lifecycle.
The Generative AI project lifecycle
The Generative AI project lifecycle involves four key stages, as shown in Figure 1. In our example, we focus on retraining the model to adapt to a new feature, specifically within the Adapt, Align, and Evaluate stage. Here, the model is refined through prompt engineering and Parameter-Efficient Fine-Tuning (PEFT) with soft prompts, which update a limited number of parameters to efficiently adapt the model. This method is particularly valuable for models already trained and integrated with APIs; PEFT enables rapid fine-tuning to incorporate new features while preserving the model’s previous training. Through human feedback, the model is aligned for relevance, accuracy, and ethical considerations, and its performance is rigorously evaluated.
Figure 2: An overview of the time and effort involved in the five phases of LLM training.
Timeline for LLM training
As Figure 2 illustrates, the first of the five phases involved in training an LLM is the longest, typically taking weeks to months. Once that is complete, the next four phases move relatively quickly. For our example, we will look at the first three phases below, with particular attention to the PEFT approach to fine-tuning.
Pre-Training

Pre-training a large language model is a substantial undertaking. This stage is the most complex in model development due to the intricate architecture design, the vast amounts of data required, and the specialized expertise involved. However, most development work today begins with pre-trained foundation models, allowing you to skip this initial stage. When working with a foundation model, you will typically start by assessing the model's performance through prompt engineering, a process that requires less technical expertise and doesn't involve retraining the model.

Prompt Engineering
The input text provided to the model is called the prompt, the process of generating text is known as inference, and the model's response is referred to as the completion. As noted earlier, the memory available for processing the input is the context window. In our example, the model performs well in responding to an unloading task, but in practical scenarios, you might need it to adjust its behavior to perform tasks like cross-docking pallets when specific conditions are met.
The model may not produce the desired outcome on the first try, so you may need to refine the language or structure of your prompt. This iterative process, known as prompt engineering, involves experimenting with different prompt formats until the model behaves as intended. While prompt engineering is a complex field, a powerful strategy to improve model responses is to embed examples of the target task directly within the prompt, helping guide the model toward the desired output.
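The sketch below shows this strategy, often called few-shot prompting, applied to our warehouse scenario. The example situations and the response format are invented for illustration.

```python
# Few-shot prompting: embed worked examples of the task directly in the prompt.
# The situations and the response format are invented for illustration.
EXAMPLES = [
    ("Truck at DOCK_01; no open outbound delivery for its items.",
     "ACTION: UNLOAD -> GOODS_RECEIPT_AREA"),
    ("Truck at DOCK_02; outbound delivery 80001234 is open for its items.",
     "ACTION: CROSS_DOCK -> SHIPPING_AREA"),
]

def build_prompt(situation: str) -> str:
    """Prepend the examples so the model can infer the expected behavior."""
    shots = "\n\n".join(f"Situation: {inp}\nResponse: {out}"
                        for inp, out in EXAMPLES)
    return f"{shots}\n\nSituation: {situation}\nResponse:"

print(build_prompt("Truck at DOCK_03; outbound delivery 80005678 is open."))
```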
Prompt Tuning and Multitask Fine-Tuning

Multitask fine-tuning extends beyond traditional single-task fine-tuning by training a model on a diverse dataset with examples for multiple tasks. This dataset includes input-output pairs for various tasks, such as summarization, sentiment analysis, code translation, and entity recognition. By training on this mixed dataset, the model learns to perform multiple tasks simultaneously, mitigating the issue of catastrophic forgetting—where a model loses previously learned information when trained on new tasks. Over many training epochs, the model's weights are updated based on the calculated losses across examples, resulting in an instruction-tuned model capable of excelling in multiple tasks concurrently.
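A multitask dataset of this kind might be assembled as follows. The records are invented, warehouse-flavored stand-ins for the summarization, entity-recognition, and task-routing examples described above.

```python
import random

# A mixed multitask dataset: instruction-style input-output pairs spanning
# several task types. The records are invented for illustration.
multitask_dataset = [
    {"task": "summarization",
     "input": "Summarize: Truck 7 delivered 12 pallets of medical devices...",
     "output": "Truck 7 delivered 12 pallets; all moved to Goods Receipt."},
    {"task": "entity_recognition",
     "input": "Extract the delivery ID: 'Outbound delivery 80001234 is open.'",
     "output": "80001234"},
    {"task": "task_routing",
     "input": "Delivery 80001234 is open for the items on DOCK_02. Next action?",
     "output": "CROSS_DOCK: GOODS_RECEIPT_AREA -> SHIPPING_AREA"},
]

# Shuffling interleaves tasks so each training batch mixes task types,
# rather than training on one task at a time and forgetting the others.
random.shuffle(multitask_dataset)
```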
A prominent example of this approach is the FLAN (Fine-tuned Language Net) family of models. FLAN is a collection of multitask fine-tuning instructions applied to different models, with the fine-tuning process serving as the final stage of training. In the original FLAN paper, the authors liken fine-tuning to a “dessert” following the “main course” of pre-training—an apt metaphor highlighting fine-tuning as the final refinement step that enhances the model’s adaptability across tasks.
Parameter-Efficient Fine-Tuning (PEFT)

For large models with billions of parameters, the risk of catastrophic forgetting is significant, making Parameter-Efficient Fine-Tuning (PEFT) an optimal approach. PEFT techniques minimize the need to retrain all parameters, thereby preserving previously learned knowledge while fine-tuning for specific tasks.
In this example, we will employ PEFT methods to fine-tune the model. Within PEFT there are also additive methods, which aim to improve model performance without changing the original weights at all. One such method is prompt tuning, which sounds similar to prompt engineering but is quite different.
In prompt engineering, you work on the language of your prompt to get the completion you want. This could be as simple as trying different words or phrases, or more complex, such as including examples for one-shot or few-shot inference. The goal is to help the model understand the nature of the task you are asking it to carry out and to generate a better completion. However, prompt engineering has limitations: it can require a lot of manual effort to write and try different prompts, you are limited by the length of the context window, and in the end, you may still not achieve the performance you need for your task.
With prompt tuning, you add additional trainable tokens to your prompt and leave it up to the supervised learning process to determine their optimal values. The set of trainable tokens is called a soft prompt, and it gets prepended to embedding vectors that represent your input text.
Figure 3: Parameter-Efficient Fine-Tuning using soft prompts.
The soft prompt vectors have the same length as the embedding vectors of the language tokens, and a soft prompt of between 20 and 100 virtual tokens is often sufficient for good performance.
The tokens that represent natural language are hard in the sense that they each correspond to a fixed location in the embedding vector space. However, the soft prompts are not fixed, discrete words of natural language. Instead, you can think of them as virtual tokens that can take on any value within the continuous multidimensional embedding space. Through supervised learning, the model learns the values for these virtual tokens that maximize performance for a given task.
In full fine-tuning, the training dataset consists of input prompts and output completions or labels, and the weights of the LLM are updated during supervised learning.
With prompt tuning, in contrast, the weights of the LLM are frozen, and the underlying model does not get updated. Instead, the embedding vectors of the soft prompt are updated over time to optimize the model's completion of the prompt.
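The sketch below shows the core mechanism in PyTorch, assuming a 768-dimensional embedding space and 20 virtual tokens: the soft prompt is the only trainable parameter, and it is simply prepended to the embedded input before the frozen model sees it.

```python
import torch
import torch.nn as nn

EMBED_DIM, NUM_VIRTUAL_TOKENS = 768, 20  # assumed sizes, for illustration

class SoftPrompt(nn.Module):
    def __init__(self, num_virtual_tokens: int, embed_dim: int):
        super().__init__()
        # Trainable virtual tokens living in the continuous embedding space.
        self.prompt = nn.Parameter(torch.randn(num_virtual_tokens, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # Prepend the soft prompt to the embedded input text:
        # (batch, seq, dim) -> (batch, virtual_tokens + seq, dim)
        batch_size = input_embeds.size(0)
        expanded = self.prompt.unsqueeze(0).expand(batch_size, -1, -1)
        return torch.cat([expanded, input_embeds], dim=1)

soft_prompt = SoftPrompt(NUM_VIRTUAL_TOKENS, EMBED_DIM)
input_embeds = torch.randn(4, 12, EMBED_DIM)   # stand-in for embedded prompt text
print(soft_prompt(input_embeds).shape)         # torch.Size([4, 32, 768])

# Only the soft prompt's parameters go to the optimizer; the LLM stays frozen.
optimizer = torch.optim.AdamW(soft_prompt.parameters(), lr=3e-3)
```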
Prompt tuning is a parameter-efficient strategy that involves training a small number of additional parameters, making it significantly less resource-intensive than full fine-tuning, which may involve modifying millions to billions of parameters. Like LoRA (Low-Rank Adaptation), prompt tuning falls under the umbrella of parameter-efficient fine-tuning (PEFT) methods. However, prompt tuning can offer more flexibility than LoRA because it adds new parameters tailored for specific tasks rather than re-parametrizing an existing fixed set. With prompt tuning, you can create separate soft prompts for each task, enabling efficient switching between tasks at inference without modifying the underlying model.
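In practice, this is often done with an off-the-shelf toolkit rather than from scratch. The sketch below uses the Hugging Face transformers and peft libraries, with the small gpt2 model standing in for a production model; only the soft-prompt parameters end up trainable.

```python
from transformers import AutoModelForCausalLM
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

# gpt2 is a small stand-in; a production system would use a larger base model.
base_model = AutoModelForCausalLM.from_pretrained("gpt2")

config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.RANDOM,  # soft prompt starts random
    num_virtual_tokens=20,                       # within the 20-100 range above
)

model = get_peft_model(base_model, config)
# Prints the tiny trainable fraction, e.g. 20 x 768 = 15,360 trainable
# parameters against roughly 124 million frozen ones for gpt2.
model.print_trainable_parameters()
```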
You can train one set of soft prompts for one task and a different set for another. To use them for inference, you prepend your input prompt with the learned tokens; to switch to another task, you simply change the soft prompt. Because soft prompts are very small and take up little disk space, this kind of fine-tuning is extremely efficient and flexible.
In the example above, notice that the same LLM is used for all tasks, since you only have to switch out the soft prompts at the time of inference.
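A sketch of that switching logic follows; random tensors stand in for the learned per-task soft prompts, and the task names are illustrative.

```python
import torch

EMBED_DIM, NUM_VIRTUAL_TOKENS = 768, 20

# One small soft prompt per task; random tensors stand in for learned values.
# Each is only 20 x 768 float32 values -- roughly 60 KB on disk.
task_prompts = {
    "unloading":     torch.randn(NUM_VIRTUAL_TOKENS, EMBED_DIM),
    "cross_docking": torch.randn(NUM_VIRTUAL_TOKENS, EMBED_DIM),
}

def embeds_for_task(task: str, input_embeds: torch.Tensor) -> torch.Tensor:
    """Prepend the chosen task's soft prompt; the underlying LLM never changes."""
    prompt = task_prompts[task].unsqueeze(0).expand(input_embeds.size(0), -1, -1)
    return torch.cat([prompt, input_embeds], dim=1)

# Switching tasks is just a dictionary lookup -- no model reload required.
batch = torch.randn(1, 12, EMBED_DIM)
print(embeds_for_task("cross_docking", batch).shape)  # torch.Size([1, 32, 768])
```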
Figure 4: Performance of the PEFT compared to other Fine-tuning Methods.
How well does prompt tuning perform? In the original paper describing the method, "The Power of Scale for Parameter-Efficient Prompt Tuning" by Brian Lester and his collaborators at Google, the authors compared prompt tuning to several other methods across a range of model sizes. In Figure 4, model size is on the x-axis and the SuperGLUE score on the y-axis. (General Language Understanding Evaluation [GLUE] refers to the evaluation of language model performance across an array of natural language understanding [NLU] tasks; SuperGLUE includes evaluations for more complex reasoning and generative tasks, as well as benchmarks for models competing with human performance.) The red line shows the scores for models that were created through full fine-tuning on a single task, while the orange line shows the score for models created using multitask fine-tuning. The green line shows the performance of prompt tuning, and the blue line shows scores for prompt engineering only.
As we can see, prompt tuning does not perform as well as full fine-tuning for smaller LLMs. However, as the model size increases, so does the performance of prompt tuning—and once models have around 10 billion parameters, prompt tuning can be as effective as full fine-tuning, and offer a significant boost in performance as compared to prompt engineering alone.
Final steps for integrating a retrained model with an AMR forklift
With the model fine-tuned using Parameter-Efficient Fine-Tuning (PEFT), we are nearly ready to integrate it with AMRs, using application programming interface (API) connections to streamline the cross-docking process. This integration enables the model to instruct AMRs to efficiently move pallets from the Goods Receipt Area directly to the Shipping Area, meeting real-time logistics needs. By leveraging the fine-tuned model’s specialized understanding, the AMRs can perform cross-docking with improved accuracy and responsiveness, adapting dynamically to varied demands and optimizing workflow efficiency in the warehouse.
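As a hedged sketch of what such an API connection might look like, the code below posts a cross-docking task to a fleet controller. The endpoint URL, payload fields, and response handling are hypothetical placeholders, not a real vendor interface.

```python
import json
import urllib.request

AMR_FLEET_URL = "https://amr-fleet.example.com/api/v1/tasks"  # placeholder URL

def dispatch_cross_dock(delivery_id: str, model_completion: str) -> None:
    """Forward the fine-tuned model's instruction to the AMR fleet controller."""
    payload = {
        "action": "CROSS_DOCK",
        "source": "GOODS_RECEIPT_AREA",
        "destination": "SHIPPING_AREA",
        "delivery_id": delivery_id,
        "instruction": model_completion,  # kept for traceability and auditing
    }
    request = urllib.request.Request(
        AMR_FLEET_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        print("Fleet controller replied:", response.status)
```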
Integrating the fine-tuned model with AMRs brings significant operational advantages. First, the model’s precise instructions ensure that pallets are transferred swiftly and accurately, reducing manual handling and minimizing potential errors in the cross-docking process. This streamlined workflow accelerates order fulfillment and improves resource allocation by reducing idle time for both robots and human operators.
By dynamically adapting to fluctuating demands, the system enhances flexibility in the warehouse, enabling more responsive handling of peak times and urgent orders. Additionally, real-time integration between the model and AMRs facilitates better inventory management, ensuring that goods move efficiently through the warehouse without unnecessary storage or delay.
Ultimately, this advanced automation reduces labor costs, optimizes floor space usage, and boosts overall productivity, giving the business a competitive edge in fulfilling customer demands with speed and precision.
Author Bio
Ashutosh Nagar is Solution Architect, Digital Transformation, for Mygo Consulting, Inc., a global SAP partner company focused on digital Supply Chain and Business Transformation and on enabling the core around SAP S/4HANA. As a solution architect and global supply chain consultant with nearly 25 years of experience, Mr. Nagar has led digital transformations for some of the world's "top 100" companies. His distinguished career has included working across innovative technologies, including Artificial Intelligence and Blockchain, for industry-specific business models in Avionics, Aerospace & Defense, Automobile, Engineering, Medical Devices, Pharmaceutical, Food Processing, and Infrastructure & Construction, specializing in Warehousing & Distribution and Warehouse Automation, among other diverse sectors. He is SAP Certified in Extended Warehouse Management, Warehouse Management, Material Management, and Transport Management. Additionally, as an APICS Certified Supply Chain Professional, he has led Supply Chain designs and complex end-to-end project implementations in the U.S., India, Australia, China, the Philippines, Singapore, the United Kingdom, Switzerland, and Germany. Mr. Nagar has special expertise in developing and launching ERP systems to optimize Supply Chain processes integrating Plan, Source, Make, and Deliver. He successfully leads cross-cultural teams with an analytical approach to meet his clients' needs and key performance indicators. Mr. Nagar received his M.B.A. degree from Jiwaji University, Gwalior, India, and earned a Bachelor of Engineering degree in Electronics from Savitribai Phule University, Pune, India.