EL-ALCANCE

Robotic Feeding Assistant for Quadriplegic Individuals in LATAM

Francesco Crivelli1, Bryan Sow1, Claire Beaudin1, Boris Tomov1, Professor Roberto Horowitz2
1University of California Berkeley, Tukuypaj Chile, Recursive at Berkeley
Figure 1: Robot feeding demonstration with Francesco

Abstract

We present a novel low-cost assistive robotic feeding system developed in collaboration between UC Berkeley's Recursive Pioneers and Tukuypaj, a Chilean non-profit organization serving individuals with severe motor disabilities. We learnt that the organization is often short-staffed at mealtimes, with too few personnel available to feed its quadriplegic community. Our primary project objective was therefore to build a robotic system that can reach for food and feed quadriplegic beneficiaries. We aspired to create a system that grants the quadriplegic community greater self-sufficiency, while devising a solution that can be replicated at scale. Our work demonstrates that high-performance feeding assistance can be achieved using affordable hardware, specifically the SO-ARM100 robotic arm platform ($110 per arm), which we enhanced through custom modifications and advanced control algorithms. Prior to system development, we conducted extensive interviews with Tukuypaj's beneficiaries and caregivers, gathering crucial insights about the challenges and requirements of assisted feeding that directly informed our technical specifications and design choices. Our technical implementation leverages the LeRobot framework for data collection and an Action Chunking with Transformers (ACT) architecture trained on demonstration data, exploring the feasibility of combining imitation learning with affordable hardware to create accessible robotic solutions. Initial development focused on establishing reliable control mechanisms and movement patterns using the LeRobot framework's tools for robot control and training, with an emphasis on system stability and movement precision while preserving the low-cost advantage of the SO-ARM100 platform. Through systematic testing in controlled environments, we are laying the groundwork for future real-world applications and taking an important step toward making assistive robotics more accessible through cost-effective solutions. By combining affordable hardware with sophisticated control frameworks, we demonstrate the potential for bringing advanced robotic assistance to resource-constrained settings, setting the stage for future field testing and deployment in therapeutic environments across Tukuypaj's centers and similar care facilities worldwide.

Figure 2: Time lapse of team setup and development process

Design Ideology: Low-Cost and Scalable Robotics to Tackle Real-World Problems

The first steps of our project involved working through the design process: conducting interviews to gather data on the problem and to gain insight into the target users we were designing for. Two key insights about the beneficiaries emerged from the interview process:

  • Most of the quadriplegic interviewees are non-vocal and rely heavily on facial expressions to communicate. For instance, when a caretaker picks up a certain food item, the beneficiary smiles to indicate whether or not they would like it.
  • We learnt that their main diet consists of chopped-up food that requires the use of a spoon. One caretaker kindly recorded a video of the feeding process; studying it taught us the motion paths our robotic arm would need to replicate to accomplish the task.

The desired functionality of our system involves creating path trajectories for the robot to bring food to the beneficiary's mouth, while letting the beneficiary trigger the movement on their own. Our main design criteria were a simplified system that is easy to implement, robust in reproducibility and reliability, and close enough to the usual feeding routine to preserve a sense of familiarity and ease adoption. The design we chose aligns with this philosophy.

We first considered constrained situations where we have fixed types of food and fixed relative positions between the food item and the beneficiary. We knew that when our robotic system would be deployed, it would need to perform its function in various environments and situations. Furthermore, we also wanted direct human input over the path trajectories of the robot arm, such that the caretakers can directly dictate how the robot brings the food to the beneficiary and personalize the trajectories for each of them. To fulfill these conditions, we thought it was best for our system to employ a path planning setup where we had two robot arms – one which allowed a caretaker to manually create the path trajectory, and the other arm replicating and storing the motion. We knew that creating and obtaining hardware for two arms was much more difficult than for one, but we felt that this design choice would allow for the path planning element of the project to be easier to implement, and allow caretakers greater influence over the movement of the robot arms as they interact with the beneficiaries in the physical world.

Secondly, we also wanted to create a bridge through which the beneficiaries can communicate directly with the robot system. Since they mainly communicate through facial expressions, we knew that incorporating computer vision to read those expressions was crucial to our design. Choosing a communication mechanism was difficult because, in reality, every beneficiary expresses their needs through facial expressions differently (some smile while others chuff), so we decided it was easiest to standardize the signal. We settled on having the robot system react to the beneficiary opening their mouth as the cue to bring the food to them.

In tandem, the overall flow of the use of the robot system was intended as follows: the caretakers can manually create the path trajectories, and the beneficiaries can execute the movement of the arms along these trajectories by opening their mouths.

We also aspired to incorporate a second tier into our design philosophy, considering dynamic situations involving different types of food, different positions of the food relative to the beneficiary, varied beneficiary interactions with the robot system, and so on. We wanted to create a form of general intelligence where the robot would "know" how to bring the food item to the beneficiary regardless of the situation presented to it. Therefore, we wanted to add a learning component to the project. By creating various scenarios and using the robot's leader-follower design to demonstrate how the food should be brought to the beneficiary, we hoped to apply imitation learning, where the robot learns the intended action for a given scenario and develops a policy that enables it to autonomously create pathways and bring the food item to the beneficiary.

We felt that the design choices we made fulfilled our objectives: a simplified design that was efficient to implement and highly reproducible in executing what we wanted the system to do for the beneficiaries. We tackled the design philosophy on both the constrained and the dynamic fronts, at their varying levels of difficulty, while keeping user-friendliness our top priority and aiming for a durable system. We were unsure how robust this system would be at executing the task in the real world, but felt it was a good starting point to build on.

Luis - End User
Figure 3: Hardware demonstration of SO-ARM robot
Figure 4: 3D printing process of robot components
Building the Hardware

Implementation

Journal of Building the SO-ARM100

We spent a considerable amount of time choosing a design for the robotic arm. After careful review, we selected a few key constraints we wanted the robot to satisfy:

  • Low cost and replicable: We wanted an accessible design that could be easily reproduced. If we were to make a successful product, we would want to be able to deploy it easily.
  • Safe to use: Safety was a huge priority. The robots we are designing are going to act autonomously, operating near the user's face.
  • Easily adaptable: We wanted the robot to work in various environments rather than requiring one fixed setup to function properly. Ease of maintenance also became an important factor.

Selection and Building Process

Due to the time constraints of the class, we decided it was best to use an open-source robot found online. We eventually settled on the SO-ARM100, as it satisfied all of the requirements outlined above and fit the overall design philosophy we wanted to employ. The SO-ARM100 uses a teleoperation setup involving two arms: a leader arm on which a human operator manually creates the intended path trajectories, and a follower arm that copies the created motion synchronously (TheRobotStudio et al., 2024).

Follower arm and leader arm

The SO-ARM100 uses STS3215 servos, which have metal gears and magnetic encoders. With these encoders and the LeRobot codebase (run in a Linux environment) used to operate the SO-ARM100, the joint state, i.e. the rotation angle of each servo over time, could be easily recorded, stored, and replayed. This step was crucial for accurately reproducing real-world scenarios and aligning the robot's behavior with the actions and intentions of the caregiver. A minimal record-and-replay sketch is shown below.
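The following is a minimal sketch of the record-and-replay idea, assuming a hypothetical `bus` object exposing `read_positions()` and `write_positions()` for the six daisy-chained servos; in practice this is handled by the LeRobot tooling, so the snippet is illustrative rather than our exact implementation.

```python
# Minimal record-and-replay sketch (illustrative only).
# Assumes a hypothetical `bus` object with read_positions() -> list[int] and
# write_positions(list[int]); the real project relies on the LeRobot codebase.
import time

FPS = 30  # polling rate used when recording joint states

def record_trajectory(bus, duration_s=10.0):
    """Poll the servo encoders at a fixed rate, storing timestamped joint states."""
    trajectory = []
    start = time.time()
    while time.time() - start < duration_s:
        trajectory.append({"t": time.time() - start,
                           "positions": bus.read_positions()})
        time.sleep(1.0 / FPS)
    return trajectory

def replay_trajectory(bus, trajectory):
    """Stream the stored joint targets back to the follower arm at the recorded timing."""
    start = time.time()
    for frame in trajectory:
        while time.time() - start < frame["t"]:
            time.sleep(0.001)
        bus.write_positions(frame["positions"])
```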

3D Printing and Assembly

We printed the SO-ARM100 using PLA, which is a durable, accurate, and relatively cheap material. According to our calculations, both robots together (the follower and the leader) cost: 511 grams of PLA × $0.025/gram × 2 robots = $25.55 in total. The servos are slightly more expensive than a typical servo (roughly $30 each), but we agreed that the cost difference was worth both the increase in safety for the user and the longevity of the robot overall.

Properly tolerancing the prints quickly became the next challenge: the SO-ARM100 is designed to have the STS3215 servos press-fit into it to ensure precision. All parts were printed at Jacobs Hall on Prusa i3 MK3 printers over a span of 72 hours, with a total print time of around 48 hours.

All Hardware Components

Software Integration

The STS3215 servos were connected via a daisy chain and configured with unique IDs. We utilized scripts from the official repository (Alibert et al., 2024) for:

  • Finding port addresses for the microcontroller-to-computer connection (see the sketch after this list)
  • Configuring motor IDs
  • Centering motors for accurate rotation ranges (-2048 to 2048)
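
As referenced above, here is a hedged sketch of the port-discovery step. It mimics the unplug-and-diff approach of the LeRobot helper script but is written directly with pyserial; the function name `find_arm_port` is ours, not part of the official repository.

```python
# Hedged sketch of port discovery: list serial ports, ask the user to unplug the
# arm's controller board, and report which port disappeared. Mirrors the idea of
# the LeRobot helper script; `find_arm_port` is our own illustrative name.
from serial.tools import list_ports  # pip install pyserial

def find_arm_port():
    before = {p.device for p in list_ports.comports()}
    input("Unplug the arm's USB cable, then press Enter...")
    after = {p.device for p in list_ports.comports()}
    removed = before - after
    if len(removed) == 1:
        return removed.pop()
    raise RuntimeError(f"Expected exactly one port to disappear, got: {removed}")

if __name__ == "__main__":
    print("Arm is connected on:", find_arm_port())
```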

Camera Integration

For the mouth-open detection system, we adapted an existing camera mount design to work with the Innomaker 720p USB 2.0 UVC camera, modifying it to fit the SO-ARM100 (Chekuri, 2024). We learnt that the camera needed to sit right next to the gripper so that we could capture footage of the gripper opening and closing. This provides visual feedback for the robot's operation and for user interaction.

Camera Mount Design
Figure 5: Boris demonstrating teleoperation
Figure 6: Dataset visualization of pick-and-place task
Figure 7: Francesco performing teleoperation
Figure 8: Data visualization of robot movements
Teleoperation and Data Collection

To develop a robust and reliable robotic feeding assistant, we implemented a comprehensive data collection strategy utilizing our teleoperation setup. The system was designed to capture both the physical movements of the robot and visual feedback from multiple perspectives, enabling us to create a rich dataset for training and validation purposes.

Data Collection Environment

Our data collection setup consisted of two strategically placed cameras:

  • An end-effector mounted camera providing a first-person perspective of the manipulation tasks
  • An overhead camera positioned 1.5 meters above the workspace center, capturing the entire scene including the follower robot and the beneficiary

This dual-camera configuration allowed us to simultaneously monitor the robot's precise movements and maintain a comprehensive view of the interaction space. The overhead camera proved particularly valuable for pick-and-place tasks, providing clear visibility of object positions and the target locations.

Data Recording and Processing

During each teleoperated session, we recorded:

  • Joint states from the STS3215 servo encoders at each frame
  • Synchronized video feeds at 30 FPS from both cameras
  • Temporal alignment of joint positions with visual data

The data collection process yielded approximately 300 episodes across various tasks, including pick-and-place operations and feeding motions with baby carrots. Each episode was automatically processed to calculate key statistical metrics, including mean trajectories and standard deviations, which were subsequently uploaded to Hugging Face for further analysis.
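
As a sketch of the per-episode processing, the snippet below computes the mean trajectory and per-joint standard deviation over recorded joint states. The `positions` field name is an illustrative assumption, and the Hugging Face upload (handled by LeRobot tooling in our pipeline) is omitted.

```python
# Illustrative per-episode statistics over recorded joint trajectories.
# `episode` is assumed to be a list of frames, each with a "positions" list of
# six servo readings; the real dataset layout follows the LeRobot format.
import numpy as np

def episode_stats(episode):
    arr = np.array([frame["positions"] for frame in episode], dtype=float)  # (T, 6)
    return {
        "mean_trajectory": arr.mean(axis=0),   # per-joint mean position
        "std_trajectory": arr.std(axis=0),     # per-joint standard deviation
        "num_frames": arr.shape[0],
    }
```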

Data Validation and Refinement

To ensure data quality, we implemented a rigorous validation process:

  • Visual inspection of each recorded episode
  • Classification of episodes into successful and unsuccessful attempts
  • Temporal trimming to isolate relevant motion segments
  • Verification through physical replay on the original setup

The replay verification step was particularly crucial, as it allowed us to confirm the accuracy of our recorded trajectories and ensure that the robot could faithfully reproduce the intended motions. This double-validation approach significantly enhanced the reliability of our dataset.
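
A minimal version of this replay check is sketched below: it compares a recorded trajectory against its physical replay and reports the worst per-joint deviation in encoder ticks. The tolerance value is an assumption for illustration, not a measured specification.

```python
# Illustrative replay-verification check: compare recorded vs. replayed joint
# trajectories and flag episodes whose worst per-joint error exceeds a tolerance.
import numpy as np

def replay_error(recorded, replayed):
    a = np.asarray(recorded, dtype=float)
    b = np.asarray(replayed, dtype=float)
    n = min(len(a), len(b))                      # replays can differ by a frame or two
    return np.abs(a[:n] - b[:n]).max(axis=0)     # worst error per joint (encoder ticks)

def replay_ok(recorded, replayed, tol_ticks=50): # tolerance is an assumption
    return bool((replay_error(recorded, replayed) <= tol_ticks).all())
```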

Custom Scripts and Implementation

The elalcance package contains five files in total; the __pycache__ folder and the __init__.py file allow the functions created within this folder to be imported and executed elsewhere. The mouth_recognition.py file opens a window for a live demonstration of the face-mesh code used for mouth-open detection, and mouth_recog.py wraps that code in an exportable function named mouth_open_activate, which returns a boolean value: it returns True only when it detects that the mouth is open, and while the mouth remains closed the program keeps running. A condensed sketch of such a function is shown below.
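
This sketch uses MediaPipe FaceMesh and OpenCV as described in the Visual Activation System section; the lip landmark indices (13 and 14) and the opening threshold are illustrative assumptions rather than the exact values used in mouth_recog.py.

```python
# Condensed sketch of a mouth_open_activate-style function using MediaPipe FaceMesh.
# Landmark indices 13/14 (inner upper/lower lip) and the 0.04 threshold are
# illustrative assumptions, not necessarily the values used in mouth_recog.py.
import cv2
import mediapipe as mp

def mouth_open_activate(camera_index=0, threshold=0.04):
    """Block until an open mouth is detected, then return True."""
    cap = cv2.VideoCapture(camera_index)
    face_mesh = mp.solutions.face_mesh.FaceMesh(
        max_num_faces=1, refine_landmarks=True,
        min_detection_confidence=0.5, min_tracking_confidence=0.5)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                continue
            results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if not results.multi_face_landmarks:
                continue
            lm = results.multi_face_landmarks[0].landmark
            # Euclidean distance between upper and lower lip (normalized coords).
            gap = ((lm[13].x - lm[14].x) ** 2 + (lm[13].y - lm[14].y) ** 2) ** 0.5
            if gap > threshold:
                return True
    finally:
        cap.release()
        face_mesh.close()
```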

Custom scripts structure and implementation

(Please refer to our README.md for more information on the codebase.) Through this systematic approach to data collection and validation, we established a comprehensive dataset that effectively captures the nuances of human-guided robotic manipulation. This data forms the foundation for developing robust path-planning algorithms and ensuring consistent, reliable performance in assistive feeding scenarios.

Figure 9: Mouth detection on Bryan
Figure 10: Francesco eating with hand
Figure 11: Data collection
Figure 12: Robot FPV feeding Francesco
Training ACT Policy and Results

Training Methodology

We implemented the Action Chunking with Transformers (ACT) policy training using a comprehensive dataset collected from our teleoperation setup. The training process was conducted using Google Colab's computational resources, although the size of the transformer architecture resulted in long training times. Our primary focus was on the pick-and-place task, for which we had collected robust demonstration data.

The training pipeline incorporated several key components (a reduced training-step sketch follows the list):

  • Temporal sequence processing of joint states and visual features
  • Integration of multi-modal inputs (proprioceptive and visual)
  • Transformer-based attention mechanisms for trajectory prediction
  • Cross-validation using held-out demonstration episodes
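
The sketch below shows one training step of a much-reduced ACT-style policy in PyTorch: camera and joint-state features condition a transformer decoder that predicts a chunk of future joint targets, trained with an L1 imitation loss. It is a toy stand-in for the LeRobot ACT implementation we actually trained; all dimensions, the tiny CNN, and the random tensors are placeholders.

```python
# Much-reduced ACT-style policy: image + joint-state features condition a
# transformer decoder that predicts a chunk of future joint targets.
import torch
import torch.nn as nn

class ACTLitePolicy(nn.Module):
    def __init__(self, n_joints=6, chunk_size=20, d_model=256):
        super().__init__()
        # Small stand-in for the ResNet image backbone used in the real ACT.
        self.img_encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, d_model),
        )
        self.joint_encoder = nn.Linear(n_joints, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.chunk_queries = nn.Parameter(torch.randn(chunk_size, d_model))
        self.head = nn.Linear(d_model, n_joints)

    def forward(self, image, joints):
        # image: (B, 3, H, W); joints: (B, n_joints)
        ctx = torch.stack([self.img_encoder(image), self.joint_encoder(joints)], dim=1)
        queries = self.chunk_queries.unsqueeze(0).expand(image.shape[0], -1, -1)
        return self.head(self.decoder(queries, ctx))   # (B, chunk_size, n_joints)

# One imitation-learning step with an L1 loss on the demonstrated action chunk.
policy = ACTLitePolicy()
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-4)
image = torch.rand(8, 3, 96, 96)        # placeholder camera frames
joints = torch.rand(8, 6)               # placeholder current joint states
target_chunk = torch.rand(8, 20, 6)     # placeholder next 20 demonstrated targets
optimizer.zero_grad()
loss = nn.functional.l1_loss(policy(image, joints), target_chunk)
loss.backward()
optimizer.step()
```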

Low-Level Policy Language Conditioning

Our approach to policy conditioning followed a structured data collection template:

  • Baseline dataset: 100 episodes of standard feeding interactions
  • Positional variations: 50 episodes from alternative feeding positions
  • Object variations: 50 episodes per different food item

The language conditioning was implemented through semantic signal association. For instance, we collected 50 episodes with the signal "feed carrot" and another 50 with "feed cherry," enabling the model to develop object-specific manipulation strategies. This methodology creates a semantic mapping between natural language commands and corresponding manipulation behaviors.
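
A small sketch of how this tagging could be organized is shown below: each recorded episode carries its task string, and training batches can then be drawn per command. The first two labels in the plan are placeholders; only "feed carrot" and "feed cherry" come from the template above, and the helper names are ours.

```python
# Illustrative episode tagging for language conditioning. Episode counts mirror
# the data-collection template above; the first two task labels are placeholders.
collection_plan = {
    "feed standard": 100,    # baseline feeding interactions (placeholder label)
    "feed from side": 50,    # positional variation (placeholder label)
    "feed carrot": 50,       # object variation
    "feed cherry": 50,       # object variation
}

def tag_episode(frames, task):
    """Attach the language command to a recorded episode."""
    return {"task": task, "frames": frames}

def episodes_for_task(episodes, task):
    """Select the demonstrations associated with a given command."""
    return [ep for ep in episodes if ep["task"] == task]
```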

Visual Activation System

Based on insights from interviews with beneficiaries and their families at the non-profit organization, we identified mouth opening as a critical signal for feeding initiation. Our system architecture pairs a computer-vision mouth-detection module with the robotic control system.

The technical implementation includes the following components (a sketch of the resulting activation loop follows the list):

  • Video Processing Pipeline:
    • MediaPipe libraries integration for real-time face detection
    • OpenCV-based FaceMesh implementation for facial point tracking
    • Euclidean distance computation between upper and lower lip coordinates
    • Confidence threshold validation for activation triggers
  • Hardware Configuration:
    • Logitech C922 webcam (30 FPS) for live demonstrations
    • Innomaker 720p USB2.0 UVC Camera for video recordings
    • Jetson Nano processor for computational tasks
  • Safety Features:
    • Confidence threshold requirements for activation
    • Continuous monitoring and validation of facial landmarks
    • Real-time verification of mouth state detection
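
As referenced above, here is a sketch of how these pieces could fit together: the per-frame mouth-state boolean must hold for several consecutive frames before a single stored trajectory is replayed, reflecting the confidence-threshold and continuous-monitoring safety features. The debounce count and the non-blocking `detect_mouth_open` helper are assumptions.

```python
# Illustrative activation loop: require the mouth to be detected open for several
# consecutive frames before triggering one trajectory replay. The frame count and
# the non-blocking per-frame detector `detect_mouth_open` are assumptions.
import time

OPEN_FRAMES_REQUIRED = 10   # ~0.3 s at 30 FPS (assumed debounce)

def feeding_loop(detect_mouth_open, replay_trajectory, trajectory):
    consecutive_open = 0
    while True:
        consecutive_open = consecutive_open + 1 if detect_mouth_open() else 0
        if consecutive_open >= OPEN_FRAMES_REQUIRED:
            replay_trajectory(trajectory)   # one actuation per mouth-open event
            consecutive_open = 0
        time.sleep(1.0 / 30)                # pace the loop to the camera frame rate
```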

Training Results and Analysis

The training process yielded interesting insights into the challenges of robotic manipulation learning:

  • Loss curves exhibited notable volatility, particularly in the pick-and-place task
  • Trajectory variations in velocity and approach angles contributed to training noise
  • Dataset heterogeneity impacted convergence characteristics
Training Loss Plot

The observed noise in the loss function can be attributed to several factors:

  • Diverse demonstration velocities in the training dataset
  • Varying approach angles during pick-and-place operations
  • Natural variability in human teleoperation styles
  • Multi-modal learning challenges across visual and proprioceptive domains

To ensure the reliability of our trained policies, we conducted extensive validation through physical replay on the original setup, verifying the model's ability to generalize across different scenarios while maintaining safety constraints.

Figure 13: Full system demonstration
Figure 14: Bryan testing the feeding system

Conclusion and System Analysis

System Performance and Design Criteria Evaluation

Our implemented system successfully demonstrated the core functionality of trajectory storage and replication, activated through a computer vision-based mouth-opening detection system. The current implementation achieves single-trajectory actuation per mouth-open event, though further development is required to enable multiple trajectory replications across repeated activations.

Technical Challenges and Implementation Constraints

A significant technical challenge emerged in the end-effector design phase. Initial attempts focused on developing CAD-designed spoon holder attachments for the SO-ARM100's end effector, with the intention of enabling food scooping and delivery operations. However, practical testing revealed considerable limitations in this approach:

  • Inconsistent gripper-to-holder attachment mechanics
  • Unstable motion patterns post-attachment
  • Limited precision in food manipulation tasks
Spoon holder prototype design

Future Development Pathways

Given resource and time constraints, we pivoted to direct food manipulation using the existing gripper system, focusing on handling discrete food items such as carrots or broccoli. While this approach demonstrated system viability, our user research indicates that spoon-based feeding remains the preferred method for beneficiaries. Future iterations should prioritize:

  • Comprehensive redesign of the SO-ARM100 end-effector
  • Integration of specialized spoon-holding mechanisms
  • Software optimization for fluid, spoon-based feeding motions
  • Enhanced trajectory replication capabilities

These improvements would align the system more closely with established feeding practices while maintaining the core benefits of our current implementation.

Team
Team Photo

The picture above was taken at 5:30 am on the Tuesday of RRR week.

The entire team contributed countless hours, including all-nighters, to data collection and extensive robot maintenance, and we shared many laughs along the way.

Bryan Sow

  • Major: EECS
  • Contribution: CAD design, Robot assembly and interface
  • Fun Fact: The "keep track and Be On Time"

Francesco Crivelli

  • Major: EECS
  • Contribution: Framework setup, ACT training, robot assembly
  • Fun Fact: The "Certified Crazy Visionary"

Boris

  • Major: Data Science
  • Contribution: Developed computer vision and trained ACT policy
  • Fun Fact: The Joker

Claire

  • Major: Data Science
  • Contribution: 3D printed and rapidly prototyped robot components
  • Fun Fact: The "Be Nice"

Final Project Video

EL-ALCANCE: High-level robotic feeding assistant

References

[1] TheRobotStudio, Moss, J., Cadene, R., & Alibert, S. (2024). TheRobotStudio/so-ARM100: Standard open arm 100. SO-ARM100. https://github.com/TheRobotStudio/SO-ARM100

[2] Alibert, S., Cadene, R., Soare, A., Gallouedec, Q., Zouitine, A., & Wolf, T. (2024). Lerobot/examples/10_use_so100.md at main · Huggingface/lerobot. LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch. https://github.com/huggingface/lerobot

[3] Chekuri, S. R. (2024). Meetsitaram/koch-V1-1: A version 1.1 of the alexander koch low cost robot arm with some small changes. Low-Cost Robot Arm: Koch v1.1. https://github.com/meetsitaram/koch-v1-1

Cite our work

@misc{crivelli2024lealcance,
  author       = {Francesco C. and Bryan S. and Claire B. and Boris},
  title        = {Le-Alcance: A Fork of LeRobot for Low-Cost Robotic Learning},
  howpublished = {\url{https://github.com/francescocrivelli/le-alcance}},
  year         = {2024}
}