EL-ALCANCE

Robotic Feeding Assistant for Quadriplegic Individuals in LATAM

Francesco Crivelli1, Bryan Sow1, Claire Beaudin1, Boris Tomov1, Professor Roberto Horowitz2
1University of California Berkeley, Tukuypaj Chile, Recursive at Berkeley
Figure 1: Robot feeding demonstration with Francesco

Abstract

We present a novel low-cost assistive robotic feeding system developed in collaboration between UC Berkeley's Recursive Pioneers and Tukuypaj, a Chilean non-profit organization serving individuals with severe motor disabilities. We learnt that the organization is often short-staffed at mealtimes, with too few personnel available to feed its quadriplegic community. Our primary project objective was therefore to build a robotic system that can reach for food and feed quadriplegic beneficiaries. We aspired to create a system that grants the quadriplegic community greater self-sufficiency, while devising a solution that can be replicated at scale. Our work demonstrates that high-performance feeding assistance can be achieved using affordable hardware, specifically the SO-ARM100 robotic arm platform ($110 per arm), which we enhanced through custom modifications and advanced control algorithms. Prior to system development, we conducted extensive interviews with Tukuypaj's beneficiaries and caregivers, gathering crucial insights about the challenges and requirements of assisted feeding that directly informed our technical specifications and design choices. Our technical implementation leverages the LeRobot framework for data collection and an Action Chunking with Transformers (ACT) architecture trained on demonstration data, exploring the feasibility of combining imitation learning with affordable hardware to create accessible robotic solutions. Initial development focused on establishing reliable control mechanisms and movement patterns using the LeRobot framework's tools for robot control and training, with an emphasis on system stability and movement precision while preserving the low-cost advantage of the SO-ARM100 platform. Through systematic testing in controlled environments, we are laying the groundwork for future real-world applications and taking an important step toward making assistive robotics more accessible through cost-effective solutions. By combining affordable hardware with sophisticated control frameworks, we demonstrate the potential for bringing advanced robotic assistance to resource-constrained settings, setting the stage for future field testing and deployment in therapeutic environments across Tukuypaj's centers and similar care facilities worldwide.

Figure 2: Time lapse of team setup and development process

Design Ideology: Low-Cost and Scalable Robotics to Tackle Real-World Problems

The first steps of our project involved working through the design process: conducting interviews to gather data on the problem and to gain insight into the target users we were designing for. Two key insights about the beneficiaries emerged from the interview process:

  • Most of the quadriplegic interviewees are non-vocal and rely heavily on facial expressions to communicate. For instance, when a caretaker picks up a certain food item, the beneficiary smiles to indicate whether or not they would like it.
  • We learnt that their main diet consists of chopped-up food that requires the use of a spoon. One caretaker kindly recorded a video of the feeding process; studying it taught us the motion paths our robotic arm would need to replicate to accomplish the task.

The desired functionality of our system involves creating path trajectories for the robot to bring food to the beneficiary's mouth, while letting the beneficiary trigger the movement on their own. Our main design criteria were a simplified system that is easy to implement, robust in reproducibility and reliability, and close enough to the usual feeding routine to preserve a sense of familiarity and ease adoption. The design we chose aligns with this philosophy.

We first considered constrained situations where we have fixed types of food and fixed relative positions between the food item and the beneficiary. We knew that when our robotic system would be deployed, it would need to perform its function in various environments and situations. Furthermore, we also wanted direct human input over the path trajectories of the robot arm, such that the caretakers can directly dictate how the robot brings the food to the beneficiary and personalize the trajectories for each of them. To fulfill these conditions, we thought it was best for our system to employ a path planning setup where we had two robot arms – one which allowed a caretaker to manually create the path trajectory, and the other arm replicating and storing the motion. We knew that creating and obtaining hardware for two arms was much more difficult than for one, but we felt that this design choice would allow for the path planning element of the project to be easier to implement, and allow caretakers greater influence over the movement of the robot arms as they interact with the beneficiaries in the physical world.

Secondly, we also wanted to create a bridge through which the beneficiaries can communicate directly with the robot system. Since they mainly communicate through facial expressions, we knew that incorporating computer vision to read those expressions was crucial to our design. Choosing a communication mechanism was difficult because, in reality, every beneficiary expresses their needs through facial expressions differently (some smile while others chuff), so we decided it was easiest to standardize the signal. We settled on having the robot system react to the beneficiary opening their mouth as the cue to bring the food to them.

In tandem, the overall flow of the use of the robot system was intended as follows: the caretakers can manually create the path trajectories, and the beneficiaries can execute the movement of the arms along these trajectories by opening their mouths.

We also aspired to incorporate a second tier into our design philosophy, considering dynamic situations involving different types of food, different positions of the food relative to the beneficiary, varied beneficiary interactions with the robot system, and so on. We wanted to create a form of general intelligence where the robot would "know" how to bring the food item to the beneficiary regardless of the situation presented to it. Therefore, we wanted to add a learning component to the project. By creating various scenarios and using the robot's leader-follower design to demonstrate how the food should be brought to the beneficiary, we hoped to apply imitation learning, where the robot learns the intended action for a given scenario and develops a policy that enables it to autonomously create pathways and bring the food item to the beneficiary.

We felt that the design choices we made fulfilled our objectives: a simplified design that was efficient to implement and highly reproducible in executing what we wanted the system to do for the beneficiaries. We tackled the design philosophy on both the constrained and the dynamic fronts, at their varying levels of difficulty, while keeping user-friendliness our top priority and aiming for a durable system. We were unsure how robust this system would be at executing the task in the real world, but felt it was a good starting point to build on.

Luis - End User
Figure 3: Hardware demonstration of SO-ARM robot
Figure 4: 3D printing process of robot components
Building the Hardware

Implementation

Journal of Building the SO-ARM100

We spent a considerable amount of time choosing a design for the robotic arm. After careful review, we selected a few key constraints we wanted the robot to satisfy:

  • Low cost and replicable: We wanted an accessible design that could be easily reproduced. If we were to make a successful product, we would want to be able to deploy it easily.
  • Safe to use: Safety was a huge priority. The robots we are designing are going to act autonomously, operating near the user's face.
  • Easily adaptable: We wanted the robot to work in various environments rather than requiring one fixed setup to function properly. Ease of maintenance also became an important factor.

Selection and Building Process

Due to the time constraints of the class, we decided it was best to use an open-source robot found online. We eventually settled on the SO-ARM100, as it satisfied all of the requirements outlined above and fit the overall design philosophy we wanted to employ. The SO-ARM100 uses a teleoperation setup involving two arms: a leader arm on which a human operator manually creates the intended path trajectories, and a follower arm that copies the created motion synchronously (TheRobotStudio et al., 2024).

Follower arm and leader arm

The SO-ARM100 uses STS3215 servos, which have metal gears and magnetic encoders. With these encoders and the LeRobot codebase (run in a Linux environment) used to operate the SO-ARM100, the joint state, i.e. the rotation angle of each servo over time, could be easily recorded, stored, and replayed. This step was crucial for accurately reproducing real-world scenarios and aligning the robot's behavior with the actions and intentions of the caregiver. A minimal record-and-replay sketch is shown below.
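The following is a minimal sketch of the record-and-replay idea, assuming a hypothetical `bus` object exposing `read_positions()` and `write_positions()` for the six daisy-chained servos; in practice this is handled by the LeRobot tooling, so the snippet is illustrative rather than our exact implementation.

```python
# Minimal record-and-replay sketch (illustrative only).
# Assumes a hypothetical `bus` object with read_positions() -> list[int] and
# write_positions(list[int]); the real project relies on the LeRobot codebase.
import time

FPS = 30  # polling rate used when recording joint states

def record_trajectory(bus, duration_s=10.0):
    """Poll the servo encoders at a fixed rate, storing timestamped joint states."""
    trajectory = []
    start = time.time()
    while time.time() - start < duration_s:
        trajectory.append({"t": time.time() - start,
                           "positions": bus.read_positions()})
        time.sleep(1.0 / FPS)
    return trajectory

def replay_trajectory(bus, trajectory):
    """Stream the stored joint targets back to the follower arm at the recorded timing."""
    start = time.time()
    for frame in trajectory:
        while time.time() - start < frame["t"]:
            time.sleep(0.001)
        bus.write_positions(frame["positions"])
```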

3D Printing and Assembly

We printed the SO-ARM100 using PLA, which is a durable, accurate, and relatively cheap material. According to our calculations, both robots together (the follower and the leader) cost: 511 grams of PLA × $0.025/gram × 2 robots = $25.55 in total. The servos are slightly more expensive than a typical servo (roughly $30 each), but we agreed that the cost difference was worth both the increase in safety for the user and the longevity of the robot overall.

Properly tolerancing the prints quickly became the next challenge: the SO-ARM100 is designed to have the STS3215 servos press-fit into it to ensure precision. All parts were printed at Jacobs Hall on Prusa i3 MK3 printers over a span of 72 hours, with a total print time of around 48 hours.

All Hardware Components

Software Integration

The STS3215 servos were connected via a daisy chain and configured with unique IDs. We utilized scripts from the official repository (Alibert et al., 2024) for:

  • Finding port addresses for the microcontroller-to-computer connection (see the sketch after this list)
  • Configuring motor IDs
  • Centering motors for accurate rotation ranges (-2048 to 2048)
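
As referenced above, here is a hedged sketch of the port-discovery step. It mimics the unplug-and-diff approach of the LeRobot helper script but is written directly with pyserial; the function name `find_arm_port` is ours, not part of the official repository.

```python
# Hedged sketch of port discovery: list serial ports, ask the user to unplug the
# arm's controller board, and report which port disappeared. Mirrors the idea of
# the LeRobot helper script; `find_arm_port` is our own illustrative name.
from serial.tools import list_ports  # pip install pyserial

def find_arm_port():
    before = {p.device for p in list_ports.comports()}
    input("Unplug the arm's USB cable, then press Enter...")
    after = {p.device for p in list_ports.comports()}
    removed = before - after
    if len(removed) == 1:
        return removed.pop()
    raise RuntimeError(f"Expected exactly one port to disappear, got: {removed}")

if __name__ == "__main__":
    print("Arm is connected on:", find_arm_port())
```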

Camera Integration

For the mouth-open detection system, we adapted an existing camera mount design to work with the Innomaker 720p USB 2.0 UVC camera, modifying it to fit the SO-ARM100 (Chekuri, 2024). We learnt that the camera needed to sit right next to the gripper so that we could capture footage of the gripper opening and closing. This provides visual feedback for the robot's operation and for user interaction.

Camera Mount Design
Figure 5: Boris demonstrating teleoperation
Figure 6: Dataset visualization of pick-and-place task
Figure 7: Francesco performing teleoperation
Figure 8: Data visualization of robot movements
Teleoperation and Data Collection

To develop a robust and reliable robotic feeding assistant, we implemented a comprehensive data collection strategy utilizing our teleoperation setup. The system was designed to capture both the physical movements of the robot and visual feedback from multiple perspectives, enabling us to create a rich dataset for training and validation purposes.

Data Collection Environment

Our data collection setup consisted of two strategically placed cameras:

  • An end-effector mounted camera providing a first-person perspective of the manipulation tasks
  • An overhead camera positioned 1.5 meters above the workspace center, capturing the entire scene including the follower robot and the beneficiary

This dual-camera configuration allowed us to simultaneously monitor the robot's precise movements and maintain a comprehensive view of the interaction space. The overhead camera proved particularly valuable for pick-and-place tasks, providing clear visibility of object positions and the target locations.

Data Recording and Processing

During each teleoperated session, we recorded:

  • Joint states from the STS3215 servo encoders at each frame
  • Synchronized video feeds at 30 FPS from both cameras
  • Temporal alignment of joint positions with visual data

The data collection process yielded approximately 300 episodes across various tasks, including pick-and-place operations and feeding motions with baby carrots. Each episode was automatically processed to calculate key statistical metrics, including mean trajectories and standard deviations, which were subsequently uploaded to Hugging Face for further analysis.
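
As a sketch of the per-episode processing, the snippet below computes the mean trajectory and per-joint standard deviation over recorded joint states. The `positions` field name is an illustrative assumption, and the Hugging Face upload (handled by LeRobot tooling in our pipeline) is omitted.

```python
# Illustrative per-episode statistics over recorded joint trajectories.
# `episode` is assumed to be a list of frames, each with a "positions" list of
# six servo readings; the real dataset layout follows the LeRobot format.
import numpy as np

def episode_stats(episode):
    arr = np.array([frame["positions"] for frame in episode], dtype=float)  # (T, 6)
    return {
        "mean_trajectory": arr.mean(axis=0),   # per-joint mean position
        "std_trajectory": arr.std(axis=0),     # per-joint standard deviation
        "num_frames": arr.shape[0],
    }
```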

Data Validation and Refinement

To ensure data quality, we implemented a rigorous validation process:

  • Visual inspection of each recorded episode
  • Classification of episodes into successful and unsuccessful attempts
  • Temporal trimming to isolate relevant motion segments
  • Verification through physical replay on the original setup

The replay verification step was particularly crucial, as it allowed us to confirm the accuracy of our recorded trajectories and ensure that the robot could faithfully reproduce the intended motions. This double-validation approach significantly enhanced the reliability of our dataset.
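
A minimal version of this replay check is sketched below: it compares a recorded trajectory against its physical replay and reports the worst per-joint deviation in encoder ticks. The tolerance value is an assumption for illustration, not a measured specification.

```python
# Illustrative replay-verification check: compare recorded vs. replayed joint
# trajectories and flag episodes whose worst per-joint error exceeds a tolerance.
import numpy as np

def replay_error(recorded, replayed):
    a = np.asarray(recorded, dtype=float)
    b = np.asarray(replayed, dtype=float)
    n = min(len(a), len(b))                      # replays can differ by a frame or two
    return np.abs(a[:n] - b[:n]).max(axis=0)     # worst error per joint (encoder ticks)

def replay_ok(recorded, replayed, tol_ticks=50): # tolerance is an assumption
    return bool((replay_error(recorded, replayed) <= tol_ticks).all())
```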

Custom Scripts and Implementation

The elalcance package contains five files in total; the __pycache__ folder and the __init__.py file allow the functions created within this folder to be imported and executed elsewhere. The mouth_recognition.py file opens a window for a live demonstration of the face-mesh code used for mouth-open detection, and mouth_recog.py wraps that code in an exportable function named mouth_open_activate, which returns a boolean value: it returns True only when it detects that the mouth is open, and while the mouth remains closed the program keeps running. A condensed sketch of such a function is shown below.
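
This sketch uses MediaPipe FaceMesh and OpenCV as described in the Visual Activation System section; the lip landmark indices (13 and 14) and the opening threshold are illustrative assumptions rather than the exact values used in mouth_recog.py.

```python
# Condensed sketch of a mouth_open_activate-style function using MediaPipe FaceMesh.
# Landmark indices 13/14 (inner upper/lower lip) and the 0.04 threshold are
# illustrative assumptions, not necessarily the values used in mouth_recog.py.
import cv2
import mediapipe as mp

def mouth_open_activate(camera_index=0, threshold=0.04):
    """Block until an open mouth is detected, then return True."""
    cap = cv2.VideoCapture(camera_index)
    face_mesh = mp.solutions.face_mesh.FaceMesh(
        max_num_faces=1, refine_landmarks=True,
        min_detection_confidence=0.5, min_tracking_confidence=0.5)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                continue
            results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if not results.multi_face_landmarks:
                continue
            lm = results.multi_face_landmarks[0].landmark
            # Euclidean distance between upper and lower lip (normalized coords).
            gap = ((lm[13].x - lm[14].x) ** 2 + (lm[13].y - lm[14].y) ** 2) ** 0.5
            if gap > threshold:
                return True
    finally:
        cap.release()
        face_mesh.close()
```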

Custom scripts structure and implementation

(Please refer to our README.md for more information on the codebase.) Through this systematic approach to data collection and validation, we established a comprehensive dataset that effectively captures the nuances of human-guided robotic manipulation. This data forms the foundation for developing robust path-planning algorithms and ensuring consistent, reliable performance in assistive feeding scenarios.

Figure 9: Mouth detection on Bryan
Figure 10: Francesco eating with hand
Figure 11: Data collection
Figure 12: Robot FPV feeding Francesco
Training ACT Policy and Results

Training Methodology

We implemented the Action Chunking with Transformers (ACT) policy training using a comprehensive dataset collected from our teleoperation setup. The training process was conducted using Google Colab's computational resources, although the size of the transformer architecture resulted in long training times. Our primary focus was on the pick-and-place task, for which we had collected robust demonstration data.

The training pipeline incorporated several key components (a reduced training-step sketch follows the list):

  • Temporal sequence processing of joint states and visual features
  • Integration of multi-modal inputs (proprioceptive and visual)
  • Transformer-based attention mechanisms for trajectory prediction
  • Cross-validation using held-out demonstration episodes
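
The sketch below shows one training step of a much-reduced ACT-style policy in PyTorch: camera and joint-state features condition a transformer decoder that predicts a chunk of future joint targets, trained with an L1 imitation loss. It is a toy stand-in for the LeRobot ACT implementation we actually trained; all dimensions, the tiny CNN, and the random tensors are placeholders.

```python
# Much-reduced ACT-style policy: image + joint-state features condition a
# transformer decoder that predicts a chunk of future joint targets.
import torch
import torch.nn as nn

class ACTLitePolicy(nn.Module):
    def __init__(self, n_joints=6, chunk_size=20, d_model=256):
        super().__init__()
        # Small stand-in for the ResNet image backbone used in the real ACT.
        self.img_encoder = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, d_model),
        )
        self.joint_encoder = nn.Linear(n_joints, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.chunk_queries = nn.Parameter(torch.randn(chunk_size, d_model))
        self.head = nn.Linear(d_model, n_joints)

    def forward(self, image, joints):
        # image: (B, 3, H, W); joints: (B, n_joints)
        ctx = torch.stack([self.img_encoder(image), self.joint_encoder(joints)], dim=1)
        queries = self.chunk_queries.unsqueeze(0).expand(image.shape[0], -1, -1)
        return self.head(self.decoder(queries, ctx))   # (B, chunk_size, n_joints)

# One imitation-learning step with an L1 loss on the demonstrated action chunk.
policy = ACTLitePolicy()
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-4)
image = torch.rand(8, 3, 96, 96)        # placeholder camera frames
joints = torch.rand(8, 6)               # placeholder current joint states
target_chunk = torch.rand(8, 20, 6)     # placeholder next 20 demonstrated targets
optimizer.zero_grad()
loss = nn.functional.l1_loss(policy(image, joints), target_chunk)
loss.backward()
optimizer.step()
```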

Low-Level Policy Language Conditioning

Our approach to policy conditioning followed a structured data collection template:

  • Baseline dataset: 100 episodes of standard feeding interactions
  • Positional variations: 50 episodes from alternative feeding positions
  • Object variations: 50 episodes per different food item

The language conditioning was implemented through semantic signal association. For instance, we collected 50 episodes with the signal "feed carrot" and another 50 with "feed cherry," enabling the model to develop object-specific manipulation strategies. This methodology creates a semantic mapping between natural language commands and corresponding manipulation behaviors.
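
A small sketch of how this tagging could be organized is shown below: each recorded episode carries its task string, and training batches can then be drawn per command. The first two labels in the plan are placeholders; only "feed carrot" and "feed cherry" come from the template above, and the helper names are ours.

```python
# Illustrative episode tagging for language conditioning. Episode counts mirror
# the data-collection template above; the first two task labels are placeholders.
collection_plan = {
    "feed standard": 100,    # baseline feeding interactions (placeholder label)
    "feed from side": 50,    # positional variation (placeholder label)
    "feed carrot": 50,       # object variation
    "feed cherry": 50,       # object variation
}

def tag_episode(frames, task):
    """Attach the language command to a recorded episode."""
    return {"task": task, "frames": frames}

def episodes_for_task(episodes, task):
    """Select the demonstrations associated with a given command."""
    return [ep for ep in episodes if ep["task"] == task]
```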

Visual Activation System

Based on insights from interviews with beneficiaries and their families at the non-profit organization, we identified mouth opening as a critical signal for feeding initiation. Our system architecture pairs a computer-vision mouth-detection module with the robotic control system.

The technical implementation includes the following components (a sketch of the resulting activation loop follows the list):

  • Video Processing Pipeline:
    • MediaPipe libraries integration for real-time face detection
    • OpenCV-based FaceMesh implementation for facial point tracking
    • Euclidean distance computation between upper and lower lip coordinates
    • Confidence threshold validation for activation triggers
  • Hardware Configuration:
    • Logitech C922 webcam (30 FPS) for live demonstrations
    • Innomaker 720p USB2.0 UVC Camera for video recordings
    • Jetson Nano processor for computational tasks
  • Safety Features:
    • Confidence threshold requirements for activation
    • Continuous monitoring and validation of facial landmarks
    • Real-time verification of mouth state detection
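
As referenced above, here is a sketch of how these pieces could fit together: the per-frame mouth-state boolean must hold for several consecutive frames before a single stored trajectory is replayed, reflecting the confidence-threshold and continuous-monitoring safety features. The debounce count and the non-blocking `detect_mouth_open` helper are assumptions.

```python
# Illustrative activation loop: require the mouth to be detected open for several
# consecutive frames before triggering one trajectory replay. The frame count and
# the non-blocking per-frame detector `detect_mouth_open` are assumptions.
import time

OPEN_FRAMES_REQUIRED = 10   # ~0.3 s at 30 FPS (assumed debounce)

def feeding_loop(detect_mouth_open, replay_trajectory, trajectory):
    consecutive_open = 0
    while True:
        consecutive_open = consecutive_open + 1 if detect_mouth_open() else 0
        if consecutive_open >= OPEN_FRAMES_REQUIRED:
            replay_trajectory(trajectory)   # one actuation per mouth-open event
            consecutive_open = 0
        time.sleep(1.0 / 30)                # pace the loop to the camera frame rate
```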

Training Results and Analysis

The training process yielded interesting insights into the challenges of robotic manipulation learning:

  • Loss curves exhibited notable volatility, particularly in the pick-and-place task
  • Trajectory variations in velocity and approach angles contributed to training noise
  • Dataset heterogeneity impacted convergence characteristics
Training Loss Plot

The observed noise in the loss function can be attributed to several factors:

  • Diverse demonstration velocities in the training dataset
  • Varying approach angles during pick-and-place operations
  • Natural variability in human teleoperation styles
  • Multi-modal learning challenges across visual and proprioceptive domains

To ensure the reliability of our trained policies, we conducted extensive validation through physical replay on the original setup, verifying the model's ability to generalize across different scenarios while maintaining safety constraints.

Figure 13: Full system demonstration
Figure 14: Bryan testing the feeding system

Conclusion and System Analysis

System Performance and Design Criteria Evaluation

Our implemented system successfully demonstrated the core functionality of trajectory storage and replication, activated through a computer vision-based mouth-opening detection system. The current implementation achieves single-trajectory actuation per mouth-open event, though further development is required to enable multiple trajectory replications across repeated activations.

Technical Challenges and Implementation Constraints

A significant technical challenge emerged in the end-effector design phase. Initial attempts focused on developing CAD-designed spoon holder attachments for the SO-ARM100's end effector, with the intention of enabling food scooping and delivery operations. However, practical testing revealed considerable limitations in this approach:

  • Inconsistent gripper-to-holder attachment mechanics
  • Unstable motion patterns post-attachment
  • Limited precision in food manipulation tasks
Spoon holder prototype design

Future Development Pathways

Given resource and time constraints, we pivoted to direct food manipulation using the existing gripper system, focusing on handling discrete food items such as carrots or broccoli. While this approach demonstrated system viability, our user research indicates that spoon-based feeding remains the preferred method for beneficiaries. Future iterations should prioritize:

  • Comprehensive redesign of the SO-ARM100 end-effector
  • Integration of specialized spoon-holding mechanisms
  • Software optimization for fluid, spoon-based feeding motions
  • Enhanced trajectory replication capabilities

These improvements would align the system more closely with established feeding practices while maintaining the core benefits of our current implementation.

Team
Team Photo

The picture above was taken at 5:30 am on the Tuesday of RRR week.

The entire team contributed countless hours, including all-nighters, to data collection and extensive robot maintenance, and we shared many laughs along the way.

Bryan Sow

  • Major: EECS
  • Contribution: CAD design, Robot assembly and interface
  • Fun Fact: The "keep track and Be On Time"

Francesco Crivelli

  • Major: EECS
  • Contribution: Framework setup, ACT training, robot assembly
  • Fun Fact: The "Certified Crazy Visionary"

Boris

  • Major: Data Science
  • Contribution: Developed computer vision and trained ACT policy
  • Fun Fact: The Joker

Claire

  • Major: Data Science
  • Contribution: 3D printed and rapidly prototyped robot components
  • Fun Fact: The "Be Nice"

Final Project Video

EL-ALCANCE: High-level robotic feeding assistant

References

[1] TheRobotStudio, Moss, J., Cadene, R., & Alibert, S. (2024). TheRobotStudio/so-ARM100: Standard open arm 100. SO-ARM100. https://github.com/TheRobotStudio/SO-ARM100

[2] Alibert, S., Cadene, R., Soare, A., Gallouedec, Q., Zouitine, A., & Wolf, T. (2024). Lerobot/examples/10_use_so100.md at main · Huggingface/lerobot. LeRobot: State-of-the-art Machine Learning for Real-World Robotics in Pytorch. https://github.com/huggingface/lerobot

[3] Chekuri, S. R. (2024). Meetsitaram/koch-V1-1: A version 1.1 of the alexander koch low cost robot arm with some small changes. Low-Cost Robot Arm: Koch v1.1. https://github.com/meetsitaram/koch-v1-1

Cite our work

@misc{crivelli2024lealcance,
  author       = {Francesco C. and Bryan S. and Claire B. and Boris},
  title        = {Le-Alcance: A Fork of LeRobot for Low-Cost Robotic Learning},
  howpublished = {\url{https://github.com/francescocrivelli/le-alcance}},
  year         = {2024}
}