The Learning Equilibrium

Navigating AI’s Limitations: Lessons from a cabin crew training POC

By Sanjay Mukherjee, October 3, 2024

Illustration generated with ChatGPT to depict the importance of domain expert collaboration with AI platforms to generate more accurate, high-quality assets for use within professional training environments

I have been experimenting with AI platforms as a general end-user and as a domain specialist end-user. What’s the difference between these two approaches? As a general end-user I am a layperson with no knowledge or proficiency about a domain specialisation, thus interactions on a subject are in general language terms without the context of what terms mean within a particular area (technical and professional jargon). It therefore follows that as a layperson I would not be able to judge the accuracy of AI responses or verify the efficacy of any proposed structure or solution or strategy, nor would I know about the appropriate authoritative sources for that subject matter. 

On the other hand, as a domain specialist I interact as a person with sufficient academic and practical knowledge about various aspects of that particular domain. Thus my queries and task requests on the subject would be within the context of the industry, job areas and related competencies. It therefore follows that as a domain specialist I would be able to judge to a reasonable extent the accuracy of AI responses, verify and correct or query the efficacy of any proposed structure or solution or strategy, and I would also know the appropriate authoritative sources to crosscheck for that subject matter.

When I compare the results of both approaches, I understand the limitations of AI platforms. 

Here is an example. I was recently evaluating image generation platforms for the purpose of creating images or illustrations to use within an airline cabin crew service training module that I was scripting. After multiple iterations on each platform (including prompts with reference images and AI-constructed prompts), I concluded that the models had very little experience with service training within a commercial aircraft cabin setting, to the extent that the platforms could not distinguish between flight attendants and passengers, between double-aisle and single-aisle aircraft, or between seat layout orientations, service trolleys and the types of food served on flights. I performed this research on six different platforms and, oddly, they made the same mistakes with some differences, which I find very interesting. Displayed within this article is one output each from four of the platforms.

Image generated with MidJourney to depict cabin crew serving meals to passengers in an Airbus A330 economy class cabin. What are the errors?

Image generated with Leonardo to depict cabin crew serving meals to passengers in an A330 economy class cabin. What are the errors?

Image generated with Stability to depict cabin crew serving meals to passengers in an A330 economy class cabin. What are the errors?

For that particular Proof-of-Concept I utilised an alternative solution for the accompanying images. For the long term, I may have to wait for trained models to emerge (which means time) or decide to train a specific AI model on cabin crew service matters (which means time, effort and money). Meanwhile, of course, my options are stock photos, staged photo shoots and illustrations, among others, and each of these has an impact on cost, cumulative effort and timeline for project execution.

As a layperson writing a blog or creating a meme or a general post, I may be able to use one or two images from the lot generated during that research effort if I ignore some parameters that are not critical for general purposes. But for training environments, the details are crucial. Therein lies one aspect of responsible use of AI that I am realising is essential if I want to truly adopt AI within my work process: providing authoritative feedback (not just repeatedly modifying the prompt to achieve an acceptable output). It is also important to reach out to peers within an industry to see if there are other platforms which may already be providing outputs of acceptable quality.

Btw, as an experiment I asked Claude Assistant and ChatGPT for headlines. I eventually went with my own headline but again, the similarities between headline option patterns from both platforms were unmistakable.

CHATGPT Headline Options
"AI for General vs. Specialist Users: A Deep Dive into Platform Limitations"
"Exploring the Difference Between General and Domain-Specialist Interactions with AI"
"From Layperson to Expert: How Context Changes AI Results"
"The Challenge of Using AI for Specialized Domains: A Case Study in Airline Training"
"AI Platforms: Can They Bridge the Gap Between General Use and Domain Expertise?"
"Why AI Falls Short in Industry-Specific Applications: Lessons from Cabin Crew Training"
"General vs. Specialist End-Users: How AI Performance Varies Across Use Cases"
"Navigating AI’s Limitations: A Specialist’s Perspective on Image Generation Platforms"
"What AI Can’t Yet Do: Reflections on Using AI for Domain-Specific Training"
"The Specialist’s Dilemma: When AI Isn’t Enough for Professional Use"

CLAUDE AI Assistant Headline Options
"AI through Expert Eyes: The Gap Between General and Specialist Use"
"When AI Falls Short: A Specialist's Perspective on Current Limitations"
"Navigating AI as a Domain Expert: Insights from Airline Training"
"The Two Faces of AI: Layperson vs. Specialist Experiences"
"Beyond the Prompt: Why Domain Expertise Matters in AI Adoption"
"AI's Blind Spots: A Case Study in Specialized Industry Applications"
"Responsible AI Use: Balancing General Utility and Specialist Needs"
"The Hidden Challenges of AI in Specialized Fields: An Insider's View"
"From Generalist to Specialist: Rethinking AI's Role in Professional Settings"
"AI's Learning Curve: Bridging the Gap Between General Use and Expert Demands"

I also asked both platforms for an opinion on this article, and I reproduce a bit from each. Again, note the similarities. 

ChatGPT: One of the strengths of the article is its clear demonstration of how the same AI platform can produce vastly different results depending on the user's knowledge and ability to judge those outputs. The distinction between a general user, who may accept AI output at face value, and a domain expert, who can evaluate and challenge the accuracy, is crucial. This insight emphasizes the need for responsible AI use, particularly in professional settings where errors or oversights can have real-world consequences. The example about airline service training is particularly powerful. It highlights AI’s current limitations in understanding industry-specific nuances, even when provided with detailed prompts and reference images. Experience with multiple platforms showing similar failures reinforces the idea that AI still lacks deep contextual learning in specialized fields, which poses challenges for professionals seeking to integrate AI into their workflows.

Claude: The author's experience with image generation for airline cabin crew service training demonstrates that current AI models may lack the specific knowledge required for highly specialized tasks. This limitation is particularly evident in fields where precise details and industry-specific knowledge are crucial. The observation that multiple platforms made similar mistakes is intriguing. It suggests that current AI models may share common knowledge gaps or biases, which could be due to similarities in training data or underlying algorithms. The article highlights the importance of providing authoritative feedback to improve AI systems, rather than just tweaking prompts to achieve acceptable results. This approach contributes to the long-term improvement of AI models.

How would you summarise this article? Why would you do it that way? As a subeditor, I would create different, more nuanced summaries depending on what kind of publication I was editing for. From experience, I would say that different cabin crew training professionals would summarise the article with greater focus on the training and training asset context than on the AI aspect of it.

In a professional setting the context is crucial.