Data quality essential in training ChatGPT

Issue 7 2023 AI & Data Analytics

It is a year since OpenAI launched ChatGPT to the public, with adoption rates skyrocketing at an unprecedented pace. By February 2023, Reuters reported an estimated 100 million active users. Fast forward to September, and the ChatGPT website has attracted nearly 1,5 billion visitors, showcasing the platform’s immense popularity and integral role in today’s digital landscape.

Willem Conradie, CTO of PBT Group, reflects on this journey, noting the significant usage and adoption of ChatGPT across various sectors. “The rise of ChatGPT has highlighted significant concerns. These range from biased outputs, question misinterpretation, inconsistent answers, and lack of empathy to security issues. To navigate these, the concept of Responsible AI has gained momentum, emphasising the importance of applying AI with fair, inclusive, secure, transparent, accountable, and ethical intent. Adopting such an approach is vital, especially when dealing with fabricated information, where ChatGPT provides incorrect or outdated information,” says Conradie.

Of course, the platform’s versatility extends beyond public use. It serves as a powerful tool in corporate environments, enhancing various business processes such as customer service enquiries, email drafting, personal assistant tasks, keyword searches, and creating presentations. For the best performance, it is essential that ChatGPT provides accurate responses. This necessitates training on data that is relevant to the company, accurate, and timely.

“Consider a scenario where ChatGPT is employed to automatically service customer enquiries, with the aim of enhancing customer experience by delivering personalised responses. If the underlying data quality is compromised, ChatGPT may provide inaccurate responses, ranging from minor errors like incorrect customer names to major issues like providing incorrect self-help instructions on the company’s mobile app. Such inaccuracies could lead to customer frustration, ultimately damaging the customer experience and negating the intended positive outcomes.”

Addressing such data quality concerns is paramount. Ensuring relevance is the first step. This requires the data used for model training to align with the business context in which ChatGPT operates. Timeliness is another critical factor, as outdated data could lead to inaccurate responses. The data must also be complete. Ensuring the dataset is free from missing values, duplicates, or irrelevant entries is important, as these could also result in incorrect responses and actions.
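As a rough illustration of the checks described above (relevance, timeliness, completeness, and duplicates), the following Python sketch filters a small set of training records. It is a minimal example under stated assumptions, not a description of how ChatGPT or any vendor prepares data: the `Record` fields, the `filter_training_data` function, the topic list, and the one-year age limit are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Record:
    """One hypothetical training example, e.g. a customer-service Q&A pair."""
    question: Optional[str]
    answer: Optional[str]
    topic: Optional[str]
    last_updated: Optional[date]


def _is_blank(text: Optional[str]) -> bool:
    """True if the text is missing or contains only whitespace."""
    return text is None or not text.strip()


def filter_training_data(records, relevant_topics, max_age_days, today):
    """Return (kept_records, rejection_counts) after basic quality checks."""
    kept = []
    seen = set()
    rejected = {"incomplete": 0, "irrelevant": 0, "outdated": 0, "duplicate": 0}
    cutoff = today - timedelta(days=max_age_days)

    for rec in records:
        # Completeness: every field must be present and non-empty.
        if (_is_blank(rec.question) or _is_blank(rec.answer)
                or _is_blank(rec.topic) or rec.last_updated is None):
            rejected["incomplete"] += 1
            continue
        # Relevance: the topic must belong to the business context (assumed list).
        if rec.topic not in relevant_topics:
            rejected["irrelevant"] += 1
            continue
        # Timeliness: drop records last updated before the cutoff date.
        if rec.last_updated < cutoff:
            rejected["outdated"] += 1
            continue
        # Duplicates: compare question/answer text ignoring case and outer spaces.
        key = (rec.question.strip().lower(), rec.answer.strip().lower())
        if key in seen:
            rejected["duplicate"] += 1
            continue
        seen.add(key)
        kept.append(rec)

    return kept, rejected


# Example usage with made-up records.
sample = [
    Record("How do I reset my PIN?", "Open the app and choose Settings.",
           "mobile app", date(2023, 10, 1)),
    Record("What is my balance?", None, "billing", date(2023, 9, 1)),
]
kept, rejected = filter_training_data(
    sample, {"mobile app", "billing"}, max_age_days=365, today=date(2023, 11, 30)
)
print(len(kept), rejected)
```

In practice the thresholds and the definition of a relevant topic would come from the business itself, and the rejection counts give a simple way to track how much of the source data fails each check.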

Moreover, continuously improving the model through reinforcement learning, incorporating user feedback into model retraining cycles, is essential. This helps ChatGPT, and conversational AI models in general, to learn from their interactions, adapt, and enhance their response quality over time.

“The data quality management practices highlighted here, while not exhaustive, serve as a practical starting point. They are applicable not just to ChatGPT, but to conversational AI and other AI applications like generative AI. All this reinforces the importance of data quality across the spectrum of AI technologies,” concludes Conradie.

