Data quality essential in training ChatGPT

Issue 7 2023 AI & Data Analytics

It is a year since OpenAI launched ChatGPT to the public, with adoption rates skyrocketing at an unprecedented pace. By February 2023, Reuters reported an estimated 100 million active users. Fast forward to September, and the ChatGPT website has attracted nearly 1,5 billion visitors, showcasing the platform’s immense popularity and integral role in today’s digital landscape.

Willem Conradie, CTO of PBT Group, reflects on this journey, noting the significant usage and adoption of ChatGPT across various sectors. “The rise of ChatGPT has highlighted significant concerns. These range from biased outputs, question misinterpretation, inconsistent answers, and a lack of empathy to security issues. To navigate these, the concept of Responsible AI has gained momentum, emphasising the importance of applying AI with fair, inclusive, secure, transparent, accountable, and ethical intent. Adopting such an approach is vital, especially when dealing with fabricated information, where ChatGPT provides incorrect or outdated information,” says Conradie.

Of course, the platform’s versatility extends beyond public use. It serves as a powerful tool in corporate environments, enhancing various business processes such as customer service enquiries, email drafting, personal assistant tasks, keyword searches, and creating presentations. For the best performance, it is essential that ChatGPT provides accurate responses. This necessitates training on data that is relevant to the company, accurate, and timely.

“Consider a scenario where ChatGPT is employed to automatically service customer enquiries, with the aim of enhancing customer experience by delivering personalised responses. If the underlying data quality is compromised, ChatGPT may provide inaccurate responses, ranging from minor errors like incorrect customer names to major issues like providing incorrect self-help instructions on the company’s mobile app. Such inaccuracies could lead to customer frustration, ultimately damaging the customer experience and negating the intended positive outcomes.”

Addressing such data quality concerns is paramount. Ensuring relevance is the first step. This requires the data used for model training to align with the business context in which ChatGPT operates. Timeliness is another critical factor, as outdated data could lead to inaccurate responses. The data must also be complete. Ensuring the dataset is free from missing values, duplicates, or irrelevant entries is important, as these could also result in incorrect responses and actions.
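
As a rough illustration only, the checks described above (completeness, relevance, timeliness, and duplicates) could be sketched in Python along the following lines. The field names, the list of allowed topics, and the one-year freshness threshold are all hypothetical assumptions made for this example, not details taken from PBT Group or OpenAI.

```python
from datetime import date, timedelta

# Hypothetical schema and thresholds, chosen purely for illustration.
REQUIRED_FIELDS = ("customer_id", "topic", "answer", "updated")
ALLOWED_TOPICS = {"billing", "mobile_app"}
MAX_AGE = timedelta(days=365)


def filter_training_records(records, today):
    """Return (kept, rejected); rejected pairs each record with a reason."""
    kept, rejected, seen = [], [], set()
    for record in records:
        # Completeness: every required field must be present and non-empty.
        if any(not record.get(field) for field in REQUIRED_FIELDS):
            rejected.append((record, "incomplete"))
            continue
        # Relevance: keep only topics that match the business context.
        if record["topic"] not in ALLOWED_TOPICS:
            rejected.append((record, "irrelevant"))
            continue
        # Timeliness: drop records older than the freshness threshold.
        if today - record["updated"] > MAX_AGE:
            rejected.append((record, "outdated"))
            continue
        # Duplicates: the same customer, topic and answer should appear once.
        key = (record["customer_id"], record["topic"], record["answer"])
        if key in seen:
            rejected.append((record, "duplicate"))
            continue
        seen.add(key)
        kept.append(record)
    return kept, rejected


# Made-up sample records: one clean, one duplicate, one incomplete,
# one irrelevant, and one outdated.
sample = [
    {"customer_id": "c1", "topic": "billing", "answer": "Pay via the app.", "updated": date(2023, 9, 1)},
    {"customer_id": "c1", "topic": "billing", "answer": "Pay via the app.", "updated": date(2023, 9, 1)},
    {"customer_id": "c2", "topic": "mobile_app", "answer": "", "updated": date(2023, 8, 1)},
    {"customer_id": "c3", "topic": "careers", "answer": "Apply online.", "updated": date(2023, 8, 1)},
    {"customer_id": "c4", "topic": "mobile_app", "answer": "Tap Settings.", "updated": date(2021, 1, 1)},
]
kept, rejected = filter_training_records(sample, today=date(2023, 11, 1))
reasons = [reason for _, reason in rejected]
```

Recording a reason for each rejected record, rather than silently dropping it, is a design choice in this sketch: it would let a data team see which quality problem dominates before deciding how to fix the source data.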

Moreover, continuously improving the model through reinforcement learning, incorporating user feedback into model retraining cycles, is essential. This helps ChatGPT, and conversational AI models in general, to learn from their interactions, adapt, and enhance their response quality over time.

“The data quality management practices highlighted here, while not exhaustive, serve as a practical starting point. They are applicable not just to ChatGPT, but to conversational AI and other AI applications like generative AI. All this reinforces the importance of data quality across the spectrum of AI technologies,” concludes Conradie.


















© Technews Publishing (Pty) Ltd. | All Rights Reserved.