Data quality essential in training ChatGPT

Issue 7 2023 AI & Data Analytics

It is a year since OpenAI launched ChatGPT to the public, with adoption rates skyrocketing at an unprecedented pace. By February 2023, Reuters reported an estimated 100 million active users. Fast forward to September, and the ChatGPT website has attracted nearly 1,5 billion visitors, showcasing the platform’s immense popularity and integral role in today’s digital landscape.

Willem Conradie, CTO of PBT Group, reflects on this journey, noting the significant usage and adoption of ChatGPT across various sectors. “The rise of ChatGPT has highlighted significant concerns. These range from biased outputs, question misinterpretation, inconsistent answers, lack of empathy, and security issues. To navigate these, the concept of Responsible AI has gained momentum, emphasising the importance of applying AI with fair, inclusive, secure, transparent, accountable, and ethical intent. Adopting such an approach is vital, especially when dealing with fabricated information when ChatGPT provides incorrect or outdated information,” says Conradie.

Of course, the platform’s versatility extends beyond public use. It serves as a powerful tool in corporate environments, enhancing various business processes such as customer service enquiries, email drafting, personal assistant tasks, keyword searches, and creating presentations. For the best performance, it is essential that ChatGPT provides accurate responses. This necessitates training on data that is relevant to the company and accurate and timely.

“Consider a scenario where ChatGPT is employed to automatically service customer enquiries, with the aim of enhancing customer experience by delivering personalised responses. If the underlying data quality is compromised, ChatGPT may provide inaccurate responses, ranging from minor errors like incorrect customer names to major issues like providing incorrect self-help instructions on the company’s mobile app. Such inaccuracies could lead to customer frustration, ultimately damaging the customer experience and negating the intended positive outcomes.”

Addressing such data quality concerns is paramount. Ensuring relevance is the first step. This requires the data used for model training to align with the business context in which ChatGPT operates. Timeliness is another critical factor, as outdated data could lead to inaccurate responses. The data must also be complete. Ensuring the dataset is free from missing values, duplicates, or irrelevant entries is important, as these could also result in incorrect responses and actions.

Moreover, continuously improving the model through reinforcement learning incorporating user feedback into model retraining cycles, is essential. This assists ChatGPT, and conversational AI models in general, to learn from their interactions, adapt, and enhance their response quality over time.

“The data quality management practices highlighted here, while not exhaustive, serve as a practical starting point. They are applicable not just to ChatGPT, but to conversational AI and other AI applications like generative AI. All this reinforces the importance of data quality across the spectrum of AI technologies,” concludes Conradie.




Share this article:
Share via emailShare via LinkedInPrint this page



Further reading:

What is your ‘real’ security posture?
BlueVision Editor's Choice Information Security Infrastructure AI & Data Analytics
Many businesses operate under the illusion that their security controls, policies, and incident response plans will hold firm when tested by cybercriminals, but does this mean you are really safe?

Read more...
IQ and AI
Leaderware Editor's Choice Surveillance AI & Data Analytics
Following his presentation at the Estate Security Conference in October, Craig Donald delves into the challenge of balancing human operator ‘IQ’ and AI system detection within CCTV control rooms.

Read more...
New agent gateway to mitigate shadow MCP risk
AI & Data Analytics
Agent Gateway, a new capability in the Tray AI Orchestration platform, gives IT power to develop approved MCP tools with policies, permissions, versioning and compliance, then publish them via MCP for secure agent use.

Read more...
AI and automation are rewriting the cloud security playbook
Technews Publishing AI & Data Analytics
Old-school security relied on rules-based systems that flagged only what was already known. AI flips the script: it analyses massive volumes of data in real-time, spotting anomalies that humans or static rules would miss.

Read more...
Onsite AI avoids cloud challenges
SMART Security Solutions Technews Publishing Editor's Choice Infrastructure AI & Data Analytics
Most AI programs today depend on constant cloud connections, which can be a liability for companies operating in secure or high-risk environments. That reliance exposes sensitive data to external networks, but also creates a single point of failure if connectivity drops.

Read more...
neaMetrics 2025: Year in review
AI & Data Analytics
With a stronger team, a broader portfolio, and a clear vision for what’s next, neaMetrics is well positioned to continue delivering smarter, more connected security solutions in 2026 and beyond.

Read more...
GenAI fraud forcing banks to shift from identity to intent
AI & Data Analytics Information Security Financial (Industry)
The complexity and velocity of modern fraud schemes, from deepfakes to fraud and scams involving social engineering, demand more than just investment in new tools; they need adaptability and expanding the security net.

Read more...
Who has access to your face?
Access Control & Identity Management AI & Data Analytics
While you may be adjusting your privacy settings on social media or thinking twice about who is recording you at public events, the reality is that your facial features may be used in other contexts.

Read more...
The impact of AI on security
Technews Publishing Information Security AI & Data Analytics
Today’s threat actors have moved away from signature-based attacks that legacy antivirus software can detect, to ‘living-off-the-land’ using legitimate system tools to move laterally through networks. This is where AI has a critical role to play.

Read more...
Who has access to your face?
Access Control & Identity Management Residential Estate (Industry) AI & Data Analytics
While you may be adjusting your privacy settings on social media or thinking twice about who is recording you at public events, the reality is that your facial features may be used in other contexts,

Read more...










While every effort has been made to ensure the accuracy of the information contained herein, the publisher and its agents cannot be held responsible for any errors contained, or any loss incurred as a result. Articles published do not necessarily reflect the views of the publishers. The editor reserves the right to alter or cut copy. Articles submitted are deemed to have been cleared for publication. Advertisements and company contact details are published as provided by the advertiser. Technews Publishing (Pty) Ltd cannot be held responsible for the accuracy or veracity of supplied material.




© Technews Publishing (Pty) Ltd. | All Rights Reserved.