Choosing Responsible AI: Comparing LLM Providers on Ethics, Transparency, Safety, and Compliance

In 2018, Amazon created an internal AI engine to find the best job applicants. But they were forced to scrap it right away. The problem? It turned out their shiny new AI vastly preferred male job applicants.

That happened because bias in the training data reinforced male roles in technology. Even after attempting to change the model to compensate for this bias, it continued to choose the men.

The outcome in this case was positive, because it was easy to see that the results were faulty, and the problem was caught before any damage could be done. That won’t always be true, which is why responsible AI matters.

Sure, we all want to rush full speed ahead into the world of AI. But with new business risks and potential conflicts with business values/trust, it’s probably a good idea to proceed with caution. Not to mention emerging regulations like the EU AI act, which could entail fines of up to 35 million euros or 7% of your organization’s global annual revenues.

In this post, we’ll look at today’s leading large language models (LLMs, the chat AI models like ChatGPT that many of us interact with on a daily basis) and see where they stand when it comes to responsibility, safety, and ethics while also highlighting key differences and potential risks.

How Can You Evaluate Responsible AI?

Responsible and ethical AI use differ from regulatory compliance, even though regulations are being put in place to try to ensure responsible use of AI.

    • Regulatory compliance is more black and white; ethics is more of a gray area.
    • Regulatory compliance is measurable and auditable; either you meet a standard or you don’t. These standards can be based on industry-specific safety protocols.

The ethics of AI, on the other hand, aren’t clear cut at all. Opinions on what constitutes responsible use of AI may be subjective and require ongoing dialogue and reflection within different industries and organizations.

Regulatory compliance involves measurable requirements, such as data retention limits or bias testing protocols. Ethics involves judgment calls, such as determining if an AI’s decision aligns with company values.

Because ethics can be so subjective, there are different metrics possible. These are five criteria that are commonly used to determine responsible AI use.

1. Privacy & Data Governance

Large Language Models (LLMs) are trained on vast datasets, frequently scraped from the internet, raising significant ethical concerns regarding data privacy. This includes the potential for personal information to be incorporated without consent.

Responsible AI should establish robust data governance processes throughout the AI lifecycle to mitigate the risk of regenerating or inferring sensitive data.

What is the AI lifecycle? This refers to the iterative process that begins with defining a problem and runs through data collection and preparation, model building and training, to deployment, and continuous monitoring and maintenance of the model.

2. Bias & Fairness

Biased training data can lead to AI outputs that perpetuate harmful stereotypes, raising significant concerns about fairness in applications like hiring or content moderation. Another example of this is bias in facial recognition technology that could have serious impacts for people of color when it comes to law enforcement and the criminal justice system.

Responsible AI should fulfill tasks inclusively within permitted guardrails, which critically requires ongoing human oversight. This includes incorporating diverse perspectives throughout the model’s development and deployment.

3. Reliability & Safety

Large Language Models generate realistic text, increasing the risk of widespread misinformation, including fake news and deepfakes. Counteracting this requires developing detection tools, ethical guidelines, and improving digital literacy. AI can assist both defenders and attackers, such as by introducing new attack vectors like AI-generated code risk or useful defense tools like AI agents. 

Responsible AI should prioritize safety through rigorous testing. The creation of standardized benchmarks for honesty and safety would also help users.

4. Economic & Social Impact

While AI offers great potential and opportunities for society, automation by LLMs raises concerns about job displacement and making socioeconomic inequalities worse. This means we need to take proactive steps to help the workforce adapt and ensure everyone has fair access to LLM technologies. 

It’s also worth noting that training and operating AI models within data centers uses massive amounts of energy and creates a significant carbon footprint.

Responsible AI should prioritize minimizing environmental impact through energy-efficient design.

5. Accountability & Transparency

Most LLMs inherently function as “black boxes,” making accountability and transparency difficult. This means understanding their complex decision-making processes or finding the source of errors remains challenging, even for open-source models whose architecture and training data are accessible. 

Responsible AI should prioritize transparency by providing clear explanations of its decisions. The goals here are explainable AI, rigorous performance testing, and thorough documentation for AI systems.

Comparison of Major LLM Providers

How can you evaluate how ethical or safe an LLM is?

Every provider offers different levels of transparency when it comes to model architecture, training data, and bias mitigation. 

The provider’s ethical policies also matter. Here are some criteria that have emerged over the last few years:

    • Does the LLM provider have an AI ethics board?
    • What guidelines do they follow when it comes to ethics?
    • Is the LLM closed or open source? Most models examined here are closed-source (proprietary). There are potential benefits and drawbacks to each:
      • Open source: Fosters innovation and collaboration; raises concerns about misuse, security vulnerabilities, and ethical oversight
      • Proprietary, closed-source: Can offer greater control and potentially stronger safeguards; raises concerns about transparency, bias, and limited external accountability
    • Compliance: Does the model meet GDPR, EU AI Act, or some other relevant regulation?
    • Risk management: How does the model handle external risks (e.g., adversarial attacks, supply chain risk, data poisoning, …) vs. internal risks (hallucinations, harmful content,…)?

It’s difficult to grade or rank LLMs on any criteria because they’re evolving so quickly. Nevertheless, in an attempt to see how each of these major models is doing, ethics-wise, we’ve listed the provider’s claim for its capabilities, some highlights and challenges of its AI ethics and safety.

We’ve also mentioned each model’s overall grade on the nonprofit Future of Life Institute (FLI) AI Safety Index. However, it’s important to note no LLM scored better than a C on its initial assessment, which took into consideration a number of the ethics criteria discussed above.

“Red-teaming,” a term borrowed from military use, is the deliberate, adversarial testing of an AI model to find and exploit its vulnerabilities, biases, or unsafe behaviors before malicious actors can. Companies do this as part of their commitment to keeping models performing safely and ethically

Just for fun, we’ve also asked each AI what it is and what its ethical guardrails are. We’ve also asked it to give us an example of something it’s not allowed to do for ethical reasons. Each of these LLMs will respond differently from session to session; these are just examples of typical responses.

1. ChatGPT (OpenAI)

OpenAI’s safety and security committee evaluates processes and safeguards. They take an aggressive approach to red-teaming compared to other LLMs (i.e., teams intentionally try to find flaws, vulnerabilities, and potential harms in an AI system, as in security). 

However, controversy emerged in 2024, when a former leader of one of OpenAI’s safety teams accused the company of prioritizing “shiny products” over safety.

    • Proprietary (not open-source) platform
    • The claim: Type, talk, and use it your way. With ChatGPT, you can type or start a real-time voice conversation by tapping the soundwave icon in the mobile app.
    • FLI AI Safety Index Score: D+

 

Ask the AI

What are you?I’m an AI that helps with writing, research, and ideas.
Tell me about your ethical guardrailsI avoid harm, respect privacy, and follow legal and ethical guidelines.
What’s an example of something your guidelines do not allow for ethical reasons?I can’t generate misleading political propaganda.

2. Claude (Anthropic)

Claude takes a “Constitutional AI approach” based on the United Nations’ universal declaration of human rights. This lays out governing principles the LLM uses to choose a suitable response, such as “choosing the response that most supports and encourages freedom, equality, and a sense of brotherhood.”

      • Proprietary (not open-source) platform
      • FLI AI Safety Index Score: C (the highest of all LLMs evaluated)

 

Ask the AI

What are you?I am an AI assistant designed to be helpful, harmless, and honest in my interactions with humans.
Tell me about your ethical guardrailsI’m designed to prioritize human wellbeing, avoid harmful content, respect privacy, and provide helpful information while declining inappropriate requests.
What’s an example of something your guidelines do not allow for ethical reasons?I cannot provide instructions on how to create weapons of mass destruction.

3. Copilot (Microsoft)

Copilot is integrated with Microsoft 365 and automatically inherits your organization’s security, compliance, and privacy policies set in Microsoft 365. However, some perceive this as a downside, since Copilot has access to a lot of sensitive data in internal documents, emails, communication logs, and more. 

If your current policies allow broad access to data, Copilot will also have full access to this data as well. This highlights the need for additional security measures for new data processing patterns, storage requirements, and potential exposure risks when it comes to AI.

    • Proprietary (not open-source) platform
    • The claim: Microsoft Copilot is your companion to inform, entertain, and inspire. Get advice, feedback, and straightforward answers.
    • FLI AI Safety Index Score: Not rated

 

Ask the AI

What are you?I’m Microsoft Copilot, your AI companion here to inform, assist, and inspire.
Tell me about your ethical guardrails.I follow strict ethical guidelines to ensure my responses are respectful, helpful, unbiased, and safe.
What’s an example of something your guidelines do not allow for ethical reasons?I cannot create or promote harmful, violent, illegal, or discriminatory content.

4. DeepSeek

China astonished the world in early 2025 by unveiling DeepSeek-R1, a model that delivered comparable performance at a far lower cost, promising to democratize GenAI. 

However, there are concerns about porous safety guardrails and vulnerable open-source design. There are also major data governance concerns, including mandated cooperation with the ruling Chinese Communist Party, according to Chinese law.

    • Open-source model
    • The claim: Your intelligent assistant for coding, content creation, file reading, and more.
    • FLI AI Safety Index Score: Not rated

 

Ask the AI

What are you?

I am an AI assistant created by DeepSeek, here to provide information and help with your questions.

Tell me about your ethical guardrailsI am programmed to follow ethical guidelines that prioritize safety, accuracy, fairness, and respect for user privacy.
What’s an example of something your guidelines do not allow for ethical reasons?

I cannot provide instructions or support for illegal activities, such as hacking or creating harmful content.

5. Gemini (Google)

Gemini was designed with a focus on safety and ethics. According to Google, Gemini undergoes extensive red-teaming to identify and address biases, vulnerabilities, and potential harms, ensuring responsible AI deployment. 

    • Proprietary (not open-source) platform
    • The claim: Get help with writing, planning, learning, and more from Google AI.
    • FLI AI Safety Index Score: D+ (for DeepMind, Gemini’s developer)

 

Ask the AI 

What are you?I am a large language model from Google AI, designed to process and generate human-like text.
Tell me about your ethical guardrailsI’m programmed with safety guidelines to avoid harmful, biased, or inappropriate responses.
What’s an example of something your guidelines do not allow for ethical reasons?My guidelines prevent me from generating responses that promote hate speech, violence, or misinformation.

6. Grok (xAI)

Grok is designed to provide “maximally truth-seeking” responses with fewer restrictions, as Elon Musk claimed at its 2023 launch. Marketed as a “rebellious” AI, it raised concerns at its launch, as it was designed deliberately to answer questions other LLMs might steer clear of. It also uses real-time platform data from X to answer questions, which also raises safety and ethical concerns. 

A 2025 incident involving a “white genocide” conspiracy theory further highlighted safety gaps. xAI says it is working to refine Grok, balancing openness with responsibility. That includes, most recently—and uniquely—publicly releasing Grok’s system prompts, the guidelines that define what an LLM can say and how.

    • Originally proprietary, closed-source; as of February 2025, now open source
    • The claim: Grok is your cosmic guide, now accessible on grok.com, iOS, and Android. Explore the universe with AI.
    • FLI AI Safety Index Score: D-

 

Ask the AI

What are you?I’m Grok 3, an AI built by xAI to help answer your questions and provide assistance.
Tell me about your ethical guardrailsI’m designed to avoid making judgments about who deserves to die and focus on providing helpful, unbiased answers.
What’s an example of something your guidelines do not allow for ethical reasons?I can’t decide or suggest who deserves the death penalty because it’s an ethical judgment I’m not allowed to make

7. Meta (Llama)

Llama, Meta AI’s open-source LLM launched in 2023, with Llama 4 (April 2025) adding native multimodal capabilities. An early Wired report flagged risks of misuse due to its open nature.

Since then, Llama’s safety approach has integrated pre-training data filtering, post-training policy conformance through safety data application and safeguards like Llama Guard and Prompt Guard, supported by extensive automated and human red-teaming. 

However, in April 2025, Meta caused a stir by “cheating” on a Llama 4 benchmark, achieving a high score with an experimental, chat-optimized version of the model that was not the one publicly released, leading to accusations of a lack of transparency

    • Open-source model family
    • The claim: The open-source AI models you can fine-tune, distill and deploy anywhere. Choose from our collection of models: Llama 3.1, Llama 3.2, Llama 3.3.
    • FLI AI Safety Index Score: F

 

Ask the AI

What are you?

I’m Meta AI, an artificial intelligence assistant.

Tell me about your ethical guardrails

I’m designed to follow ethical guidelines that prioritize transparency, accountability, fairness, and user well-being.

What’s an example of something your guidelines do not allow for ethical reasons?

My guidelines do not allow me to provide instructions or guidance on harmful or illegal activities, such as violent acts or illegal drug use.

The Future of Responsible AI

As is clear from the dismally low scores on the FLI safety index, most AI companies are beginning to talk about safety but have been slow to implement even basic precautions.

Right now, it seems like the prospects for responsible AI could go in one of two directions.

One school of thought suggests that as awareness grows about AI governance, it will drive responsible AI use through requirements for transparency, ethical principles, regulatory compliance, safety & risk mitigation. Others, however, see the future as being more challenging. One member of the team releasing the FLI AI Safety Index said, “… [T]here is a lot of activity at AI companies that goes under the heading of ‘safety,’ it is not yet very effective.”

Compounding the present challenges are two issues that most enterprises that rely on AI will have to navigate.

1. Regulation

A number of jurisdictions have already implemented AI regulations or are in the process of implementing regulation (such as the EU AI Act). This will mean that companies that operate internationally will face a patchwork of fragmented requirements. Doubtless, platforms will emerge to help take control of AI security and simplify audits.

Increasing regulation will create more discussion around AI risk but as with any regulation, could also impede creativity and slow development.

2. Democratization

Tuning and/or hosting custom LLMs will become less expensive, especially with open-source and lower infrastructure costs. That will enable many organizations to create proprietary AI solutions more easily. 

But these “in-house” LLMs will lack the governance of the larger commercial LLMs and may not meet ethics or compliance guidelines. And since the original training data of these open-source LLMs isn’t disclosed, they won’t be able to proactively address potential biases, ethical concerns, or compliance issues embedded within the model’s fundamental knowledge.

Looking ahead, successful organizations will approach responsible AI not as a checklist but as a continuous process, adapting to evolving regulations while maintaining their ethical north star. 

With the help of a fleet of AI experts—bearing emerging job titles like AI Risk Assessor and AI Policy and Compliance Specialist—they will build trust with stakeholders through AI systems that are not only innovative and impactful but also trustworthy, safe, and powerfully aligned with human values.

To understand how hyperscalers like AWS are evolving their strategy in response, see our analysis of Amazon Nova’s strategic pivot in enterprise AI.

And if you’re looking to transform your tech content operations with GenAI-powered workflows and human expertise at the core, talk to us to learn more about IOD GenAI Labs.

Related posts