Anthropic to fund initiative to develop new third-party benchmarks for evaluating AI models

Anthropic on Tuesday announced an initiative to fund the development of new benchmarks for testing the capabilities of advanced artificial intelligence (AI) models, and has invited interested organizations to apply. The company said existing benchmarks are insufficient to fully test the capabilities and impact of new large language models (LLMs), and that a new set of evaluations focused on AI safety, advanced capabilities, and societal impact must therefore be developed.

Anthropic to Fund New Benchmarks for AI Models

In its announcement, Anthropic highlighted the need for a comprehensive third-party evaluation ecosystem that moves beyond the limited scope of current benchmarks. Through the initiative, the company said it will fund third-party organizations that want to develop new evaluations for AI models that meet high quality and safety standards.

For Anthropic, high-priority areas include tasks and questions that can measure an LLM's AI Safety Levels (ASL), its advanced capabilities in idea and response generation, and the societal impact of those capabilities.

In the ASL category, the company highlighted several metrics, including the ability of AI models to assist in or autonomously execute cyberattacks, their potential to enhance knowledge of how chemical, biological, radiological, and nuclear (CBRN) weapons are created, national security risk assessment, and more.

In terms of advanced capabilities, Anthropic noted that benchmarks should be able to assess AI's potential to transform scientific research, its ability to engage with and refuse harmful requests, and its multilingual capabilities. Additionally, the firm said it is necessary to understand an AI model's potential to impact society. For this, evaluations should target concepts such as "harmful bias, discrimination, overdependence, addiction, attachment, psychological influence, economic impacts, homogenization and other broad societal impacts."

In addition, the company listed some principles for good evaluations. It said evaluations should not appear in the training data used by AI models, as they otherwise turn into a memorization test. It also encouraged keeping evaluations to between 1,000 and 10,000 tasks or questions, and asked organizations to use subject-matter experts to create tasks that test performance in a specific domain.
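To make these principles concrete, here is a minimal, hypothetical sketch of what an evaluation harness enforcing them might look like. The names (BenchmarkTask, evaluate, model_fn) are illustrative assumptions for this sketch, not part of any Anthropic tool or API:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BenchmarkTask:
    """One held-out test item, authored by a subject-matter expert."""
    prompt: str     # the question posed to the model
    reference: str  # the expected answer used for grading
    domain: str     # e.g. "cybersecurity", "multilingual", "bias"


def evaluate(model_fn: Callable[[str], str], tasks: List[BenchmarkTask]) -> float:
    """Score a model on the benchmark and return its accuracy.

    Enforces Anthropic's suggested size of 1,000 to 10,000 tasks; the tasks
    themselves must also be kept out of training data, or the benchmark
    degenerates into a memorization test.
    """
    if not 1_000 <= len(tasks) <= 10_000:
        raise ValueError("task count outside the suggested 1,000-10,000 range")
    correct = sum(model_fn(t.prompt).strip() == t.reference for t in tasks)
    return correct / len(tasks)
```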


