
Scientists need living guidelines for generative artificial intelligence

Living Guidelines for Generative Artificial Intelligence – Why Scientists Must Oversee Its Use: The HELM initiative, the independent scientific auditing body and generative AI companies

Data sets should be checked for bias before generative artificial intelligence systems are released to the public. Quality checks might also ask, for example, to what extent interactions with generative AI distort people’s beliefs3, or vice versa. This will be challenging as more AI products arrive on the market. The problems are highlighted by the HELM initiative, a living benchmark for improving the transparency of language models, developed by the Stanford Center for Research on Foundation Models in California.

Artificial intelligence systems can generate videos with synthetic faces and voices that are indistinguishable from real people, posing a threat to the integrity of information online. These harms could affect people, politicians, the media and institutions.

  1. Generative AI companies should set up a portal where users who discover biased or inaccurate responses can easily report them; the independent scientific auditing body should have access to this portal and to the actions the company takes in response.

It is unclear whether self-regulation will be effective in the long run. The industry is continually reinventing itself, and the pace of AI development is rapid. Regulations drawn up today will be outdated by the time they become official policy, and might not anticipate future harms and innovations.

  1. Researchers should always acknowledge and specify the tasks for which they have used generative AI in (scientific) research publications or presentations.

  2. Human assessment should always be used in evaluating research funding proposals.

Source: Living guidelines for generative AI — why scientists must oversee its use

A First Step in Establishing a Certification Body for Artificial Intelligence and Its Implications for Open-Science Practices

The Guidelines were co-developed by the team of Huub Dijstelbloem.

What we are suggesting is more than a certification label for a product, even though developing such a mark could be a first step. To prevent the introduction of harmful artificial intelligence products, the auditing body should inform policymakers, users and consumers about whether a product is safe and effective.

  1. The body should include, but not be limited to, experts in computer science, behavioural science, psychology, human rights, privacy, law, ethics, science of science and philosophy (and related fields). It is important that the interests of all stakeholders are represented, not only those of the private and public sectors. Standards for the composition of the team might change over time.

Similar bodies exist in other domains, such as the US Food and Drug Administration, which assesses evidence from clinical trials to approve products that meet its standards for safety and effectiveness. The Center for Open Science, a Virginia-based organization, is trying to change the way scientific work is done.

These approaches are already applied in other fields. The Stroke Foundation in Australia, for instance, has adopted living guidelines so that patients can access new medicines more quickly. The foundation now updates its guidelines every three to six months, instead of roughly every seven years as it did previously. Similarly, the Australian National Clinical Evidence Taskforce updates its recommendations every 20 days.

Another example is the Transparency and Openness Promotion (TOP) Guidelines for promoting open-science practices, developed by the Center for Open Science6. A metric called TOP Factor allows researchers to easily check whether journals adhere to open-science guidelines. A similar approach could be used for generative AI tools.
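To make the idea concrete, here is a minimal sketch, assuming a TOP-Factor-style scheme in which each standard is rated on a 0–3 scale and the ratings are summed into a single adherence score. The standard names and ratings are invented for illustration and are not the Center for Open Science’s actual data or implementation.

```python
# Illustrative sketch of a TOP-Factor-style adherence score.
# The standards and ratings below are hypothetical examples, not the
# Center for Open Science's actual data or implementation.
from typing import Dict

# Each standard is rated on a 0-3 scale (0 = no policy, 3 = adherence
# required and verified), mirroring how TOP Factor rates journal policies.
POLICY_RATINGS: Dict[str, int] = {
    "data_citation": 2,
    "data_transparency": 3,
    "code_transparency": 1,
    "preregistration": 0,
}


def adherence_score(ratings: Dict[str, int]) -> int:
    """Sum per-standard ratings into a single adherence score."""
    for name, level in ratings.items():
        if not 0 <= level <= 3:
            raise ValueError(f"rating for {name!r} must be between 0 and 3")
    return sum(ratings.values())


print(f"Adherence score: {adherence_score(POLICY_RATINGS)} out of "
      f"{3 * len(POLICY_RATINGS)}")
```

An auditing body could publish scores of this kind for generative AI tools in the same way that TOP Factor does for journals, making adherence to living guidelines easy to check.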

How much will we need to spend to audit and regulate artificial intelligence?

Investments will be needed. The auditing body will be the most expensive element, because it needs computing power comparable to that of OpenAI or a large university consortium. Setting up the body is likely to require at least US$1 billion, roughly the hardware cost of training GPT-5 (a proposed successor to GPT-4, the large language model that underlies ChatGPT).

As a first step, an interdisciplinary scientific expert group, with a budget of about US$1 million, should be set up in early 2024 to look at what is needed and report back within six months. This group should sketch scenarios for how the auditing body and guidelines committee would function, as well as budget plans.

Some investment might come from the public purse, from research institutes and nation states. Tech companies should contribute via a pooled and independently run mechanism.

The scientific auditing body would not be able to enforce the guidelines at first. However, we are hopeful that the living guidelines would inspire better legislation, given interest from leading global organizations in our dialogues. For comparison, the Club of Rome, a research and advocacy organization aimed at raising environmental and societal awareness, has no direct political or economic power, yet still has a large impact on international legislation for limiting global warming.

Tech companies might prefer voluntary guidelines over binding regulation, fearing that regulation will hamper innovation. For example, many companies changed their privacy policies only after the European Union put its General Data Protection Regulation into effect in 2016 (see go.nature.com/3ten3du). However, our approach has benefits: auditing and regulation can build public trust and lower the risk of malpractice.

These benefits could be an incentive for tech companies to fund the infrastructure needed to run and test artificial intelligence systems. However, some might be reluctant to do so, because a tool that fails quality checks could receive unfavourable ratings or evaluations, leading to negative media coverage and declining share prices.

Another challenge is maintaining the independence of scientific research in a field dominated by the resources and agendas of the tech industry. The auditing body’s membership must be managed to avoid conflicts of interest, given that these have been demonstrated to lead to biased results in other fields7,8. A strategy for dealing with such issues needs to be developed9.

A Stanford Study of Transparency in AI Models: Comparing OpenAI, Google, Meta, Stable Diffusion 2 and Others

When OpenAI published details of GPT-4, the stunningly capable AI language model that powers ChatGPT, in March, its researchers filled 100 pages. Yet they left out some important details about how the model was actually built or how it works.

That was no accidental oversight, of course. OpenAI and other big companies are keen to keep their most prized software under wraps, partly to protect it from misuse and partly to avoid giving competitors a chance to catch up.

The Stanford team looked at 10 different AI systems, mostly large language models like those behind ChatGPT and other chatbots. These include widely used commercial models like GPT-4 from OpenAI, the similar PaLM 2 from Google, and Titan Text from Amazon. The report also surveyed models offered by startups, including Jurassic-2 from AI21 Labs, Claude 2 from Anthropic, Command from Cohere, and Inflection-1 from chatbot maker Inflection.

And they examined “open source” AI models that can be downloaded for free, rather than accessed exclusively in the cloud, including the image-generation model Stable Diffusion 2 and Llama 2, which was released by Meta in July this year. (As WIRED has previously covered, these models are often not quite as open as they might seem.)

The models were scored on 13 criteria, including how transparent their developers are about the data used for training and whether that data includes copyrighted material. The study also looked for information about the hardware used to train and run each model, the software frameworks employed and a project’s energy consumption.

The researchers found that no model achieved more than 50% on their transparency scale across all the criteria. Meta’s Llama 2 was crowned the most open, while Amazon’s Titan Text was judged the least transparent. But even an “open source” model like Llama 2 was found to be quite opaque, because Meta has not disclosed the data used for its training, how that data was collected and curated, or who did the work.
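To illustrate how such an index can be aggregated, here is a minimal sketch assuming binary per-criterion scores that are converted to a percentage and used to rank models. The criterion names, model names and scores are placeholders, not the Stanford team’s actual data or methodology.

```python
# Minimal sketch of aggregating a transparency index. Each model is
# scored per criterion (1 = information disclosed, 0 = not), and its
# overall score is the percentage of criteria satisfied. The criteria,
# model names and scores are invented placeholders, not the Stanford
# team's actual data.
CRITERIA = [
    "training_data",
    "copyrighted_material",
    "hardware",
    "software_frameworks",
    "energy_consumption",
]

SCORES = {
    "Model A": {"training_data": 1, "copyrighted_material": 0,
                "hardware": 1, "software_frameworks": 1,
                "energy_consumption": 0},
    "Model B": {"training_data": 0, "copyrighted_material": 0,
                "hardware": 0, "software_frameworks": 1,
                "energy_consumption": 0},
}


def transparency_percentage(per_criterion: dict) -> float:
    """Percentage of criteria for which information was disclosed."""
    return 100.0 * sum(per_criterion[c] for c in CRITERIA) / len(CRITERIA)


# Rank models from most to least transparent and print their scores.
for model in sorted(SCORES, key=lambda m: transparency_percentage(SCORES[m]),
                    reverse=True):
    print(f"{model}: {transparency_percentage(SCORES[model]):.0f}%")
```

Under this kind of aggregation, a model that disclosed details for only half of the criteria would score 50%, which is roughly the ceiling the Stanford researchers reported.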