ABC of A and I

  • AI Act

    Following a legislative process lasting around three years, the European Union adopted uniform EU-wide rules for the design and use of AI systems in May 2024. At the heart of the legislation is the categorisation of AI systems into risk levels: systems posing an unacceptable risk, for example through the use of certain manipulative techniques or emotion recognition in the workplace, are banned. In areas where there are risks to health, safety or fundamental rights, such as critical infrastructure or law enforcement, AI systems (high-risk AI systems) are subject to special requirements concerning data governance, transparency and documentation, human oversight, cyber security and robustness, and the implementation of a risk management system. In particular, public authorities using high-risk AI are required to identify risks for affected persons in a ‘fundamental rights impact assessment’ and to take measures to protect fundamental rights. AI systems with a ‘general purpose’, including large language models such as the one behind ChatGPT, are subject to a number of rules of their own, even if they do not qualify as high-risk systems.

    These rules become applicable in stages during a transitional period lasting until August 2026. The EU Commission is responsible for monitoring and enforcing the regulation, and the EU member states must also designate their own competent authorities.

  • Algorithm

    An algorithm is a precise, finite sequence of instructions for solving a problem. In addition to the familiar computer algorithms, a cooking recipe is also an algorithm in this sense. In computer science, however, algorithms are written in programming languages or expressed as mathematical formulas. AI models usually combine several algorithms, e.g. for processing data, for learning from the data and for applying the decision rules that have been learnt.
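
    To make the idea concrete, here is a small illustrative sketch (not taken from the glossary itself): Euclid's algorithm for the greatest common divisor, written in Python as a finite, unambiguous procedure that is guaranteed to terminate.

    ```python
    def gcd(a: int, b: int) -> int:
        """Euclid's algorithm: repeatedly replace the pair (a, b) by (b, a mod b)
        until the remainder is zero; the last non-zero value is the result."""
        while b != 0:
            a, b = b, a % b
        return a

    print(gcd(48, 36))  # 12
    ```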

  • Automation bias

    The human tendency to place particular trust in the accuracy and quality of the results of automated processes and to rely on them without critical reflection.

  • Bias

    In general, bias describes a distorted representation of reality. This can apply to people (cognitive bias), but also to technologies, e.g. a thermometer that systematically displays temperatures that are too high. In the context of AI, the term is often used in the sense of unfairness or discrimination when the results of an AI system systematically deviate from a social norm, for example by favouring or disadvantaging members of certain demographic groups. There can be various reasons for this: for example, the data used for training may have a distorted statistical distribution. A well-known example is recruitment software tested by Amazon, which disadvantaged female applicants because women were underrepresented in the company's hiring history used as the data source.

    Biases can occur even if the characteristic in question (e.g. gender) is not explicitly included in the data. This is because correlating characteristics (so-called proxies) in the data set, some of which may also be relevant to the task to be solved, allow conclusions to be drawn about the characteristics affected by bias. Technical approaches to bias mitigation based on removing the sensitive features (so-called ‘fairness through unawareness’) have therefore proven to be ineffective. Instead, various methods are used to recognise and compensate for biases with the help of the affected characteristics.
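
    A minimal sketch of the proxy problem, using entirely made-up synthetic data (attribute names and numbers are hypothetical): even when the protected attribute is never shown to the model, a rule learnt from a correlated feature reproduces the historical disadvantage.

    ```python
    import random

    random.seed(0)

    # Hypothetical synthetic data: 'gender' is the protected attribute, and
    # 'proxy' is some correlated feature; the historical hiring label itself
    # reflects a biased practice.
    rows = []
    for _ in range(1000):
        gender = random.choice(["f", "m"])
        proxy = random.gauss(6.0 if gender == "m" else 2.0, 1.5)
        hired = proxy > 4.0                      # biased label from the past
        rows.append((gender, proxy, hired))

    # 'Fairness through unawareness': drop gender and learn a rule from the
    # proxy alone. A simple learner would recover roughly this threshold:
    threshold = 4.0
    predictions = [(gender, proxy > threshold) for gender, proxy, _ in rows]

    for g in ("f", "m"):
        group = [pred for gender, pred in predictions if gender == g]
        print(g, round(sum(group) / len(group), 2))  # predicted hiring rate per group
    ```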

  • Chatbots

    Computer programmes that can communicate with humans in (written) natural language. Natural language processing (NLP), the processing and generation of natural language, has been an important part of AI since the early days of the field. Designing communication between humans and computers so that it comes close to a human conversation is one of the most important measures of AI progress. Initially, chatbots were mostly based on rule-based approaches in which answers were selected from a database on the basis of certain keywords. A well-known example is Joseph Weizenbaum's ELIZA. Today's chatbots such as OpenAI's ChatGPT (so-called large language models, LLMs) are based on machine learning over large amounts of text from the web and are therefore able to conduct much richer conversations. Thanks to the transformer architecture, these chatbots can base their responses not only on individual words or parts of sentences but also on the relationships between them. However, because it rests on statistical patterns in the texts used for training, this technique has limitations of its own: the truthfulness of the chatbots' statements, for example, cannot be guaranteed (so-called hallucinations).

  • Deep Fakes

    AI-generated media content (mostly images, audio or video) that appears authentic and is created and distributed with the intention of deceiving recipients about the authenticity of the content presented. The name is derived from the fact that deep neural networks are involved in producing the content. Even end users without special technical knowledge can create simple deep fakes using readily available applications.

  • Explainable AI (XAI)

    This term covers approaches that make the use and results of AI models transparent and understandable. XAI methods are often suitable for making the reasons for a certain result more comprehensible. In image recognition, for example, the image areas that were decisive for the AI result can be highlighted on heat maps. Related approaches with a different focus are transparency, ‘interpretable AI’ or ‘legibility’.
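
    As one illustration of the general idea (attributing a model's result to parts of its input), here is a sketch using permutation feature importance on tabular data rather than the image heat maps mentioned above; the dataset and model are placeholders chosen only for the example.

    ```python
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True, as_frame=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    # Shuffle each feature in turn and measure how much the accuracy drops:
    # features whose permutation hurts the model most mattered most for its results.
    result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
    ranked = sorted(zip(X.columns, result.importances_mean), key=lambda p: p[1], reverse=True)
    for name, importance in ranked[:5]:
        print(f"{name}: {importance:.3f}")
    ```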

    In addition to these technical possibilities, it is also in the spirit of XAI to communicate openly about the use of AI and to provide users and those affected with relevant information and background on the AI system. The promotion of AI literacy is also provided for in the EU's AI Act.

  • Fair Machine Learning/Algorithmic Fairness (fair ML)

    Research discipline and field of activism aiming to align the development and use of AI with social norms, with a particular focus on avoiding discriminatory disadvantages as a result of biases. This includes the establishment of legal and ethical requirements and their implementation within the contexts of application and the algorithms embedded in them. In addition to fairness and non-discrimination, important aspects include transparency, participation and accountability.

  • Generative AI

    AI models that can generate content such as written language, audio or graphics in response to an input (known as a prompt). As generative AI enables the creation of deceptively real content, particularly in the case of images and videos, such models can be deployed for useful and creative purposes as well as for disinformation or manipulation (as with deep fakes). According to the EU's AI Act, users must therefore be able to recognise when they are dealing with an AI system, and synthetic content must be labelled in a machine-readable manner, i.e. recognisable as synthetic by computers.

  • High risk AI system

    In some areas, the use of AI can lead to particular risks to safety, health or fundamental rights. The EU AI Act therefore lists certain AI applications used in these areas as high-risk AI systems and subjects them to stricter regulations. In addition to the installation or use of AI as a safety component in certain products (such as toys or aviation equipment), these areas include biometrics, critical infrastructure, education, employment, essential services and benefits, law enforcement, migration, administration of justice and democratic processes. However, not every system in these areas is necessarily a high-risk AI system, as only certain applications, described in more detail in Annex III of the AI Act, are covered.

  • Human in the Loop

    Refers to a person who is embedded in a (partially) automated process in order to monitor it and intervene if necessary. A common criticism of the concept holds that the human involved is not always actually able to understand how an AI system works, or relies too much on its results (automation bias), and therefore effectively serves only as a scapegoat. Alternative concepts such as ‘meaningful human control’ are therefore being researched.

  • Labels

    In supervised learning, labels are the values that are assigned to the input data as the correct result or output for a set of inputs. For example, an image database may contain the labels “apple” or “pear” attached to the images. With the help of these labels, the AI receives feedback during the training process, which it uses to improve its results. Labels are usually created by humans (labelling), often by clickworkers or via crowdsourcing. Anyone who marks emails as spam or solves CAPTCHA tasks in order to identify themselves as human on the web is, in many cases, helping to train AI in precisely this way. The term ‘target variable’ is related to this: it refers to the characteristic that the AI is supposed to recognise (in our example, ‘type of fruit’).
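
    A tiny, purely hypothetical sketch of what labelled data looks like: each input is paired with a human-assigned label, and the target variable here is ‘type of fruit’.

    ```python
    # Made-up miniature dataset: inputs (simple image-derived features) plus labels.
    inputs = [
        {"roundness": 0.92, "dominant_colour": "red"},
        {"roundness": 0.95, "dominant_colour": "green"},
        {"roundness": 0.60, "dominant_colour": "green"},
    ]
    labels = ["apple", "apple", "pear"]   # one human-created label per input

    # During training, the model's outputs are compared against these labels.
    for x, y in zip(inputs, labels):
        print(x, "->", y)
    ```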

  • Machine learning (ML)

    Machine learning is one of the most important basic approaches in artificial intelligence and an important basis for the current success of AI applications. In contrast to other AI approaches (e.g. so-called expert systems, whose decision rules are written by hand), ML uses the statistical correlations present in training data to find efficient solutions for tasks such as classification (the assignment of an input to an output); only the generic learning procedure and a few manually chosen settings (hyperparameters) are explicitly programmed. Neural networks, for example, are an important ML method.
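
    A small sketch of this contrast, with made-up fruit weights: in the expert-system style a human writes the decision rule, while in the ML style only the learning procedure is programmed and the rule is derived from labelled examples.

    ```python
    # Expert-system style: the decision rule itself is written by hand.
    def classify_by_hand(weight_in_grams: float) -> str:
        return "apple" if weight_in_grams < 170 else "pear"

    # ML style: programme a learning procedure and let the data determine the rule.
    examples = [(120, "apple"), (140, "apple"), (160, "apple"),
                (180, "pear"), (200, "pear"), (220, "pear")]

    def learn_threshold(data):
        apples = [w for w, label in data if label == "apple"]
        pears = [w for w, label in data if label == "pear"]
        return (max(apples) + min(pears)) / 2   # midpoint between the two classes

    threshold = learn_threshold(examples)        # 170.0, derived from the data

    def classify_learnt(weight_in_grams: float) -> str:
        return "apple" if weight_in_grams < threshold else "pear"

    print(classify_learnt(150))  # apple
    ```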

  • Neural networks

    A specific type of ML architecture that is modelled on the structure of the human brain. Mathematical functions modelling ‘neurons’ are linked in layers. Each neuron processes the inputs it receives using certain weights, which are adjusted during the training process, and forwards the result to the neurons in the next layer. If neural networks have a particularly complex structure, i.e. many layers of neurons, they are also referred to as deep neural networks (DNN).
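
    A minimal sketch of the forward pass of such a network, with randomly initialised weights standing in for the values that training would otherwise adjust (all sizes and numbers here are arbitrary).

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # A tiny feed-forward network: 3 inputs -> 4 hidden neurons -> 1 output.
    W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

    def forward(x):
        hidden = np.maximum(0, x @ W1 + b1)   # each hidden neuron: weighted sum + activation
        return hidden @ W2 + b2               # output layer: weighted sum of hidden values

    print(forward(np.array([0.2, -1.0, 0.5])))
    ```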

  • Overfitting

    If an AI model fits its solution function too closely to the particular statistical properties of the training data, it cannot achieve good results on unknown data – in other words, it does not derive a solution from the data that generalises. This is called overfitting. The phenomenon can also result in biases, as the sample in the training data may be skewed to the detriment of certain groups. In order to recognise and avoid overfitting, AI models are validated and tested. There are also algorithms that make overfitting less likely. In the opposite case of underfitting, the AI model is not able to capture the complexity of the data, which can likewise lead to biases.
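
    A small illustrative sketch with synthetic data: a very flexible model fits the training points almost perfectly but does worse on unseen test points, which is exactly the pattern used to detect overfitting during validation and testing.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Noisy samples of a simple underlying relationship (y = x + noise).
    x_train, x_test = np.linspace(0, 1, 10), rng.uniform(0, 1, 10)
    y_train = x_train + rng.normal(0, 0.1, 10)
    y_test = x_test + rng.normal(0, 0.1, 10)

    def errors(degree):
        coeffs = np.polyfit(x_train, y_train, degree)   # fit a polynomial of this degree
        mse = lambda x, y: np.mean((np.polyval(coeffs, x) - y) ** 2)
        return mse(x_train, y_train), mse(x_test, y_test)

    for degree in (1, 9):
        train_err, test_err = errors(degree)
        print(degree, round(train_err, 4), round(test_err, 4))
    # The degree-9 model matches the training points almost exactly,
    # but typically performs worse on the unseen test points: overfitting.
    ```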

  • Reinforcement learning

    In reinforcement learning, the AI explores a (mathematically modelled) environment in which it can receive feedback (rewards or punishments) for actions. Based on this, it learns to adapt its behaviour and achieve an optimal result. A popular example of this area of machine learning is the training of robotic hoovers to move efficiently around a room. This method is also used in AI systems for games (such as AlphaGo, which was able to beat the world's best Go players).
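
    A self-contained sketch of tabular Q-learning on a made-up toy environment (a short corridor in which only reaching the rightmost position is rewarded); all names and numbers are hypothetical and not tied to any specific library.

    ```python
    import random

    random.seed(0)

    n_states = 5                         # positions 0..4; position 4 gives a reward
    actions = [+1, -1]                   # move right or left
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    alpha, gamma, epsilon = 0.5, 0.9, 0.2

    for _ in range(500):                 # episodes of exploration
        state = 0
        while state != n_states - 1:
            if random.random() < epsilon:
                action = random.choice(actions)                      # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])   # exploit
            nxt = min(max(state + action, 0), n_states - 1)
            reward = 1.0 if nxt == n_states - 1 else 0.0
            # The feedback (reward) updates the estimated value of the action taken.
            best_next = max(q[(nxt, a)] for a in actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt

    print([max(actions, key=lambda a: q[(s, a)]) for s in range(n_states - 1)])  # learnt: always +1
    ```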

  • Right to an Explanation

    Many people think that persons affected by the use of automated systems should have a right to understand how a decision or evaluation affecting them was reached and what logic was applied in the process. However, views differ on the precise scope and content of such a right. In the legal debate in particular, the question of a ‘right to an explanation’ refers to discussions about the extent of the information obligations in the General Data Protection Regulation (GDPR): those in favour conclude from the obligation to provide information about the ‘logic involved and the significance and envisaged effects’ of an automated decision that data subjects have a right to receive an explanation. The main objections are that these obligations only apply at all if an important decision is made in a fully automated way, and that very different requirements can be placed on an ‘explanation’ (for example, a normative justification or a statistical justification of the decision), some of which cannot easily be fulfilled technically with AI models (Explainable AI/XAI). The prevailing opinion in legal scholarship is that those affected must at least be given an insight into the most important decision criteria and their weighting.

  • Supervised learning

    Supervised learning is an area of machine learning (ML). In supervised learning, the AI learns from training data in which the correct output (target variable) is already known and available as a label. During the training phase, the AI adapts its solution function depending on whether its output matches the label – for example, when an image labelled ‘dog’ is incorrectly classified as a ‘cat’. The test phase then serves to check whether the AI predicts the correct label even for previously unknown input data.
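
    A minimal sketch of this training-and-testing cycle using a standard scikit-learn example dataset (the concrete dataset and model are placeholders chosen only for illustration).

    ```python
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Labelled data: inputs (flower measurements) and the known correct outputs
    # (the species, i.e. the target variable available as a label).
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Training phase: the model adapts its decision rules to match the labels.
    model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

    # Test phase: check whether it also predicts the correct label for unseen inputs.
    print(model.score(X_test, y_test))
    ```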

  • Training data

    Data used to develop the parameters that an AI model applies to solve a task. In supervised learning, training data is typically divided into several data sets: Training data proper as well as validation and test data. Validation data is used to adjust the hyperparameters of the AI model after initial training has taken place. For this purpose, several models are often trained and the one that delivers the best results with the validation data is selected. Finally, the test data is used to check whether the model also achieves good results on data with which it has not been trained.
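
    A sketch of this three-way split and of selecting between several trained models on the validation data (dataset, split sizes and candidate settings are arbitrary examples, not a prescription).

    ```python
    from sklearn.datasets import load_wine
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_wine(return_X_y=True)

    # Split the available data into training, validation and test sets (60/20/20).
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
    X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

    # Train several candidate models with different hyperparameter settings ...
    candidates = {k: KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train) for k in (1, 5, 15)}

    # ... keep the one that does best on the validation data ...
    best_k = max(candidates, key=lambda k: candidates[k].score(X_val, y_val))

    # ... and only then measure its quality once on the untouched test data.
    print(best_k, candidates[best_k].score(X_test, y_test))
    ```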

  • Transparency fallacy

    The assumption that the fullest possible insight into a situation, such as the processing of one's personal data by a controller, automatically leads to a better decision in terms of one's own goals. Research shows, however, that although transparency is valuable and necessary for self-determined action, such action is not achieved through transparency alone, owing to other factors such as people's limited rationality or a lack of alternatives.

  • Trustworthy AI

    A key term for a large number of initiatives from politics, science and civil society. They are united by the goal of developing AI in such a way that it can be trusted even when used in areas that are critical for safety or fundamental rights. According to the EU Commission's High-Level Expert Group on AI, this includes the elements of lawfulness, ethics and robustness. The EU's regulatory strategy in particular is aimed at trustworthy AI. Competing and partly overlapping concepts include human-centred AI and ‘AI for Good’.

  • Unsupervised learning

    In unsupervised learning, as opposed to supervised learning, no target variables are provided as the desired output. Instead, AI is used to explore less structured data and, for example, to sort it based on similar characteristics (known as clustering).
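
    A minimal clustering sketch with synthetic data: no labels are provided, and the algorithm groups the points purely by the similarity of their characteristics (the dataset and the choice of k-means are illustrative placeholders).

    ```python
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    # Unlabelled data: only the inputs themselves, no target variable.
    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

    # Clustering sorts the points into groups of similar examples.
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_[:10])       # cluster assignment for the first few points
    print(kmeans.cluster_centers_)   # the three group centres discovered in the data
    ```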