[2411.08003] Can adversarial attacks by large language models be attributed?

Abstract: Attributing outputs from large language models (LLMs) in adversarial settings, such as cyberattacks and disinformation campaigns, presents significant challenges that are likely to grow in importance. We approach this attribution problem from both a theoretical and an empirical perspective, drawing on formal language theory (identification in the limit) and data-driven analysis of the expanding LLM ecosystem. By modeling an LLM's set of possible outputs as a formal language, we analyze whether finite samples of text can uniquely pinpoint the originating model. Our results show that, under mild assumptions of overlapping capabilities among models, certain classes of LLMs are fundamentally non-identifiable from their outputs alone. We delineate four regimes of theoretical identifiability: (1) an infinite class of deterministic (discrete) LLM languages is non-identifiable (Gold's classical result from 1967); (2) an infinite class of probabilistic LLMs is also non-identifiable (by extension of the deterministic case); (3) a finite class of deterministic LLMs is identifiable (consistent with Angluin's tell-tale criterion); and (4) even a finite class of probabilistic LLMs can be non-identifiable (we provide a new counterexample establishing this negative result). Complementing these theoretical insights, we quantify the explosion in the number of plausible model origins (the hypothesis space) for a given output in recent years. Even under conservative assumptions, with each open-source model fine-tuned on at most one new dataset, the number of distinguishable candidate models doubles roughly every 0.5 years, and allowing multi-dataset fine-tuning shortens this doubling time to roughly 0.28 years. This combinatorial growth, combined with the prohibitive computational cost of brute-force likelihood attribution across all models and potential users, renders exhaustive attribution infeasible in practice.
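To make the abstract's growth claim concrete, the short Python sketch below (not from the paper; the starting model count, time horizon, and function name are illustrative assumptions) converts the reported doubling times of roughly 0.5 years and 0.28 years into candidate-model counts over time, which is the arithmetic behind the claim that brute-force attribution across all candidates quickly becomes impractical.

# Illustrative sketch (not from the paper): how a fixed doubling time for the
# number of distinguishable candidate models translates into hypothesis-space
# size over time. The initial count and time horizon are assumed for illustration.

def candidate_models(initial_count: float, doubling_time_years: float, years: float) -> float:
    """Candidate models after `years` of exponential growth with the given doubling time."""
    return initial_count * 2 ** (years / doubling_time_years)

if __name__ == "__main__":
    initial = 1_000  # assumed starting number of candidate models
    for label, doubling in [("single-dataset fine-tuning", 0.5),
                            ("multi-dataset fine-tuning", 0.28)]:
        for years in (1, 2, 3):
            n = candidate_models(initial, doubling, years)
            print(f"{label}: after {years} yr -> ~{n:,.0f} candidate models")
    # A 0.28-year doubling time means the hypothesis space grows by a factor of
    # 2 ** (1 / 0.28), about 11.9x per year, so exhaustively scoring an output
    # trace against every candidate model becomes infeasible within a few years.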
Submission history
From: Manuel Cebrian [view email]
[v1]
Tue, 12 Nov 2024 18:28:57 UTC (250 KB)
[v2]
Wed, 9 Jul 2025 02:35:36 UTC (751 KB)