The future of open human feedback

References

  • Radford, A. et al. Language models are unsupervised multitask learners. OpenAI Blog 1, 9 (2019).

  • Ivanova, A. A. et al. Elements of World Knowledge (EWOK): a cognition-inspired framework for evaluating basic world knowledge in language models. Preprint at https://doi.org/10.48550/arXiv.2405.09605 (2024).

  • Schick, T. et al. Toolformer: language models can teach themselves to use tools. Adv. Neural Inf. Process. Syst. 36, 332 (2024).

  • Imran, M. & Almusharraf, N. Analyzing the role of ChatGPT as a writing assistant at higher education level: a systematic review of the literature. Contemp. Educ. Technol. 15, ep464 (2023).

  • Barke, S., James, M. B. & Polikarpova, N. Grounded copilot: how programmers interact with code-generating models. Proc. ACM Program. Lang. 7, 85–111 (2023).

  • Askell, A. et al. A general language assistant as a laboratory for alignment. Preprint at https://doi.org/10.48550/arXiv.2112.00861 (2021).

  • Ouyang, L. et al. Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst. 35, 27730–27744 (2022).

  • Dang, J. et al. RLHF can speak many languages: unlocking multilingual preference optimization for LLMs. In Proc. 2024 Conference on Empirical Methods in Natural Language Processing 13134–13156 (ACL, 2024).

  • Bai, Y. et al. Training a helpful and harmless assistant with reinforcement learning from human feedback. Preprint at https://doi.org/10.48550/arXiv.2204.05862 (2022).

  • Thoppilan, R. et al. LaMDA: language models for dialog applications. Preprint at https://doi.org/10.48550/arXiv.2201.08239 (2022).

  • Nakano, R. et al. WebGPT: browser-assisted question-answering with human feedback. Preprint at https://doi.org/10.48550/arXiv.2112.09332 (2021).

  • Wang, Z. et al. HelpSteer2: open-source dataset for training top-performing reward models. Preprint at https://doi.org/10.48550/arXiv.2406.08673 (2024).

  • Ahmadian, A. et al. Back to basics: revisiting REINFORCE-style optimization for learning from human feedback in LLMs. In Proc. 62nd Annual Meeting of the Association for Computational Linguistics Vol. 1 (ACL, 2024).

  • Patel, D. & Ahmad, A. Google “we have no moat, and neither does OpenAI”. SemiAnalysis https://semianalysis.com/2023/05/04/google-we-have-no-moat-and-neither/ (4 May 2023).

  • Introducing Meta Llama 3: the most capable openly available LLM to date. Meta AI https://ai.meta.com/blog/meta-llama-3/ (2024).

  • Boubdir, M., Kim, E., Ermis, B., Fadaee, M. & Hooker, S. Which prompts make the difference? Data prioritization for efficient human LLM evaluation. Preprint at https://doi.org/10.48550/arXiv.2310.14424 (2023).

  • Singh, S. et al. Aya dataset: an open-access collection for multilingual instruction tuning. In Proc. 62nd Annual Meeting of the Association for Computational Linguistics Vol. 1 (ACL, 2024).

  • Li, N. et al. The WMDP benchmark: measuring and reducing malicious use with unlearning. Preprint at https://doi.org/10.48550/arXiv.2403.03218 (2024).

  • AI @ Meta Llama Team. The Llama 3 herd of models. Preprint at https://doi.org/10.48550/arXiv.2407.21783 (2024).

  • Stiennon, N. et al. Learning to summarize with human feedback. Adv. Neural Inf. Process. Syst. 33, 3008–3021 (2020).

  • Lambert, N., Tunstall, L., Rajani, N. & Thrush, T. HuggingFace H4 Stack Exchange preference dataset. Hugging Face https://huggingface.co/datasets/HuggingFaceH4/stack-exchange-preferences (2023).

  • Cui, G. et al. ULTRAFEEDBACK: boosting language models with scaled AI feedback. In International Conference on Machine Learning (PMLR, 2024).

  • Taori, R. et al. Stanford Alpaca: an instruction-following Llama model. GitHub https://github.com/tatsu-lab/stanford_alpaca (2023).

  • Aakanksha et al. The multilingual alignment prism: aligning global and local preferences to reduce harm. In Proc. 2024 Conference on Empirical Methods in Natural Language Processing 12027–12049 (ACL, 2024).

  • Zhao, W. et al. WildChat: 1M ChatGPT interaction logs in the wild. In 12th International Conference on Learning Representations (ICLR, 2024).

  • Zheng, L. et al. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Adv. Neural Inf. Process. Syst. 36, 46595–46623 (2023).

  • Kirk, H. R. et al. The PRISM alignment dataset: what participatory, representative and individualised human feedback reveals about the subjective and multicultural alignment of large language models. Adv. Neural Inf. Process. Syst. 37, 105236–105344 (2024).

  • Aroyo, L. et al. DICES dataset: diversity in conversational AI evaluation for safety. Preprint at https://doi.org/10.48550/arXiv.2306.11247 (2023).

  • Don-Yehiya, S., Choshen, L. & Abend, O. The ShareLM collection and plugin: contributing human-model chats for the benefit of the community. Preprint at https://doi.org/10.48550/arXiv.2408.08291 (2024).

  • Köpf, A. et al. OpenAssistant Conversations—democratizing large language model alignment. Preprint at https://doi.org/10.48550/arXiv.2304.07327 (2024).

  • Agnew, W. et al. The illusion of artificial inclusion. In Proc. CHI Conference on Human Factors in Computing Systems 1–12 (ACM, 2024).

  • White, M. et al. The model openness framework: promoting completeness and openness for reproducibility, transparency and usability in AI. Preprint at https://doi.org/10.48550/arXiv.2403.13784 (2024).

  • Liesenfeld, A. & Dingemanse, M. Rethinking open source generative AI: open washing and the EU AI Act. In The 2024 ACM Conference on Fairness, Accountability, and Transparency 1774–1787 (ACM, 2024).

  • Zheng, L. et al. LMSYS-Chat-1M: a large-scale real-world LLM conversation dataset. In 12th International Conference on Learning Representations (ICLR, 2024).

  • Benkler, Y. The Wealth of Networks: How Social Production Transforms Markets and Freedom (Yale Univ. Press, 2007).

  • Halfaker, A. & Geiger, R. S. ORES: lowering barriers with participatory machine learning in Wikipedia. In Proc. ACM Human–Computer Interaction Vol. 4 https://doi.org/10.1145/3415219 (2020).

  • Palen, L., Soden, R., Anderson, T. J. & Barrenechea, M. Success & scale in a data-producing organization: the socio-technical evolution of OpenStreetMap in response to humanitarian events. In Proc. 33rd Annual ACM Conference on Human Factors in Computing Systems 4113–4122 (ACM, 2015).

  • Bryant, S. L., Forte, A. & Bruckman, A. Becoming Wikipedian: transformation of participation in a collaborative online encyclopedia. In Proc. 2005 ACM International Conference on Supporting Group Work 1–10 (ACM, 2005).

  • Balestra, M., Cheshire, C., Arazy, O. & Nov, O. Investigating the motivational paths of peer production newcomers. In Proc. 2017 CHI Conference on Human Factors in Computing Systems 6381–6385 (ACM, 2017).

  • Kriplean, T., Beschastnikh, I. & McDonald, D. W. Articulations of wikiwork: uncovering valued work in Wikipedia through barnstars. In Proc. 2008 ACM Conference on Computer Supported Cooperative Work 47–56 (ACM, 2008).

  • Mamykina, L., Manoim, B., Mittal, M., Hripcsak, G. & Hartmann, B. Design lessons from the fastest Q&A site in the west. In Proc. SIGCHI Conference on Human Factors in Computing Systems 2857–2866 (ACM, 2011).

  • Movshovitz-Attias, D., Movshovitz-Attias, Y., Steenkiste, P. & Faloutsos, C. Analysis of the reputation system and user contributions on a question answering website: Stack Overflow. In Proc. 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 886–893 (ACM, 2013).

  • Deci, E. L. Effects of externally mediated rewards on intrinsic motivation. J. Pers. Soc. Psychol. 18, 105 (1971).

  • Ryan, R. M. & Deci, E. L. Intrinsic and extrinsic motivation from a self-determination theory perspective: definitions, theory, practices, and future directions. Contemp. Educ. Psychol. 61, 101860 (2020).

  • Heltweg, P. & Riehle, D. A systematic analysis of problems in open collaborative data engineering. Trans. Soc. Comput. https://doi.org/10.1145/3629040 (2023).

  • Fang, J., Liang, J.-W. & Wang, H.-C. How people initiate and respond to discussions around online community norms: a preliminary analysis on Meta Stack Overflow discussions. In Companion Publication of the 2023 Conference on Computer Supported Cooperative Work and Social Computing 221–225 (ACM, 2023).

  • Butler, B., Joyce, E. & Pike, J. Don’t look now, but we’ve created a bureaucracy: the nature and roles of policies and rules in Wikipedia. In Proc. SIGCHI Conference on Human Factors in Computing Systems 1101–1110 (ACM, 2008).

  • Zuckerman, E. & Rajendra-Nicolucci, C. From community governance to customer service and back again: re-examining pre-web models of online governance to address platforms’ crisis of legitimacy. Soc. Media Soc. 9, 20563051231196864 (2023).

  • Hwang, S. & Shaw, A. Rules and rule-making in the five largest Wikipedias. In Proc. International AAAI Conference on Web and Social Media Vol. 16, 347–357 (AAAI, 2022).

  • Kuo, T.-S. et al. Wikibench: community-driven data curation for AI evaluation on Wikipedia. In Proc. CHI Conference on Human Factors in Computing Systems (ACM, 2024).

  • Orife, I. et al. Masakhane—machine translation for Africa. Preprint at https://doi.org/10.48550/arXiv.2003.11529 (2020).

  • Peng, B. et al. RWKV: reinventing RNNs for the transformer era. In Findings of the Association for Computational Linguistics: EMNLP 14048–14077 (ACL, 2023).

  • Scao, T. L. et al. BLOOM: a 176B-parameter open-access multilingual language model. Preprint at https://doi.org/10.48550/arXiv.2211.05100 (2022).

  • Biderman, S. et al. Pythia: a suite for analyzing large language models across training and scaling. In International Conference on Machine Learning 2397–2430 (PMLR, 2023).

  • Ding, J., Akiki, C., Jernite, Y., Steele, A. L. & Popo, T. Towards openness beyond open access: user journeys through 3 open AI collaboratives. Preprint at https://doi.org/10.48550/arXiv.2301.08488 (2023).

  • Pistilli, G., Muñoz Ferrandis, C., Jernite, Y. & Mitchell, M. Stronger together: on the articulation of ethical charters, legal tools, and technical documentation in ML. In Proc. 2023 ACM Conference on Fairness, Accountability, and Transparency 343–354 (ACM, 2023).

  • Hughes, S. et al. The BigCode project governance card. Preprint at https://doi.org/10.48550/arXiv.2312.03872 (2023).

  • The open source definition (v1.9). OSI https://opensource.org/osd/ (2007).

  • Brown, E. M. et al. Measuring software innovation with open source software development data. Preprint at https://doi.org/10.48550/arXiv.2411.05087 (2024).

  • Langenkamp, M. & Yue, D. N. How open source machine learning software shapes AI. In Proc. 2022 AAAI/ACM Conference on AI, Ethics, and Society 385–395 (ACM, 2022).

  • Osborne, C., Ding, J. & Kirk, H. R. The AI community building the future? A quantitative analysis of development activity on Hugging Face hub. J. Comput. Soc. Sci. 7, 1–39 (2024).

  • Kherroubi Garcia, I. et al. Ten simple rules for good model-sharing practices. PLoS Comput. Biol. 21, e1012702 (2025).

  • Bonaccorsi, A. & Rossi, C. Comparing motivations of individual programmers and firms to take part in the open source movement: from community to business. Knowl. Technol. Policy 18, 40–64 (2006).

  • Osborne, C. Why companies “democratise” artificial intelligence: the case of open source software donations. Preprint at https://doi.org/10.48550/arXiv.2409.17876 (2024).

  • Lakhani, K. R. & Wolf, R. G. Why hackers do what they do: understanding motivation and effort in free/open source software projects. Preprint at SSRN https://doi.org/10.2139/ssrn.443040 (2005).

  • Shah, S. K. Motivation, governance, and the viability of hybrid forms in open source software development. Manag. Sci. 52, 1000–1014 (2006).

  • Subramanyam, R. & Xia, M. Free/libre open source software development in developing and developed countries: a conceptual framework with an exploratory study. Decis. Support Syst. 46, 173–186 (2008).

  • Takhteyev, Y. Coding Places: Software Practice in a South American City (MIT Press, 2012).

  • Von Krogh, G., Haefliger, S., Spaeth, S. & Wallin, M. W. Carrots and rainbows: motivation and social practice in open source software development. MIS Q. 649–676 (2012).

  • Li, X. et al. Systematic literature review of commercial participation in open source software. ACM Trans. Softw. Eng. Methodol. 34, 33 (2024).

  • Lindman, J., Juutilainen, J.-P. & Rossi, M. Beyond the business model: incentives for organizations to publish software source code. In IFIP International Conference on Open Source Systems (eds Boldyreff, C. et al.) 47–56 (Springer, 2009).

  • Birkinbine, B. Incorporating the Digital Commons: Corporate Involvement in Free and Open Source Software (Univ. Westminster Press, 2020).

  • Fink, M. The Business and Economics of Linux and Open Source (Prentice Hall Professional, 2003).

  • Lerner, J. & Tirole, J. Some simple economics of open source. J. Ind. Econ. 50, 197–234 (2002).

  • Woods, D. & Guliani, G. Open Source for the Enterprise: Managing Risks, Reaping Rewards (O’Reilly Media, 2005).

  • Osborne, C. et al. Characterising open source co-opetition in company-hosted open source software projects: the cases of PyTorch, TensorFlow, and transformers. Preprint at https://doi.org/10.48550/arXiv.2410.18241 (2024).

  • Pitt, L. F., Watson, R. T., Berthon, P., Wynn, D. & Zinkhan, G. The penguin’s window: corporate brands from an open-source perspective. J. Acad. Mark. Sci. 34, 115–127 (2006).

  • Osborne, C. Public–private funding models in open source software development: a case study on scikit-learn. Preprint at https://doi.org/10.48550/arXiv.2404.06484 (2024).

  • Ågerfalk, P. J. & Fitzgerald, B. Outsourcing to an unknown workforce: exploring opensourcing as a global sourcing strategy. MIS Q. 385–409 (2008).

  • West, J. & Gallagher, S. Challenges of open innovation: the paradox of firm investment in open-source software. R&D Manag. 36, 319–331 (2006).

  • O’Mahony, S. & Bechky, B. A. Boundary organizations: enabling collaboration among unexpected allies. Admin. Sci. Q. 53, 422–459 (2008).

  • Germonprez, M., Allen, J. P., Warner, B., Hill, J. & McClements, G. Open source communities of competitors. ACM Interact. 20, 54–59 (2013).

  • Goggins, S., Lumbard, K. & Germonprez, M. Open source community health: analytical metrics and their corresponding narratives. In 2021 IEEE/ACM 4th International Workshop on Software Health in Projects, Ecosystems and Communities 25–33 (IEEE, 2021).

  • Pipatanakul, K. et al. Typhoon: Thai large language models. Preprint at https://doi.org/10.48550/arXiv.2312.13951 (2023).

  • Birhane, A. et al. Power to the people? Opportunities and challenges for participatory AI. In Proc. 2nd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization 1–8 (ACM, 2022).

  • Sloane, M., Moss, E., Awomolo, O. & Forlano, L. Participation is not a design fix for machine learning. In Proc. 2nd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization 1–6 (ACM, 2022).

  • Krishnamurthy, S. & Tripathi, A. K. Bounty programs in free/libre/open source software. In The Economics of Open Source Software Development https://api.semanticscholar.org/CorpusID:107939629 (2006).

  • Chen, S., Epps, J., Ruiz, N. & Chen, F. Eye activity as a measure of human mental effort in HCI. In Proc. 16th International Conference on Intelligent User Interfaces 315–318 (ACM, 2011).

  • Ash, J., Anderson, B., Gordon, R. & Langley, P. Digital interface design and power: friction, threshold, transition. Environ. Plann. D 36, 1136–1153 (2018).

  • Lin, B. Y. et al. WildBench: benchmarking LLMs with challenging tasks from real users in the wild. Preprint at https://doi.org/10.48550/arXiv.2406.04770 (2024).

  • Hancock, B., Bordes, A., Mazare, P.-E. & Weston, J. Learning from dialogue after deployment: feed yourself, chatbot! In Proc. 57th Annual Meeting of the Association for Computational Linguistics (eds Korhonen, A. et al.) 3667–3684 (ACL, 2019).

  • Don-Yehiya, S., Choshen, L. & Abend, O. Naturally occurring feedback is common, extractable and useful. Preprint at https://doi.org/10.48550/arXiv.2407.10944 (2024).

  • Gougherty, A. V. & Clipp, H. L. Testing the reliability of an AI-based large language model to extract ecological information from the scientific literature. npj Biodiversity 3, 13 (2024).

  • Pokrywka, J., Kaczmarek, J. & Gorzelańczyk, E. GPT-4 passes most of the 297 written Polish board certification examinations. https://api.semanticscholar.org/CorpusID:269588160 (2024).

  • Merlyn Mind’s education-domain language models. Merlyn Mind AI Team https://www.merlyn.org/blog/merlyn-minds-education-specific-language-models (2023).

  • Rein, D. et al. GPQA: a graduate-level Google-proof Q&A benchmark. Preprint at https://doi.org/10.48550/arXiv.2311.12022 (2023).

  • Wu, S. et al. BloombergGPT: a large language model for finance. Preprint at https://doi.org/10.48550/arXiv.2303.17564 (2023).

  • Liu, X.-Y., Wang, G. & Zha, D. FinGPT: democratizing internet-scale data for financial large language models. Preprint at https://doi.org/10.48550/arXiv.2307.10485 (2023).

  • Klie, J.-C. et al. Lessons learned from a citizen science project for natural language processing. Preprint at https://doi.org/10.48550/arXiv.2304.12836 (2023).

  • Pavlick, E., Post, M., Irvine, A., Kachaev, D. & Callison-Burch, C. The language demographics of Amazon Mechanical Turk. Trans. Assoc. Comput. Linguist. 2, 79–92 (2014).

  • Zhao, W. et al. UNcommonsense reasoning: abductive reasoning about uncommon situations. In Proc. 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies Vol. 1 (eds Duh, K. et al.) 8487–8505 (ACL, 2024).

  • Seth, A., Ahuja, S., Bali, K. & Sitaram, S. DOSA: a dataset of social artifacts from different Indian geographical subcultures. Preprint at https://doi.org/10.48550/arXiv.2403.14651 (2024).

  • Emerson, R. W. Convenience sampling, random sampling, and snowball sampling: how does sampling affect the validity of research? J. Vis. Impair. Blind. 109, 164–168 (2015).

  • Watts, I. et al. Pariksha: a scalable, democratic, transparent evaluation platform for assessing Indic large language models. Preprint at https://doi.org/10.48550/arXiv.2406.15053 (2024).

  • Quaye, J. et al. Adversarial nibbler: an open red-teaming method for identifying diverse harms in text-to-image generation. In The 2024 ACM Conference on Fairness, Accountability, and Transparency 388–406 (ACM, 2024).

  • Tsatsou, P. Digital divides revisited: what is new about divides and their research? Media Cult. Soc. 33, 317–331 (2011).

  • Avle, S., Quartey, E. & Hutchful, D. Research on mobile phone data in the global south: opportunities and challenges. Preprint at UMass Amherst https://doi.org/10.1093/oxfordhb/9780190460518.013.33 (2018).

  • Lu, Y., Zhu, W., Li, L., Qiao, Y. & Yuan, F. LLaMAX: scaling linguistic horizons of LLM by enhancing translation capabilities beyond 100 languages. Preprint at https://doi.org/10.48550/arXiv.2407.05975 (2024).

  • Peters, D. et al. Participation is not enough: towards indigenous-led co-design. In Proc. 30th Australian Conference on Computer-Human Interaction 97–101 (ACM, 2018).

  • Santurkar, S. et al. Whose opinions do language models reflect? In International Conference on Machine Learning 29971–30004 (PMLR, 2023).

  • Pozzobon, L., Ermis, B., Lewis, P. & Hooker, S. Goodtriever: adaptive toxicity mitigation with retrieval-augmented models. Preprint at https://doi.org/10.48550/arXiv.2310.07589 (2023).

  • Kiela, D. et al. Dynabench: rethinking benchmarking in NLP. In Proc. 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (eds Toutanova, K. et al.) 4110–4124 (ACL, 2021).

  • White, C. et al. LiveBench: a challenging, contamination-free LLM benchmark. In 13th International Conference on Learning Representations (ICLR, 2025).

  • BigCode. Am I in The Stack? Hugging Face https://huggingface.co/spaces/bigcode/in-the-stack (2024).

  • European Parliament and Council of the European Union. Regulation (EU) 2016/679 of the European Parliament and of the Council https://data.europa.eu/eli/reg/2016/679/oj (2016).

  • Illman, E. & Temple, P. California Consumer Privacy Act. Bus. Lawyer 75, 1637–1646 (2019).

  • Health Insurance Portability and Accountability Act of 1996. Public Law 104–191 (1996).

  • Kumar, M., Moser, B., Fischer, L. & Freudenthaler, B. Towards practical secure privacy-preserving machine (deep) learning with distributed data. In International Conference on Database and Expert Systems Applications 55–66 (Springer, 2022).

  • Raji, I. D. et al. Closing the AI accountability gap: defining an end-to-end framework for internal algorithmic auditing. In Proc. 2020 Conference on Fairness, Accountability, and Transparency 33–44 (ACM, 2020).

  • Bender, E. M., Gebru, T., McMillan-Major, A. & Shmitchell, S. On the dangers of stochastic parrots: can language models be too big? In Proc. 2021 ACM Conference on Fairness, Accountability, and Transparency 610–623 (ACM, 2021).

  • The NIST Privacy Framework: A Tool for Improving Privacy through Enterprise Risk Management (NIST, 2020); https://www.nist.gov/privacy-framework/privacy-framework

  • Narayanan, A. & Shmatikov, V. Robust de-anonymization of large sparse datasets. In 2008 IEEE Symposium on Security and Privacy 111–125 (IEEE, 2008).

  • Dwork, C., McSherry, F., Nissim, K. & Smith, A. Calibrating noise to sensitivity in private data analysis. In Proc. Theory of Cryptography: Third Theory of Cryptography Conference 265–284 (Springer, 2006).

  • Dwork, C. et al. The algorithmic foundations of differential privacy. Found. Trends Theor. Comput. Sci. 9, 211–407 (2014).

  • Cummings, R. et al. Challenges towards the next frontier in privacy. Preprint at https://doi.org/10.48550/arXiv.2304.06929 (2023).

  • Liu, Z., Iqbal, U. & Saxena, N. Opted out, yet tracked: are regulations enough to protect your privacy? Preprint at https://doi.org/10.48550/arXiv.2202.00885 (2022).

  • Tran, V. H. et al. Measuring compliance with the California Consumer Privacy Act over space and time. In Proc. CHI Conference on Human Factors in Computing Systems 1–19 (ACM, 2024).

  • Bourtoule, L. et al. Machine unlearning. In 2021 IEEE Symposium on Security and Privacy 141–159 (IEEE, 2021).

  • Lynch, A., Guo, P., Ewart, A., Casper, S. & Hadfield-Menell, D. Eight methods to evaluate robust unlearning in LLMs. Preprint at https://doi.org/10.48550/arXiv.2402.16835 (2024).

  • Shi, W. et al. MUSE: machine unlearning six-way evaluation for language models. Preprint at https://doi.org/10.48550/arXiv.2407.06460 (2024).

  • Guadamuz, A. Artificial intelligence and copyright. WIPO Mag. 5, 14–19 (2017).

    Google Scholar 

  • Terms of use. OpenAI https://openai.com/policies/terms-of-use/ (2024).

  • Kop, M. AI & intellectual property: towards an articulated public domain. Texas Intellect. Prop. Law J. 28 (2020).

  • Kim, M. The creative commons and copyright protection in the digital era: uses of creative commons licenses. J. Comput. Mediat. Commun. 13, 187–209 (2007).

  • Bonatti, P., Kirrane, S., Polleres, A. & Wenning, R. Transparent personal data processing: the road ahead. In Proc. Computer Safety, Reliability, and Security: SAFECOMP 2017 Workshops, ASSURE, DECSoS, SASSUR, TELERISE, and TIPS 337–349 (Springer, 2017).

  • Gebru, T. et al. Datasheets for datasets. Commun. ACM 64, 86–92 (2021).

  • Shimorina, A. & Belz, A. The Human Evaluation Datasheet 1.0: a template for recording details of human evaluation experiments in NLP. Preprint at https://doi.org/10.48550/arXiv.2103.09710 (2021).

  • Pushkarna, M., Zaldivar, A. & Kjartansson, O. Data cards: purposeful and transparent dataset documentation for responsible AI. In Proc. 2022 ACM Conference on Fairness, Accountability, and Transparency 1776–1826 (ACM, 2022).

  • Iren, D. & Bilgen, S. Cost of quality in crowdsourcing. Hum. Comput. https://doi.org/10.15346/hc.v1i2.14 (2014).

  • Hettiachchi, D. et al. Investigating and mitigating biases in crowdsourced data. In Companion Publication of the 2021 Conference on Computer Supported Cooperative Work and Social Computing 331–334 (ACM, 2021).

  • Barbosa, N. M. & Chen, M. Rehumanized crowdsourcing: a labeling framework addressing bias and ethics in machine learning. In Proc. 2019 CHI Conference on Human Factors in Computing Systems 1–12 (ACM, 2019).

  • Chintala, S. Unapologetically open science—the complexity and challenges of making openness win! ICML https://icml.cc/virtual/2024/invited-talk/35249 (2024).

  • Chiang, W.-L. et al. Chatbot Arena: an open platform for evaluating LLMs by human preference. In 41st International Conference on Machine Learning (PMLR, 2024).

  • Nov, O., Arazy, O. & Anderson, D. Scientists@home: what drives the quantity and quality of online citizen science participation? PLoS ONE 9, e90375 (2014).

  • Chen, Y., Harper, F. M., Konstan, J. & Li, S. X. Social comparisons and contributions to online communities: a field experiment on MovieLens. Am. Econ. Rev. 100, 1358–1398 (2010).

  • Pustejovsky, J. & Stubbs, A. Natural Language Annotation for Machine Learning: A Guide to Corpus-building for Applications (O’Reilly Media, 2012).

  • Thorat, P. B., Goudar, R. M. & Barve, S. Survey on collaborative filtering, content-based filtering and hybrid recommendation system. Int. J. Comput. Appl. 110, 31–36 (2015).

  • Touvron, H. et al. Llama 2: open foundation and fine-tuned chat models. Preprint at https://doi.org/10.48550/arXiv.2307.09288 (2023).

  • Achiam, J. et al. GPT-4 technical report. Preprint at https://doi.org/10.48550/arXiv.2303.08774 (2023).

  • Model card and evaluations for Claude models. Anthropic https://www.anthropic.com/news/claude-2 (2023).

  • The Claude 3 model family: Opus, Sonnet, Haiku. Anthropic https://www.anthropic.com/claude-3-model-card (2024).

  • Gemini Team et al. Gemini: a family of highly capable multimodal models. Preprint at https://doi.org/10.48550/arXiv.2312.11805 (2023).

  • Reid, M. et al. Gemini 1.5: unlocking multimodal understanding across millions of tokens of context. Preprint at https://doi.org/10.48550/arXiv.2403.05530 (2024).
