Unravelling Acceptability in Code-Mixed Sentences

View a PDF of the paper titled From Human Judgements to Predictive Models: Unravelling Acceptability in Code-Mixed Sentences, by Prashant Kodali and 7 other authors
View PDF HTML (experimental)
Abstract: Current computational approaches for analysing or generating code-mixed sentences do not explicitly model the "naturalness" or "acceptability" of code-mixed sentences, but rely on training corpora to reflect the distribution of acceptable code-mixed sentences. Modelling human judgements for the acceptability of code-mixed text can help in distinguishing natural code-mixed text and enable quality-controlled generation of code-mixed text. To this end, we construct Cline, a dataset containing human acceptability judgements for English-Hindi (en-hi) code-mixed text. Cline is the largest of its kind, with 16,642 sentences consisting of samples drawn from two sources: synthetically generated code-mixed text and samples collected from online social media. Our analysis establishes that popular code-mixing metrics such as CMI, number of switch points, and burstiness, which are used to filter/curate/compare code-mixed corpora, have low correlation with human acceptability judgements, underlining the necessity of our dataset. Experiments using Cline demonstrate that simple Multilayer Perceptron (MLP) models trained solely on code-mixing metrics as features are outperformed by fine-tuned pre-trained Multilingual Large Language Models (MLLMs). Specifically, among Encoder models, XLM-Roberta and Bernice outperform IndicBERT across different configurations. Among Encoder-Decoder models, mBART performs better than mT5; however, Encoder-Decoder models are unable to outperform Encoder-only models. Decoder-only models perform best among all the MLLMs, with Llama 3.2 - 3B models outperforming similarly sized Qwen models. A comparison with the zero-shot and few-shot capabilities of ChatGPT shows that MLLMs fine-tuned on larger data outperform ChatGPT, leaving scope for improvement on code-mixed tasks. Zero-shot transfer from en-hi to en-te acceptability judgements performs better than random baselines.
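As a rough illustration of the three code-mixing metrics the abstract names, the following minimal sketch (not code from the paper) computes CMI following Gambäck & Das (2014), the number of switch points, and burstiness following Goh & Barabási (2008). The token-level language tags, the tag names ("en", "hi", "other"), and the example sentence are illustrative assumptions.

```python
# Minimal sketch of common code-mixing metrics over a language-tagged
# token sequence. "other" marks language-independent tokens (e.g. names,
# punctuation). All formulas are standard definitions, not the paper's code.
from statistics import mean, pstdev

def cmi(tags):
    """Code-Mixing Index: 100 * (1 - max_lang / (n - u)), where n is the
    total token count, u the language-independent token count, and
    max_lang the token count of the dominant language."""
    n = len(tags)
    lang_tags = [t for t in tags if t != "other"]
    u = n - len(lang_tags)
    if n == u:  # no language-tagged tokens, hence no mixing
        return 0.0
    max_lang = max(lang_tags.count(t) for t in set(lang_tags))
    return 100 * (1 - max_lang / (n - u))

def switch_points(tags):
    """Count positions where the language changes between adjacent
    language-tagged tokens (ignoring 'other')."""
    lang_tags = [t for t in tags if t != "other"]
    return sum(a != b for a, b in zip(lang_tags, lang_tags[1:]))

def burstiness(tags):
    """Burstiness of monolingual span lengths: (sigma - mu) / (sigma + mu).
    Values near -1 indicate periodic switching; near +1, bursty switching."""
    lang_tags = [t for t in tags if t != "other"]
    spans, run = [], 1
    for a, b in zip(lang_tags, lang_tags[1:]):
        if a == b:
            run += 1
        else:
            spans.append(run)
            run = 1
    spans.append(run)
    mu, sigma = mean(spans), pstdev(spans)
    return (sigma - mu) / (sigma + mu) if (sigma + mu) else 0.0

# Hypothetical en-hi sentence, tagged token by token.
tags = ["hi", "hi", "en", "en", "en", "hi", "other", "hi"]
print(cmi(tags), switch_points(tags), burstiness(tags))
# -> 42.86..., 2, -0.66...
```

Metric vectors like these are what the abstract's baseline MLP would consume as features; the paper's finding is that such surface statistics correlate only weakly with human acceptability judgements.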
Submission history
From: Prashant Kodali [view email]
[v1]
Thu, 9 May 2024 06:40:39 UTC (1,627 KB)
[v2]
Mon, 5 May 2025 14:51:58 UTC (1,743 KB)