World Leadership Academy

LOADING

For Peace, Justice and Sustainable Development

(Ideology:World is One)

Author Search Sources

Source details

Big Data and Cognitive Computing

WLA coverage years: from 2022 to Present

Publisher: MDPI

ISSN: 2504-2289 E-ISSN: 2504-2289

Subject Area:international, interdisciplinary scholarly scientific, and open access journal on big data and cognitive computing for computer science.

Source Type: Journal

Source Homepage

Documents

Paper Title	Author	Year	Download	Cited
Analyzing Political Polarization on Social Media by Deleting Bot Spamming Show Abstract Abstract Social media platforms are part of everyday life, allowing the interconnection of people around the world in large discussion groups relating to every topic, including important social or political issues. Therefore, social media have become a valuable source of information-rich data, commonly referred to as Social Big Data, effectively exploitable to study the behavior of people, their opinions, moods, interests and activities. However, these powerful communication platforms can be also used to manipulate conversation, polluting online content and altering the popularity of users,through spamming activities and misinformation spreading. Recent studies have shown the use on social media of automatic entities, defined as social bots, that appear as legitimate users by imitating human behavior aimed at influencing discussions of any kind, including political issues. In this paper we present a new methodology, namely TIMBRE (Time-aware opInion Mining via Bot REmoval), aimed at discovering the polarity of social media users during election campaigns characterized by the rivalry of political factions. This methodology is temporally aware and relies on a keyword-based classification of posts and users. Moreover, it recognizes and filters out data produced by social media bots, which aim to alter public opinion about political candidates, thus avoiding heavily biased information. The proposed methodology has been applied to a case study that analyzes the polarization of a large number of Twitter users during the 2016 US presidential election. The achieved results show the benefits brought by both removing bots and taking into account temporal aspects in the forecasting process, revealing the high accuracy and effectiveness of the proposed approach.Finally, we investigated how the presence of social bots may affect political discussion by studying the 2016 US presidential election. Specifically, we analyzed the main differences between human and artificial political support, estimating also the influence of social bots on legitimate users.	Riccardo Cantini, Fabrizio Marozzo, Domenico Talia, Paolo Trunfio,	0	Download Full Paper	0
Analyzing COVID-19 Medical Papers Using Artificial Intelligence: Insights for Researchers and Medical Professionals Show Abstract Abstract Since the beginning of the COVID-19 pandemic almost two years ago, there have been more than 700,000 scientific papers published on the subject. An individual researcher cannot possibly get acquainted with such a huge text corpus and, therefore, some help from artificial intelligence (AI) is highly needed. We propose the AI-based tool to help researchers navigate the medical papers collections in a meaningful way and extract some knowledge from scientific COVID-19 papers. The main idea of our approach is to get as much semi-structured information from text corpus as possible, using named entity recognition (NER) with a model called PubMedBERT and Text Analytics for Health service, then store the data into NoSQL database for further fast processing and insights generation. Additionally, the contexts in which the entities were used (neutral or negative) are determined. Application of NLP and text-based emotion detection (TBED) methods to COVID-19 text corpus allows us to gain insights on important issues of diagnosis and treatment (such as changes in medical treatment over time, joint treatment strategies using several medications, and the connection between signs and symptoms of coronavirus, etc.).	Tatiana Petrova, Dmitry Soshnikov, Andrey Grunin, Vickie Soshnikova,	0	Download Full Paper	0
A Hierarchical Hadoop Framework to Process Geo-Distributed Big Data Show Abstract Abstract In the past twenty years, we have witnessed an unprecedented production of data world wide that has generated a growing demand for computing resources and has stimulated the design of computing paradigms and software tools to efficiently and quickly obtain insights on such a Big Data. State-of-the-art parallel computing techniques such as the MapReduce guarantee high perfor mance in scenarios where involved computing nodes are equally sized and clustered via broadband network links, and the data are co-located with the cluster of nodes. Unfortunately, the mentioned techniques have proven ineffective in geographically distributed scenarios, i.e., computing contexts where nodes and data are geographically distributed across multiple distant data centers. In the literature, researchers have proposed variants of the MapReduce paradigm that obtain awareness of the constraints imposed in those scenarios (such as the imbalance of nodes computing power and of interconnecting links) to enforce smart task scheduling strategies. We have designed a hierarchical computing framework in which a context-aware scheduler orchestrates computing tasks that leverage the potential of the vanilla Hadoop framework within each data center taking part in the computation. In this work, after presenting the features of the developed framework, we advocate the opportunityof fragmenting the data in a smart way so that the scheduler produces a fairer distribution of the workload among the computing tasks. To prove the concept, we implemented a software prototype of the framework and ran several experiments on a small-scale testbed. Test results are discussed in the last part of the paper.	Giuseppe Di Modica, Orazio Tomarchio,	0	Download Full Paper	0
On Developing Generic Models for Predicting Student Outcomes in Educational Data Mining Show Abstract Abstract Poor academic performance of students is a concern in the educational sector, especially if it leads to students being unable to meet minimum course requirements. However, with timely prediction of students’ performance, educators can detect at-risk students, thereby enabling early interventions for supporting these students in overcoming their learning difficulties. However, the majority of studies have taken the approach of developing individual models that target a single course while developing prediction models. These models are tailored to specific attributes of each course amongst a very diverse set of possibilities. While this approach can yield accurate models in some instances, this strategy is associated with limitations. In many cases, overfitting can take place when course data is small or when new courses are devised. Additionally, maintaining a large suite of models per course is a significant overhead. This issue can be tackled by developing a generic and course-agnostic predictive model that captures more abstract patterns and is able to operate across all courses, irrespective of their differences. This study demonstrates how a generic predictive model can be developed that identifies at-risk students across a wide variety of courses. Experiments were conducted using a range of algorithms, with the generic model producing an effective accuracy.The findings showed that the CatBoost algorithm performed the best on our dataset across the F-measure, ROC (receiver operating characteristic) curve and AUC scores; therefore, it is an excellent candidate algorithm for providing solutions on this domain given its capabilities to seamlessly handlecategorical and missing data, which is frequently a feature in educational datasets.	Gomathy Ramaswami, Teo Susnjak, Anuradha Mathrani,	0	Download Full Paper	0
Infusing Autopoietic and Cognitive Behaviors into Digital Automata to Improve Their Sentience, Resilience, and Intelligence Show Abstract Abstract All living beings use autopoiesis and cognition to manage their “life” processes from birth through death. Autopoiesis enables them to use the specification in their genomes to instantiate themselves using matter and energy transformations. They reproduce, replicate, and manage their stability. Cognition allows them to process information into knowledge and use it to manage its interactions between various constituent parts within the system and its interaction with the environ ment. Currently, various attempts are underway to make modern computers mimic the resilience and intelligence of living beings using symbolic and sub-symbolic computing. We discuss here the limitations of classical computer science for implementing autopoietic and cognitive behaviors in digital machines. We propose a new architecture applying the general theory of information (GTI) and pave the path to make digital automata mimic living organisms by exhibiting autopoiesis and cognitive behaviors. The new science, based on GTI, asserts that information is a fundamental constituent of the physical world and that living beings convert information into knowledge using physical structures that use matter and energy. Our proposal uses the tools derived from GTI to provide a common knowledge representation from existing symbolic and sub-symbolic computing structures to implement autopoiesis and cognitive behaviors.	Rao Mikkilineni,	0	Download Full Paper	0
An Empirical Comparison of Portuguese and Multilingual BERT Models for Auto-Classification of NCM Codes in International Trade Show Abstract Abstract Classification problems are common activities in many different domains and supervised learning algorithms have shown great promise in these areas. The classification of goods in inter national trade in Brazil represents a real challenge due to the complexity involved in assigning the correct category codes to a good, especially considering the tax penalties and legal implications of a misclassification. This work focuses on the training process of a classifier based on bidirectional encoder representations from transformers (BERT) for tax classification of goods with MCN codes which are the official classification system for import and export products in Brazil. In particular, this article presents results from using a specific Portuguese-language-pretrained BERT model, as well as results from using a multilingual-pretrained BERT model. Experimental results show that Portuguese model had a slightly better performance than the multilingual model, achieving an MCC 0.8491, and confirms that the classifiers could be used to improve specialists’ performance in the classification of goods.	Roberta Rodrigues de Lima, Anita M. R. Fernandes, Bruno Alves da Silva, James Roberto Bombasar, Paul Crocker, Valderi Reis Quietinho Leithardt,	0	Download Full Paper	0
An Efficient Multi-Scale Anchor Box Approach to Detect Partial Faces from a Video Sequence Show Abstract Abstract In recent years, face detection has achieved considerable attention in the field of computer vision using traditional machine learning techniques and deep learning techniques. Deep learning is used to build the most recent and powerful face detection algorithms. However, partial face detection still remains to achieve remarkable performance. Partial faces are occluded due to hair, hat, glasses,hands, mobile phones, and side-angle-captured images. Fewer facial features can be identified from such images. In this paper, we present a deep convolutional neural network face detection method using the anchor boxes section strategy. We limited the number of anchor boxes and scales and chose only relevant to the face shape. The proposed model was trained and tested on a popular and challenging face detection benchmark dataset, i.e., Face Detection Dataset and Benchmark (FDDB), and can also detect partially covered faces with better accuracy and precision. Extensive experiments were performed, with evaluation metrics including accuracy, precision, recall, F1 score, inference time, and FPS. The results show that the proposed model is able to detect the face in the image, including occluded features, more precisely than other state-of-the-art approaches, achieving 94.8% accuracy and 98.7% precision on the FDDB dataset at 21 frames per second (FPS).	Dweepna Garg, Priyanka Jain, Ketan Kotecha, Parth Goel, Vijayakumar Varadarajan,	0	Download Full Paper	0
Extraction of the Relations among Significant Pharmacological Entities in Russian-Language Reviews of Internet Users on Medications Show Abstract Abstract Nowadays, the analysis of digital media aimed at prediction of the society’s reaction to particular events and processes is a task of a great significance. Internet sources contain a large amount of meaningful information for a set of domains, such as marketing, author profiling, social situation analysis, healthcare, etc. In the case of healthcare, this information is useful for the pharmacovigilance purposes, including re-profiling of medications. The analysis of the mentioned sources requires the development of automatic natural language processing methods. These methods, in turn, require text datasets with complex annotation including information about named entities and relations between them. As the relevant literature analysis shows, there is a scarcity of datasets in the Russian language with annotated entity relations, and none have existed so far in the medical domain. This paper presents the first Russian-language textual corpus where entities have labels of different contexts within a single text, so that related entities share a common context. therefore this corpus is suitable for the task of belonging to the medical domain. Our second contribution is a method for the automated extraction of entity relations in Russian-language texts using the XLM-RoBERTa language model preliminarily trained on Russian drug review texts. A comparison with other machine learning methods is performed to estimate the efficiency of the proposed method. The method yields state-of-the-art accuracy of extracting the following relationship types: ADR–Drugname, Drugname–Diseasename, Drugname–SourceInfoDrug, Diseasename–Indication. As shown on the presented subcorpus from the Russian Drug Review Corpus, the method developed achieves a mean F1-score of 80.4% (estimated with cross-validation, averaged over the four relationship types). This result is 3.6% higher compared to the existing language model RuBERT, and 21.77% higher compared to basic ML classifiers.	Alexander Sboev, Anton Selivanov, Ivan Moloshnikov, Roman Rybka, Artem Gryaznov, Sanna Sboeva, Gleb Rylkov,	0	Download Full Paper	0
Context-Aware Explainable Recommendation Based on Domain Knowledge Graph Show Abstract Abstract With the rapid growth of internet data, knowledge graphs (KGs) are considered as efficient form of knowledge representation that captures the semantics of web objects. In recent years, reasoning over KG for various artificial intelligence tasks have received a great deal of research interest. Providing recommendations based on users’ natural language queries is an equally difficult undertaking. In this paper, we propose a novel, context-aware recommender system, based on domain KG, to respond to user-defined natural queries. The proposed recommender system consists of three stages. First, we generate incomplete triples from user queries, which are then segmented using logical conjunction (∧) and disjunction (∨) operations. Then, we generate candidates by utilizing a KGE-based framework (Query2Box) for reasoning over segmented logical triples, with ∧,∨, and ∃ operators; finally, the generated candidates are re-ranked using neural collaborative filtering (NCF) model by exploiting contextual (auxiliary) information from GraphSAGE embedding. Our approach demonstrates to be simple, yet efficient, at providing explainable recommendations on user’s queries, while leveraging user-item contextual information. Furthermore, our framework has shown to be capable of handling logical complex queries by transforming them into a disjunctive normal form (DNF) of simple queries. In this work, we focus on the restaurant domain as an application domain and use the Yelp dataset to evaluate the system. Experiments demonstrate that the proposed recommender system generalizes well on candidate generation from logical queries and effectively re-ranks those candidates, compared to the matrix factorization model.	Muzamil Hussain Syed, Tran Quoc Bao Huy, Sun-Tae Chung,	0	Download Full Paper	0
Scalable Extended Reality: A Future Research Agenda Show Abstract Abstract Extensive research has outlined the potential of augmented, mixed, and virtual reality applications. However, little attention has been paid to scalability enhancements fostering practical adoption. In this paper, we introduce the concept of scalable extended reality (XRS ), i.e., spacesscaling between different displays and degrees of virtuality that can be entered by multiple, possibly distributed users. The development of such XRS spaces concerns several research fields. To provide bidirectional interaction and maintain consistency with the real environment, virtual reconstructions of physical scenes need to be segmented semantically and adapted dynamically. Moreover, scalable interaction techniques for selection, manipulation, and navigation as well as a world-stabilizedrendering of 2D annotations in 3D space are needed to let users intuitively switch between handheld and head-mounted displays. Collaborative settings should further integrate access control and awareness cues indicating the collaborators’ locations and actions. While many of these topics were investigated by previous research, very few have considered their integration to enhance scalability. Addressing this gap, we review related previous research, list current barriers to the development of XRS spaces, and highlight dependencies between them.	Vera Marie Memmesheimer, Achim Ebert,	0	Download Full Paper	0
Fuzzy Neural Network Expert System with an Improved Gini Index Random Forest-Based Feature Importance Measure Algorithm for Early Diagnosis of Breast Cancer in Saudi Arabia Show Abstract Abstract Breast cancer is one of the common malignancies among females in Saudi Arabia and has also been ranked as the one most prevalent and the number two killer disease in the country.However, the clinical diagnosis process of any disease such as breast cancer, coronary artery diseases, diabetes, COVID-19, among others, is often associated with uncertainty due to the complexity and fuzziness of the process. In this work, a fuzzy neural network expert system with an improved gini index random forest-based feature importance measure algorithm for early diagnosis of breast cancer in Saudi Arabia was proposed to address the uncertainty and ambiguity associated with the diagnosis of breast cancer and also the heavier burden on the overlay of the network nodes of the fuzzy neural network system that often happens due to insignificant features that are used to predict or diagnose the disease. An Improved Gini Index Random Forest-Based Feature Importance Measure Algorithm was used to select the five fittest features of the diagnostic wisconsin breast cancer database out of the 32 features of the dataset. The logistic regression, support vector machine, k-nearest neighbor, random forest, and gaussian naïve bayes learning algorithms were used to develop two sets of classification models. Hence, the classification models with full features (32) and models with the 5 fittest features.The two sets of classification models were evaluated, and the results of the evaluation were compared. The result of the comparison shows that the models with the selected fittest features outperformed their counterparts with full features in terms of accuracy, sensitivity, and sensitivity. Therefore, a fuzzy neural network based expert system was developed with the five selected fittest features and the system achieved 99.33% accuracy, 99.41% sensitivity, and 99.24% specificity. Moreover, based on the comparison of the system developed in this work against the previous works that used fuzzy neural network or other applied artificial intelligence techniques on the same dataset for diagnosis of breast cancer using the same dataset, the system stands to be the best in terms of accuracy, sensitivity, and specificity, respectively. The z test was also conducted, and the test result shows that there is significant accuracy achieved by the system for early diagnosis of breast cancer.	Ebrahem A. Algehyne, Muhammad Lawan Jibril, Naseh A. Algehainy, Osama Abdulaziz Alamri, Abdullah K. Alzahrani,	0	Download Full Paper	0
Google Street View Images as Predictors of Patient Health Outcomes, 2017–2019 Show Abstract Abstract Collecting neighborhood data can both be time- and resource-intensive, especially across broad geographies. In this study, we leveraged 1.4 million publicly available Google Street View (GSV) images from Utah to construct indicators of the neighborhood built environment and evaluate their associations with 2017–2019 health outcomes of approximately one-third of the population living in Utah. The use of electronic medical records allows for the assessment of associations between neighborhood characteristics and individual-level health outcomes while controlling for predisposing factors, which distinguishes this study from previous GSV studies that were ecological in nature.Among 938,085 adult patients, we found that individuals living in communities in the highest tertiles of green streets and non-single-family homes have 10–27% lower diabetes, uncontrolled diabetes, hypertension, and obesity, but higher substance use disorders—controlling for age, White race,Hispanic ethnicity, religion, marital status, health insurance, and area deprivation index. Conversely, the presence of visible utility wires overhead was associated with 5–10% more diabetes, uncontrolled diabetes, hypertension, obesity, and substance use disorders. Our study found that non-single-family and green streets were related to a lower prevalence of chronic conditions, while visible utility wires and single-lane roads were connected with a higher burden of chronic conditions. These contextual characteristics can better help healthcare organizations understand the drivers of their patients’ health by further considering patients’ residential environments, which present both risks and resources.	Heran Mane, Quynh C. Nguyen, Pallavi Dwivedi, Jessica Keralis, Xiaohe Yue, Thu T. Nguyen, Tom Belnap, Kim D. Brunisholz, Amir Hossein Nazem Deligani,	0	Download Full Paper	0
A Dataset for Emotion Recognition Using Virtual Reality and EEG (DER-VREEG): Emotional State Classification Using Low-Cost Wearable VR-EEG Headsets Show Abstract Abstract Emotions are viewed as an important aspect of human interactions and conversations, and allow effective and logical decision making. Emotion recognition uses low-cost wearable electroen cephalography (EEG) headsets to collect brainwave signals and interpret these signals to provide information on the mental state of a person, with the implementation of a virtual reality environ ment in different applications; the gap between human and computer interaction, as well as theunderstanding process, would shorten, providing an immediate response to an individual’s mental health. This study aims to use a virtual reality (VR) headset to induce four classes of emotions (happy,scared, calm, and bored), to collect brainwave samples using a low-cost wearable EEG headset, and to run popular classifiers to compare the most feasible ones that can be used for this particular setup. Firstly, we attempt to build an immersive VR database that is accessible to the public and that can potentially assist with emotion recognition studies using virtual reality stimuli. Secondly, we use a low-cost wearable EEG headset that is both compact and small, and can be attached to the scalp without any hindrance, allowing freedom of movement for participants to view their surroundings inside the immersive VR stimulus. Finally, we evaluate the emotion recognition system by using popular machine learning algorithms and compare them for both intra-subject and inter-subject classification. The results obtained here show that the prediction model for the four-class emotion classification performed well, including the more challenging inter-subject classification, with the support vector machine (SVM Class Weight kernel) obtaining 85.01% classification accuracy. This shows that using less electrode channels but with proper parameter tuning and selection features affects the performance of the classifications.	Nazmi Sofian Suhaimi, James Mountstephens, Jason Teo,	0	Download Full Paper	0