Big Data Projects – ElysiumPro

Big Data Projects

CSE Projects
Description
Big data is a term for data sets that are so large or complex that traditional data processing software is inadequate to deal with them. We offer projects addressing challenges such as data capture, storage, analysis, search, sharing, transfer, visualization, querying, updating, and information privacy.
Download Project List

Quality Factor

  • 100% Assured Results
  • Best Project Explanation
  • Tons of References
  • Cost Optimized
  • Control Panel Access


1. Secure k-NN Query on Encrypted Cloud Data with Multiple Keys
The k-nearest neighbors (k-NN) query is a fundamental primitive in spatial and multimedia databases. It has extensive applications in location-based services, classification, clustering, and so on. With the promise of confidentiality and privacy, massive data are increasingly outsourced to the cloud in encrypted form to enjoy the advantages of cloud computing (e.g., reduced storage and query processing costs). Recently, many schemes have been proposed to support k-NN queries on encrypted cloud data. However, prior works have all assumed that the query users (QUs) are fully trusted and know the key of the data owner (DO), which is used to encrypt and decrypt the outsourced data. These assumptions are unrealistic in many situations, since many users are neither trusted nor in possession of the key. In this paper, we propose a novel scheme for secure k-NN query on encrypted cloud data with multiple keys, in which the DO and each QU hold their own distinct keys and do not share them with each other; meanwhile, the DO encrypts and decrypts the outsourced data using his own key. Our scheme is constructed from a distributed two-trapdoor public-key cryptosystem (DT-PKC) and a set of secure two-party computation protocols, which not only preserve data confidentiality and query privacy but also support an offline data owner. Our extensive theoretical and experimental evaluations demonstrate the effectiveness of our scheme in terms of security and performance.

2. Big Data Analysis based Security Situational Awareness for Smart Grid
Advanced communications and data processing technologies bring great benefits to the smart grid. However, cyber-security threats also extend from the information system to the smart grid. Existing security work for the smart grid focuses on traditional protection and detection methods. However, many threats occur within a very short time and are overlooked by existing security components. These threats usually have a huge impact on the smart grid and disturb its normal operation. Moreover, it is too late to take action to defend against the threats once they are detected, and the damage can be difficult to repair. To address this issue, this paper proposes a security situational awareness mechanism based on the analysis of big data in the smart grid. A fuzzy-cluster-based analytical method, game theory, and reinforcement learning are integrated seamlessly to perform the security situational analysis for the smart grid. The simulation and experimental results show the advantages of our scheme in terms of high efficiency and low error rate for security situational awareness.

3. A Survey on Geographically Distributed Big-Data Processing using MapReduce
Hadoop and Spark are widely used distributed processing frameworks for large-scale data processing in an efficient and fault-tolerant manner on private or public clouds. These big-data processing systems are extensively used by many industries, e.g., Google, Facebook, and Amazon, for solving a large class of problems, e.g., search, clustering, log analysis, different types of join operations, matrix multiplication, pattern matching, and social network analysis. However, all these popular systems have a major drawback in that their computations are locally distributed, which prevents them from supporting geographically distributed data processing. The increasing amount of geographically distributed massive data is pushing industry and academia to rethink current big-data processing systems. Novel frameworks, which go beyond the state-of-the-art architectures and technologies of current systems, are expected to process geographically distributed data at its locations without moving entire raw datasets to a single location. In this paper, we investigate and discuss challenges and requirements in designing geographically distributed data processing frameworks and protocols. We classify and study batch processing (MapReduce-based systems), stream processing (Spark-based systems), and SQL-style processing geo-distributed frameworks, models, and algorithms, along with their overhead issues.

4. STaRS: Simulating Taxi Ride Sharing at Scale
As urban populations grow, cities face many challenges related to transportation, resource consumption, and the environment. Ride sharing has been proposed as an effective approach to reduce traffic congestion, gasoline consumption, and pollution. However, despite great promise, researchers and policy makers lack adequate tools to assess the tradeoffs and benefits of various ride-sharing strategies. In this paper, we propose a real-time, data-driven simulation framework that supports the efficient analysis of taxi ride sharing. By modeling taxis and trips as distinct entities, our framework is able to simulate a rich set of realistic scenarios. At the same time, by providing a comprehensive set of parameters, we are able to study the taxi ride-sharing problem from different angles, considering different stakeholders’ interests and constraints. To address the computational complexity of the model, we describe a new optimization algorithm that is linear in the number of trips and makes use of an efficient indexing scheme, which combined with parallelization, makes our approach scalable. We evaluate our framework through a study that uses data about 360 million trips taken by 13,000 taxis in New York City during 2011 and 2012. We describe the findings of the study which demonstrate that our framework can provide insights into strategies for implementing city-wide ride-sharing solutions. We also carry out a detailed performance analysis which shows the efficiency of our approach.
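
The abstract mentions an efficient indexing scheme for matching trips at scale. The sketch below is a hypothetical illustration of one common choice, a uniform spatial grid, so that candidate taxis for a new trip request come only from nearby cells; the cell size, class names and coordinates are assumptions, not the paper's actual index.

```python
# Hypothetical grid index for matching a trip request to nearby taxis.
import math
from collections import defaultdict

CELL = 0.01  # grid cell size in degrees (assumption)

class GridIndex:
    def __init__(self):
        self.cells = defaultdict(list)            # (cx, cy) -> list of taxi ids

    def _cell(self, lon, lat):
        return (math.floor(lon / CELL), math.floor(lat / CELL))

    def insert(self, taxi_id, lon, lat):
        self.cells[self._cell(lon, lat)].append(taxi_id)

    def candidates(self, lon, lat):
        """Taxis in the query cell and its eight neighbours."""
        cx, cy = self._cell(lon, lat)
        found = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                found.extend(self.cells.get((cx + dx, cy + dy), []))
        return found

index = GridIndex()
index.insert("taxi-42", -73.98, 40.75)
print(index.candidates(-73.985, 40.751))          # ['taxi-42']
```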

5. Heterogeneous Data Storage Management with Deduplication in Cloud Computing
Cloud storage, as one of the most important services of cloud computing, helps cloud users break the bottleneck of restricted resources and expand their storage without upgrading their devices. To guarantee the security and privacy of cloud users, data are always outsourced in encrypted form. However, encrypted data can waste cloud storage and complicate data sharing among authorized users. We still face challenges in encrypted data storage and management with deduplication. Traditional deduplication schemes focus on specific application scenarios, in which deduplication is completely controlled by either data owners or cloud servers; they cannot flexibly satisfy the varying demands of data owners according to the level of data sensitivity. In this paper, we propose a heterogeneous data storage management scheme, which flexibly offers both deduplication management and access control at the same time across multiple Cloud Service Providers (CSPs). We evaluate its performance with security analysis, comparison, and implementation. The results show its security, effectiveness, and efficiency for potential practical usage.
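
To make the core deduplication bookkeeping concrete, here is a minimal sketch: identical plaintext blocks hash to the same digest, so only one (encrypted) copy is stored and later uploads only add ownership records. The placeholder cipher and the single in-memory index are assumptions for illustration, not the paper's multi-CSP protocol.

```python
# Minimal deduplication sketch: one physical copy per distinct block.
import hashlib

store = {}            # digest -> stored (encrypted) block
owners = {}           # digest -> set of user ids allowed to access the block

def encrypt(block: bytes, key: bytes) -> bytes:
    # Placeholder cipher for the sketch only (NOT secure encryption).
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(block))

def upload(user: str, block: bytes):
    digest = hashlib.sha256(block).hexdigest()
    if digest not in store:                       # first copy: actually store it
        store[digest] = encrypt(block, digest.encode()[:16])
    owners.setdefault(digest, set()).add(user)    # later copies: only record ownership
    return digest

d1 = upload("alice", b"quarterly-report")
d2 = upload("bob",   b"quarterly-report")
assert d1 == d2 and len(store) == 1               # one physical copy, two owners
```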

6. Secure Authentication in Cloud Big Data with Hierarchical Attribute Authorization Structure
With the fast growing demands of big data, we need to manage and store big data in the cloud. Since the cloud is not fully trusted and can be accessed by any user, the data in the cloud may face threats. In this paper, we propose a secure authentication protocol for cloud big data with a hierarchical attribute authorization structure. Our proposed protocol resorts to a tree-based signature to significantly improve the security of attribute authorization. To satisfy big data requirements, we extend the proposed authentication protocol to support multiple levels in the hierarchical attribute authorization structure. Security analysis shows that our protocol can resist the forgery attack and the replay attack. In addition, our protocol preserves the entities' privacy. Compared with previous studies, our protocol has lower computational and communication overhead.

7. Cloud Infrastructure Resource Allocation for Big Data Applications
Increasingly popular big data applications bring invaluable information, but also challenges to industry and academia. Cloud computing, with its seemingly unlimited resources, appears to be the way out. However, this panacea cannot play its role unless cloud infrastructure resources are allocated carefully. In this paper, we present a multi-objective optimization algorithm to trade off the performance, availability, and cost of big data applications running on the cloud. After analyzing and modeling the interlaced relations among these objectives, we design and implement our approach in an experimental environment. Finally, three sets of experiments show that our approach can run about 20% faster than traditional optimization approaches, and can achieve about 15% higher performance than other heuristic algorithms, while saving 4% to 20% in cost.

8. Privacy-Preserving Data Encryption Strategy for Big Data in Mobile Cloud Computing
Privacy has become a considerable issue as applications of big data grow dramatically in cloud computing. Implementing these emerging technologies has improved or changed service models and improved application performance from various perspectives. However, the remarkably growing volume of data has also resulted in many practical challenges. The execution time of data encryption is one of the serious issues during data processing and transmission. Many current applications abandon data encryption in order to reach an acceptable performance level, despite the accompanying privacy concerns. In this paper, we concentrate on privacy and propose a novel data encryption approach, called the Dynamic Data Encryption Strategy (D2ES). Our proposed approach aims to selectively encrypt data and to use privacy classification methods under timing constraints. This approach is designed to maximize the scope of privacy protection by using a selective encryption strategy within the required execution time. The performance of D2ES has been evaluated in our experiments, which provide proof of the privacy enhancement.

9. A Pre-Authentication Approach to Proxy Re-encryption in Big Data Context
With the growing amount of data, the demand for big data storage significantly increases. Through the cloud center, data providers can conveniently share data stored in the center with others. However, one practically important problem in big data storage is privacy. During the sharing process, data is encrypted to remain confidential and anonymous; such operations protect privacy from being leaked. To satisfy practical conditions, data transmission with multiple receivers is also considered. Furthermore, this paper proposes the notion of pre-authentication for the first time, i.e., only users whose attributes have already been authenticated can obtain re-encrypted data. The pre-authentication mechanism combines the advantages of the proxy conditional re-encryption multi-sharing mechanism with the attribute-based authentication technique, thus achieving attribute authentication before re-encryption and ensuring the security of both the attributes and the data. Moreover, this paper finally proves that the system is secure and that the proposed pre-authentication mechanism can significantly enhance the system's security level.

10. Game Theory Based Correlated Privacy Preserving Analysis in Big Data
Privacy preservation is one of the greatest concerns in big data. As one of its extensive applications, privacy-preserving data publication (PPDP) has become an important research field. One of the fundamental challenges in PPDP is the trade-off between the privacy and the utility of a single, independent data set. However, recent research has shown that the advanced privacy mechanism, differential privacy, is vulnerable when multiple data sets are correlated. In this case, the trade-off between privacy and utility evolves into a game problem, in which the payoff of each player depends on his own and his neighbors' privacy parameters. In this paper, we first present the definition of correlated differential privacy to evaluate the real privacy level of a single data set as influenced by the other data sets. Then, we construct a game model of multiple players, each of whom publishes a data set sanitized by differential privacy. Next, we analyze the existence and uniqueness of the pure Nash Equilibrium, and we refer to the notion of the price of anarchy to evaluate its efficiency. Finally, we show the correctness of our game analysis via simulation experiments.
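
As a small numeric illustration of why correlation matters, the sketch below adds Laplace noise to a count query while inflating the noise scale when the data set is correlated with others; the simple 1/(1 + k) degradation of the privacy budget is an assumption chosen for illustration, not the paper's definition of correlated differential privacy.

```python
# Laplace mechanism with a (hypothetical) correlation-adjusted privacy budget.
import numpy as np

def laplace_count(true_count, epsilon, sensitivity=1.0, correlated_neighbors=0):
    effective_eps = epsilon / (1 + correlated_neighbors)   # assumed degradation model
    scale = sensitivity / effective_eps                     # larger scale -> more noise
    return true_count + np.random.laplace(0.0, scale)

independent = laplace_count(1000, epsilon=1.0)                         # standalone data set
correlated  = laplace_count(1000, epsilon=1.0, correlated_neighbors=3) # correlated with 3 others
print(independent, correlated)
```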

11. Security-Aware Resource Allocation for Mobile Social Big Data: A Matching-Coalitional Game Solution
As both the scale of mobile networks and the population of mobile users keep increasing, applications of mobile social big data have emerged in which mobile social users use their mobile devices to exchange and share content with each other. Security resources are needed to protect mobile social big data during delivery. However, because security resources are limited, how to allocate them becomes a new challenge. Therefore, in this paper we propose a joint matching-coalitional game based security-aware resource allocation scheme to deliver mobile social big data. In the proposed scheme, a coalition game model is first introduced for base stations (BSs) to form groups that provide both wireless and security resources, improving resource efficiency and profits. Secondly, a matching-theory-based model is employed to determine the selection process between communities and the coalitions of BSs, so that mobile social users can form communities and select the optimal coalition to obtain security resources. Thirdly, a joint matching-coalition algorithm is presented to obtain a stable security-aware resource allocation. Finally, simulation experiments show that the proposed scheme outperforms other existing schemes.

12. Local Gaussian Processes for Efficient Fine-Grained Traffic Speed Prediction
Traffic speed is a key indicator of the efficiency of an urban transportation system. Accurate modeling of the spatiotemporally varying traffic speed thus plays a crucial role in urban planning and development. This paper addresses the problem of efficient fine-grained traffic speed prediction using big traffic data obtained from static sensors. Gaussian processes (GPs) have previously been used to model various traffic phenomena, including flow and speed. However, GPs do not scale with big traffic data due to their cubic time complexity. In this work, we address their efficiency issues by proposing local GPs that learn from and make predictions for correlated subsets of data. The main idea is to quickly group speed variables in both spatial and temporal dimensions into a finite number of clusters, so that future and unobserved traffic speed queries can be heuristically mapped to one of these clusters. A local GP corresponding to that cluster can then be trained on the fly to make predictions in real time. We call this method localization. We use non-negative matrix factorization for localization and propose simple heuristics for cluster mapping. We additionally leverage the expressiveness of GP kernel functions to model road network topology and incorporate side information. Extensive experiments using real-world traffic data collected in the two U.S. cities of Pittsburgh and Washington, D.C., show that our proposed local GPs significantly improve both runtime performance and prediction accuracy compared to the baseline global and local GPs.
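
A rough sketch of the localization idea follows: factorize a sensor-by-time speed matrix with NMF, assign each sensor to its dominant latent component, and fit a small GP per cluster. The synthetic data shapes, default kernel, and training on cluster mean profiles are assumptions; the paper's exact mapping heuristics and kernels are not reproduced.

```python
# NMF-based localization followed by per-cluster Gaussian processes (sketch).
import numpy as np
from sklearn.decomposition import NMF
from sklearn.gaussian_process import GaussianProcessRegressor

speeds = np.abs(np.random.randn(50, 96)) * 10 + 30   # 50 sensors x 96 time slots (synthetic)
W = NMF(n_components=4, init="nndsvd", random_state=0).fit_transform(speeds)
cluster = W.argmax(axis=1)                            # map each sensor to one cluster

local_gps = {}
t = np.arange(96).reshape(-1, 1)                      # time index as the GP input
for c in np.unique(cluster):
    mean_profile = speeds[cluster == c].mean(axis=0)  # train on the cluster's mean profile
    local_gps[c] = GaussianProcessRegressor().fit(t, mean_profile)

# Predict the speed of sensor 7 at time slot 50 using only its local GP.
print(local_gps[cluster[7]].predict(np.array([[50]])))
```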

13. Dependency-Aware Data Locality for MapReduce
MapReduce effectively partitions and distributes computation workloads to a cluster of servers, facilitating today's big data processing. Given the massive data to be dispatched, and the intermediate results to be collected and aggregated, there have been significant studies on data locality that seek to co-locate computation with data so as to reduce cross-server traffic in MapReduce. They generally assume that the input data have little dependency on each other, which however is not necessarily true for many real-world applications, and we show strong evidence that the finishing time of MapReduce tasks can be greatly prolonged by such data dependency. In this paper, we present DALM (Dependency-Aware Locality for MapReduce) for processing real-world input data that can be highly skewed and dependent. DALM accommodates data dependency in a data-locality framework, organically synthesizing the key components of data reorganization, replication, and placement. Besides the algorithmic design within the framework, we have also closely examined the deployment challenges, particularly in public virtualized cloud environments, and have implemented DALM on Hadoop 1.2.1 with Giraph 1.0.0. Its performance has been evaluated through both simulations and real-world experiments, and compared with that of state-of-the-art solutions.
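
To illustrate what dependency-aware placement means in practice, here is a toy greedy placer that tries to co-locate blocks that are frequently accessed together so they do not generate cross-server traffic. The dependency pairs, three-server cluster and capacity limit are assumptions; DALM's actual reorganization, replication and placement algorithms are not shown.

```python
# Toy dependency-aware block placement (greedy co-location sketch).
from collections import defaultdict

dependencies = [("b1", "b2"), ("b1", "b3"), ("b4", "b5")]   # pairs of co-accessed blocks
capacity = 3                                                # blocks per server (assumption)

placement, load = {}, defaultdict(int)

def place(block, preferred=None):
    if block in placement:
        return placement[block]
    if preferred is not None and load[preferred] < capacity:
        server = preferred                                   # co-locate with its dependency
    else:
        server = min(range(3), key=lambda s: load[s])        # otherwise least-loaded server
    placement[block] = server
    load[server] += 1
    return server

for a, b in dependencies:
    s = place(a)
    place(b, preferred=s)

print(placement)    # e.g. {'b1': 0, 'b2': 0, 'b3': 0, 'b4': 1, 'b5': 1}
```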

14. A Context-aware Service Evaluation Approach over Big Data for Cloud Applications
Cloud computing has promoted the success of big data applications such as medical data analyses. With the abundant resources provisioned by cloud platforms, the QoS (quality of service) of services that process big data can be boosted significantly. However, due to unstable networks or fake advertisements, the QoS published by service providers is not always trustworthy. Therefore, it becomes necessary to evaluate service quality in a trustable way, based on the services' historical QoS records. However, evaluation efficiency would be low and could not meet users' quick-response requirements if all the records of a service were recruited for quality evaluation. Moreover, it may lead to a 'Lagging Effect' or low evaluation accuracy if all the records are treated equally, as the invocation contexts of different records are not exactly the same. In view of these challenges, a novel approach named Partial-HR (Partial Historical Records-based service evaluation approach) is put forward in this paper. In Partial-HR, each historical QoS record is weighted based on its service invocation context. Afterwards, only partial, important records are employed for quality evaluation. Finally, a group of experiments are deployed to validate the feasibility of our proposal in terms of evaluation accuracy and efficiency.

15. Fair Resource Allocation for Data-Intensive Computing in the Cloud
To address the computing challenge of 'big data', a number of data-intensive computing frameworks (e.g., MapReduce, Dryad, Storm and Spark) have emerged and become popular. YARN is a de facto resource management platform that enables these frameworks to run together in a shared system. However, we observe that, in a cloud computing environment, the fair resource allocation policy implemented in YARN is not suitable because its memoryless resource allocation leads to violations of a number of good properties in shared computing systems. This paper attempts to address these problems for YARN. Both single-level and hierarchical resource allocations are considered. For single-level resource allocation, we propose a novel fair resource allocation mechanism called Long-Term Resource Fairness (LTRF). For hierarchical resource allocation, we propose Hierarchical Long-Term Resource Fairness (H-LTRF) by extending LTRF. We show that both LTRF and H-LTRF can address the fairness problems of the current resource allocation policy and are thus suitable for cloud computing. Finally, we have developed LTYARN by implementing LTRF and H-LTRF in YARN, and our experiments show that it leads to better resource fairness than the existing fair schedulers of YARN.
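
The contrast with memoryless fairness can be shown with a few lines: a history-aware allocator hands each arriving container to the user with the smallest cumulative allocation so far, rather than looking only at instantaneous usage. This is an illustration of the long-term fairness idea under assumed inputs, not the LTRF algorithm from the paper.

```python
# History-aware (long-term) fair allocation sketch.
import heapq

def allocate(containers, users):
    cumulative = [(0.0, u) for u in users]        # (total resources received so far, user)
    heapq.heapify(cumulative)
    log = []
    for size in containers:
        total, user = heapq.heappop(cumulative)   # user who has received the least overall
        heapq.heappush(cumulative, (total + size, user))
        log.append((user, size))
    return log

print(allocate([4, 2, 2, 1, 3], users=["A", "B"]))
# A takes the large 4-unit container first, then B accumulates until the totals balance out.
```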

16. RankMap: A Framework for Distributed Learning From Dense Data Sets
This paper introduces RankMap, a platform-aware end-to-end framework for efficient execution of a broad class of iterative learning algorithms for massive and dense data sets. Our framework exploits data structure to scalably factorize it into an ensemble of lower rank subspaces. The factorization creates sparse low-dimensional representations of the data, a property which is leveraged to devise effective mapping and scheduling of iterative learning algorithms on the distributed computing machines. We provide two APIs, one matrix-based and one graph-based, which facilitate automated adoption of the framework for performing several contemporary learning applications. To demonstrate the utility of RankMap, we solve sparse recovery and power iteration problems on various real-world data sets with up to 1.8 billion nonzeros. Our evaluations are performed on Amazon EC2 and IBM iDataPlex servers using up to 244 cores. The results demonstrate up to two orders of magnitude improvements in memory usage, execution speed, and bandwidth compared with the best reported prior work, while achieving the same level of learning accuracy.

17. Computation Partitioning for Mobile Cloud Computing in Big Data Environment
The growth of mobile cloud computing (MCC) is challenged by the need to adapt to the resources and environment available to mobile clients while addressing dynamic changes in network bandwidth. Big data can be handled via MCC. In this paper, we propose a model of computation partitioning for stateful data in a dynamic environment to improve performance. First, we construct a model of stateful data streaming and investigate the method of computation partitioning in a dynamic environment. We develop a definition and calculation of the segmentation scheme, covering single-frame data flow, task scheduling and execution efficiency. We also define and analyze the segmentation decision problem for multi-frame data flow, optimized for dynamic conditions. Second, we propose a computation partitioning method for single-frame data flow.

18. Attribute-Based Storage Supporting Secure Deduplication of Encrypted Data in Cloud
Attribute-based encryption (ABE) has been widely used in cloud computing, where a data provider outsources his/her encrypted data to a cloud service provider and can share the data with users possessing specific credentials (or attributes). However, the standard ABE system does not support secure deduplication, which is crucial for eliminating duplicate copies of identical data in order to save storage space and network bandwidth. In this paper, we present an attribute-based storage system with secure deduplication in a hybrid cloud setting, where a private cloud is responsible for duplicate detection and a public cloud manages the storage. Compared with prior data deduplication systems, our system has two advantages. Firstly, it can be used to confidentially share data with users by specifying access policies rather than sharing decryption keys. Secondly, it achieves the standard notion of semantic security for data confidentiality, while existing systems only achieve it by defining a weaker security notion. In addition, we put forth a methodology to modify a ciphertext under one access policy into ciphertexts of the same plaintext under other access policies without revealing the underlying plaintext.

19. A Secure and Verifiable Access Control Scheme for Big Data Storage in Clouds
Due to the complexity and volume, outsourcing ciphertexts to a cloud is deemed to be one of the most effective approaches for big data storage and access. Nevertheless, verifying the access legitimacy of a user and securely updating a ciphertext in the cloud based on a new access policy designated by the data owner are two critical challenges to make cloud-based big data storage practical and effective. Traditional approaches either completely ignore the issue of access policy update or delegate the update to a third party authority; but in practice, access policy update is important for enhancing security and dealing with the dynamism caused by user join and leave activities. In this paper, we propose a secure and verifiable access control scheme based on the NTRU cryptosystem for big data storage in clouds. We first propose a new NTRU decryption algorithm to overcome the decryption failures of the original NTRU, and then detail our scheme and analyze its correctness, security strengths, and computational efficiency. Our scheme allows the cloud server to efficiently update the ciphertext when a new access policy is specified by the data owner, who is also able to validate the update to counter against cheating behaviors of the cloud. It also enables (i) the data owner and eligible users to effectively verify the legitimacy of a user for accessing the data, and (ii) a user to validate the information provided by other users for correct plaintext recovery. Rigorous analysis indicates that our scheme can prevent eligible users from cheating and resist various attacks such as the collusion attack.

20. Distributed Feature Selection for Efficient Economic Big Data Analysis
With the rapidly increasing popularity of economic activities, a large amount of economic data is being collected. Although such data offers significant opportunities for economic analysis, its low quality, high dimensionality and huge volume pose great challenges to the efficient analysis of economic big data. Existing methods have primarily analyzed economic data from the perspective of econometrics, which involves a limited set of indicators and demands the prior knowledge of economists. When embracing large varieties of economic factors, these methods tend to yield unsatisfactory performance. To address these challenges, this paper presents a new framework for efficient analysis of high-dimensional economic big data based on innovative distributed feature selection. Specifically, the framework combines the methods of economic feature selection and econometric model construction to reveal the hidden patterns of economic development. The functionality rests on three pillars: (i) novel data pre-processing techniques to prepare high-quality economic data, (ii) an innovative distributed feature identification solution to locate important and representative economic indicators from multidimensional data sets, and (iii) new econometric models to capture the hidden patterns of economic development. Experimental results on economic data collected in Dalian, China, demonstrate that our proposed framework and methods have superior performance in analyzing enormous economic data.
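
A hedged sketch of distributed feature selection by rank aggregation: each partition scores features locally (here by absolute correlation with the target) and the per-partition ranks are summed. The scoring rule, synthetic data and three-worker split are assumptions, not the paper's feature identification solution.

```python
# Distributed feature selection via local scoring and rank aggregation (sketch).
import numpy as np

def local_scores(X, y):
    # |Pearson correlation| between every feature column and the target.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)

rng = np.random.default_rng(0)
X = rng.normal(size=(9000, 20))
y = 3 * X[:, 2] - 2 * X[:, 7] + rng.normal(size=9000)     # features 2 and 7 are informative

partitions = np.array_split(np.arange(9000), 3)            # simulate 3 workers
rank_sum = np.zeros(20)
for idx in partitions:
    scores = local_scores(X[idx], y[idx])
    rank_sum += scores.argsort().argsort()                 # higher score -> higher rank

top_features = np.argsort(-rank_sum)[:5]
print(top_features)                                        # should contain 2 and 7
```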

21. Revocable Identity-Based Access Control for Big Data with Verifiable Outsourced Computing
To leverage big data for enhanced strategic insight, process optimization and informed decision making, we need an efficient access control mechanism for ensuring end-to-end security of such information assets. Signcryption is one of several promising techniques to simultaneously achieve big data confidentiality and authenticity. However, signcryption suffers from the limitation of not being able to revoke users from a large-scale system efficiently. In this paper, we put forward the first identity-based (ID-based) signcryption scheme with efficient revocation as well as the ability to outsource unsigncryption, enabling secure big data communications between data collectors and data analytical system(s). Our scheme is designed to achieve end-to-end confidentiality, authentication, non-repudiation, and integrity simultaneously, while providing scalable revocation functionality such that the overhead demanded by the private key generator (PKG) in the key-update phase increases only logarithmically with the number of users. Although in our scheme the majority of the unsigncryption tasks are outsourced to an untrusted cloud server, this approach does not affect its security. We then prove the security of our scheme and demonstrate its utility using simulations.

22. Privacy Protection and Intrusion Avoidance for Cloudlet-based Medical Data Sharing
With the popularity of wearable devices, along with the development of cloud and cloudlet technology, there is an increasing need to provide better medical care. The processing chain of medical data mainly includes data collection, data storage and data sharing. Traditional healthcare systems often require the delivery of medical data to the cloud, which involves users' sensitive information and causes communication energy consumption. In practice, medical data sharing is a critical and challenging issue. Thus, in this paper, we build a novel healthcare system by utilizing the flexibility of the cloudlet. The functions of the cloudlet include privacy protection, data sharing and intrusion detection. In the data collection stage, we first utilize the Number Theory Research Unit (NTRU) method to encrypt the user's body data collected by wearable devices. Those data are transmitted to a nearby cloudlet in an energy-efficient fashion. Secondly, we present a new trust model to help users select trustworthy partners who want to share data stored in the cloudlet. The trust model also helps patients with similar conditions communicate with each other about their diseases. Thirdly, we divide users' medical data stored in the remote cloud of the hospital into three parts and give each part proper protection. Finally, in order to protect the healthcare system from malicious attacks, we develop a novel collaborative intrusion detection system (IDS) method based on cloudlet mesh, which can effectively protect the remote healthcare big data cloud from attacks. Our experiments demonstrate the effectiveness of the proposed scheme.

23. SEEN: A Selective Encryption Method to Ensure Confidentiality for Big Sensing Data Streams
Resource constrained sensing devices are being used widely to build and deploy self-organizing wireless sensor networks for a variety of critical applications such as smart cities, smart health, precision agriculture and industrial control systems. Many such devices sense the deployed environment and generate a variety of data and send them to the server for analysis as data streams. A Data Stream Manager (DSM) at the server collects the data streams (often called big data) to perform real time analysis and decision-making for these critical applications. A malicious adversary may access or tamper with the data in transit. One of the challenging tasks in such applications is to assure the trustworthiness of the collected data so that any decisions are made on the processing of correct data. Assuring high data trustworthiness requires that the system satisfies two key security properties: confidentiality and integrity. To ensure the confidentiality of collected data, we need to prevent sensitive information from reaching the wrong people by ensuring that the right people are getting it. Sensed data are always associated with different sensitivity levels based on the sensitivity of emerging applications or the sensed data types or the sensing devices.

24. Big Data Privacy in Biomedical Research
Biomedical research often involves studying patient data that contain personal information. Inappropriate use of these data might lead to leakage of sensitive information, which can put patient privacy at risk. The problem of preserving patient privacy has received increasing attention in the era of big data. Many privacy methods have been developed to protect against various attack models. This paper reviews relevant topics in the context of biomedical research. We discuss privacy-preserving technologies related to (1) record linkage, (2) synthetic data generation, and (3) genomic data privacy. We also discuss the ethical implications of big data privacy in biomedicine and present challenges and future research directions for improving data privacy in biomedical research.

25. Big Data Analytics for User Activity Analysis and User Anomaly Detection in Mobile Wireless Network
Next generation wireless networks are expected to operate in a fully automated fashion to meet the burgeoning capacity demand and to serve users with a superior quality of experience. Mobile wireless networks can leverage spatio-temporal information about user and network conditions to embed the system with end-to-end visibility and intelligence. Big data analytics has emerged as a promising approach to unearth meaningful insights and to build artificially intelligent models with the assistance of machine learning tools. Utilizing the aforementioned tools and techniques, this paper contributes in two ways. First, we utilize mobile network data (big data) – call detail records (CDR) – to analyze anomalous behavior of a mobile wireless network. For anomaly detection purposes, we use unsupervised clustering techniques, namely k-means clustering and hierarchical clustering. We compare the detected anomalies with ground truth information to verify their correctness. From the comparative analysis, we observe that when the network experiences abruptly high (unusual) traffic demand at any location and time, it is identified as an anomaly. This helps in identifying regions of interest (RoI) in the network for special action such as resource allocation, fault avoidance solutions, etc. Second, we train a neural-network-based prediction model with anomalous and anomaly-free data to highlight the effect of anomalies in the data while training/building intelligent models. In this phase, we transform our anomalous data to anomaly-free data, and we observe that the prediction error when training the model with anomaly-free data is largely decreased compared to the case when the model is trained with anomalous data.
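
The first contribution can be sketched in a few lines: cluster per-cell, per-hour traffic volumes with k-means and flag points that lie far from their centroid as anomalies. The synthetic CDR features, cluster count and the 99th-percentile distance threshold are assumptions for illustration.

```python
# k-means based traffic anomaly detection sketch on synthetic CDR-like data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
traffic = rng.poisson(lam=50, size=(5000, 2)).astype(float)   # [calls, sms] per cell-hour
traffic[:20] *= 8                                             # inject unusually heavy load

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(traffic)
dist = np.linalg.norm(traffic - km.cluster_centers_[km.labels_], axis=1)
threshold = np.percentile(dist, 99)                           # assumed anomaly threshold
anomalous_hours = np.where(dist > threshold)[0]
print(len(anomalous_hours), anomalous_hours[:10])             # mostly the injected rows
```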

26. Mutual Privacy Preserving k-Means Clustering in Social Participatory Sensing
In this paper, we consider the problem of mutual privacy-protection in social participatory sensing in which individuals contribute their private information to build a (virtual) community. Particularly, we propose a mutual privacy preserving k-means clustering scheme that neither discloses individual’s private information nor leaks the community’s characteristic data (clusters). Our scheme contains two privacy-preserving algorithms called at each iteration of the k-means clustering. The first one is employed by each participant to find the nearest cluster while the cluster centers are kept secret to the participants; and the second one computes the cluster centers without leaking any cluster center information to the participants while preventing each participant from figuring out other members in the same cluster. An extensive performance analysis is carried out to show that our approach is effective for k-means clustering, can resist collusion attacks, and can provide mutual privacy protection even when the data analyst colludes with all except one participant.
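
The iteration structure described above can be shown as a skeleton in which the two cryptographic protocols are replaced by plainly named stubs: in the real scheme the assignment step hides the cluster centers from participants and the update step hides members from each other. Everything below is an illustrative stand-in, not the paper's protocols.

```python
# Skeleton of the two-phase privacy-preserving k-means iteration (stubs only).
import numpy as np

def secure_nearest_cluster(point, centers):
    # STUB: would run a privacy-preserving comparison protocol.
    return int(np.argmin(np.linalg.norm(centers - point, axis=1)))

def secure_center_update(points_in_cluster, old_center):
    # STUB: would aggregate encrypted shares without revealing individual points.
    return points_in_cluster.mean(axis=0) if len(points_in_cluster) else old_center

def k_means(points, k=3, iters=10):
    centers = points[np.random.choice(len(points), k, replace=False)]
    for _ in range(iters):
        labels = np.array([secure_nearest_cluster(p, centers) for p in points])
        centers = np.array([secure_center_update(points[labels == c], centers[c])
                            for c in range(k)])
    return centers, labels

pts = np.vstack([np.random.randn(100, 2) + off for off in ((0, 0), (5, 5), (0, 5))])
centers, labels = k_means(pts)
print(np.round(centers, 2))
```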

27. A New Method for Time-Series Big Data Effective Storage
Today, one of the main challenges of big data research is the processing of big time-series data. Moreover, time data analysis is of considerable importance, because previous trends are useful for predicting the future. Due to the considerable delay when the volume of data increases, the presence of redundancy, and the innate lack of time-series structures, the traditional relational data model does not seem adequately capable of analyzing time data. Moreover, many traditional data structures do not support time operators, which results in inefficient access to time data. Therefore, relational database management systems have difficulty dealing with big data—it may require massively parallel software running on many servers. This has led us to implement Chronos, an in-memory, background-based time database for key-value pairs; this software was implemented in C++. An independent design has been suggested through the appropriate use of temporal algorithms, parallelism algorithms, and methods of data storage in RAM. Our results indicate that employing RAM for storing the data, and the Timeline Index algorithm for accessing the time background of keys in Chronos, translates into an increase of about 40%-90% in efficiency compared to other databases such as MySQL and MongoDB.
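
For intuition, here is a tiny in-memory sketch of a time-versioned key-value store with a per-key timeline: values are appended with their timestamps and point-in-time or range reads use binary search. The class name and API are assumptions, not the Chronos implementation.

```python
# Minimal time-versioned key-value store with binary-searched timelines.
import bisect
from collections import defaultdict

class TimeSeriesKV:
    def __init__(self):
        self.times = defaultdict(list)    # key -> sorted timestamps
        self.values = defaultdict(list)   # key -> values aligned with timestamps

    def put(self, key, ts, value):
        self.times[key].append(ts)        # assumes monotonically increasing timestamps
        self.values[key].append(value)

    def get_asof(self, key, ts):
        """Latest value written at or before ts."""
        i = bisect.bisect_right(self.times[key], ts)
        return self.values[key][i - 1] if i else None

    def get_range(self, key, t0, t1):
        lo = bisect.bisect_left(self.times[key], t0)
        hi = bisect.bisect_right(self.times[key], t1)
        return list(zip(self.times[key][lo:hi], self.values[key][lo:hi]))

db = TimeSeriesKV()
for t, v in [(1, "a"), (5, "b"), (9, "c")]:
    db.put("sensor-1", t, v)
print(db.get_asof("sensor-1", 7), db.get_range("sensor-1", 2, 9))   # b [(5,'b'),(9,'c')]
```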

28. Optimizing Share Size in Efficient and Robust Secret Sharing Scheme for Big Data
Secret sharing schemes have been widely applied in distributed storage for big data. They are a method for protecting outsourced data against data leakage and for securing key management systems. The secret is distributed among a group of participants, where each participant holds a share of the secret. The secret can be reconstructed only when a sufficient number of shares are recombined. Although many secret sharing schemes have been proposed, they are still inefficient in terms of share size, communication cost and storage cost, and they also lack robustness in terms of exact-share repair. In this paper, for the first time, we propose a new secret sharing scheme based on Slepian-Wolf coding. Our scheme can achieve an optimal share size utilizing the simple binning idea of the coding. It also enhances the exact-share repair feature, whereby shares remain consistent even if they are corrupted. We show, through experiments, how our scheme can significantly reduce the communication and storage cost while still supporting direct share repair, leveraging lightweight exclusive-OR (XOR) operations for fast computation.
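
To ground the basic idea of sharing, here is a toy XOR-based n-of-n splitting: all n shares are needed to reconstruct and each share alone is uniformly random. The paper's Slepian-Wolf-coded scheme additionally achieves threshold reconstruction, optimized share size and exact share repair, which this sketch does not attempt.

```python
# Toy XOR-based n-of-n secret sharing for intuition only.
import os

def split(secret: bytes, n: int):
    shares = [os.urandom(len(secret)) for _ in range(n - 1)]
    final = bytearray(secret)
    for s in shares:                                   # fold the random shares into the last one
        final = bytearray(a ^ b for a, b in zip(final, s))
    return shares + [bytes(final)]

def combine(shares):
    acc = bytearray(len(shares[0]))
    for s in shares:
        acc = bytearray(a ^ b for a, b in zip(acc, s))
    return bytes(acc)

shares = split(b"top-secret key", n=4)
assert combine(shares) == b"top-secret key"            # XOR of all shares recovers the secret
print(combine(shares[:3]) == b"top-secret key")        # False: a strict subset reveals nothing
```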

29. DiP-SVM: Distribution Preserving Kernel Support Vector Machine for Big Data
In the literature, the task of learning a support vector machine for large datasets has been performed by splitting the dataset into manageable sized "partitions" and training a sequential support vector machine on each of these partitions separately to obtain local support vectors. However, this process invariably leads to a loss in classification accuracy, as global support vectors may not have been chosen as local support vectors in their respective partitions. We hypothesize that retaining the original distribution of the dataset in each of the partitions can help solve this issue. Hence, we present DiP-SVM, a distribution-preserving kernel support vector machine in which the first and second order statistics of the entire dataset are retained in each of the partitions. This helps obtain local decision boundaries that agree with the global decision boundary, thereby reducing the chance of missing important global support vectors. We show that DiP-SVM achieves a minimal loss in classification accuracy compared with other distributed support vector machine techniques on several benchmark datasets. We further demonstrate that our approach reduces communication overhead between partitions, leading to faster execution on large datasets and making it suitable for implementation in cloud environments.
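
A hedged sketch of the partitioning idea: a stratified split keeps the class proportions of the full dataset in every partition before a local SVM is trained on each. Matching first and second order statistics exactly, as DiP-SVM does, would need an extra step not shown here; the majority-vote combination is also just an illustration.

```python
# Distribution-aware partitioning plus local SVMs (sketch).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

X, y = make_classification(n_samples=3000, n_features=10, random_state=0)

local_models = []
for _, part_idx in StratifiedKFold(n_splits=4, shuffle=True, random_state=0).split(X, y):
    # Each fold is a disjoint partition with (roughly) the global class proportions.
    local_models.append(SVC(kernel="rbf").fit(X[part_idx], y[part_idx]))

# Naive combination: majority vote over the local models (illustration only).
votes = np.array([m.predict(X[:200]) for m in local_models])
combined = (votes.mean(axis=0) > 0.5).astype(int)
print((combined == y[:200]).mean())
```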

30. Cloud Infrastructure Resource Allocation for Big Data Applications
Increasingly popular big data applications bring invaluable information, but also challenges to industry and academia. Cloud computing, with its seemingly unlimited resources, appears to be the way out. However, this panacea cannot play its role unless cloud infrastructure resources are allocated carefully. In this paper, we present a multi-objective optimization algorithm to trade off the performance, availability, and cost of big data applications running on the cloud. After analyzing and modeling the interlaced relations among these objectives, we design and implement our approach in an experimental environment. Finally, three sets of experiments show that our approach can run about 20% faster than traditional optimization approaches, and can achieve about 15% higher performance than other heuristic algorithms, while saving 4% to 20% in cost.

31. On Distributed Fuzzy Decision Trees for Big Data
Fuzzy decision trees (FDTs) have shown to be an effective solution in the framework of fuzzy classification. The approaches to FDT learning proposed so far, however, have generally neglected time and space requirements. In this paper, we propose a distributed FDT learning scheme shaped according to the MapReduce programming model for generating both binary and multi-way FDTs from big data. The scheme relies on a novel distributed fuzzy discretizer that generates a strong fuzzy partition for each continuous attribute based on fuzzy information entropy. The fuzzy partitions are then used as input to the FDT learning algorithm, which employs fuzzy information gain for selecting the attributes at the decision nodes. We have implemented the FDT learning scheme on the Apache Spark framework. We have used ten real-world publicly available big datasets to evaluate the behavior of the scheme along three dimensions: i) performance in terms of classification accuracy, model complexity and execution time; ii) scalability when varying the number of computing units; and iii) ability to efficiently accommodate an increasing dataset size. We have demonstrated that the proposed scheme is suitable for managing big datasets even with modest commodity hardware support. Finally, we have used the distributed decision tree learning algorithm implemented in the MLlib library and the Chi-FRBCS-BigData algorithm, a MapReduce distributed fuzzy rule-based classification system, for comparative analysis.
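
A small numeric sketch of fuzzy information gain follows: membership degrees replace crisp counts, so class frequencies become sums of memberships over a fuzzy set. The triangular strong partition and the toy data are assumptions; the paper's distributed discretizer is not reproduced.

```python
# Fuzzy entropy and fuzzy information gain over a strong triangular partition (sketch).
import numpy as np

def fuzzy_entropy(memberships, labels):
    classes = np.unique(labels)
    total = memberships.sum()
    if total == 0:
        return 0.0
    p = np.array([memberships[labels == c].sum() / total for c in classes])
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def fuzzy_information_gain(values, labels, fuzzy_sets):
    """fuzzy_sets: membership functions forming a strong partition of the attribute."""
    full = fuzzy_entropy(np.ones_like(values, dtype=float), labels)
    weighted = 0.0
    for mu in fuzzy_sets:
        m = mu(values)
        weighted += (m.sum() / len(values)) * fuzzy_entropy(m, labels)
    return full - weighted

tri = lambda a, b, c: lambda x: np.clip(np.minimum((x - a) / (b - a + 1e-9),
                                                   (c - x) / (c - b + 1e-9)), 0, 1)
values = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 0.9])
labels = np.array([0, 0, 0, 1, 1, 1])
partition = [tri(-0.5, 0.0, 0.5), tri(0.0, 0.5, 1.0), tri(0.5, 1.0, 1.5)]
print(fuzzy_information_gain(values, labels, partition))
```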

32. NPP: A New Privacy-Aware Public Auditing Scheme for Cloud Data Sharing with Group Users
Today, cloud storage has become one of the critical services, because users can easily modify and share data with others in the cloud. However, the integrity of shared cloud data is vulnerable to inevitable hardware faults, software failures or human errors. To ensure the integrity of shared data, some schemes have been designed to allow public verifiers (i.e., third party auditors) to efficiently audit data integrity without retrieving the entire users' data from the cloud. Unfortunately, public auditing of the integrity of shared data may reveal data owners' sensitive information to the third party auditor. In this paper, we propose a new privacy-aware public auditing mechanism for shared cloud data by constructing a homomorphic verifiable group signature. Unlike existing solutions, our scheme requires at least t group managers to recover a trace key cooperatively, which eliminates the abuse of single-authority power and provides non-frameability. Moreover, our scheme ensures that group users can trace data changes through a designated binary tree and can recover the latest correct data block when the current data block is damaged. In addition, the formal security analysis and experimental results indicate that our scheme is provably secure and efficient.

33. DiNoDB: an Interactive-speed Query Engine for Ad-hoc Queries on Temporary Data
As data sets grow in size, analytics applications struggle to get instant insight into large datasets. Modern applications involve heavy batch processing jobs over large volumes of data and at the same time require efficient ad-hoc interactive analytics on temporary data. Existing solutions, however, typically focus on one of these two aspects, largely ignoring the need for synergy between the two. Consequently, interactive queries need to re-iterate costly passes through the entire dataset (e.g., data loading) that may provide a meaningful return on investment only when the data is queried over a long period of time. In this paper, we propose DiNoDB, an interactive-speed query engine for ad-hoc queries on temporary data. DiNoDB avoids the expensive loading and transformation phase that characterizes both traditional RDBMSs and current interactive analytics solutions. It is tailored to modern workflows found in machine learning and data exploration use cases, which often involve iterations of cycles of batch and interactive analytics on data that is typically useful only for a narrow processing window. The key innovation of DiNoDB is to piggyback on the batch processing phase the creation of metadata that DiNoDB exploits to expedite the interactive queries. Our experimental analysis demonstrates that DiNoDB achieves very good performance for a wide range of ad-hoc queries compared to alternatives.

34. Velocity-Aware Parallel Encryption Algorithm with Low Energy Consumption for Streams
In the cloud computing environment, the data produced by massive numbers of users form data streams that need to be protected by encryption to maintain confidentiality. Traditional serial encryption algorithms perform poorly and consume more energy because they do not consider the properties of streams. Therefore, we propose a velocity-aware parallel encryption algorithm with low energy consumption (LECPAES) for streams in cloud computing. The algorithm parallelizes the Advanced Encryption Standard (AES) on a heterogeneous many-core architecture, adopts a sliding window to stabilize burst flows, senses the velocity of streams using window thresholds computed from frequency ratios, and dynamically scales the frequency of Graphics Processing Units (GPUs) to lower energy consumption. Experiments on streams at different velocities and comparisons with other related algorithms show that the algorithm reduces energy consumption while only slightly increasing the retransmission rate and slightly decreasing throughput. Therefore, LECPAES is an excellent algorithm for fast and energy-saving stream encryption.

35. Efficient Recommendation of De-identification Policies using MapReduce
Many data owners are required to release their data in a variety of real-world applications, since it is of vital importance to discover the valuable information hidden behind the data. However, existing re-identification attacks on the AOL and ADULTS datasets have shown that publishing such data directly may pose tremendous threats to individual privacy. Thus, it is urgent to address various re-identification risks by recommending effective de-identification policies that guarantee both the privacy and the utility of the data. De-identification policies are one model that can be used to achieve such requirements; however, the number of de-identification policies is exponentially large due to the broad domain of quasi-identifier attributes. To better control the trade-off between data utility and data privacy, skyline computation can be used to select such policies, but efficient skyline processing over a large number of policies remains challenging. In this paper, we propose a parallel algorithm called SKY-FILTER-MR, based on MapReduce, to overcome this challenge by computing skylines over large-scale de-identification policies represented by bit-strings. To further improve performance, a novel approximate skyline computation scheme is proposed to prune unqualified policies using the approximate domination relationship. With the approximate skyline, the power of filtering in the policy space generation stage is greatly strengthened, effectively decreasing the cost of skyline computation over alternative policies. Extensive experiments over both real-life and synthetic datasets demonstrate that our proposed SKY-FILTER-MR algorithm substantially outperforms the baseline approach, being up to four times faster in the optimal case, which indicates good scalability over large policy sets.
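
For readers unfamiliar with skylines, the sketch below computes a plain block-nested-loop skyline: a policy (here a pair of privacy-risk and utility-loss scores, both lower-is-better) survives only if no other policy dominates it in every dimension. The two-dimensional scoring is an assumption; the bit-string representation and MapReduce partitioning of SKY-FILTER-MR are not shown.

```python
# Block-nested-loop skyline over candidate de-identification policies (sketch).
def dominates(p, q):
    """p dominates q if p is no worse in every dimension and strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(policies):
    result = []
    for p in policies:
        if any(dominates(q, p) for q in policies if q is not p):
            continue                      # p is dominated, discard it
        result.append(p)
    return result

policies = [(0.2, 0.9), (0.3, 0.4), (0.5, 0.3), (0.6, 0.6), (0.1, 1.0)]
print(skyline(policies))   # [(0.2, 0.9), (0.3, 0.4), (0.5, 0.3), (0.1, 1.0)]
```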

36. An Adaptive Pattern Learning Framework to Personalize Online Seizure Prediction
The sudden and spontaneous occurrence of epileptic seizures can impose a significant burden on patients with epilepsy. If seizure onset could be prospectively predicted, it would greatly improve the lives of patients with epilepsy and also open new therapeutic avenues for epilepsy treatment. However, discovering effective predictive patterns from massive brainwave signals is still a challenging problem, and the prediction of epileptic seizures is still in its early stages. Most existing studies have investigated the predictability of seizures offline instead of in a truly prospective online setting, and the high inter-individual variability has not been fully considered. In this study, we propose a novel adaptive pattern learning framework with a new online feature extraction approach to achieve personalized online prospective seizure prediction. In particular, a two-level online feature extraction approach is applied to monitor intracranial electroencephalogram (EEG) signals and construct a pattern library incrementally. Three prediction rules were developed and evaluated based on the continuously updated patient-specific pattern library for each patient: adaptive probabilistic prediction (APP), adaptive linear-discriminant-analysis-based prediction (ALP), and adaptive Naive Bayes-based prediction (ANBP). The proposed online pattern learning and prediction system achieved impressive prediction results for 10 patients with epilepsy using long-term EEG recordings. The best testing prediction accuracies averaged over the 10 patients were 79%, 78%, and 82% for the APP, ALP, and ANBP prediction schemes, respectively.

37. Faster MapReduce Computation on Clouds through Better Performance Estimation
Processing big data in the cloud is on the increase. An important issue for the efficient execution of big data processing jobs on a cloud platform is selecting the best-fitting virtual machine (VM) configuration(s) among the miscellany of choices that cloud providers offer. Wise selection of VM configurations can lead to better performance, cost and energy consumption. Therefore, it is crucial to explore the available configurations and opt for the best ones that suit each MapReduce application well. Profiling the given application on all the configurations is costly, time consuming and energy consuming. An alternative is to run the application on a subset of configurations (sample configurations) and estimate its performance on other configurations from the values obtained on the sample configurations. We show that the choice of these sample configurations highly affects the accuracy of later estimations. Our Smart Configuration Selection (SCS) scheme chooses better representatives from among all configurations through a once-off analysis of given benchmark performance figures, so as to increase the accuracy of estimations of missing values and, consequently, to more accurately choose the configuration providing the highest performance. The results show that the SCS choice of sample configurations is very close to the best choice, and can reduce estimation error to 11.58% from the original 19.72% of random configuration selection. More importantly, using SCS estimations in a makespan minimization algorithm improves the execution time by up to 36.03% compared with random sample selection.
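
A hedged sketch of the sample-selection idea: cluster VM configurations by their known benchmark profiles, profile the new application only on one representative per cluster, and copy the measured value to the other configurations in the same cluster. The benchmark matrix, k = 3, and the copy-from-representative estimator are assumptions, not the SCS algorithm itself.

```python
# Selecting representative VM configurations and estimating the rest (sketch).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
benchmarks = rng.random((12, 6))                 # 12 VM configs x 6 benchmark metrics (assumed)

k = 3
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(benchmarks)
representatives = [int(np.argmin(np.linalg.norm(benchmarks - c, axis=1)))
                   for c in km.cluster_centers_]  # one config to actually profile per cluster

measured = {r: rng.random() * 100 for r in representatives}   # pretend profiling runtimes
estimated = {cfg: measured[representatives[km.labels_[cfg]]]  # copy from the representative
             for cfg in range(12)}
print(representatives, estimated)
```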

38. Mutual Privacy Preserving k-Means Clustering in Social Participatory Sensing
In this paper, we consider the problem of mutual privacy-protection in social participatory sensing in which individuals contribute their private information to build a (virtual) community. Particularly, we propose a mutual privacy preserving k-means clustering scheme that neither discloses individual's private information nor leaks the community's characteristic data (clusters). Our scheme contains two privacy-preserving algorithms called at each iteration of the k-means clustering. The first one is employed by each participant to find the nearest cluster while the cluster centers are kept secret to the participants; and the second one computes the cluster centers without leaking any cluster center information to the participants while preventing each participant from figuring out other members in the same cluster. An extensive performance analysis is carried out to show that our approach is effective for k-means clustering, can resist collusion attacks, and can provide mutual privacy protection even when the data analyst colludes with all except one participant.

39. CloudFinder: A System for Processing Big Data Workloads on Volunteered Federated Clouds
The proliferation of private clouds that are often underutilized, and the tremendous computational potential of these clouds when combined, have recently brought forth the idea of volunteer cloud computing (VCC), a computing model where cloud owners contribute underutilized computing and/or storage resources on their clouds to support the execution of applications of other members in the community. This model is particularly suitable for solving big data scientific problems. Scientists in data-intensive scientific fields increasingly recognize that sharing volunteered resources from several clouds is a cost-effective alternative for solving many complex, data- and/or compute-intensive science problems. Despite the promise of the idea of VCC, it still remains at the vision stage at best. Challenges include the heterogeneity and autonomy of member clouds, access control and security, complex inter-cloud virtual machine scheduling, etc. In this paper, we present CloudFinder, a system that supports the efficient execution of big data workloads on volunteered federated clouds (VFCs). Our evaluation of the system indicates that VFCs are a promising cost-effective approach to enabling big data science.

40. Secure k-NN Query on Encrypted Cloud Data with Multiple Keys
The k-nearest neighbors (k-NN) query is a fundamental primitive in spatial and multimedia databases. It has extensive applications in location-based services, classification, clustering, and so on. With the promise of confidentiality and privacy, massive data are increasingly outsourced to the cloud in encrypted form to enjoy the advantages of cloud computing (e.g., reduced storage and query processing costs). Recently, many schemes have been proposed to support k-NN queries on encrypted cloud data. However, prior works have all assumed that the query users (QUs) are fully trusted and know the key of the data owner (DO), which is used to encrypt and decrypt the outsourced data. These assumptions are unrealistic in many situations, since many users are neither trusted nor in possession of the key. In this paper, we propose a novel scheme for secure k-NN query on encrypted cloud data with multiple keys, in which the DO and each QU hold their own distinct keys and do not share them with each other; meanwhile, the DO encrypts and decrypts the outsourced data using his own key. Our scheme is constructed from a distributed two-trapdoor public-key cryptosystem (DT-PKC) and a set of secure two-party computation protocols, which not only preserve data confidentiality and query privacy but also support an offline data owner. Our extensive theoretical and experimental evaluations demonstrate the effectiveness of our scheme in terms of security and performance.

41. A Scalable Data Chunk Similarity Based Compression Approach for Efficient Big Sensing Data Processing on Cloud
Big sensing data is prevalent in both industrial and scientific research applications, where the data is generated at high volume and velocity. Cloud computing provides a promising platform for big sensing data processing and storage, as it provides a flexible stack of massive computing, storage, and software services in a scalable manner. Current big sensing data processing on the cloud has adopted some data compression techniques. However, due to the high volume and velocity of big sensing data, traditional data compression techniques lack sufficient efficiency and scalability for data processing. Based on specific on-cloud data compression requirements, we propose a novel scalable data compression approach based on calculating the similarity among partitioned data chunks. Instead of compressing basic data units, compression is conducted over partitioned data chunks. To restore the original data sets, restoration functions and predictions are designed. MapReduce is used for the algorithm implementation to achieve extra scalability on the cloud. With real-world meteorological big sensing data experiments on the U-Cloud platform, we demonstrate that the proposed scalable compression approach based on data chunk similarity can significantly improve data compression efficiency with affordable data accuracy loss.
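
The core idea can be sketched as follows: a new chunk whose correlation with an already-stored reference chunk exceeds a threshold is stored only as a small descriptor (reference id plus a scale factor) instead of the raw samples. The 0.98 threshold, the linear-scaling model and the synthetic signal are assumptions; the paper's restoration and prediction functions are not reproduced.

```python
# Similarity-based chunk compression sketch for sensing streams.
import numpy as np

THRESHOLD = 0.98
references, compressed = [], []

def add_chunk(chunk):
    for rid, ref in enumerate(references):
        corr = np.corrcoef(chunk, ref)[0, 1]
        if corr > THRESHOLD:
            scale = chunk.mean() / (ref.mean() + 1e-12)
            compressed.append(("ref", rid, scale))        # store only a tiny descriptor
            return
    references.append(chunk)
    compressed.append(("raw", len(references) - 1, 1.0))  # store the full chunk once

rng = np.random.default_rng(0)
base = np.sin(np.linspace(0, 6, 500)) * 20 + 25           # temperature-like signal
for i in range(10):
    add_chunk(base * (1 + 0.01 * i) + rng.normal(0, 0.1, 500))

print(sum(1 for c in compressed if c[0] == "ref"), "of 10 chunks stored as references")
```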

42Revocable Identity-Based Access Control for Big Data with Verifiable Outsourced Computing
To leverage big data for enhanced strategic insight, process optimization, and informed decision making, we need an efficient access control mechanism for ensuring end-to-end security of such information assets. Signcryption is one of several promising techniques to simultaneously achieve big data confidentiality and authenticity. However, signcryption suffers from the limitation of not being able to revoke users from a large-scale system efficiently. In this paper, we put forward the first identity-based (ID-based) signcryption scheme with efficient revocation as well as the ability to outsource unsigncryption, enabling secure big data communications between data collectors and data analytical system(s). Our scheme is designed to achieve end-to-end confidentiality, authentication, non-repudiation, and integrity simultaneously, while providing scalable revocation functionality such that the overhead demanded of the private key generator (PKG) in the key-update phase grows only logarithmically with the number of users. Although the majority of the unsigncryption tasks in our scheme are outsourced to an untrusted cloud server, this does not affect its security. We then prove the security of our scheme and demonstrate its utility using simulations.
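The logarithmic key-update cost mentioned above is commonly obtained with a binary-tree ("KUNodes"-style) revocation structure; the sketch below only counts which tree nodes an authority would have to refresh under that standard technique, and does not model the paper's actual signcryption or cryptography.

```python
# Minimal sketch of binary-tree revocation, a standard way to make key-update
# cost logarithmic in the number of users. Only the tree-cover computation is
# shown; no cryptographic operations are performed.

def kunodes(num_users, revoked):
    """num_users must be a power of two; users sit at leaves num_users..2*num_users-1.
    Returns the set of tree nodes whose key updates cover every non-revoked user."""
    if not revoked:
        return {1}  # a single update at the root covers everyone
    on_path = set()
    for user in revoked:
        node = num_users + user          # leaf index of the revoked user
        while node >= 1:
            on_path.add(node)
            node //= 2                   # walk up to the root
    cover = set()
    for node in on_path:
        for child in (2 * node, 2 * node + 1):
            if child < 2 * num_users and child not in on_path:
                cover.add(child)         # this child's subtree contains no revoked user
    return cover

if __name__ == "__main__":
    n = 1024
    print(len(kunodes(n, set())))        # 1 node when nobody is revoked
    print(len(kunodes(n, {5})))          # log2(n) = 10 nodes for a single revocation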

43Managing Big Interval Data with CINTIA: the Checkpoint INTerval Array
Intervals have become prominent in data management as they are the main data structure for representing a number of key data types such as temporal or genomic data. Yet, there exists no solution to compactly store and efficiently query big interval data. In this paper we introduce CINTIA, the Checkpoint INTerval Index Array, an efficient data structure to store and query interval data that achieves high memory locality and outperforms state-of-the-art solutions. We also propose a low-latency Big Data system that implements CINTIA on top of a popular distributed file system and efficiently manages large interval data on clusters of commodity machines. Our system can easily be scaled out and was designed to accommodate large delays between the various components of a distributed infrastructure. We experimentally evaluate the performance of our approach on several datasets and show that it outperforms current solutions by several orders of magnitude in distributed settings.
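To make the general checkpoint idea concrete, here is a simplified Python sketch of a checkpoint-based interval index for stabbing queries ("which intervals contain point q?"): intervals spanning each checkpoint are materialized, so a query only inspects the nearest checkpoint plus the intervals starting after it. The spacing and in-memory layout are assumptions; CINTIA's actual structure and on-disk design are not modelled.

```python
# Simplified checkpoint-based interval index for stabbing queries.
# Assumes integer interval endpoints for illustration.
import bisect

class CheckpointIntervalIndex:
    def __init__(self, intervals, checkpoint_every=1000):
        self.intervals = sorted(intervals)            # list of (start, end), start <= end
        self.starts = [s for s, _ in self.intervals]
        self.step = checkpoint_every
        self.checkpoints, self.active_at = [], []
        if self.intervals:
            pos, hi = self.intervals[0][0], max(e for _, e in self.intervals)
            while pos <= hi:
                self.checkpoints.append(pos)
                self.active_at.append([iv for iv in self.intervals
                                       if iv[0] <= pos <= iv[1]])
                pos += self.step

    def stab(self, q):
        """Return all intervals containing point q."""
        if not self.checkpoints or q < self.checkpoints[0]:
            result, scan_from = [], 0
        else:
            cp_idx = min(int((q - self.checkpoints[0]) // self.step),
                         len(self.checkpoints) - 1)
            cp = self.checkpoints[cp_idx]
            # intervals already open at the checkpoint and still open at q
            result = [iv for iv in self.active_at[cp_idx] if iv[1] >= q]
            scan_from = bisect.bisect_right(self.starts, cp)
        # intervals starting after the checkpoint but no later than q
        for s, e in self.intervals[scan_from:bisect.bisect_right(self.starts, q)]:
            if e >= q:
                result.append((s, e))
        return result

idx = CheckpointIntervalIndex([(1, 5), (3, 9), (10, 12)], checkpoint_every=4)
print(idx.stab(4))   # -> [(1, 5), (3, 9)]
```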

44HDM: A Composable Framework for Big Data Processing
Over the past years, frameworks such as MapReduce and Spark have been introduced to ease the task of developing big data programs and applications. However, jobs in these frameworks are coarsely defined and packaged as executable jars without any functionality being exposed or described. This means that deployed jobs are not natively composable and reusable for subsequent development, and it also hampers the ability to apply optimizations on the data flow of job sequences and pipelines. In this paper, we present the Hierarchically Distributed Data Matrix (HDM), a functional, strongly-typed data representation for writing composable big data applications. Along with HDM, a runtime framework is provided to support the execution, integration, and management of HDM applications on distributed infrastructures. Based on the functional data dependency graph of HDM, multiple optimizations are applied to improve the performance of executing HDM jobs. The experimental results show that our optimizations can achieve improvements of 10% to 40% in job completion time for different types of applications when compared with the current state of the art, Apache Spark.
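The following toy Python sketch illustrates the general notion of a composable, lazily built dataflow whose recorded dependency chain can be optimized (here, fusing adjacent map stages) before execution. The class and method names are invented for illustration and do not reflect HDM's actual (Scala) API.

```python
# Toy composable dataflow: operations are recorded, optionally optimized, then run.
class DataFlow:
    def __init__(self, source, ops=()):
        self.source = source            # an in-memory iterable stands in for a dataset
        self.ops = list(ops)            # recorded, not yet executed

    def map(self, fn):
        return DataFlow(self.source, self.ops + [("map", fn)])

    def filter(self, pred):
        return DataFlow(self.source, self.ops + [("filter", pred)])

    def optimized(self):
        """Fuse consecutive map ops into one to shorten the pipeline."""
        fused = []
        for kind, fn in self.ops:
            if kind == "map" and fused and fused[-1][0] == "map":
                prev = fused.pop()[1]
                fused.append(("map", lambda x, f=prev, g=fn: g(f(x))))
            else:
                fused.append((kind, fn))
        return DataFlow(self.source, fused)

    def collect(self):
        data = iter(self.source)
        for kind, fn in self.ops:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return list(data)

# A pipeline is composed from reusable pieces, optimized, then executed.
double_then_inc = DataFlow(range(10)).map(lambda x: 2 * x).map(lambda x: x + 1)
print(double_then_inc.optimized().collect())
```

Because the pipeline is a plain data structure rather than a packaged jar, later code can inspect, extend, or rewrite it, which is the composability and optimization opportunity the abstract refers to.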

45Online Similarity Learning for Big Data with Overfitting
In this paper, we propose a general model to address the overfitting problem in online similarity learning for big data, which is generally caused by two kinds of redundancy: 1) feature redundancy, i.e., there exist redundant (irrelevant) features in the training data; and 2) rank redundancy, i.e., the non-redundant (relevant) features lie in a low-rank space. To overcome these issues, our model is designed to obtain a simple and robust metric matrix by detecting the redundant rows and columns in the metric matrix and constraining the remaining matrix to a low-rank space. To reduce feature redundancy, we employ the group sparsity regularization, i.e., the ℓ2,1 norm, to encourage a sparse feature set. To address rank redundancy, we adopt the low-rank regularization, the max norm, instead of calculating the SVD as in traditional models that use the nuclear norm. Therefore, our model can not only generate a low-rank metric matrix to avoid overfitting, but also achieve feature selection simultaneously. For model optimization, an online algorithm based on the stochastic proximal method is derived to solve this problem efficiently with a complexity of O(d²). To validate the effectiveness and efficiency of our algorithms, we apply our model to online scene categorization and synthesized data and conduct experiments on various benchmark datasets with comparisons to several state-of-the-art methods. Our model is as efficient as the fastest online similarity learning model, OASIS, while performing generally as well as the more accurate model OMLLR. Moreover, our model can exclude irrelevant/redundant feature dimensions simultaneously.
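The ℓ2,1 (group-sparsity) penalty mentioned above has a well-known closed-form proximal operator: row-wise soft thresholding, which shrinks entire rows of the metric matrix toward zero and thereby drops redundant feature dimensions. The sketch below shows only that proximal step inside a generic online gradient loop; the paper's full objective (max-norm term, triplet-based similarity loss) is not reproduced, and the loss gradient here is a placeholder.

```python
# Row-wise soft-thresholding (proximal) step for an l2,1 penalty on a metric
# matrix M, applied after a placeholder stochastic gradient step.
import numpy as np

def prox_l21(M, tau):
    """Proximal operator of tau * sum_i ||M[i, :]||_2: shrink each row's norm."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return M * scale          # rows with norm <= tau are zeroed out entirely

def online_step(M, grad, lr=0.1, lam=0.05):
    """One stochastic proximal update: gradient step, then l2,1 shrinkage."""
    return prox_l21(M - lr * grad, lr * lam)

if __name__ == "__main__":
    d = 8
    M = np.eye(d)
    rng = np.random.default_rng(0)
    for _ in range(100):
        grad = rng.normal(scale=0.1, size=(d, d))   # stand-in for a real loss gradient
        M = online_step(M, grad)
    print("row norms:", np.round(np.linalg.norm(M, axis=1), 3))
```

Each update touches every entry of the d x d matrix once, which is consistent with the O(d²) per-step cost stated in the abstract.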

46A Novel Methodology to Acquire Live Big Data Evidence from the Cloud
In the last decade, Digital Forensics has experienced several issues when dealing with network evidence. Collecting network evidence is difficult due to its volatility: such information may change over time, and may be stored on a server outside the jurisdiction or geographically far from the crime scene. At the same time, the explosion of Cloud Computing, as an implementation of the Software as a Service (SaaS) paradigm, is pushing users toward remote data repositories such as Dropbox, Amazon Cloud Drive, Apple iCloud, Google Drive, and Microsoft OneDrive. In this paper, a novel methodology for the collection of network evidence is proposed. In particular, it focuses on the collection of information from online services, such as web pages, chats, documents, photos, and videos. The methodology is suitable for both expert and non-expert analysts, as it “drives” the user through the whole acquisition process. During the acquisition, the information received from the remote source is automatically collected; this includes not only network packets, but also any information produced by the client when interpreting them (such as video and audio output). A trusted third party, acting as a digital notary, is introduced in order to certify both the acquired evidence (i.e., the information obtained from the remote service) and the acquisition process (i.e., all the activities performed by the analysts to retrieve it). A proof-of-concept prototype, called LINEA, has been implemented to perform an experimental evaluation of the methodology.
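As a loose illustration of the certification idea (not the LINEA protocol itself), the sketch below hashes each captured artifact into a manifest that a notary authenticates. A real digital notary would use proper digital signatures and trusted timestamps; the HMAC and the notary key here are placeholders.

```python
# Illustrative evidence manifest: hash each captured artifact, then have a
# (hypothetical) notary authenticate the whole manifest.
import hashlib, hmac, json, time

NOTARY_KEY = b"hypothetical-notary-secret"   # placeholder, not a real key

def certify(artifacts):
    """artifacts: dict name -> raw bytes captured during the acquisition."""
    manifest = {
        "acquired_at": time.time(),
        "items": {name: hashlib.sha256(data).hexdigest()
                  for name, data in artifacts.items()},
    }
    payload = json.dumps(manifest, sort_keys=True).encode()
    manifest["notary_tag"] = hmac.new(NOTARY_KEY, payload, hashlib.sha256).hexdigest()
    return manifest

evidence = {"page.html": b"<html>...</html>", "session.pcap": b"\x00\x01..."}
print(json.dumps(certify(evidence), indent=2))
```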

47Tales of Two Cities: Using Social Media to Understand Idiosyncratic Lifestyles in Distinctive Metropolitan Areas
Lifestyles are a valuable model for understanding individuals’ physical and mental lives, comparing social groups, and making recommendations for improving people's lives. In this paper, we examine and compare the lifestyle behaviors of people living in cities of different sizes, utilizing freely available social media data as a large-scale, low-cost alternative to traditional survey methods. We use the Greater New York City area as a representative of large cities, and the Greater Rochester area as a representative of smaller cities in the United States. We employ matrix factor analysis as an unsupervised method to extract salient mobility and work-rest patterns for a large population of users within each metropolitan area. We discover interesting human behavior patterns at both a larger scale and a finer granularity than in previous literature, some of which allow us to quantitatively compare the behaviors of individuals living in big cities to those living in small cities. We believe that our social media-based approach to lifestyle analysis represents a powerful tool for social computing in the big data age.
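One common, concrete way to perform the kind of matrix factor analysis described above is non-negative matrix factorization of a users x hours-of-week activity matrix, whose components can be read as recurring work-rest rhythms. The sketch below uses scikit-learn's NMF on synthetic counts purely for illustration; the paper's exact factorization method and data are not reproduced.

```python
# Hedged sketch: factorize a users x hours-of-week activity-count matrix to
# surface recurring activity patterns. Synthetic data only.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(42)
n_users, n_hours = 500, 24 * 7                 # one column per hour of the week
activity = rng.poisson(lam=1.0, size=(n_users, n_hours)).astype(float)

model = NMF(n_components=5, init="nndsvda", max_iter=500, random_state=0)
user_weights = model.fit_transform(activity)   # how strongly each user follows each pattern
patterns = model.components_                   # 5 prototypical hour-of-week activity profiles

# Peak hour of each extracted pattern, e.g. weekday-daytime vs late-night rhythms.
for k, profile in enumerate(patterns):
    peak = int(np.argmax(profile))
    print(f"pattern {k}: peak at day {peak // 24}, hour {peak % 24}")
```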

48Toward Efficient and Flexible Metadata Indexing of Big Data Systems
In the Big Data era, applications are generating orders of magnitude more data in both volume and quantity. While many systems have emerged to address this data explosion, the fact that these data’s descriptors, i.e., metadata, are also “big” is often overlooked. The conventional approach to the big-metadata issue is to disperse metadata across multiple machines. However, it is extremely difficult to preserve both load balance and data locality in this approach. To this end, in this work we propose hierarchical indirection layers for indexing the underlying distributed metadata. In this way, data locality is achieved efficiently by the indirection while load balance is preserved. Three key challenges exist in this approach, however: first, how to achieve high resilience; second, how to ensure flexible granularity; and third, how to restrain performance overhead. To address these challenges, we design Dindex, a distributed indexing service for metadata. Dindex incorporates a hierarchy of coarse-grained aggregation and horizontal key-coalition. Theoretical analysis shows that the overhead of building Dindex is compensated by only two or three queries. Dindex has been implemented using a lightweight distributed key-value store and integrated into a fully-fledged distributed filesystem. Experiments demonstrated that Dindex accelerated metadata queries by up to 60 percent with negligible overhead.
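The following toy Python sketch illustrates the general indirection idea: metadata entries are hash-partitioned across servers for load balance, while a coarse directory-prefix map records which partitions hold entries for each directory, so a scoped query only touches those partitions. The granularity and naming are assumptions, not Dindex's actual design.

```python
# Toy indirection layer over hash-partitioned metadata.
from collections import defaultdict

NUM_PARTITIONS = 8
partitions = [dict() for _ in range(NUM_PARTITIONS)]      # metadata key -> value
indirection = defaultdict(set)                            # dir prefix -> partition ids

def put(path, meta):
    pid = hash(path) % NUM_PARTITIONS                      # load-balanced placement
    partitions[pid][path] = meta
    indirection[path.rsplit("/", 1)[0]].add(pid)           # coarse-grained aggregation

def list_dir(prefix):
    """Query only the partitions known to hold entries under this directory."""
    hits = {}
    for pid in indirection.get(prefix, ()):
        hits.update({p: m for p, m in partitions[pid].items()
                     if p.rsplit("/", 1)[0] == prefix})
    return hits

put("/logs/a.txt", {"size": 10})
put("/logs/b.txt", {"size": 20})
put("/imgs/c.png", {"size": 30})
print(list_dir("/logs"))          # consults at most the partitions holding /logs entries
```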

49Learning to Classify Fine-Grained Categories with Privileged Visual-Semantic Misalignment
Image categorisation, the task of classifying images according to their semantic content, is an active yet challenging research topic in computer vision. Recently, fine-grained object categorisation has attracted wide attention and remains difficult due to feature inconsistency caused by small inter-class and large intra-class variation, as well as widely varying poses. Most existing frameworks focus on exploiting a more discriminative image representation or developing a more robust classification framework to mitigate these problems. Attention has recently been paid to discovering the dependency across fine-grained class labels based on Convolutional Neural Networks. Encouraged by the success of semantic label embedding in discovering the correlations among fine-grained class labels, this paper exploits the misalignment between the visual feature space and the semantic label embedding space and incorporates it as privileged information into a cost-sensitive learning framework. Because it captures both the variation of the image feature representation and the label correlation in the semantic label embedding space, such visual-semantic misalignment can be employed to reflect the importance of instances, which is more informative than conventional cost sensitivities. Experimental results demonstrate the effectiveness of the proposed framework on public fine-grained benchmarks, achieving performance superior to the state of the art.
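To make the notion of misalignment-as-privileged-information concrete, the sketch below measures how far each sample's projected visual feature is from its class's semantic label embedding and uses that distance to weight a cross-entropy loss. The linear projection, cosine distance, and weighting scheme are illustrative assumptions, not the paper's exact formulation.

```python
# Weighting a classification loss by visual-semantic misalignment (illustrative).
import numpy as np

def cosine_misalignment(visual_proj, label_emb):
    """1 - cosine similarity between projected visual features and label embeddings."""
    v = visual_proj / np.linalg.norm(visual_proj, axis=1, keepdims=True)
    l = label_emb / np.linalg.norm(label_emb, axis=1, keepdims=True)
    return 1.0 - np.sum(v * l, axis=1)

def weighted_cross_entropy(logits, labels, weights):
    """Per-sample cost-sensitive cross-entropy with the given instance weights."""
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(len(labels)), labels]
    return float(np.mean(weights * nll))

rng = np.random.default_rng(0)
n, d_vis, d_sem, n_cls = 16, 128, 50, 5
W = rng.normal(size=(d_vis, d_sem))                  # hypothetical visual->semantic projection
visual = rng.normal(size=(n, d_vis))
class_emb = rng.normal(size=(n_cls, d_sem))          # stand-in semantic label embeddings
labels = rng.integers(0, n_cls, size=n)
logits = rng.normal(size=(n, n_cls))

mis = cosine_misalignment(visual @ W, class_emb[labels])
weights = 1.0 + mis                                  # more misaligned -> higher cost (one possible choice)
print("cost-sensitive loss:", weighted_cross_entropy(logits, labels, weights))
```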
