Glossary

What Are Data Tests?

Data tests, or data-driven testing, are software testing methods for assessing a program's output. Data testing can involve challenging a system's capabilities under specific conditions or determining whether it executes commands as the development team intended. ...
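
A minimal sketch of the data-driven idea, assuming a hypothetical function under test called `normalize_email` (not from the source): the test logic is written once and driven by a table of input and expected-output pairs.

```python
def normalize_email(raw):
    """Hypothetical function under test: trim whitespace and lowercase."""
    return raw.strip().lower()


# The test data lives in a table, separate from the test logic.
CASES = [
    ("Alice@Example.com", "alice@example.com"),
    ("  bob@example.com ", "bob@example.com"),
    ("CAROL@EXAMPLE.COM", "carol@example.com"),
]


def run_cases(func, cases):
    """Run func on every input and return the (input, expected, actual) triples that failed."""
    failures = []
    for raw, expected in cases:
        actual = func(raw)
        if actual != expected:
            failures.append((raw, expected, actual))
    return failures


failures = run_cases(normalize_email, CASES)
print(f"{len(CASES) - len(failures)} of {len(CASES)} cases passed")
```

Adding a new scenario means adding a row to `CASES`; the checking code stays unchanged.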

What Are Skew Tests?

Skew tests, or tests of skewness, measure the asymmetry of a probability distribution that would ideally be symmetric. They are often used to check the normality of a particular dataset, and the resulting figure, known as skewness, gives the degree and direction of the deviation from symmetry. The further the skewness is from zero, the more asymmetric the data is and the less it resembles a normal distribution....
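
As a rough illustration, skewness can be computed as the third central moment divided by the variance raised to the power 1.5 (the population Fisher-Pearson coefficient). The function name and sample data below are assumptions made for this sketch.

```python
def skewness(values):
    """Population skewness: third central moment / (second central moment ** 1.5)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric))     # zero: the data is symmetric about its mean
print(skewness(right_tailed))  # positive: the long tail is on the right
```

A negative result would indicate a longer left tail. Note that a skewness near zero shows symmetry only; it does not by itself establish that the data is normally distributed.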

What Is a Data Bug?

A data bug is an error or flaw in a dataset. If left unaddressed, data bugs can significantly affect a model’s predictions or outcomes, in ways that may be either favorable or unfavorable to the developer....

What Is a Data Repository?

A data repository, also known as a data library or data archive, refers to a database infrastructure—composed of several databases—that collects, manages,...

What Is a Dataset?

When a collection of data is organized in a specific manner, such as a table or other schema, a dataset is created. Organizing the data helps you interpret its most critical elements and gain new insight, such as patterns and trends....

What Is Automatic Speech Recognition?

Automatic speech recognition (ASR) is a technology that processes human speech and converts it to text in real time. Unlike voice recognition, its...

What Is Bias-Variance Tradeoff?

The bias-variance tradeoff refers to finding a machine learning model whose complexity is balanced to minimize errors due to both bias and variance. The ideal...

What Is a Data Structure?

Data structures are formats for storing and organizing data for particular purposes. Structuring data makes it easy to access and use. In programming, data structures are used together with algorithms, which perform tasks on the structured data. ...

What Is Data Versioning?

Data versioning is the practice of keeping distinct versions of the same data, identified by when each version was created and how it was changed. A new version is created whenever a dataset’s contents, structure, or condition is modified. Versioning is one way to keep track of the changes that happen when you reprocess, correct, or add new data....
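
One way to picture this is a small in-memory store that assigns each snapshot an identifier derived from its contents. This is a sketch under stated assumptions, not how any particular versioning tool works; the `VersionStore` class and its method names are hypothetical.

```python
import hashlib
import json


class VersionStore:
    """Toy content-addressed store: each distinct dataset state gets its own version id."""

    def __init__(self):
        self._snapshots = {}  # version id -> serialized dataset
        self._history = []    # version ids in commit order

    def commit(self, records):
        """Save a snapshot and return its id (a truncated SHA-256 of the contents)."""
        payload = json.dumps(records, sort_keys=True).encode("utf-8")
        version_id = hashlib.sha256(payload).hexdigest()[:12]
        self._snapshots[version_id] = payload
        # Only record a new history entry when the data actually changed.
        if not self._history or self._history[-1] != version_id:
            self._history.append(version_id)
        return version_id

    def checkout(self, version_id):
        """Return the dataset exactly as it was at the given version."""
        return json.loads(self._snapshots[version_id])

    def history(self):
        return list(self._history)


store = VersionStore()
v1 = store.commit([{"id": 1, "label": "cat"}])
v2 = store.commit([{"id": 1, "label": "dog"}])  # a correction creates a new version
print(store.history())
print(store.checkout(v1))
```

Because the id depends only on the contents, committing identical data twice in a row does not create a new history entry, and any earlier state can be restored by id.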

What Is Data-Centric Development?

Data-centric development is a methodology that focuses on defining which projects or systems should be produced using available data....

What Is DataOps?

DataOps, short for data operations, is a term for an agile, process-oriented way to organize, deliver, and leverage data analytics. It is a group activity that aims to improve the communication, integration, and automation of data flows across an organization....

What Is Deep Learning?

Deep learning is a subset of machine learning. It is powered by artificial neural networks (ANNs), which are algorithms modeled to simulate the human brain’s capability by learning from large amounts of data....

What Is Input Space?

In a supervised learning model, the input space comprises all potential sets of values for the input. It is usually larger than the output space, which contains all possible outputs of the model. When training a neural network, each layer has its own input space, so every layer can be treated as an independent entity....

What Is Lineage?

When referring to data, the term lineage pertains to the data’s life cycle or line of descent. Data lineage describes the route of data from source to destination and its transformations along the way. ...
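
A minimal sketch of tracing a route back to its sources, assuming lineage is recorded as a mapping from each dataset to the datasets it was directly derived from. The dataset names and the `upstream` helper are hypothetical.

```python
def upstream(lineage, dataset):
    """Return every dataset that `dataset` was derived from, directly or indirectly."""
    seen = []
    stack = list(lineage.get(dataset, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            # Keep walking back toward the original sources.
            stack.extend(lineage.get(parent, []))
    return seen


# Each key maps to its direct parents; sources have no parents.
LINEAGE = {
    "raw_events": [],
    "cleaned_events": ["raw_events"],
    "user_table": [],
    "features": ["cleaned_events", "user_table"],
    "model": ["features"],
}

print(sorted(upstream(LINEAGE, "model")))
```

Walking the mapping in the opposite direction would answer the complementary question: which downstream outputs are affected when a source changes.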

What Is MLOps?

Machine learning operations (MLOps) is a set of practices for deploying and maintaining machine learning models in production. By using these practices, the teams that manage the machine learning lifecycle, such as operations professionals, data scientists, and IT, can collaborate and communicate more effectively....

What Is Model Prediction?

A model prediction is the anticipated outcome of a machine learning model based on the analysis of available data. It is the result of predictive models built on algorithms that determine trends, patterns, and insights within past and recent datasets, given the quality of assumptions and data analysis....

What Is Model Training?

In machine learning, model training is the process of providing data to a machine learning algorithm so that it can learn and improve. It also involves determining the optimal parameters for a specific prediction range. Machines learn using a loss function, which measures how well a particular model represents the provided data....
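
To make the role of the loss function concrete, here is a small sketch that fits a line y = w*x + b by gradient descent on mean squared error. The choice of model, loss, learning rate, and data is an assumption for illustration only.

```python
def mse(w, b, xs, ys):
    """Loss function: mean squared error of the line w*x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, lr=0.05, steps=2000):
    """Adjust w and b step by step in the direction that reduces the loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1

w, b = train(xs, ys)
print(f"w={w:.3f}, b={b:.3f}, loss={mse(w, b, xs, ys):.6f}")
```

Training starts from w = 0 and b = 0, where the loss is high, and ends close to the parameters that generated the data, where the loss is near zero.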

What Is Natural Language Processing?

Natural language processing (NLP) is the branch of artificial intelligence that enables computers to understand, process, interpret, and generate human language in both text and spoken form....
