What Is a Data Structure?

Data structures are formats for storing and organizing data to suit a particular purpose. Organizing data this way makes it easy to access and use. In programming, data structures are used together with algorithms to write computer programs: the algorithms perform tasks by operating on the data held in those structures.

 

Types of Data Structures

There are two basic types of data structures: primitive and non-primitive.

Primitive data structures can only store one type of data, and machine-level instructions operate on them directly. Primitive data structures include:

  • Floats: The structure used for data types that hold decimal values.
  • Integers: Structures for data types containing negative or positive whole numbers.
  • Characters: Structures for data types that hold a single character value, either upper or lower case.
  • Boolean: This structure is used for data types with True or False values. 
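
As a rough illustration, the four primitive kinds above map onto Python's built-in types as sketched below. The variable names and values are made up for this example, and Python has no dedicated character type, so a one-character string stands in for it here.

```python
# Python analogues of the four primitive kinds listed above.
# All names and values are illustrative assumptions.
temperature = 21.5   # float: holds a decimal value
balance = -42        # int: a negative or positive whole number
initial = "A"        # no separate char type in Python; a one-character str stands in
is_active = True     # bool: True or False

for value in (temperature, balance, initial, is_active):
    # Print the type name next to the value it holds.
    print(type(value).__name__, repr(value))
```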

Meanwhile, non-primitive data structures are created using primitive data structures. As such, they’re more complex and can often store more than one data type. Some examples of non-primitive data structures are:

  • Arrays: A data structure containing a fixed number of elements, all of the same type. For example, a single array cannot hold both integers and floats.
  • Strings: In languages such as C, a string is essentially a character array that terminates with the null character “\0”.
  • Stacks: A linear data structure in which elements are added and removed at the same end, so the last element added is the first one removed.
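
The three examples above can be sketched in Python using only the standard library. This is an approximation rather than a definitive model: Python's `array` enforces a single element type but can grow (unlike a fixed-size C array), `ctypes.create_string_buffer` is used only to show a C-style null-terminated string, and a plain list stands in for a stack, where the last item pushed is the first one popped. All names and values are illustrative.

```python
from array import array
import ctypes

# Array: array("i", ...) stores only C ints, so appending a float is rejected.
# Note: unlike a fixed-size C array, Python's array can still grow.
scores = array("i", [90, 85, 77])
try:
    scores.append(3.14)
    mixed_type_rejected = False
except TypeError:
    mixed_type_rejected = True

# String: a C-style character buffer ends with the null character "\0".
buf = ctypes.create_string_buffer(b"cat")
raw_bytes = buf.raw  # three characters plus the terminating null byte

# Stack: items are pushed and popped at the same end of the list.
stack = []
stack.append("first")
stack.append("second")
stack.append("third")
popped = stack.pop()  # removes the most recently pushed item

print(mixed_type_rejected, raw_bytes, popped, stack)
```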

 

What Is the Difference between Structured and Unstructured Data?

Structured data follows a predefined format that machines can readily parse. It is typically the kind of data found in a spreadsheet, database, or data warehouse, and it is usually quantitative, which makes it easier to search in databases. Some examples of structured data include:

  • Addresses
  • Social Security numbers
  • Geolocations
  • Telemetry 
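
To show why structured data is easy to search, here is a minimal sketch that uses Python's built-in `sqlite3` module. The `telemetry` table, its columns, and the rows are hypothetical and exist only for this example. Because every record has the same named fields, a single query can filter on one of them.

```python
import sqlite3

# In-memory database; the table name, columns, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE telemetry ("
    "device_id TEXT, latitude REAL, longitude REAL, speed_kmh REAL)"
)
conn.executemany(
    "INSERT INTO telemetry VALUES (?, ?, ?, ?)",
    [
        ("truck-1", 39.74, -104.99, 42.0),
        ("truck-2", 40.71, -74.01, 88.5),
        ("truck-3", 34.05, -118.24, 12.3),
    ],
)

# Every row has a speed_kmh column, so filtering needs no custom parsing.
rows = conn.execute(
    "SELECT device_id FROM telemetry WHERE speed_kmh > ? ORDER BY device_id",
    (50,),
).fetchall()
fast_devices = [row[0] for row in rows]
conn.close()

print(fast_devices)
```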

Structured data is easily ingested and transformed by many data pipeline tools. For example, Pachyderm and Snowflake can be used together to build a machine learning pipeline.

 

Meanwhile, unstructured data is data that has no predefined format and cannot be easily analyzed by machines. Because it is very challenging for computers to interpret, it is more difficult to organize in relational databases and to process for machine learning applications. It can also be very challenging to track with data versioning. Examples of unstructured data include:

  • Audio files
  • Text files
  • Image and video files
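
By contrast, a text file carries no field names, so extracting a value means guessing at its shape. The sketch below is one possible heuristic, not a general solution: the note text is invented, and the regular expression simply assumes that any standalone five-digit number is a ZIP code.

```python
import re

# A free-text note (invented for this example): no column says "zip code".
note = (
    "Customer called from the Denver office on 3 March; "
    "shipment to 80202 arrived damaged."
)

# Heuristic assumption: a standalone run of exactly five digits is a ZIP code.
zip_match = re.search(r"\b\d{5}\b", note)
zip_code = zip_match.group(0) if zip_match else None

print(zip_code)
```

A different note could put the ZIP code elsewhere, omit it, or contain another five-digit number, which is why this kind of data is harder to process reliably.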

For example, Pachyderm client RTL uses data-driven pipelines to automatically scale video file processing.

 

Handle Unstructured Data with Pachyderm

Many database platforms only handle structured data. However, most of us deal with unstructured data at work without the means to manage it efficiently. With Pachyderm, you can work with any file type and scale as needed while optimizing your storage space. Think Pachyderm is a good fit for your data needs? Request a demo today.
