
Glossary
Here we have collected all the key terms you need to deepen your understanding of spatial data science and location intelligence.
A
Agglomerative Clustering
Agglomerative clustering is a type of hierarchical clustering where clusters are built from the bottom up.
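As a minimal sketch of the bottom-up idea, the snippet below starts with every point in its own cluster and repeatedly merges the two closest clusters. The function name `agglomerative`, the one-dimensional input, and the single-linkage distance are illustrative assumptions, not a reference implementation.

```python
from itertools import combinations

def agglomerative(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain (single linkage)."""
    # Bottom of the hierarchy: every point is its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Single linkage: cluster distance is the smallest gap between members.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(
                abs(a - b) for a in clusters[ij[0]] for b in clusters[ij[1]]
            ),
        )
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return [sorted(c) for c in clusters]

print(agglomerative([1.0, 1.2, 5.0, 5.1, 9.0], n_clusters=3))
# → [[1.0, 1.2], [5.0, 5.1], [9.0]]
```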
AI Agent
An AI Agent is an intelligent system, typically powered by a Large Language Model (LLM), that can autonomously perform tasks, make decisions, and interact with users.
Alteryx
Alteryx provides a user-friendly interface to perform tasks such as data cleansing, transformation, and predictive analytics.
Analytics Toolbox
CARTO's Analytics Toolbox is a set of user-defined functions and stored procedures to unlock spatial analytics on top of your cloud data warehouse platform.
Apache Iceberg
Apache Iceberg is an open-source table format that brings the reliability and scalability of data warehouses to data lakes. It's a key standard for a lakehouse architecture.
Artificial Intelligence (AI)
In spatial analytics, AI can be used to uncover trends in location data, automate decision-making, and optimize everything from delivery routes to urban planning.
B
Bayes' Theorem
Bayes' theorem is a mathematical formula for the conditional probability of an event given related evidence. It is named after the English mathematician Thomas Bayes.
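The formula can be written as P(H | E) = P(E | H) * P(H) / P(E). The short sketch below applies it to made-up numbers; the function name `posterior` and the sensor scenario are hypothetical and only meant to show the arithmetic.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(H | E) = P(E | H) * P(H) / P(E)."""
    # P(E): total probability of the evidence, with and without H.
    p_evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / p_evidence

# Hypothetical numbers: 1% of sites are contaminated; a sensor flags 90% of
# contaminated sites and 5% of clean ones.
print(round(posterior(0.01, 0.90, 0.05), 3))  # → 0.154
```

Even with a fairly accurate sensor, a flagged site is contaminated only about 15% of the time here, because contaminated sites are rare to begin with.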
Bring Your Own LLM (BYOLLM)
Bring Your Own LLM (BYOLLM) allows organizations to connect and use their own Large Language Model (LLM), instead of being limited to default or proprietary models.
C
Cloud Data Warehouse
A cloud data warehouse is a specialized database system hosted and managed as a service in a cloud computing environment.
Cloud Native
Cloud-native is a modern approach to building and deploying applications that fully leverage cloud computing capabilities for scalability, resilience, and agility.
Cross-Validation
Cross-validation is a model validation technique for assessing how the results of a statistical model will generalize to new data.
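One common variant is k-fold cross-validation, where the data is split into k parts and each part is held out once for testing. The helper below is a sketch of that split only; the name `k_fold_indices` is an assumption, and real workflows usually shuffle the data and rely on a library routine.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

for train, test in k_fold_indices(6, 3):
    print(train, test)
# → [2, 3, 4, 5] [0, 1]
# → [0, 1, 4, 5] [2, 3]
# → [0, 1, 2, 3] [4, 5]
```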
D
Data Lake
A data lake is a collection of data or files, generally stored in cloud file storage or blob infrastructure. The data may be raw or already processed and cleaned.
Data Warehouse
A data warehouse is an OLAP (online analytical processing) system that can distribute query workloads across compute infrastructure and where data is organized in a structured manner.
Databricks
The Databricks Data Intelligence Platform simplifies the process of building and deploying big data applications.
DBSCAN
DBSCAN is a clustering method that groups data that are close to each other based on a distance metric and a minimum number of data points.
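The two ingredients in the definition, a distance threshold and a minimum number of points, map onto the parameters `eps` and `min_samples` in the sketch below. It is a small, unoptimized illustration with an assumed function name `dbscan`; production use would normally go through a library or a warehouse-native function.

```python
import math

def dbscan(points, eps, min_samples):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:  # j is a core point: keep expanding
                queue.extend(nbrs)
    return labels

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=1.5, min_samples=3))  # → [0, 0, 0, 1, 1, 1, -1]
```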
Deck.gl
deck.gl is a popular frontend visualization library originally developed as a project within Uber. It is now open source and maintained by a large community.
E
Esri
Esri develops GIS software, provides geospatial solutions, and offers consulting services for various industries such as government, utilities, and others.
EU AI Act
The EU AI Act classifies AI systems based on their risk level. For example, "high-risk" systems require data governance, transparency, and auditability.
G
Generative AI
Generative AI uses deep learning models to create new content — text, images, code, and data — by learning patterns from training data. In spatial analytics, it enables synthetic data generation, scenario simulation, and natural language interfaces.
Geocoding
Geocoding is the process of converting addresses or place names into geographic coordinates (latitude and longitude) for use in mapping and spatial analysis.
Geographic Coordinates
Geographic coordinates use latitude and longitude to specify a location on the Earth’s curved surface. Geographic coordinates provide a global reference system.
Geographically Weighted Regression (GWR)
Geographically Weighted Regression (GWR) quantifies how the strength of the relationship between a target variable and its explanatory variables changes across space.
GeoJSON
GeoJSON is a type of JSON format designed for encoding geographic data structures. It supports various geometry types such as points, lines, and polygons.
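As a small illustration, the snippet below builds a GeoJSON FeatureCollection containing one point and serializes it with Python's standard `json` module. The place name and coordinates are illustrative values; note that GeoJSON positions are ordered longitude first, then latitude.

```python
import json

feature = {
    "type": "Feature",
    # GeoJSON positions are [longitude, latitude].
    "geometry": {"type": "Point", "coordinates": [-3.7038, 40.4168]},
    "properties": {"name": "Madrid"},  # illustrative attribute
}
collection = {"type": "FeatureCollection", "features": [feature]}

text = json.dumps(collection, indent=2)
print(text)
```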
GeoParquet
GeoParquet is a project aimed at extending the Parquet file format to directly support geometry data. A benefit of GeoParquet is its cloud data interoperability.
Geospatial Foundation Models
Geospatial foundation models are AI models trained on vast datasets like satellite imagery to understand the physical and human world, adapting to diverse tasks.
Geospatial Reasoning
Geospatial Reasoning is the ability of an AI system to understand a complex spatial problem, independently devise a multi-step analytical plan, execute it, and interpret the results to provide an actionable solution.
Google Cloud's BigQuery
Google Cloud's BigQuery is the serverless, cost-effective, and multi-cloud data warehouse offered by Google Cloud Platform (GCP).
GPKG
GeoPackage (GPKG) is an open standard vector file format developed by the Open Geospatial Consortium. It is designed to be platform-independent and self-contained.
H
HEAVY.AI
HEAVY.AI is a San Francisco-based company that specializes in providing advanced analytics solutions for businesses and government organizations.
Hotspot Analysis
Hotspot analysis is a spatial analysis technique used to identify and evaluate statistically significant spatial clusters of high or low values in a dataset.
HTML
HTML is the markup language of the web. In spatial applications, it is used to embed interactive maps, geospatial visualizations, and geospatial data in web pages.
K
Kinetica
Kinetica is a real-time GPU-accelerated analytic database for big data. Kinetica leverages GPUs and modern many-core CPUs to improve performance.
KML
KML stands for Keyhole Markup Language. It is an XML-based file format used to display geographic data and was originally developed for Google Earth.
Kriging
Kriging is a spatial interpolation method for predicting missing values, taking into account the distance and spatial correlation of known data points.
L
Lakehouse
A lakehouse is a modern data architecture that combines the flexibility and low-cost storage of a data lake with the transactional capabilities and performance of a data warehouse.
Large Language Model (LLM)
A Large Language Model (LLM) is a type of generative AI model trained on massive datasets to understand and produce human-like language.
Location Intelligence
Location Intelligence is the practice of deriving actionable business insights from spatial data to answer questions about where, why, and what happens at specific places.
M
Machine Learning (ML)
Machine Learning (ML) is a subset of AI where algorithms learn from data to make predictions or decisions, improving over time without being explicitly programmed.
Map Software
Map software refers to computer programs or applications designed to create, display, and manipulate spatial information in the form of maps.
Mapbox
Mapbox enables users to create and integrate custom maps into their applications and websites. It provides a range of tools and APIs for map design and geocoding.
Markov Chain Monte Carlo (MCMC)
Markov Chain Monte Carlo (MCMC) is a class of simulation methods that approximate a posterior distribution by drawing a chain of random samples, where each sample depends only on the previous one.
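One of the simplest MCMC methods is the random-walk Metropolis algorithm. The sketch below uses it to sample from a standard normal distribution known only up to a constant; the function name `metropolis`, the step size, and the seed are assumptions chosen for illustration.

```python
import math
import random

def metropolis(log_density, n_samples, start=0.0, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target."""
    rng = random.Random(seed)
    x, samples = start, []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, p(proposal) / p(x)).
        log_ratio = log_density(proposal) - log_density(x)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)  # a rejected proposal repeats the current value
    return samples

# Target: standard normal, specified only through its unnormalized log-density.
samples = metropolis(lambda x: -0.5 * x * x, n_samples=20000)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, variance)  # should land near 0 and 1 respectively
```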
Model Context Protocol (MCP)
The Model Context Protocol (MCP) is an open standard that enables AI applications to securely connect with external tools, data sources, and systems through a unified interface.
Model-Agnostic
Model-Agnostic refers to CARTO's philosophy of not depending on a single AI provider. It gives the user the freedom to choose the one that best fits their needs and adheres to their governance requirements.
P
Projected Coordinates
Projected Coordinates use a two-dimensional Cartesian coordinate system to represent locations on a flat surface, such as a map or a plane.
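As one example of a projection, the sketch below converts longitude and latitude in degrees to Web Mercator coordinates in meters using the spherical Mercator formulas. The function name `to_web_mercator` is an assumption, and many other projections exist with different trade-offs.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used by Web Mercator

def to_web_mercator(lon_deg, lat_deg):
    """Project geographic coordinates (degrees) to Web Mercator x, y (meters)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y

print(to_web_mercator(-3.7038, 40.4168))  # illustrative input near Madrid
```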
Python
Python is a general-purpose programming language widely used to manipulate, visualize, and analyze spatial data, with libraries such as GeoPandas, Shapely, and PySAL.
R
R (Programming Language)
R is a programming language specifically designed for statistical computing and data analysis. R offers spatially focused packages like sf, sp, and raster.
Raster Data
Raster data is represented as a grid of cells or pixels, with each cell containing a value. It represents elevation, temperature, satellite imagery, and more.
Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) is a technique that enhances LLM responses by retrieving relevant information from external sources and supplying it to the model as context. This allows responses to be more accurate, reliable, and based on up-to-date data.
S
Snowflake
Snowflake is a single, global platform that powers the Data Cloud. Its architecture separates computing and storage, allowing independent cost optimization.
Spatial Data
Spatial data is any data with a geographic component that connects information to a specific location on Earth, such as coordinates, addresses, or boundaries.
Spatial Data Catalog
The Spatial Data Catalog is a great resource for Data Scientists, Analysts, and Developers who want to enrich their internal data and enhance spatial analysis.
Spatial Data Governance
Spatial data governance is a framework that provides a clear, structured approach to managing your spatial data, making it trustworthy, secure, and compliant.
Spatial Data Science
Spatial data science is a subset of data science that applies statistical and machine learning methods to geographic data to understand why things happen where they do.
Spatial Indexes
Spatial indexes are hierarchical grid systems that divide the Earth's surface into discrete cells, enabling fast spatial queries and scalable geospatial analysis.
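To illustrate the hierarchical idea, the sketch below computes a quadkey, one well-known quadtree-style cell identifier, from web map tile coordinates. Quadkeys are used here only as an example of such a scheme, and the function name `quadkey` is an assumption.

```python
def quadkey(tile_x, tile_y, zoom):
    """Encode a web map tile (x, y, zoom) as a base-4 quadkey string."""
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        # Each digit picks one of the four child cells at this level.
        digit = (1 if tile_x & mask else 0) + (2 if tile_y & mask else 0)
        digits.append(str(digit))
    return "".join(digits)

print(quadkey(3, 5, 3))  # → 213
print(quadkey(1, 2, 2))  # → 21 (the parent cell: a prefix of its child's key)
```

Because a parent's key is a prefix of its children's keys, grouping cells by prefix aggregates them to a coarser resolution without any geometry operations.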
Spatial SQL
Spatial SQL extends standard SQL with functions for querying and analyzing geographic data types like geometries and geographies in databases and data warehouses.
SQL
Structured Query Language (SQL) is used for managing and querying data in relational database management systems (RDBMS), as well as for running analysis.
