Spain's Tax Agency Leverages Cloudera for Data Innovation
The Agencia Tributaria (AEAT) is the Spanish public entity responsible for the correct application of the country's tax system for more than 48 million citizens. The agency manages and collects state taxes, including customs duties, prevents and detects fraud, and enforces sanctions for non-compliance with tax laws. It collaborates with Spain's regions, operates within European Union requirements, and provides a wide range of assistance services to taxpayers. Like tax agencies around the world, it faces growing data volumes, changing regulations and new opportunities for AI.
Managing Massive Data Growth and Advanced Analytics Needs
The agency had to contend with a growing volume of data from various sources, a challenge amplified by Spain's population of more than 48 million. Meeting it required not only the capacity to manage large data volumes but also advanced analytics and AI capabilities. The agency needed to run agile queries and develop further data engineering processes and machine learning algorithms in an environment with billions of records.
It needed a solution that integrated easily with existing systems while also providing scalability, high availability, and guaranteed information governance and security. It also needed a platform that would simplify data administration tasks, add new analytics capabilities, and optimize and streamline tax and customs administration.
Implementing Data Lakehouse for Advanced Analytics and Compliance
The organization chose Cloudera as its partner to build its Big Data platform. Recognizing that a data lakehouse architecture would let it adapt to its data management needs and continuous growth, the Agencia Tributaria deployed the services on-premises, on its own infrastructure of high-availability servers.
On this infrastructure, the Tax Agency works on several levels. First, Cloudera allows it to create isolated, controlled and performance-optimized data spaces through data partitioning and replication. In addition, it can index information from millions of documents whose value it previously could not fully exploit, because all of this content is now searchable by term.
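To illustrate what searching indexed documents by term means in practice, here is a minimal sketch of an inverted index in plain Python. It is not the Agency's implementation and does not reflect any Cloudera API; the document ids, sample texts and function names are hypothetical and exist only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, invented for this sketch.
documents = {
    "doc-1": "Customs declaration for imported machinery",
    "doc-2": "Quarterly VAT return for a retail company",
    "doc-3": "Customs duties refund request",
}
index = build_index(documents)
print(sorted(search(index, "customs")))
print(sorted(search(index, "customs refund")))
```

The index is built once, and each query then only intersects small sets of document ids instead of rescanning every document. That is the general reason term search stays fast as the collection grows.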
It also relies on Cloudera to run complex algorithms that require the processing capabilities of distributed systems. With the Hive and Impala SQL engines and parallel processing in Spark, it operates on data tables with billions of records, performing large-scale, complex cross-matching of datasets, pattern searches and similar operations.
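The general idea behind cross-matching very large tables on a distributed system can be sketched as a partitioned hash join: both tables are split by the hash of the join key, so matching rows always land in the same partition and each partition can be joined independently. The sketch below is plain Python under stated assumptions, not Spark, Hive or Impala code. The table contents, the column names (`taxpayer_id`, `declared`, `invoiced`) and the function names are hypothetical. Partitions are processed in a loop here, whereas a real engine would process them in parallel on separate nodes.

```python
from collections import defaultdict


def partition(rows, key, num_partitions):
    """Assign each row to a partition based on the hash of its join key."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[key]) % num_partitions].append(row)
    return parts


def hash_join(left, right, key):
    """Inner-join two lists of dict rows on `key` using a hash lookup."""
    lookup = defaultdict(list)
    for row in right:
        lookup[row[key]].append(row)
    return [{**l, **r} for l in left for r in lookup.get(l[key], [])]


def partitioned_join(left, right, key, num_partitions=4):
    """Join partition by partition; each pair is independent of the others."""
    left_parts = partition(left, key, num_partitions)
    right_parts = partition(right, key, num_partitions)
    joined = []
    for left_part, right_part in zip(left_parts, right_parts):
        joined.extend(hash_join(left_part, right_part, key))
    return joined


# Hypothetical sample tables, invented for this sketch.
declarations = [
    {"taxpayer_id": "A1", "declared": 1000},
    {"taxpayer_id": "B2", "declared": 500},
    {"taxpayer_id": "C3", "declared": 750},
]
invoices = [
    {"taxpayer_id": "A1", "invoiced": 1200},
    {"taxpayer_id": "C3", "invoiced": 750},
    {"taxpayer_id": "D4", "invoiced": 90},
]

matches = partitioned_join(declarations, invoices, "taxpayer_id")
# Row order depends on the partitioning, so sort before displaying.
for row in sorted(matches, key=lambda r: r["taxpayer_id"]):
    print(row)
```

Because rows with the same key always hash to the same partition, no partition needs data from any other. That independence is what lets the work be spread across many nodes.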
Finally, governance and regulatory compliance are essential for the organization, which must comply with Spain's National Security Scheme (Esquema Nacional de Seguridad, ENS). With Cloudera Shared Data Experience (SDX), it grants access to data only to the appropriate users, which supports governance of the teams in each business area.
Enhanced Data Management and Preparing for Future Growth
Today, Cloudera is deployed across four clusters that serve the organization's different environments. In total, more than forty dedicated processing nodes are in operation, and the platform hosts several hundred terabytes of information.
The Tax Agency relies on Cloudera for data management and as a foundation for advanced data analysis in a constantly growing environment. The platform helps the organization keep improving its analytical capabilities and ensure regulatory compliance.
The organization is now able to index information from millions of documents, run complex algorithms, and create isolated and controlled data spaces.
In the future, the Agency anticipates a further increase in data, driven by new sources and ongoing growth. As a result, its data and analytics needs will continue to expand, and it plans to leverage tools such as Cloudera's to fulfill the state-level functions it performs.