  • Announcements regarding our community

    7 Topics
    7 Posts
    ruimsramos

    Google I/O kicks off tomorrow!
    Tune in live May 20-21

    Livestream Playlist:

    https://www.youtube.com/playlist?list=PLOU2XLYxmsIJEQRQDtuYKVUmxYvRfWN5m
  • A place to talk about whatever you want

    5 Topics
    6 Posts
    ruimsramos

    Iceberg and the Rise of the Lakehouse

    Apache Iceberg is a powerful and flexible open-source table format that works with cloud object storage such as Amazon S3 and Google Cloud Storage. Iceberg doesn't define how your data files are physically stored (that is the job of file formats like Parquet or ORC); instead, it defines how that data is organized logically, like a blueprint for structuring and accessing data efficiently. Iceberg is exciting because it brings high-performance data warehousing features to object storage like S3.
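    To make the "blueprint" idea a little more concrete, here is a toy Python model of a metadata layer in the spirit of Iceberg. It is a simplified sketch, not the Iceberg specification or the PyIceberg API, and the names `DataFile`, `Snapshot`, `ToyTable` and the `s3://bucket/...` paths are hypothetical. It shows three ideas: data files are immutable, every commit publishes a new snapshot that lists the files visible in that version, and per-file column bounds kept in metadata let a reader skip files without opening them.

```python
# Toy model of an Iceberg-style metadata layer. All names are hypothetical;
# this is NOT the Iceberg spec or the PyIceberg API.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataFile:
    path: str       # immutable data file in object storage (e.g. Parquet on S3)
    min_value: int  # lower bound of one column, kept in metadata
    max_value: int  # upper bound of the same column


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    files: tuple[DataFile, ...]  # every data file visible in this table version


@dataclass
class ToyTable:
    snapshots: list[Snapshot] = field(default_factory=list)

    def current(self) -> Snapshot | None:
        return self.snapshots[-1] if self.snapshots else None

    def append(self, new_files: list[DataFile]) -> Snapshot:
        # A commit never rewrites existing files: it publishes a new snapshot
        # listing the previous files plus the new ones.
        previous = self.current()
        files = (previous.files if previous else ()) + tuple(new_files)
        snap = Snapshot(snapshot_id=len(self.snapshots) + 1, files=files)
        self.snapshots.append(snap)
        return snap

    def plan_scan(self, lo: int, hi: int, snapshot_id: int | None = None) -> list[str]:
        # Use the latest snapshot, or an older one ("time travel"), and keep only
        # files whose [min, max] range overlaps [lo, hi]. No data file is opened.
        if snapshot_id is None:
            snap = self.current()
        else:
            snap = next(s for s in self.snapshots if s.snapshot_id == snapshot_id)
        if snap is None:
            return []
        return [f.path for f in snap.files if f.max_value >= lo and f.min_value <= hi]


table = ToyTable()
table.append([DataFile("s3://bucket/events/a.parquet", 1, 100)])
table.append([DataFile("s3://bucket/events/b.parquet", 101, 200)])
print(table.plan_scan(150, 160))                 # ['s3://bucket/events/b.parquet']
print(table.plan_scan(150, 160, snapshot_id=1))  # []
```

    Real Iceberg tables add more layers (manifest lists, manifests, schema and partition specs), but the principle is the same: query planning reads metadata instead of listing directories in object storage.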

    Article: https://hightouch.com/blog/iceberg-rise-of-the-lakehouse

    Is anyone using Apache Iceberg at the moment who would like to share their experience?

  • Share your projects and useful findings here.

    1 Topic
    1 Post
    ruimsramos

    If you use SQLite, you may know that it stores and updates data at the row level. I've just come across this project, which aims to bring column-oriented storage to SQLite, and would like to share it with you.

    This is the project description:

    Stanchion

    Column-oriented tables in SQLite

    Why?

    Stanchion is a SQLite 3 extension that brings the power of column-oriented storage to SQLite, the most widely deployed database. SQLite exclusively supports row-oriented tables, which means it is not an ideal fit for all workloads. Using the Stanchion plugin brings all of the benefits of column-oriented storage and data warehousing to anywhere that SQLite is already deployed, including your existing tech stack.

    There are a number of situations where column-oriented storage outperforms row-oriented storage:

      • Storing and processing metric, log, and event data
      • Timeseries data storage and analysis
      • Analytical queries over many rows and a few columns (e.g. calculating the average temperature over months of hourly weather data)
      • Change tracking, history/temporal tables
      • Anchor modeling / Datomic-like data models
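    The "many rows, few columns" case from the list above can be illustrated in plain Python. This is a toy in-memory model of the two layouts, not how SQLite or Stanchion actually lay out pages on disk, and the field names and readings are made up. In the row layout every record, carrying all of its fields, is visited to average one value; in the column layout the average reads a single list and the other columns are never touched.

```python
# Toy in-memory illustration; field names and readings are made up.
from statistics import fmean

# Row-oriented: each record keeps all of its fields together.
rows = [
    {"ts": "2024-01-01T00:00", "station": "A", "temperature": 10.0, "humidity": 80},
    {"ts": "2024-01-01T01:00", "station": "A", "temperature": 12.0, "humidity": 78},
    {"ts": "2024-01-01T02:00", "station": "A", "temperature": 11.0, "humidity": 75},
]


def to_columns(records):
    # Column-oriented: one list per column, so values of the same column sit together.
    return {name: [r[name] for r in records] for name in records[0]}


columns = to_columns(rows)

# Row layout: the scan visits every full record just to pick out one field.
avg_from_rows = fmean(r["temperature"] for r in rows)

# Column layout: only "temperature" is read; ts, station and humidity are skipped.
avg_from_columns = fmean(columns["temperature"])

print(avg_from_rows, avg_from_columns)  # 11.0 11.0
```

    On disk the difference matters far more than in memory: a column store can read only the pages holding `temperature`, while a row store has to read pages containing every field of every row.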

    Stanchion is an ideal fit for analytical queries and wide tables because it only scans data from the columns that are referenced by a given query. It uses compression techniques like run-length and bit-packed encodings that significantly reduce the size of stored data, greatly reducing the cost of large data sets. This makes it an ideal solution for storing large, expanding datasets.
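    The two encodings named in the description are easy to sketch. The following is a generic illustration of run-length and bit-packed encoding on a non-empty-or-empty list of non-negative integers; it does not reproduce Stanchion's actual storage format, which may differ, and the helper names (`run_length_encode`, `bit_pack`, etc.) and sample data are hypothetical.

```python
# Generic encoding sketch; helper names are hypothetical, not Stanchion's format.


def run_length_encode(values):
    # Collapse consecutive repeats into (value, run_length) pairs.
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def run_length_decode(runs):
    return [v for v, n in runs for _ in range(n)]


def bit_pack(values):
    # Store each non-negative integer in just enough bits for the largest value.
    if not values:
        return 1, b""
    width = max(max(values).bit_length(), 1)
    packed = 0
    for i, v in enumerate(values):
        packed |= v << (i * width)
    n_bytes = (width * len(values) + 7) // 8
    return width, packed.to_bytes(n_bytes, "little")


def bit_unpack(width, data, count):
    packed = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


station_ids = [3, 3, 3, 3, 7, 7, 1, 1, 1]
runs = run_length_encode(station_ids)
print(runs)  # [(3, 4), (7, 2), (1, 3)]

width, data = bit_pack(station_ids)
print(width, len(data))  # 3 4  (3 bits per value: 27 bits fit in 4 bytes)
```

    Both encodings work best on exactly the data a column store produces: long runs of repeated values (sorted or low-cardinality columns) and narrow integer ranges, which is why they pair naturally with column-oriented storage.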

    You can find more information in the official GitHub repository:

    https://github.com/dgllghr/stanchion
  • A place to share job openings or availability for new challenges

    6 Topics
    6 Posts
    ruimsramos

    Hi everyone,

    Vestas is looking for a Lead Data Engineer for the Porto Office.

    https://www.linkedin.com/jobs/view/3998624331

  • Learning Assets, Courses or reference Articles

    2 Topics
    2 Posts
    ruimsramos

    Catch up on Google I/O keynotes, technical sessions, and on-demand learning sessions.

    https://io.google/2023/program/?q=workshop,codelab,learning-pathway,demo
  • Member-contributed articles

    3 Topics
    3 Posts
    ruimsramos

    If you haven't started looking into DuckDB yet, you should check out the launch of the MotherDuck service.

    After six years of development, DuckDB recently launched a very stable version. Complementing this, MotherDuck introduced a PaaS solution that offers a serverless execution model, storage, and a service layer for collaborative work. This is an excellent option for those who want to avoid the complexity of managing the underlying services while still empowering their users.

    The following article provides more details:

    https://rramos.github.io/2024/06/12/motherduck
  • Got a question? Ask away!

    0 Topics
    0 Posts
    No new posts.