dice.camp is one of the many independent Mastodon servers you can use to participate in the fediverse.
A Mastodon server for RPG folks to hang out and talk. Not owned by a billionaire.

Server stats: 1.5K active users

#dataengineering

1 post · 1 participant · 0 posts today

Hi #GetFediHired, I'm looking for a #remote role in the US (or #sweden if you provide visa assistance!).

I've worked mostly in the #SoftwareEngineering space, but I do lean closer to the #DataEngineering side of things (past 3 years). Before that, I did varying levels of SWE work inside a #BusinessIntelligence role (~5 years).

Looking for something that demands strong #Python skills (~5+ years of heavy, daily use), though wouldn't mind having to learn something new. Quite comfortable in a few #SQL flavors. I can actually read most #regex, if that's a thing worth bragging about. Love writing #xpath in personal webscraping projects. Somewhat familiar with #SpringBoot and #Kotlin (1 year, occasional) and would like to eventually use more Java in work, but not a hard requirement.

I love refactoring/improving old code, and I have lots of experience with CI/CD, coding best practices, testing, web scraping, backend (Flask) & frontend (React, TypeScript).

Send me a message if this sounds like I'd be a great fit on your team!

Why normalize databases?
Yesterday, my tutoring student asked me why databases need to be normalized at all. She said: “Wouldn’t it be easier to just have one big table with all the information?”

It’s a common first question when learning about relational databases.
At first, one big table (e.g. customer name, order date, product name, price) seems easiest.

I told her:
:blobcoffee: Because that quickly leads to data redundancy, anomalies, and integrity issues when inserting, updating, or deleting records.
:blobcoffee: Normalization means structuring data into separate, related tables, so that each fact is stored only once. This reduces redundancy & preserves consistency.
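A minimal sketch of the point above, using Python's built-in `sqlite3` module. The table names, column names, and sample rows are invented for illustration and are not from the original post. Each customer and product is stored once, and orders only reference them by key, so a name correction is a single `UPDATE` instead of a change to every order row.

```python
import sqlite3

# In-memory database; the schema and rows below are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        price      REAL NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        product_id  INTEGER NOT NULL REFERENCES products(product_id),
        order_date  TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO products VALUES (1, 'Dice set', 12.5)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, 1, 1, "2025-06-01"), (2, 1, 1, "2025-06-15")],
)

# The customer's name is stored exactly once, so one UPDATE corrects it
# for every order. In a single big table, each order row would need changing.
conn.execute("UPDATE customers SET name = 'Ada L.' WHERE customer_id = 1")

# A JOIN rebuilds the "one big table" view whenever it is needed.
rows = conn.execute("""
    SELECT c.name, o.order_date, p.name, p.price
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    JOIN products  AS p ON p.product_id  = o.product_id
    ORDER BY o.order_id
""").fetchall()

for row in rows:
    print(row)
```

The flat view is still available through the join, so normalization does not cost the convenience of the big table, only the duplication.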

My weekly newsletter is out! 🚀

This week's agenda:
🔹 Open Source of the Week - The dagster project
🔹 New learning resources - Forecasting with linear regression, multi-model LLM, multiprocessing with Python
🔹 Book of the week - Visualization for Social Data Science by Roger Beecham

📌 Join 29k subscribers and subscribe to get weekly updates 🗞️👇🏼
ramikrispin.substack.com/p/the

Rami's Data Newsletter · The Dagster Project, Visualization for Social Data Science, Forecasting with Linear Regression · By Rami Krispin

🧙‍♂️ One does not simply build reports on OLTP data…

This week on The Drill Down with Ahmad & James, our special guest
Kristyna Ferris will be presenting a session titled "The Fellowship of the Star Schema: Transforming OLTP Data for Power BI"

🛠️ This session is packed with:
- Clear distinctions between OLTP & OLAP
- Tips for building Power BI-ready models
- A sprinkle of Slowly Changing Dimension magic

💡Whether you’re a data wizard 🧙, business hobbit 🧝‍♀️, or SQL ranger 🏹 — this is your quest.

🗓️ Join us LIVE on LinkedIn | Wednesday, July 2nd @ 2PM Central
lnkd.in/eWh4SsBb

pro tip for user interface designers:

if you have hundreds of millions of dollars of venture capital and you want to make a user facing data analytics tool of some kind and you think it's reasonable to ask an average human being to type this:

CAST('2023-05-01' AS TIMESTAMP)

to do literally anything with a date or time in your application's user interface, just stop right there. do not pass go, do not collect $200, and do not ever attempt to offer feedback to a UX designer ever again. something is deeply broken inside you that means there are certain mysteries of the universe that even the guys who designed the postgres command line can access that you will never know, and that's ok. You can still live a really rad life.

New Open-Source Tool Spotlight 🚨🚨🚨

Transform any URL into an LLM-ready input with `Reader`. Just prefix the URL with `r.jina.ai/` for clean, readable content extraction. Perfect for enhancing agents & RAG pipelines. #LLM #NLP

Need web search results for your LLM? Prepend queries with `s.jina.ai/` to fetch top results—content included. E.g., `s.jina.ai/your+query` brings knowledge directly to your model. #AItools #DataEngineering

Reader API now supports images! Captions are auto-generated for images missing alt tags, giving LLMs better context for reasoning and summarizing multimedia pages. #MachineLearning #AI

🔗 Project link on #GitHub 👉 github.com/jina-ai/reader
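The prefixing described in the post can be sketched as two small URL builders. This is only an illustration: the function names are hypothetical, the `https://` scheme is an assumption (the post gives only the bare hosts `r.jina.ai/` and `s.jina.ai/`), and nothing here sends a request. Check the project README for actual usage terms and any authentication requirements.

```python
from urllib.parse import quote_plus

# Assumption: both endpoints are served over HTTPS; the post shows bare hosts.
READER_PREFIX = "https://r.jina.ai/"
SEARCH_PREFIX = "https://s.jina.ai/"


def reader_url(target_url: str) -> str:
    """Prefix a page URL with the Reader host (hypothetical helper)."""
    return READER_PREFIX + target_url


def search_url(query: str) -> str:
    """Build a search URL; spaces become '+', as in `s.jina.ai/your+query`."""
    return SEARCH_PREFIX + quote_plus(query)


print(reader_url("https://example.com/post"))
print(search_url("your query"))
```

The resulting strings can be passed to whichever HTTP client a pipeline already uses.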

#Infosec #Cybersecurity #Software #Technology #News #CTF #Cybersecuritycareer #hacking #redteam #blueteam #purpleteam #tips #opensource #cloudsecurity

✨
🔐 P.S. Found this helpful? Tap Follow for more cybersecurity tips and insights! I share weekly content for professionals and people who want to get into cyber. Happy hacking 💻🏴‍☠️

🔔 Slides on Legal Data Engineering 🔔

What is Legal Data Engineering? What does the practice of legal data look like in Germany? What legal problems arise in connection with Legal Data Engineering? This presentation offers an introduction to Legal Data Engineering and looks for answers to these questions.

Slides: zenodo.org/records/15575231/fi

Legal Data Engineering is the focus of every Legal Data Science project. The core of data engineering is the ETL process: extraction, transformation, and (up)loading of data. The slides give a generally accessible overview of this.
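A minimal ETL sketch in Python, using only the standard library. Everything in it is an assumption made for illustration: the sample CSV rows, the column names, the `decisions` table, and the cleaning rule are invented and are not taken from the slides.

```python
import csv
import io
import sqlite3

# Invented sample input: one usable row and one row with a missing file number.
RAW_CSV = (
    "court,date,file_number\n"
    " BVerfG ,2024-03-01,1 BvR 1/24\n"
    "BGH,2024-03-05,\n"
)


def extract(text: str) -> list[dict]:
    """Extract: parse raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[dict]:
    """Transform: trim whitespace and drop rows without a file number."""
    cleaned = [{key: value.strip() for key, value in row.items()} for row in rows]
    return [row for row in cleaned if row["file_number"]]


def load(rows: list[dict], conn: sqlite3.Connection) -> int:
    """Load: write the cleaned rows into a SQLite table and return the count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS decisions (court TEXT, date TEXT, file_number TEXT)"
    )
    conn.executemany(
        "INSERT INTO decisions VALUES (:court, :date, :file_number)", rows
    )
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load(transform(extract(RAW_CSV)), conn)
print(f"Loaded {loaded} row(s)")
```

Keeping the three stages as separate functions makes each one testable on its own, which matters most for the transform step, where most data-quality rules tend to live.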

Other practical topics are the availability of legal data in Germany (in particular structured data and APIs), problems with tokenization in large language models, and the misrecognition of gene names in Microsoft Excel.

On the legal questions of Legal Data Engineering, I cover the traditional legal situation, the new Data Use Act (Datennutzungsgesetz, DNG), and Bavaria as a negative example of a closed legal data culture. A discussion of the data protection lawsuit against OpenJur and the open data lawsuit brought by the Gesellschaft für Freiheitsrechte (GFF) against the Federal Police (Bundespolizei) sheds light on current developments in this area of law.

🔔 Talk on May 28 🔔

Tomorrow, May 28, at 7 pm I will be speaking online at the Legal Tech Lab Cologne about "Legal Data Engineering". Everyone is welcome!

We will talk about the fundamentals of Legal Data Engineering (as a subfield of Legal Data Science), Legal Data Engineering in practice, and the legal framework for legal data in Germany.

There will also be an opportunity to exchange ideas and network with like-minded people.

Access details: seanfobbe.com/de/posts/2025-05

Seán Fobbe · [May 28, 2025] Talk on Legal Data Engineering (Online)

Picked up "Python Polars: The Definitive Guide" by Jeroen Janssens and Thijs Nieuwdorp. The polar bear was already used on another O'Reilly book, but the Iberian lynx is cool.

Never sure how tech books will pan out, but Jeroen's book "Data Science at the Command Line" was a good one, so I am hopeful.

#DataStreaming is tough! Despite 10+ years of attempting to simplify it, teams often spend up to 80% of their time wrangling bad data at the lake and optimizing real-time pipelines.

Discover the basic challenges of Data Streaming and a few design & architecture patterns used to tackle these challenges.

This is about pragmatic solutions to build fast, scalable & manageable Data Streaming Pipelines! Watch the #InfoQ video now: bit.ly/3Zmjrrd