Remembering My Target Audience

development
software
Author

Tony Dunsworth, Ph.D.

Published

August 17, 2025

I like soliciting feedback for this blog so I can refine my work and make sure I'm reaching my target audience effectively. I meant for this blog to appeal to professionals in the 9-1-1 and public safety fields, along with people interested in statistics, analytics, and data science overall. The feedback I received after my most recent post, though, made me reflect more carefully on that audience and how I communicate with them.

The feedback came from two highly intelligent professionals, both Toastmasters for whom I have deep respect. One is a 9-1-1 professional and the other a retired medical professional, and they told me much the same thing: they enjoy my writing style and feel that I write well, but I lost them in some of the technical details. I simply went a little too deep. I think that's because I haven't deployed my framework to GitHub yet.

So, I want to back my own bus up a bit, so to speak, and recast my last post in a better format, one that I hope will reach my intended readers more effectively and make the technical choices clearer.

While you can perform data analysis in many different languages, and you can do a fair piece of it in Excel, there are three main programmatic choices for basic, intermediate, and advanced data analysis: Python, R, and Julia. R has a reputation for being a great statistical programming language and is closely associated with academia. Julia is the new kid on the block and has a lot of promise; however, I don't see as much daily work in it. If you are interested in Julia, I recommend reading Emma Boudreau's Medium blog and following her GitHub repositories. They give an excellent view of what Julia can do beyond basic data analysis. Python is an excellent general programming language with a lot of supplemental libraries that can speed up its performance and make it an excellent choice for data analysis, data engineering, machine learning modelling, and even Large Language Model (LLM) and AI development. So I chose Python because of the libraries I know I need, as well as the ones I think I will need as the project grows through the phases I've envisioned for it.

Because a typical Python project requires several libraries, there is always a high likelihood that some of them will not play well together on the most up-to-date version of Python. (The most recent release, according to the python.org website, is currently in the 3.13 series.) Many of them will work together at some earlier version, so a key piece of building the infrastructure for the project is version management and control. There are, obviously, many ways to do that. I've tried several different methods and settled on one that really works well for me: uv by Astral. I also use their linting and formatting tool, Ruff, but that will be discussed a little later. I chose uv because, after installation, you can in many cases use the same pip or venv commands to configure your environment and install libraries, simply prefacing them on the command line with `uv`. Astral wrote uv in Rust, which is known for producing very fast software. This speeds up Python's built-in tooling and expands your capabilities with commands like `uv tool install ruff` or `uvx ruff` for installing tools, and `uv run synth911gen.py` for executing scripts.

Other environments, like Anaconda, provide full environments, but the libraries can lag behind and it appears to carry a lot of overhead. Poetry, like uv, supports greater collaboration since the environment can be packaged in a single TOML file and shipped to another workstation; from there, either tool can use that TOML file to synchronize the environment and download the necessary libraries with all the needed dependencies. I just found uv easier to use and less argumentative than Poetry. Having said that, remember that your mileage may vary, and if one works better for you than the other, by all means use what works best for you. I know some readers will ask: if you can do most of this by prepending `uv` to other Python commands, why not just use pip and venv? My answer is that uv works quickly and feels more consistent to me. While writing this, I actually found a different environment manager that could be interesting, since it touts its ability to be multilingual: pixi. I have no experience with it, but who knows. I am going to read its docs to see if it might be useful for me in the future.

Once the environment manager is set, I ensure that I install a version of Python for that environment. Right now, I've stayed with the latest patch to 3.11, which is 3.11.13. It has an end-of-life date in 2027, so I know it will still be in support while I build the project. Most of the libraries I've chosen are compatible with it and with each other at that version. I have considered moving to 3.12, but I haven't tested that all of my libraries will work together there. As I move forward in the project, I will test version and patch levels to see if everything works as I would like.

After the environment is set, I need to start working on the libraries I want to use in the project. The first choice is the most fundamental: the library that creates and manages data frames. The most common choice is pandas; however, the biggest complaint about it is speed, or the lack thereof. I've used FireDucks in the past with some really good results, and I also know that Polars is a very fast data frame library designed to handle large data sets, so I have decided to use Polars for this project. I have noticed a trend lately of libraries being written in Rust to speed up performance. It makes me think about cutting out the middle man and finding a way to use Rust directly for data analysis, but that is a project for another day. For much of the heavy statistical lifting, I plan on using NumPy along with SciPy and statsmodels. These three libraries cover most of my statistical and mathematical needs and are standard in the data science world.
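To make that concrete, here is a minimal sketch of how those pieces fit together. The file name and column names are hypothetical stand-ins rather than my actual data, and the test at the end is just one example of the kind of statistical lifting SciPy handles.

```python
# A minimal sketch of the data-frame and stats stack described above.
# "weekly_calls.csv", "priority", and "queue_seconds" are hypothetical
# stand-ins for the real data, used only for illustration.
import polars as pl
from scipy import stats

calls = pl.read_csv("weekly_calls.csv")

# Polars handles the fast aggregation work...
summary = calls.group_by("priority").agg(
    pl.col("queue_seconds").mean().alias("mean_queue"),
    pl.col("queue_seconds").quantile(0.9).alias("p90_queue"),
)
print(summary)

# ...while SciPy covers the statistical tests, such as checking whether two
# priority levels have different average queue times.
p1 = calls.filter(pl.col("priority") == 1)["queue_seconds"].to_numpy()
p2 = calls.filter(pl.col("priority") == 2)["queue_seconds"].to_numpy()
t_stat, p_value = stats.ttest_ind(p1, p2, equal_var=False)
print(f"Welch's t = {t_stat:.2f}, p = {p_value:.4f}")
```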

When it comes to graphics, while there are tried and true standards like seaborn, I have always preferred plotly and plotnine, and I lean toward plotnine because it implements the grammar of graphics as outlined in The Grammar of Graphics by Leland Wilkinson. If you are familiar with R, it is similar to ggplot2, which is built on the same principles. Once you become accustomed to it, the grammar of graphics is easy to understand. This Towards Data Science post explains the principles very nicely, including a good representation of the multiple layers, starting with the data and moving through to the coordinate system. It is a very effective way to build your visualizations, and it allows for the reuse of many components because you only need minor changes to the aesthetics or the geometries. Geometries, or geoms, are how you display data points: bars, points, lines, and so on.
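As a quick illustration of that layering, here is a small plotnine sketch; the hourly call-volume numbers are made up purely to show how the pieces stack.

```python
# A minimal sketch of the grammar-of-graphics layering with plotnine.
# The hours and call counts below are invented for illustration only.
import pandas as pd
from plotnine import ggplot, aes, geom_col, labs, theme_minimal

calls = pd.DataFrame({
    "hour": list(range(6)),             # hour of day
    "call_count": [12, 8, 5, 4, 7, 15], # hypothetical call volumes
})

# Data -> aesthetics -> geometry -> labels: each layer is added with "+",
# so swapping geom_col for geom_line changes the display without touching
# the rest of the specification.
plot = (
    ggplot(calls, aes(x="hour", y="call_count"))
    + geom_col()
    + labs(title="Calls by hour", x="Hour of day", y="Calls received")
    + theme_minimal()
)
plot.save("calls_by_hour.png", width=6, height=4, dpi=150)
```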

Finally, other libraries for all phases of the project include pydantic for type safety, Quarto for publishing documents, and Jupyter for notebooks that allow code testing. Python is not a type-safe language. You can assign any value to a variable and it will infer the type, string, integer, boolean, and so on, from the value assigned to it. You can change the variable's value, which can change its type, and Python will happily try to carry on with the new type. As an example, you can create a variable x and assign the value 1 to it. Now, you can add 1 to it and save the output as y. If you print y, you should get 2: Python inferred that 1 is an integer and added it to another integer, giving an integer result. However, if you change x to the string '1' and add the string '1' to it, you'll get '11' because Python concatenates the two strings; mix the string with an integer instead, and you only find out with a TypeError at run time. You can add type hints to a function definition, but Python itself does not enforce them while the code runs, and when you're working fast you might forget to declare them at all. Pydantic takes care of that by validating data against the types you declare and raising a clear error when the usage is questionable, which helps ensure that what you programmed is what you meant to write; a small sketch of both behaviours follows at the end of this section.

Quarto is a great document publishing system that can use R, Python, Julia, or even JavaScript to execute code in a document and display the results; you can even display the code that generates the output. In fact, this website and blog use Quarto as their foundation, and eventually I can get fancy with CSS and JavaScript to make it even prettier. You can use the platform to create many different types of output: pandoc for Microsoft Word compatible documents, LaTeX or Typst for PDFs and publication-ready documents formatted to a specific journal's requirements, Shiny for interactive dashboards, and even PowerPoint slides.

Jupyter, on the other hand, is good for creating notebooks that can be shared and reproduced to show how code is working. I prefer to use it like a scratch pad when I'm creating code for testing purposes; after I'm satisfied, I can take the code and move it over to a document. Using Quarto, I could turn a Jupyter notebook into a document with ease. Again, it all depends on what you want to do and how you want to do it.
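Returning to the type-safety point above, here is that small sketch; the CallRecord model and its fields are hypothetical examples, not part of my actual framework.

```python
# Plain Python happily changes behaviour when a variable's type changes.
from pydantic import BaseModel, ValidationError

x = 1
print(x + 1)    # 2, integer addition
x = "1"
print(x + "1")  # '11', string concatenation

# Pydantic enforces the types you declare and raises a clear error otherwise.
# CallRecord and its fields are hypothetical, for illustration only.
class CallRecord(BaseModel):
    call_id: int
    priority: int
    queue_seconds: float

try:
    CallRecord(call_id="not-a-number", priority=1, queue_seconds=12.4)
except ValidationError as err:
    print(err)  # explains that call_id could not be parsed as an integer
```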

Phase I of this project is really this: building the infrastructure. Once the infrastructure is built, all of the other phases can flow into it, so that everything gets completed when each step is ready to deploy. Phase II is to create a base document that will serve as the model for future reports. Initially, the report will come out weekly and will cover summaries of different data slices; my goal is a report that almost mimics a newsletter: here are the highlights and the standard KPI reports. That will expand over time to include a comparison with the week prior, then 4 weeks prior, then 8 weeks prior, and at the 4- and 8-week marks, a comparison with the same week number from the previous year. Trends and historical information are both important to ensure that centre operations are progressing in the desired directions, and this will open the door for correlation analysis to see whether trends and historical data can surface additional information about operations.
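Here is a rough sketch of what that week-over-week comparison could look like, assuming a Polars frame with one row per call; the column names and the KPIs themselves are hypothetical placeholders.

```python
# A rough sketch of the week-over-week KPI comparison. "call_date" and
# "queue_seconds" are hypothetical column names, not the real schema.
import datetime as dt
import polars as pl

def weekly_kpis(calls: pl.DataFrame, week_start: dt.date) -> pl.DataFrame:
    """Summarise one week's calls into the KPI columns used in the report."""
    week_end = week_start + dt.timedelta(days=7)
    return (
        calls.filter(
            pl.col("call_date").is_between(week_start, week_end, closed="left")
        ).select(
            pl.len().alias("call_count"),
            pl.col("queue_seconds").mean().alias("mean_queue"),
        )
    )

def compare_weeks(calls: pl.DataFrame, current: dt.date, weeks_back: int) -> pl.DataFrame:
    """Stack the current week's KPIs on top of the same KPIs from weeks_back earlier."""
    prior_start = current - dt.timedelta(weeks=weeks_back)
    return pl.concat([
        weekly_kpis(calls, current).with_columns(pl.lit("current week").alias("period")),
        weekly_kpis(calls, prior_start).with_columns(
            pl.lit(f"{weeks_back} weeks prior").alias("period")
        ),
    ])
```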

Phase III will open up the archives so users can select a specific week, while I develop and deploy other templates for them to use. The idea is to create a website that allows a user to choose a report from a series of templates and then select the dataset they wish to analyse. I would continue to use Quarto to develop the website; that makes the most sense because it will already be present and I have experience with it, as evidenced by this site.

Phase IV will focus on using my data engineering skills to build an ingestion pipeline that consumes the archival data and makes it available to users through the website. I think I will use DuckDB on the backend to create the data frames from the data lake and then feed that data into pre-built templates to create more informative reports. DuckDB will let me leverage my SQL skills and be more productive with fewer lines of code. At this stage, I will also start building different dashboards using Streamlit and Quarto as their backbones. I have reviewed a book on Streamlit in the past, so I have a lot of familiarity with it, and that will serve me well in developing the basics for different audiences. I want to create different dashboards for the executive suite, operations management, and floor supervisors; each group will want to see the information that is relevant to them.
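Here is a rough sketch of the kind of query I have in mind, assuming the archive lands as Parquet files in a local folder; the path, column names, and aggregation are placeholders rather than the real pipeline.

```python
# A rough sketch of DuckDB feeding the report templates. The "archive/"
# folder and the column names are hypothetical stand-ins.
import duckdb

con = duckdb.connect("reports.duckdb")

# DuckDB can query a whole folder of Parquet files with plain SQL...
weekly = con.sql(
    """
    SELECT date_trunc('week', received_at) AS week_start,
           count(*)                        AS call_count,
           avg(queue_seconds)              AS mean_queue
    FROM read_parquet('archive/*.parquet')
    GROUP BY 1
    ORDER BY 1
    """
)

# ...and hand the result straight to Polars for the pre-built templates.
weekly_df = weekly.pl()
print(weekly_df.head())
```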

Once this has been solidified and the solution has demonstrated that it works not only as I intend but has also matured with user feedback, I will build the next and final expected phase. Phase V will give users the opportunity to use a local LLM to build custom reports from a specified dataset. I think this level of interactivity will let users really dive into the data and focus on using it to drive decision-making. I plan on deploying an LLM locally because I don't want to expose data to the outside world and I want to control costs. I have already built two laboratories, so I understand what I need and how to construct the backbone. Right now, I'm leaning toward Ollama because I built both labs with it and I know how to configure it; I just need the RAM and the processor speed to run it properly.
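As a sketch of the idea, the snippet below asks a locally running Ollama instance to draft a narrative summary from a KPI table. The model name, prompt, and helper function are assumptions for illustration, not a finished design; it only assumes Ollama's standard REST endpoint on its default port.

```python
# A very rough sketch of a local LLM drafting a report section via Ollama.
# The model name ("llama3") and the prompt wording are placeholders.
import requests

def summarise_week(kpi_table: str, model: str = "llama3") -> str:
    """Ask a local Ollama instance to turn a KPI table into a short narrative."""
    prompt = (
        "You are summarising 9-1-1 centre performance for supervisors.\n"
        "Write three short bullet points from this weekly KPI table:\n\n"
        f"{kpi_table}"
    )
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"]
```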

Once the final phase has been completed and everything is stable, mature, and only needs the occasional code update, my goal is to package this up and make it available to other centres. If I had my way, I would make it free to smaller centres and set up a payment scheme for larger centres that could subsidize future development.

I already have my next post in the pipeline, so it should be out sooner than usual.