Enumerate "Data" Big Idea from College Board

Some of the big ideas and vocab that you observe, talk about it with a partner ...

"Data compression is the reduction of the number of bits needed to represent data"

"Data compression is used to save transmission time and storage space."

"Lossy compression can reduce the size of data, but the original data cannot be fully recovered"

"Lossless compression lets you restore and recover the original data"

The Image Lab Project contains a plethora of College Board Unit 2 data concepts. Working with Images provides many opportunities for compression and analyzing size.
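To make the compression vocabulary above concrete, here is a minimal sketch of lossless compression using Python's built-in zlib module. The sample bytes are a made-up stand-in for file data and are not part of the Image Lab.

```python
import zlib

# Hypothetical sample data: repetitive bytes compress well
original = b"green " * 200

compressed = zlib.compress(original)    # fewer bytes to store or transmit
restored = zlib.decompress(compressed)  # lossless: decoding returns the exact input

print(len(original), "bytes before,", len(compressed), "bytes after")
print("restored exactly:", restored == original)
```

Because the restored bytes match the original exactly, this is lossless; a lossy method would trade that guarantee for a smaller result.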

Image Files and Size

Here are some image files. Download these files and load them into the images directory under _notebooks in your Blog.

- Clouds Impression
- Lassen Volcano
- Green Square

Describe some of the meta data and considerations when managing Image files. Describe how these relate to Data Compression ...

File Type, PNG and JPG are two types used in this lab
Size, height and width, number of pixels
Visual perception, lossy compression

Python Libraries and Concepts used for Jupyter and Files/Directories

Introduction to displaying images in Jupyter notebook

IPython

IPython supports visualization of data in Jupyter notebooks. Visualization is specific to the view; for the web, the visualization needs to be converted to HTML.

pathlib

File paths are written differently on Windows versus Mac and Linux. This can cause problems in a project as you work and deploy on different Operating Systems (OS's). pathlib is a solution to this problem.
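As a small sketch of how pathlib handles this, the example below joins a directory and a file name with the / operator. The file name is one of the lab images; the pure path classes are used only to show how the same parts would be written on each OS family.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# The / operator joins path parts with the separator of the current OS
image_file = Path("images") / "clouds-impression.png"
print(image_file.name)      # file name without the directory
print(image_file.suffix)    # file extension
print(image_file.exists())  # True only if the file is really in that directory

# Pure paths never touch the disk; they only show the spelling per OS family
posix_style = PurePosixPath("images") / "clouds-impression.png"
windows_style = PureWindowsPath("images") / "clouds-impression.png"
print(posix_style)
print(windows_style)
```

The lab code relies on the same idea when it builds path / image['file'].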

What are commands you use in terminal to access files?

cd can change directories to access files

Provide what you observed, struggled with, or learned while playing with this code.

Why is path a big deal when working with images?
- Path specifies the location of the image file. A path is directory names that specify the location of a file or directory in a system. Image processing systems need to know the exact location of the image file to load and be able to manipulate the image.
How does the meta data source and label relate to Unit 5 topics?
- The meta data source and label give credit to the creator of each image. Unit 5 talks about legal and ethical concerns, and crediting creators helps address them, although attribution alone does not replace permission or a license to use the work.
Look up IPython, describe why this is interesting in Jupyter Notebooks for both Pandas and Images?
- For Pandas, IPython has tab completion and rich display, so a DataFrame renders as a readable HTML table in the notebook, which makes it easier to work with large data sets. For Images, the IPython.display module can show image files and HTML directly in a cell's output, which is how this lab displays each image. This makes it easier to explore and analyze image data in Jupyter Notebooks.

from IPython.display import Image, display
from pathlib import Path  # https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f

# prepares a series of images
def image_data(path=Path("images/"), images=None):  # path of static images is defaulted
    if images is None:  # default image
        images = [
            {'source': "Peter Carolin", 'label': "Clouds Impression", 'file': "clouds-impression.png"},
            {'source': "Peter Carolin", 'label': "Lassen Volcano", 'file': "lassen-volcano.jpg"},
            {'source': "Peter Carolin", 'label': "Smiley Face", 'file': "smileyface.jpg"}
        ]
    for image in images:
        # File to open
        image['filename'] = path / image['file']  # file with path
    return images

def image_display(images):
    for image in images:  
        display(Image(filename=image['filename']))


# Run this as standalone tester to see sample data printed in Jupyter terminal
if __name__ == "__main__":
    # print parameter supplied image
    green_square = image_data(images=[{'source': "Internet", 'label': "Green Square", 'file': "green-square-16.png"}])
    image_display(green_square)
    
    # display default images from image_data()
    default_images = image_data()
    image_display(default_images)

Reading and Encoding Images (2 implementations follow)

PIL (Python Image Library)

Pillow or PIL provides the ability to work with images in Python. Geeks for Geeks shows some ideas on working with images.

base64

Image formats (JPG, PNG) are binary file formats, and it is difficult to embed them directly in text-based content. Base64 converts binary data into text: each group of 3 bytes (24 bits) is written as four 6-bit Base64 digits. Thus Base64 is used to transport and embed binary images into textual assets such as HTML and CSS.
How is Base64 similar or different to Binary and Hexadecimal?
- Binary, Hexadecimal, and Base64 are all ways of writing the same underlying bits. A Binary digit carries 1 bit and a Hexadecimal digit carries 4 bits, and both are mainly used to show numbers and raw data in programming. A Base64 digit carries 6 bits and uses only printable characters, so binary data can be transmitted or stored as text.
Translate first 3 letters of your name to Base64.

(Ann)ika = QW5u
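The translation above can be checked with Python's base64 module. This sketch writes the same three bytes in binary, hexadecimal, and Base64 so the three notations can be compared side by side.

```python
import base64

text = "Ann"
raw = text.encode("ascii")  # 3 bytes = 24 bits

binary = " ".join(f"{byte:08b}" for byte in raw)  # base 2, 1 bit per digit
hexadecimal = raw.hex()                           # base 16, 4 bits per digit
encoded = base64.b64encode(raw).decode("ascii")   # base 64, 6 bits per digit

print(binary)       # 01000001 01101110 01101110
print(hexadecimal)  # 416e6e
print(encoded)      # QW5u

# Decoding returns the original bytes, so Base64 itself loses nothing
decoded = base64.b64decode(encoded)
print(decoded == raw)
```

Three bytes become four characters, which is why Base64 text is about one third larger than the binary data it carries.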

numpy

Numpy is described as "The fundamental package for scientific computing with Python". In the Image Lab, a Numpy array is created from the image data in order to simplify access and change to the RGB values of the pixels, converting pixels to grey scale.

io, BytesIO

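Input and Output (I/O) is fundamental to all Computer Programming. Input/output (I/O) buffering is a technique used to optimize I/O operations by holding data in memory until it is ready to be used. BytesIO provides an in-memory buffer of bytes that behaves like a file; in this lab the image is saved into a BytesIO buffer instead of onto disk before it is converted to Base64. With large quantities of data, the buffer takes longer to fill. In this example, there is a very large picture that lags.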
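A minimal sketch of an in-memory buffer, assuming two made-up chunks of bytes in place of real image data: BytesIO collects the writes, and getvalue() returns everything gathered so far.

```python
from io import BytesIO

# Hypothetical chunks standing in for pieces of an image file
with BytesIO() as buffer:
    buffer.write(b"first chunk, ")
    buffer.write(b"second chunk")
    data = buffer.getvalue()  # all bytes collected in memory so far

print(data)

# A BytesIO can also be read like a file opened in binary mode
reader = BytesIO(data)
first_five = reader.read(5)
print(first_five)
```

In the lab code, img.save(buffer, format) plays the role of the write calls here.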

Where have you been a consumer of buffering?

On any media streaming or hosting site. YouTube videos can buffer, Netflix can buffer, and images on Instagram can buffer.

From your consumer experience, what effects have you experienced from buffering?

Buffering causes delays in the loading or display of images on a site or app. This leads to slower website performance, lower user engagement, and a poor user experience.

How do these effects apply to images?

Images that are too large or too high in resolution usually take much more time and resources to buffer. To avoid unnecessarily long buffering of images, we can reduce their quality, typically with lossy compression.

Data Structures, Imperative Programming Style, and working with Images

Introduction to creating meta data and manipulating images. Look at each procedure and explain the purpose and results of this program. Add any insights or challenges as you explored this program.

Does this code seem like a series of steps are being performed?

Yes. First the necessary libraries are imported, then functions are defined to load the image data, scale each image, convert it to Base64, and finally alter the pixels to grey scale.

Describe Grey Scale algorithm in English or Pseudo code?

The grayscale algorithm converts an RGB image to a grayscale image. It works by calculating the average of the red, green, and blue values for each pixel and then setting all three color channels to that average value. It alters the pixels, not the image as a whole.
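The averaging step described above can be sketched on plain tuples, without PIL or numpy. The helper name to_grey and the sample pixels are made up for illustration; the arithmetic mirrors the lab code, including keeping the alpha value for RGBA pixels.

```python
def to_grey(pixel):
    """Return a grey pixel; keep the alpha value when one is present."""
    average = (pixel[0] + pixel[1] + pixel[2]) // 3  # integer average of R, G, B
    if len(pixel) > 3:
        return (average, average, average, pixel[3])  # RGBA, as in PNG
    return (average, average, average)                # RGB, as in JPEG

# Hypothetical pixels: pure red, a green RGBA pixel, and a dark blue-grey
pixels = [(255, 0, 0), (0, 128, 0, 255), (10, 20, 30)]
grey_pixels = [to_grey(p) for p in pixels]
print(grey_pixels)
```

Every output pixel has equal red, green, and blue values, which is what makes it a shade of grey.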

Describe scale image? What is before and after on pixels in three images?

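Scale image resizes each image to a base width of 320 pixels while keeping the aspect ratio. The number of pixels changes: the 16 x 16 Green Square is enlarged to 320 x 320, the 320 x 234 Clouds Impression stays the same, and the 2792 x 2094 Lassen Volcano shrinks to 320 x 240. When an image is scaled down, pixels are discarded, so the original cannot be rebuilt from the scaled copy.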

Is scale image a type of compression? If so, line it up with College Board terms described?

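Scaling an image down reduces the number of bits needed to represent it, so it fits the College Board description of data compression. Because the original pixels cannot be recovered from the smaller image, it lines up with lossy compression rather than lossless compression.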
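The size arithmetic can be checked without opening any files. This sketch repeats the calculation used in scale_image on three of the sizes reported in the lab output; the helper name scaled_size is made up for illustration.

```python
def scaled_size(size, base_width=320):
    """Return (width, height) after scaling to base_width, keeping the aspect ratio."""
    width, height = size
    scale_percent = base_width / float(width)
    scale_height = int(float(height) * scale_percent)
    return (base_width, scale_height)

for size in [(16, 16), (320, 234), (800, 598)]:
    new_size = scaled_size(size)
    before = size[0] * size[1]          # pixel count before scaling
    after = new_size[0] * new_size[1]   # pixel count after scaling
    print(size, "->", new_size, "pixels:", before, "->", after)
```

Comparing the pixel counts before and after shows whether an image gained or lost pixels when it was scaled.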

from IPython.display import HTML, display
from pathlib import Path  # https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f
from PIL import Image as pilImage # as pilImage is used to avoid conflicts
from io import BytesIO
import base64
import numpy as np

# prepares a series of images
def image_data(path=Path("images/"), images=None):  # path of static images is defaulted
    if images is None:  # default image
        images = [
            {'source': "Internet", 'label': "Green Square", 'file': "green-square-16.png"},
            {'source': "Peter Carolin", 'label': "Clouds Impression", 'file': "clouds-impression.png"},
            {'source': "Peter Carolin", 'label': "Lassen Volcano", 'file': "lassen-volcano.jpg"},
            {'source': "Internet", 'label': "happyface", 'file': "smileyface.jpg"},
        ]
    for image in images:
        # File to open
        image['filename'] = path / image['file']  # file with path
    return images

# Large image scaled to baseWidth of 320
def scale_image(img):
    baseWidth = 320
    scalePercent = (baseWidth/float(img.size[0]))
    scaleHeight = int((float(img.size[1])*float(scalePercent)))
    scale = (baseWidth, scaleHeight)
    return img.resize(scale)

# PIL image converted to base64
def image_to_base64(img, format):
    with BytesIO() as buffer:
        img.save(buffer, format)
        return base64.b64encode(buffer.getvalue()).decode()

# Set Properties of Image, Scale, and convert to Base64
def image_management(image):  # path of static images is defaulted        
    # Image open return PIL image object
    img = pilImage.open(image['filename'])
    
    # Python Image Library operations
    image['format'] = img.format
    image['mode'] = img.mode
    image['size'] = img.size
    # Scale the Image
    img = scale_image(img)
    image['pil'] = img
    image['scaled_size'] = img.size
    # Scaled HTML
    image['html'] = '<img src="data:image/%s;base64,%s">' % (image['format'].lower(), image_to_base64(image['pil'], image['format']))
    
# Create Grey Scale Base64 representation of Image
def image_management_add_html_grey(image):
    # Image open return PIL image object
    img = image['pil']
    format = image['format']
    
    img_data = img.getdata()  # Reference https://www.geeksforgeeks.org/python-pil-image-getdata/
    image['data'] = np.array(img_data) # PIL image to numpy array
    image['gray_data'] = [] # key/value for data converted to gray scale

    # 'data' is an array of RGB(A) pixels; the array is traversed and a grey scale pixel list is built
    for pixel in image['data']:
        # create gray scale of image, ref: https://www.geeksforgeeks.org/convert-a-numpy-array-to-an-image/
        average = (pixel[0] + pixel[1] + pixel[2]) // 3  # average pixel values and use // for integer division
        if len(pixel) > 3:
            image['gray_data'].append((average, average, average, pixel[3])) # PNG format
        else:
            image['gray_data'].append((average, average, average))
        # end for loop for pixels
        
    img.putdata(image['gray_data'])
    image['html_grey'] = '<img src="data:image/%s;base64,%s">' % (format.lower(), image_to_base64(img, format))


# Jupyter Notebook Visualization of Images
if __name__ == "__main__":
    # Load the default list of images
    images = image_data()
    
    # Display meta data, scaled view, and grey scale for each image
    for image in images:
        image_management(image)
        print("---- meta data -----")
        print(image['label'])
        print(image['source'])
        print(image['format'])
        print(image['mode'])
        print("Original size: ", image['size'])
        print("Scaled size: ", image['scaled_size'])
        
        print("-- original image --")
        display(HTML(image['html'])) 
        
        print("--- grey image ----")
        image_management_add_html_grey(image)
        display(HTML(image['html_grey'])) 
    print()

---- meta data -----
Green Square
Internet
PNG
RGBA
Original size:  (16, 16)
Scaled size:  (320, 320)
-- original image --

--- grey image ----

---- meta data -----
Clouds Impression
Peter Carolin
PNG
RGBA
Original size:  (320, 234)
Scaled size:  (320, 234)
-- original image --

--- grey image ----

---- meta data -----
Lassen Volcano
Peter Carolin
JPEG
RGB
Original size:  (2792, 2094)
Scaled size:  (320, 240)
-- original image --

--- grey image ----

---- meta data -----
happyface
Internet
JPEG
RGB
Original size:  (800, 598)
Scaled size:  (320, 239)
-- original image --

--- grey image ----

Data Structures and OOP

Most data structures classes require Object Oriented Programming (OOP). Since this class is lined up with a College Course, OOP will be talked about often. Functionality in the remainder of this Blog is the same as the prior implementation. Highlight some of the key differences you see between imperative and OOP styles.

Read imperative and object-oriented programming on Wikipedia

- Imperative programming: a programming paradigm of software that uses statements that change a program's state. Consists of commands for the computer to perform, describes how a program operates step by step, rather than on high-level descriptions of its expected results.
- OOP: a programming paradigm based on the concept of "objects", which can contain data and code. The data is in the form of fields (often known as attributes or properties), and the code is in the form of procedures (often known as methods). A common feature of objects is that procedures (or methods) are attached to them and can access and modify the object's data fields.

Look at Parameters in Imperative and Self in OOP

In imperative programming, parameters are used to pass values into functions or procedures, and provide input data to the function or to configure its behavior. In the function, the parameter is treated as a local variable that can be manipulated within the function. In object-oriented programming, the self keyword is used to refer to the object that the method is being called on. In Python, all methods of a class have the self parameter as the first parameter. When an instance method is called, the self parameter is automatically set to the instance that the method is being called on.
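A minimal side-by-side sketch of the two styles. The function describe and the class ImageInfo are made-up names for illustration and are not part of the lab code.

```python
# Imperative style: the data arrives through parameters on every call
def describe(label, source):
    return f"{label} (source: {source})"

# OOP style: the data is stored on the object and reached through self
class ImageInfo:
    def __init__(self, label, source):
        self._label = label    # attributes set on self belong to this object
        self._source = source

    def describe(self):
        # self is filled in automatically with the object the method is called on
        return f"{self._label} (source: {self._source})"

print(describe("Green Square", "Internet"))

info = ImageInfo("Green Square", "Internet")
print(info.describe())  # no arguments needed, the object already holds its data
```

Both calls produce the same text; the difference is where the data lives between calls.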

Additionally, review all the imports in these three demos. Create a definition of their purpose, specifically these ...

PIL (Python Imaging Library): a library for working with images in Python, providing a wide range of image processing capabilities, including opening, manipulating, and saving different image formats. In this code, PIL is used for opening and manipulating image files.
numpy: a Python library for scientific computing, which provides support for multidimensional arrays, along with a collection of mathematical functions to operate on them. In this code, numpy is used to convert the pixel data of an image from PIL format to a numpy array.
base64: a module in Python that provides base64 encoding and decoding functions, which can be used to convert binary data to ASCII text format. In this code, base64 is used to encode the pixel data of an image in base64 format so that it can be displayed in a web browser as an HTML image.

from IPython.display import HTML, display
from pathlib import Path  # https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f
from PIL import Image as pilImage # as pilImage is used to avoid conflicts
from io import BytesIO
import base64
import numpy as np


class Image_Data:

    def __init__(self, source, label, file, path, baseWidth=320):
        self._source = source    # variables with self prefix become part of the object, 
        self._label = label
        self._file = file
        self._filename = path / file  # file with path
        self._baseWidth = baseWidth

        # Open image and scale to needs
        self._img = pilImage.open(self._filename)
        self._format = self._img.format
        self._mode = self._img.mode
        self._originalSize = self._img.size
        self.scale_image()
        self._html = self.image_to_html(self._img)
        self._html_grey = self.image_to_html_grey()


    @property
    def source(self):
        return self._source  
    
    @property
    def label(self):
        return self._label 
    
    @property
    def file(self):
        return self._file   
    
    @property
    def filename(self):
        return self._filename   
    
    @property
    def img(self):
        return self._img
             
    @property
    def format(self):
        return self._format
    
    @property
    def mode(self):
        return self._mode
    
    @property
    def originalSize(self):
        return self._originalSize
    
    @property
    def size(self):
        return self._img.size
    
    @property
    def html(self):
        return self._html
    
    @property
    def html_grey(self):
        return self._html_grey
        
    # Large image scaled to baseWidth of 320
    def scale_image(self):
        scalePercent = (self._baseWidth/float(self._img.size[0]))
        scaleHeight = int((float(self._img.size[1])*float(scalePercent)))
        scale = (self._baseWidth, scaleHeight)
        self._img = self._img.resize(scale)
    
    # PIL image converted to base64
    def image_to_html(self, img):
        with BytesIO() as buffer:
            img.save(buffer, self._format)
            return '<img src="data:image/%s;base64,%s">' % (self._format.lower(), base64.b64encode(buffer.getvalue()).decode())
            
    # Create Grey Scale Base64 representation of Image
    def image_to_html_grey(self):
        img_grey = self._img.copy()  # copy so the scaled color image is left unchanged
        pixels = np.array(self._img.getdata()) # PIL image to numpy array
        
        grey_data = [] # list of pixels converted to gray scale
        # pixels is an array of RGB(A) data, the array is traversed and a grey scale pixel list is built
        for pixel in pixels:
            # create gray scale of image, ref: https://www.geeksforgeeks.org/convert-a-numpy-array-to-an-image/
            average = (pixel[0] + pixel[1] + pixel[2]) // 3  # average pixel values and use // for integer division
            if len(pixel) > 3:
                grey_data.append((average, average, average, pixel[3])) # PNG format
            else:
                grey_data.append((average, average, average))
            # end for loop for pixels
            
        img_grey.putdata(grey_data)
        return self.image_to_html(img_grey)

        
# prepares a series of images, provides expectation for required contents
def image_data(path=Path("images/"), images=None):  # path of static images is defaulted
    if images is None:  # default image
        images = [
            {'source': "Internet", 'label': "Green Square", 'file': "green-square-16.png"},
            {'source': "Peter Carolin", 'label': "Clouds Impression", 'file': "clouds-impression.png"},
            {'source': "Peter Carolin", 'label': "Lassen Volcano", 'file': "lassen-volcano.jpg"},
            {'source': "Internet", 'label': "happyface", 'file': "smileyface.jpg"}
        ]
    return path, images

# turns data into objects
def image_objects():        
    id_Objects = []
    path, images = image_data()
    for image in images:
        id_Objects.append(Image_Data(source=image['source'], 
                                  label=image['label'],
                                  file=image['file'],
                                  path=path,
                                  ))
    return id_Objects

# Jupyter Notebook Visualization of Images
if __name__ == "__main__":
    for ido in image_objects(): # ido is an Imaged Data Object
        
        print("---- meta data -----")
        print(ido.label)
        print(ido.source)
        print(ido.file)
        print(ido.format)
        print(ido.mode)
        print("Original size: ", ido.originalSize)
        print("Scaled size: ", ido.size)
        
        print("-- scaled image --")
        display(HTML(ido.html))
        
        print("--- grey image ---")
        display(HTML(ido.html_grey))
        
    print()

---- meta data -----
Green Square
Internet
green-square-16.png
PNG
RGBA
Original size:  (16, 16)
Scaled size:  (320, 320)
-- scaled image --

--- grey image ---

---- meta data -----
Clouds Impression
Peter Carolin
clouds-impression.png
PNG
RGBA
Original size:  (320, 234)
Scaled size:  (320, 234)
-- scaled image --

--- grey image ---

---- meta data -----
Lassen Volcano
Peter Carolin
lassen-volcano.jpg
JPEG
RGB
Original size:  (2792, 2094)
Scaled size:  (320, 240)
-- scaled image --

--- grey image ---

---- meta data -----
happyface
Internet
smileyface.jpg
JPEG
RGB
Original size:  (800, 598)
Scaled size:  (320, 239)
-- scaled image --

--- grey image ---

Hacks

CB Practice Problems

Data Compression:

1. Advantage of lossless over lossy compression?
A lossless compression algorithm can guarantee reconstruction of the original data, while a lossy compression algorithm cannot. Lossless compression encodes redundant data more compactly without discarding any of it, while lossy compression permanently removes some of the data. Therefore, if you plan to upload an image to different sites, resizing and reproducing it along the way, you would definitely want to use lossless compression.

2. Compression algorithm for storing a data file
A user wants to save a data file on an online storage site. The user wants to reduce the size of the file, if possible, and wants to be able to completely restore the file to its original version. Compressing the file using a lossless compression algorithm before uploading it will maintain its quality. Data can be restored in lossless, but not lossy.

3. True statement about compression
Lossy compression of an image file generally provides a greater reduction in transmission time than lossless compression does. Lossy compression removes data outright, so the resulting file is usually smaller. With less data, the image is able to transmit and buffer faster.
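As a rough sketch of why lossy compression cannot be undone, the example below quantizes made-up 8-bit brightness samples into 16 levels. This is a simplified stand-in for a lossy step and is not how JPEG actually works.

```python
# Hypothetical 8-bit brightness samples (0 to 255)
samples = [12, 13, 15, 200, 201, 203, 90, 91]

step = 16
quantized = [value // step for value in samples]  # lossy: nearby values collapse together
restored = [q * step for q in quantized]          # best effort at undoing the step

print(quantized)            # every value now fits in 4 bits instead of 8
print(restored)
print(restored == samples)  # False, the small differences are gone for good
```

The quantized list needs half the bits per sample, but values such as 12, 13, and 15 all come back as the same number.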

Lossy vs Lossless images

This image:

is more likely to use lossy compression, because the large amount of detail and color variation is expensive to preserve with lossless compression. With so many details, the slight loss of quality from lossy compression is less noticeable.

This image:

is more likely to use lossless compression, as the variation in the image is less, so it will be represented more accurately with lossless. Logos typically need to be resized or reproduced at several different resolutions, too.

"Programming Paradigm"

Numpy RGB Pixels

from IPython.display import HTML, display
from pathlib import Path  # https://medium.com/@ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f
from PIL import Image as pilImage # as pilImage is used to avoid conflicts
from io import BytesIO
import base64
import numpy as np

# prepares a series of images
def image_data(path=Path("images/"), images=None):  # path of static images is defaulted
    if images is None:  # default image
        images = [
            {'source': "Internet", 'label': "happyface", 'file': "smileyface.jpg"}
        ]
    for image in images:
        # File to open
        image['filename'] = path / image['file']  # file with path
    return images

# Large image scaled to baseWidth of 320
def scale_image(img):
    baseWidth = 320
    scalePercent = (baseWidth/float(img.size[0]))
    scaleHeight = int((float(img.size[1])*float(scalePercent)))
    scale = (baseWidth, scaleHeight)
    return img.resize(scale)

# PIL image converted to base64
def image_to_base64(img, format):
    with BytesIO() as buffer:
        img.save(buffer, format)
        return base64.b64encode(buffer.getvalue()).decode()

# Set Properties of Image, Scale, and convert to Base64
def image_management(image):  # path of static images is defaulted        
    # Image open return PIL image object
    img = pilImage.open(image['filename'])
    
    # Python Image Library operations
    image['format'] = img.format
    image['mode'] = img.mode
    image['size'] = img.size
    # Scale the Image
    img = scale_image(img)
    image['pil'] = img
    image['scaled_size'] = img.size
    # Scaled HTML
    image['html'] = '<img src="data:image/%s;base64,%s">' % (image['format'].lower(), image_to_base64(image['pil'], image['format']))
    
# Create Red channel Base64 representation of Image
def image_management_add_html_red(image):
    # Copy the scaled PIL image so image['pil'] keeps its original colors
    img = image['pil'].copy()
    format = image['format']
    
    img_data = img.getdata()  # Reference https://www.geeksforgeeks.org/python-pil-image-getdata/
    image['data'] = np.array(img_data) # PIL image to numpy array
    image['red_data'] = [] # key/value for data reduced to the red channel

    # 'data' is an array of RGB(A) pixels, the array is traversed and only the red value is kept
    for pixel in image['data']:
        red = pixel[0]  # index 0 is red
        if len(pixel) > 3:
            image['red_data'].append((red, 0, 0, pixel[3])) # PNG format
        else:
            image['red_data'].append((red, 0, 0))
        # end for loop for pixels
        
    img.putdata(image['red_data'])
    image['html_red'] = '<img src="data:image/%s;base64,%s">' % (format.lower(), image_to_base64(img, format))

# Create Green channel Base64 representation of Image
def image_management_add_html_green(image):
    # Copy the scaled PIL image so image['pil'] keeps its original colors
    img = image['pil'].copy()
    format = image['format']
    
    img_data = img.getdata()  # Reference https://www.geeksforgeeks.org/python-pil-image-getdata/
    image['data'] = np.array(img_data) # PIL image to numpy array
    image['green_data'] = [] # key/value for data reduced to the green channel

    # 'data' is an array of RGB(A) pixels, the array is traversed and only the green value is kept
    for pixel in image['data']:
        green = pixel[1]  # index 1 is green
        if len(pixel) > 3:
            image['green_data'].append((0, green, 0, pixel[3])) # PNG format
        else:
            image['green_data'].append((0, green, 0))
        # end for loop for pixels
        
    img.putdata(image['green_data'])
    image['html_green'] = '<img src="data:image/%s;base64,%s">' % (format.lower(), image_to_base64(img, format))

# Create Blue channel Base64 representation of Image
def image_management_add_html_blue(image):
    # Copy the scaled PIL image so image['pil'] keeps its original colors
    img = image['pil'].copy()
    format = image['format']
    
    img_data = img.getdata()  # Reference https://www.geeksforgeeks.org/python-pil-image-getdata/
    image['data'] = np.array(img_data) # PIL image to numpy array
    image['blue_data'] = [] # key/value for data reduced to the blue channel

    # 'data' is an array of RGB(A) pixels, the array is traversed and only the blue value is kept
    for pixel in image['data']:
        blue = pixel[2]  # index 2 is blue
        if len(pixel) > 3:
            image['blue_data'].append((0, 0, blue, pixel[3])) # PNG format
        else:
            image['blue_data'].append((0, 0, blue))
        # end for loop for pixels
        
    img.putdata(image['blue_data'])
    image['html_blue'] = '<img src="data:image/%s;base64,%s">' % (format.lower(), image_to_base64(img, format))


# Jupyter Notebook Visualization of Images
if __name__ == "__main__":
    # Load the default list of images
    images = image_data()
    
    # Display meta data, scaled view, and red, green, and blue channel views for each image
    for image in images:
        image_management(image)
        print("---- meta data -----")
        print(image['label'])
        print(image['source'])
        print(image['format'])
        print(image['mode'])
        print("Original size: ", image['size'])
        print("Scaled size: ", image['scaled_size'])
        
        print("-- original image --")
        display(HTML(image['html'])) 
        
        print("--- red image ----")
        image_management_add_html_red(image)
        display(HTML(image['html_red'])) 

        print("--- green image ----")
        image_management_add_html_green(image)
        display(HTML(image['html_green'])) 

        print("--- blue image ----")
        image_management_add_html_blue(image)
        display(HTML(image['html_blue'])) 

    print()

---- meta data -----
happyface
Internet
JPEG
RGB
Original size:  (800, 598)
Scaled size:  (320, 239)
-- original image --

--- red image ----

--- green image ----

--- blue image ----